Computing the weight-update ratio in TensorFlow

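I am looking for a way to compute the weight-update ratio of an optimizer step in TensorFlow. The weight-update ratio is defined as the scale of the update in each step divided by the scale of the variables, and it can be used to monitor network training.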
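To make the definition concrete, here is a minimal pure-Python sketch (no TensorFlow) that computes the ratio from snapshots of the variables taken before and after one step. The helper names `l2_norm`, `weight_update_ratio` and `norm_difference_ratio` and the sample values are hypothetical. The second helper mirrors the difference-of-norms form used in the code further down in this question; by the reverse triangle inequality it can never exceed the norm of the update itself, so the two quantities may differ noticeably.

```python
import math


def l2_norm(tensors):
    """Global L2 norm over a list of flat lists of floats."""
    return math.sqrt(sum(x * x for t in tensors for x in t))


def weight_update_ratio(before, after):
    """||after - before|| / ||before||, taken over all variables."""
    update = [[a - b for a, b in zip(ta, tb)] for ta, tb in zip(after, before)]
    return l2_norm(update) / l2_norm(before)


def norm_difference_ratio(before, after):
    """| ||after|| - ||before|| | / ||before||, the difference-of-norms form."""
    return abs(l2_norm(after) - l2_norm(before)) / l2_norm(before)


# Hypothetical values: two "variables" before and after one optimizer step.
before = [[3.0, 4.0], [0.0]]
after = [[3.0, 4.0], [0.05]]

ratio = weight_update_ratio(before, after)
diff_ratio = norm_difference_ratio(before, after)
print(ratio, diff_ratio)
```

In this example the update is orthogonal to the existing weights, so the difference-of-norms value comes out much smaller than the update-norm value.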

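Ideally I would like a non-intrusive way to compute it within a single session run, but I have not managed to get quite what I want. Because the update scale and the parameter scale are independent of the train step, explicit dependencies must be added to the graph so that the variable scale is evaluated both before and after the update step. Unfortunately, it seems that in TF dependencies can only be defined for new nodes, which complicates the problem further.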

The best I have come up with so far is a context manager that defines the necessary ops. It is used as follows:

opt = tf.train.AdamOptimizer(1e0)
grads = tf.gradients(loss, tf.trainable_variables())
grads = list(zip(grads, tf.trainable_variables()))

with compute_weight_update_ratio('wur') as wur:
    train = opt.apply_gradients(grads_and_vars=grads)

# ...
with tf.Session() as sess:
    sess.run(wur.ratio)

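The complete code for compute_weight_update_ratio is given below. What bothers me is that, in its current state, the weight-update ratio (at least norm_before) is computed on every training step, whereas for performance reasons I would prefer to compute it selectively (for example, only when summaries are computed).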

Any ideas on how to improve this?

import contextlib

import tensorflow as tf  # TensorFlow 1.x graph-mode API

@contextlib.contextmanager
def compute_weight_update_ratio(name, var_scope=None):
    '''Injects training to compute weight-update-ratio.

    The weight-update-ratio is computed as the update scale divided
    by the variable scale before the update and should be somewhere in the 
    range 1e-2 or 1e-3.

    Params
    ------
    name : str
        Operation name

    Kwargs
    ------
    var_scope : str, optional
        Name selection of variables to compute weight-update-ratio for. Defaults to all. Regex supported.
    '''

    class WeightUpdateRatio:
        def __init__(self):
            self.num_train = len(tf.get_collection(tf.GraphKeys.TRAIN_OP))
            self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=var_scope)
            # tf.global_norm accepts a list of tensors with differing shapes,
            # whereas tf.norm on a list requires them to stack into one tensor.
            self.norm_before = tf.global_norm(self.variables, name='norm_before')

        def compute_ratio(self,):
            train_ops = tf.get_collection(tf.GraphKeys.TRAIN_OP)
            assert len(train_ops) > self.num_train, 'Missing training op'

            with tf.control_dependencies(train_ops[self.num_train:]):
                self.norm_after = tf.global_norm(self.variables, name='norm_after')

            absdiff = tf.abs(tf.subtract(self.norm_after, self.norm_before), name='absdiff')
            self.ratio = tf.divide(absdiff, self.norm_before, name=name)

    with tf.name_scope(name) as scope:
        try:
            wur = WeightUpdateRatio()

            with tf.control_dependencies([wur.norm_before]):
                yield wur
        finally:
            wur.compute_ratio()

1 Answer

0

You do not need to worry much about performance. TensorFlow only executes the subgraph needed to produce the requested outputs.

So in your training loop, if wur.ratio is not fetched during an iteration, the extra nodes created to compute it are not executed.
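The fetch-driven behaviour can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not TensorFlow's actual executor: the `Node` class, the `run` function and the scalar "weight" are all hypothetical. The wiring copies the question's graph, with the train node created under a control dependency on norm_before and norm_after created under a control dependency on the train node.

```python
class Node:
    """A toy graph node: a function plus data and control dependencies."""

    def __init__(self, name, fn, inputs=(), control_inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = list(inputs)
        self.control_inputs = list(control_inputs)


def run(fetch):
    """Evaluate `fetch`; return (value, names of nodes executed, in order)."""
    cache, executed = {}, []

    def visit(node):
        if node.name not in cache:
            for dep in node.control_inputs:  # must run first; value unused
                visit(dep)
            args = [visit(dep) for dep in node.inputs]
            executed.append(node.name)
            cache[node.name] = node.fn(*args)
        return cache[node.name]

    return visit(fetch), executed


state = {"w": 5.0}  # a single scalar stands in for all variables


def apply_update():
    state["w"] -= 0.05  # stand-in for one optimizer step


norm_before = Node("norm_before", lambda: abs(state["w"]))
train = Node("train", apply_update, control_inputs=[norm_before])
norm_after = Node("norm_after", lambda: abs(state["w"]), control_inputs=[train])
ratio = Node(
    "ratio",
    lambda after, before: abs(after - before) / before,
    inputs=[norm_after, norm_before],
)

_, train_only = run(train)           # fetch only the train node
ratio_value, with_ratio = run(ratio)  # fetch the ratio as well
print(train_only)
print(with_ratio)
```

In this toy model, fetching only the train node never touches norm_after or ratio, but norm_before still runs because the train node carries a control dependency on it. That is consistent with the observation in the question that norm_before is evaluated on every step.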

Answered on 2024-05-02T04:31:41+08:00
