
How to properly restore an OOP TensorFlow model?

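For my current project, I decided to define the TensorFlow model inside a class instance. This all worked fine until I wanted to restore it to continue training from the latest checkpoint. It is a simple linear regression model that is built in the instance's initializer, and it tries to approximate the function f(x) = 3x + 1.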

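The logic is: if there is no checkpoint yet, create a new model, train it for 20 epochs, and save it. If there is already a checkpoint, load it and continue training it for another 20 epochs.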
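The restore-or-initialize control flow described above can be sketched without TensorFlow. This is a minimal, framework-free illustration, not the poster's code: the checkpoint is a hypothetical JSON file (`ckpt_path`) holding `w`, `b` and `global_step`, and plain per-sample gradient descent stands in for `tf.train.AdamOptimizer`.

```python
import json
import os
import random


def make_data(n=50, seed=0):
    """Noisy samples of f(x) = 3x + 1 on [-5, 5], like the question's training data."""
    rng = random.Random(seed)
    xs = [-5.0 + 10.0 * i / (n - 1) for i in range(n)]
    ys = [3 * x + 1 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def train_or_resume(ckpt_path, xs, ys, epochs=20, lr=0.001):
    # Restore the saved state if a checkpoint exists, otherwise initialize it.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"w": 0.0, "b": 0.0, "global_step": 0}

    # Train for `epochs` passes, one gradient step per (x, y) pair.
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (state["w"] * x + state["b"]) - y  # prediction error
            state["w"] -= lr * 2 * err * x           # d(err^2)/dw = 2*err*x
            state["b"] -= lr * 2 * err               # d(err^2)/db = 2*err
        state["global_step"] += 1                    # one step per epoch, as in the question

    # Save so that the next call continues where this one stopped.
    with open(ckpt_path, "w") as f:
        json.dump(state, f)
    return state
```

Calling `train_or_resume` twice with the same path first initializes and trains to `global_step == 20`, then resumes and continues to `global_step == 40`, which is the behavior the question is after.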

Training the network the first time works. But when I try to train it after loading, the following error is thrown:

  File "", line 1, in <module>
    runfile('/home/abc/tf_tests/restore_test/restoretest.py', wdir='/home/sku/tf_tests/restore_test')
  File "/home/abc/anaconda3/envs/tensorflow/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)
  File "/home/abc/anaconda3/envs/tensorflow/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/home/sku/tf_tests/restore_test/restoretest.py", line 71, in <module>
    model.run_training_step(sess, x, y)
NameError: name 'model' is not defined
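The NameError itself is plain Python scoping rather than anything TensorFlow-specific: before EDIT1, `model = LinearModel()` ran only in the `else` branch, so whenever a checkpoint existed the name `model` was never bound. A minimal reproduction, where the hypothetical `has_checkpoint` flag stands in for the result of `tf.train.latest_checkpoint(...)`. Inside a function Python reports this as `UnboundLocalError`, which is a subclass of `NameError`; at module level, as in the question's script, it is the plain `NameError` shown in the traceback.

```python
def run(has_checkpoint):
    if has_checkpoint:
        pass  # restore branch: `model` is never assigned on this path
    else:
        model = "freshly built model"  # only this branch binds `model`
    return model  # raises when the restore branch was taken
```

`run(False)` returns the model, while `run(True)` raises, which mirrors the first run (no checkpoint) succeeding and the second run failing.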

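My question is: how do I restore the model and continue training it correctly? I found an interesting article about OOP here, but it does not cover saving and restoring models.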

My code is below. Thanks for helping me!

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

class LinearModel(object):

    def __init__(self):
        self.build_model()

    def build_model(self):
        # x is input, y is output
        self.x = tf.placeholder(dtype=tf.float32, name='x')
        self.y = tf.placeholder(dtype=tf.float32, name='y')

        self.w = tf.Variable(0.0, name='w')
        self.b = tf.Variable(0.0, name='b')

        self.global_step = tf.Variable(0, trainable=False, name='global_step', dtype=tf.int32)

        self.y_pred = self.w * self.x + self.b

        # quadratic error as loss
        self.loss = tf.square(self.y - self.y_pred)

        self.train_op = tf.train.AdamOptimizer(0.001).minimize(self.loss)
        self.increment_global_step_op = tf.assign(self.global_step, self.global_step+1)

        return 

    # run a single (x, y) pair through the graph
    def run_training_step(self, sess, x, y):
        _, loss = sess.run([self.train_op, self.loss], feed_dict={self.x:x, self.y:y})
        return loss

    # convenience function for checking the values
    def get_vars(self, sess):
        return sess.run([self.w, self.b])


tf.reset_default_graph()

# training data generation, is a linear function of 3x+1 + noise
tr_input = np.linspace(-5.0, 5.0)
tr_output = 3*tr_input+1+np.random.randn(tr_input.shape[0])


with tf.Session() as sess:

    # check if there are checkpoints
    latest_checkpoint = tf.train.latest_checkpoint('./model_saves')

    # ADDED BY EDIT1
    model = LinearModel()

    # if there are, load them
    if latest_checkpoint:

        saver = tf.train.import_meta_graph('./model_saves/lin_model-20.meta')
        saver.restore(sess, latest_checkpoint)  

    # if not, create a new model
    else:

        ### REMOVED BY EDIT1
        ### model = LinearModel()
        sess.run(tf.global_variables_initializer())

        saver = tf.train.Saver()

    # show vars before doing the training
    w, b = model.get_vars(sess)       
    print("initial weight: {}".format(w))
    print("initial bias: {}".format(b))

    # train for 20 epochs and save it
    for epoch in range(20):
        for x, y in zip(tr_input, tr_output):
            model.run_training_step(sess, x, y)
        sess.run(model.increment_global_step_op)

    saver.save(sess, './model_saves/lin_model', global_step=model.global_step)       

    # show vars after doing the training
    w_opt, b_opt = model.get_vars(sess)       
    print("final weight: {}".format(w_opt))
    print("final bias: {}".format(b_opt))

EDIT1:

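When I instantiate the model before checking whether a checkpoint exists, I get a precondition error on the optimizer variables instead: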

FailedPreconditionError: Attempting to use uninitialized value beta1_power
     [[Node: beta1_power/read = Identity[T=DT_FLOAT, _class=["loc:@Adam/Assign"], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
     [[Node: Square/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_103_Square", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
...
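The failing variable, `beta1_power`, belongs to the Adam optimizer rather than to the model's `w` and `b`. TF1's `AdamOptimizer` keeps two extra variables, `beta1_power` and `beta2_power`, plus first- and second-moment slot variables for every trained variable, and all of them must be either restored or initialized before `train_op` can run. The framework-free sketch below (stdlib only; the function names are illustrative, not a TensorFlow API) implements that update for the scalar linear model and shows that resuming from the full saved state reproduces an uninterrupted run exactly, whereas restoring only `w` and `b` and restarting Adam does not.

```python
import math

BETA1, BETA2, EPS, LR = 0.9, 0.999, 1e-8, 0.001


def init_state():
    # Model parameters plus Adam's extra state (the part reported as uninitialized).
    return {
        "w": 0.0, "b": 0.0,
        "m": [0.0, 0.0], "v": [0.0, 0.0],           # moment estimates for w and b
        "beta1_power": BETA1, "beta2_power": BETA2,  # running powers, as in TF1 Adam
    }


def adam_step(s, x, y):
    err = (s["w"] * x + s["b"]) - y
    grads = [2 * err * x, 2 * err]  # gradients of (y_pred - y)^2 w.r.t. w and b
    # Bias-corrected step size, computed from the running powers.
    lr_t = LR * math.sqrt(1 - s["beta2_power"]) / (1 - s["beta1_power"])
    for i, name in enumerate(("w", "b")):
        s["m"][i] = BETA1 * s["m"][i] + (1 - BETA1) * grads[i]
        s["v"][i] = BETA2 * s["v"][i] + (1 - BETA2) * grads[i] ** 2
        s[name] -= lr_t * s["m"][i] / (math.sqrt(s["v"][i]) + EPS)
    s["beta1_power"] *= BETA1
    s["beta2_power"] *= BETA2


def train(s, data, epochs):
    for _ in range(epochs):
        for x, y in data:
            adam_step(s, x, y)
    return s
```

Because the step size and direction depend on `m`, `v` and the running powers, a checkpoint that omits them cannot continue the same optimization; this is why the saver has to cover the optimizer's variables too.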

1 Answer


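    The LinearModel class is not instantiated when you restore from the checkpoint, so `model` is undefined. Build the model unconditionally, create the `tf.train.Saver` after it so that it covers all of the model's variables (including the optimizer's), and then either restore them from the checkpoint or initialize them. You don't need `tf.train.import_meta_graph` here: `LinearModel()` already rebuilds the graph, and importing the meta graph on top of it adds a second set of graph nodes instead of restoring the variables that `model` actually uses, which is consistent with the `beta1_power` error in your EDIT1. This should work: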

    ...
    latest_checkpoint = tf.train.latest_checkpoint('/home/sku/tf_tests/restore_test/model_saves')
    
    model = LinearModel()
    saver = tf.train.Saver()
    
    if latest_checkpoint:
        saver.restore(sess, latest_checkpoint)
    else:
        sess.run(tf.global_variables_initializer())
    ...
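The ordering in this snippet also matters. In TF1, `tf.train.Saver()` without a `var_list` collects the saveable variables that exist in the graph when it is constructed, and raises `ValueError: No variables to save` if there are none. Creating it right after `LinearModel()` therefore covers `w`, `b`, `global_step` and the Adam variables that `minimize()` added, which is why a single `saver.restore(...)` call is enough. The sketch below mimics that rule with hypothetical `Graph` and `Saver` classes in plain Python; it is an analogy, not TensorFlow's implementation, and the variable names are only modeled on TF1's.

```python
class Graph:
    """Hypothetical stand-in for a TF1 graph: just a registry of variable names."""

    def __init__(self):
        self.variables = []


class LinearModel:
    def __init__(self, graph):
        # Building the model registers its variables, like build_model() in the
        # question; minimize() would also add Adam's variables (names illustrative).
        graph.variables += ["w", "b", "global_step",
                            "beta1_power", "beta2_power",
                            "w/Adam", "w/Adam_1", "b/Adam", "b/Adam_1"]


class Saver:
    """Hypothetical saver that snapshots the variable list at construction time."""

    def __init__(self, graph):
        if not graph.variables:
            raise ValueError("No variables to save")
        self.var_list = list(graph.variables)
```

Constructing `Saver(graph)` before `LinearModel(graph)` fails, while constructing it afterwards captures every model and optimizer variable.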
    
