
How can I use tensorflow.feature_column for prediction outside of an Estimator?

I want to use tensorflow feature_column directly with sessions, bypassing the Estimator framework. I have read TensorFlow's low-level introduction to feature columns. The problem is that tf.feature_column.input_layer requires the features at construction time, but the feature feed differs between training time and prediction time. Looking at the tf.Estimator code, it appears the same construction callback is invoked again to rebuild the graph. I came up with the example below, but if I skip the table init after the second construction it fails on an uninitialized table, and if I do run the table init it complains that the table is already initialized. According to their research paper this is by design, because they always expect to reload a fresh model from a checkpoint. But for cases like reinforcement learning, where we want to interleave updates and inference inside the training loop, this would be very inefficient. It is also unclear how they do development-time validation.

What is the proper way to build the graph and feed features for prediction?

import tensorflow as tf

training_features = {
    'sales' : [[5], [10], [8], [9]],
    'department': ['sports', 'sports', 'gardening', 'gardening']}

test_features = {
    'sales' : [[10], [20], [16], [18]],
    'department': ['sports', 'sports', 'gardening', 'gardening']}

department_column = tf.feature_column.categorical_column_with_vocabulary_list(
        'department', ['sports', 'gardening'])
department_column = tf.feature_column.indicator_column(department_column)

columns = [
    tf.feature_column.numeric_column('sales'),
    department_column
]

# similar to a tf.Estimator's model_fn callback
def mkgraph(features):
    with tf.variable_scope('feature_test', reuse=tf.AUTO_REUSE):
        inputs = tf.feature_column.input_layer(features, columns)
        alpha = tf.placeholder(tf.float32, name='alpha')
        output = inputs * alpha
        return output, alpha

with tf.Graph().as_default() as g:
    output, alpha = mkgraph(training_features)
    print('output', output)
    print('alpha', alpha)
    var_init = tf.global_variables_initializer()
    table_init = tf.tables_initializer()
    with tf.Session(graph=g) as sess:
        sess.run([var_init, table_init])
        print(sess.run(output, feed_dict={alpha: 100.0})) # works here

        print('testing')
        output, alpha = mkgraph(test_features)
        print('output', output)
        print('alpha', alpha)
        table_init = tf.tables_initializer()
        # sess.run([table_init]) # with this, it fails on 'table already initialized'
        # without table_init run, it fails on 'table not initialized'
        print(sess.run(output, feed_dict={alpha: 200.0}))

1 Answer


    If you have one training dataset and one test dataset and need to switch back and forth between them, you can try an is_training switch. For your specific example from the question:

    import tensorflow as tf
    
    training_features = {
        'sales' : [[5], [10], [8], [9]],
        'department': ['sports', 'sports', 'gardening', 'gardening']}
    test_features = {
        'sales' : [[10], [20], [16], [18]],
        'department': ['sports', 'sports', 'gardening', 'gardening']}
    
    department_column = tf.feature_column.categorical_column_with_vocabulary_list(
            'department', ['sports', 'gardening'])
    department_column = tf.feature_column.indicator_column(department_column)
    columns = [
        tf.feature_column.numeric_column('sales'),
        department_column
    ]
    
    with tf.variable_scope('feature_test', reuse=tf.AUTO_REUSE):
        alpha = tf.placeholder(tf.float32, name='alpha')
        is_training = tf.placeholder(tf.bool, name='is_training')
        training_inputs = tf.feature_column.input_layer(training_features, columns)
        test_inputs = tf.feature_column.input_layer(test_features, columns)
        output = tf.cond(is_training,
                         lambda: training_inputs * alpha,
                         lambda: test_inputs * alpha)
    
    var_init = tf.global_variables_initializer()
    table_init = tf.tables_initializer()
    
    with tf.Session() as sess:
        sess.run([var_init, table_init])
    
        print('training')
        print(sess.run(output, feed_dict={alpha: 100.0, is_training: True}))
    
        print('testing')
        print(sess.run(output, feed_dict={alpha: 200.0, is_training: False}))
    

    One potential issue is that both input_layer ops get instantiated. I don't think they simply load everything into memory up front, but they may use more memory than necessary and could cause you some trouble.
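
    If that duplication becomes a concern, a minimal sketch of an alternative (assuming TF 1.x; the placeholder names are illustrative) is to build a single input_layer on top of feature placeholders and feed the actual values at run time, so the vocabulary lookup table is created and initialized only once:

        import tensorflow as tf

        department_column = tf.feature_column.categorical_column_with_vocabulary_list(
                'department', ['sports', 'gardening'])
        columns = [
            tf.feature_column.numeric_column('sales'),
            tf.feature_column.indicator_column(department_column),
        ]

        # Placeholders stand in for the feature values (names are illustrative).
        features = {
            'sales': tf.placeholder(tf.float32, shape=[None, 1], name='sales'),
            'department': tf.placeholder(tf.string, shape=[None], name='department'),
        }
        inputs = tf.feature_column.input_layer(features, columns)
        alpha = tf.placeholder(tf.float32, name='alpha')
        output = inputs * alpha

        with tf.Session() as sess:
            # The lookup table is built once and initialized once.
            sess.run([tf.global_variables_initializer(), tf.tables_initializer()])

            # Training-time feed.
            print(sess.run(output, feed_dict={
                features['sales']: [[5], [10], [8], [9]],
                features['department']: ['sports', 'sports', 'gardening', 'gardening'],
                alpha: 100.0}))

            # Test-time feed reuses the same ops and the already-initialized table.
            print(sess.run(output, feed_dict={
                features['sales']: [[10], [20], [16], [18]],
                features['department']: ['sports', 'sports', 'gardening', 'gardening'],
                alpha: 200.0}))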
