Problem

  • Training a custom tf.estimator.Estimator with a tf.data.Dataset on tensorflow 1.11 runs much slower than the same model architecture with tf.keras, feeding the data directly (a rough sketch of that baseline follows this list)

  • However, it sometimes runs fast (as measured in global_step/sec), but is slow at the start and end of each epoch.

  • During the "fast" batches, GPU utilization is around 30%; during the slow periods it is about 1%.
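
For reference, the tf.keras baseline I am comparing against looks roughly like this. It is a sketch rather than my exact script: the layer sizes match the Estimator model in the MWE below, and the arrays here are synthetic stand-ins.

import numpy as np
import tensorflow as tf

# Same architecture as the Estimator model below: 98 inputs, one 128-unit
# tanh hidden layer, 11 sigmoid outputs trained with binary cross-entropy.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='tanh', input_shape=(98,)),
    tf.keras.layers.Dense(11, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Plain numpy arrays fed directly to fit(), no tf.data pipeline involved.
x_train = np.random.randn(100000, 98).astype(np.float32)
y_train = np.round(np.random.rand(100000, 11)).astype(np.float32)
model.fit(x_train, y_train, batch_size=2**10, epochs=5)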

Probable causes

  • My input pipeline isn't well-optimized enough, so the GPU sits idle, waiting on the CPU data processing. But I don't understand why that would only be the case for the last few mini-batches before an epoch ends (see the timing sketch after this list).
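
One way to test this hypothesis is to time the input pipeline on its own, without the model. A minimal sketch (it reuses train_input_fn from the MWE below; the step count is arbitrary):

import time
import tensorflow as tf

# Iterate the training dataset alone; if per-batch time still spikes around
# epoch boundaries, the pipeline (not the model) is the bottleneck.
dataset = train_input_fn()
next_batch = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    start = time.time()
    for step in range(1, 3001):
        sess.run(next_batch)
        if step % 100 == 0:
            print('%d batches: %.3f sec/batch' % (step, (time.time() - start) / 100))
            start = time.time()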

What have I tried

Followed the steps in the Input Pipeline Performance Guide. That sped up the batches that are not near epoch boundaries. I am not sure how to improve it further.
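
For reference, the kind of change the guide suggests, applied to my training input_fn, looks roughly like this (a sketch using tf.contrib.data as of TF 1.11; there is no map() stage here, so num_parallel_calls and map_and_batch do not apply, and the prefetch depth is illustrative):

def train_input_fn_tuned():
    features = {k: train_data[k].values for k in cols}
    dataset = tf.data.Dataset.from_tensor_slices(
        (features, train_target[target_cols].values))
    # fuse shuffle + repeat into a single op, per the performance guide
    dataset = dataset.apply(
        tf.contrib.data.shuffle_and_repeat(buffer_size=train_data.shape[0]))
    dataset = dataset.batch(BATCH_SIZE)
    # prefetch a single batch ahead of the GPU
    dataset = dataset.prefetch(1)
    return dataset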

Minimal working example

import numpy as np
import pandas as pd
import tensorflow as tf

train_data = pd.DataFrame(np.random.randn(1030255, 1021)).rename(columns={c:str(c) for c in range(1021)})
train_target = pd.DataFrame(np.round(np.random.rand(1030255, 15))).rename(columns={c:str(c) for c in range(15)})
val_data = pd.DataFrame(np.random.randn(491077, 1021)).rename(columns={c:str(c) for c in range(1021)})
val_target = pd.DataFrame(np.round(np.random.rand(491077, 15))).rename(columns={c:str(c) for c in range(15)})

def model_fn(features, labels, mode, params):
    # single tanh hidden layer; multi-label sigmoid outputs
    net = tf.feature_column.input_layer(features, params['feature_columns'])
    net = tf.layers.dense(net, units=params['hidden_units'], activation=tf.nn.tanh)
    logits = tf.layers.dense(net, params['outputs'], activation=None)
    loss = tf.losses.sigmoid_cross_entropy(labels, logits=logits)
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss)
    elif mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdamOptimizer()
        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

cols = [str(c) for c in np.sort(np.random.choice(1021, size=98, replace=False))]  # random subset of 98 feature columns
target_cols = [str(c) for c in np.arange(11)]  # first 11 of the 15 target columns
VALIDATION_BATCH_SIZE = int(val_data.shape[0] / 4.0)  # evaluate in 4 large batches
BATCH_SIZE = 2**10

def train_input_fn():
    features = {k: train_data[k].values for k in cols}
    dataset = tf.data.Dataset.from_tensor_slices((features, train_target[target_cols].values))
    # shuffle buffer spans the whole training set; prefetch is applied after
    # .batch(), so its buffer size is measured in batches, not examples
    dataset = dataset.repeat() \
        .shuffle(train_data.shape[0]) \
        .batch(BATCH_SIZE) \
        .prefetch(BATCH_SIZE)
    return dataset

def validation_input_fn():
    features = {k: val_data[k].values for k in cols}
    dataset = tf.data.Dataset.from_tensor_slices((features, val_target[target_cols].values))
    dataset = dataset.repeat() \
        .batch(VALIDATION_BATCH_SIZE) \
        .prefetch(VALIDATION_BATCH_SIZE)
    return dataset

feature_columns = [tf.feature_column.numeric_column(f) for f in cols]
run_cfg = tf.estimator.RunConfig(tf_random_seed=1, save_checkpoints_steps=1000, save_checkpoints_secs=None)
classifier = tf.estimator.Estimator(
        model_fn=model_fn,
        config=run_cfg,
        params={
            'feature_columns': feature_columns,
            'hidden_units': 128,
            'outputs': len(target_cols)
        })
tf.estimator.train_and_evaluate(
    classifier,
    train_spec=tf.estimator.TrainSpec(train_input_fn),
    eval_spec=tf.estimator.EvalSpec(validation_input_fn, steps=4, start_delay_secs=30, throttle_secs=30))

Sample output

(The time attributed to the first step is misleadingly huge, because TensorFlow measures it from the previous training step, on the other side of the evaluation. That does not account for the last training step before eval begins, though.)

INFO:tensorflow:Finished evaluation at 2018-10-31-08:17:35
INFO:tensorflow:Saving dict for global step 1000: global_step = 1000, loss = 0.694723
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1000: /tmp/tmpvQq9DV/model.ckpt-1000
INFO:tensorflow:global_step/sec: 0.930553
INFO:tensorflow:loss = 0.69407356, step = 1000 (107.463 sec)
INFO:tensorflow:global_step/sec: 171.673
INFO:tensorflow:loss = 0.69456786, step = 1100 (0.583 sec)
INFO:tensorflow:global_step/sec: 166.964
INFO:tensorflow:loss = 0.69411445, step = 1200 (0.599 sec)
INFO:tensorflow:global_step/sec: 172.226
INFO:tensorflow:loss = 0.6940959, step = 1300 (0.579 sec)
INFO:tensorflow:global_step/sec: 170.882
INFO:tensorflow:loss = 0.69440323, step = 1400 (0.586 sec)
INFO:tensorflow:global_step/sec: 173.453
INFO:tensorflow:loss = 0.69332886, step = 1500 (0.577 sec)
INFO:tensorflow:global_step/sec: 167.078
INFO:tensorflow:loss = 0.6950055, step = 1600 (0.598 sec)
INFO:tensorflow:global_step/sec: 159.763
INFO:tensorflow:loss = 0.69460225, step = 1700 (0.626 sec)
INFO:tensorflow:global_step/sec: 161.674
INFO:tensorflow:loss = 0.6940766, step = 1800 (0.617 sec)
INFO:tensorflow:global_step/sec: 8.83793
INFO:tensorflow:loss = 0.6936994, step = 1900 (11.315 sec)
INFO:tensorflow:Saving checkpoints for 2000 into /tmp/tmpvQq9DV/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-10-31-08:18:11
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpvQq9DV/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/4]
INFO:tensorflow:Evaluation [2/4]
INFO:tensorflow:Evaluation [3/4]
INFO:tensorflow:Evaluation [4/4]
INFO:tensorflow:Finished evaluation at 2018-10-31-08:19:36

Without eval step

I thought the eval step (which is very slow in its own right) might be the problem. But when I only train, it runs even more slowly:

classifier.train(input_fn=train_input_fn, steps=5000)

[...]
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/tmpIJctNp/model.ckpt.
INFO:tensorflow:global_step/sec: 6.95414
INFO:tensorflow:loss = 0.6948092, step = 1000 (14.381 sec)
INFO:tensorflow:global_step/sec: 12.0491
INFO:tensorflow:loss = 0.6942879, step = 1100 (8.298 sec)
INFO:tensorflow:global_step/sec: 8.98388
INFO:tensorflow:loss = 0.6939402, step = 1200 (11.131 sec)
INFO:tensorflow:global_step/sec: 8.86219
INFO:tensorflow:loss = 0.6946343, step = 1300 (11.284 sec)
INFO:tensorflow:global_step/sec: 8.74248
INFO:tensorflow:loss = 0.694865, step = 1400 (11.439 sec)

Thanks in advance for any help.