
How to train an RNN with LSTM cells for time series prediction


I am currently trying to build a simple model for predicting a time series. The goal is to train the model with a sequence so that the model is able to predict future values.

I am using TensorFlow and LSTM cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training.

For example, let's assume we want to learn the given sequence:

[1,2,3,4,5,6,7,8,9,10,11,...]

And we unroll the network for num_steps=4.

Option 1

input data               label     
1,2,3,4                  2,3,4,5
5,6,7,8                  6,7,8,9
9,10,11,12               10,11,12,13
...

Option 2

input data               label     
1,2,3,4                  2,3,4,5
2,3,4,5                  3,4,5,6
3,4,5,6                  4,5,6,7
...

Option 3

input data               label     
1,2,3,4                  5
2,3,4,5                  6
3,4,5,6                  7
...

Option 4

input data               label     
1,2,3,4                  5
5,6,7,8                  9
9,10,11,12               13
...
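
To make the options concrete, here is a minimal numpy sketch of how, e.g., options 3 and 4 could be generated from such a series (series and window are placeholder names, window corresponding to num_steps):

import numpy as np

series = np.arange(1, 101)  # toy data: 1, 2, ..., 100
window = 4                  # num_steps

# Option 3: sliding window with stride 1, label is the single next value
x3 = np.array([series[i:i + window] for i in range(len(series) - window)])
y3 = np.array([series[i + window] for i in range(len(series) - window)])

# Option 4: non-overlapping windows (stride = window), label is the next value
x4 = np.array([series[i:i + window] for i in range(0, len(series) - window, window)])
y4 = np.array([series[i + window] for i in range(0, len(series) - window, window)])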

Any help would be appreciated.

3 Answers

  • 3

    I am just about to learn LSTMs in TensorFlow, and I am trying to implement an example which (luckily) tries to predict some time series / number series generated by a simple math function.

    But I am using a different way to structure the data for training, motivated by Unsupervised Learning of Video Representations using LSTMs:

    LSTM Future Predictor Model

    Option 5:

    input data               label     
    1,2,3,4                  5,6,7,8
    2,3,4,5                  6,7,8,9
    3,4,5,6                  7,8,9,10
    ...
    

    Besides this paper, I (tried to) take inspiration from the given TensorFlow RNN examples. My current complete solution looks like this:

    import math
    import random
    import numpy as np
    import tensorflow as tf
    
    LSTM_SIZE = 64
    LSTM_LAYERS = 2
    BATCH_SIZE = 16
    NUM_T_STEPS = 4
    MAX_STEPS = 1000
    LAMBDA_REG = 5e-4
    
    
    def ground_truth_func(i, j, t):
        return i * math.pow(t, 2) + j
    
    
    def get_batch(batch_size):
        seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
        tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)
    
        for b in range(batch_size):
            i = float(random.randint(-25, 25))
            j = float(random.randint(-100, 100))
            # input: NUM_T_STEPS consecutive values of the function
            for t in range(NUM_T_STEPS):
                value = ground_truth_func(i, j, t)
                seq[b, t, 0] = value

            # target: the following NUM_T_STEPS values (the option 5 layout)
            for t in range(NUM_T_STEPS):
                tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
        return seq, tgt
    
    
    # Placeholder for the inputs in a given iteration
    sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
    target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])
    
    fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
    fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))
    
    # ENCODER
    with tf.variable_scope('ENC_LSTM'):
        lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
        multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
        initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
        state = initial_state
        for t_step in range(NUM_T_STEPS):
            if t_step > 0:
                tf.get_variable_scope().reuse_variables()
    
            # state value is updated after processing each batch of sequences
            output, state = multi_lstm(sequence[:, t_step, :], state)
    
    learned_representation = state
    
    # DECODER
    with tf.variable_scope('DEC_LSTM'):
        lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
        multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
        state = learned_representation
        logits_stacked = None
        loss = 0.0
        for t_step in range(NUM_T_STEPS):
            if t_step > 0:
                tf.get_variable_scope().reuse_variables()
    
            # state value is updated after processing each batch of sequences
            output, state = multi_lstm(sequence[:, t_step, :], state)
            # output can be used to make next number prediction
            logits = tf.matmul(output, fc1_weight) + fc1_bias
    
            if logits_stacked is None:
                logits_stacked = logits
            else:
                logits_stacked = tf.concat(1, [logits_stacked, logits])
    
            # slice the target to shape [BATCH_SIZE, 1] so that it matches logits;
            # otherwise the subtraction broadcasts to [BATCH_SIZE, BATCH_SIZE]
            loss += tf.reduce_sum(tf.square(logits - target[:, t_step:t_step + 1])) / BATCH_SIZE
    
    reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))
    
    train = tf.train.AdamOptimizer().minimize(reg_loss)
    
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
    
        total_loss = 0.0
        for step in range(MAX_STEPS):
            seq_batch, target_batch = get_batch(BATCH_SIZE)
    
            feed = {sequence: seq_batch, target: target_batch}
            _, current_loss = sess.run([train, reg_loss], feed)
            if step % 10 == 0:
                print("@{}: {}".format(step, current_loss))
            total_loss += current_loss
    
        print('Total loss:', total_loss)
    
        print('### SIMPLE EVAL: ###')
        seq_batch, target_batch = get_batch(BATCH_SIZE)
        feed = {sequence: seq_batch, target: target_batch}
        prediction = sess.run([logits_stacked], feed)
        for b in range(BATCH_SIZE):
            print("{} -> {}".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
            print(" `-> Prediction: {}".format(prediction[0][b]))
    

    A sample output of this looks like:

    ### SIMPLE EVAL: ###
    # [input seq] -> [target prediction]
    #  `-> Prediction: [model prediction]  
    [  33.   53.  113.  213.] -> [  353.   533.   753.  1013.]
     `-> Prediction: [ 19.74548721  28.3149128   33.11489105  35.06603241]
    [ -17.  -32.  -77. -152.] -> [-257. -392. -557. -752.]
     `-> Prediction: [-16.38951683 -24.3657589  -29.49801064 -31.58583832]
    [ -7.  -4.   5.  20.] -> [  41.   68.  101.  140.]
     `-> Prediction: [ 14.14126873  22.74848557  31.29668617  36.73633194]
    ...
    

    The model is an LSTM autoencoder, with 2 layers each for the encoder and the decoder.

    Unfortunately, as you can see in the results, this model does not learn the sequence properly. It might be that I simply made a bad mistake somewhere, or that 1000-10000 training steps are just way too few for an LSTM. As I said, I am also just starting to understand/use LSTMs properly. But hopefully this can give you some inspiration regarding the implementation.
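
    One detail that might matter: in the paper's future predictor model, the decoder can be conditioned on its own previous prediction instead of re-reading the input sequence as my decoder above does (the paper discusses both a conditioned and an unconditioned variant). A rough, untested sketch of that conditioned variant, reusing the variable names from the code above:

    # DECODER variant: feed the previous prediction back as the next input
    with tf.variable_scope('DEC_LSTM_V2'):
        lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
        multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
        state = learned_representation
        # start decoding from the last observed value of the sequence
        dec_input = sequence[:, NUM_T_STEPS - 1, :]
        for t_step in range(NUM_T_STEPS):
            if t_step > 0:
                tf.get_variable_scope().reuse_variables()
            output, state = multi_lstm(dec_input, state)
            logits = tf.matmul(output, fc1_weight) + fc1_bias
            dec_input = logits  # [BATCH_SIZE, 1], same shape as the inputs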

  • 5

    After reading several LSTM introduction blogs, e.g. Jakob Aungiers', option 3 seems to be the right one for stateless LSTMs.

    If your LSTM needs to remember data from longer ago than num_steps, you can train in a stateful way - for a Keras example, see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example for batch sizes greater than one, however. I guess that in your case a batch size of four with a stateful LSTM could be used with the following data (written as input -> label):

    batch #0:
    1,2,3,4 -> 5
    2,3,4,5 -> 6
    3,4,5,6 -> 7
    4,5,6,7 -> 8
    
    batch #1:
    5,6,7,8 -> 9
    6,7,8,9 -> 10
    7,8,9,10 -> 11
    8,9,10,11 -> 12
    
    batch #2:
    9,10,11,12 -> 13
    ...
    

    With this layout, e.g., the second sample of batch #0 is correctly reused to continue training with the second sample of batch #1.

    This is somewhat similar to your option 4; however, you are not using all the available labels there.
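
    For completeness, a minimal Keras sketch of such a stateful setup (my own untested illustration, not from Philippe's post; the layer size is arbitrary). With stateful=True, the state of sample i in one batch is carried over to sample i of the next batch, which is exactly what the batch layout above relies on, so shuffling must be disabled:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    batch_size, num_steps = 4, 4
    series = np.arange(1.0, 30.0)

    # sliding windows with stride 1; taken in order, four consecutive windows
    # form one batch, so batch #k+1 continues batch #k sample by sample
    starts = range(len(series) - num_steps)
    x = np.array([series[s:s + num_steps] for s in starts])[:, :, None]
    y = np.array([series[s + num_steps] for s in starts])[:, None]
    n = (len(x) // batch_size) * batch_size  # trim to a whole number of batches
    x, y = x[:n], y[:n]

    model = Sequential()
    # batch_input_shape (batch, time, features) is required for stateful LSTMs
    model.add(LSTM(32, batch_input_shape=(batch_size, num_steps, 1), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')

    model.fit(x, y, batch_size=batch_size, shuffle=False, epochs=1)
    model.reset_states()  # reset before a fresh pass over the series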

    Update:

    In extension to my suggestion, where batch_size equals num_steps, Alexis Huet gives an answer for the case of batch_size being a divisor of num_steps, which can be used for larger num_steps. He describes it nicely on his blog.

  • 1

    I believe option 1 is closest to the reference implementation in /tensorflow/models/rnn/ptb/reader.py:

    def ptb_iterator(raw_data, batch_size, num_steps):
      """Iterate on the raw PTB data.
    
      This generates batch_size pointers into the raw PTB data, and allows
      minibatch iteration along these pointers.
    
      Args:
        raw_data: one of the raw data outputs from ptb_raw_data.
        batch_size: int, the batch size.
        num_steps: int, the number of unrolls.
    
      Yields:
        Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
        The second element of the tuple is the same data time-shifted to the
        right by one.
    
      Raises:
        ValueError: if batch_size or num_steps are too high.
      """
      raw_data = np.array(raw_data, dtype=np.int32)
    
      data_len = len(raw_data)
      batch_len = data_len // batch_size
      data = np.zeros([batch_size, batch_len], dtype=np.int32)
      for i in range(batch_size):
        data[i] = raw_data[batch_len * i:batch_len * (i + 1)]
    
      epoch_size = (batch_len - 1) // num_steps
    
      if epoch_size == 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    
      for i in range(epoch_size):
        x = data[:, i*num_steps:(i+1)*num_steps]
        y = data[:, i*num_steps+1:(i+1)*num_steps+1]
        yield (x, y)
    

    However, another option would be to select a pointer into your data array randomly for each training sequence.
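
    A minimal sketch of that random-pointer variant (my own illustration, following the same x/y convention as the reader above, with y time-shifted to the right by one):

    import numpy as np

    def random_batch(raw_data, batch_size, num_steps):
      """Sample batch_size random windows from raw_data."""
      raw_data = np.array(raw_data, dtype=np.int32)
      x = np.zeros([batch_size, num_steps], dtype=np.int32)
      y = np.zeros([batch_size, num_steps], dtype=np.int32)
      for b in range(batch_size):
        # random pointer into the data array for this training sequence
        start = np.random.randint(0, len(raw_data) - num_steps)
        x[b] = raw_data[start:start + num_steps]
        y[b] = raw_data[start + 1:start + num_steps + 1]
      return x, y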
