Understanding TensorFlow LSTM input shape

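I have a dataset X consisting of N = 4000 samples. Each sample consists of d = 2 features (continuous values) spanning t = 10 time steps. For each sample I also have a corresponding 'label' at time step 11, which is also a continuous value.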

At the moment my dataset has shape X: [4000, 20], Y: [4000].

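I want to train an LSTM with TensorFlow to predict the value of Y (regression), given the 10 previous inputs of d features, but I'm having a hard time implementing this in TensorFlow.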

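The main problem I have right now is understanding how TensorFlow expects the input to be formatted. I've seen various examples such as this, but they deal with one big string of continuous time-series data. My data consists of separate samples, each one an independent time series.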

2 Answers

The documentation of tf.nn.dynamic_rnn says:

inputs: The RNN inputs. If time_major == False (default), this must be a Tensor of shape: [batch_size, max_time, ...], or a nested tuple of such elements.

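In your case, this means the input should have shape [batch_size, 10, 2]. Instead of training on all 4000 sequences at once, you use only batch_size of them in each training iteration. Something like the following should work (the reshape is added for clarity):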
```
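import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.nn.dynamic_rnn, tf.contrib)

batch_size = 32
# batch_size sequences of length 10 with 2 values for each timestep.
# get_batch is a placeholder for your own batching code; it should
# return batch_size rows of X as a NumPy array.
inputs = get_batch(X, batch_size).reshape([batch_size, 10, 2]).astype(np.float32)
# Create an LSTM cell with state size 256. Could also use GRUCell, ...
# Note: state_is_tuple=False is deprecated;
# the option might be completely removed in the future.
cell = tf.nn.rnn_cell.LSTMCell(256, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell,
                                   inputs,
                                   sequence_length=[10] * batch_size,
                                   dtype=tf.float32)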
```
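The snippet above leaves get_batch undefined. Below is a minimal NumPy sketch of one possible version; the helper name, the extra Y argument, and the assumed column order of X (f1_t1, f2_t1, f1_t2, f2_t2, ...) are assumptions, not part of the answer. Sampling X and Y with the same indices keeps each label paired with its sequence, which two independent get_batch calls would not guarantee.

```python
import numpy as np

def get_batch(X, Y, batch_size, rng=None):
    """Hypothetical helper: sample batch_size rows of X and the matching rows of Y.

    Assumes X has shape [N, 20] with columns ordered
    f1_t1, f2_t1, f1_t2, f2_t2, ..., f1_t10, f2_t10, and Y has shape [N].
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    # The same indices are used for X and Y so labels stay aligned.
    x = X[idx].reshape(batch_size, 10, 2).astype(np.float32)  # [batch, time, features]
    y = Y[idx].reshape(batch_size, 1).astype(np.float32)      # [batch, 1]
    return x, y

rng = np.random.default_rng(0)
X = rng.random((4000, 20))  # random stand-in for the question's data
Y = rng.random(4000)
x_batch, y_batch = get_batch(X, Y, 32, rng=rng)
print(x_batch.shape, y_batch.shape)  # (32, 10, 2) (32, 1)
```

With this variant, x_batch would be fed as the RNN input and y_batch as the regression target of the same training step.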
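According to the documentation, outputs has shape [batch_size, 10, 256], i.e. one 256-dimensional output for each time step. state is a tuple (c, h) whose two elements each have shape [batch_size, 256]. You can predict your final value, one per sequence, from state.h: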
```
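# state.h is the final hidden state, shape [batch_size, 256].
predictions = tf.contrib.layers.fully_connected(state.h,
                                                num_outputs=1,
                                                activation_fn=None)  # linear output for regression
# The labels must come from the same rows that were used for the X batch.
labels = get_batch(Y, batch_size).reshape([batch_size, 1])
# Mean squared error as an example regression loss.
loss = tf.losses.mean_squared_error(labels, predictions)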
```
The number 256 in the shapes of outputs and state is determined by cell.output_size and cell.state_size, respectively. When the LSTMCell is created as above, these are the same. See also the LSTMCell documentation.
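To see where these shapes come from, here is a minimal NumPy LSTM forward pass with random, untrained weights. It is an illustrative sketch of the standard LSTM equations, not TensorFlow's LSTMCell implementation (which, for example, also adds a forget-gate bias by default); lstm_forward and sigmoid are hypothetical helpers. It shows that the per-step output is the hidden state h, so outputs[:, -1, :] equals the final state.h when every sequence has the full length of 10.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, num_units, seed=0):
    """Minimal NumPy LSTM forward pass (illustrative, random untrained weights).

    x: [batch_size, max_time, input_dim], i.e. batch-major like time_major=False.
    Returns outputs [batch_size, max_time, num_units] and the final (c, h),
    each of shape [batch_size, num_units].
    """
    batch_size, max_time, input_dim = x.shape
    rng = np.random.default_rng(seed)
    # One weight matrix for all four gates: input, candidate, forget, output.
    W = rng.normal(scale=0.1, size=(input_dim + num_units, 4 * num_units))
    b = np.zeros(4 * num_units)
    c = np.zeros((batch_size, num_units))
    h = np.zeros((batch_size, num_units))
    outputs = []
    for t in range(max_time):
        z = np.concatenate([x[:, t, :], h], axis=1) @ W + b
        i, g, f, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # new cell state
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
        outputs.append(h)  # the output at each step is h
    return np.stack(outputs, axis=1), (c, h)

x = np.random.rand(32, 10, 2)
outputs, (c, h) = lstm_forward(x, num_units=256)
print(outputs.shape, c.shape, h.shape)  # (32, 10, 256) (32, 256) (32, 256)
```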
Answered 2024-05-03T01:54:13+08:00
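(This answer addresses the case where a direct np.reshape() does not arrange the final array the way we want. If we want to reshape directly to 3D, np.reshape will do it, but pay attention to how the resulting input is organized.)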
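A minimal NumPy sketch of that caveat, using a toy np.arange matrix (not the data used below) with the column order f1_t1, f2_t1, f1_t2, f2_t2. A row-major reshape to (samples, timesteps, features) keeps each sample intact, whereas reshaping straight to (timesteps, samples, features) just re-cuts the flat buffer and mixes samples together.

```python
import numpy as np

# Toy feature matrix: 5 samples, columns f1_t1, f2_t1, f1_t2, f2_t2.
arr = np.arange(20).reshape(5, 4)

# Keeps each row together: [sample, timestep, feature].
good = arr.reshape(5, 2, 2)
# Re-cuts the flat buffer: the "timestep 1" block now holds sample 0's
# two timesteps followed by sample 1's first timestep, and so on.
bad = arr.reshape(2, 5, 2)

print(good[0].tolist())     # [[0, 1], [2, 3]]
print(bad[0, :3].tolist())  # [[0, 1], [2, 3], [4, 5]]

# A correct time-major array is a transpose of the good one, not a reshape.
time_major = good.transpose(1, 0, 2)
```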

Having finally resolved, in my own attempts, this problem of feeding the right input shape to an RNN, and having stopped being confused by it, I'll give my "personal" explanation.

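In my case (and I think many others have this organization scheme in their feature matrices), most of the blogs out there were "not helpful". Let's see how to convert a 2D feature matrix into a 3D-shaped matrix for RNNs.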

Suppose our feature matrix is organized like this: we have 5 observations (i.e. rows; by convention I think that's the most logical term), and in each row we have 2 features for EACH timestep (and we have 2 timesteps), as follows:

(The df is just there to make my explanation easier to follow.)
```
In [1]: import numpy as np                                                           

In [2]: arr = np.random.randint(0,10,20).reshape((5,4))                              

In [3]: arr                                                                          
Out[3]: 
array([[3, 7, 4, 4],
       [7, 0, 6, 0],
       [2, 0, 2, 4],
       [3, 9, 3, 4],
       [1, 2, 3, 0]])

In [4]: import pandas as pd                                                          

In [5]: df = pd.DataFrame(arr, columns=['f1_t1', 'f2_t1', 'f1_t2', 'f2_t2'])         

In [6]: df                                                                           
Out[6]: 
   f1_t1  f2_t1  f1_t2  f2_t2
0      3      7      4      4
1      7      0      6      0
2      2      0      2      4
3      3      9      3      4
4      1      2      3      0
```
Now we'll work with these values. The point here is that RNNs incorporate the "timestep" dimension into their input because of their architecture. We can picture that dimension as stacking 2D arrays one behind the other, one per timestep. In this case we have two timesteps, so we stack two 2D arrays: one for timestep1 and, behind it, one for timestep2.

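In the 3D input we need to build, we still have 5 observations. The difference is that we need to arrange them differently: the RNN takes the first row (or a specified batch, but we'll keep it simple here) of the first array (i.e. timestep1) and then the first row of the second stacked array (i.e. timestep2). Then the second row, and so on up to the last row (the fifth in our example). So each row, at each timestep, must contain the two features, split across separate arrays, one array per timestep. Let's look at the numbers.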

I'll build the two arrays separately to make this easier to follow. Given the organization scheme in our df, you may have noticed that we need to take the first two columns (i.e. features 1 and 2 for timestep1) as our FIRST ARRAY OF THE STACK and the last two columns, that is, the 3rd and the 4th, as our SECOND ARRAY OF THE STACK, so that everything makes sense.
```
In [7]: arrStack1 = arr[:,0:2]                                                       

In [8]: arrStack1                                                                    
Out[8]: 
array([[3, 7],
       [7, 0],
       [2, 0],
       [3, 9],
       [1, 2]])

In [9]: arrStack2 = arr[:,2:4]                                                       

In [10]: arrStack2                                                                   
Out[10]: 
array([[4, 4],
       [6, 0],
       [2, 4],
       [3, 4],
       [3, 0]])
```
Finally, the only thing left is to stack the two arrays ("one behind the other") as if they were parts of the same final structure:
```
In [11]: arrfinal3D = np.stack([arrStack1, arrStack2])                               

In [12]: arrfinal3D                                                                  
Out[12]: 
array([[[3, 7],
        [7, 0],
        [2, 0],
        [3, 9],
        [1, 2]],

       [[4, 4],
        [6, 0],
        [2, 4],
        [3, 4],
        [3, 0]]])

In [13]: arrfinal3D.shape                                                            
Out[13]: (2, 5, 2)
```
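And that's it: given how our 2D feature matrix is organized, the features are now grouped per timestep. Note that np.stack with its default axis=0 yields shape (timesteps, samples, features), i.e. (2, 5, 2) here, which is the time-major layout. tf.nn.dynamic_rnn with its default time_major=False expects (samples, timesteps, features), so either pass time_major=True or stack along axis=1 (which, for this column order, is the same as arr.reshape(5, 2, 2)).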
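A quick NumPy check of the two layouts with the same values as the session above: stacking along axis 0 gives the time-major array, stacking along axis 1 gives the batch-major one, and the two differ only by swapping their first two axes.

```python
import numpy as np

# The 5x4 feature matrix from the session above (f1_t1, f2_t1, f1_t2, f2_t2).
arr = np.array([[3, 7, 4, 4],
                [7, 0, 6, 0],
                [2, 0, 2, 4],
                [3, 9, 3, 4],
                [1, 2, 3, 0]])

time_major = np.stack([arr[:, 0:2], arr[:, 2:4]])           # (timesteps, samples, features)
batch_major = np.stack([arr[:, 0:2], arr[:, 2:4]], axis=1)  # (samples, timesteps, features)

print(time_major.shape, batch_major.shape)  # (2, 5, 2) (5, 2, 2)
# Same numbers, first two axes swapped.
assert np.array_equal(batch_major, time_major.transpose(1, 0, 2))
# For this column order, a plain reshape already gives the batch-major layout.
assert np.array_equal(batch_major, arr.reshape(5, 2, 2))
```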

For all of this, you can use a one-liner:
```
In [14]: arrfinal3D_1 = np.stack([arr[:,0:2], arr[:,2:4]])                           

In [15]: arrfinal3D_1                                                                
Out[15]: 
array([[[3, 7],
        [7, 0],
        [2, 0],
        [3, 9],
        [1, 2]],

       [[4, 4],
        [6, 0],
        [2, 4],
        [3, 4],
        [3, 0]]])
```
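The same idea generalizes to any number of timesteps and features. Below is a minimal sketch with a hypothetical helper to_rnn_input, assuming the columns are grouped by timestep (all features of t1, then all features of t2, ...), applied to a random stand-in with the question's shape [4000, 20]:

```python
import numpy as np

def to_rnn_input(X, timesteps, features, time_major=False):
    """Hypothetical helper: convert [N, timesteps*features] into a 3D RNN input.

    Assumes the columns are grouped by timestep:
    f1_t1, ..., fd_t1, f1_t2, ..., fd_t2, ...
    """
    # One [N, features] slice per timestep, as with arr[:, 0:2] and arr[:, 2:4].
    slices = [X[:, t * features:(t + 1) * features] for t in range(timesteps)]
    # axis=1 -> [N, timesteps, features]; axis=0 -> [timesteps, N, features].
    return np.stack(slices, axis=0 if time_major else 1)

X = np.random.rand(4000, 20)  # random stand-in for the question's data
inputs = to_rnn_input(X, timesteps=10, features=2)
print(inputs.shape)  # (4000, 10, 2)
print(np.array_equal(inputs, X.reshape(4000, 10, 2)))  # True
```

For this column order the stacked result matches a direct X.reshape(N, timesteps, features); the explicit stacking is mainly useful when you want to see, or change, which columns belong to which timestep.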
Hope this helps!
Answered 2024-05-03T01:54:13+08:00
