
How to configure a very simple LSTM with Keras / Theano for regression


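I am struggling to configure a Keras LSTM for a simple regression task. The official page offers only a very basic explanation: Keras RNN documentation

To understand it fully, an example configuration with example data would be extremely helpful.

I have found very few examples of regression with a Keras LSTM. Most examples are about classification (text or images). I have studied the LSTM examples that ship with the Keras distribution and one example I found through a Google search: http://danielhnyk.cz/ It offers some insight, although the author admits that the approach is quite memory-inefficient, since the data samples have to be stored very redundantly.

Although a commenter (Taha) introduced an improvement, the data storage is still redundant, and I doubt that this is what the Keras developers intended.

I have downloaded some simple sequential example data, which happens to be stock data from Yahoo Finance. It is freely available from Yahoo Finance Data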

Date,       Open,      High,      Low,       Close,     Volume,   Adj Close
2016-05-18, 94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998
2016-05-17, 94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998
2016-05-16, 92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997
2016-05-13, 90.00,     91.669998, 90.00,     90.519997, 44188200, 90.519997

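The table contains more than 8900 such rows of Apple stock data. Each day has 7 columns (data points). The value to predict is "AdjClose" (the "Adj Close" column), which is the value at the end of the day.

So the goal is to predict the next day's AdjClose from the sequence of the previous days. (This is probably next to impossible, but it is always good to see how a tool behaves under challenging conditions.)

I think this should be a very standard prediction/regression case for an LSTM, and one that transfers easily to other problem domains.

So, how should the data (X_train, y_train) be formatted to keep redundancy to a minimum, and how do I initialize a Sequential model with only one LSTM layer and a few hidden neurons?

Kind regards, Theo

PS: I have started coding: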

...
X_train
Out[6]: 
array([[  2.87500000e+01,   2.88750000e+01,   2.87500000e+01,
      2.87500000e+01,   1.17258400e+08,   4.31358010e-01],
   [  2.73750019e+01,   2.73750019e+01,   2.72500000e+01,
      2.72500000e+01,   4.39712000e+07,   4.08852011e-01],
   [  2.53750000e+01,   2.53750000e+01,   2.52500000e+01,
      2.52500000e+01,   2.64320000e+07,   3.78845006e-01],
   ..., 
   [  9.23899994e+01,   9.43899994e+01,   9.16500015e+01,
      9.38799973e+01,   6.11406000e+07,   9.38799973e+01],
   [  9.45500031e+01,   9.46999969e+01,   9.30100021e+01,
      9.34899979e+01,   4.65074000e+07,   9.34899979e+01],
   [  9.41600037e+01,   9.52099991e+01,   9.38899994e+01,
      9.45599976e+01,   4.19231000e+07,   9.45599976e+01]], dtype=float32)

y_train
Out[7]: 
array([  0.40885201,   0.37884501,   0.38822201, ...,  93.87999725,
   93.48999786,  94.55999756], dtype=float32)

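So far the data is prepared and no redundancy has been introduced. The question now is how to describe a Keras LSTM model and training process for this data.

EDIT 3:

Here is the updated code with the 3D data structure that recurrent networks require (see Lorrit's answer). It still does not work, though.

EDIT 4: removed the extra comma after Activation('sigmoid') and shaped Y_train the right way. Still the same error.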

import numpy as np

from keras.models import Sequential
from keras.layers import Dense,  Activation, LSTM

nb_timesteps    =  4
nb_features     =  5
batch_size      = 32

# load file
X_train = np.genfromtxt('table.csv', 
                        delimiter=',',  
                        names=None, 
                        unpack=False,
                        dtype=None)

# delete the first row with the names
X_train = np.delete(X_train, (0), axis=0)

# invert the order of the rows, so that the oldest
# entry is in the first row and the newest entry
# comes last
X_train = np.flipud(X_train)

# the last column is our Y
Y_train = X_train[:,6].astype(np.float32)

Y_train = np.delete(Y_train, range(0,6))
Y_train = np.array(Y_train)
Y_train.shape = (len(Y_train), 1)

# we don't use the timestamps. convert the rest to Float32
X_train = X_train[:, 1:6].astype(np.float32)

# shape X_train
X_train.shape = (1,len(X_train), nb_features)


# Now comes Lorrit's code for shaping the 3D-input-data
# http://stackoverflow.com/questions/36992855/keras-how-should-i-prepare-input-data-for-rnn
flag = 0

for sample in range(X_train.shape[0]):
    tmp = np.array([X_train[sample,i:i+nb_timesteps,:] for i in range(X_train.shape[1] - nb_timesteps + 1)])

    if flag==0:
        new_input = tmp
        flag = 1

    else:
        new_input = np.concatenate((new_input,tmp))

X_train = np.delete(new_input, len(new_input) - 1, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
# X successfully shaped

# free some memory
tmp = None
new_input = None


# split data for training, validation and test
# 50:25:25
X_train, X_test = np.split(X_train, 2, axis=0)
X_valid, X_test = np.split(X_test, 2, axis=0)

Y_train, Y_test = np.split(Y_train, 2, axis=0)
Y_valid, Y_test = np.split(Y_test, 2, axis=0)


print('Build model...')

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

model.compile(loss='mse',
              optimizer='RMSprop',
              metrics=['accuracy'])

print('Train...')
print(X_train.shape)
print(Y_train.shape)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test,
                            batch_size=batch_size)

print('Test score:', score)
print('Test accuracy:', acc)

According to Keras, there still seems to be a problem with the data:

Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN not available)
Build model...

Traceback (most recent call last):

  File "<ipython-input-1-3a6e9e045167>", line 1, in <module>
    runfile('C:/Users/admin/Documents/pycode/lstm/lstm5.py', wdir='C:/Users/admin/Documents/pycode/lstm')

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/admin/Documents/pycode/lstm/lstm5.py", line 79, in <module>
    Activation('sigmoid')

  File "d:\git\keras\keras\models.py", line 93, in __init__
    self.add(layer)

  File "d:\git\keras\keras\models.py", line 146, in add
    output_tensor = layer(self.outputs[0])

  File "d:\git\keras\keras\engine\topology.py", line 441, in __call__
    self.assert_input_compatibility(x)

  File "d:\git\keras\keras\engine\topology.py", line 382, in assert_input_compatibility
    str(K.ndim(x)))

Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

3 Answers

  • 1

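    In your model definition you placed a Dense layer before the LSTM layer. You need to wrap the Dense layer in a TimeDistributed layer so that its output keeps the time axis the LSTM expects.

    Try changing the first model definition below into the second one: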

    model = Sequential([
        Dense(8, input_dim=nb_features),
        Activation('softmax'),
        LSTM(4, dropout_W=0.2, dropout_U=0.2),
        Dense(1),
        Activation('sigmoid')
    ])
    

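    # requires e.g.: from keras.layers.wrappers import TimeDistributed
    model = Sequential([
        # the input shape goes on the wrapper; `activation` is a lowercase keyword
        TimeDistributed(Dense(8, activation='softmax'),
                        input_shape=(nb_timesteps, nb_features)),
        LSTM(4, dropout_W=0.2, dropout_U=0.2),
        Dense(1),
        Activation('sigmoid')
    ])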
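
To illustrate why the wrapper matters, here is a minimal NumPy sketch (not Keras code) of what applying one Dense layer per time step amounts to. The weight matrix `W` and bias `b` are made-up stand-ins for the layer's parameters, and the sizes are arbitrary; the point is only that the output stays 3D, which is what the `expected ndim=3` error is asking for.

```python
import numpy as np

rng = np.random.default_rng(0)
nb_samples, nb_timesteps, nb_features, nb_units = 3, 4, 5, 8

# a batch shaped (samples, time steps, features), as the LSTM expects
x = rng.normal(size=(nb_samples, nb_timesteps, nb_features))
W = rng.normal(size=(nb_features, nb_units))  # hypothetical Dense kernel
b = np.zeros(nb_units)                        # hypothetical Dense bias

# apply the same affine map to every time step separately ...
per_step = np.stack([x[:, t, :] @ W + b for t in range(nb_timesteps)], axis=1)

# ... which is the same as one matmul broadcast over the time axis
all_at_once = x @ W + b

print(per_step.shape)  # (3, 4, 8): the time axis is preserved
```

A plain Dense layer declared with `input_dim=nb_features` instead describes a 2D input of shape (samples, features), so the layer after it receives a 2D tensor.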
    
  • 0

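    You are still missing one preprocessing step before feeding the data to the LSTM. You have to decide how many previous data samples (previous days) to include when computing the current day's AdjClose. See my answer here on how to do that. Your data should then be three-dimensional, with shape (nb_samples, nb_included_previous_days, features).

    You can then feed the 3D data to a standard LSTM layer with one output. Compare this value with y_train and try to minimize the error. Remember to choose a loss function suited to regression, e.g. mean squared error.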
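
As a rough sketch of that windowing step, the snippet below builds the 3D array with NumPy's `sliding_window_view` (available from NumPy 1.20), which returns a view rather than copying each window. The helper name `make_windows` and the toy arrays are assumptions for illustration, not part of the question's code. Note that a framework may still copy the data when it builds batches.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(features, target, nb_timesteps):
    """Return (X, y): X[i] holds nb_timesteps consecutive rows of `features`,
    y[i] is the target value of the row right after that window."""
    # window axis is appended last: (n_windows, n_features, nb_timesteps)
    windows = sliding_window_view(features, window_shape=nb_timesteps, axis=0)
    # reorder to (n_windows, nb_timesteps, n_features)
    windows = windows.transpose(0, 2, 1)
    # the last window has no following day, so drop it
    X = windows[:-1]
    y = target[nb_timesteps:]
    return X, y

# toy stand-in data: 10 days, 5 features per day, oldest day first
features = np.arange(50, dtype=np.float32).reshape(10, 5)
target = np.arange(10, dtype=np.float32)

X, y = make_windows(features, target, nb_timesteps=4)
print(X.shape, y.shape)  # (6, 4, 5) (6,)
```

Here `X[0]` covers days 0 to 3 and `y[0]` is the target of day 4, so each sample predicts the day after its window.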

  • 2

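    Not sure whether this is still relevant, but there is a great example of how to use LSTM networks to predict time series on Dr. Jason Brownlee's blog here

    I prepared an example with three noisy, phase-shifted sine waves of different amplitudes. It is not market data, but I think you were assuming that one stock would tell you something about another.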

    import numpy
    import matplotlib.pyplot as plt
    import pandas
    import math
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import LSTM
    from keras.layers import Reshape
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.metrics import mean_squared_error
    # generate a noisy sine wave
    def make_sine_with_noise(_start, _stop, _step, _phase_shift, gain):
        x = numpy.arange(_start, _stop, step = _step)
        noise = numpy.random.uniform(-0.1, 0.1, size = len(x))
        y = gain*0.5*numpy.sin(x+_phase_shift)
        y = numpy.add(noise, y)
        return x, y
    # convert an array of values into a dataset matrix
    def create_dataset(dataset, look_back=1, look_ahead=1):
        dataX, dataY = [], []
        for i in range(len(dataset) - look_back - look_ahead - 1):
            a = dataset[i:(i + look_back), :]
            dataX.append(a)
            b = dataset[(i + look_back):(i + look_back + look_ahead), :]
            dataY.append(b)
        return numpy.array(dataX), numpy.array(dataY)
    # fix random seed for reproducibility
    numpy.random.seed(7)
    # generate sine wave
    # 1.0/24 keeps the step a float under Python 2 as well (1/24 would be 0 there)
    x1, y1 = make_sine_with_noise(0, 200, 1.0/24, 0, 1)
    x2, y2 = make_sine_with_noise(0, 200, 1.0/24, math.pi/4, 3)
    x3, y3 = make_sine_with_noise(0, 200, 1.0/24, math.pi/2, 20)
    # plt.plot(x1, y1)
    # plt.plot(x2, y2)
    # plt.plot(x3, y3)
    # plt.show()
    #transform to pandas dataframe
    dataframe = pandas.DataFrame({'y1': y1, 'y2': y2, 'y3': y3})
    dataset = dataframe.values
    dataset = dataset.astype('float32')
    # normalize the dataset
    scaler = MinMaxScaler(feature_range=(0, 1))
    dataset = scaler.fit_transform(dataset)
    #split into train and test sets
    train_size = int(len(dataset) * 0.67)
    test_size = len(dataset) - train_size
    train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
    # reshape into X=t and Y=t+1
    look_back = 10
    look_ahead = 5
    trainX, trainY = create_dataset(train, look_back, look_ahead)
    testX, testY = create_dataset(test, look_back, look_ahead)
    print(trainX.shape)
    print(trainY.shape)
    # reshape input to be [samples, time steps, features]
    trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], trainX.shape[2]))
    testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], testX.shape[2]))
    # create and fit the LSTM network
    model = Sequential()
    model.add(LSTM(look_ahead, input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))
    model.add(LSTM(look_ahead, input_shape=(look_ahead, trainX.shape[2])))
    model.add(Dense(trainY.shape[1]*trainY.shape[2]))
    model.add(Reshape((trainY.shape[1], trainY.shape[2])))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(trainX, trainY, epochs=1, batch_size=1, verbose=1)
    # make prediction
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)
    
    #save model
    model.save('my_sin_prediction_model.h5')
    
    trainPredictPlottable = trainPredict[::look_ahead]
    trainPredictPlottable = [item for sublist in trainPredictPlottable for item in sublist]
    trainPredictPlottable = scaler.inverse_transform(numpy.array(trainPredictPlottable))
    # create a single testPredict array by concatenating every 'look_ahead'-th prediction array
    testPredictPlottable = testPredict[::look_ahead]
    testPredictPlottable = [item for sublist in testPredictPlottable for item in sublist]
    testPredictPlottable = scaler.inverse_transform(numpy.array(testPredictPlottable))
    # testPredictPlottable = testPredictPlottable[:-look_ahead]
    # shift train predictions for plotting
    trainPredictPlot = numpy.empty_like(dataset)
    trainPredictPlot[:, :] = numpy.nan
    trainPredictPlot[look_back:len(trainPredictPlottable)+look_back, :] = trainPredictPlottable
    # shift test predictions for plotting
    testPredictPlot = numpy.empty_like(dataset)
    testPredictPlot[:, :] = numpy.nan
    testPredictPlot[len(dataset)-len(testPredictPlottable):len(dataset), :] = testPredictPlottable
    # plot baseline and predictions
    dataset = scaler.inverse_transform(dataset)
    plt.plot(dataset, color='k')
    plt.plot(trainPredictPlot)
    plt.plot(testPredictPlot)
    plt.show()
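
To see what shapes `create_dataset` produces without training anything, the function from the answer can be run on a small toy array. The array `toy` and its size are made up for illustration; the function body follows the one above.

```python
import numpy as np

def create_dataset(dataset, look_back=1, look_ahead=1):
    # same windowing logic as in the answer above
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - look_ahead - 1):
        dataX.append(dataset[i:i + look_back, :])
        dataY.append(dataset[i + look_back:i + look_back + look_ahead, :])
    return np.array(dataX), np.array(dataY)

# toy stand-in for the scaled dataset: 30 time steps, 3 series
toy = np.arange(90, dtype=np.float32).reshape(30, 3)
X, Y = create_dataset(toy, look_back=10, look_ahead=5)

print(X.shape)  # (14, 10, 3): samples, look_back steps, series
print(Y.shape)  # (14, 5, 3): samples, look_ahead steps, series

# number of windows that would fit completely into the toy array
n_fit = len(toy) - 10 - 5 + 1
print(n_fit)  # 16
```

The loop bound with `- 1` yields two fewer samples than would fit (14 instead of 16 here). That is harmless for this demo, but worth knowing when data is scarce.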
    
