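I am trying to get more familiar with time-series forecasting in Keras using an LSTM. I want to predict the closing price of an exchange-traded fund (SPY) from the previous 30 days of price data.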

Here are the first five rows of my raw dataset, named "spy":

date        open    high    low     close   
2008-06-25  131.72  133.40  131.24  131.81
2008-06-26  130.57  131.42  128.08  128.23
2008-06-27  128.28  128.86  127.04  127.53
2008-06-30  127.89  128.91  127.30  127.98
2008-07-01  126.52  128.47  125.93  128.38

Now I use the sklearn StandardScaler to scale my data and put it back into a pandas DataFrame:

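import pandas as pd
from sklearn.preprocessing import StandardScaler
# Standardize every column to zero mean and unit variance
scaler = StandardScaler()
scaler.fit(spy)
spy = pd.DataFrame(scaler.transform(spy))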
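For reference, here is a minimal NumPy-only sketch of what the scaling step computes, using the five rows shown above. The helper name `standardize` is made up for illustration; it mirrors the column-wise z-score (with population standard deviation) that StandardScaler applies, and it shows how a scaled close value can be mapped back to a price.

```python
import numpy as np

# First five rows of the spy table above: open, high, low, close.
rows = np.array([
    [131.72, 133.40, 131.24, 131.81],
    [130.57, 131.42, 128.08, 128.23],
    [128.28, 128.86, 127.04, 127.53],
    [127.89, 128.91, 127.30, 127.98],
    [126.52, 128.47, 125.93, 128.38],
])

def standardize(values):
    """Column-wise z-score using the population std (ddof=0)."""
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    return (values - mean) / std, mean, std

scaled, col_mean, col_std = standardize(rows)

# Map the scaled close column (last column) back to price units.
close_back = scaled[:, -1] * col_std[-1] + col_mean[-1]
```

Keeping the per-column mean and standard deviation around is what makes it possible to report predictions and errors in dollars later on.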

Next, I use the following function to create the X_train, y_train, X_test and y_test datasets:

import numpy as np

def load_data(stock, seq_len):
    amount_of_features = len(stock.columns)
    data               = stock.values  # as_matrix() was removed in pandas 1.0
    sequence_length    = seq_len + 1
    result             = []

    # Collect overlapping windows: seq_len input rows plus one target row
    for index in range(len(data) - sequence_length):
        result.append(data[index: index + sequence_length])

    # Split only after the loop has collected every window
    result  = np.array(result)
    row     = round(0.9 * result.shape[0])
    train   = result[:int(row), :]
    x_train = train[:, :-1]
    y_train = train[:, -1][:, -1]  # last column (close) of the final row
    x_test  = result[int(row):, :-1]
    y_test  = result[int(row):, -1][:, -1]

    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], amount_of_features))
    x_test  = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], amount_of_features))

    return [x_train, y_train, x_test, y_test]
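To make the shapes produced by this windowing step concrete, here is a small sketch on toy data. The function `make_windows` and the toy array are hypothetical stand-ins for illustration, not part of my actual pipeline; it assumes the target is the last column of the row that follows each window.

```python
import numpy as np

def make_windows(data, seq_len, train_fraction=0.9):
    """Slice a (rows, features) array into windows and split train/test.

    Each window holds seq_len input rows; the target is the last column
    of the row that immediately follows the window.
    """
    sequence_length = seq_len + 1
    windows = np.array([
        data[i:i + sequence_length]
        for i in range(len(data) - sequence_length)
    ])
    split = int(round(train_fraction * windows.shape[0]))
    x = windows[:, :-1, :]   # (samples, seq_len, features)
    y = windows[:, -1, -1]   # (samples,)
    return x[:split], y[:split], x[split:], y[split:]

# Toy data: 50 rows, 4 features, values 0..199 so targets are easy to check.
toy = np.arange(200, dtype=float).reshape(50, 4)
x_tr, y_tr, x_te, y_te = make_windows(toy, seq_len=5)
```

With 50 rows and a window of 5 this gives 44 samples, of which 40 go to training and 4 to testing. The split is chronological, so the test windows are always the most recent ones.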

Next I create the datasets:

window = 30
X_train, y_train, X_test, y_test = load_data(spy, window)

Now I define the model:

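from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

def build_model(layers):
    # layers = [number of features, window length]
    d = 0.2  # dropout rate
    model = Sequential()
    model.add(LSTM(128, input_shape=(layers[1], layers[0]), return_sequences=True))
    model.add(Dropout(d))
    model.add(LSTM(64, input_shape=(layers[1], layers[0]), return_sequences=False))
    model.add(Dropout(d))
    # Keras 2 renamed the init argument to kernel_initializer
    model.add(Dense(16, kernel_initializer='random_uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='random_uniform', activation='linear'))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model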

Now I create the model:

model = build_model([4,window])

Then I fit the model to the training data:

model.fit(
    X_train,
    y_train,
    batch_size=512,
    epochs=200,  # this argument was called nb_epoch in Keras 1
    validation_split=0.1,
    verbose=1)

This is where I run into problems. When I check the performance after training the model, I get the following results:

trainScore = model.evaluate(X_train, y_train, verbose=0)
testScore = model.evaluate(X_test, y_test, verbose=0)
Train Score: 0.00 MSE (0.07 RMSE)
Test Score: 0.29 MSE (0.54 RMSE)
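Because the data was standardized, these scores are in units of standard deviations of the scaled close column, not dollars. Here is a rough sketch of the conversion; the helper names are made up, and `close_std` is a placeholder for the standard deviation the scaler learned for the close column (with StandardScaler that would be `scaler.scale_[-1]`, assuming close is the last column).

```python
import math

def rmse_from_mse(mse):
    """RMSE is the square root of the mean squared error."""
    return math.sqrt(mse)

def to_price_units(rmse_scaled, close_std):
    """Rescale an error measured in standard deviations back to price units.

    close_std is a placeholder for the std of the close column learned
    by the scaler; it is not a value taken from the real data.
    """
    return rmse_scaled * close_std

# The reported test MSE of 0.29 corresponds to about 0.54 standard deviations.
test_rmse = rmse_from_mse(0.29)
```

A train RMSE of 0.07 squares to roughly 0.005, which is why the train MSE prints as 0.00 at two decimal places.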

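I would like to understand why the results are so far off. I did not expect the model to be very accurate, but these results are bad enough that I suspect I have made a mistake. When I plot y_test against y_predicted, the predicted values are almost a straight line. Any help would be greatly appreciated!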
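To put a number on the "almost a straight line" observation, one possible check is sketched below. The function `flatness_report` is a hypothetical helper, and the arrays are toy values rather than my real results. It compares the model's MSE with a naive persistence baseline (predict each value as the previous true value) and measures how much the predictions vary relative to the targets.

```python
import numpy as np

def flatness_report(y_true, y_pred):
    """Compare predictions with a persistence baseline and measure spread."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()

    model_mse = float(np.mean((y_true - y_pred) ** 2))
    # Persistence baseline: predict y[t] as y[t-1]; scored on n-1 points.
    naive_mse = float(np.mean((y_true[1:] - y_true[:-1]) ** 2))
    # Ratio near 0 means the predictions barely move compared with the targets.
    spread_ratio = float(np.std(y_pred) / np.std(y_true))

    return {
        "model_mse": model_mse,
        "naive_mse": naive_mse,
        "spread_ratio": spread_ratio,
    }

# Toy example: a rising series predicted by a constant line.
report = flatness_report([0, 1, 2, 3, 4], [2, 2, 2, 2, 2])
```

In this toy case the constant prediction has a spread ratio of 0 and a higher MSE than the persistence baseline, which is the pattern I would expect to see if the model has collapsed to a near-constant output.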