首页 文章

使用sklearn的线性回归进行预测

提问于
浏览
-3
data2 = pd.DataFrame(data1['kwh'])
data2
                         kwh
date    
2012-04-12 14:56:50     1.256400
2012-04-12 15:11:55     1.430750
2012-04-12 15:27:01     1.369910
2012-04-12 15:42:06     1.359350
2012-04-12 15:57:10     1.305680
2012-04-12 16:12:10     1.287750
2012-04-12 16:27:14     1.245970
2012-04-12 16:42:19     1.282280
2012-04-12 16:57:24     1.365710
2012-04-12 17:12:28     1.320130
2012-04-12 17:27:33     1.354890
2012-04-12 17:42:37     1.343680
2012-04-12 17:57:41     1.314220
2012-04-12 18:12:44     1.311970
2012-04-12 18:27:46     1.338980
2012-04-12 18:42:51     1.357370
2012-04-12 18:57:54     1.328700
2012-04-12 19:12:58     1.308200
2012-04-12 19:28:01     1.341770
2012-04-12 19:43:04     1.278350
2012-04-12 19:58:07     1.253170
2012-04-12 20:13:10     1.420670
2012-04-12 20:28:15     1.292740
2012-04-12 20:43:15     1.322840
2012-04-12 20:58:18     1.247410
2012-04-12 21:13:20     0.568352
2012-04-12 21:28:22     0.317865
2012-04-12 21:43:24     0.233603
2012-04-12 21:58:27     0.229524
2012-04-12 22:13:29     0.236929
2012-04-12 22:28:34     0.233806
2012-04-12 22:43:38     0.235618
2012-04-12 22:58:43     0.229858
2012-04-12 23:13:43     0.235132
2012-04-12 23:28:46     0.231863
2012-04-12 23:43:55     0.237794
2012-04-12 23:59:00     0.229634
2012-04-13 00:14:02     0.234484
2012-04-13 00:29:05     0.234189
2012-04-13 00:44:09     0.237213
2012-04-13 00:59:09     0.230483
2012-04-13 01:14:10     0.234982
2012-04-13 01:29:11     0.237121
2012-04-13 01:44:16     0.230910
2012-04-13 01:59:22     0.238406
2012-04-13 02:14:21     0.250530
2012-04-13 02:29:24     0.283575
2012-04-13 02:44:24     0.302299
2012-04-13 02:59:25     0.322093
2012-04-13 03:14:30     0.327600
2012-04-13 03:29:31     0.324368
2012-04-13 03:44:31     0.301869
2012-04-13 03:59:42     0.322019
2012-04-13 04:14:43     0.325328
2012-04-13 04:29:43     0.306727
2012-04-13 04:44:46     0.299012
2012-04-13 04:59:47     0.303288
2012-04-13 05:14:48     0.326205
2012-04-13 05:29:49     0.344230
2012-04-13 05:44:50     0.353484
...

65701 rows × 1 columns

我想使用sklearn的线性回归进行简单预测 . 如何将数据拆分为训练/测试集,如何将目标拆分为训练/测试集 . (我希望x值为时间和y值kwh )

1 回答

  • 1
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.cross_validation import train_test_split
    
    #create x data
    data2['xraw'] = data2.index
    x  = data2['xraw'].astype(np.int64) // 10**9
    
    y  = data2['kwh']
    y = y.reshape((y.shape[0],1))
    
    #train-test split
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
    
    
    # Create linear regression object
    regr = linear_model.LinearRegression()
    
    # Train the model using the training sets
    regr.fit(x_train, y_train)
    
    
    # The coefficients
    print('Coefficients: \n', regr.coef_)
    
    # The mean square error
    print("Residual sum of squares: %.2f" % np.mean((regr.predict(x_test) - y_test) ** 2))
    
    # Explained variance score: 1 is perfect prediction
    print('Variance score: %.2f' % regr.score(x_test, y_test))
    
    # Plot outputs
    plt.scatter(x_test, y_test,  color='black')
    plt.plot(x_test, regr.predict(x_test), color='blue', linewidth=3)
    
    plt.xticks(())
    plt.yticks(())
    
    plt.show()
    

相关问题