scikit-learn fit() raises an error after normalising the data

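I have been trying to do the following:

  • Create the features X and the target y from a dataset

  • Split the dataset

  • Normalise the data

  • Train with scikit-learn's SVR

Here is the code, using a pandas DataFrame filled with random values: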

from math import floor
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(20,5), columns=["A","B","C","D", "E"])
a = list(df.columns.values)
a.remove("A")

X = df[a]
y = df["A"]

X_train = X.iloc[0: floor(2 * len(X) /3)]
X_test = X.iloc[floor(2 * len(X) /3):]
y_train = y.iloc[0: floor(2 * len(y) /3)]
y_test = y.iloc[floor(2 * len(y) /3):]

# normalise

from sklearn import preprocessing

X_trainS = preprocessing.scale(X_train)
X_trainN = pd.DataFrame(X_trainS, columns=a)

X_testS = preprocessing.scale(X_test)
X_testN = pd.DataFrame(X_testS, columns=a)

y_trainS = preprocessing.scale(y_train)
y_trainN = pd.DataFrame(y_trainS)

y_testS = preprocessing.scale(y_test)
y_testN = pd.DataFrame(y_testS)

import sklearn
from sklearn.svm import SVR

clf = SVR(kernel='rbf', C=1e3, gamma=0.1)

pred = clf.fit(X_trainN,y_trainN).predict(X_testN)

It gives this error:

C:\Anaconda3\lib\site-packages\pandas\core\index.py:542: FutureWarning: slice indexers when using iloc should be integers and not floating point
  "and not floating point", FutureWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
     34 clf = SVR(kernel='rbf', C=1e3, gamma=0.1)
     35
---> 36 pred = clf.fit(X_trainN,y_trainN).predict(X_testN)
     37

C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in fit(self, X, y, sample_weight)
    174
    175         seed = rnd.randint(np.iinfo('i').max)
--> 176         fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
    177         # see comment on the other call to np.iinfo in this file
    178

C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in _dense_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
    229                 cache_size=self.cache_size, coef0=self.coef0,
    230                 gamma=self._gamma, epsilon=self.epsilon,
--> 231                 max_iter=self.max_iter, random_seed=random_seed)
    232
    233         self._warn_from_fit_status()

C:\Anaconda3\lib\site-packages\sklearn\svm\libsvm.pyd in sklearn.svm.libsvm.fit (sklearn\svm\libsvm.c:1864)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

I don't know why. Can anyone explain? I suspect it has something to do with converting back to a DataFrame after preprocessing.

1 Answer

  • 4

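    The problem is in what you pass as the labels: y_trainN. It is a one-column DataFrame, so its values are 2-D, while fit() expects 1-D labels here (hence "expected 1, got 2").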

    Compare the y from the sample in the docs with the values in your code:

    In [40]:
    
    n_samples, n_features = 10, 5
    np.random.seed(0)
    y = np.random.randn(n_samples)
    print(y)
    y_trainN.values
    [ 1.76405235  0.40015721  0.97873798  2.2408932   1.86755799 -0.97727788
      0.95008842 -0.15135721 -0.10321885  0.4105985 ]
    Out[40]:
    array([[-0.06680594],
           [ 0.23535043],
           [-1.49265082],
           [ 1.22537862],
           [-0.46499134],
           [-0.23744759],
           [ 1.40520679],
           [ 0.95882677],
           [ 1.66996413],
           [-0.37515955],
           [-0.75826444],
           [-1.45945337],
           [-0.63995369]])
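
To make the difference concrete, here is a minimal sketch that only inspects shapes. It uses made-up random data in place of the question's labels, and the names `y_1d` and `y_frame` are hypothetical, chosen just for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)

# 1-D labels, like the y in the docs sample: shape (n_samples,)
y_1d = rng.randn(13)

# Wrapping the same values in a DataFrame adds a column axis: shape (n_samples, 1)
y_frame = pd.DataFrame(y_1d)

print(y_1d.shape)            # (13,)
print(y_frame.values.shape)  # (13, 1)
print(y_frame.values.ndim)   # 2, the "got 2" in the error message
```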
    

    So you can either select the only column of the df or call squeeze to produce a Series, and the error goes away:

    pred = clf.fit(X_trainN,y_trainN[0]).predict(X_testN)
    

    or

    pred = clf.fit(X_trainN,y_trainN.squeeze()).predict(X_testN)
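
Put together with the question's setup, an end-to-end sketch might look like the following. It assumes `floor` comes from the `math` module, fixes the random seed for repeatability, and introduces the helper names `features` and `split`, none of which are in the original post. On recent scikit-learn versions, passing the one-column DataFrame may only produce a warning rather than the ValueError above; the 1-D forms below should work in either case.

```python
from math import floor

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVR

np.random.seed(0)  # assumption: seeded only so the sketch is repeatable
df = pd.DataFrame(np.random.rand(20, 5), columns=["A", "B", "C", "D", "E"])
features = [c for c in df.columns if c != "A"]

split = floor(2 * len(df) / 3)  # integer position for iloc
X_train, X_test = df[features].iloc[:split], df[features].iloc[split:]
y_train = df["A"].iloc[:split]

# Same normalisation as in the question
X_trainN = pd.DataFrame(preprocessing.scale(X_train), columns=features)
X_testN = pd.DataFrame(preprocessing.scale(X_test), columns=features)
y_trainN = pd.DataFrame(preprocessing.scale(y_train))  # one column, 2-D values

clf = SVR(kernel="rbf", C=1e3, gamma=0.1)

# Option 1: select the only column (label 0) to get a Series
pred_col = clf.fit(X_trainN, y_trainN[0]).predict(X_testN)

# Option 2: squeeze the one-column DataFrame into a Series
pred_sq = clf.fit(X_trainN, y_trainN.squeeze()).predict(X_testN)

print(y_trainN.shape, y_trainN[0].shape, y_trainN.squeeze().shape)
print(pred_col.shape, pred_sq.shape)
```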
    

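    One could argue that a df with only a single column should return something that can be coerced into a 1-D numpy array, or that numpy is not calling the array attribute correctly. In practice, though, you should pass a Series, or select the column from the df, as the argument.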
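
One way to follow that advice is to never build the one-column DataFrame in the first place. The sketch below assumes you still want a pandas object that keeps the original index; the random `y_train` is a stand-in for the question's data, and `y_trainN` is rebuilt here as a Series.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Stand-in for the question's y_train (hypothetical random data)
y_train = pd.Series(np.random.rand(13), name="A")

# scale() gives back a 1-D array for 1-D input; wrap it in a Series, not a DataFrame
y_trainN = pd.Series(
    preprocessing.scale(y_train),
    index=y_train.index,
    name=y_train.name,
)

print(type(y_trainN).__name__, y_trainN.shape)
```

A Series built this way can be passed to fit() directly, with no squeeze or column selection needed.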
