Here is my code:
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import KFold
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import datasets
import numpy as np
newsgroups = datasets.fetch_20newsgroups(
    subset='all',
    categories=['alt.atheism', 'sci.space']
)
X = newsgroups.data
y = newsgroups.target
TD_IF = TfidfVectorizer()
y_scaled = TD_IF.fit_transform(newsgroups, y)
grid = {'C': np.power(10.0, np.arange(-5, 6))}
cv = KFold(y_scaled.size, n_folds=5, shuffle=True, random_state=241)
clf = SVC(kernel='linear', random_state=241)
gs = GridSearchCV(estimator=clf, param_grid=grid, scoring='accuracy', cv=cv)
gs.fit(X, y_scaled)
I get an error and I don't understand why. Traceback:
Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/week_3/assignment_2.py", line 23, in <module>
    gs.fit(X, y_scaled)  # TODO: check this line
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\grid_search.py", line 804, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\grid_search.py", line 525, in _fit
    X, y = indexable(X, y)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 201, in indexable
    check_consistent_length(*result)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [6 1786]
Can anyone explain why this error occurs?
1 Answer
I think you are mixing up X and y here. You want to transform X into a tf-idf matrix and train on that against y. See below.
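A minimal sketch of that fix, under a few assumptions: it uses a recent scikit-learn, where GridSearchCV and KFold live in sklearn.model_selection and KFold takes n_splits instead of a sample count and n_folds, and it swaps in a tiny made-up corpus (docs, labels) for newsgroups.data and newsgroups.target so it runs without downloading anything. The point is only that the vectorizer is fitted on the texts, and the resulting matrix is passed as the features together with the untouched labels.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Hypothetical stand-in for newsgroups.data and newsgroups.target.
docs = [
    "the rocket reached orbit around the moon",
    "nasa launched a new satellite into space",
    "the shuttle crew repaired the telescope",
    "astronauts trained for the mars mission",
    "the probe sent images of jupiter",
    "god and religion were debated at length",
    "atheists argue that belief needs evidence",
    "the church responded to the atheist claims",
    "faith and morality without a deity",
    "the bible is cited in the argument",
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Fit the vectorizer on the texts; the result is the feature matrix.
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(docs)

grid = {"C": np.power(10.0, np.arange(-5, 6))}
cv = KFold(n_splits=5, shuffle=True, random_state=241)
clf = SVC(kernel="linear", random_state=241)
gs = GridSearchCV(estimator=clf, param_grid=grid, scoring="accuracy", cv=cv)

# Features first, labels second; both have one entry per document.
gs.fit(X_tfidf, labels)
print(gs.best_params_)
```

In the question's code the equivalent change would be to call fit_transform on newsgroups.data and then fit the grid search on that matrix and y. The 6 in the error message is most likely the number of keys in the object returned by fetch_20newsgroups, which is what the vectorizer iterates over when it is given the whole object instead of its data list.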