ValueError: Found arrays with inconsistent numbers of samples [6 1786]

Here is my code:

from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import KFold
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import datasets
import numpy as np

newsgroups = datasets.fetch_20newsgroups(
                subset='all',
                categories=['alt.atheism', 'sci.space']
         )
X = newsgroups.data
y = newsgroups.target

TD_IF = TfidfVectorizer()
y_scaled = TD_IF.fit_transform(newsgroups, y)
grid = {'C': np.power(10.0, np.arange(-5, 6))}
cv = KFold(y_scaled.size, n_folds=5, shuffle=True, random_state=241) 
clf = SVC(kernel='linear', random_state=241)

gs = GridSearchCV(estimator=clf, param_grid=grid, scoring='accuracy', cv=cv)
gs.fit(X, y_scaled)

I get an error and I don't understand why. Traceback:

Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/week_3/assignment_2.py", line 23, in <module>
    gs.fit(X, y_scaled)  # TODO: check this line
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\grid_search.py", line 804, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\grid_search.py", line 525, in _fit
    X, y = indexable(X, y)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 201, in indexable
    check_consistent_length(*result)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [6 1786]

Can someone explain why this error occurs?

1 Answer

I think you've mixed up X and y here. You want to transform X (the documents) into a tf-idf matrix and train on that against y (the labels). See below:

from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import KFold
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import datasets
import numpy as np

newsgroups = datasets.fetch_20newsgroups(
                subset='all',
                categories=['alt.atheism', 'sci.space']
         )
X = newsgroups.data
y = newsgroups.target

TD_IF = TfidfVectorizer()
X_scaled = TD_IF.fit_transform(X, y)
grid = {'C': np.power(10.0, np.arange(-1, 1))}
cv = KFold(y.size, n_folds=5, shuffle=True, random_state=241)  # y_scaled is not defined here; fold over the labels
clf = SVC(kernel='linear', random_state=241)

gs = GridSearchCV(estimator=clf, param_grid=grid, scoring='accuracy', cv=cv)
gs.fit(X_scaled, y)
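
As for where the 6 in the error message probably comes from: the original code passed the whole `newsgroups` container to `fit_transform` instead of `newsgroups.data`. That container is a dict-like Bunch, and iterating over a dict yields its keys, so the vectorizer would have seen one "document" per key. The sketch below reproduces this with a plain dict; the key names are an illustrative assumption and not checked against any particular scikit-learn version.

```python
# Hypothetical stand-in for the Bunch returned by fetch_20newsgroups.
# Bunch is a dict subclass; the six key names below are illustrative only.
newsgroups = {
    "data": ["doc %d" % i for i in range(1786)],
    "target": [i % 2 for i in range(1786)],
    "filenames": [],
    "target_names": ["alt.atheism", "sci.space"],
    "DESCR": None,
    "description": "",
}

# A vectorizer iterates over its input to obtain documents.
# Iterating a dict yields its keys, not the texts stored under "data".
seen_as_documents = list(newsgroups)
n_from_container = len(seen_as_documents)  # one "document" per key
n_from_data = len(newsgroups["data"])      # one document per post

print(n_from_container, n_from_data)  # prints: 6 1786
```

The two counts match the `[6 1786]` in the error, which is why passing `X` (that is, `newsgroups.data`) to the vectorizer resolves the mismatch.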

Answered 2024-05-15T11:04:51+08:00
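
A note for newer installations: `sklearn.grid_search` and `sklearn.cross_validation` were replaced by `sklearn.model_selection` in scikit-learn 0.18 and later removed, and `KFold` now takes `n_splits` rather than the sample count. Below is a minimal sketch of the same pipeline with the newer imports. The tiny corpus is made up so that it runs without downloading the dataset; swap in `newsgroups.data` and `newsgroups.target` for the real thing.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Made-up corpus standing in for newsgroups.data / newsgroups.target.
X = [
    "god religion belief atheism", "atheism argument god faith",
    "religion faith belief church", "belief god argument moral",
    "faith moral atheism religion",
    "orbit rocket launch nasa", "nasa shuttle orbit moon",
    "moon launch rocket satellite", "satellite orbit space shuttle",
    "space rocket nasa launch",
]
y = np.array([0] * 5 + [1] * 5)

# Vectorize the documents (X), not the container and not the labels.
X_tfidf = TfidfVectorizer().fit_transform(X)  # one row per document

grid = {"C": np.power(10.0, np.arange(-1, 1))}
cv = KFold(n_splits=5, shuffle=True, random_state=241)  # no sample count needed
clf = SVC(kernel="linear", random_state=241)

gs = GridSearchCV(estimator=clf, param_grid=grid, scoring="accuracy", cv=cv)
gs.fit(X_tfidf, y)  # features and labels now have the same number of rows
print(gs.best_params_)
```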
