Categorical attributes to sparse matrix

First of all, I'm new to machine learning.

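I'm trying to predict the price of used cars. Each car has a make and a model, so I used MultiLabelBinarizer to turn these categorical attributes into sparse matrices. Here is the code: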

from sklearn.preprocessing import MultiLabelBinarizer
encoder = MultiLabelBinarizer()
make_cat_1hot = encoder.fit_transform(make_cat)
model_cat_1hot = encoder.fit_transform(model_cat)
type_cat_1hot = encoder.fit_transform(type_cat)

print(type(make_cat_1hot))  # numpy.ndarray: MultiLabelBinarizer returns a dense array unless sparse_output=True
carInfoModHot = carsInfoMod.copy()
# .tolist() stores each row's one-hot vector as a Python list inside a single cell
carInfoModHot["makeHot"] = make_cat_1hot.tolist()
carInfoModHot["modelHot"] = model_cat_1hot.tolist()
carInfoModHot["typeHot"] = type_cat_1hot.tolist()

doors  km     make      year  makeHot                      modelHot
5.0    78779  Mercedes  2012  [0, 0, 0, 0, 1, 0, 0, 0, ...  [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, ...
5.0    25463  Bmw       2015  [0, 1, 0, 0, 0, 0, 0, ...     [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, ...
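One thing worth checking, assuming `make_cat` and `model_cat` are plain sequences of strings (the question does not show how they are built): `MultiLabelBinarizer` expects every sample to be an iterable of labels, and a bare string is itself iterable, so each make is split into its individual letters. The sketch below uses hypothetical sample values to show the difference; wrapping each value in a one-element list yields one column per make.

```python
from sklearn.preprocessing import MultiLabelBinarizer

makes = ["Mercedes", "Bmw", "Mercedes"]  # hypothetical sample values

# Bare strings: each string is iterated character by character,
# so the learned "labels" are single letters, not makes.
char_encoder = MultiLabelBinarizer()
char_encoder.fit(makes)
print(char_encoder.classes_)

# One-element lists: each sample carries exactly one label, the make itself.
make_encoder = MultiLabelBinarizer()
onehot = make_encoder.fit_transform([[m] for m in makes])
print(make_encoder.classes_)
print(onehot)
```

This would also explain why the `modelHot` rows above contain several 1s each, although that is an inference from the printed output rather than something the question confirms.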

Then I use it to make predictions with linear regression and compute the mean squared error:

import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

lr = linear_model.LinearRegression()

carsInfoTrainHot = carInfoModHot.drop(["price"], axis=1)  # drop labels for training set

# iloc end bounds are exclusive, so each slice starts where the previous one ends
df1 = carsInfoTrainHot.iloc[:30000, :]
carsLabels1 = carsInfoMod.iloc[:30000, 3]
print(carsInfoTrainHot.head())
df2 = carsInfoTrainHot.iloc[30000:60000, :]
carsLabels2 = carsInfoMod.iloc[30000:60000, 3]
df3 = carsInfoTrainHot.iloc[60000:, :]
carsLabels3 = carsInfoMod.iloc[60000:, 3]

lr.fit(df1, carsLabels1)
print(carsInfoTrainHot.shape)
carPrediction = lr.predict(df2)

lin_mse = mean_squared_error(carsLabels2, carPrediction)
lin_rmse = np.sqrt(lin_mse)
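The splits above rely on positional slicing. A minimal sketch with a hypothetical 10-row frame (standing in for `carsInfoTrainHot`) shows that `iloc` excludes its end position, so a slice that starts one past the previous slice's end, such as `iloc[30001:60000]` after `iloc[:30000]`, silently drops a row.

```python
import pandas as pd

# Hypothetical 10-row frame standing in for carsInfoTrainHot.
df = pd.DataFrame({"row": range(10)})

# iloc excludes the end position, so reusing the same number as the next
# start keeps the pieces contiguous and non-overlapping.
first = df.iloc[:3]
second = df.iloc[3:6]
third = df.iloc[6:]

# Starting one past the previous end (4 instead of 3) skips row 3 entirely.
skipping = df.iloc[4:6]

print(first["row"].tolist(), second["row"].tolist(), third["row"].tolist())
print(skipping["row"].tolist())
```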

But I get this error:

ValueError                                Traceback (most recent call last)
in <module>()
     12 carsLabels3 = carsInfoMod.iloc[60001:, 3]
     13 
---> 14 lr.fit(df1, carsLabels1)
     15 print(carsInfoTrainHot.shape)
     16 carPrediction = lr.predict(df2)

/home/vagrant/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py in fit(self, X, y, sample_weight)
    510         n_jobs_ = self.n_jobs
    511         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 512                          y_numeric=True, multi_output=True)
    513 
    514         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

/home/vagrant/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    519     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    520                     ensure_2d, allow_nd, ensure_min_samples,
--> 521                     ensure_min_features, warn_on_dtype, estimator)
    522     if multi_output:
    523         y = check_array(

in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    400     # make sure we actually converted to numeric:
    401     if dtype_numeric and array.dtype.kind == "O":
--> 402         array = array.astype(np.float64)
    403     if not allow_nd and array.ndim >= 3:
    404         raise ValueError("Found array with dim %d. %s expected <= 2."

ValueError: setting an array element with a sequence.
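The last frame of the traceback is the cast `array.astype(np.float64)`. A small sketch with a hypothetical two-row frame, assuming pandas and NumPy, reproduces the same message: once a column holds lists, the whole frame becomes a generic object array, and NumPy refuses to cast a list-valued element to a float.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: a numeric column plus a column
# whose cells each hold a whole one-hot vector as a Python list.
df = pd.DataFrame({"km": [78779, 25463], "makeHot": [[0, 0, 1], [0, 1, 0]]})

arr = df.to_numpy()
print(arr.dtype)  # object: the list cells force a generic object array

try:
    # Essentially the cast performed in the traceback's last frame.
    arr.astype(np.float64)
except ValueError as exc:
    print("ValueError:", exc)
```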

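As far as I can tell, the problem is that each cell of my categorical columns now holds an array, but how can I turn the categorical values into a sparse matrix instead?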

Thanks.
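A minimal sketch of one way to approach the closing question, assuming scikit-learn 0.20 or later (where `OneHotEncoder` accepts string categories) and SciPy: encode the categorical columns with `OneHotEncoder`, which returns a `scipy.sparse` matrix by default, and stack it next to the numeric columns with `scipy.sparse.hstack` instead of storing lists in DataFrame cells. The `cars` frame, its column names and values, and the 4/2 row split are hypothetical stand-ins for `carsInfoMod` and the question's slices.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for carsInfoMod; real column names may differ.
cars = pd.DataFrame({
    "doors": [5.0, 5.0, 3.0, 5.0, 3.0, 5.0],
    "km": [78779, 25463, 120000, 40000, 60000, 15000],
    "make": ["Mercedes", "Bmw", "Bmw", "Mercedes", "Bmw", "Mercedes"],
    "model": ["C", "X3", "1", "E", "X3", "C"],
    "year": [2012, 2015, 2008, 2014, 2011, 2016],
    "price": [15000.0, 30000.0, 6000.0, 25000.0, 9000.0, 28000.0],
})
numeric_cols = ["doors", "km", "year"]
categorical_cols = ["make", "model"]

train, valid = cars.iloc[:4], cars.iloc[4:]

# OneHotEncoder yields one sparse column per category seen during fit.
encoder = OneHotEncoder(handle_unknown="ignore")

X_train = sparse.hstack([
    sparse.csr_matrix(train[numeric_cols].to_numpy(dtype=float)),
    encoder.fit_transform(train[categorical_cols]),
]).tocsr()
X_valid = sparse.hstack([
    sparse.csr_matrix(valid[numeric_cols].to_numpy(dtype=float)),
    encoder.transform(valid[categorical_cols]),  # reuse the fitted categories
]).tocsr()

lr = LinearRegression()
lr.fit(X_train, train["price"])  # LinearRegression accepts a sparse X
pred = lr.predict(X_valid)
rmse = np.sqrt(mean_squared_error(valid["price"], pred))
print(X_train.shape, rmse)
```

Fitting the encoder on the training rows only and calling `transform` on the validation rows keeps the column layout identical across splits; `handle_unknown="ignore"` maps a make or model that never appeared in training to an all-zero block instead of raising.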

