Feature mismatch with OneHotEncoder when predicting a single data instance

How to use OneHotEncoder for single-value prediction

Error message: ValueError: Number of features of the model must match the input. Model n_features is 1261 and input n_features is 16

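I am training a random forest classifier on text data. For each text instance I compute 16 features. Since all 16 variables are categorical, I use OneHotEncoder to encode each of them, which results in a training matrix with 1261 columns. I also applied feature scaling to these. I then split my training data and ran the predictor to get the confusion matrix and classification report. Finally, I saved the classifier, the standard scaler, and the OneHotEncoder to local disk in pickle format.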

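Now I want to create a (REST) service for the predictor in a new, separate file. This API will load the saved model in .pkl format and predict the value for a single new text value, basically returning its predicted class name and the corresponding confidence score.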

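The problem I am facing is that when I encode this single text value, I get a vector with 16 features. It is not encoded into 1261 features. So when I call predict() on the classifier with the new text, I get the following error: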

% (self.n_features_, n_features)) ValueError: Number of features of the model must match the input. Model n_features is 1261 and input n_features is 16

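How can I use the deserialized pkl model to predict a single instance when the encoded matrix does not match the size the classifier was trained with? How do I resolve this?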

Edit: Posting the code snippet and exception stack:

# Loading the .pkl files used in training
with open('model.pkl', 'rb') as f_model:
    classifier = pickle.load(f_model) # trained classifier model

with open('labelencoder_file.pkl', 'rb') as f_lblenc:
    label_encoder = pickle.load(f_lblenc) # label encoder object used in training

with open('encoder_file.pkl', 'rb') as f_onehotenc:
    onehotencoder = pickle.load(f_onehotenc) # onehotencoder object used in training

with open('sc_file.pkl', 'rb') as f_sc:
    scaler = pickle.load(f_sc) # standard scaler object used in training

X = df_features # df_features is the dataframe containing the computed feature values. It has 16 columns as 16 features have been computed for the new value
X.values[:, 0] = label_encoder.fit_transform(X.values[:, 0])
X.values[:, 1] = label_encoder.fit_transform(X.values[:, 1])
# This is repeated  till X.values[:, 15] as all features are categorical

X = onehotencoder.fit_transform(X).toarray()
X = scaler.fit_transform(X)
print(X.shape) # This prints (1, 16), thus showing that encoding has not worked properly

y_pred = classifier.predict(X) # This throws the exception

Traceback (most recent call last):
  File "/home/Test/api.py", line 256, in api_func
    y_pred = classifier.predict(X)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/forest.py", line 538, in predict
    proba = self.predict_proba(X)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/forest.py", line 578, in predict_proba
    X = self._validate_X_predict(X)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/forest.py", line 357, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py", line 384, in _validate_X_predict
    % (self.n_features_, n_features))
ValueError: Number of features of the model must match the input. Model n_features is 1261 and input n_features is 16

1 Answer

Posting the modified code here that solved the problem:

import pickle

# Load the .pkl files that were persisted during training
with open('model.pkl', 'rb') as f_model:
    classifier = pickle.load(f_model) # trained classifier model

with open('labelencoder00.pkl', 'rb') as f_lblenc00:
    label_encoder00 = pickle.load(f_lblenc00) # LabelEncoder() object that was used for encoding the first categorical variable
with open('labelencoder01.pkl', 'rb') as f_lblenc01:
    label_encoder01 = pickle.load(f_lblenc01) # LabelEncoder() object that was used for encoding the second categorical variable

with open('onehotencoder.pkl', 'rb') as f_onehotenc:
    onehotencoder = pickle.load(f_onehotenc) # OneHotEncoder object that was used in training


X = df_features # df_features is the dataframe containing the computed feature values
X.values[:, 0] = label_encoder00.transform(X.values[:, 0])
X.values[:, 1] = label_encoder01.transform(X.values[:, 1])

X = onehotencoder.transform(X).toarray()

pred = classifier.predict(X)
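
The change that matters in the code above is calling transform() instead of fit_transform() on the unpickled encoders. fit_transform() re-learns the categories from the single new row, so every column has exactly one category and the encoded output is only as wide as the input (16 columns in the question). transform() reuses the categories learned during training, so the width matches what the classifier expects. Here is a minimal sketch of the difference. The toy data, the 3 columns, and the variable names are made up for illustration, and it assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string columns directly:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: 3 categorical columns (the question has 16).
X_train = np.array([
    ["red", "small", "a"],
    ["green", "large", "b"],
    ["blue", "small", "c"],
    ["red", "medium", "a"],
])
# One new instance to predict on.
X_new = np.array([["green", "small", "a"]])

# Fit once, on the training data only.
encoder = OneHotEncoder(handle_unknown="ignore")
train_encoded = encoder.fit_transform(X_train).toarray()

# Correct: reuse the fitted categories, so the width matches training.
good = encoder.transform(X_new).toarray()

# Incorrect: refitting on one row learns a single category per column.
bad = OneHotEncoder().fit_transform(X_new).toarray()

print(train_encoded.shape)  # (4, 9)
print(good.shape)           # (1, 9)
print(bad.shape)            # (1, 3)
```

The same rule applies to every fitted preprocessing object. If the classifier was trained on scaled features, as in the question, the saved StandardScaler should also be applied with transform() at prediction time, not fit_transform().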

Answered on 2024-05-12T23:23:38+08:00
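
One possible way to avoid juggling several .pkl files is to wrap the encoder and the classifier in a single scikit-learn Pipeline and pickle that one object. This is not what the answer above does, just a sketch under assumptions: the toy data, labels, and variable names are hypothetical, scikit-learn 0.20 or later is assumed, and scaling is left out because random forests are generally insensitive to feature scaling. It also shows predict_proba(), which gives the class name and confidence score the question asks for:

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the 16 categorical features.
X_train = np.array([
    ["red", "small"],
    ["green", "large"],
    ["blue", "small"],
    ["red", "large"],
])
y_train = np.array(["spam", "ham", "spam", "ham"])

# The encoder and the classifier are fitted together as one object.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    RandomForestClassifier(n_estimators=10, random_state=0),
)
model.fit(X_train, y_train)

# Serialize the whole pipeline; a service would write this to a .pkl file.
blob = pickle.dumps(model)

# Prediction side: load once, then pass the raw (unencoded) features.
loaded = pickle.loads(blob)
X_new = np.array([["green", "small"]])
proba = loaded.predict_proba(X_new)[0]

# Pick the most probable class and report its probability as the confidence.
best = int(np.argmax(proba))
predicted_class = loaded.classes_[best]
confidence = float(proba[best])
print(predicted_class, confidence)
```

Because the pipeline only ever calls transform() on its fitted steps during prediction, the accidental refit from the question cannot happen on the service side.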
