Feature mismatch with OneHotEncoder when predicting a single data instance

How can OneHotEncoder be used when predicting a single value?

Error message: ValueError: Number of features of the model must match the input. Model n_features is 1261 and input n_features is 16

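I am training a random forest classifier on text data. For each text instance I compute 16 features. Since all 16 variables are categorical, I use OneHotEncoder to encode each of them, which produces a training matrix with 1261 columns. I also applied feature scaling to these columns. I then split my training data and applied the predictor to obtain the confusion matrix and the classification report. Finally, I saved the classifier, the standard scaler object, and the onehotencoder object to local disk in pickle format.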
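As a rough illustration of the setup described above (not the asker's actual code), the sketch below fits a OneHotEncoder on a tiny, hypothetical training set and shows that the encoded width equals the total number of distinct categories seen during fitting. The data and variable names are assumptions made for the example. With 16 real features the same mechanism would give the 1261 columns mentioned in the question.

```python
import pickle

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy training data: 4 rows, 3 categorical features.
X_train = np.array(
    [
        ["red", "small", "yes"],
        ["green", "large", "no"],
        ["blue", "small", "yes"],
        ["red", "large", "no"],
    ],
    dtype=object,
)

# Fit once, on the training data only. The encoder remembers the
# categories it saw for every column.
encoder = OneHotEncoder(handle_unknown="ignore")
X_train_encoded = encoder.fit_transform(X_train).toarray()

# One output column per (feature, category) pair seen during fit.
n_categories = [len(c) for c in encoder.categories_]
print(n_categories, X_train_encoded.shape)

# Persist the *fitted* encoder so the learned categories travel with it.
# In-memory bytes are used here; a file works the same way.
encoder_bytes = pickle.dumps(encoder)
```

OneHotEncoder accepts string columns directly in scikit-learn 0.20 and later, so in this sketch no LabelEncoder step is needed before it.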

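Now I want to create a (REST) service for the predictor in a new, separate file. This API will load the saved models in .pkl format and make a prediction for a new single text value, basically returning its predicted class name and the corresponding confidence score.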

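The problem I am facing: when I encode this single text value, I get a vector with 16 features. It is not encoded into 1261 features. As a result, when I call the classifier's predict() function on the new text, I get the following error: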

% (self.n_features_, n_features))
ValueError: Number of features of the model must match the input. Model n_features is 1261 and input n_features is 16

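How can I use the deserialized pkl model to predict a single instance when the encoded matrix does not match the size the classifier was trained on? How can I fix this?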

Edit: Posting the code snippet and the exception stack trace:

import pickle

# Loading the .pkl files used in training
with open('model.pkl', 'rb') as f_model:
    classifier = pickle.load(f_model) # trained classifier model

with open('labelencoder_file.pkl', 'rb') as f_lblenc:
    label_encoder = pickle.load(f_lblenc) # label encoder object used in training

with open('encoder_file.pkl', 'rb') as f_onehotenc:
    onehotencoder = pickle.load(f_onehotenc) # onehotencoder object used in training

with open('sc_file.pkl', 'rb') as f_sc:
    scaler = pickle.load(f_sc) # standard scaler object used in training

X = df_features # df_features is the dataframe containing the computed feature values. It has 16 columns as 16 features have been computed for the new value
X.values[:, 0] = label_encoder.fit_transform(X.values[:, 0])
X.values[:, 1] = label_encoder.fit_transform(X.values[:, 1])
# This is repeated  till X.values[:, 15] as all features are categorical

X = onehotencoder.fit_transform(X).toarray()
X = scaler.fit_transform(X)
print(X.shape) # This prints (1, 16), thus showing that encoding has not worked properly

y_pred = classifier.predict(X) # This throws the exception

Traceback (most recent call last):
  File "/home/Test/api.py", line 256, in api_func
    y_pred = classifier.predict(X)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/forest.py", line 538, in predict
    proba = self.predict_proba(X)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/forest.py", line 578, in predict_proba
    X = self._validate_X_predict(X)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/forest.py", line 357, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py", line 384, in _validate_X_predict
    % (self.n_features_, n_features))
ValueError: Number of features of the model must match the input. Model n_features is 1261 and input n_features is 16
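The mismatch in the traceback is consistent with calling fit_transform() at prediction time: refitting on a single row makes each column look like it has exactly one category, so the output has one column per feature. The sketch below contrasts that with reusing the already fitted objects via transform(). The toy data and variable names are assumptions for illustration, with 3 features standing in for the question's 16.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy training data: 4 rows, 3 categorical features.
X_train = np.array(
    [
        ["red", "small", "yes"],
        ["green", "large", "no"],
        ["blue", "small", "yes"],
        ["red", "large", "no"],
    ],
    dtype=object,
)
# A single new instance, as received by a prediction service.
X_new = np.array([["green", "small", "no"]], dtype=object)

# Objects fitted during training (these are what should be pickled).
fitted_encoder = OneHotEncoder(handle_unknown="ignore").fit(X_train)
fitted_scaler = StandardScaler().fit(fitted_encoder.transform(X_train).toarray())

# Wrong at prediction time: fit_transform() relearns everything from one row.
refit_encoded = OneHotEncoder().fit_transform(X_new).toarray()
refit_scaled = StandardScaler().fit_transform(refit_encoded)

# Right at prediction time: transform() reuses what was learned in training.
reused_encoded = fitted_encoder.transform(X_new).toarray()
reused_scaled = fitted_scaler.transform(reused_encoded)

print(refit_encoded.shape)   # (1, 3): one column per feature
print(reused_encoded.shape)  # (1, 7): same width as the training matrix
```

Refitting the scaler on one row is a second, quieter problem: the row's own mean is subtracted, so every scaled value becomes zero even if the column count happened to match.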

1 Answer

  • 0

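    Posting the modified code that resolved the problem. The label encoders and the OneHotEncoder loaded from disk are now applied with transform() instead of fit_transform(), so the categories learned during training are reused: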

    import pickle
    
    # Load the .pkl files that were persisted during training
    with open('model.pkl', 'rb') as f_model:
        classifier = pickle.load(f_model) # trained classifier model
    
    with open('labelencoder00.pkl', 'rb') as f_lblenc00:
        label_encoder00 = pickle.load(f_lblenc00) # LabelEncoder() object that was used for encoding the first categorical variable
    with open('labelencoder01.pkl', 'rb') as f_lblenc01:
        label_encoder01 = pickle.load(f_lblenc01) # LabelEncoder() object that was used for encoding the second categorical variable
    
    with open('onehotencoder.pkl', 'rb') as f_onehotenc:
        onehotencoder = pickle.load(f_onehotenc) # OneHotEncoder object that was used in training
    
    
    X = df_features # df_features is the dataframe containing the computed feature values
    X.values[:, 0] = label_encoder00.transform(X.values[:, 0])
    X.values[:, 1] = label_encoder01.transform(X.values[:, 1])
    
    X = onehotencoder.transform(X).toarray()
    
    pred = classifier.predict(X)
    
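One possible alternative to keeping several .pkl files in sync is to bundle the encoder and the classifier in a single scikit-learn Pipeline and pickle that one object. The sketch below is only an illustration under assumed toy data and labels. It leaves out the scaling step, since tree-based models such as random forests are not sensitive to feature scaling. It also shows one way to get the predicted class name together with a confidence score through predict_proba().

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy training data and class labels.
X_train = np.array(
    [
        ["red", "small", "yes"],
        ["green", "large", "no"],
        ["blue", "small", "yes"],
        ["red", "large", "no"],
    ],
    dtype=object,
)
y_train = np.array(["spam", "ham", "spam", "ham"])

# handle_unknown="ignore" encodes categories never seen in training as
# all zeros instead of raising an error at prediction time.
model = Pipeline(
    [
        ("encode", OneHotEncoder(handle_unknown="ignore")),
        ("forest", RandomForestClassifier(n_estimators=10, random_state=0)),
    ]
)
model.fit(X_train, y_train)

# Round-trip through pickle, as a prediction service would after loading.
restored = pickle.loads(pickle.dumps(model))

# Predict a single raw instance; the pipeline applies transform() itself.
X_new = np.array([["green", "small", "no"]], dtype=object)
probabilities = restored.predict_proba(X_new)[0]
best = int(np.argmax(probabilities))
predicted_class = restored.named_steps["forest"].classes_[best]
confidence = float(probabilities[best])
print(predicted_class, confidence)
```

Because the pipeline only ever calls transform() on its fitted steps during prediction, the single instance is expanded to the training width automatically.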
