Scikit Learn - Using GridSearchCV to train a new model

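If I use GridSearchCV with a pipeline to find the best parameters, is there any way to save the trained model so that I can later call the whole pipeline on new data and generate predictions for it? For example, I have the following pipeline, followed by the parameter grid for GridSearchCV: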

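from pprint import pprint
from time import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(SVC(probability=True))),
])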

parameters = {
    'vect__ngram_range': ((1, 1),(1, 2),(1,3)),  # unigrams only, up to bigrams, or up to trigrams
    'clf__estimator__kernel': ('rbf','linear'),
    'clf__estimator__C': tuple([10**i for i in range(-10,11)]),
}

grid_search = GridSearchCV(pipeline,parameters,n_jobs=-1,verbose=1)

print("Performing grid search...")
print("pipeline:", [name for name, _ in pipeline.steps])
print("parameters:")
pprint(parameters)
t0 = time()
#Conduct the grid search
grid_search.fit(X,y)
print("done in %0.3fs" % (time() - t0))
print()

print("Best score: %0.3f" % grid_search.best_score_)
print("Best parameters set:")
#Obtain the top performing parameters
best_parameters = grid_search.best_estimator_.get_params()
#Print the results
for param_name in sorted(parameters.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))

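Now I would like to save all of these steps as a single workflow so that I can apply it to a new, unseen dataset, where it would use the same parameters, vectorizer, and transformer to transform the data, run the model, and report the results. Is that possible?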

1 Answer

You can simply pickle the GridSearchCV object to save it, and then unpickle it when you want to use it to predict on new data.

import pickle

# Fit model and pickle fitted model
grid_search.fit(X,y)
# Pickle data is binary, so the file must be opened in "wb" mode
with open('/model/path/model_pickle_file', "wb") as fp:
    pickle.dump(grid_search, fp)

# Load model from file (binary read mode, "rb")
with open('/model/path/model_pickle_file', "rb") as fp:
    grid_search_load = pickle.load(fp)

# Predict new data with model loaded from disk
y_new = grid_search_load.best_estimator_.predict(X_new)
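If only predictions are needed later, one option is to persist just the refitted pipeline (`best_estimator_`) rather than the whole search object. The following is a minimal sketch under stated assumptions, not the answerer's code: the toy corpus, the reduced parameter grid, `cv=2`, and the file name `best_pipeline.pkl` are all made up for illustration. It relies on the default `refit=True`, which makes GridSearchCV retrain the best parameter combination on the full training data.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus, for illustration only.
X = ["good movie", "great film", "loved this movie", "excellent great acting",
     "bad movie", "terrible film", "hated this movie", "awful bad acting"]
y = [1, 1, 1, 1, 0, 0, 0, 0]
X_new = ["great movie", "terrible acting"]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(SVC())),
])

# A deliberately small grid so the sketch runs quickly.
parameters = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__estimator__C": [0.1, 1, 10],
}

# refit=True (the default) refits the best configuration on all of X, y.
grid_search = GridSearchCV(pipeline, parameters, cv=2)
grid_search.fit(X, y)

# best_estimator_ is the complete fitted pipeline:
# vectorizer, tf-idf transformer, and classifier together.
best_pipeline = grid_search.best_estimator_

# Save the fitted pipeline; pickle files must be opened in binary mode.
model_path = os.path.join(tempfile.mkdtemp(), "best_pipeline.pkl")
with open(model_path, "wb") as fp:
    pickle.dump(best_pipeline, fp)

# Later, or in another process: load it and predict on raw text.
with open(model_path, "rb") as fp:
    loaded_pipeline = pickle.load(fp)

y_new = loaded_pipeline.predict(X_new)
print(y_new)
```

Because the loaded object is the whole pipeline, new data is passed in as raw text and goes through the same fitted vectorizer and transformer before reaching the classifier. With `refit=True`, calling `grid_search.predict(X_new)` directly gives the same result, since it delegates to `best_estimator_`. Pickled models should only be loaded from trusted sources, and preferably with the same scikit-learn version that saved them.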

Answered on 2024-04-30T07:05:12+08:00
