
Scikit Learn - Saving a model trained with GridSearchCV to use on new data


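If I use GridSearchCV with a pipeline to find the best parameters, is there a way to save the trained model so that I can later apply the whole pipeline to new data and generate predictions for it? For example, I have the following pipeline, followed by a GridSearchCV over its parameters: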

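from pprint import pprint
from time import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([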
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(SVC(probability=True))),
])

parameters = {
    'vect__ngram_range': ((1, 1),(1, 2),(1,3)),  # unigrams, up to bigrams, or up to trigrams
    'clf__estimator__kernel': ('rbf','linear'),
    'clf__estimator__C': tuple([10**i for i in range(-10,11)]),
}

grid_search = GridSearchCV(pipeline,parameters,n_jobs=-1,verbose=1)

print("Performing grid search...")
print("pipeline:", [name for name, _ in pipeline.steps])
print("parameters:")
pprint(parameters)
t0 = time()
#Conduct the grid search
grid_search.fit(X,y)
print("done in %0.3fs" % (time() - t0))
print()

print("Best score: %0.3f" % grid_search.best_score_)
print("Best parameters set:")
#Obtain the top performing parameters
best_parameters = grid_search.best_estimator_.get_params()
#Print the results
for param_name in sorted(parameters.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))

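Now I would like to save all of these steps as a single unit, so that I can apply it to a new, unseen dataset and have it use the same parameters, vectorizer, and transformer to transform the data, predict, and report the results. Is that possible?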

1 Answer

  • 7

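    You can simply pickle the GridSearchCV object to save it, then unpickle it whenever you want to use it to predict on new data. Pickle files must be opened in binary mode ("wb" to write, "rb" to read).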

    import pickle
    
    # Fit the grid search, then pickle the fitted object (binary write mode)
    grid_search.fit(X,y)
    with open('/model/path/model_pickle_file', "wb") as fp:
        pickle.dump(grid_search, fp)
    
    # Load the fitted object back from disk (binary read mode)
    with open('/model/path/model_pickle_file', "rb") as fp:
        grid_search_load = pickle.load(fp)
    
    # Predict on new data with the best pipeline found by the search;
    # X_new must be raw text in the same form as X
    y_new = grid_search_load.best_estimator_.predict(X_new)
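
If you only need predictions later, you could also pickle just `best_estimator_` (the refitted pipeline) rather than the whole search object. Below is a minimal end-to-end sketch of that idea. The toy corpus, the labels, the `LogisticRegression` classifier, the small parameter grid, and the temporary file name are all assumptions chosen to keep the example quick to run; they are not from the question.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data, only for illustration
docs = [
    "cheap pills buy now", "win money fast now",
    "buy cheap watches", "free money offer now",
    "meeting agenda for monday", "project status report",
    "lunch with the team", "quarterly report review meeting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression()),  # stand-in classifier, assumed for speed
])
parameters = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# refit=True (the default) retrains the best pipeline on all the data
search = GridSearchCV(pipeline, parameters, cv=2)
search.fit(docs, labels)

# Save only the refitted pipeline; note the binary file modes
model_path = os.path.join(tempfile.mkdtemp(), "best_pipeline.pkl")
with open(model_path, "wb") as fp:
    pickle.dump(search.best_estimator_, fp)

with open(model_path, "rb") as fp:
    loaded = pickle.load(fp)

# The loaded pipeline vectorizes, transforms, and classifies raw text
new_docs = ["buy cheap money now", "team meeting report"]
original_pred = search.predict(new_docs)
loaded_pred = loaded.predict(new_docs)
```

The loaded pipeline carries its fitted vectorizer vocabulary and IDF weights, so new documents go through exactly the same transformation as the training data. Two caveats: unpickle with the same scikit-learn version that created the file, and only load pickle files you trust.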
    
