Getting the correct cross-validation score with grid search and pipelines in sklearn

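My setup: I am running a process (a pipeline) in which I run a regression after selecting the relevant variables (and after scaling the data, a step I have omitted because it is irrelevant here). I optimize it through a grid search, as follows: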

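import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline

# Cross-validation scheme: 10 stratified random splits, 20% held out in each
fold = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=777)
regression_estimator = LogisticRegression(penalty='l2', random_state=777, max_iter=10000, tol=10, solver='newton-cg')
pipeline_steps = [('feature_selection', SelectKBest(f_regression)), ('regression', regression_estimator)]

pipe = Pipeline(steps=pipeline_steps)

# Candidate numbers of selected features: 1, 4, 7, ..., 31
feature_selection_k_options = np.arange(1, 33, 3)

param_grid = {'feature_selection__k': feature_selection_k_options}

# X and y are the (already scaled) features and the class labels
gs = GridSearchCV(pipe, param_grid=param_grid, scoring='recall', cv=fold)
gs.fit(X, y)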

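Since refit=True is the default in GridSearchCV, I get best_estimator_ by default, and I am fine with that. What I am missing is how, given this best_estimator_, I can get the cross-validated score on just the TEST data that was split off beforehand in the procedure. There is in fact a .score(X, Y) method, but as the docs state (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.predict_proba) it "Returns the mean accuracy on the given test data and labels", whereas I want what cross_val_score (http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html) does. The problem is that this procedure re-runs everything and keeps only those results (I want all the quantities that come out of the process).

Essentially, I want to extract the cross-validated scores on the test data from my best estimator, with a measure of my choosing (or the one already chosen in the grid search), using the cross-validation scheme already embedded in my Pipeline (in this case StratifiedShuffleSplit).

Do you know how to do this?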

1 Answer

2
You can access the cross-validation scores through the cv_results_ attribute, which can conveniently be read into a pandas DataFrame:
```
import pandas as pd
df_result = pd.DataFrame(gs.cv_results_)
```
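To pull out the per-split scores of the best candidate specifically, one option is to index that DataFrame with gs.best_index_. The following is only a sketch: the data comes from make_classification as a stand-in for the question's X and y, and the LogisticRegression settings are simplified, so both are assumptions rather than the original setup. Note that the "test" scores here are computed on the held-out part of each StratifiedShuffleSplit split, not on a separate test set.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the question's X, y (assumption, for illustration only)
X, y = make_classification(n_samples=300, n_features=32, n_informative=8,
                           random_state=777)

fold = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=777)
pipe = Pipeline(steps=[
    ('feature_selection', SelectKBest(f_regression)),
    ('regression', LogisticRegression(max_iter=1000)),  # simplified settings
])
param_grid = {'feature_selection__k': np.arange(1, 33, 3)}

gs = GridSearchCV(pipe, param_grid=param_grid, scoring='recall', cv=fold)
gs.fit(X, y)

df_result = pd.DataFrame(gs.cv_results_)

# Row of cv_results_ that belongs to the best parameter combination
best_row = df_result.loc[gs.best_index_]

# One column per split: split0_test_score, split1_test_score, ...
split_cols = [f'split{i}_test_score' for i in range(gs.n_splits_)]
best_split_scores = best_row[split_cols].to_numpy(dtype=float)

print(best_row['params'])          # the selected k
print(best_split_scores)           # recall on the held-out part of each split
print(best_split_scores.mean())    # same value as gs.best_score_
```

Because these scores were already computed during gs.fit, nothing has to be re-run to obtain them.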
Regarding "with a measure of my choosing", you can take a look at this example, which shows how to compute multiple scorers at once in GridSearchCV.
Answered on 2024-04-27T22:24:38+08:00
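As a rough sketch of what multi-metric scoring can look like: scoring accepts a list of scorer names, and refit then has to name the metric used to pick the best candidate. The synthetic data, the shortened k grid, and the particular choice of metrics below are assumptions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the question's X, y (assumption, for illustration only)
X, y = make_classification(n_samples=300, n_features=32, n_informative=8,
                           random_state=777)

fold = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=777)
pipe = Pipeline(steps=[
    ('feature_selection', SelectKBest(f_regression)),
    ('regression', LogisticRegression(max_iter=1000)),  # simplified settings
])
param_grid = {'feature_selection__k': [1, 10, 19, 28]}  # shortened grid

# Several metrics are evaluated in one search; refit selects by recall
gs = GridSearchCV(pipe, param_grid=param_grid,
                  scoring=['recall', 'precision', 'accuracy'],
                  refit='recall', cv=fold)
gs.fit(X, y)

df_result = pd.DataFrame(gs.cv_results_)

# With multiple metrics, the columns are suffixed with the scorer name
metric_cols = ['mean_test_recall', 'mean_test_precision', 'mean_test_accuracy']
summary = df_result[['param_feature_selection__k'] + metric_cols]
print(summary)

# All metrics for the candidate that was chosen by recall
print(df_result.loc[gs.best_index_, metric_cols])
```

The per-split values are available in the same way, under names such as split0_test_recall or split0_test_precision.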
