What GridSearchCV.best_score_ means when scoring is set to 'accuracy' and CV is used

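I am trying to find the best neural network model for classifying breast cancer samples from the well-known Wisconsin cancer dataset (569 samples, 31 features + target). I am using sklearn 0.18.1. I am not using normalization so far; I will add it once this question is resolved.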

# some init code omitted
X_train, X_test, y_train, y_test = train_test_split(X, y)

Define the NN params for GridSearchCV

tuned_params = [{'solver': ['sgd'], 'learning_rate': ['constant'], "learning_rate_init" : [0.001, 0.01, 0.05, 0.1]},
                {"learning_rate_init" : [0.001, 0.01, 0.05, 0.1]}]

CV method and model

cv_method = KFold(n_splits=4, shuffle=True)
model = MLPClassifier()

Apply the grid search

grid = GridSearchCV(estimator=model, param_grid=tuned_params, cv=cv_method, scoring='accuracy')
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

If I run:

print(grid.best_score_)
print(accuracy_score(y_test, y_pred))

The results are 0.746478873239 and 0.902097902098.

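According to the docs, "best_score_ : float, Score of best_estimator on the left out data". I assume it is the best accuracy obtained among the runs of the 8 different configurations specified in tuned_params, each run the number of times specified by KFold, on the left-out data as specified by KFold. Am I right?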

One more question. Is there a way to find the optimal test data size to use in train_test_split, which defaults to 0.25?

Thanks a lot


1 Answer


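grid.best_score_ is the mean score over all CV folds for a single combination of the parameters you specified in tuned_params, namely the combination with the highest mean. It is not the best accuracy from any single fold.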
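To make this concrete, here is a minimal sketch that recomputes the mean of the per-fold scores for the winning parameter combination and compares it with best_score_. It assumes a recent scikit-learn release and uses load_breast_cancer as a stand-in for the asker's data; the reduced grid, max_iter and the random_state values are illustrative choices, not taken from the question. In older releases such as 0.18, the default iid=True weighted the mean by fold size, so the match may only be approximate there.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_splits = 4
cv_method = KFold(n_splits=n_splits, shuffle=True, random_state=0)

# A reduced grid (assumption) to keep the example quick.
grid = GridSearchCV(
    estimator=MLPClassifier(max_iter=200, random_state=0),
    param_grid={"learning_rate_init": [0.001, 0.01]},
    cv=cv_method,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

results = grid.cv_results_
best = grid.best_index_  # row of cv_results_ belonging to best_params_

# Accuracy on each held-out fold for the best parameter combination.
fold_scores = np.array(
    [results[f"split{i}_test_score"][best] for i in range(n_splits)]
)
manual_mean = fold_scores.mean()

print("per-fold accuracy:", fold_scores)
print("mean over folds:  ", manual_mean)
print("grid.best_score_: ", grid.best_score_)
```

Because best_score_ is computed only from folds of X_train, it can differ noticeably from accuracy_score(y_test, y_pred), which evaluates the refitted best estimator on the separate test set.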

To access other relevant details about the grid search process, you can look at the grid.cv_results_ attribute.

From the documentation of GridSearchCV:

cv_results_ : dict of numpy (masked) ndarrays. A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.

It contains keys such as 'split0_test_score', 'split1_test_score', 'mean_test_score', 'std_test_score', 'rank_test_score', 'split0_train_score', 'split1_train_score', 'mean_train_score', etc., which give additional information about the whole run.
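As a rough illustration of inspecting these keys, the sketch below loads cv_results_ into a pandas DataFrame and ranks the parameter combinations. It assumes a recent scikit-learn, where the train-score columns appear only when return_train_score=True is passed; the dataset loader, max_iter and random_state values are again illustrative assumptions rather than part of the question.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

grid = GridSearchCV(
    estimator=MLPClassifier(max_iter=200, random_state=0),
    param_grid={"learning_rate_init": [0.001, 0.01, 0.05, 0.1]},
    cv=KFold(n_splits=4, shuffle=True, random_state=0),
    scoring="accuracy",
    return_train_score=True,  # needed for the *_train_score columns
)
grid.fit(X, y)

# One row per parameter combination, one column per cv_results_ key.
df = pd.DataFrame(grid.cv_results_)

columns = [
    "param_learning_rate_init",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
    "mean_train_score",
]
summary = df[columns].sort_values("rank_test_score")
print(summary)
```

The row with rank_test_score equal to 1 is the combination reported by best_params_, and its mean_test_score is the value exposed as best_score_.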

Answered on 2024-04-28T19:26:11+08:00
