Scikit-learn GridSearchCV best_features_ confusion

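I'm a bit confused by a pipeline I built. It's very simple: it consists of a transformer I wrote called QueryQuality(), which transforms my data the same way every time, and a RandomForestRegressor whose parameters I'm trying to tune with GridSearchCV.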

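Everything runs fine, but when I inspect model.best_params_, it says my random forest performs best with only 1 feature, while model.best_estimator_.named_steps['rfr'].n_features_ says the best random forest has 3 features. What gives? (Unfortunately, I can't provide a reproducible example at the moment.)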

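import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# QueryQuality (my transformer) and rmse_scorer are defined elsewhere (not shown)
np.random.seed(2016)
estimators = [('qq', QueryQuality()), ('rfr', RandomForestRegressor(n_estimators=50, n_jobs=-1))]
clf = Pipeline(estimators)
param_grid = {'rfr__max_features': [1,2,3], 'rfr__min_weight_fraction_leaf': [.01, .02, .04, .1]}
model = GridSearchCV(estimator=clf, param_grid=param_grid, n_jobs=-1, verbose=3, scoring=rmse_scorer, cv=6)
model.fit(train_X, train_y)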

# Check performance
model.best_params_ # returns {'rfr__max_features': 1, 'rfr__min_weight_fraction_leaf': 0.04}
model.best_estimator_.named_steps['rfr'].n_features_ # returns 3

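Update: After reading the documentation more closely, it seems that model.best_estimator_.named_steps['rfr'].n_features_ returns the number of features the random forest was fit on, so 3 makes sense. What really threw me off is that I checked a few of the base_estimators_ in the forest, and many of them clearly use more than one feature in their decision rules. However, the documentation for DecisionTreeRegressor mentions: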

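Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.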

I suspect that is the reason, although it is still a bit confusing.
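One more thing that may help explain what I'm seeing: max_features limits how many candidate features are considered at each split, not how many features a whole tree may use, so a tree with several splits can end up using all 3 inputs even with max_features=1. Here is a rough sketch to check this on synthetic data. It is not my real pipeline: the make_regression dataset and the parameter values are assumptions for illustration, the fitted trees are read from the forest's estimators_ attribute, and n_features_in_ is the name recent scikit-learn releases use for what older releases exposed as n_features_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data (assumption): 3 input features.
X, y = make_regression(n_samples=200, n_features=3, n_informative=3, random_state=0)

forest = RandomForestRegressor(
    n_estimators=50,
    max_features=1,                 # one candidate feature drawn per split
    min_weight_fraction_leaf=0.04,
    random_state=0,
)
forest.fit(X, y)

# Number of input columns seen during fit, not the tuned max_features value.
n_inputs = forest.n_features_in_

# tree_.feature holds the feature index used at each node; leaves are negative.
features_per_tree = [
    len(np.unique(tree.tree_.feature[tree.tree_.feature >= 0]))
    for tree in forest.estimators_
]

print("input features:", n_inputs)
print("max distinct features used by a single tree:", max(features_per_tree))
```

If the trees report more than one distinct feature here, that would be consistent with best_params_ giving max_features=1 while the fitted forest still reports 3 features.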
