Scikit-learn SelectFromModel - actually obtaining the feature importance scores of the underlying predictor

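I am trying to estimate feature importance for a classification task I have at hand. What matters to me is obtaining a specific number representing the importance of each feature, not just "selecting the X most important features".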

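The obvious choice is a tree-based method, which exposes a convenient feature_importances_ attribute giving the importance of each feature. But I am not satisfied with the results of tree-based classifiers. I learned that SelectFromModel can eliminate unimportant features based on importance scores, and I have used it successfully with SVMs and linear models.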

Is there a way to get the specific importance score of each feature from SelectFromModel, rather than just the list of the most important features?

1 Answer

Looking through the source code on GitHub, I found this piece of code:

def _get_feature_importances(estimator):
    """Retrieve or aggregate feature importances from estimator"""
    importances = getattr(estimator, "feature_importances_", None)

    if importances is None and hasattr(estimator, "coef_"):
        if estimator.coef_.ndim == 1:
            importances = np.abs(estimator.coef_)

        else:
            importances = np.sum(np.abs(estimator.coef_), axis=0)

    elif importances is None:
        raise ValueError(
            "The underlying estimator %s has no `coef_` or "
            "`feature_importances_` attribute. Either pass a fitted estimator"
            " to SelectFromModel or call fit before calling transform."
            % estimator.__class__.__name__)

    return importances

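So if you are using a linear model, the code simply uses the absolute values of the model coefficients as the "importance scores" (summed over the rows of coef_ when it is two-dimensional).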
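To make the two-dimensional branch of that excerpt concrete, here is a minimal sketch. The iris dataset and LogisticRegression are my own choices for illustration, not something from the question; the aggregation line simply mirrors the excerpt above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Illustrative data and model (assumptions, not from the original question)
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# For a multi-class problem coef_ has shape (n_classes, n_features)
print(clf.coef_.shape)

# Same aggregation as the 2-D branch in the excerpt:
# sum the absolute coefficients over classes, giving one score per feature
importances = np.sum(np.abs(clf.coef_), axis=0)
print(importances)
```

Each entry of importances is non-negative, and a larger value means the feature carries more total weight across the classes. The scores depend on feature scaling, so compare them only after standardizing the inputs.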

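You can get them by pulling the coef_ attribute from the estimator passed to SelectFromModel. After fitting, the fitted estimator is available as estimator_.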

Example:

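from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

sfm = SelectFromModel(LassoCV(), threshold=0.25)  # threshold is keyword-only in recent releases
sfm.fit(X, y)  # X, y: your feature matrix and target
print(sfm.estimator_.coef_)  # print "importance" scores (signed; take np.abs to rank)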
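For a version that runs end to end, the sketch below pairs each score with the selection mask. The synthetic data from make_regression and the cv=5 setting are assumptions made purely for illustration; substitute your own X and y.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Synthetic data (an assumption for illustration): 10 features, 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

sfm = SelectFromModel(LassoCV(cv=5), threshold=0.25)
sfm.fit(X, y)

# Absolute coefficients of the fitted estimator, one score per feature
scores = np.abs(sfm.estimator_.coef_)
# Boolean mask of the features SelectFromModel keeps
selected = sfm.get_support()

for i, (score, keep) in enumerate(zip(scores, selected)):
    print(f"feature {i}: score={score:.3f} selected={keep}")
```

A feature is kept when its score is at least the threshold, so the printed mask should agree with scores >= 0.25.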

Answered on 2024-04-26T16:48:13+08:00
