How are feature importances in a scikit-learn RandomForestClassifier related to the forest structure?

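Below is a simple example of my problem, using the Iris dataset. I am confused about how feature importances are computed, and about how this shows up when visualizing the forest of estimators with export_graphviz. Here is my code: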

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

data = load_iris()
X = pd.DataFrame(data=data.data,columns=['sepallength', 'sepalwidth', 'petallength','petalwidth'])
y = pd.DataFrame(data=data.target)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=2,max_depth=1)
rf.fit(X_train,y_train.iloc[:,0])

The classifier performs poorly (the score is 0.68) because the forest contains only 2 trees of depth 1. In any case, that does not matter here.

The feature importances are retrieved as follows:

importances = rf.feature_importances_
std = np.std([tree.feature_importances_ for tree in rf.estimators_], axis=0)
indices = np.argsort(importances)[::-1]

print("Feature ranking:")
for f in range(X.shape[1]):
    print("%d. feature %s (%f)" % (f + 1, X.columns.tolist()[f], importances[indices[f]]))

The output is:

Feature ranking:
1. feature sepallength (1.000000)
2. feature sepalwidth (0.000000)
3. feature petallength (0.000000)
4. feature petalwidth (0.000000)

Now I display the structure of the trees that were built, using the following code:

from sklearn.tree import export_graphviz
export_graphviz(rf.estimators_[0],
                out_file='tree.dot',
                feature_names=X.columns,
                filled=True,
                rounded=True)
!dot -Tpng tree.dot -o tree0.png
from IPython.display import Image
Image('tree0.png')

I get these two figures:

[Figure: export of tree #0]

[Figure: export of tree #1]

I cannot understand how sepallength can have importance=1 but not be used for node splitting in either tree (only petallength is used), as shown in the figures.

1 Answer

3
You have a bug in
```
for f in range(X.shape[1]):
    print("%d. feature %s (%f)" % (f + 1, X.columns.tolist()[f], importances[indices[f]]))
```
If you permute using indices = np.argsort(importances)[::-1], then you need to permute everything: do not keep the labels in one order and the importances in a different order.

If you replace the above with
```
for f in range(X.shape[1]):
    print("%d. feature %s (%f)" % (f + 1, X.columns.tolist()[f], importances[f]))
```
then the forest and its trees all agree that the feature at index 2 is the only one with any importance.
Answered 2024-05-06T01:20:47+08:00
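The agreement between the forest and its trees can also be checked without rendering any images. The following is a minimal sketch, not the code from the question: it fits on the full Iris data rather than the train split, the fixed random_state is an assumption added only for repeatability, and split_features is a hypothetical helper. Which feature each tree actually picks depends on the random seed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
feature_names = ['sepallength', 'sepalwidth', 'petallength', 'petalwidth']

# Same forest shape as in the question. random_state is an assumption,
# added here only so that repeated runs give the same trees.
rf = RandomForestClassifier(n_estimators=2, max_depth=1, random_state=0)
rf.fit(data.data, data.target)


def split_features(tree):
    """Hypothetical helper: names of the features used by a tree's split nodes."""
    used = tree.tree_.feature  # negative entries mark leaf nodes
    return sorted({feature_names[i] for i in used if i >= 0})


for n, tree in enumerate(rf.estimators_):
    print("tree #%d splits on: %s" % (n, split_features(tree)))
    print("tree #%d importances: %s" % (n, tree.feature_importances_))

# The forest-level importances are derived from the per-tree importances.
print("forest importances: %s" % rf.feature_importances_)
```

Any feature with a nonzero importance in a tree should appear among that tree's split features, which is the consistency the figures in the question were expected to show.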
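If a sorted ranking is still wanted, one option is to apply the same permutation to both the labels and the importances. This is a minimal sketch under stated assumptions: the importances array is hypothetical and merely mirrors the conclusion above (only index 2 is nonzero), and the order of the tied zero entries is not guaranteed.

```python
import numpy as np

feature_names = np.array(['sepallength', 'sepalwidth', 'petallength', 'petalwidth'])

# Hypothetical importances, chosen to mirror the case discussed above.
importances = np.array([0.0, 0.0, 1.0, 0.0])

# Indices of the features, from most to least important.
indices = np.argsort(importances)[::-1]

# Apply the SAME permutation to the names and to the values.
ranking = [(str(feature_names[i]), float(importances[i])) for i in indices]

print("Feature ranking:")
for rank, (name, value) in enumerate(ranking, start=1):
    print("%d. feature %s (%f)" % (rank, name, value))
```

The first line of the ranking then names petallength, so the printed label and the printed value refer to the same column.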
