The easiest way to get feature names after running SelectKBest in Scikit Learn

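I want to do supervised learning.

So far, I know how to do supervised learning using all of the features.

However, I would also like to run experiments with only the K best features.

I read the documentation and found the SelectKBest method in Scikit-learn.

Unfortunately, I am not sure how to create a new DataFrame once those best features have been found: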

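Let's say I want to experiment with the 5 best features:

import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
select_k_best_classifier = SelectKBest(score_func=f_classif, k=5)
fit_transformed_features = select_k_best_classifier.fit_transform(features_dataframe, targeted_class)

Now, if I add the following line:

dataframe = pd.DataFrame(fit_transformed_features)

I get a new DataFrame without the feature names (its columns are just numbered 0 to 4).

I would like to replace it with something like:

dataframe = pd.DataFrame(fit_transformed_features, columns=features_names)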

My question is: how do I build the features_names list?

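I know I should use select_k_best_classifier.get_support(),

which returns an array of booleans.

A True value in that array marks a column that was selected.

How do I combine this boolean array with the list of all feature names, which I can get with: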

feature_names = list(features_dataframe.columns.values)
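To make the get_support() behavior concrete, here is a minimal sketch on a hypothetical three-column toy DataFrame (the column names noise, strong and medium, the values, and k=2 are all invented for illustration). It shows that both the boolean mask and the indices=True form follow the original column order, so the mask can be zipped element-wise with feature_names:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: "strong" separates the two classes best,
# "medium" somewhat, and "noise" not at all (its class means are equal).
features_dataframe = pd.DataFrame({
    "noise":  [1, 2, 3, 3, 2, 1],
    "strong": [0.0, 0.1, 0.2, 1.0, 1.1, 1.2],
    "medium": [0, 1, 2, 2, 3, 4],
})
targeted_class = [0, 0, 0, 1, 1, 1]

selector = SelectKBest(score_func=f_classif, k=2).fit(features_dataframe, targeted_class)

mask = selector.get_support()                 # one boolean per input column
indices = selector.get_support(indices=True)  # positions of the True entries

feature_names = list(features_dataframe.columns.values)
# The mask is aligned with the column order, so it pairs up with feature_names.
selected_names = [name for name, keep in zip(feature_names, mask) if keep]

print(mask)            # [False  True  True]
print(indices)         # [1 2]
print(selected_names)  # ['strong', 'medium']
```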

4 Answers

This works for me and doesn't need a loop.

from sklearn.feature_selection import SelectKBest, f_classif

# Create and fit selector
selector = SelectKBest(f_classif, k=5)
selector.fit(features_df, target)
# Get the integer positions of the columns to keep
cols = selector.get_support(indices=True)
# Create a new dataframe with only the selected columns (positional indexing), or overwrite the existing one
features_df_new = features_df.iloc[:, cols]
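Because get_support(indices=True) returns integer positions rather than column labels, the columns have to be picked positionally with .iloc; a plain features_df[cols] looks the integers up as labels and fails when the columns have string names. A rough check on hypothetical toy data (names and values invented) that the positional selection keeps the names and holds the same values that selector.transform() returns:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: "strong" separates the classes best, "noise" not at all.
features_df = pd.DataFrame({
    "noise":  [1, 2, 3, 3, 2, 1],
    "strong": [0.0, 0.1, 0.2, 1.0, 1.1, 1.2],
    "medium": [0, 1, 2, 2, 3, 4],
})
target = [0, 0, 0, 1, 1, 1]

selector = SelectKBest(f_classif, k=2)
selector.fit(features_df, target)
cols = selector.get_support(indices=True)  # integer positions of the kept columns

# Positional selection keeps the original column labels...
features_df_new = features_df.iloc[:, cols]
print(list(features_df_new.columns))  # ['strong', 'medium']

# ...and contains the same values that transform() returns as a bare array.
same = np.array_equal(features_df_new.to_numpy(dtype=float), selector.transform(features_df))
print(same)  # True
```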

Answered 2024-04-28T12:32:29+08:00

For me, this code works fine and is more "pythonic":
```
mask = select_k_best_classifier.get_support()
new_features = features_dataframe.columns[mask]
```
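One detail the mask approach leaves implicit: fit_transform() returns a plain NumPy array, so a DataFrame rebuilt from it loses the original row index as well as the column names unless both are passed back in. A small sketch with hypothetical data and made-up row labels (r1 to r6):

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data with non-default row labels.
features_dataframe = pd.DataFrame(
    {
        "noise":  [1, 2, 3, 3, 2, 1],
        "strong": [0.0, 0.1, 0.2, 1.0, 1.1, 1.2],
        "medium": [0, 1, 2, 2, 3, 4],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
)
targeted_class = [0, 0, 0, 1, 1, 1]

select_k_best_classifier = SelectKBest(score_func=f_classif, k=2)
fit_transformed_features = select_k_best_classifier.fit_transform(features_dataframe, targeted_class)

mask = select_k_best_classifier.get_support()
new_features = features_dataframe.columns[mask]  # labels of the kept columns

# Passing index= keeps the original row labels; without it pandas uses 0..n-1.
dataframe = pd.DataFrame(
    fit_transformed_features,
    columns=new_features,
    index=features_dataframe.index,
)
print(dataframe)
```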
Answered 2024-04-28T12:32:29+08:00

You can do the following:

mask = select_k_best_classifier.get_support()  # array of booleans, one per input column
new_features = []  # the names of your K best features

for selected, feature in zip(mask, feature_names):
    if selected:
        new_features.append(feature)

Then use them as the column names:

dataframe = pd.DataFrame(fit_transformed_features, columns=new_features)
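The loop above keeps each name whose mask entry is True. The standard library's itertools.compress does the same filtering in a single call; a sketch with hard-coded stand-ins for the mask and the names (hypothetical values, not taken from the question):

```python
from itertools import compress

# Hypothetical stand-ins for get_support() and the list of column names.
mask = [False, True, True]
feature_names = ["noise", "strong", "medium"]

# compress() yields the items of feature_names whose matching mask entry is truthy,
# which is what the for/if loop builds up with append().
new_features = list(compress(feature_names, mask))
print(new_features)  # ['strong', 'medium']
```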

Answered 2024-04-28T12:32:29+08:00

The following code finds the top K features by F-score. Let X be a pandas DataFrame whose columns are all the features, and y the list of class labels.

import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
#Suppose we select the 5 features with the top 5 ANOVA F-scores (the statistic f_classif computes)
selector = SelectKBest(f_classif, k = 5)
#New dataframe with the selected features for later use in the classifier. fit() method works too, if you want only the feature names and their corresponding scores
X_new = selector.fit_transform(X, y)
names = X.columns.values[selector.get_support()]
scores = selector.scores_[selector.get_support()]
names_scores = list(zip(names, scores))
ns_df = pd.DataFrame(data = names_scores, columns=['Feat_names', 'F_Scores'])
#Sort the dataframe for better visualization
ns_df_sorted = ns_df.sort_values(['F_Scores', 'Feat_names'], ascending = [False, True])
print(ns_df_sorted)

Answered 2024-04-28T12:32:29+08:00
