Joining a dataset with the result of OneHotEncoder in Pandas

Let's consider the housing-price dataset from this example.

I stored the entire dataset in the `housing` variable:

```
>>> housing.shape
(20640, 10)
```

I also one-hot encoded one column with OneHotEncoder and got `housing_cat_1hot`, so:

```
>>> housing_cat_1hot.toarray().shape
(20640, 5)
```

My goal is to join the two variables and store everything in a single dataset.

I tried the Join with index tutorial, but the problem is that the second matrix has no index. How can I join `housing` and `housing_cat_1hot`?

```
>>> left = housing
>>> right = housing_cat_1hot.toarray()
>>> result = left.join(right)
Traceback (most recent call last):
  File "", line 1, in <module>
    result = left.join(right)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pandas/core/frame.py", line 5293, in join
    rsuffix=rsuffix, sort=sort)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pandas/core/frame.py", line 5323, in _join_compat
    can_concat = all(df.index.is_unique for df in frames)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pandas/core/frame.py", line 5323, in <genexpr>
    can_concat = all(df.index.is_unique for df in frames)
AttributeError: 'numpy.ndarray' object has no attribute 'index'
```

3 Answers

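Well, it depends on how you created the one-hot vectors. But if they are in the same row order as the original DataFrame and are themselves a DataFrame, you can give them the same index before joining: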
```
# 0..n-1 index; matches housing only if housing uses the default RangeIndex
housing_cat_1hot.index = range(len(housing_cat_1hot))
```
If it is not a DataFrame, convert it into one. This is straightforward as long as both objects are in the same row order.

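Edit: if it is not a DataFrame, then `housing_cat_1hot = pd.DataFrame(housing_cat_1hot.toarray())` already creates a suitable default index for you (call `.toarray()` first, because OneHotEncoder returns a sparse matrix by default).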
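A minimal sketch of this approach, assuming a small made-up stand-in for `housing` (the column names, values and the non-default index are hypothetical) and scikit-learn's `OneHotEncoder` producing the sparse matrix. It passes `index=housing.index` rather than `range(len(...))`, which keeps the rows aligned even when `housing` does not use the default 0..n-1 index.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for `housing` with a non-default index
# (e.g. after filtering or shuffling rows).
housing = pd.DataFrame(
    {"median_income": [3.2, 5.1, 2.7],
     "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"]},
    index=[10, 42, 7],
)

# OneHotEncoder returns a SciPy sparse matrix by default.
encoder = OneHotEncoder()
housing_cat_1hot = encoder.fit_transform(housing[["ocean_proximity"]])

# Densify, wrap in a DataFrame, and reuse housing's own index so that
# join() matches row i of the one-hot matrix with row i of housing.
onehot_df = pd.DataFrame(
    housing_cat_1hot.toarray(),
    index=housing.index,
    columns=encoder.categories_[0],  # the learned categories: "INLAND", "NEAR BAY"
)
result = housing.join(onehot_df)
print(result.shape)  # (3, 4)
```

Had the index been set with `range(len(housing_cat_1hot))` in this example, none of the labels 0, 1, 2 would exist in `housing`'s index, so the join would leave every one-hot value as NaN.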
Answered 2024-05-02T13:39:15+08:00
If you want to join two arrays (assuming both `housing_cat_1hot` and `housing` are arrays), you can use
```
import numpy as np
housing = np.hstack((housing, housing_cat_1hot))  # dense arrays with equal row counts
```
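A small runnable sketch of the `np.hstack` route, assuming made-up numeric values and categories in place of the real dataset. Since `housing_cat_1hot` comes out of `OneHotEncoder` as a SciPy sparse matrix, it is converted with `.toarray()` before stacking; if `housing` is still a DataFrame, `np.hstack` turns it into a plain array and the column names are lost.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-ins: numeric features as a NumPy array plus one categorical column.
numeric = np.array([[3.2, 41.0], [5.1, 21.0], [2.7, 52.0]])
categories = np.array([["INLAND"], ["NEAR BAY"], ["INLAND"]])

housing_cat_1hot = OneHotEncoder().fit_transform(categories)  # SciPy sparse matrix

# np.hstack expects dense arrays with the same number of rows,
# so convert the sparse matrix with .toarray() first.
combined = np.hstack((numeric, housing_cat_1hot.toarray()))
print(combined.shape)  # (3, 4)
```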
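That said, the best way to one-hot encode a variable is to select it within the array and encode it there. That saves you the trouble of joining the two afterwards.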

Assuming the variable you want to encode is at column index 1 of the array:
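```
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Older scikit-learn API: categorical_features was deprecated in 0.20 and removed in 0.22.
# Map the string categories in column 1 to integer codes
le = LabelEncoder()
X[:, 1] = le.fit_transform(X[:, 1])

# One-hot encode only column 1 and convert the sparse result to a dense array
onehotencoder = OneHotEncoder(categorical_features=[1])
X = onehotencoder.fit_transform(X).toarray()
```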
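On current scikit-learn, where `categorical_features` no longer exists, a minimal sketch of the same idea uses `ColumnTransformer`. The object array `X` below is hypothetical, with the categorical variable in column 1. `OneHotEncoder` accepts string categories directly, so the `LabelEncoder` step is not needed, and `sparse_threshold=0` forces a dense result like the `.toarray()` call above.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical object array: column 1 holds the categorical variable to encode.
X = np.array(
    [[3.2, "INLAND", 41.0],
     [5.1, "NEAR BAY", 21.0],
     [2.7, "INLAND", 52.0]],
    dtype=object,
)

# One-hot encode column 1 and pass the remaining columns through unchanged.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), [1])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = ct.fit_transform(X)
print(X_encoded.shape)  # (3, 4)
```

The encoded columns come first in the output, followed by the passthrough columns (here the original columns 0 and 2).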
Answered 2024-05-02T13:39:15+08:00

Thanks to @Elez-Shenhar's answer, I arrived at the following working code:

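```
import pandas as pd

OneHot = housing_cat_1hot.toarray()   # sparse matrix -> dense NumPy array
OneHot = pd.DataFrame(OneHot)         # gets a default 0..n-1 index
result = housing.join(OneHot)         # rows are matched on the index
result.shape                          # (20640, 15)
```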

Answered 2024-05-02T13:39:15+08:00
