Use LabelEncoder and OneHotEncoder to automatically replace the English labels in CSV columns A1, A4, A5, A6, A7, A9, A10, A11, A12, A13 and A16 with distinct numbers
Date: 2023-09-17 17:14:32 Views: 105
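Suppose we have a CSV file in which the eleven columns labeled A1, A4, A5, A6, A7, A9, A10, A11, A12, A13 and A16 hold English (string) labels, and we need to replace those labels with distinct numbers automatically. The following code does this: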
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Read the CSV file
df = pd.read_csv('your_file.csv')

# Columns whose English (string) labels should be encoded
cols = ['A1', 'A4', 'A5', 'A6', 'A7', 'A9', 'A10', 'A11', 'A12', 'A13', 'A16']

# Replace the English labels with integers, using one LabelEncoder per column
label_encoders = {}
for col in cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le  # kept so the mapping can be inspected or inverted later

# Convert the integer codes to one-hot encoding, one column at a time
encoded_frames = []
for col in cols:
    ohe = OneHotEncoder()
    encoded = ohe.fit_transform(df[[col]]).toarray()
    # Name the new columns <column>_<code> so they stay unique after concat
    names = [f'{col}_{int(code)}' for code in ohe.categories_[0]]
    encoded_frames.append(pd.DataFrame(encoded, columns=names, index=df.index))

# Add the one-hot columns to the remaining original columns
df_encoded = pd.concat([df.drop(cols, axis=1)] + encoded_frames, axis=1)

# Save the converted data to a new CSV file
df_encoded.to_csv('new_file.csv', index=False)
```
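This is similar to the solution to the previous question. The difference is that we instantiate a separate LabelEncoder object for each column and call fit_transform on each one, so every column gets its own label-to-integer mapping. We then use OneHotEncoder to one-hot encode each column and add the one-hot columns to the original data. Finally, we use concat to join all the columns into a new DataFrame and save it to a new CSV file with to_csv.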
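To make the two encoding steps concrete, here is a minimal sketch on a tiny in-memory DataFrame. The values `'a'`, `'b'`, `'u'`, `'y'` and `'l'` are hypothetical stand-ins for the labels in the real file, and the `<column>_<code>` naming of the one-hot columns is just one possible convention.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data standing in for two of the CSV columns
toy = pd.DataFrame({'A1': ['b', 'a', 'b', 'a'],
                    'A4': ['u', 'y', 'u', 'l']})

# LabelEncoder sorts the distinct labels and numbers them from 0
mappings = {}
for col in toy.columns:
    le = LabelEncoder()
    toy[col] = le.fit_transform(toy[col])
    mappings[col] = {str(label): code for code, label in enumerate(le.classes_)}

# One-hot encode the integer codes; each distinct code becomes one 0/1 column
ohe = OneHotEncoder()
onehot = ohe.fit_transform(toy[['A1', 'A4']]).toarray()
names = [f'{col}_{int(code)}'
         for col, cats in zip(['A1', 'A4'], ohe.categories_)
         for code in cats]
toy_encoded = pd.DataFrame(onehot, columns=names)

print(mappings)     # label -> integer mapping for each column
print(toy_encoded)  # one 0/1 column per distinct code
```

Two points may be worth keeping in mind. The scikit-learn documentation describes LabelEncoder as intended for target labels rather than input features, and recent versions of OneHotEncoder accept string columns directly, so the LabelEncoder step is likely optional if only the one-hot output is needed. It is kept here because the question asks for both encoders.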