Python code for data preprocessing
Posted: 2023-07-23 21:29:54 | Views: 94
The following Python code covers the usual data preprocessing steps:
1. Data cleaning:
```
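import pandas as pd

# Load the data
data = pd.read_csv('data.csv')
# Drop rows that contain missing values
data = data.dropna()
# Drop duplicate rows
data = data.drop_duplicates()
# Drop outliers: keep only rows whose score lies in the valid range [0, 100]
# (.copy() avoids SettingWithCopyWarning when columns are assigned later)
data = data[data['score'].between(0, 100)].copy()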
```
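Before dropping rows, it usually helps to see how many would be lost. The sketch below applies the same three cleaning steps to a small in-memory table instead of `data.csv`; the `raw` frame and its values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for data.csv
raw = pd.DataFrame({
    "age":   [20, 21, np.nan, 22, 22, 23],
    "score": [88, 105, 70, 95, 95, -3],
})

# Inspect the problems first
missing_per_column = raw.isna().sum()      # count of NaN values in each column
duplicate_rows = raw.duplicated().sum()    # rows identical to an earlier row
print(missing_per_column)
print("duplicate rows:", duplicate_rows)

# Apply the same cleaning steps as a single chain
cleaned = (
    raw.dropna()                                       # remove rows with missing values
       .drop_duplicates()                              # remove repeated rows
       .loc[lambda df: df["score"].between(0, 100)]    # keep scores in [0, 100]
       .reset_index(drop=True)
)
print(cleaned)
```

Chaining the steps returns a new DataFrame and leaves `raw` untouched, which makes it easy to compare row counts before and after cleaning.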
2. Feature processing:
```
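from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

# One-hot encode the categorical features
data = pd.get_dummies(data, columns=['gender', 'major'])
# Scale the numeric features to the [0, 1] range (min-max normalization)
scaler = MinMaxScaler()
data[['age', 'score']] = scaler.fit_transform(data[['age', 'score']])
# PCA projection of the numeric features
# Note: the input has only two columns, so n_components=2 rotates the data
# without reducing its dimensionality; use n_components=1, or pass more
# columns, to actually reduce it
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data[['age', 'score']])
data['pca_1'] = data_pca[:, 0]
data['pca_2'] = data_pca[:, 1]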
```
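The snippet above fits the scaler on the full dataset. That is fine for a demo, but it lets the minimum and maximum of the future validation and test rows influence the training features. A common alternative is to split first and fit the scaler on the training rows only. The following is a minimal sketch of that idea; the toy array `X` and the 75/25 split are assumptions for illustration, not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy feature matrix: the columns stand in for 'age' and 'score'
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(18, 30, size=20),   # age-like integers
    rng.uniform(0, 100, size=20),    # score-like floats
])

# Split before scaling so the held-out rows stay unseen
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training min/max

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
print(X_test_scaled.min(axis=0), X_test_scaled.max(axis=0))
```

With this approach the scaled test values can fall slightly outside [0, 1], which is expected: they are measured against the range seen during training.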
3. Dataset splitting:
```
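# Split into training (70%), validation (15%), and test (15%) sets
from sklearn.model_selection import train_test_split
features = data[['age', 'score', 'pca_1', 'pca_2']]
labels = data['label']
# First hold out 30% of the rows
train_data, holdout_data, train_label, holdout_label = train_test_split(features, labels, test_size=0.3, random_state=0)
# Then split the held-out rows evenly into validation and test sets
val_data, test_data, val_label, test_label = train_test_split(holdout_data, holdout_label, test_size=0.5, random_state=0)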
```
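Calling `train_test_split` twice is easy to get subtly wrong, so a quick sanity check on the resulting sizes is worthwhile. The sketch below repeats the same two-stage split on made-up data (100 rows, 4 features; all names here are hypothetical) and checks that the three parts do not overlap.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 4 features, binary labels
X = np.arange(400).reshape(100, 4)
y = np.array([0, 1] * 50)

# Stage 1: hold out 30% of the rows
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=0
)
# Stage 2: split the held-out rows evenly into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=0
)

print(len(X_train), len(X_val), len(X_test))

# Every row of X has a unique first column, so it can serve as a row id
train_ids = set(X_train[:, 0])
val_ids = set(X_val[:, 0])
test_ids = set(X_test[:, 0])
no_overlap = not (train_ids & val_ids or train_ids & test_ids or val_ids & test_ids)
print("disjoint:", no_overlap)
```

Because the second call uses `test_size=0.5` on the 30% holdout, the overall proportions come out to roughly 70/15/15.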
The code above is only an example; which preprocessing steps to use depends on the specific problem and dataset.