Data mining examples implemented in Python
Posted: 2023-07-31 21:04:40
Below are several applications of data mining techniques, each with example code in Python:
1. Purchase prediction for e-commerce users
Example code:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the user behavior log (the CSV has no header row)
user_behavior = pd.read_csv('user_behavior.csv', header=None, names=['user_id', 'item_id', 'behavior_type', 'timestamp'])

# Build features: extract the hour of day from the Unix timestamp
user_behavior['hour'] = pd.to_datetime(user_behavior['timestamp'], unit='s').dt.hour
feature_data = user_behavior[['user_id', 'item_id', 'hour']]
# Label is 1 when behavior_type == 4 (treated as a purchase in this example), otherwise 0
label_data = (user_behavior['behavior_type'] == 4).astype(int)

# Split into training and test sets (fixed seed for reproducibility)
train_feature, test_feature, train_label, test_label = train_test_split(feature_data, label_data, test_size=0.2, random_state=42)

# Train the model
model = DecisionTreeClassifier(random_state=42)
model.fit(train_feature, train_label)

# Predict on the test set and evaluate
test_predict = model.predict(test_feature)
score = accuracy_score(test_label, test_predict)
print('Accuracy:', score)
```
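Accuracy alone can look good even when a model never predicts a purchase, if purchases are only a small share of the rows. Whether that applies here is an assumption, since the data is not described. The sketch below computes precision and recall for the positive class in plain Python; the helper `precision_recall` and the toy labels are hypothetical and only illustrate the metrics.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (made up for illustration): 2 purchases out of 10 rows
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A model that always predicts "no purchase"
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print('Accuracy:', accuracy)
print('Precision:', precision, 'Recall:', recall)
```

With scikit-learn, `precision_score` and `recall_score` from `sklearn.metrics` compute the same quantities directly from `test_label` and `test_predict`.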
2. Social media user clustering
Example code:
```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Load the social media data (expects a 'text' column)
social_media_data = pd.read_csv('social_media_data.csv')

# Turn each text into a TF-IDF feature vector, dropping English stop words
vectorizer = TfidfVectorizer(stop_words='english')
feature_data = vectorizer.fit_transform(social_media_data['text'])

# Cluster the vectors into 5 groups with KMeans
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
kmeans.fit(feature_data)

# Print the cluster assigned to each row (one row per user)
for index, label in enumerate(kmeans.labels_):
    print('User', index, 'belongs to cluster', label)
```
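To make the TF-IDF step less of a black box, here is a minimal plain-Python sketch of the idea: a term weighs more when it is frequent in one text and rare across the others. The function `tfidf_vectors`, the whitespace tokenizer, and the sample texts are hypothetical. The smoothed IDF formula used is one common variant, so treat it as an illustration rather than a statement of what `TfidfVectorizer` computes internally.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Weight = term count * smoothed inverse document frequency
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


# Made-up texts for illustration
docs = ['cats chase mice', 'dogs chase cats', 'stocks rise today']
vectors = tfidf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, '->', sorted(vec.items(), key=lambda item: -item[1]))
```

In the first text, "mice" appears in no other text, so it receives a higher weight than "cats" or "chase", which each appear in two texts.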
3. Anomaly detection in medical data
Example code:
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest

# Load the medical data
medical_data = pd.read_csv('medical_data.csv')

# Fit an Isolation Forest; predict() returns -1 for outliers and 1 for inliers
features = medical_data[['age', 'income']]
clf = IsolationForest(random_state=42)
clf.fit(features)
medical_data['is_outlier'] = clf.predict(features)

# Draw a scatter plot colored by the outlier flag
sns.scatterplot(x='age', y='income', data=medical_data, hue='is_outlier')
plt.title('Outlier Detection')
plt.show()
```
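The `is_outlier` column holds the raw output of `IsolationForest.predict`, which is -1 for outliers and 1 for inliers, so the column name can be misread. A small sketch of converting those labels into boolean flags and a share of outliers follows; `summarize_outliers` and the sample predictions are hypothetical stand-ins.

```python
def summarize_outliers(predictions):
    """Convert Isolation Forest style labels (-1 outlier, 1 inlier) to flags."""
    flags = [label == -1 for label in predictions]
    share = sum(flags) / len(flags) if flags else 0.0
    return flags, share


# Made-up predictions standing in for the output of clf.predict(...)
predictions = [1, 1, -1, 1, -1, 1, 1, 1]
flags, share = summarize_outliers(predictions)
print('Outlier flags:', flags)
print('Share of outliers:', share)
```

On a pandas column the same conversion is a single comparison against -1, which yields a boolean Series that reads more naturally as a plot legend.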
4. Financial data prediction
Example code:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the financial data
financial_data = pd.read_csv('financial_data.csv')

# Select the features and the target
feature_data = financial_data[['GDP', 'unemployment_rate', 'inflation_rate']]
label_data = financial_data['stock_price']

# Split into training and test sets (fixed seed for reproducibility)
train_feature, test_feature, train_label, test_label = train_test_split(feature_data, label_data, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression()
model.fit(train_feature, train_label)

# Predict on the test set and evaluate
test_predict = model.predict(test_feature)
mse = mean_squared_error(test_label, test_predict)
print('MSE:', mse)
```
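MSE is expressed in squared units of the target, which makes it hard to read as a price error. Taking the square root (RMSE) brings it back to the same units as `stock_price`. The sketch below works through both by hand; `mse_rmse` and the numbers are made up for illustration.

```python
import math


def mse_rmse(y_true, y_pred):
    """Return (mean squared error, root mean squared error)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / len(squared_errors)
    return mse, math.sqrt(mse)


# Made-up prices: errors are 1, 2, 1, 2, so squared errors are 1, 4, 1, 4
actual = [100.0, 102.0, 98.0, 105.0]
predicted = [101.0, 100.0, 99.0, 103.0]

mse, rmse = mse_rmse(actual, predicted)
print('MSE:', mse)
print('RMSE:', rmse)
```

Here the squared errors sum to 10 over 4 samples, giving an MSE of 2.5 and an RMSE of about 1.58 price units.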
5. Traffic data visualization
Example code:
```python
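import pandas as pd
import folium

# Load the traffic data
traffic_data = pd.read_csv('traffic_data.csv')

# Create a map centered on the mean coordinates (named traffic_map to avoid shadowing the built-in map)
traffic_map = folium.Map(location=[traffic_data['latitude'].mean(), traffic_data['longitude'].mean()], zoom_start=12)

# Add one circle marker per record, sized by speed
for index, row in traffic_data.iterrows():
    folium.CircleMarker(location=[row['latitude'], row['longitude']], radius=row['speed'] / 10, color='red', fill=True, fill_color='red').add_to(traffic_map)

# Display the map (renders inline when it is the last expression in a Jupyter cell)
traffic_map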
```
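Dividing the speed by 10 gives a radius in pixels, so very slow records become nearly invisible and very fast ones can cover their neighbors. One possible refinement is to clamp the radius to a fixed range. The helper `marker_radius` and its default bounds below are hypothetical choices, not part of folium.

```python
def marker_radius(speed, scale=10.0, min_radius=2.0, max_radius=20.0):
    """Map a speed value to a marker radius in pixels, clamped to a range."""
    return max(min_radius, min(max_radius, speed / scale))


# Made-up speeds for illustration
for speed in [0, 35, 120, 400]:
    print(speed, '->', marker_radius(speed))
```

The result can be passed as the `radius` argument of `folium.CircleMarker`. Outside a notebook, calling `save('traffic_map.html')` on the map object writes it to an HTML file that opens in a browser.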
These are a few applications of data mining techniques with example code in Python. The code can serve as a reference for study and practice.