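Give me a logistics order dataset, clean and preprocess it, derive feature values for the order status (accepted, not accepted, completed, in delivery), visualize them with ECharts as a pie chart, a line chart and a bar chart, and show the code.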
Time: 2023-12-07 15:03:39
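Sure. I recommend this logistics order dataset on Kaggle: https://www.kaggle.com/shashwatwork/logistics-industry-datasets
Below is the code that cleans and preprocesses the dataset, derives the status features, and visualizes them: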
```python
import pandas as pd
import numpy as np
from pyecharts.charts import Pie, Line, Bar
from pyecharts import options as opts

# Load the dataset
df = pd.read_csv('logistics_industry.csv')
# Inspect column types and missing values (info() prints directly and returns None)
df.info()
# Drop duplicate rows
df.drop_duplicates(inplace=True)
# Drop identifier columns that carry no useful signal; errors='ignore' skips any that are absent
df.drop(['Unnamed: 0', 'Order ID', 'Customer ID'], axis=1, inplace=True, errors='ignore')
# Parse dates; unparseable values become NaT instead of raising
df['Pickup - Date'] = pd.to_datetime(df['Pickup - Date'], format='%m/%d/%Y', errors='coerce')
df['Delivery - Date'] = pd.to_datetime(df['Delivery - Date'], format='%m/%d/%Y', errors='coerce')

# Derive status features
# Completed: a delivery time has been recorded
df['Delivery Status'] = np.where(df['Delivery - Time'].isnull(), 'Not completed', 'Completed')
# Accepted: assumes an order counts as accepted once it has a pickup date
df['Acceptance Status'] = np.where(df['Pickup - Date'].isnull(), 'Not accepted', 'Accepted')
# In delivery: no delivery time yet, but a delivery status exists;
# None (not np.nan) leaves other rows empty instead of filling them with the string 'nan'
df['Dispatch Status'] = np.where(df['Delivery - Time'].isnull() & df['Delivery - Status'].notnull(),
                                 'In delivery', None)
# Inspect the result
df.info()

# Pie chart: completed vs. not completed (int() converts numpy ints for JSON output)
status_count = df['Delivery Status'].value_counts()
pie = Pie()
pie.add('', [[name, int(count)] for name, count in status_count.items()],
        radius=['30%', '50%'],
        center=['40%', '50%'],
        label_opts=opts.LabelOpts(is_show=True, formatter='{b}: {c} ({d}%)'))
pie.set_global_opts(title_opts=opts.TitleOpts(title='Logistics orders by delivery status'),
                    legend_opts=opts.LegendOpts(orient='vertical', pos_right='5%', pos_top='middle'))
pie.render('delivery_status_pie.html')

# Line chart: number of orders per pickup month, labelled like '2023-01'
monthly = df.groupby(df['Pickup - Date'].dt.to_period('M')).size()
line = Line()
line.add_xaxis(monthly.index.astype(str).tolist())
line.add_yaxis('Orders', monthly.tolist())
line.set_global_opts(title_opts=opts.TitleOpts(title='Logistics orders per month'),
                     xaxis_opts=opts.AxisOpts(name='Month'),
                     yaxis_opts=opts.AxisOpts(name='Orders'))
line.render('monthly_orders_line.html')

# Bar chart: accepted vs. in-delivery orders; comparing and summing
# avoids a KeyError when a label never occurs
bar = Bar()
bar.add_xaxis(['Accepted', 'In delivery'])
bar.add_yaxis('Orders', [int((df['Acceptance Status'] == 'Accepted').sum()),
                         int((df['Dispatch Status'] == 'In delivery').sum())])
bar.set_global_opts(title_opts=opts.TitleOpts(title='Accepted vs. in-delivery orders'),
                    xaxis_opts=opts.AxisOpts(name='Status'),
                    yaxis_opts=opts.AxisOpts(name='Orders'))
bar.render('acceptance_dispatch_bar.html')
```
The code above uses pyecharts, a Python wrapper around ECharts, for the visualizations; each `render()` call writes one chart to a standalone HTML file. Install the dependencies before running it, for example with `pip install pandas numpy pyecharts`. The column names (`Pickup - Date`, `Delivery - Time`, `Delivery - Status`, and so on) must match the CSV you downloaded, so adjust them if your copy differs.
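The question asks for four states (accepted, not accepted, in delivery, completed), while the code above spreads them across three overlapping columns, so one order can show up in several charts at once. One way to get a single, mutually exclusive status is to rank the milestones and keep the latest one reached. The sketch below assumes each order has three hypothetical fields, `accepted_at`, `dispatched_at` and `delivered_at`, that are `None` until the milestone happens; these are not columns of the Kaggle file, and pandas `NaN` values would need converting to `None` first because `NaN` is truthy.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional

# Display order of the four states named in the question
STATUSES = ("Not accepted", "Accepted", "In delivery", "Completed")


def order_status(accepted_at: Optional[str],
                 dispatched_at: Optional[str],
                 delivered_at: Optional[str]) -> str:
    """Map one order's milestone timestamps to exactly one status.

    The latest milestone that is present wins; None or an empty string
    means the milestone has not happened yet.
    """
    if delivered_at:
        return "Completed"
    if dispatched_at:
        return "In delivery"
    if accepted_at:
        return "Accepted"
    return "Not accepted"


def status_counts(orders: Iterable[Mapping[str, Optional[str]]]) -> dict:
    """Count orders per status, keeping zero counts so every chart shows all four."""
    # Hypothetical field names: map them from your dataset's real columns
    counts = Counter(
        order_status(o.get("accepted_at"), o.get("dispatched_at"), o.get("delivered_at"))
        for o in orders
    )
    return {status: counts.get(status, 0) for status in STATUSES}


if __name__ == "__main__":
    # Illustrative records only, one per state
    orders = [
        {"accepted_at": None, "dispatched_at": None, "delivered_at": None},
        {"accepted_at": "2023-01-03", "dispatched_at": None, "delivered_at": None},
        {"accepted_at": "2023-01-03", "dispatched_at": "2023-01-04", "delivered_at": None},
        {"accepted_at": "2023-01-02", "dispatched_at": "2023-01-02", "delivered_at": "2023-01-05"},
    ]
    print(status_counts(orders))
    # → {'Not accepted': 1, 'Accepted': 1, 'In delivery': 1, 'Completed': 1}
```

Because the statuses are exclusive, the counts sum to the number of orders, which is what a pie chart needs; keeping zero entries means the legend always lists all four states.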
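pyecharts ultimately produces an ECharts option object, so if you prefer not to depend on pyecharts, the same pie, bar and line charts can be described as plain dicts and embedded in an HTML page. This is a sketch under stated assumptions: `pie_option`, `bar_option`, `line_option` and `build_html` are hypothetical helpers, the inputs are assumed to be a status-to-count dict and a month-to-(status-to-count) dict, and the ECharts 5 script is assumed to load from the jsDelivr URL below (replace it with any copy you host).

```python
import json

# Assumed CDN location of the ECharts 5 build
ECHARTS_CDN = "https://cdn.jsdelivr.net/npm/echarts@5/dist/echarts.min.js"


def pie_option(title: str, counts: dict) -> dict:
    """Donut chart option; counts maps status -> number of orders."""
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "item"},
        "legend": {"orient": "vertical", "right": "5%", "top": "middle"},
        "series": [{
            "type": "pie",
            "radius": ["30%", "50%"],
            "data": [{"name": k, "value": v} for k, v in counts.items()],
        }],
    }


def bar_option(title: str, counts: dict) -> dict:
    """Bar chart option with one bar per status, in the dict's order."""
    return {
        "title": {"text": title},
        "tooltip": {},
        "xAxis": {"type": "category", "data": list(counts)},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": list(counts.values())}],
    }


def line_option(title: str, monthly: dict) -> dict:
    """Line chart option with one series per status.

    monthly maps a month label such as '2023-01' to a status -> count dict;
    months are sorted as strings and missing statuses count as 0.
    """
    months = sorted(monthly)
    # Statuses in order of first appearance across the sorted months
    statuses = list(dict.fromkeys(s for m in months for s in monthly[m]))
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},
        "legend": {"data": statuses, "top": "bottom"},
        "xAxis": {"type": "category", "data": months},
        "yAxis": {"type": "value"},
        "series": [
            {"name": s, "type": "line", "data": [monthly[m].get(s, 0) for m in months]}
            for s in statuses
        ],
    }


def build_html(option: dict, width: int = 800, height: int = 500) -> str:
    """Return a standalone HTML page that draws one chart from an ECharts option."""
    payload = json.dumps(option, ensure_ascii=False)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f'<script src="{ECHARTS_CDN}"></script>\n'
        "</head>\n<body>\n"
        f'<div id="chart" style="width:{width}px;height:{height}px;"></div>\n'
        "<script>\n"
        f"echarts.init(document.getElementById('chart')).setOption({payload});\n"
        "</script>\n</body>\n</html>\n"
    )
```

For example, `open('status_pie.html', 'w', encoding='utf-8').write(build_html(pie_option('Orders by status', counts)))` writes a page that opens in a browser. Keeping the option as plain data makes the chart logic testable without a browser; the trade-off is losing pyecharts' typed option classes.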