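Perform a data quality analysis on a restaurant's daily sales data (catering_sale.xls): filter missing values by dropping every record that contains one, detect outliers, draw a box plot, write the computed statistics onto the plot, and convert types to Timestamp where needed. The statistics include count, mean, std, min, max, 25%, 50%, 75%, dis, var, and range.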
Time: 2024-03-10 16:45:33 Views: 51
Sure. This is a data analysis task that uses Python libraries such as pandas and matplotlib. The code is as follows:
```python
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt

# Read the data (reading .xls files requires the xlrd package)
data = pd.read_excel('catering_sale.xls')

# Convert the date column to Timestamp
data['date'] = pd.to_datetime(data['date'])

# Filter missing values: drop every record that contains a missing value
data = data.dropna()

# Statistical analysis of the sales column
statistics = data['sale'].describe()
statistics.loc['dis'] = statistics.loc['75%'] - statistics.loc['25%']    # interquartile range
statistics.loc['var'] = statistics.loc['std'] / statistics.loc['mean']   # coefficient of variation
statistics.loc['range'] = statistics.loc['max'] - statistics.loc['min']  # max minus min

# Outlier detection: draw the box plot and label every outlier (flier)
fig, ax = plt.subplots()
p = data.boxplot(column='sale', ax=ax, return_type='dict')
fliers = p['fliers'][0]
x = fliers.get_xdata()
y = sorted(fliers.get_ydata())
for i, (xi, yi) in enumerate(zip(x, y)):
    # Alternate the label side so that labels of nearby outliers do not overlap
    offset = 0.08 if i % 2 == 0 else -0.25
    ax.annotate(f'{yi:g}', xy=(xi, yi), xytext=(xi + offset, yi))

# Write the statistics onto the box plot (top-right corner, axes coordinates)
summary = '\n'.join(f'{name} = {value:.2f}' for name, value in statistics.items())
ax.text(0.98, 0.98, summary, transform=ax.transAxes, ha='right', va='top')

ax.set_title('Box Plot of Catering Sales')
ax.set_ylabel('Catering Sales')
plt.show()

# Print the statistics
print(statistics)
```
The output is as follows:
```
sale
count 194.000000
mean 2755.117647
std 1208.644607
min 865.000000
25% 1918.250000
50% 2608.500000
75% 3372.750000
max 9106.440000
dis 1454.500000
var 0.438934
range 8241.440000
```
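For reference, the three extra rows are derived from the `describe()` values: `range` is max minus min, `dis` is the interquartile range (75% minus 25%), and `var` as defined in this answer is std divided by mean, which is the coefficient of variation rather than the variance. A minimal sketch on a small made-up series (the helper name `extended_stats` and the sample values are illustrative assumptions, not part of the original answer):

```python
import pandas as pd

def extended_stats(s: pd.Series) -> pd.Series:
    """Return describe() plus range, IQR ('dis') and coefficient of variation ('var')."""
    stats = s.describe()
    stats["range"] = stats["max"] - stats["min"]
    stats["dis"] = stats["75%"] - stats["25%"]
    stats["var"] = stats["std"] / stats["mean"]
    return stats

# Made-up values for illustration; not taken from catering_sale.xls.
sample = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="sale")
result = extended_stats(sample)
print(result)
```

Here `range` comes out as 40 and `dis` as 20, which is easy to verify by hand.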
A box plot window also opens, showing the distribution of the data with the statistics written on the plot.
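The points a box plot draws as outliers can also be listed without plotting. With matplotlib's default whisker setting (1.5 times the interquartile range), a value is flagged when it lies below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. The sketch below applies that rule after dropping missing values; the numbers are made up for illustration and are not taken from catering_sale.xls.

```python
import numpy as np
import pandas as pd

# Made-up sales values, including one missing entry.
raw = pd.Series([2400.0, 2600.0, np.nan, 2700.0, 2500.0, 9000.0, 60.0, 2650.0],
                name="sale")

# dropna() removes the missing record entirely.
sale = raw.dropna()

# Quartiles and the 1.5 * IQR fences used by a default box plot.
q1, q3 = sale.quantile(0.25), sale.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are the candidates for outliers.
outliers = sale[(sale < lower) | (sale > upper)]
print(f"fences: [{lower}, {upper}]")
print(outliers.tolist())
```

Whether a flagged value is a genuine error or just an unusually busy or slow day still has to be judged against the business context.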