Write me a complete pandas script for analyzing sleep-quality data
Time: 2023-08-11 21:04:42
Sure. Below is an example of analyzing sleep-quality data.
First, import the libraries we need:
```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
```
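The answer never shows what `sleep.csv` looks like; the code that follows only implies the columns `Start Time`, `End Time`, `Minutes Asleep` and `Minutes Awake`. To try the pipeline without real data, a sketch like the one below writes a synthetic file with those columns. Everything about it is an assumption: the timestamp format, the value ranges, and the one missing and one negative value planted to exercise the cleaning step.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: the column names match what the analysis reads,
# but every value is randomly generated, not a real measurement.
rng = np.random.default_rng(0)
n = 60  # one record per night

# Bedtimes scattered around 22:00 (between 20:00 and 00:59), one night apart
start = (
    pd.Timestamp("2023-07-01 22:00")
    + pd.to_timedelta(np.arange(n), unit="D")
    + pd.to_timedelta(rng.integers(-120, 180, size=n), unit="min")
)
asleep = rng.integers(300, 480, size=n)  # roughly 5 to 8 hours asleep
awake = rng.integers(10, 60, size=n)     # 10 to 59 minutes awake in bed
end = start + pd.to_timedelta(asleep + awake, unit="min")

demo = pd.DataFrame({
    "Start Time": start.strftime("%Y-%m-%d %H:%M:%S"),
    "End Time": end.strftime("%Y-%m-%d %H:%M:%S"),
    "Minutes Asleep": asleep,
    "Minutes Awake": awake.astype(float),
})
# Plant one missing and one invalid value so dropna() and the >= 0 filter have work to do
demo.loc[3, "Minutes Awake"] = np.nan
demo.loc[5, "Minutes Asleep"] = -1
demo.to_csv("sleep.csv", index=False)
```

Run it once in the working directory, and `pd.read_csv("sleep.csv")` then has something to load.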
Next, load the data and take a first look at it:
```python
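# Assumes sleep.csv has columns such as Start Time, End Time, Minutes Asleep, Minutes Awake
df = pd.read_csv("sleep.csv")
# Preview the first few rows
print(df.head())
# Column dtypes and non-null counts (info() prints directly and returns None)
df.info()
# Descriptive statistics for the numeric columns
print(df.describe())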
```
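Before dropping rows, it can help to know how many will go. The hypothetical helper `data_quality_report` below counts rows with missing values and rows with negative durations, which is what the cleaning step removes, assuming the same two minute columns. The four-row sample is made up for illustration.

```python
import numpy as np
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> pd.Series:
    """Count the rows the cleaning step would drop (hypothetical helper;
    assumes 'Minutes Asleep' and 'Minutes Awake' columns)."""
    minutes = df[["Minutes Asleep", "Minutes Awake"]]
    has_missing = df.isna().any(axis=1)
    # NaN < 0 evaluates to False, so missing values are not counted as negative
    has_negative = (minutes < 0).any(axis=1)
    return pd.Series({
        "rows": len(df),
        "rows_with_missing": int(has_missing.sum()),
        "rows_with_negative": int(has_negative.sum()),
        "rows_kept": int((~has_missing & ~has_negative).sum()),
    })

# Tiny hand-made example; in practice pass the df read from sleep.csv
sample = pd.DataFrame({
    "Minutes Asleep": [420, np.nan, -5, 390],
    "Minutes Awake": [30, 20, 15, np.nan],
})
print(data_quality_report(sample))
```

If `rows_kept` turns out to be a small fraction of `rows`, imputing instead of dropping may be worth considering.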
Then clean and preprocess the data:
```python
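# Drop rows with missing values
df.dropna(inplace=True)
# Keep only valid records: non-negative durations and a non-zero total
# (a zero total would make the efficiency below divide by zero)
valid = (
    (df['Minutes Asleep'] >= 0)
    & (df['Minutes Awake'] >= 0)
    & (df['Minutes Asleep'] + df['Minutes Awake'] > 0)
)
df = df[valid].copy()  # explicit copy avoids SettingWithCopyWarning on the assignments below
# Time in bed = minutes asleep + minutes awake
df['Total Minutes'] = df['Minutes Asleep'] + df['Minutes Awake']
# Sleep efficiency = share of time in bed actually spent asleep
df['Sleep Efficiency'] = df['Minutes Asleep'] / df['Total Minutes']
# Parse the timestamps, then extract the date and hour of sleep onset
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])
df['Date'] = df['Start Time'].dt.date
df['Hour'] = df['Start Time'].dt.hour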
```
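`Hour` from `dt.hour` puts 23:00 and 00:00 23 units apart, although they are neighbouring bedtimes; that affects the order of the box plot and any linear model that uses `Hour`. One common workaround, sketched here with a hypothetical `bedtime_hours` helper, is to measure bedtime in hours relative to midnight, assuming every sleep start falls between noon and the following noon.

```python
import pandas as pd

def bedtime_hours(start: pd.Series) -> pd.Series:
    """Bedtime in hours relative to midnight: 23:30 -> -0.5, 00:30 -> 0.5.
    Assumes sleep starts between noon and the next noon, so evening times
    become negative and after-midnight times stay positive."""
    start = pd.to_datetime(start)
    hours = start.dt.hour + start.dt.minute / 60
    # Keep morning/after-midnight values, shift afternoon/evening values by -24
    return hours.where(hours < 12, hours - 24)

sample = pd.Series(["2023-08-01 22:15", "2023-08-02 23:30", "2023-08-04 00:30"])
print(bedtime_hours(sample).tolist())  # [-1.75, -0.5, 0.5]
```

A column built this way could complement or replace `Hour` in the plots and in the regression features.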
Next, we can do some simple visual analysis:
```python
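# Average time in bed and sleep efficiency per date. The two series have very
# different scales (minutes vs. a 0-1 ratio), so draw them in separate panels.
# A list of columns is required: the tuple form ['a', 'b'] fails in pandas 2.x
daily_sleep = df.groupby('Date')[['Total Minutes', 'Sleep Efficiency']].mean()
axes = daily_sleep.plot(subplots=True, figsize=(10, 6))
axes[0].set_ylabel('Minutes')
axes[1].set_ylabel('Efficiency')
axes[1].set_xlabel('Date')
plt.suptitle('Daily Sleep')
plt.show()
# Mean sleep efficiency per sleep-onset hour, and its distribution as a box plot
hourly_sleep = df.groupby('Hour')['Sleep Efficiency'].mean()
print(hourly_sleep)
sns.boxplot(data=df, x='Hour', y='Sleep Efficiency')
plt.title('Sleep Efficiency by Sleep-Onset Hour')
plt.xlabel('Hour')
plt.ylabel('Sleep Efficiency')
plt.show()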
```
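`groupby('Date')[...].mean()` averages over the records of each date. If a date can hold several records (for example a night's sleep plus a nap), that mean is per record, not per day. The sketch below uses three made-up records to show the difference; it uses named aggregation to put the daily sum and mean side by side, and computes daily efficiency from the summed minutes (weighted by time in bed) rather than averaging per-record ratios.

```python
import pandas as pd

# Made-up records: night sleep + a nap on 2023-08-01, one record on 2023-08-02
records = pd.DataFrame({
    "Date": pd.to_datetime(["2023-08-01", "2023-08-01", "2023-08-02"]).date,
    "Minutes Asleep": [400, 60, 420],
    "Minutes Awake": [40, 10, 30],
})
records["Total Minutes"] = records["Minutes Asleep"] + records["Minutes Awake"]

daily = records.groupby("Date").agg(
    total_minutes=("Total Minutes", "sum"),             # all time in bed that day
    mean_minutes_per_record=("Total Minutes", "mean"),  # what .mean() reports
    asleep=("Minutes Asleep", "sum"),
)
# Daily efficiency from daily sums, i.e. weighted by time in bed
daily["efficiency"] = daily["asleep"] / daily["total_minutes"]
print(daily)
```

Which aggregate fits depends on whether the question is about a typical sleep session or about a whole day.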
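Finally, we can take the analysis further, for example by using a machine-learning algorithm to build a model that predicts sleep quality (here, sleep efficiency):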
```python
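from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Features and target. Caution: Sleep Efficiency = 1 - Minutes Awake / Total Minutes,
# so these features already determine the target and the scores below will look
# optimistic; for a real prediction task, use inputs not used to compute the target.
X = df[['Total Minutes', 'Minutes Awake', 'Hour']]
y = df['Sleep Efficiency']
# Hold out 30% of the records as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Fit a linear regression and predict on the test set
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate the model on the held-out data
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 score:", r2_score(y_test, y_pred))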
```
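Because `Sleep Efficiency` is `Minutes Asleep / Total Minutes` and `Total Minutes = Minutes Asleep + Minutes Awake`, the target equals `1 - Minutes Awake / Total Minutes`, so the first two features nearly determine it. One way to see how much of the score comes from that formula is to compare against a model without those columns. The sketch below does this on synthetic, made-up data, and swaps the single train/test split for 5-fold cross-validation (`cross_val_score`) to get an estimate that depends less on one particular split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic, made-up records (not real data), only to illustrate the check
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Minutes Asleep": rng.integers(300, 480, size=n),
    "Minutes Awake": rng.integers(10, 90, size=n),
    "Hour": rng.choice([21, 22, 23, 0, 1], size=n),  # independent of the target here
})
df["Total Minutes"] = df["Minutes Asleep"] + df["Minutes Awake"]
df["Sleep Efficiency"] = df["Minutes Asleep"] / df["Total Minutes"]

# The target is an exact function of two features: 1 - Awake / Total
leak = np.allclose(df["Sleep Efficiency"], 1 - df["Minutes Awake"] / df["Total Minutes"])
print("target derivable from features:", leak)

y = df["Sleep Efficiency"]
feature_sets = {
    "leaky": ["Total Minutes", "Minutes Awake", "Hour"],
    "hour_only": ["Hour"],
}
results = {}
for name, cols in feature_sets.items():
    scores = cross_val_score(LinearRegression(), df[cols], y, cv=5, scoring="r2")
    results[name] = scores.mean()
    print(f"{name}: mean R2 over 5 folds = {results[name]:.3f}")
```

A large gap between the two scores suggests the first model is mostly recovering the efficiency formula rather than learning something about sleep.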
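That is an example of sleep-quality data analysis with pandas. The right questions and methods depend on the dataset, so adjust the column names, cleaning rules, and models to your own data.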