df['timestamp'] = pd.to_datetime(df.loc[:, 'timestamp'], unit='ms', origin="1970-01-01 08:00:00")
# Convert the time to the desired string format
df['timestamp'] = df['timestamp'].dt.strftime('%Y%m%d')
# Write to a CSV file
df.to_csv('SH603019.csv', index=False)
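This code takes the retrieved post data, held in a DataFrame named df, and processes its timestamp column. It first interprets the timestamps as millisecond counts and converts them to datetimes, using "1970-01-01 08:00:00" as the origin, which shifts every value 8 hours past the usual Unix epoch. It then formats each datetime as a year-month-day string such as 20230415. Finally, it writes the DataFrame to a CSV file named SH603019.csv without the index column.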
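To make the effect of the origin argument concrete, here is a minimal sketch on two made-up millisecond timestamps. The values are illustrative only and do not come from the original data.

```python
import pandas as pd

# Hypothetical millisecond epoch timestamps (illustrative values only)
df = pd.DataFrame({"timestamp": [0, 86_400_000]})

# origin moves the reference point 8 hours forward, so 0 ms maps to 1970-01-01 08:00:00
converted = pd.to_datetime(df["timestamp"], unit="ms", origin="1970-01-01 08:00:00")

# Format each datetime as a YYYYMMDD string
df["timestamp"] = converted.dt.strftime("%Y%m%d")
print(df["timestamp"].tolist())
```

If the 8-hour shift is meant to express China Standard Time, an alternative worth considering is to parse with utc=True and then call .dt.tz_convert("Asia/Shanghai"); that timezone is an assumption about the intent, not something the original code states.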
Add comments to the following code:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.read_csv("./911.csv")
df["timeStamp"] = pd.to_datetime(df["timeStamp"])
temp_list = df["title"].str.split(": ").tolist()
cate_list = [i[0] for i in temp_list]
# print(np.array(cate_list).reshape((df.shape[0],1)))
df["cate"] = pd.DataFrame(np.array(cate_list).reshape((df.shape[0], 1)))
df.set_index("timeStamp", inplace=True)
print(df.head(1))
plt.figure(figsize=(20, 8), dpi=80)
for group_name, group_data in df.groupby(by="cate"):
    # Plot each category
    count_by_month = group_data.resample("M").count()["title"]
    _x = count_by_month.index
    print(_x)
    _y = count_by_month.values
    _x = [i.strftime("%Y%m%d") for i in _x]
    plt.plot(range(len(_x)), _y, label=group_name)
plt.xticks(range(len(_x)), _x, rotation=45)
plt.legend(loc="best")
plt.show()
# Import pandas and numpy
import pandas as pd
import numpy as np
# Import the pyplot module
from matplotlib import pyplot as plt
# Read the CSV file
df = pd.read_csv("./911.csv")
# Convert the timeStamp column to datetime
df["timeStamp"] = pd.to_datetime(df["timeStamp"])
# Split the title column on ": " and convert the result to a list of lists
temp_list = df["title"].str.split(": ").tolist()
# Take the first element of each list, which is the category of that row
cate_list = [i[0] for i in temp_list]
# Add the category to df as a new column
df["cate"] = pd.DataFrame(np.array(cate_list).reshape((df.shape[0], 1)))
# Set the timeStamp column as the index of df
df.set_index("timeStamp", inplace=True)
# Print the first row of df
print(df.head(1))
# Create the figure
plt.figure(figsize=(20, 8), dpi=80)
for group_name, group_data in df.groupby(by="cate"):
    # Plot each category: count the rows per month
    count_by_month = group_data.resample("M").count()["title"]
    _x = count_by_month.index
    _y = count_by_month.values
    # Format the dates as year-month-day strings
    _x = [i.strftime("%Y%m%d") for i in _x]
    plt.plot(range(len(_x)), _y, label=group_name)
# Set the x-axis ticks and labels
plt.xticks(range(len(_x)), _x, rotation=45)
# Add the legend and show the figure
plt.legend(loc="best")
plt.show()
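This code reads the file 911.csv, then analyzes and visualizes its data. It first reads the CSV with the pandas read_csv function, converts the timeStamp column to datetime, and splits each title on ": " into a list. It then takes the first element of each list as that row's category and adds it to df as a new column, cate. Next it sets the timeStamp column as the index of df, which is what makes time-based resampling possible. Finally, it uses the pyplot module from matplotlib to draw one line per category, showing the number of records per month.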
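The category extraction and monthly counting can also be written more compactly. The sketch below runs on a few made-up rows that only mimic the title and timeStamp columns of 911.csv. It swaps in two techniques that differ from the code above: .str[0] in place of the list and NumPy round trip, and grouping by index.to_period("M") in place of resample("M"). The second swap is partly because recent pandas releases deprecate the "M" alias for resample in favor of "ME".

```python
import pandas as pd

# Hypothetical rows mimicking the "title" and "timeStamp" columns of 911.csv
df = pd.DataFrame({
    "title": [
        "EMS: BACK PAINS",
        "Fire: GAS LEAK",
        "EMS: FALL VICTIM",
        "Traffic: VEHICLE ACCIDENT",
    ],
    "timeStamp": [
        "2015-12-10 17:40:00",
        "2015-12-11 09:00:00",
        "2016-01-05 12:30:00",
        "2016-01-20 08:15:00",
    ],
})
df["timeStamp"] = pd.to_datetime(df["timeStamp"])

# .str[0] picks the first piece of each split title, i.e. the category
df["cate"] = df["title"].str.split(": ").str[0]
df = df.set_index("timeStamp")

# Count rows per (month, category), then pivot categories into columns
monthly = df.groupby([df.index.to_period("M"), "cate"]).size().unstack(fill_value=0)
print(monthly)
```

With the counts in one table, monthly.plot() would draw one line per category in a single call, which avoids the explicit loop over groups.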
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  device_df['cluster_label'] = db.labels_
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:8: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  device_df['hour'] = device_df['timestamp'].map(lambda x: time.localtime(x).tm_hour)
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  device_df['is_night'] = device_df['hour'].map(lambda x: 1 if x >= 22 or x < 6 else 0)
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:10: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  device_df['is_daytime'] = device_df['hour'].map(lambda x: 1 if x >= 10 or x < 17 else 0)
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:11: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  device_df['is_weekend'] = device_df['timestamp'].map(lambda x: 1 if datetime.datetime.utcfromtimestamp(x).weekday() >= 5 else 0)
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:18: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  night_cnt = device_cluster_df[device_df['is_night'] == 1]['event_day'].drop_duplicates().count()
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:19: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  daytime_cnt = device_cluster_df[device_df['is_daytime'] == 1]['event_day'].drop_duplicates().count()
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:20: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  weekend_cnt = device_cluster_df[device_df['is_weekend'] == 1]['event_day'].drop_duplicates().count()
/var/folders/gk/ryl0f4y10m9ccnhw_1vlpjzh0000gn/T/ipykernel_35021/1920266051.py:21: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  weekday_cnt = device_cluster_df[device_df['is_weekend'] == 0]['event_day'].drop_duplicates().count()

Why does Jupyter Notebook show these messages?
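These messages are warnings rather than errors, and they come from operating on a DataFrame in an unsafe way. The SettingWithCopyWarning appears because device_df was itself produced by slicing another DataFrame, and the code then assigns new columns to it, so pandas cannot tell whether the assignment should reach the original DataFrame or only a copy of the slice. To avoid the warning and the potential bug behind it, assign through the .loc indexer on the original DataFrame, for example device_df.loc[row_indexer, col_indexer] = value; if device_df is meant to be an independent table, create it with .copy() instead. Either way the assignment has a single, unambiguous target. The UserWarning lines have a different cause: device_cluster_df is filtered with a boolean mask built from device_df, whose index does not match, so pandas reindexes the mask before applying it. The link in the warning text explains in more detail how indexing and assignment behave with views and copies.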
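A minimal sketch of both fixes follows. The frame all_df, its device column, and the sample values are hypothetical stand-ins, since the notebook code that creates device_df and device_cluster_df is not shown; only the names device_df, device_cluster_df, cluster_label, is_night, and event_day come from the warning text.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's source data
all_df = pd.DataFrame({
    "device": ["a", "a", "b", "a"],
    "hour": [23, 12, 3, 5],
    "event_day": ["d1", "d1", "d2", "d3"],
})

# .copy() makes device_df an independent frame, so adding columns is unambiguous
device_df = all_df[all_df["device"] == "a"].copy()
device_df["cluster_label"] = [0, 1, 0]  # placeholder for db.labels_
device_df["is_night"] = ((device_df["hour"] >= 22) | (device_df["hour"] < 6)).astype(int)

# Build the mask from the same frame being filtered, so the indexes already match
device_cluster_df = device_df[device_df["cluster_label"] == 0]
night_mask = device_cluster_df["is_night"] == 1
night_cnt = device_cluster_df.loc[night_mask, "event_day"].nunique()
print(night_cnt)
```

Here nunique() stands in for drop_duplicates().count() as a shorter way to count distinct days. The vectorized comparison for is_night replaces the lambda with map, which is a stylistic choice and not required to silence the warning.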