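6. From the original dataset, generate a new table df_loc and display its first 5 rows. The table has the following attributes: station id (taken from start station id and end station id), station name (taken from start station name and end station name), station latitude (taken from start station latitude and end station latitude), and station longitude (taken from start station longitude and end station longitude). (Hint: the conversion produces duplicate records; dataframe.drop_duplicates(subset='col1') removes the rows whose value in col1 is duplicated and keeps the first row of each group of duplicates.)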
Time: 2023-06-09 18:05:05 Views: 251
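Sure, here is the answer:
First, extract the following columns from the original dataset: start station id, end station id, start station name, end station name, start station latitude, end station latitude, start station longitude, and end station longitude. Then stack each start/end pair into a single column: start station id and end station id become station id, start station name and end station name become station name, start station latitude and end station latitude become station latitude, and start station longitude and end station longitude become station longitude. Each station can appear in many trips, so the stacked table contains many repeated rows. Finally, sort by station id and drop the duplicate records, keeping the first row for each station id, to obtain the new table df_loc.
One way to implement this: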
```
import pandas as pd

# Read the original dataset
df = pd.read_csv("citibike_tripdata.csv")
# Extract the columns we need
start_id = df["start station id"]
end_id = df["end station id"]
start_name = df["start station name"]
end_name = df["end station name"]
start_lat = df["start station latitude"]
end_lat = df["end station latitude"]
start_lon = df["start station longitude"]
end_lon = df["end station longitude"]
# Stack each start/end pair into one Series (start rows first, then end rows)
station_id = pd.concat([start_id, end_id], ignore_index=True)
station_name = pd.concat([start_name, end_name], ignore_index=True)
station_lat = pd.concat([start_lat, end_lat], ignore_index=True)
station_lon = pd.concat([start_lon, end_lon], ignore_index=True)
# Combine the stacked Series into a new table
df_loc = pd.DataFrame({"station id": station_id, "station name": station_name,
                       "station latitude": station_lat, "station longitude": station_lon})
# Sort by station id, then drop duplicates.
# A stable sort (mergesort) preserves the original order within each station id,
# so keep="first" really keeps the first occurrence.
df_loc = df_loc.sort_values(by=["station id"], kind="mergesort")
df_loc = df_loc.drop_duplicates(subset=["station id"], keep="first")
# Show the first 5 rows
print(df_loc.head(5))
```
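The same stacking and deduplication can be written more compactly by renaming the start and end columns to shared names and concatenating the two sub-tables. The following is a minimal sketch, not the only way to do it. The toy DataFrame and its values are made up for illustration and stand in for the real CSV, and `station_columns` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy data standing in for the trip CSV (values are invented for illustration)
df = pd.DataFrame({
    "start station id": [72, 79, 72],
    "start station name": ["A St", "B St", "A St"],
    "start station latitude": [40.1, 40.2, 40.1],
    "start station longitude": [-73.1, -73.2, -73.1],
    "end station id": [79, 82, 72],
    "end station name": ["B St", "C St", "A St"],
    "end station latitude": [40.2, 40.3, 40.1],
    "end station longitude": [-73.2, -73.3, -73.1],
})

fields = ["id", "name", "latitude", "longitude"]

def station_columns(frame, prefix):
    """Select the '<prefix> station ...' columns and rename them to 'station ...'."""
    mapping = {f"{prefix} station {f}": f"station {f}" for f in fields}
    return frame[list(mapping)].rename(columns=mapping)

df_loc = (
    pd.concat([station_columns(df, "start"), station_columns(df, "end")],
              ignore_index=True)                      # stack start rows, then end rows
    .drop_duplicates(subset="station id", keep="first")  # one row per station id
    .sort_values("station id")                        # order the unique stations by id
    .reset_index(drop=True)
)
print(df_loc.head(5))
```

Dropping duplicates before sorting means the sort only touches the unique stations, and `keep="first"` refers unambiguously to the first occurrence in the stacked table.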