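My data has four fields: year, movie ID, movie name, and average rating. I need to group the records by year, compute each movie's average rating (rounded to one decimal place), and sort the movies by rating, breaking ties by movie name. Movies from the same year should be written to one file.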
Posted: 2023-10-21 20:03:33
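This is a data-processing task that you can handle in Python. First, read the file containing the movie data. Assuming the data is stored in a CSV file named "movies.csv", you can read it with the following code: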
```python
import csv

# Read the CSV file
with open('movies.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    # Skip the header row
    next(reader)
    # Iterate over each row of data
    for row in reader:
        # Process the data here
        pass
```
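To see what `csv.reader` actually yields, here is a minimal sketch that parses a few made-up rows from an in-memory string instead of `movies.csv`. The sample data, its header names, and the `parse_rows` helper are hypothetical. Every field arrives as a string, which is why the code below converts the year, ID, and rating explicitly.

```python
import csv
import io

# Hypothetical sample data in the four-column layout from the question:
# year, movie ID, movie name, average rating
SAMPLE = """year,movie_id,name,rating
2001,1,Amelie,8.3
2001,2,Shrek,7.9
2002,3,Hero,7.9
"""

def parse_rows(text):
    """Parse CSV text into (year, movie_id, name, rating) tuples, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        # csv.reader returns every field as a str, so convert explicitly
        rows.append((int(row[0]), int(row[1]), row[2], float(row[3])))
    return rows

for r in parse_rows(SAMPLE):
    print(r)
```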
Next, group the movie data by year and compute each movie's average rating. A Python dictionary works well for this. The code might look like this:
```python
import csv

# Read the CSV file
with open('movies.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    # Skip the header row
    next(reader)
    # Dictionary holding the movie data for each year
    movies_by_year = {}
    # Iterate over each row of data
    for row in reader:
        # Parse the fields: year, movie ID, movie name, rating
        year = int(row[0])
        movie_id = int(row[1])
        movie_name = row[2]
        rating = float(row[3])
        # Group the movie data by year
        if year not in movies_by_year:
            movies_by_year[year] = {}
        # Collect every rating for this movie within the year
        if movie_id not in movies_by_year[year]:
            movies_by_year[year][movie_id] = {'name': movie_name, 'ratings': [rating]}
        else:
            movies_by_year[year][movie_id]['ratings'].append(rating)

# Iterate over each year's movie data
for year in movies_by_year:
    # Compute each movie's average rating, rounded to one decimal place
    for movie_id in movies_by_year[year]:
        ratings = movies_by_year[year][movie_id]['ratings']
        avg_rating = round(sum(ratings) / len(ratings), 1)
        movies_by_year[year][movie_id]['avg_rating'] = avg_rating
```
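One detail worth knowing about the rounding step: Python's built-in `round()` rounds exact halves to the nearest even digit (and operates on binary floats), so an average of `2.25` becomes `2.2`, not `2.3`. If "keep one decimal place" is meant as conventional round-half-up, a minimal sketch using the standard `decimal` module could look like this. Whether half-up is required is an assumption, and the helper name `average_one_decimal` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def average_one_decimal(ratings):
    """Mean of the ratings, rounded half-up to one decimal place."""
    # Convert via str() so Decimal sees the short decimal form (e.g. '2.5'),
    # not the float's full binary expansion
    total = sum(Decimal(str(r)) for r in ratings)
    mean = total / len(ratings)
    return float(mean.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP))

print(round((2.0 + 2.5) / 2, 1))        # built-in round, half-to-even: 2.2
print(average_one_decimal([2.0, 2.5]))  # half-up: 2.3
```

To use it, replace the `round(sum(ratings) / len(ratings), 1)` expression with `average_one_decimal(ratings)`.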
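Finally, sort each year's results by rating and write them to files. Python's built-in `sorted()` function handles the sorting. The complete program, including the previous steps, might look like this: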
```python
import csv

# Read the CSV file
with open('movies.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    # Skip the header row
    next(reader)
    # Dictionary holding the movie data for each year
    movies_by_year = {}
    # Iterate over each row of data
    for row in reader:
        # Parse the fields: year, movie ID, movie name, rating
        year = int(row[0])
        movie_id = int(row[1])
        movie_name = row[2]
        rating = float(row[3])
        # Group the movie data by year
        if year not in movies_by_year:
            movies_by_year[year] = {}
        # Collect every rating for this movie within the year
        if movie_id not in movies_by_year[year]:
            movies_by_year[year][movie_id] = {'name': movie_name, 'ratings': [rating]}
        else:
            movies_by_year[year][movie_id]['ratings'].append(rating)

# Process each year's movie data
for year in movies_by_year:
    # Compute each movie's average rating, rounded to one decimal place
    for movie_id in movies_by_year[year]:
        ratings = movies_by_year[year][movie_id]['ratings']
        avg_rating = round(sum(ratings) / len(ratings), 1)
        movies_by_year[year][movie_id]['avg_rating'] = avg_rating
    # Sort by rating (highest first); equal ratings are ordered by movie name
    sorted_movies = sorted(movies_by_year[year].values(), key=lambda x: (-x['avg_rating'], x['name']))
    # Write this year's movies to a file named after the year
    with open(f'{year}.txt', 'w', encoding='utf-8') as outfile:
        for movie in sorted_movies:
            outfile.write(f"{movie['name']},{movie['avg_rating']}\n")
```
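This code writes each year's results to a separate text file named after the year (for example, `2001.txt`). Each line contains a movie's name and its average rating, sorted from highest to lowest rating, with ties ordered by movie name.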
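The same pipeline can be condensed. The following is a minimal sketch, not the only way to do it, that uses `collections.defaultdict` and `pathlib` and takes already-parsed `(year, movie_id, name, rating)` tuples instead of reading `movies.csv` directly. The function name `write_year_files` and the sample rows are hypothetical. The sample makes the tie-breaking rule visible: two movies with the same rounded rating appear in alphabetical order.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

def write_year_files(rows, out_dir):
    """Group (year, movie_id, name, rating) rows by year and write one file per year.

    Returns the list of file paths written.
    """
    # year -> movie_id -> [name, list of ratings]
    by_year = defaultdict(dict)
    for year, movie_id, name, rating in rows:
        entry = by_year[year].setdefault(movie_id, [name, []])
        entry[1].append(rating)

    out_dir = Path(out_dir)
    written = []
    for year, movies in sorted(by_year.items()):
        # (name, average rounded to one decimal) for each movie in this year
        averaged = [(name, round(sum(rs) / len(rs), 1)) for name, rs in movies.values()]
        # Higher rating first; equal ratings ordered by movie name
        averaged.sort(key=lambda m: (-m[1], m[0]))
        path = out_dir / f"{year}.txt"
        with path.open('w', encoding='utf-8') as f:
            for name, avg in averaged:
                f.write(f"{name},{avg}\n")
        written.append(path)
    return written

# Hypothetical sample rows
rows = [
    (2001, 1, 'Shrek', 7.9),
    (2001, 2, 'Amelie', 7.9),
    (2001, 3, 'Memento', 8.4),
    (2002, 4, 'Hero', 7.9),
]
with tempfile.TemporaryDirectory() as tmp:
    for path in write_year_files(rows, tmp):
        print(path.name)
        print(path.read_text(encoding='utf-8'))
```

Writing into a temporary directory keeps the demo from leaving files behind; in real use, pass the directory where the per-year files should go.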