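Given a text file "电影信息.txt" (movie information) whose three columns are the movie title, the directors, and the cast, separated by semicolons (a movie may have several actors, with names separated by commas; a movie may also have several directors, likewise separated by commas), use openpyxl and matplotlib to complete the following movie data analysis task. Task 4: director filmography statistics and visualization. Count the director information in the movie data and produce a bar chart, with these requirements:
1. In the file "电影信息统计.xlsx", add a new sheet named "导演作品统计" that records how many movies each director has directed and the list of those movies.
2. The first column holds the director's name, the second column the number of movies directed, and the third column the list of movies directed. If a director has several movies, separate them with commas.
3. Draw a horizontal bar chart with the number of movies on the horizontal axis and the director names on the vertical axis. From top to bottom, the vertical axis is ordered by number of movies, descending.
4. At the end of each bar, label the actor the director has worked with most often. If several actors tie, separate their names with commas.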
Time: 2024-02-15 21:03:59 Views: 78
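Sure. This is a data analysis task in Python. Below are some code examples and an outline of the approach:
First, read "电影信息.txt" and store the data in a Python data structure such as a list or a dictionary. Open the file with the built-in open function and read it line by line, then use the string methods split and strip to separate and clean the fields. The result is stored in a dictionary keyed by director, where each entry holds the movies that director made and a count of how often each actor appeared in them: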
```python
movie_dict = {}
# Assumes the file is saved as UTF-8; change the encoding (e.g. to 'gbk') if it is not
with open('电影信息.txt', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        movie_info = line.split(';')
        if len(movie_info) < 3:
            continue  # skip malformed lines
        movie_name = movie_info[0].strip()
        directors = [d.strip() for d in movie_info[1].split(',') if d.strip()]
        actors = [a.strip() for a in movie_info[2].split(',') if a.strip()]
        for director in directors:
            # One entry per director: movies directed and a per-actor count
            entry = movie_dict.setdefault(director, {'movies': [], 'actors': {}})
            entry['movies'].append(movie_name)
            for actor in actors:
                entry['actors'][actor] = entry['actors'].get(actor, 0) + 1
```
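The task text does not say whether the delimiters in the file are ASCII (`;` and `,`) or full-width (`;` and `,`). The sketch below is one way to accept both, assuming that mix can occur in the data. The helper names `split_names` and `build_movie_dict` and the sample lines are hypothetical and only serve the illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: fields may use ASCII or full-width delimiters
FIELD_SEP = re.compile(r'[;;]')
NAME_SEP = re.compile(r'[,,]')


def split_names(field):
    """Split a comma-separated field and drop empty or whitespace-only names."""
    return [name.strip() for name in NAME_SEP.split(field) if name.strip()]


def build_movie_dict(lines):
    """Group movies and actor counts by director from an iterable of text lines."""
    movie_dict = defaultdict(lambda: {'movies': [], 'actors': Counter()})
    for line in lines:
        parts = FIELD_SEP.split(line.strip())
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        movie_name = parts[0].strip()
        actors = split_names(parts[2])
        for director in split_names(parts[1]):
            movie_dict[director]['movies'].append(movie_name)
            movie_dict[director]['actors'].update(actors)
    return dict(movie_dict)


# Made-up sample lines: the first uses ASCII delimiters, the second full-width ones
sample = [
    'Film A;Director X,Director Y;Actor 1,Actor 2',
    'Film B;Director X;Actor 1,Actor 3',
    '',
]
result = build_movie_dict(sample)
print(result['Director X']['movies'])
print(result['Director X']['actors']['Actor 1'])
```

Because `Counter` is a dict subclass, the resulting structure can be used wherever the plain nested dictionary is expected.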
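Next, use the openpyxl library to add a sheet named "导演作品统计" to "电影信息统计.xlsx". Since the task asks for a new sheet in that file, the code opens the workbook if it already exists, so that sheets from earlier tasks are kept, and creates a new workbook only when the file is missing. A for loop over movie_dict then writes each director's name, number of movies directed, and list of movies directed into the corresponding cells:
```python
import os
from openpyxl import Workbook, load_workbook

xlsx_path = '电影信息统计.xlsx'
sheet_name = '导演作品统计'
if os.path.exists(xlsx_path):
    # Keep the sheets produced by earlier tasks and add a new one
    wb = load_workbook(xlsx_path)
    if sheet_name in wb.sheetnames:
        wb.remove(wb[sheet_name])  # replace a sheet left over from a previous run
    ws = wb.create_sheet(sheet_name)
else:
    # No workbook yet: create one and reuse its default sheet
    wb = Workbook()
    ws = wb.active
    ws.title = sheet_name

# Header row
ws.cell(row=1, column=1, value='导演姓名')
ws.cell(row=1, column=2, value='执导电影数量')
ws.cell(row=1, column=3, value='执导电影列表')
row_num = 2
for director, data in movie_dict.items():
    movie_list = ','.join(data['movies'])
    movie_count = len(data['movies'])
    ws.cell(row=row_num, column=1, value=director)
    ws.cell(row=row_num, column=2, value=movie_count)
    ws.cell(row=row_num, column=3, value=movie_list)
    row_num += 1
wb.save(xlsx_path)
```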
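The sheet rows can also be prepared as plain tuples before anything is written, which makes them easy to inspect. The sketch below assumes the nested dictionary layout used above; the function name `director_rows` and the sample data are hypothetical, and sorting the rows by movie count is an optional choice that the task does not require for the sheet.

```python
def director_rows(movie_dict):
    """Return a header row plus one (name, count, movie list) row per director."""
    rows = [('导演姓名', '执导电影数量', '执导电影列表')]
    # Optional: order rows by number of movies, descending, to match the chart
    ranked = sorted(movie_dict.items(),
                    key=lambda item: len(item[1]['movies']),
                    reverse=True)
    for director, data in ranked:
        rows.append((director, len(data['movies']), ','.join(data['movies'])))
    return rows


# Made-up sample data in the same shape as movie_dict
sample = {
    'Director X': {'movies': ['Film A'], 'actors': {}},
    'Director Y': {'movies': ['Film B', 'Film C'], 'actors': {}},
}
rows = director_rows(sample)
for row in rows:
    print(row)
```

Each tuple can then be passed to openpyxl's `ws.append(row)`, which writes it to the next empty row of the sheet, as an alternative to tracking a row counter by hand.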
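Finally, use the matplotlib library to draw the horizontal bar chart and label the end of each bar with the actor or actors the director has worked with most often. The sorted function orders the directors by number of movies directed, descending, and a for loop stores the names and counts in two lists. The barh function draws the horizontal bars, and annotate places the label next to each bar. When several actors share the highest count, all of them are kept and joined with commas:
```python
import matplotlib.pyplot as plt

# Assumes a CJK-capable font such as SimHei is installed;
# without one, the Chinese labels render as empty boxes
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

director_list = []
movie_count_list = []
for director, data in sorted(movie_dict.items(), key=lambda x: len(x[1]['movies']), reverse=True):
    director_list.append(director)
    movie_count_list.append(len(data['movies']))

fig, ax = plt.subplots()
ax.barh(director_list, movie_count_list)
for i, director in enumerate(director_list):
    actor_counts = movie_dict[director]['actors']
    if not actor_counts:
        continue  # no cast recorded for this director
    max_count = max(actor_counts.values())
    # Keep every actor tied for the highest count
    top_actors = ','.join(a for a, c in actor_counts.items() if c == max_count)
    # Place the label a few points to the right of the bar end
    ax.annotate(top_actors, xy=(movie_count_list[i], i), xytext=(3, 0),
                textcoords='offset points', ha='left', va='center')
ax.set_xlabel('执导电影数量')
ax.set_ylabel('导演姓名')
ax.invert_yaxis()  # largest count at the top
plt.tight_layout()
plt.show()
```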
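Requirement 4 asks for every actor tied for the most collaborations, not just one. The tie rule can be checked in isolation with a small helper. This is a minimal sketch; the name `top_collaborators` and the sample counts are hypothetical.

```python
def top_collaborators(actor_counts):
    """Return every actor whose count equals the maximum, in insertion order."""
    if not actor_counts:
        return []
    best = max(actor_counts.values())
    return [actor for actor, count in actor_counts.items() if count == best]


# Made-up counts: two actors tie at 3 collaborations
counts = {'Actor 1': 3, 'Actor 2': 1, 'Actor 3': 3}
label = ','.join(top_collaborators(counts))
print(label)  # Actor 1,Actor 3
```

Taking only the first element of a sorted list would report a single actor and silently drop the others in a tie, which is why the helper compares every count against the maximum.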
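The complete code is as follows:
```python
import os

import matplotlib.pyplot as plt
from openpyxl import Workbook, load_workbook

# Read the data into a dictionary keyed by director
# (assumes the file is saved as UTF-8; change the encoding, e.g. to 'gbk', if it is not)
movie_dict = {}
with open('电影信息.txt', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        movie_info = line.split(';')
        if len(movie_info) < 3:
            continue  # skip malformed lines
        movie_name = movie_info[0].strip()
        directors = [d.strip() for d in movie_info[1].split(',') if d.strip()]
        actors = [a.strip() for a in movie_info[2].split(',') if a.strip()]
        for director in directors:
            # One entry per director: movies directed and a per-actor count
            entry = movie_dict.setdefault(director, {'movies': [], 'actors': {}})
            entry['movies'].append(movie_name)
            for actor in actors:
                entry['actors'][actor] = entry['actors'].get(actor, 0) + 1

# Write the data to the Excel file, keeping any existing sheets
xlsx_path = '电影信息统计.xlsx'
sheet_name = '导演作品统计'
if os.path.exists(xlsx_path):
    wb = load_workbook(xlsx_path)
    if sheet_name in wb.sheetnames:
        wb.remove(wb[sheet_name])  # replace a sheet left over from a previous run
    ws = wb.create_sheet(sheet_name)
else:
    wb = Workbook()
    ws = wb.active
    ws.title = sheet_name
ws.cell(row=1, column=1, value='导演姓名')
ws.cell(row=1, column=2, value='执导电影数量')
ws.cell(row=1, column=3, value='执导电影列表')
row_num = 2
for director, data in movie_dict.items():
    movie_list = ','.join(data['movies'])
    movie_count = len(data['movies'])
    ws.cell(row=row_num, column=1, value=director)
    ws.cell(row=row_num, column=2, value=movie_count)
    ws.cell(row=row_num, column=3, value=movie_list)
    row_num += 1
wb.save(xlsx_path)

# Draw the bar chart and label each bar with the top collaborating actors
# (assumes a CJK-capable font such as SimHei is installed)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

director_list = []
movie_count_list = []
for director, data in sorted(movie_dict.items(), key=lambda x: len(x[1]['movies']), reverse=True):
    director_list.append(director)
    movie_count_list.append(len(data['movies']))

fig, ax = plt.subplots()
ax.barh(director_list, movie_count_list)
for i, director in enumerate(director_list):
    actor_counts = movie_dict[director]['actors']
    if not actor_counts:
        continue  # no cast recorded for this director
    max_count = max(actor_counts.values())
    # Keep every actor tied for the highest count
    top_actors = ','.join(a for a, c in actor_counts.items() if c == max_count)
    ax.annotate(top_actors, xy=(movie_count_list[i], i), xytext=(3, 0),
                textcoords='offset points', ha='left', va='center')
ax.set_xlabel('执导电影数量')
ax.set_ylabel('导演姓名')
ax.invert_yaxis()  # largest count at the top
plt.tight_layout()
plt.show()
```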
I hope this answer helps!