a. Use matplotlib to draw a bar chart of a movie dataset, showing box office and ratings for different movie genres.
Posted: 2024-06-09 08:08:37
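Sure. First we need to load the data. Here we use the public IMDb datasets: `title.basics` for titles and genres, and `title.ratings` for ratings. These files contain no box office figures, so the code below derives a stand-in value from runtime and rating rather than using real grosses.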
```python
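import pandas as pd
# Load title metadata; dtype=str keeps every column as text,
# so the comparisons against "0" and "\\N" below behave as intended
movies_df = pd.read_csv("https://datasets.imdbws.com/title.basics.tsv.gz", sep='\t', dtype=str, low_memory=False)
# Keep non-adult movies with a known runtime, genre list, and start year
# (IMDb marks missing values with the literal string \N)
movies_df = movies_df[movies_df["titleType"] == "movie"]
movies_df = movies_df[movies_df["isAdult"] == "0"]
movies_df = movies_df[movies_df["runtimeMinutes"] != "\\N"]
movies_df = movies_df[movies_df["genres"] != "\\N"]
movies_df = movies_df[movies_df["startYear"] != "\\N"]
movies_df = movies_df.dropna(subset=["tconst", "primaryTitle", "startYear", "runtimeMinutes"])
# Load ratings data (averageRating and numVotes; this file has no box office column)
gross_df = pd.read_csv("https://datasets.imdbws.com/title.ratings.tsv.gz", sep='\t', low_memory=False)
# Keep only titles with at least 1,000 votes
gross_df = gross_df[gross_df["numVotes"] >= 1000]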
```
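The full `title.basics` file is large, so it can help to check the filtering logic on a few rows first. The following is a minimal sketch, assuming a made-up three-row sample in the same tab-separated layout (the titles and IDs are hypothetical). It also shows an alternative to comparing against `"\\N"` by hand: passing `na_values="\\N"` to `pd.read_csv` so that missing values become `NaN` and a single `dropna` call removes them.

```python
import io
import pandas as pd

# Hypothetical sample rows in the same layout as title.basics.tsv (not real IMDb data)
sample_tsv = (
    "tconst\ttitleType\tprimaryTitle\tisAdult\tstartYear\truntimeMinutes\tgenres\n"
    "tt0000001\tmovie\tExample One\t0\t1999\t120\tAction,Drama\n"
    "tt0000002\tshort\tExample Two\t0\t2001\t10\tComedy\n"
    "tt0000003\tmovie\tExample Three\t0\t\\N\t\\N\t\\N\n"
)

# Read everything as text and treat only the literal \N as missing
sample_df = pd.read_csv(
    io.StringIO(sample_tsv),
    sep="\t",
    dtype=str,
    na_values="\\N",
    keep_default_na=False,  # so a title such as "NA" is not read as missing
)

# Keep non-adult movies, then drop rows with missing year, runtime, or genres
movies_only = sample_df[(sample_df["titleType"] == "movie") & (sample_df["isAdult"] == "0")]
clean = movies_only.dropna(subset=["startYear", "runtimeMinutes", "genres"])
print(clean[["tconst", "primaryTitle", "genres"]])
```

Only the first sample row survives: the second is a short, and the third has missing values.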
With the data loaded, we can compute the gross stand-in and the average rating for each genre.
```python
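# Merge title metadata with ratings
movies_df = pd.merge(movies_df, gross_df, on="tconst")
# Convert types; any non-numeric runtime becomes NaN and is dropped
movies_df["runtimeMinutes"] = pd.to_numeric(movies_df["runtimeMinutes"], errors="coerce")
movies_df = movies_df.dropna(subset=["runtimeMinutes"])
movies_df["averageRating"] = movies_df["averageRating"].astype(float)
# NOTE: "totalGross" is NOT real box office revenue. The IMDb files have no such column,
# so this is a made-up proxy: runtime in hours * average rating * 1000
movies_df["totalGross"] = movies_df["runtimeMinutes"] / 60 * movies_df["averageRating"] * 1000
# Aggregate by genre (a movie with several genres counts toward each of them)
genres = ["Action", "Comedy", "Drama", "Horror", "Romance", "Thriller"]
genre_data = []
for genre in genres:
    genre_movies = movies_df[movies_df["genres"].str.contains(genre, regex=False, na=False)]
    total_gross = genre_movies["totalGross"].sum() / 1000000  # in millions
    average_rating = genre_movies["averageRating"].mean()
    genre_data.append((genre, total_gross, average_rating))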
```
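The loop above matches genres by substring, which works for the six genres listed but would also match, for example, "Music" inside "Musical". As an alternative sketch (not the approach used in this answer), the comma-separated `genres` column can be split and exploded so that each row holds exactly one genre, and then aggregated with `groupby`. The sample rows below are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows with comma-separated genres, as in title.basics
sample = pd.DataFrame({
    "tconst": ["tt1", "tt2", "tt3"],
    "genres": ["Action,Drama", "Drama", "Comedy,Romance"],
    "averageRating": [7.0, 8.0, 6.0],
})

# Split the genre string into a list, then give each genre its own row
exploded = sample.assign(genre=sample["genres"].str.split(",")).explode("genre")

# Mean rating and number of movies per exact genre name
genre_stats = exploded.groupby("genre")["averageRating"].agg(["mean", "count"])
print(genre_stats)
```

A movie with several genres still counts toward each of them, as in the loop, but the match is now on the exact genre name.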
We now have a gross stand-in and an average rating for each genre, so we can plot them with matplotlib.
```python
import matplotlib.pyplot as plt
# Draw bars for the gross value and, on a second y-axis, a line for the average rating
fig, ax = plt.subplots(figsize=(12, 6))
ax.bar([x[0] for x in genre_data], [x[1] for x in genre_data], color="b", alpha=0.5)
ax2 = ax.twinx()
ax2.plot([x[0] for x in genre_data], [x[2] for x in genre_data], color="r", alpha=0.5, linewidth=3)
ax.set_ylabel("Total Gross (Millions)")
ax2.set_ylabel("Average Rating")
ax.set_xlabel("Genre")
plt.show()
```
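The plot above draws the rating as a line. If both measures should appear as bars, as the question asks, one option is a grouped bar chart with two y-axes. The following is a sketch only: `plot_genre_bars` is a hypothetical helper name, and the sample values are invented for illustration. It expects the same `(genre, gross, rating)` tuples as `genre_data`.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_genre_bars(genre_data):
    """Draw side-by-side bars for two measures per genre, each on its own y-axis."""
    labels = [g for g, _, _ in genre_data]
    gross = [v for _, v, _ in genre_data]
    rating = [r for _, _, r in genre_data]

    x = np.arange(len(labels))  # one slot per genre
    width = 0.4                 # each bar takes less than half a slot

    fig, ax = plt.subplots(figsize=(12, 6))
    ax2 = ax.twinx()            # second y-axis sharing the same x-axis

    # Shift the two bar series left and right so they do not overlap
    bars1 = ax.bar(x - width / 2, gross, width, color="b", alpha=0.5, label="Gross value (millions)")
    bars2 = ax2.bar(x + width / 2, rating, width, color="r", alpha=0.5, label="Average rating")

    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.set_xlabel("Genre")
    ax.set_ylabel("Gross value (millions)")
    ax2.set_ylabel("Average rating")
    ax.legend(handles=[bars1, bars2], loc="upper left")
    return fig, ax, ax2

# Invented sample values, only to exercise the function
sample_data = [("Action", 12.0, 6.1), ("Comedy", 9.5, 6.4), ("Drama", 15.2, 6.9)]
fig, ax, ax2 = plot_genre_bars(sample_data)
```

To use it with the real data, call `plot_genre_bars(genre_data)` followed by `plt.show()`.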
This produces the chart shown below.
![Bar chart](https://i.imgur.com/pj5ZKk8.png)