Help me write a movie popularity analysis in R
Time: 2023-10-14 10:04:24 Views: 137
Sure. Here is a simple R script for analyzing movie popularity:
```R
# Load the required packages (tidyverse already attaches ggplot2)
library(tidyverse)
library(ggplot2)
library(scales)

# Read the data
movies <- read_csv("movies.csv")

# Data cleaning
movies_clean <- movies %>%
  # Drop columns the analysis does not use; `title` is kept because the
  # plots below label their bars with it
  select(-c("budget", "homepage", "id", "keywords", "original_language",
            "original_title", "overview", "spoken_languages", "status",
            "tagline", "production_companies", "production_countries",
            "release_date", "revenue", "runtime", "vote_count")) %>%
  # Drop rows with missing values
  drop_na()
# Analyze movie ratings
movies_clean %>%
  # Sort by rating, highest first
  arrange(desc(vote_average)) %>%
  # Keep the top 10 movies
  slice(1:10) %>%
  # Plot a horizontal bar chart
  ggplot(aes(x = reorder(title, vote_average), y = vote_average)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Movies by Rating", x = "Movie Title", y = "Rating") +
  theme(plot.title = element_text(hjust = 0.5)) +
  # No hard limits here: limits = c(7, 9) would drop the bars, because
  # every bar starts at 0, which falls outside that range
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.text = element_text(size = 10), axis.title = element_text(size = 12))
# Analyze movie popularity
movies_clean %>%
  # Sort by popularity, highest first
  arrange(desc(popularity)) %>%
  # Keep the top 10 movies
  slice(1:10) %>%
  # Plot a horizontal bar chart
  ggplot(aes(x = reorder(title, popularity), y = popularity)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Movies by Popularity", x = "Movie Title", y = "Popularity") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(expand = c(0, 0), labels = comma) +
  theme(axis.text = element_text(size = 10), axis.title = element_text(size = 12))
# Analyze movie genres
movies_clean %>%
  # Split the genres column into one row per genre; this assumes the
  # values are pipe-delimited, e.g. "Action|Drama"
  separate_rows(genres, sep = "\\|") %>%
  # Count the movies in each genre
  group_by(genres) %>%
  summarise(count = n()) %>%
  # Sort by number of movies, largest first
  arrange(desc(count)) %>%
  # Plot a horizontal bar chart
  ggplot(aes(x = reorder(genres, count), y = count)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Number of Movies by Genre", x = "Genre", y = "Number of Movies") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(expand = c(0, 0), labels = comma) +
  theme(axis.text = element_text(size = 10), axis.title = element_text(size = 12))
```
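This code first loads the required packages and then reads a movie dataset named `movies.csv`. Next it cleans the data by dropping unused columns and rows with missing values. Finally it analyzes ratings, popularity, and genres, drawing one bar chart for each. Together the three charts give a basic picture of which movies and genres are most popular.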
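To try the cleaning and counting steps without a `movies.csv` file, the same pipeline can be run on a small in-memory table. The sketch below is only an illustration: the four rows are made-up values, the column names simply mirror the ones used in the script, and the `genres` column is assumed to be pipe-delimited.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for movies.csv (values are made up)
movies <- data.frame(
  title        = c("Movie A", "Movie B", "Movie C", "Movie D"),
  genres       = c("Action|Drama", "Drama", "Comedy|Drama", NA),
  popularity   = c(120.5, 80.2, 45.0, 10.1),
  vote_average = c(7.8, 8.4, 6.9, 7.1),
  budget       = c(1e8, 2e7, 5e6, 1e6),
  stringsAsFactors = FALSE
)

# Cleaning: drop a column the analysis does not use, then drop rows with NA
movies_clean <- movies %>%
  select(-budget) %>%
  drop_na()

# Top movies by rating (top 2 here instead of top 10)
top_rated <- movies_clean %>%
  arrange(desc(vote_average)) %>%
  slice_head(n = 2)

# Genre counts: one row per (movie, genre) pair, then count per genre
genre_counts <- movies_clean %>%
  separate_rows(genres, sep = "\\|") %>%
  count(genres, name = "count", sort = TRUE)

print(top_rated$title)
print(genre_counts)
```

Movie D has no genre, so `drop_na()` removes it before any ranking happens. It is worth checking how many rows survive cleaning on the real dataset, since a single sparse column can discard a large share of the movies.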