Write a Python program that scrapes movie information from Douban, analyzes the data, and presents the results in a visual interface.
Posted: 2024-03-07 22:51:26
Sure. Below is a Python program that scrapes movie information from Douban, analyzes it, and displays the results in a GUI:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
import tkinter as tk
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
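# Scrape the Douban Top 250 list: 10 pages of 25 movies, selected with ?start=
url = 'https://movie.douban.com/top250'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
movies = []
for start in range(0, 250, 25):
    response = requests.get(url, params={'start': start}, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error
    # Parse the page and collect its movie entries
    soup = BeautifulSoup(response.text, 'html.parser')
    movies.extend(soup.select('.item'))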
# Collect the fields of each movie
movie_list = []
for movie in movies:
    name = movie.select_one('.title').text.strip()
    rating = movie.select_one('.rating_num').text.strip()
    # Assumed layout of the first <p>: a "导演: ... 主演: ..." line followed by
    # a "year / country / genre" line, padded with non-breaking spaces
    info = movie.select_one('.bd p:nth-of-type(1)').text.replace('\xa0', ' ')
    lines = [line.strip() for line in info.splitlines() if line.strip()]
    crew, _, actors = lines[0].partition('主演:')
    director = crew.replace('导演:', '').strip()
    actors = actors.strip()
    parts = [part.strip() for part in lines[-1].split('/')]
    year = parts[0]
    country = parts[1] if len(parts) > 1 else ''
    # Some entries have no one-line quote, so guard against a missing tag
    quote = movie.select_one('.bd p:nth-of-type(2)')
    summary = quote.text.strip() if quote is not None else ''
    movie_list.append({'name': name, 'rating': float(rating), 'director': director, 'actors': actors,
                       'year': year, 'country': country, 'summary': summary})
# Save the movie information to a CSV file
movies_df = pd.DataFrame(movie_list)
movies_df.to_csv('movies.csv', index=False)
# Count how many movies have each rating
rating_counts = movies_df['rating'].value_counts()
rating_counts = rating_counts.reset_index()
rating_counts.columns = ['rating', 'count']
rating_counts = rating_counts.sort_values('rating')
# Count movies per director and keep the 10 most frequent
director_counts = movies_df['director'].value_counts()[:10]
director_counts = director_counts.reset_index()
director_counts.columns = ['director', 'count']
director_counts = director_counts.sort_values('count', ascending=False)
# Create the GUI window
root = tk.Tk()
root.title('Douban Movie Analysis')
# Draw the rating chart
fig1 = plt.figure(figsize=(6, 4), dpi=100)
ax1 = fig1.add_subplot(111)
ax1.bar(rating_counts['rating'], rating_counts['count'])
ax1.set_title('Douban Movie Rating Counts')
ax1.set_xlabel('Rating')
ax1.set_ylabel('Count')
canvas1 = FigureCanvasTkAgg(fig1, master=root)
canvas1.draw()
canvas1.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=1)
# Draw the director chart (Chinese names need a CJK-capable font to render)
fig2 = plt.figure(figsize=(6, 4), dpi=100)
ax2 = fig2.add_subplot(111)
ax2.bar(director_counts['director'], director_counts['count'])
ax2.set_title('Top 10 Movie Directors in Douban TOP250')
ax2.set_xlabel('Director')
ax2.set_ylabel('Count')
canvas2 = FigureCanvasTkAgg(fig2, master=root)
canvas2.draw()
canvas2.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=1)
root.mainloop()
```
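The program uses Requests and BeautifulSoup to scrape the Douban Top 250 list, then uses pandas to count movies by rating and by director. Matplotlib draws the results as bar charts, and Tkinter provides the GUI window that displays them.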
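The field extraction is the most fragile part of the scraping step, so it can help to isolate it in a function and check it offline. The sketch below is one possible way to do that. The name `parse_info` is hypothetical, and the sample string is modeled on an assumed page layout: a credits line followed by a "year / country / genre" line, padded with non-breaking spaces. The real page may differ.

```python
def parse_info(text):
    """Split the info paragraph of one Top 250 entry into fields.

    Assumes two non-empty lines: "导演: ... 主演: ..." and
    "year / country / genre". This layout is an assumption, not a guarantee.
    """
    # Replace the non-breaking spaces used as padding
    text = text.replace('\xa0', ' ')
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Everything before "主演:" is the director part; the rest is the cast
    crew, _, actors = lines[0].partition('主演:')
    director = crew.replace('导演:', '').strip()
    parts = [part.strip() for part in lines[-1].split('/')]
    return {
        'director': director,
        'actors': actors.strip(),
        'year': parts[0],
        'country': parts[1] if len(parts) > 1 else '',
    }


# Sample text modeled on the assumed layout
sample = ('导演: 弗兰克·德拉邦特 Frank Darabont\xa0\xa0\xa0主演: 蒂姆·罗宾斯 Tim Robbins /...\n'
          '    1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情')
info = parse_info(sample)
print(info)
```

Keeping the parsing in a pure function like this means a layout change on the site shows up as a failed check on a saved sample, not as silently wrong columns in `movies.csv`.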
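Note: before running the program, install the third-party libraries it depends on, for example with `pip install requests beautifulsoup4 pandas matplotlib`. Tkinter is part of the Python standard library and usually needs no separate installation, although some Linux distributions package it separately.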
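A quick way to confirm the environment is ready is to ask Python which of the required modules it can find. This is a minimal sketch. The helper name `missing_modules` is hypothetical, and the list uses import names, which can differ from package names (BeautifulSoup is installed as `beautifulsoup4` but imported as `bs4`).

```python
import importlib.util

# Import names of the modules the program uses
REQUIRED = ['requests', 'bs4', 'pandas', 'matplotlib', 'tkinter']


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All required modules are available.')
```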