Scraping the Douban Movie Top 250 in PyCharm, visualizing it, and saving it to CSV
Posted: 2023-07-19 15:42:24
Sure. Below is a complete script that scrapes the Douban Movie Top 250, saves the data to a CSV file, and plots it:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt

# Request URL
url = 'https://movie.douban.com/top250'

# Fetch the raw content of a page
def get_page_content(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
    response = requests.get(url, headers=headers, timeout=10)
    # Fail loudly on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.content

# Parse a page and extract the movie information
def parse_page_content(page):
    soup = BeautifulSoup(page, 'html.parser')
    movie_list_soup = soup.find('ol', attrs={'class': 'grid_view'})
    movie_name_list = []
    movie_rating_list = []
    movie_info_list = []
    if movie_list_soup is None:
        # No movie list on the page (for example, the request was blocked)
        return movie_name_list, movie_rating_list, movie_info_list
    for movie_li in movie_list_soup.find_all('li'):
        detail = movie_li.find('div', attrs={'class': 'hd'})
        movie_name = detail.find('span', attrs={'class': 'title'}).getText()
        movie_rating = movie_li.find('span', attrs={'class': 'rating_num'}).getText()
        # The one-line blurb ('inq') sits outside the 'hd' block, so search the whole <li>
        inq = movie_li.find('span', attrs={'class': 'inq'})
        movie_info = inq.getText() if inq else ''
        movie_name_list.append(movie_name)
        movie_rating_list.append(movie_rating)
        movie_info_list.append(movie_info)
    return movie_name_list, movie_rating_list, movie_info_list

# Scrape the Douban Movie Top 250
def get_movie_top250(url):
    movie_name_list = []
    movie_rating_list = []
    movie_info_list = []
    # 10 pages of 25 movies each, selected with the 'start' query parameter
    for i in range(0, 10):
        page = get_page_content(url + '?start=' + str(i * 25))
        movie_name, movie_rating, movie_info = parse_page_content(page)
        movie_name_list += movie_name
        movie_rating_list += movie_rating
        movie_info_list += movie_info
    # Ratings are scraped as text; convert them to numbers before saving and plotting
    movie_rating_list = [float(rating) for rating in movie_rating_list]
    # Save the data to a CSV file
    data = {'Movie title': movie_name_list, 'Rating': movie_rating_list, 'Summary': movie_info_list}
    df = pd.DataFrame(data)
    df.to_csv('douban_movie_top250.csv', index=False, encoding='utf-8-sig')
    # Visualize the data
    plt.rcParams['font.sans-serif'] = ['SimHei']  # font with Chinese glyphs for the movie titles
    plt.rcParams['axes.unicode_minus'] = False
    plt.figure(figsize=(10, 10))
    plt.scatter(movie_rating_list, movie_name_list, s=100)
    plt.xlabel('Rating')
    plt.ylabel('Movie title')
    plt.title('Douban Movie Top 250')
    plt.savefig('douban_movie_top250.png', dpi=300)
    print('Douban Movie Top 250 data saved to douban_movie_top250.csv; plot saved to douban_movie_top250.png')

if __name__ == '__main__':
    get_movie_top250(url)
```
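The loop in `get_movie_top250` walks the list 25 movies at a time by changing the `start` query parameter. A minimal sketch of that URL construction, using only the standard library, might look like the following. The helper name `build_page_urls` and its `total` and `page_size` parameters are hypothetical and not part of the script above.

```python
from urllib.parse import urlencode


def build_page_urls(base_url, total=250, page_size=25):
    """Return one URL per page, each carrying a `start` offset (hypothetical helper)."""
    return [
        base_url + '?' + urlencode({'start': offset})
        for offset in range(0, total, page_size)
    ]


urls = build_page_urls('https://movie.douban.com/top250')
print(len(urls))   # 10
print(urls[0])     # https://movie.douban.com/top250?start=0
print(urls[-1])    # https://movie.douban.com/top250?start=225
```

Building the list up front makes it easy to inspect the offsets before sending any requests.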
To run the code, install the following libraries:
- requests
- beautifulsoup4 (imported in the code as `bs4`)
- pandas
- matplotlib
Install them with:
```
pip install requests
pip install beautifulsoup4
pip install pandas
pip install matplotlib
```
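After installing, a quick check that the modules can be found may save a failed run. The sketch below uses only the standard library. The helper `missing_modules` is a hypothetical name, and the list uses import names, which is why `bs4` appears instead of `beautifulsoup4`.

```python
import importlib.util

# Import names used by the script; the beautifulsoup4 package is imported as "bs4"
REQUIRED_MODULES = ['requests', 'bs4', 'pandas', 'matplotlib']


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All required modules are installed.')
```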
When the script finishes, it writes douban_movie_top250.csv and douban_movie_top250.png to the current directory: the scraped Top 250 data and its visualization, respectively.
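The CSV is written with `encoding='utf-8-sig'`, which puts a byte-order mark (BOM) at the start of the file so that spreadsheet programs such as Excel recognize it as UTF-8. When reading the file back, the same encoding should be used. Otherwise the BOM stays attached to the first column name. The sketch below shows the difference with the standard `csv` module on a small temporary file. The sample row and the column names `title` and `rating` are made up for illustration and are not the script's output.

```python
import csv
import os
import tempfile

# Made-up sample row for illustration; the real file is produced by the script
rows = [{'title': '示例电影', 'rating': '9.0'}]

path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'rating'])
    writer.writeheader()
    writer.writerows(rows)

# Reading as plain utf-8 leaves the BOM glued to the first header name
with open(path, newline='', encoding='utf-8') as f:
    header_plain = next(csv.reader(f))

# Reading as utf-8-sig strips the BOM
with open(path, newline='', encoding='utf-8-sig') as f:
    header_sig = next(csv.reader(f))

print(repr(header_plain[0]))  # '\ufefftitle'
print(repr(header_sig[0]))    # 'title'
```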