Scraping the Douban Movie Top 250 with Python, saving it to CSV, and visualizing the data
Date: 2023-08-05 22:50:25 Views: 73
Sure, here is a rough outline of the steps:
1. Import the required libraries
```python
import requests
from bs4 import BeautifulSoup
import csv
import matplotlib.pyplot as plt
```
2. Send the HTTP request and parse the HTML
```python
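url = 'https://movie.douban.com/top250'
# Send a browser-like User-Agent; the site may reject requests without one.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop early on an HTTP error status
# Note: this single request covers only the first page of the list, not all 250 entries.
soup = BeautifulSoup(response.text, 'html.parser')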
```
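The request above fetches a single page, so it yields only part of the Top 250. A minimal sketch of one way to cover the whole list follows. It assumes the site paginates with a `start` query parameter in steps of 25, which is an assumption about the site rather than something stated above; `page_urls` and `rank_on_page` are hypothetical helper names introduced only for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25   # entries per page (assumed)
TOTAL = 250      # size of the full list


def page_urls(base_url=BASE_URL, total=TOTAL, page_size=PAGE_SIZE):
    """Return (offset, url) pairs, one per page of the list."""
    urls = []
    for start in range(0, total, page_size):
        query = urlencode({'start': start})
        urls.append((start, base_url + '?' + query))
    return urls


def rank_on_page(offset, index):
    """Overall rank of the index-th (0-based) entry on the page that starts at offset."""
    return offset + index + 1
```

To use it, loop over `page_urls()`, call `requests.get(url, headers=headers)` for each URL, parse each response as in step 2, and write `rank_on_page(offset, i)` instead of `i+1` as the rank. Pausing briefly between requests (for example with `time.sleep`) is a considerate default.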
3. Extract the movie information and write it to a CSV file
```python
# Each movie's details sit in a <div class="info"> block.
movies = soup.find_all('div', class_='info')
with open('douban_top250.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['rank', 'title', 'star', 'quote']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for i, movie in enumerate(movies):
        title = movie.find('span', class_='title').text
        star = movie.find('span', class_='rating_num').text
        # Some entries have no one-line quote, so fall back to an empty string.
        inq = movie.find('span', class_='inq')
        quote = inq.text if inq else ''
        writer.writerow({'rank': i + 1, 'title': title, 'star': star, 'quote': quote})
```
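Before plotting, it can help to read the file back and confirm the rows look right. The sketch below is one way to do that with the standard `csv` module. `load_movies` is a hypothetical helper, and the two sample rows are made-up placeholders standing in for the real `douban_top250.csv` contents.

```python
import csv
import io


def load_movies(csvfile):
    """Read rows with the rank/title/star/quote header and convert the numeric fields."""
    rows = []
    for row in csv.DictReader(csvfile):
        rows.append({
            'rank': int(row['rank']),
            'title': row['title'],
            'star': float(row['star']),
            'quote': row['quote'],
        })
    return rows


# Tiny in-memory sample (made-up rows) so the check runs without the real file.
sample = io.StringIO(
    'rank,title,star,quote\n'
    '1,Movie A,9.7,"A quote, with a comma"\n'
    '2,Movie B,9.6,\n'
)
movies_read = load_movies(sample)
```

With the real file, the same function can be called as `load_movies(open('douban_top250.csv', newline='', encoding='utf-8'))`. A quote containing a comma survives the round trip because `csv` quotes such fields when writing.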
4. Visualize the data
```python
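import pandas as pd

df = pd.read_csv('douban_top250.csv')
df['star'] = df['star'].astype(float)  # make sure the ratings are numeric
# Scatter plot of rating (x) against rank (y); plt comes from the imports in step 1.
df.plot(kind='scatter', x='star', y='rank')
plt.show()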
```
This draws the data as a scatter plot, with the rating on the x-axis and the rank on the y-axis.
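Because many movies share the same rating, points in the scatter plot tend to stack in vertical columns. As a complementary view, the following sketch counts how many movies fall on each rating and draws a bar chart. `rating_counts` and `plot_rating_counts` are hypothetical helper names, and the choice of a bar chart is only a suggestion, not part of the steps above.

```python
from collections import Counter


def rating_counts(stars):
    """Count how many movies share each rating, sorted by rating."""
    counts = Counter(round(float(s), 1) for s in stars)
    return sorted(counts.items())


def plot_rating_counts(stars):
    """Draw the counts as a bar chart (matplotlib is imported only when plotting)."""
    import matplotlib.pyplot as plt

    pairs = rating_counts(stars)
    labels = [str(rating) for rating, _ in pairs]
    values = [count for _, count in pairs]
    plt.bar(labels, values)
    plt.xlabel('star')
    plt.ylabel('number of movies')
    plt.show()
```

With the DataFrame from step 4, this could be called as `plot_rating_counts(df['star'])`.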
I hope this helps.