Scraping the Douban Movie Top 250 with bs4 and saving the results to CSV
Date: 2024-12-02 08:15:38
In Python, the BeautifulSoup (bs4) library can be combined with the requests library to scrape the Douban Movie Top 250 list. First install the required libraries with pip:
```bash
pip install beautifulsoup4 requests pandas lxml
```
What follows is a short step-by-step guide:
1. Import the required libraries:
```python
import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd
```
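2. Send a request to fetch the page content, using the Douban Top 250 HTML page as the example. Douban typically rejects the default requests User-Agent, so a browser-like header is sent:
```python
url = "https://movie.douban.com/top250"
# A browser-like User-Agent; the default one used by requests is usually blocked
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
html_content = response.text
```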
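The request above only fetches the first page, which lists 25 of the 250 movies. A minimal sketch of how the remaining pages might be addressed follows, assuming the list is paginated with a `start` query parameter in steps of 25. The helper name `top250_page_urls` and the constant `PAGE_SIZE` are hypothetical and introduced only for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25  # assumption: each list page shows 25 movies


def top250_page_urls(total=250, page_size=PAGE_SIZE):
    """Build one URL per list page using the assumed `start` offset parameter."""
    return [
        f"{BASE_URL}?{urlencode({'start': start})}"
        for start in range(0, total, page_size)
    ]


# Show the first few page URLs
for page_url in top250_page_urls()[:3]:
    print(page_url)
```

Each URL can then be fetched with `requests.get` in the same way as the first page, ideally with a short pause between requests to avoid overloading the site.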
3. Parse the HTML content with BeautifulSoup:
```python
soup = BeautifulSoup(html_content, 'lxml')
# Locate the list that holds the movie entries: <ol class="grid_view">
movie_list = soup.find('ol', {'class': 'grid_view'})
```
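4. Iterate over the movie list and extract the data for each movie (title, rating, and so on). The selectors below reflect the page markup at the time of writing and may need adjusting if the page changes:
```python
movies_data = []
for movie in movie_list.find_all('li'):
    # The title sits in the first <span class="title"> inside <div class="hd">
    hd = movie.find('div', class_='hd')
    title = hd.find('span', class_='title').get_text(strip=True)
    # The rating sits in <span class="rating_num">
    rating = movie.find('span', class_='rating_num').get_text(strip=True)
    link = hd.a['href']
    movies_data.append([title, rating, link])
```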
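Because chained lookups fail with an exception as soon as one tag is missing, the extraction step can be written more defensively. The sketch below runs against a small hand-written HTML sample rather than the live site. The sample markup is an assumption about how a list item is structured, and `parse_movies` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hand-written sample imitating one complete item and one incomplete item;
# the real page markup may differ.
SAMPLE_HTML = """
<ol class="grid_view">
  <li>
    <div class="item">
      <div class="hd">
        <a href="https://movie.douban.com/subject/1292052/">
          <span class="title">肖申克的救赎</span>
          <span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>
        </a>
      </div>
      <div class="bd"><span class="rating_num">9.7</span></div>
    </div>
  </li>
  <li><div class="item"><div class="hd"></div></div></li>
</ol>
"""


def parse_movies(html):
    """Return [title, rating, link] rows, skipping items with missing fields."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("ol.grid_view li"):
        title_tag = item.select_one("div.hd span.title")
        rating_tag = item.select_one("span.rating_num")
        link_tag = item.select_one("div.hd a[href]")
        if not (title_tag and rating_tag and link_tag):
            continue  # incomplete item: skip it instead of raising on None
        rows.append([
            title_tag.get_text(strip=True),
            rating_tag.get_text(strip=True),
            link_tag["href"],
        ])
    return rows


print(parse_movies(SAMPLE_HTML))
```

Skipping incomplete items keeps one malformed entry from aborting the whole run. Whether to skip or to record a placeholder value is a design choice that depends on how the CSV will be used.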
5. Write the data to a CSV file:
```python
with open('douban_movies.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    # Header row, followed by one row per movie
    writer.writerow(['Title', 'Rating', 'Link'])
    writer.writerows(movies_data)
```
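If the CSV is going to be opened in Excel, Chinese titles written as plain UTF-8 may show up garbled. One common workaround, sketched here with a single sample row standing in for `movies_data`, is to write with the `utf-8-sig` encoding, which adds a byte order mark that Excel uses to detect UTF-8. The temporary directory and the sample row are assumptions made for illustration.

```python
import csv
import os
import tempfile

# Sample row standing in for the scraped movies_data
rows = [["肖申克的救赎", "9.7", "https://movie.douban.com/subject/1292052/"]]
header = ["title", "rating", "link"]

# Write into a temporary directory so the sketch leaves the working directory untouched
path = os.path.join(tempfile.mkdtemp(), "douban_movies.csv")

# utf-8-sig prepends a BOM so that Excel recognises the file as UTF-8
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

# Read the file back to confirm that the content survived the round trip
with open(path, newline="", encoding="utf-8-sig") as f:
    loaded = list(csv.DictReader(f))

print(loaded[0]["title"], loaded[0]["rating"])
```

Reading with the same `utf-8-sig` encoding strips the byte order mark again, so the first column name comes back as `title` without a stray prefix.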