Scraping Douban Books Top 250 Reviews with a Python Crawler
Date: 2023-11-06 21:08:22
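A Python crawler can scrape the review blurbs from the Douban Books Top 250 list. The steps are as follows:
1. Import the required libraries: requests, BeautifulSoup (from the bs4 package), and pandas.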
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```
2. Build the request headers and the URL, then send the request with requests and capture the response.
```python
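headers = {
    # Send a browser-like User-Agent; the default requests one may be rejected
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
}
url = 'https://book.douban.com/top250'
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail early on an HTTP error status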
```
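The URL above returns only the first page of the list. Below is a minimal sketch of walking the remaining pages, assuming the list is paginated 25 books per page through a `start` query parameter (0, 25, ..., 225). The helper names `page_urls` and `fetch_pages`, the one-second delay, and the timeout are illustrative choices rather than part of the original steps, and `session` is expected to be a `requests.Session`.

```python
import time

BASE_URL = 'https://book.douban.com/top250'


def page_urls(total=250, per_page=25):
    """Build one URL per results page (assumes a `start` offset parameter)."""
    return [f'{BASE_URL}?start={offset}' for offset in range(0, total, per_page)]


def fetch_pages(session, headers, delay=1.0, timeout=10):
    """Yield the HTML of each page, pausing between requests.

    `session` is assumed to behave like requests.Session.
    """
    for url in page_urls():
        response = session.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        yield response.text
        time.sleep(delay)  # assumed polite delay between requests; tune as needed
```

Each yielded HTML string can be passed to `BeautifulSoup` in place of `response.text`, for example by looping over `fetch_pages(requests.Session(), headers)`.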
3. Parse the response with BeautifulSoup and extract each book's title, author, rating, and review blurb.
```python
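soup = BeautifulSoup(response.text, 'html.parser')
# Each book on the page is rendered as its own <table> inside div.article
book_list = soup.find('div', {'class': 'article'}).find_all('table')
data = []
for book in book_list:
    book_name = book.find('div', {'class': 'pl2'}).find('a')['title']
    # This line typically holds author, publisher, date, and price in one string
    book_author = book.find('p', {'class': 'pl'}).get_text(strip=True)
    book_rating = book.find('span', {'class': 'rating_nums'}).get_text(strip=True)
    # Not every book has a one-line quote, so guard against a missing tag
    quote_tag = book.find('span', {'class': 'inq'})
    book_comment = quote_tag.get_text(strip=True) if quote_tag else ''
    data.append([book_name, book_author, book_rating, book_comment])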
```
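The `p.pl` text stored in `book_author` usually packs several fields into one slash-separated string. Here is a rough sketch of splitting it, assuming the layout is `author / [translator /] publisher / date / price` with the last three fields always present. `split_book_info` is a hypothetical helper, and entries that deviate from this layout (for example, a field that itself contains a slash) will be split incorrectly.

```python
def split_book_info(info):
    """Split 'author / [translator /] publisher / date / price' into a dict.

    Assumes the last three fields are publisher, date, and price;
    whatever precedes them is treated as author and translator(s).
    """
    parts = [p.strip() for p in info.split('/') if p.strip()]
    if len(parts) < 4:
        # Too few fields to interpret; keep the raw text as the author.
        return {'author': info.strip(), 'publisher': None, 'date': None, 'price': None}
    *people, publisher, date, price = parts
    return {
        'author': ' / '.join(people),
        'publisher': publisher,
        'date': date,
        'price': price,
    }
```

With a made-up input such as `'Author A / Translator B / Some Press / 2006-5 / 29.00'`, the first two fields end up together under `author` and the remaining three map to `publisher`, `date`, and `price`.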
4. Store the collected data in a pandas DataFrame and write it out to a CSV file.
```python
df = pd.DataFrame(data, columns=['title', 'author', 'rating', 'review'])
df.to_csv('douban_book_top250.csv', index=False, encoding='utf-8-sig')
```
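The `utf-8-sig` encoding in step 4 writes a byte order mark at the start of the file, which helps spreadsheet programs such as Excel recognize it as UTF-8 and display Chinese text correctly. Below is a minimal sketch of the same output using only the standard library's `csv` module, as an alternative when pandas is not otherwise needed. The `save_rows` name and the English column labels are illustrative assumptions.

```python
import csv


def save_rows(rows, path='douban_book_top250.csv'):
    """Write [title, author, rating, review] rows to a CSV file with a UTF-8 BOM."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['title', 'author', 'rating', 'review'])
        writer.writerows(rows)
```

Calling `save_rows(data)` with the list built in step 3 would produce a file with the same rows that `df.to_csv` writes.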