Code to scrape all short comments from https://movie.douban.com/subject/1305690/
Time: 2023-11-20 15:06:15
The following Python code uses requests and BeautifulSoup to page through and collect the short comments for the Douban movie at https://movie.douban.com/subject/1305690/:
```python
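import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# First page of short comments; later pages are reached via the "next" link.
url = 'https://movie.douban.com/subject/1305690/comments?start=0&limit=20&sort=new_score&status=P'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
comments = []

while True:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop on HTTP errors such as 403 or 404
    soup = BeautifulSoup(response.text, 'html.parser')

    # Each short comment is a <span class="short"> inside a <div class="comment">.
    for comment_div in soup.find_all('div', class_='comment'):
        short = comment_div.find('span', class_='short')
        if short is not None:
            comments.append(short.text.strip())

    # Follow the "next" link until the page no longer has one.
    next_page = soup.find('a', class_='next')
    if next_page is None or not next_page.get('href'):
        break
    # The href may be relative, so resolve it against the current URL.
    url = urljoin(url, next_page['href'])
    time.sleep(2)  # pause between requests to reduce load on the server

print(comments)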
```
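How the next-page URL should be built depends on the form of the link's `href`. The following is a minimal sketch, assuming the `href` may be either a bare query string or an absolute path (the two sample values are hypothetical and were not checked against the live page). It suggests that `urllib.parse.urljoin` resolves both forms to the same address:

```python
from urllib.parse import urljoin

# Starting URL taken from the example above.
current_url = 'https://movie.douban.com/subject/1305690/comments?start=0&limit=20&sort=new_score&status=P'


def resolve_next(current, href):
    """Resolve a pagination href against the page it was found on."""
    return urljoin(current, href)


# Hypothetical href values, for illustration only.
relative_href = '?start=20&limit=20&sort=new_score&status=P'
absolute_path_href = '/subject/1305690/comments?start=20&limit=20&sort=new_score&status=P'

print(resolve_next(current_url, relative_href))
print(resolve_next(current_url, absolute_path_href))
```

Resolving against the current page URL rather than concatenating onto the host name keeps the `/subject/1305690/comments` path when the link carries only a query string.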
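Note: this code may trigger Douban's anti-scraping measures; consider using a proxy or adding suitable delays between requests. Scraping must also comply with applicable laws and regulations and with the site's robots.txt rules.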
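The two precautions in the note can be sketched with the standard library alone. This is an illustrative sketch, not a complete anti-blocking solution: `allowed_by_robots` and `fetch_with_backoff` are hypothetical helper names, the robots.txt lines are made-up sample data, and `fetch` stands for any callable that returns page text or raises an exception.

```python
import random
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def fetch_with_backoff(fetch, url, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), waiting longer after each failed attempt."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            # Exponential backoff plus up to one second of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))


# Hypothetical robots.txt content, for illustration only.
sample_robots = [
    'User-agent: *',
    'Disallow: /private/',
]
print(allowed_by_robots(sample_robots, 'my-crawler', 'https://example.com/private/page'))
print(allowed_by_robots(sample_robots, 'my-crawler', 'https://example.com/public/page'))
```

In a scraping loop, `fetch` could be a thin wrapper such as `lambda u: requests.get(u, headers=headers, timeout=10).text`, and the real robots.txt lines would be downloaded from the site before crawling.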