Code for Scraping Douban Short Reviews of the Movie Wolf Warrior 2 (《战狼2》)

Date: 2023-09-24 12:13:49 Views: 122

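The following is a simple Python crawler example that scrapes short reviews of the Douban movie Wolf Warrior 2 (《战狼2》):

```python
import requests
from bs4 import BeautifulSoup

# First page of short reviews for the movie (subject ID 26363254)
url = 'https://movie.douban.com/subject/26363254/comments?start=0&limit=20&sort=new_score&status=P'

# Browser-like request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'Referer': 'https://movie.douban.com/subject/26363254/'
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# Each short review sits in a <span class="short"> element
comments = soup.find_all('span', class_='short')
for comment in comments:
    print(comment.text)
```

This example uses the `requests` library to fetch the page and the `BeautifulSoup` library to parse it. It first sets the request headers, then calls `requests.get()` to retrieve the page and passes the response text to `BeautifulSoup()` to build the `soup` object. Next, `soup.find_all()` finds every short-review tag (`span` elements with class `short`) and collects the matching tags in the `comments` list. Finally, a `for` loop iterates over `comments` and prints the text of each review.

Note that Douban has anti-crawler mechanisms: if you crawl too frequently, your IP may be banned or you may be asked to enter a CAPTCHA. To reduce the risk of an IP ban, you can set up a proxy or use other techniques for working around anti-crawler measures.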
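The point above about crawl frequency can be illustrated with a minimal sketch that fetches several pages with a randomized pause between requests. It assumes that increasing the `start` parameter in steps of `limit` moves to the next page of reviews, which the original URL suggests but does not confirm. The helper names (`build_page_url`, `crawl_pages`, `fetch_with_requests`) and the 2 to 5 second delay range are illustrative choices, not values taken from Douban.

```python
import random
import time
from urllib.parse import urlencode

# Base URL taken from the example above; the query parameters mirror it.
BASE_URL = "https://movie.douban.com/subject/26363254/comments"


def build_page_url(start, limit=20):
    """Return the comments URL for the page beginning at offset `start`."""
    query = urlencode({"start": start, "limit": limit,
                       "sort": "new_score", "status": "P"})
    return f"{BASE_URL}?{query}"


def crawl_pages(fetch, pages=3, limit=20,
                min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Fetch `pages` consecutive pages, pausing between requests.

    `fetch` is any callable that takes a URL and returns the page HTML.
    The delay bounds are assumptions; tune them to your own needs.
    """
    results = []
    for page in range(pages):
        if page > 0:
            # Randomized pause so requests are not sent in a tight loop.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(build_page_url(page * limit, limit)))
    return results


def fetch_with_requests(url):
    """Hypothetical fetcher built on requests, reusing the headers above."""
    import requests  # imported here so the helpers above need only the stdlib

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/86.0.4240.198 Safari/537.36",
        "Referer": "https://movie.douban.com/subject/26363254/",
    }
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error status
    return response.text
```

Calling `crawl_pages(fetch_with_requests, pages=3)` would return a list of three HTML strings, each of which can be passed to `BeautifulSoup(html, 'html.parser')` and searched with `find_all('span', class_='short')` as in the example above. Passing the fetcher in as a parameter keeps the pagination and throttling logic independent of the HTTP library, so it can be tried out without touching the network.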
