
Scraping Douban comment pages with a Python crawler

Time: 2023-10-11 22:11:21

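To scrape a Douban comment page, you can combine Python crawling techniques with a few libraries. The steps are:

1. Import the required libraries: requests, BeautifulSoup, and pandas.
2. Use requests to send an HTTP request to the Douban comment page and fetch its HTML.
3. Use BeautifulSoup to parse the HTML and extract the comment data.
4. Use pandas to organize the comment data and export it to an Excel file.

The following example scrapes one Douban comment page and saves the result to an Excel file:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Send an HTTP request to fetch the page content
url = 'https://movie.douban.com/subject/1292052/comments?status=P'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early if the request was rejected
html_content = response.text

# Parse the HTML and extract the comment data
soup = BeautifulSoup(html_content, 'html.parser')
comments = soup.find_all(class_='comment-item')
data = []
for comment in comments:
    username = comment.find(class_='comment-info').find('a').text.strip()
    # Not every comment has a star rating, so guard against a missing tag
    rating_tag = comment.find(class_='rating')
    rating = ''
    if rating_tag is not None:
        # The rating span carries a class such as "allstar50"; keep the digits
        star_class = next((c for c in rating_tag['class'] if c.startswith('allstar')), '')
        rating = star_class.replace('allstar', '')
    content = comment.find(class_='short').text.strip()
    data.append([username, rating, content])

# Save the data to an Excel file (writing .xlsx requires the openpyxl package)
df = pd.DataFrame(data, columns=['Username', 'Rating', 'Content'])
df.to_excel('douban_comments.xlsx', index=False)
```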
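Step 3 can be checked without sending a request to Douban each time. Move the parsing into its own function and run it against a small HTML string. The sketch below assumes the page uses the same class names as the example above (`comment-item`, `comment-info`, `rating`, `short`) and that star ratings appear as classes such as `allstar50`. The function name `parse_comments` and the sample markup are hypothetical; the markup only mirrors that assumed structure and is not copied from a real Douban page.

```python
from bs4 import BeautifulSoup


def parse_comments(html_content):
    """Return [username, rating, content] rows from a comment page.

    Hypothetical helper: assumes the class names used in the example above.
    """
    soup = BeautifulSoup(html_content, 'html.parser')
    rows = []
    for comment in soup.find_all(class_='comment-item'):
        username = comment.find(class_='comment-info').find('a').text.strip()
        rating_tag = comment.find(class_='rating')
        rating = ''
        if rating_tag is not None:
            # class="allstar50 rating" -> "50"; empty string if no allstar class
            star_class = next((c for c in rating_tag['class'] if c.startswith('allstar')), '')
            rating = star_class.replace('allstar', '')
        content = comment.find(class_='short').text.strip()
        rows.append([username, rating, content])
    return rows


# Hypothetical sample markup that mirrors the assumed page structure
SAMPLE_HTML = """
<div class="comment-item">
  <span class="comment-info">
    <a href="#">alice</a>
    <span class="allstar50 rating" title="力荐"></span>
  </span>
  <span class="short">Great film.</span>
</div>
<div class="comment-item">
  <span class="comment-info"><a href="#">bob</a></span>
  <span class="short">No stars given.</span>
</div>
"""

rows = parse_comments(SAMPLE_HTML)
print(rows)  # [['alice', '50', 'Great film.'], ['bob', '', 'No stars given.']]
```

In the scraper, `parse_comments(response.text)` can then replace the inline parsing loop. Keeping the fetching and parsing separate makes it easier to adjust the selectors if Douban changes its markup.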
