
Scraping Douban Music Top 250 Comments with XPath

Time: 2023-10-23 19:14:22 Views: 169

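The following steps use XPath to scrape comments from the Douban Music Top 250:

1. Open the Douban Music Top 250 page and copy its URL.

2. Use Python's `requests` library to send an HTTP request and fetch the page source.

```python
import requests

url = 'https://music.douban.com/top250'
# Browser-like User-Agent; the default requests User-Agent is easy to identify as a script
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop early on an HTTP error status (4xx/5xx)
html = response.text
```

3. Parse the page source with the `lxml` library and use an XPath expression to extract the comment text. The expression below selects the text of every `<p>` directly inside a `<div class="comment">`; if the page markup differs, the result is an empty list.

```python
from lxml import etree

selector = etree.HTML(html)  # build an element tree from the HTML string
comments = selector.xpath('//div[@class="comment"]/p/text()')
```

4. Print the comments.

```python
for comment in comments:
    print(comment)
```

Note: when scraping Douban, set appropriate request headers; otherwise the requests may be identified as coming from a crawler and blocked. The crawler should also respect the site's crawling rules and avoid sending requests too frequently, so that it does not put unnecessary load on the site.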
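To check the XPath expression from step 3 without sending any requests, it can be run against a small inline HTML snippet first. The sketch below is only an illustration: `SAMPLE_HTML` and the helper name `extract_comments` are hypothetical, and the markup is an assumption that mirrors the expression, not a copy of Douban's actual page structure.

```python
from lxml import etree

# Hypothetical markup for illustration only; the real Douban pages may differ.
SAMPLE_HTML = """
<html><body>
  <div class="comment"><p>  Great album.  </p></div>
  <div class="comment"><p>Worth a second listen.</p></div>
  <div class="sidebar"><p>Not a comment.</p></div>
</body></html>
"""


def extract_comments(html, xpath='//div[@class="comment"]/p/text()'):
    """Return the stripped, non-empty text nodes matched by the XPath expression."""
    tree = etree.HTML(html)
    return [text.strip() for text in tree.xpath(xpath) if text.strip()]


comments = extract_comments(SAMPLE_HTML)
for comment in comments:
    print(comment)
```

Only the two `<p>` elements inside `div.comment` are matched; the sidebar paragraph is skipped. If the same function returns an empty list for a downloaded page, the expression most likely does not match that page's markup and needs to be adjusted after inspecting the HTML.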
