Scraping Douban Movies with lxml
Date: 2023-12-18 09:04:15
Scrape Douban movies
Sure. Here is example code that uses the lxml parsing library to scrape the Douban Movie Top250 page:
```python
import requests
from lxml import etree

url = 'https://movie.douban.com/top250'
# Send a browser-like User-Agent; some sites reject the default requests one
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
html = response.text

# Parse the HTML and select one <li> element per movie entry
tree = etree.HTML(html)
movies = tree.xpath('//ol[@class="grid_view"]/li')
for movie in movies:
    # Take the first matching text node for the title and for the rating
    title = movie.xpath('.//span[@class="title"]/text()')[0]
    score = movie.xpath('.//span[@class="rating_num"]/text()')[0]
    print(title, score)
```
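This code is similar to the earlier example and uses the lxml library to parse the HTML page. It extracts each movie's title and rating from the Douban Movie Top250 page and prints them. As written, it requests a single URL, so it covers only the entries on that one page rather than the whole list. You can modify and customize it to suit your needs. Make sure the requests and lxml libraries are installed.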
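To cover the rest of the list, one possible approach is to generate a URL for each page and run the same parsing loop on every response. The sketch below only builds the URLs. It assumes the list is paginated through a `start` query parameter in steps of 25 entries per page; that scheme, the constants `PAGE_SIZE` and `TOTAL`, and the helper name `page_urls` are illustrative assumptions, not something the code above confirms.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25   # assumption: number of entries shown per page
TOTAL = 250      # assumption: total number of entries in the list


def page_urls(base_url=BASE_URL, total=TOTAL, page_size=PAGE_SIZE):
    """Yield one URL per page, using an assumed `start` offset parameter."""
    for start in range(0, total, page_size):
        yield f"{base_url}?{urlencode({'start': start})}"


if __name__ == "__main__":
    # Print the generated URLs; no network request is made here
    for url in page_urls():
        print(url)
```

Each generated URL could then be passed to `requests.get(url, headers=headers)` in place of the single `url` used earlier, ideally with a short pause between requests so the site is not hit too quickly.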