Python scraper for Douban movies 1000
Time: 2023-11-11 21:04:53
The following example code uses a Python scraper to crawl the Douban Movie Top 250:
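```python
import time

import requests
from lxml import etree

# Base URL of the Douban Top 250 list (25 movies per page)
BASE_URL = 'https://movie.douban.com/top250'
# Browser-like User-Agent; requests without one are likely to be rejected
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
movie_list = []


def get_movies(url):
    """Fetch one list page and append each movie's title, director line and rating to movie_list."""
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    selector = etree.HTML(response.text)
    # Each movie entry on the page sits in a <div class="info"> block
    movies = selector.xpath('//div[@class="info"]')
    for movie in movies:
        # The first span.title is the primary (Chinese) title
        title = movie.xpath('div[@class="hd"]/a/span[@class="title"]/text()')[0]
        # The first text node of <p> is the director/cast line; strip the surrounding whitespace
        director = movie.xpath('div[@class="bd"]/p/text()')[0].strip()
        rate = movie.xpath('div[@class="bd"]/div[@class="star"]/span[@class="rating_num"]/text()')[0]
        movie_list.append({'title': title, 'director': director, 'rate': rate})


# Pagination uses the start parameter: 0, 25, ..., 225
for i in range(0, 250, 25):
    get_movies(f'{BASE_URL}?start={i}&filter=')
    time.sleep(1)  # pause between pages to avoid hammering the server

for movie in movie_list:
    print(movie)
```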
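The code uses the requests and lxml libraries to fetch and parse the HTML pages. First, it defines a URL and the request headers (a browser User-Agent). Then it creates an empty list to store the movie information. Next, it defines a function that fetches one page and uses XPath expressions to extract each movie's title, director line, and rating, appending them to the list. Finally, it loops over all ten pages, advancing the `start` query parameter in steps of 25, and prints the information for every movie.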
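To check the XPath expressions without sending any requests, the parsing step can be run against a small hand-written HTML fragment. This is a sketch under stated assumptions: `SAMPLE_HTML` is a hypothetical, heavily simplified imitation of one entry on the Top 250 page (the real markup has more elements and may change), and `parse_movies` and `page_urls` are illustrative names, not part of the original code. It also rebuilds the ten page URLs the main loop requests.

```python
from lxml import etree

# Hypothetical, simplified stand-in for one movie entry on the Top 250 page
SAMPLE_HTML = """
<html><body>
  <div class="info">
    <div class="hd">
      <a href="#"><span class="title">Sample Movie</span><span class="title">/ Alt Title</span></a>
    </div>
    <div class="bd">
      <p>
        Director: Jane Doe
        <br/>
        1994 / USA / Drama
      </p>
      <div class="star"><span class="rating_num">9.0</span></div>
    </div>
  </div>
</body></html>
"""


def parse_movies(html_text):
    """Hypothetical helper: apply the same XPath expressions as get_movies to an HTML string."""
    selector = etree.HTML(html_text)
    results = []
    for movie in selector.xpath('//div[@class="info"]'):
        # Several span.title nodes may match; take the first one
        title = movie.xpath('div[@class="hd"]/a/span[@class="title"]/text()')[0]
        # text() on <p> yields one string per text node; the first is the director line
        director = movie.xpath('div[@class="bd"]/p/text()')[0].strip()
        rate = movie.xpath('div[@class="bd"]/div[@class="star"]/span[@class="rating_num"]/text()')[0]
        results.append({'title': title, 'director': director, 'rate': rate})
    return results


# Page URLs requested by the main loop: start=0, 25, ..., 225 (10 pages x 25 movies)
page_urls = [f'https://movie.douban.com/top250?start={i}&filter=' for i in range(0, 250, 25)]

if __name__ == '__main__':
    print(parse_movies(SAMPLE_HTML))
    print(len(page_urls), page_urls[-1])
```

Because `text()` returns every text node inside `<p>`, indexing with `[0]` and calling `.strip()` keeps only the cleaned first line; the year/country/genre line after `<br/>` is a separate node.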