Python code for scraping Douban Top 250 data
Date: 2023-09-05 14:11:57
The following example uses Python to scrape movie data from the Douban Top 250 list:
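```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}


def get_movies(url):
    """Return (list of (title, rating) tuples, next-page URL or None)."""
    movie_list = []
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on a blocked or failed request
    soup = BeautifulSoup(response.text, 'html.parser')
    for movie in soup.find_all('div', class_='item'):
        # The first 'title' span holds the primary title of the movie
        movie_name = movie.find('span', class_='title').text
        movie_rating = movie.find('span', class_='rating_num').text
        movie_list.append((movie_name, movie_rating))
    # On the last page the 'next' span contains no <a> element
    next_span = soup.find('span', class_='next')
    next_page = next_span.find('a') if next_span else None
    if next_page:
        # The href is relative, so resolve it against the current URL
        # instead of appending it (appending would stack query strings)
        return movie_list, urljoin(url, next_page['href'])
    return movie_list, None


while url:
    movies, url = get_movies(url)
    for name, rating in movies:
        print(name, rating)
```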
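The code starts from the Douban Top 250 page, parses the HTML with BeautifulSoup, and extracts each movie's title and rating. It then looks for the next-page link and keeps scraping the following pages until none remain.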
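The next-page step is the easiest part to get wrong, so here is a minimal offline sketch of it. It assumes the next-page `href` values are relative query strings; the `sample_hrefs` values and the `resolve_next` helper are hypothetical and stand in for what the parser would find on successive pages. Resolving each link against the current page URL, rather than concatenating strings, keeps query strings from piling up.

```python
from urllib.parse import urljoin

START_URL = 'https://movie.douban.com/top250'


def resolve_next(current_url, href):
    """Resolve a next-page href against the page it was found on."""
    return urljoin(current_url, href) if href else None


# Hypothetical hrefs in the order successive pages might yield them;
# None marks a last page that has no next link.
sample_hrefs = ['?start=25&filter=', '?start=50&filter=', None]

url = START_URL
visited = []
for href in sample_hrefs:
    visited.append(url)
    url = resolve_next(url, href)
    if url is None:
        break

for page in visited:
    print(page)
```

Each resolved URL carries exactly one query string, which plain concatenation of the current URL and the `href` would not guarantee after the first page.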
Note that you should set a suitable User-Agent header when scraping pages, to reduce the chance of being blocked by the site.
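To illustrate what setting the header means, the sketch below builds a request object with the standard library and inspects it without sending anything over the network. The `build_request` helper is hypothetical, and `urllib.request` is used here only so the check runs without third-party packages; the idea is the same as passing `headers=` to `requests.get`.

```python
import urllib.request

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}


def build_request(url, headers=HEADERS):
    """Create a request carrying a browser-like User-Agent (nothing is sent)."""
    return urllib.request.Request(url, headers=headers)


req = build_request('https://movie.douban.com/top250')
# urllib stores header names capitalized, hence 'User-agent'
print(req.get_header('User-agent'))
```

Without an explicit header, HTTP clients identify themselves with a library default (for example `python-requests/<version>` or `Python-urllib/<version>`), which sites can filter on easily.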