Crawler code for scraping the ratings of the Douban Movie Top 100
Date: 2024-10-17 10:12:30 Views: 25
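Scraping the ratings of the Douban Movie Top 100 typically involves an HTTP request library (such as Python's requests), an HTML parsing library (such as BeautifulSoup or lxml), and a data-handling library for storing the results (such as pandas). The Top 100 are the first 100 entries of Douban's Top 250 list, so the example requests that list. Below is a simple Python example, assuming we only care about the basic HTTP request and the page structure: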
```python
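import requests
from bs4 import BeautifulSoup

# URL of the Douban Movie Top 250 list (a single page shows 25 films)
url = 'https://movie.douban.com/top250'

# Assumption: the site tends to reject requests without a browser-like
# User-Agent, so one is sent explicitly
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def get_movie_ratings(url):
    """Return a list of {'title', 'rating'} dicts for one list page."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    if response.status_code != 200:
        print(f"Request failed, status code: {response.status_code}")
        return []

    soup = BeautifulSoup(response.text, 'lxml')
    data = []
    # Assumes each film sits in its own .item block; reading the title and
    # the rating per block keeps the two values from drifting out of step
    for item in soup.select('.item'):
        title = item.select_one('.title')        # first title span
        rating = item.select_one('.rating_num')  # rating
        if title is None or rating is None:
            continue
        data.append({
            'title': title.get_text().strip(),
            'rating': rating.get_text().strip(),
        })
    return data


top_movies = get_movie_ratings(url)
for i, movie in enumerate(top_movies[:10], start=1):
    print(f"{i}. {movie['title']} - rating: {movie['rating']}")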
```
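The example above fetches a single page, which holds only 25 films. A minimal sketch of how the remaining pages of the Top 100 could be covered and the rows stored is shown here. It assumes the list is paginated through a `start` query parameter in steps of 25. The names `page_urls`, `collect`, and `to_csv` are hypothetical helpers, not part of any library. The standard `csv` module stands in for pandas to keep the sketch dependency-free.

```python
import csv
import io
import time
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25  # assumption: the list shows 25 films per page


def page_urls(total=100, page_size=PAGE_SIZE, base_url=BASE_URL):
    """Build the URLs of the pages that cover the first `total` films."""
    return [f"{base_url}?{urlencode({'start': start})}"
            for start in range(0, total, page_size)]


def collect(fetch_page, total=100, delay=1.0):
    """Call fetch_page(url) for every page and keep the first `total` rows.

    fetch_page is any callable that takes a URL and returns a list of dicts,
    for example get_movie_ratings from the example above.
    """
    rows = []
    for page_url in page_urls(total):
        rows.extend(fetch_page(page_url))
        time.sleep(delay)  # pause between requests to stay polite
    return rows[:total]


def to_csv(rows):
    """Serialise a list of row dicts to CSV text (empty string if no rows)."""
    if not rows:
        return ''
    buffer = io.StringIO()
    # Column names are taken from the first row, whatever keys it uses
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With the scraper above, `to_csv(collect(get_movie_ratings))` would return the Top 100 as CSV text, which can then be written to a file or loaded into pandas with `pandas.read_csv`.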