Python code for scraping the Douban Top 100 movies
Time: 2024-10-12 15:17:24 Views: 24
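Scraping the Douban Top 100 movies with Python usually involves three steps: sending network requests, parsing the HTML, and storing the data. Below is a simplified example that uses the `requests` library to fetch the page content and the `BeautifulSoup` library to parse the HTML, assuming we want each movie's title and rating: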
```python
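import requests
from bs4 import BeautifulSoup

# URL of the Douban Top 250 list; each page shows 25 movies, selected via ?start=
url = "https://movie.douban.com/top250"
# Douban typically rejects the default requests User-Agent, so send a browser-like one
headers = {"User-Agent": "Mozilla/5.0"}


def get_douban_movies():
    top_100_movies = []
    # The first 100 movies span four pages: start=0, 25, 50, 75
    for start in range(0, 100, 25):
        response = requests.get(url, params={"start": start}, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'lxml')
        # Each movie sits in a div.item, which holds both the title and the rating
        # (div.hd alone contains the title but not the rating)
        movie_containers = soup.find_all('div', class_='item')
        for container in movie_containers:
            title = container.find('span', class_='title').text
            rating = container.find('span', class_='rating_num').text.strip()
            movie = {'title': title, 'rating': rating}
            top_100_movies.append(movie)
    return top_100_movies[:100]


movies_data = get_douban_movies()
for i, movie in enumerate(movies_data):
    print(f"({i+1}) {movie['title']} - Rating: {movie['rating']}")

# Save the data
with open("douban_top100_movies.txt", "w", encoding="utf-8") as f:
    for movie in movies_data:
        f.write(f"{movie['title']}, {movie['rating']}\n")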
```
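
The plain-text writer in the example joins title and rating with a comma, which becomes ambiguous if a title itself contains a comma. As a minimal sketch, assuming the same list of `{'title', 'rating'}` dictionaries, the storage step could use the standard `csv` module instead. The function names `save_movies_csv` and `load_movies_csv` are hypothetical, chosen only for illustration.

```python
import csv


def save_movies_csv(movies, path):
    """Write a list of {'title', 'rating'} dicts to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "rating"])
        writer.writeheader()
        writer.writerows(movies)


def load_movies_csv(path):
    """Read the file back into a list of dicts (all values come back as strings)."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

With the scraper's output this would be called as `save_movies_csv(movies_data, "douban_top100_movies.csv")`. Quoting is handled by the `csv` module, so titles that contain commas should round-trip intact.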