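Scraping movie rankings in Python with os and bs4: a script that uses requests + bs4 to scrape the Douban Movie Top 250
Posted: 2024-01-08 09:03:52
Scraping movie rankings with Python's bs4 library
Sure, you can use the following code as a reference: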
```python
import os
import time

import requests
from bs4 import BeautifulSoup


# Fetch one page of the Douban Top 250 list; return None on any failure.
def get_html(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
    except requests.RequestException:
        pass
    return None


# Parse one page and extract the fields for each movie.
def parse_html(html):
    # The 'lxml' parser requires the lxml package to be installed.
    soup = BeautifulSoup(html, 'lxml')
    movies = []
    for item in soup.find_all('div', class_='item'):
        # Some entries have no one-line quote, so guard against a missing tag.
        quote = item.find('span', class_='inq')
        movies.append({
            'rank': item.em.text,
            'title': item.find('span', class_='title').text,
            'score': item.find('span', class_='rating_num').text,
            'quote': quote.text if quote else '',
        })
    return movies


# Save the movie records to a local text file.
def save_to_file(movies):
    os.makedirs('./douban_movies', exist_ok=True)
    with open('./douban_movies/top250.txt', 'w', encoding='utf-8') as f:
        for movie in movies:
            f.write('Rank: {}\nTitle: {}\nScore: {}\nQuote: {}\n\n'.format(
                movie['rank'], movie['title'], movie['score'], movie['quote']))


# Scrape the Douban Top 250. The list is paginated, 25 movies per page,
# and the `start` query parameter selects the offset.
def main():
    url = 'https://movie.douban.com/top250'
    movies = []
    for start in range(0, 250, 25):
        html = get_html('{}?start={}'.format(url, start))
        if html is None:
            print('Failed to fetch the page at start={}'.format(start))
            continue
        movies.extend(parse_html(html))
        time.sleep(1)  # pause between requests to go easy on the site
    save_to_file(movies)


if __name__ == '__main__':
    main()
```
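This script scrapes the rank, title, rating, and one-line quote of each movie in the Douban Movie Top 250 and saves them to a local file (`./douban_movies/top250.txt`). You can adapt it to your own needs.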
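One possible modification is to write the records as CSV instead of plain text, which makes them easier to open in a spreadsheet. The sketch below is a minimal example using only the standard library. The function name `save_to_csv` and its default path are assumptions for illustration and are not part of the original script. It expects the list of dictionaries with the keys `rank`, `title`, `score`, and `quote` that `parse_html` builds.

```python
import csv
import os

# Column order for the CSV file; matches the keys built by parse_html.
FIELDS = ['rank', 'title', 'score', 'quote']


# Hypothetical helper (not in the original script): write movie dicts to CSV.
def save_to_csv(movies, path='./douban_movies/top250.csv'):
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # utf-8-sig writes a BOM so that Excel detects the encoding of Chinese
    # titles; newline='' lets the csv module control line endings.
    with open(path, 'w', encoding='utf-8-sig', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(movies)
    return path
```

To try it, `save_to_csv(movies)` could be called in `main()` in place of `save_to_file(movies)`.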