Write a Python program that crawls the Douban Top 250 and saves the results to "list_data.json" in the format {"movie_name":"肖申克的救赎","movie_category":"犯罪 剧情"}
Posted: 2024-02-13 07:59:10
A Python program that crawls the Douban Top 250 and saves the result as a JSON file can be written in the following steps:
1. Import the required libraries:
```python
import requests
import json
from bs4 import BeautifulSoup
```
2. Define a function that fetches the page content:
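```python
def get_page_content(url):
    # Send a browser-like User-Agent; the default requests one may be rejected.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'
    }
    response = requests.get(url, headers=headers, timeout=10)
    # Fail early on an HTTP error instead of parsing an error page.
    response.raise_for_status()
    return response.text
```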
3. Define a function that parses the page content and extracts the movie information:
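```python
def parse_page_content(content):
    soup = BeautifulSoup(content, 'html.parser')
    movie_list = []
    items = soup.find_all('div', class_='item')
    for item in items:
        # The first span.title holds the Chinese title.
        movie_name = item.find('span', class_='title').text
        # span.other holds alternate titles, not the genre. The genre is assumed
        # to be the last '/'-separated field of the info paragraph under div.bd.
        info = item.find('div', class_='bd').find('p').text
        movie_category = info.strip().split('\n')[-1].split('/')[-1].strip().replace('\xa0', ' ')
        movie_info = {
            'movie_name': movie_name,
            'movie_category': movie_category
        }
        movie_list.append(movie_info)
    return movie_list
```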
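The extraction can be tried offline before crawling anything. The sketch below pulls the title and genre out of a hand-written HTML fragment. The fragment is an assumption modeled on one entry of the list page, not a copy of the live markup, and `extract_movie` is a hypothetical helper name. It reads the genre from the last `/`-separated field of the info paragraph under `div.bd`.

```python
from bs4 import BeautifulSoup

# Hand-written sample modeled on one list entry; the real page may differ.
SAMPLE_HTML = """
<div class="item">
  <div class="hd">
    <span class="title">肖申克的救赎</span>
    <span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>
    <span class="other">&nbsp;/&nbsp;月黑高飞(港)</span>
  </div>
  <div class="bd">
    <p>
      导演: 弗兰克·德拉邦特<br>
      1994&nbsp;/&nbsp;美国&nbsp;/&nbsp;犯罪 剧情
    </p>
  </div>
</div>
"""


def extract_movie(item):
    """Return the record for one <div class="item"> element (hypothetical helper)."""
    # The first span.title holds the Chinese title.
    name = item.find('span', class_='title').get_text(strip=True)
    # The last line of the info paragraph reads "year / country / genres".
    info_lines = item.find('div', class_='bd').find('p').get_text().strip().split('\n')
    category = info_lines[-1].split('/')[-1].strip()
    return {'movie_name': name, 'movie_category': category}


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
records = [extract_movie(item) for item in soup.find_all('div', class_='item')]
print(records)
```

If the site changes its markup, this kind of offline check is a quick way to see which selector stopped matching.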
4. Define a function that saves the movie information to a JSON file:
```python
def save_to_json(data, filename):
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
```
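The `ensure_ascii=False` argument is what keeps the Chinese titles readable in the output file. As a small illustration using the sample record from the question, the sketch below serializes it both ways with `json.dumps`:

```python
import json

record = {'movie_name': '肖申克的救赎', 'movie_category': '犯罪 剧情'}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)
# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same data, so the choice only affects how the file looks to a human reader. Opening the file with `encoding='utf-8'` matters for the second form, since the characters are written directly.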
5. The main program:
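```python
if __name__ == '__main__':
    url = 'https://movie.douban.com/top250'
    movie_list = []
    # The list shows 25 movies per page, so step the start offset from 0 to 225.
    for start in range(0, 250, 25):
        page_content = get_page_content(f'{url}?start={start}')
        movie_list.extend(parse_page_content(page_content))
    save_to_json(movie_list, 'list_data.json')
```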
The program then crawls the Douban Top 250 and saves each movie's name and category as JSON in "list_data.json".
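Once the file exists, it can be worth confirming that every entry matches the requested format. The following is a minimal sketch of such a check; `find_bad_records` and `check_file` are hypothetical helper names, and the rule that both values must be non-empty strings is an assumption rather than something the question states.

```python
import json

EXPECTED_KEYS = {'movie_name', 'movie_category'}


def find_bad_records(records):
    """Return the entries that do not have exactly the two expected string fields."""
    return [
        r for r in records
        if not isinstance(r, dict)
        or set(r) != EXPECTED_KEYS
        or not all(isinstance(v, str) and v for v in r.values())
    ]


def check_file(filename):
    """Load a saved JSON file and return (record count, malformed records)."""
    with open(filename, encoding='utf-8') as f:
        records = json.load(f)
    return len(records), find_bad_records(records)
```

Calling `check_file('list_data.json')` after a run returns the number of saved movies and a list of malformed entries, which should be empty if parsing worked on every item.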