Python crawler: counting films per region in the Douban Top 250
Date: 2023-07-11 08:58:43
Sure, I can help with that. First, install the `requests` and `BeautifulSoup` libraries (`pip install requests beautifulsoup4`). Then follow these steps:
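1. Define a function that fetches the HTML of a given page:
```python
import requests

# Assumption: Douban tends to reject requests without a browser-like User-Agent.
HEADERS = {'User-Agent': 'Mozilla/5.0'}

def get_html(url):
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text
```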
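Requests to a busy site can fail intermittently, so it may be worth wrapping the fetch in a small retry loop. The following is a minimal sketch, not part of the original answer: `fetch_with_retry`, `attempts`, and `delay` are hypothetical names, and `fetch` stands for any function that takes a URL and returns HTML, such as `get_html`.

```python
import time

def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url), retrying when it raises (hypothetical helper for this sketch)."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # deliberately broad for the sketch
            last_error = error
            if attempt < attempts - 1:
                # Wait a little longer after each failed attempt.
                time.sleep(delay * (attempt + 1))
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

It could then be called as `fetch_with_retry(get_html, url)`. Catching every exception keeps the sketch short; real code would probably narrow this to network errors.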
2. Define a function that parses the HTML and extracts the film information:
```python
from bs4 import BeautifulSoup

def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Each film is an <li> inside the list with class "grid_view".
    movie_list = soup.find(class_='grid_view').find_all('li')
    movies = []
    for movie in movie_list:
        # The first "title" span holds the primary title.
        title = movie.find('span', class_='title').text
        # First <p> under "bd": director and cast, then "year / region / genre".
        info = movie.find('div', class_='bd').p.text.strip()
        rating = movie.find('span', class_='rating_num').text
        link = movie.find('div', class_='hd').a['href']
        movies.append({
            'title': title,
            'info': info,
            'rating': rating,
            'link': link
        })
    return movies
```
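It helps to look at what the `info` text contains before counting anything. As far as I can tell, its first line lists the director and cast, and only its last line carries `year / region(s) / genre(s)`. Here is a minimal sketch of splitting that last line into fields. `split_info` is a hypothetical helper, and the sample string is modeled on the page layout rather than fetched from it.

```python
def split_info(info):
    """Split the `info` text into year, regions and genres (hypothetical helper)."""
    # Only the last line carries "year / region(s) / genre(s)".
    last_line = info.strip().split('\n')[-1]
    parts = [part.strip() for part in last_line.split('/')]
    if len(parts) < 3:
        return None  # unexpected layout
    return {
        'year': parts[0],
        # Index from the end, in case an entry lists more than one year.
        'regions': parts[-2].split(),
        'genres': parts[-1].split(),
    }

# Sample text modeled on the page layout (an assumption, not fetched data).
sample = (
    '导演: 弗兰克·德拉邦特 Frank Darabont\xa0\xa0\xa0主演: 蒂姆·罗宾斯 Tim Robbins /...\n'
    '                            1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情'
)
print(split_info(sample))
```

The fields on the page are separated by non-breaking spaces (`\xa0`), which `str.strip()` and `str.split()` treat as whitespace, so no special handling is needed.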
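3. Define a function that counts the films per region. The region is on the last line of `info`, and a co-production is counted once for each region it lists:
```python
def count_country(movies):
    country_count = {}
    for movie in movies:
        # The last line of `info` reads "year / region(s) / genre(s)";
        # the first line holds director and cast, so skip it.
        last_line = movie['info'].strip().split('\n')[-1]
        parts = [part.strip() for part in last_line.split('/')]
        if len(parts) < 3:
            continue  # unexpected layout, skip this entry
        # Index from the end in case several years are listed; regions are space-separated.
        for country in parts[-2].split():
            country_count[country] = country_count.get(country, 0) + 1
    return country_count
```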
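The same kind of tally can also be written with `collections.Counter` from the standard library, which adds sorting by frequency for free. This is a minimal sketch under the assumption that the regions of each film have already been extracted into a list. `count_regions` is a hypothetical helper and the input is illustrative, not scraped data.

```python
from collections import Counter

def count_regions(region_lists):
    """Tally regions from per-film region lists (hypothetical helper)."""
    counter = Counter()
    for regions in region_lists:
        # A co-production adds one to each of its regions.
        counter.update(regions)
    return counter

# Illustrative input: three films, one of them a co-production.
sample = [['美国'], ['中国大陆', '中国香港'], ['美国']]
for region, count in count_regions(sample).most_common():
    print(region, count)
```

Because co-productions are counted under every region they list, the totals can add up to more than the number of films.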
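4. Finally, call the functions as follows. The list is spread over 10 pages of 25 films each, so loop over the `start` query parameter to cover all 250:
```python
import time

base_url = 'https://movie.douban.com/top250'
movies = []
for start in range(0, 250, 25):
    html = get_html(f'{base_url}?start={start}')
    movies.extend(parse_html(html))
    time.sleep(1)  # pause between requests to go easy on the site
country_count = count_country(movies)
# Print regions from most to least frequent.
for country, count in sorted(country_count.items(), key=lambda kv: kv[1], reverse=True):
    print(country, count)
```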
This gives you the number of films per region.