Use a web scraper to collect the movie titles and ratings from the Douban Top 250. Scrape all 250 entries and save them to a CSV file.
Time: 2024-02-29 18:57:33
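Sure. You need Python and the third-party libraries requests and beautifulsoup4; the csv module ships with Python's standard library, so it needs no installation. The code below does what you asked.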
```python
import csv

import requests
from bs4 import BeautifulSoup

URL = 'https://movie.douban.com/top250'
# Send a browser-like User-Agent with every request.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}


def get_soup(page):
    """Fetch one results page (25 movies per page) and parse it."""
    link = URL + '?start=' + str(page * 25)
    r = requests.get(link, headers=HEADERS, timeout=10)
    r.raise_for_status()  # fail loudly if the request was rejected
    return BeautifulSoup(r.text, 'html.parser')


def get_movies():
    """Collect the movie titles from all 10 pages."""
    movie_list = []
    for i in range(10):
        # The title is the first <span> of the link inside <div class="hd">.
        for each in get_soup(i).find_all('div', class_='hd'):
            movie_list.append(each.a.span.text.strip())
    return movie_list


def get_ratings():
    """Collect the ratings from all 10 pages."""
    rating_list = []
    for i in range(10):
        # The rating is the second <span> inside <div class="star">.
        for each in get_soup(i).find_all('div', class_='star'):
            rating_list.append(each.find_all('span')[1].text.strip())
    return rating_list


if __name__ == '__main__':
    movies = get_movies()
    ratings = get_ratings()
    with open('top250.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Title', 'Rating'])
        # zip avoids an IndexError if fewer than 250 rows were scraped.
        for movie, rating in zip(movies, ratings):
            writer.writerow([movie, rating])
    print('Data written to the CSV file successfully!')
```
The code above scrapes the movie titles and ratings from the Douban Top 250 and stores the results in `top250.csv`. Copy it into your Python editor and run it.
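To check the parsing logic without sending any requests, the extraction step can be tried on a small HTML string first. The following is a minimal sketch, not part of the original answer: `SAMPLE_HTML`, `parse_page` and `rows_to_csv` are hypothetical names, and the sample markup only imitates the `hd` and `star` blocks that the scraper looks for, so the real page may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup: it only mimics the <div class="hd"> and
# <div class="star"> blocks the scraper relies on, not the full Douban page.
SAMPLE_HTML = """
<ol>
  <li>
    <div class="hd"><a href="#"><span class="title">Movie A</span><span class="title"> / Alt A</span></a></div>
    <div class="star"><span class="rating5-t"></span><span class="rating_num">9.7</span></div>
  </li>
  <li>
    <div class="hd"><a href="#"><span class="title">Movie B</span></a></div>
    <div class="star"><span class="rating45-t"></span><span class="rating_num">9.5</span></div>
  </li>
</ol>
"""


def parse_page(html):
    """Return (title, rating) pairs found in one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # First <span> inside the link of each "hd" block is the title.
    titles = [hd.a.span.text.strip() for hd in soup.find_all("div", class_="hd")]
    # Second <span> inside each "star" block is the rating.
    ratings = [
        star.find_all("span")[1].text.strip()
        for star in soup.find_all("div", class_="star")
    ]
    # zip stops at the shorter list, so a short page cannot raise IndexError.
    return list(zip(titles, ratings))


def rows_to_csv(rows):
    """Render the rows as CSV text in memory instead of writing a file."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["Title", "Rating"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_page(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Parsing titles and ratings from the same HTML string also means each page only has to be downloaded once if this helper is reused for the live pages.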