Scraping the plot summaries of the Douban Movie Top 250
Time: 2023-08-31 19:06:59 Views: 126
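Sure. Here is how to scrape the plot summaries of the Douban Movie Top 250:
1. First, install Python and the required libraries (requests, beautifulsoup4, and pandas) using the method described above.
2. Next, you can use the following code to fetch the details and plot summary of each movie in the Douban Top 250: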
```
import time

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Assumption: Douban tends to reject requests that lack a browser-like User-Agent.
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def get_soup(url):
    # Fetch a page and parse it; raise on HTTP errors instead of parsing an error page.
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')


def get_movie_info(url):
    # Collect title, rating, quote, and detail-page link for each movie on one list page.
    soup = get_soup(url)
    movie_info_list = []
    for movie in soup.find_all('div', class_='info'):
        quote = movie.find('span', class_='inq')  # some entries may have no quote
        movie_info_list.append({
            'title': movie.find('span', class_='title').text,
            'rating': movie.find('span', class_='rating_num').text,
            'quote': quote.text if quote else '',
            'link': movie.find('a')['href'],
        })
    return movie_info_list


def get_movie_summary(url):
    # Extract the plot summary from a movie's detail page ('' if it is missing).
    summary = get_soup(url).find('span', property='v:summary')
    return summary.text.strip() if summary else ''


movie_info_list = []
# The list is paginated, 25 movies per page, via the start parameter.
for i in range(0, 250, 25):
    url = f'https://movie.douban.com/top250?start={i}&filter='
    for movie in get_movie_info(url):
        movie['summary'] = get_movie_summary(movie['link'])
        movie_info_list.append(movie)
        time.sleep(1)  # pause between detail-page requests to go easy on the site

df = pd.DataFrame(movie_info_list)
df.to_csv('douban_top250_summary.csv', index=False)
```
3. After running the code, you will get a CSV file named "douban_top250_summary.csv" containing the Douban Top 250 movie details: title, rating, quote, link, and plot summary.
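Because a scrape like this can silently miss fields (for example, if a detail page fails to load or its layout differs), it may be worth sanity-checking the CSV afterwards. The following is only a minimal sketch using the standard library. The helper name `find_incomplete_rows` and the sample rows are hypothetical; the column names are the ones the script above writes.

```python
import csv
import io


def find_incomplete_rows(csv_file, required=("title", "summary")):
    """Return (row_count, titles of rows missing any required field).

    Hypothetical helper for illustration; not part of the original script.
    """
    reader = csv.DictReader(csv_file)
    total = 0
    incomplete = []
    for row in reader:
        total += 1
        # A field counts as missing if it is absent or blank after stripping.
        if any(not (row.get(field) or "").strip() for field in required):
            incomplete.append(row.get("title") or f"<row {total}>")
    return total, incomplete


# Small in-memory sample standing in for douban_top250_summary.csv (made-up rows).
sample = io.StringIO(
    "title,rating,quote,link,summary\n"
    "Movie A,9.7,Some quote,https://example.com/a,A short plot.\n"
    "Movie B,9.6,,https://example.com/b,\n"
)
total, incomplete = find_incomplete_rows(sample)
print(total, incomplete)  # prints: 2 ['Movie B']
```

To check the real output, open the file with `open('douban_top250_summary.csv', newline='', encoding='utf-8')` and pass the file object to the helper. A complete run should report 250 rows, and any titles listed as incomplete can be fetched again.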
Hope this helps you get the information you need!