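Scrape the Douban Movie Top 250 with Python and save it in Excel format, extracting the film title, stars, film info, short description, number of reviewers, country of release, and release date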


Date: 2024-02-14 15:17:16

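Sure, here is a concrete implementation:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://movie.douban.com/top250'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

movies = []
# The list is split into 10 pages of 25 films; the `start` query parameter sets the offset
for start in range(0, 250, 25):
    # Send the HTTP request and fetch the page content
    response = requests.get(url, headers=headers, params={'start': start})
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Parse the page and extract the film information
    for tag in soup.select('ol.grid_view > li'):
        title = tag.select('span.title')[0].text.strip()
        star = tag.select('div.star > span.rating_num')[0].text.strip()  # rating score
        info = tag.select('div.bd > p')[0].text.strip()
        desc = tag.select('span.inq')[0].text.strip() if tag.select('span.inq') else ''
        # The last span reads like "123456人评价"; drop the 3-character suffix
        comment_count = tag.select('div.star > span')[-1].text.strip()[:-3]
        # The last line of info reads like "1994 / 美国 / 犯罪 剧情"
        meta = [part.strip() for part in info.split('\n')[-1].split('/')]
        release_date = meta[0]
        country = meta[-2] if len(meta) >= 3 else ''
        movies.append([title, star, info, desc, comment_count, country, release_date])

# Store the film information in a DataFrame
df = pd.DataFrame(movies, columns=['电影名称', '明星', '电影信息', '电影简述', '电影评论人数', '上映国家', '上映时间'])

# Export to an Excel file
df.to_excel('豆瓣电影前250名.xlsx', index=False)
```

This code uses the requests library to send HTTP requests to the Douban Movie site, BeautifulSoup to parse each HTML page and extract the film information, and pandas to collect the results in a DataFrame. Finally, the to_excel method exports the DataFrame to an Excel file; writing .xlsx requires an Excel engine such as openpyxl to be installed.

A few points to note:

- The list is paginated at 25 films per page, so the request is repeated with start set to 0, 25, ..., 225 to cover all 250 entries.
- The 明星 column holds the rating score taken from span.rating_num, not cast names. The director and lead actors appear inside the 电影信息 text.
- The year and country are read from the last line of the info text, where the year comes first and the country is the second-to-last field.
- To reduce the chance of being blocked by Douban, the request headers include a User-Agent string that imitates a normal browser visit.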
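The country and release-year fields come from a slash-separated line at the end of each film's info text, which is the most fragile part of the scraper. As a minimal sketch, that step can be isolated into a helper and checked offline without hitting the site. The function name `parse_meta_line` and the `sample` string are hypothetical and only illustrate the assumed shape `year / country / genres`; the real page text may differ.

```python
import re


def parse_meta_line(info: str) -> dict:
    """Split the last non-empty line of an info block into year, country, genres.

    Assumes the line is shaped like "1994 / 美国 / 犯罪 剧情" (an assumption
    about the page layout, not a guarantee).
    """
    empty = {"year": "", "country": "", "genres": ""}
    lines = [line.strip() for line in info.splitlines() if line.strip()]
    if not lines:
        return empty
    # str.strip() also removes the non-breaking spaces (\xa0) around each slash
    parts = [part.strip() for part in lines[-1].split("/")]
    if len(parts) < 3:
        return empty
    # The first field may carry a region note such as "1961(中国大陆)"; keep the 4 digits
    year_match = re.match(r"\d{4}", parts[0])
    return {
        "year": year_match.group(0) if year_match else parts[0],
        # Counting from the end tolerates entries that list several release years
        "country": parts[-2],
        "genres": parts[-1],
    }


# Hypothetical sample text in the assumed layout
sample = "导演: 弗兰克·德拉邦特 Frank Darabont\xa0\xa0\xa0主演: 蒂姆·罗宾斯 Tim Robbins /...\n    1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情"
print(parse_meta_line(sample))
```

Reading the country as the second-to-last field rather than the second one is a design choice: it keeps working when an entry lists more than one release year before the country.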
