Complete code for scraping Maoyan movie comments in Jupyter
Date: 2023-07-03 09:25:06 Views: 112
Sure, here is the complete code for scraping Maoyan movie comments:
```python
import requests
from lxml import etree
import pandas as pd

# Detail page of the movie; used only to read the movie title.
url = 'https://maoyan.com/films/1299372'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
# Raises IndexError if no matching <h1> is found in the returned page.
movie_name = html.xpath("//h1[@class='name']/text()")[0]

comments_list = []
# Request the comments JSON for offsets 1 through 10.
for i in range(1, 11):
    url = f'https://m.maoyan.com/mmdb/comments/movie/1299372.json?_v_=yes&offset={i}'
    response = requests.get(url, headers=headers)
    data = response.json()['cmts']
    for item in data:
        comment = item['content']
        comments_list.append(comment)

# utf-8-sig writes a BOM so that Excel detects the encoding of Chinese text.
df = pd.DataFrame(comments_list, columns=['comments'])
df.to_csv(f'{movie_name}_comments.csv', index=False, encoding='utf-8-sig')
```
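The script uses the requests library to send an HTTP request for the Maoyan movie detail page, then parses the returned HTML with lxml to extract the movie title. Next, it loops over offsets 1 through 10 of the comments endpoint, requests the JSON response for each one, and appends the content of every comment to a list. Finally, pandas converts the comment list into a DataFrame and saves that DataFrame as a CSV file.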
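The parsing and saving steps described above can also be tried without touching the network. The following is only a minimal sketch under some assumptions: the helper names `extract_comments`, `merge_pages`, and `to_csv_text` are hypothetical, the sample payloads are made up for illustration, and only the `cmts` and `content` keys come from the script above. It uses the standard-library `csv` module in place of pandas so that it runs anywhere, and it drops repeated comment texts in case consecutive offsets return overlapping results.

```python
import csv
import io


def extract_comments(payload):
    """Return the comment texts found in one decoded JSON response."""
    # 'cmts' and 'content' are the keys used by the script above;
    # a missing or null 'cmts' is treated as "no comments in this response".
    items = payload.get('cmts') or []
    return [item['content'] for item in items if item.get('content')]


def merge_pages(pages):
    """Flatten several responses into one list, dropping exact duplicate texts."""
    seen = set()
    merged = []
    for payload in pages:
        for text in extract_comments(payload):
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged


def to_csv_text(comments):
    """Serialize comments as a one-column CSV with a 'comments' header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['comments'])
    writer.writerows([text] for text in comments)
    return buffer.getvalue()


# Made-up payloads standing in for two overlapping responses (assumption).
sample_pages = [
    {'cmts': [{'content': 'Great film'}, {'content': 'Too long'}]},
    {'cmts': [{'content': 'Too long'}, {'content': 'Loved the ending'}]},
    {'total': 0},  # a response with no 'cmts' key at all
]

comments = merge_pages(sample_pages)
csv_text = to_csv_text(comments)
print(csv_text)
```

Deduplicating by text is a simplification: it would also merge identical comments written by different users, so a real run might prefer a unique identifier if the response provides one.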