Scraping Bilibili comments with Python: fetch a video's comments and build a word cloud
Time: 2023-12-02 13:32:26
Sure, happy to answer your question!
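First, install the libraries the code uses: requests (HTTP requests), jieba (Chinese word segmentation), wordcloud (word cloud rendering), and matplotlib (displaying the image). You can install them with the following commands:
```
pip install requests
pip install jieba
pip install wordcloud
pip install matplotlib
```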
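Next, we need the video's BV id (the `BV...` string in the video URL). We first call Bilibili's API to convert it to the av number (`aid`), then use the `aid` to fetch the comment data. These are unofficial web APIs, and Bilibili may change them or restrict anonymous access at any time. The code is as follows: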
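```python
import requests

# A browser-like User-Agent shared by both requests
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}

# Fetch the comments of a Bilibili video
def get_bilibili_comment(aid):
    url = "https://api.bilibili.com/x/v2/reply?pn={}&type=1&oid={}&sort=2"
    comments = []
    for i in range(1, 6):  # fetch the first 5 pages of comments
        response = requests.get(url.format(i, aid), headers=HEADERS, timeout=10)
        response.raise_for_status()  # fail loudly on HTTP errors
        data = response.json()
        if data.get("code") != 0:  # the API reports errors through a non-zero "code"
            raise RuntimeError(f"API error {data.get('code')}: {data.get('message')}")
        replies = (data.get("data") or {}).get("replies") or []
        if not replies:  # "replies" may be null or empty once there are no more comments
            break
        for comment in replies:
            comments.append(comment["content"]["message"])
    return comments

# Get the video's av number (aid) from its BV id
def get_bilibili_av(bv):
    url = "https://api.bilibili.com/x/web-interface/view?bvid={}"
    response = requests.get(url.format(bv), headers=HEADERS, timeout=10)
    response.raise_for_status()
    data = response.json()
    if data.get("code") != 0:
        raise RuntimeError(f"API error {data.get('code')}: {data.get('message')}")
    return data["data"]["aid"]
```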
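The reply API can return `null` for `replies` (for example, when paging past the last comment) and signals failures through a non-zero `code` field in the JSON body. Keeping the parsing in a small function makes it testable without network access. A minimal sketch, assuming the payload shape used in the code above (`code`, `data.replies[].content.message`); `extract_messages` is a hypothetical helper name and the sample payloads are made up for illustration:

```python
import json

# Hypothetical helper: pull comment texts out of one page of the reply API's JSON.
# Assumed shape: {"code": 0, "data": {"replies": [{"content": {"message": ...}}]}}
def extract_messages(payload):
    if payload.get("code") != 0:
        # Assumed error convention: non-zero "code" plus a human-readable "message"
        raise RuntimeError(f"API error {payload.get('code')}: {payload.get('message')}")
    # "data" or "replies" may be null; treat that as a page with no comments
    replies = (payload.get("data") or {}).get("replies") or []
    return [reply["content"]["message"] for reply in replies]

# Made-up sample payloads, for illustration only
page = json.loads('{"code": 0, "data": {"replies": [{"content": {"message": "好看"}}, {"content": {"message": "三连了"}}]}}')
empty = json.loads('{"code": 0, "data": {"replies": null}}')
print(extract_messages(page))   # ['好看', '三连了']
print(extract_messages(empty))  # []
```

Inside the paging loop you would call `extract_messages(response.json())` and stop requesting further pages once it returns an empty list.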
Then we use jieba to segment the comment text into words and collect all the tokens in one list. The code is as follows:
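```python
import jieba

# Segment every comment with jieba and collect all tokens in a single list
def cut_words(comments):
    words = []
    for comment in comments:
        words += jieba.lcut(comment)  # lcut returns the tokens as a list
    return words
```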
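`jieba.lcut` returns every token, including punctuation, whitespace, single characters, and filler words, which tend to dominate a word cloud. Bilibili comments also embed emotes as bracketed codes such as `[doge]`. A minimal cleanup sketch, assuming emote codes always look like `[name]` and using a small illustrative `STOPWORDS` set (not a standard list; in practice you would load a fuller Chinese stopword file). `strip_emotes` and `filter_tokens` are hypothetical helper names:

```python
import re

# Illustrative stopword set only; replace with a proper Chinese stopword list
STOPWORDS = {"我们", "你们", "这个", "就是", "什么", "一个", "没有", "不是"}

# Assumed emote format in comment text: [name], e.g. [doge]
EMOTE_RE = re.compile(r"\[[^\[\]]+\]")

def strip_emotes(text):
    """Remove [emote] codes from a comment before segmenting it."""
    return EMOTE_RE.sub("", text)

def filter_tokens(tokens, stopwords=STOPWORDS):
    """Keep tokens of 2+ characters that are not stopwords and contain a letter or digit."""
    kept = []
    for token in tokens:
        token = token.strip()
        if len(token) < 2 or token in stopwords:
            continue
        if not any(ch.isalnum() for ch in token):  # drops punctuation runs like "……"
            continue
        kept.append(token)
    return kept

print(strip_emotes("太好看了[doge][笑哭]"))  # 太好看了
# Tokens written by hand to mimic jieba output, for illustration
print(filter_tokens(["太", "好看", "了", "，", "……", "UP主", " ", "就是", "三连"]))  # ['好看', 'UP主', '三连']
```

Dropping single-character tokens is a design choice that removes most particles (的, 了, 吧) at once, at the cost of also dropping meaningful one-character words. In `cut_words` the per-comment step would become `filter_tokens(jieba.lcut(strip_emotes(comment)))`.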
Finally, we use the wordcloud library to draw the word cloud. The code is as follows:
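```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Build and display the word cloud
def make_wordcloud(words):
    text = " ".join(words)  # WordCloud.generate expects space-separated text
    # font_path must name a font with Chinese glyphs: "msyh.ttc" (Microsoft YaHei) is
    # normally found on Windows; on other systems pass the full path to a CJK font
    wc = WordCloud(background_color="white", max_words=200, font_path="msyh.ttc")
    wc.generate(text)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```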
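`WordCloud.generate(text)` re-tokenizes the joined string with its own regular expression and applies its built-in (English) stopword list and, by default, word-pair collocations. Since jieba has already segmented the text, an alternative is to count the tokens yourself and pass the counts to `WordCloud.generate_from_frequencies`. A minimal sketch of the counting step with `collections.Counter`; `top_frequencies` is a hypothetical helper name:

```python
from collections import Counter

def top_frequencies(words, max_words=200):
    """Count tokens and keep the `max_words` most common as a {word: count} dict."""
    return dict(Counter(words).most_common(max_words))

freqs = top_frequencies(["好看", "三连", "好看", "UP主", "好看", "三连"], max_words=2)
print(freqs)  # {'好看': 3, '三连': 2}
```

In `make_wordcloud` you could then call `wc.generate_from_frequencies(top_frequencies(words))` instead of `wc.generate(text)`, keeping the same `WordCloud(...)` settings. If you keep `generate`, passing `collocations=False` to `WordCloud` stops adjacent word pairs from being shown as a single phrase.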
The complete code is as follows:
```python
import requests
import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# A browser-like User-Agent shared by both requests
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}

# Fetch the comments of a Bilibili video
def get_bilibili_comment(aid):
    url = "https://api.bilibili.com/x/v2/reply?pn={}&type=1&oid={}&sort=2"
    comments = []
    for i in range(1, 6):  # fetch the first 5 pages of comments
        response = requests.get(url.format(i, aid), headers=HEADERS, timeout=10)
        response.raise_for_status()  # fail loudly on HTTP errors
        data = response.json()
        if data.get("code") != 0:  # the API reports errors through a non-zero "code"
            raise RuntimeError(f"API error {data.get('code')}: {data.get('message')}")
        replies = (data.get("data") or {}).get("replies") or []
        if not replies:  # "replies" may be null or empty once there are no more comments
            break
        for comment in replies:
            comments.append(comment["content"]["message"])
    return comments

# Get the video's av number (aid) from its BV id
def get_bilibili_av(bv):
    url = "https://api.bilibili.com/x/web-interface/view?bvid={}"
    response = requests.get(url.format(bv), headers=HEADERS, timeout=10)
    response.raise_for_status()
    data = response.json()
    if data.get("code") != 0:
        raise RuntimeError(f"API error {data.get('code')}: {data.get('message')}")
    return data["data"]["aid"]

# Segment every comment with jieba and collect all tokens in a single list
def cut_words(comments):
    words = []
    for comment in comments:
        words += jieba.lcut(comment)
    return words

# Build and display the word cloud
def make_wordcloud(words):
    text = " ".join(words)
    # font_path must name a font with Chinese glyphs (msyh.ttc = Microsoft YaHei on Windows)
    wc = WordCloud(background_color="white", max_words=200, font_path="msyh.ttc")
    wc.generate(text)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()

if __name__ == "__main__":
    bv = input("Enter the Bilibili video's BV id: ").strip()
    aid = get_bilibili_av(bv)
    comments = get_bilibili_comment(aid)
    words = cut_words(comments)
    make_wordcloud(words)
```
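The script expects a bare BV id, but it is common to paste the whole video URL instead. A minimal sketch that accepts either form, assuming BV ids are `BV` followed by 10 letters or digits (as in `https://www.bilibili.com/video/BV1xx411c7mD`); `extract_bvid` is a hypothetical helper name:

```python
import re

# Assumed BV id format: "BV" followed by 10 alphanumeric characters
BVID_RE = re.compile(r"BV[0-9A-Za-z]{10}")

def extract_bvid(text):
    """Return the first BV id found in a URL or raw input, or raise ValueError."""
    match = BVID_RE.search(text)
    if match is None:
        raise ValueError(f"no BV id found in {text!r}")
    return match.group(0)

print(extract_bvid("https://www.bilibili.com/video/BV1xx411c7mD?p=1"))  # BV1xx411c7mD
print(extract_bvid("  BV1xx411c7mD  "))  # BV1xx411c7mD
```

In the `__main__` block the input line would become `bv = extract_bvid(input(...))`, so a pasted URL and a bare id both work.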
With this, we can use Python to scrape a Bilibili video's comments and turn them into a word cloud.