Python code for crawling Weibo by keyword
Posted: 2023-11-16 18:59:48
Below is an example of Python code that crawls Weibo search results for a given keyword:
```python
import requests
from bs4 import BeautifulSoup
import jieba
from wordcloud import WordCloud

# Request headers; a logged-in Weibo cookie is usually required as well (see the note below)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Fetch one page of search results for the keyword
def get_page(keyword, page):
    url = 'https://s.weibo.com/weibo?q=' + keyword + '&page=' + str(page)
    response = requests.get(url, headers=headers)
    return response.text

# Parse a result page and collect the post text
def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    content_list = soup.find_all('p', class_='txt')
    content = ''
    for item in content_list:
        content += item.text.strip()
    return content

# Segment the collected text with jieba
def cut_words(content):
    word_list = jieba.cut(content)
    words = ' '.join(word_list)
    return words

# Render a word cloud image; font_path must point to a font with Chinese glyphs
def generate_wordcloud(words):
    wc = WordCloud(background_color='white', width=800, height=600, max_words=200, font_path='msyh.ttc')
    wc.generate(words)
    wc.to_file('wordcloud.png')

# Entry point: read the keyword and page count, then crawl, segment, and plot
def main():
    keyword = input('Keyword to search: ')
    page = int(input('Number of pages to crawl: '))
    content = ''
    for i in range(1, page + 1):
        html = get_page(keyword, i)
        content += get_content(html)
    words = cut_words(content)
    generate_wordcloud(words)

if __name__ == '__main__':
    main()
```
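For reference, `jieba.cut` returns a generator of tokens, which `cut_words` joins with spaces so that `WordCloud.generate` can count word frequencies. A quick standalone check of that step (the sample sentence is arbitrary, and the exact segmentation may vary with the jieba version and dictionary):

```python
import jieba

sample = '今天微博热搜话题讨论非常热烈'
# Prints a space-separated segmentation, e.g. something like '今天 微博 热搜 话题 讨论 非常 热烈'
print(' '.join(jieba.cut(sample)))
```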
The code above takes a keyword and a page count, crawls that many pages of Weibo search results, segments the collected text, and generates a word cloud image. Note that crawling Weibo normally requires logging into an account and supplying its cookie; without one, requests are intercepted by the anti-crawling mechanism (typically redirected to the login page). A minimal way to attach a cookie is sketched below.
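As a minimal sketch, the cookie can be passed through the request headers. The `WEIBO_COOKIE` value below is a placeholder; you would copy the `Cookie` header from your own logged-in browser session (for example, from the Network tab of the browser's developer tools):

```python
import requests

# Placeholder: paste the Cookie header value from a logged-in Weibo session here
WEIBO_COOKIE = 'SUB=...; SUBP=...'  # hypothetical field names; copy your own value

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
    'Cookie': WEIBO_COOKIE,  # without this, s.weibo.com usually redirects to the login page
}

# Using params= lets requests handle URL-encoding of the Chinese keyword
response = requests.get('https://s.weibo.com/weibo',
                        params={'q': '关键词', 'page': 1},
                        headers=headers)
print(response.status_code, len(response.text))
```

If the cookie is valid, the response contains the rendered search results that `get_content` parses; otherwise the returned HTML is the login page and `find_all('p', class_='txt')` finds nothing.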