
Python: scraping the "Hot Topics" section of Zhihu's "Explore" page and saving the questions and answers as text (complete code)

Date: 2024-09-09 22:03:02 Views: 103

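In Python, you can use the `requests` library to send HTTP requests and fetch the page content, then parse the HTML with a library such as `BeautifulSoup` or `lxml`. Below is a basic example that scrapes the hot topics on Zhihu's "Explore" page together with their questions and answers, and saves the results to a text file. The class names used as selectors (`TopicItem`, `_2p5J6z`) depend on the page's current markup and may need adjusting.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Send a GET request to the Zhihu Explore page
url = "https://www.zhihu.com/explore"
# Browser-like User-Agent; some servers reject the default requests one
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)

# Defined up front so the save step still works if the request fails
data = []

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Locate the hot-topics list (find() returns None if nothing matches)
    topics = soup.find('div', {'class': 'TopicItem'})

    # Extract the question and answers for each topic
    for topic in (topics.find_all('li') if topics else []):
        link = topic.find('a')
        if link is None or not link.get('href'):
            continue
        title = link.text.strip()  # question title
        # Link to the question detail page; resolve relative hrefs
        question_url = urljoin(url, link['href'])

        # Fetch the question detail page and extract the answers
        detail_response = requests.get(question_url, headers=headers, timeout=10)
        detail_soup = BeautifulSoup(detail_response.text, 'html.parser')
        answers = [answer.text.strip()
                   for answer in detail_soup.find_all('span', {'class': '_2p5J6z'})]  # all answers

        data.append({
            'title': title,
            'answers': answers,
        })
else:
    print("Request failed")

# Save the data to a text file
with open('zhishi_hot_topics.txt', 'w', encoding='utf-8') as f:
    for item in data:
        # Joined outside the f-string: a backslash inside an f-string
        # expression is a syntax error before Python 3.12
        answers_text = '---\n'.join(item['answers'])
        f.write(f"{item['title']}\n{answers_text}\n\n")

print("Data saved to zhishi_hot_topics.txt")
```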
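
The save step can be checked without any network access if the text formatting is separated from the scraping. The following is a minimal sketch of that idea, not part of the original answer: `format_records` and its `separator` parameter are hypothetical names, and it assumes each record has the same `title`/`answers` shape as the dictionaries built in the code above. The layout also differs slightly from the original, placing a separator line before every answer.

```python
def format_records(records, separator="---"):
    """Render a list of {'title': str, 'answers': [str, ...]} dicts as plain text.

    Hypothetical helper: each question title is followed by its answers,
    with a separator line before every answer.
    """
    blocks = []
    for record in records:
        lines = [record["title"]]
        for answer in record["answers"]:
            lines.append(separator)
            lines.append(answer)
        blocks.append("\n".join(lines))
    if not blocks:
        return ""
    # Blank line between questions, single newline at the end of the file
    return "\n\n".join(blocks) + "\n"


# Made-up sample data standing in for scraped results
sample = [
    {"title": "Question A", "answers": ["First answer", "Second answer"]},
    {"title": "Question B", "answers": []},
]
text = format_records(sample)
print(text)
```

The returned string can be passed to `f.write(...)` in place of the loop in the save step, which keeps the file layout in one place and makes it easy to test with sample data.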
