Saving scraped forum post replies to a file

Time: 2024-04-02 10:02:54 Views: 46

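You can use Python's requests library together with BeautifulSoup to fetch the page, then write the replies to a file. Here is a simple example:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://bbs.csdn.net/topics/600305459'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')

with open('replies.txt', 'w', encoding='utf-8') as file:
    # Each reply is assumed to be a div with class "reply-body"
    for reply in soup.find_all('div', class_='reply-body'):
        body = reply.find('div', class_='markdown-body')
        if body is None:  # skip replies without the expected inner div
            continue
        file.write(body.get_text().strip() + '\n')
```

First, requests fetches the page and BeautifulSoup parses the HTML. The `with open()` statement then opens a text file in write mode (`'w'`) with UTF-8 encoding. The loop iterates over every reply `div`, uses `find()` to locate the inner `div` that holds the reply content, calls `get_text()` to get the content as plain text, and strips leading and trailing whitespace. Because `find()` returns `None` when no matching element exists, replies without the inner `div` are skipped rather than raising an `AttributeError`. Finally, each reply is written to the file followed by a newline.

Note that this assumes each reply `div` has the class `reply-body` and the `div` holding the reply content has the class `markdown-body`. You will need to adjust these selectors to match the actual page.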
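One caveat with writing one reply per line is that a reply containing its own line breaks becomes indistinguishable from several replies. The following is a minimal sketch of one possible workaround, not part of the original answer: it stores each reply as a JSON string on its own line (the JSON Lines convention). The function names `save_replies` and `load_replies` and the `.jsonl` file name are hypothetical, chosen only for illustration.

```python
import json


def save_replies(replies, path):
    """Write each reply as one JSON string per line (JSON Lines)."""
    with open(path, 'w', encoding='utf-8') as file:
        for text in replies:
            # json.dumps escapes embedded newlines, so one reply stays on one line;
            # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable
            file.write(json.dumps(text, ensure_ascii=False) + '\n')


def load_replies(path):
    """Read the replies back in their original order."""
    with open(path, encoding='utf-8') as file:
        return [json.loads(line) for line in file if line.strip()]
```

To use it with the scraping loop, you would collect the text returned by `get_text().strip()` for each reply into a list and pass that list to `save_replies(replies, 'replies.jsonl')`. If plain text is all you need and replies are known to be single-line, the simpler one-line-per-reply format is enough.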
