
Scraping Dream of the Red Chamber (红楼梦.txt) with Python

Time: 2023-11-03 14:02:57 Views: 360

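You can use Python's requests and BeautifulSoup libraries to scrape the text of Dream of the Red Chamber (《红楼梦》). The steps are as follows:

1. Import the required libraries:

```python
import requests
from bs4 import BeautifulSoup
```

2. Use requests to fetch the HTML of the novel's page:

```python
url = "http://www.shicimingju.com/book/hongloumeng.html"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
html = response.content
```

3. Use BeautifulSoup to parse the HTML and take the text of the element that follows the `book-mulu` div:

```python
soup = BeautifulSoup(html, 'html.parser')
# find_next_sibling() returns the next tag and skips whitespace text nodes
content = soup.find('div', class_='book-mulu').find_next_sibling()
text = content.get_text()
```

4. Save the text to a file:

```python
with open('红楼梦.txt', 'w', encoding='utf-8') as f:
    f.write(text)
```

The complete code:

```python
import requests
from bs4 import BeautifulSoup

url = "http://www.shicimingju.com/book/hongloumeng.html"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
html = response.content

soup = BeautifulSoup(html, 'html.parser')
toc = soup.find('div', class_='book-mulu')
content = toc.find_next_sibling() if toc else None
if content is None:
    raise SystemExit("Expected element not found; the page layout may have changed.")
text = content.get_text()

with open('红楼梦.txt', 'w', encoding='utf-8') as f:
    f.write(text)
```

After the script finishes, a file named `红楼梦.txt` appears in the current directory. Note that the class name `book-mulu` (目录, "table of contents") suggests this div is the chapter index, so the file contains whatever text follows that element on the index page and is not necessarily the complete novel. Getting the full text would most likely require fetching each chapter page as well.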
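If the `book-mulu` div is indeed a chapter index, one way to collect the whole novel is to follow each chapter link and join the results. The sketch below is only an illustration under assumptions that do not come from the original answer: that the div contains `<a href=...>` links to chapter pages, and that each chapter page keeps its body in a div with the class `chapter_content`. The helper names `chapter_links`, `chapter_text`, and `download_book` are hypothetical, and the class name should be checked against the live page before relying on it.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

INDEX_URL = "http://www.shicimingju.com/book/hongloumeng.html"


def chapter_links(index_html, base_url=INDEX_URL):
    """Return (title, absolute_url) pairs for links inside the 'book-mulu' div."""
    soup = BeautifulSoup(index_html, "html.parser")
    toc = soup.find("div", class_="book-mulu")
    if toc is None:  # layout differs from what this sketch assumes
        return []
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in toc.find_all("a", href=True)
    ]


def chapter_text(chapter_html, content_class="chapter_content"):
    """Return the chapter body text, or '' if the container is missing.

    The default class name 'chapter_content' is an assumption.
    """
    soup = BeautifulSoup(chapter_html, "html.parser")
    body = soup.find("div", class_=content_class)
    return body.get_text("\n", strip=True) if body else ""


def download_book(path="红楼梦.txt", delay=1.0):
    """Fetch the index, then every chapter, and write them to one file."""
    index = requests.get(INDEX_URL, timeout=10)
    index.raise_for_status()
    with open(path, "w", encoding="utf-8") as f:
        for title, url in chapter_links(index.content):
            page = requests.get(url, timeout=10)
            page.raise_for_status()
            f.write(title + "\n" + chapter_text(page.content) + "\n\n")
            time.sleep(delay)  # pause between requests to avoid hammering the site
```

Calling `download_book()` would then write every chapter it finds into `红楼梦.txt`. Keeping the parsing in `chapter_links` and `chapter_text` separate from the network calls makes it possible to test the selectors on saved HTML before running a full crawl.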
