Scraping 红楼梦.txt with Python
Time: 2023-11-03 15:02:57
You can use Python's requests and BeautifulSoup libraries to scrape the text of 《红楼梦》. The steps are as follows:
1. Import the required libraries:
```python
import requests
from bs4 import BeautifulSoup
```
2. Use requests to fetch the HTML of the novel site:
```python
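url = "http://www.shicimingju.com/book/hongloumeng.html"
# Set a timeout so the request cannot hang indefinitely
response = requests.get(url, timeout=10)
# Stop early on an HTTP error instead of parsing an error page
response.raise_for_status()
html = response.content  # raw bytes; BeautifulSoup detects the encoding itself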
```
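The fetch step can also be wrapped in a small helper. The following is only a sketch: the name `fetch_html`, the `get` parameter, and the User-Agent string are illustrative assumptions, not part of the original answer. Passing the getter in as an argument makes it possible to try the helper without touching the network.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10):
    """Download url and return the raw response bytes, raising on HTTP errors."""
    # Some sites reject the default client string; this header value is only an example
    headers = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}
    response = get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

With this helper, `html = fetch_html(url)` would stand in for the three lines of step 2.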
3. Use BeautifulSoup to parse the HTML and extract the text of the novel's body:
```python
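soup = BeautifulSoup(html, 'html.parser')
# Locate the chapter-list container; find() returns None if the page layout differs
mulu = soup.find('div', class_='book-mulu')
# Take the second sibling after that container as the body element
content = mulu.next_sibling.next_sibling
text = content.text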
```
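Chaining `.next_sibling.next_sibling` relies on exactly one whitespace text node sitting between the two elements, so it is easy to break. A more tolerant variant, sketched here on made-up HTML, uses `find_next_sibling`, which returns only tags. The helper name `text_after` and the class name `book-body` are hypothetical and were not taken from the real page.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page; "book-body" is a hypothetical class name
SAMPLE_HTML = """
<div class="book-mulu"><ul><li><a href="/c1.html">Chapter 1</a></li></ul></div>
<div class="book-body">Body text</div>
"""


def text_after(html, anchor_class):
    """Return the text of the first div following div.<anchor_class>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    anchor = soup.find("div", class_=anchor_class)
    if anchor is None:
        return None  # the anchor element is missing from this page
    # find_next_sibling("div") skips text nodes and returns the next sibling div tag
    sibling = anchor.find_next_sibling("div")
    return sibling.get_text(strip=True) if sibling is not None else None


print(text_after(SAMPLE_HTML, "book-mulu"))
```

Returning `None` instead of raising `AttributeError` lets the caller decide what to do when the page layout has changed.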
4. Save the text to a file:
```python
# Write the extracted text as UTF-8 so the Chinese characters are preserved
with open('红楼梦.txt', 'w', encoding='utf-8') as f:
    f.write(text)
```
The complete code is as follows:
```python
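import requests
from bs4 import BeautifulSoup

url = "http://www.shicimingju.com/book/hongloumeng.html"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop on HTTP errors
html = response.content  # raw bytes; BeautifulSoup detects the encoding

soup = BeautifulSoup(html, 'html.parser')
# Take the second sibling after the chapter-list container as the body element
content = soup.find('div', class_='book-mulu').next_sibling.next_sibling
text = content.text

# Write the extracted text as UTF-8
with open('红楼梦.txt', 'w', encoding='utf-8') as f:
    f.write(text)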
```
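After the script finishes, a file named "红楼梦.txt" appears in the current directory, containing the text extracted from the page. The selector depends on the site's layout, and `book-mulu` (目录, "table of contents") appears to be the chapter-list container of an index page, so check the output: it may hold only part of 《红楼梦》 rather than the complete text.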
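If the page at that URL turns out to be a chapter index rather than the full text, one option is to follow each chapter link and join the chapter bodies. The sketch below makes several assumptions that should be checked against the real site: that the links live inside `div.book-mulu`, and that each chapter page keeps its text in a div whose class is `chapter_content` (an assumed name). The function names and the `fetch` parameter are illustrative.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def chapter_links(index_html, base_url):
    """Return (title, absolute_url) pairs for every link inside div.book-mulu."""
    soup = BeautifulSoup(index_html, "html.parser")
    mulu = soup.find("div", class_="book-mulu")
    if mulu is None:
        return []
    return [(a.get_text(strip=True), urljoin(base_url, a["href"]))
            for a in mulu.find_all("a", href=True)]


def chapter_text(chapter_html, content_class="chapter_content"):
    """Return the chapter body text, or an empty string if the div is absent."""
    # "chapter_content" is an assumed class name; inspect a real chapter page to confirm
    soup = BeautifulSoup(chapter_html, "html.parser")
    body = soup.find("div", class_=content_class)
    return body.get_text("\n", strip=True) if body is not None else ""


def collect_book(index_html, base_url, fetch):
    """Join all chapters; fetch is any callable that maps a URL to its HTML."""
    parts = []
    for title, url in chapter_links(index_html, base_url):
        parts.append(title + "\n" + chapter_text(fetch(url)))
    return "\n\n".join(parts)
```

For a real run, `fetch` could be `lambda u: requests.get(u, timeout=10).content`, ideally with a short pause between requests so the site is not overloaded. The result of `collect_book` can then be written to 红楼梦.txt exactly as in step 4.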