Python crawler source code for the Four Great Classical Novels
Date: 2023-12-02 08:03:04 Views: 165
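Below is a simple Python crawler that fetches the chapter text of Dream of the Red Chamber (《红楼梦》) from a website hosting the Four Great Classical Novels: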
```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Request headers: send a browser-like User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Fetch the chapter index page
url = 'http://www.xiangcunxiaoshuo.com/hlm/'
response = requests.get(url, headers=headers)

# Parse the HTML and locate the chapter list
soup = BeautifulSoup(response.text, 'html.parser')
chapter_list = soup.find_all('div', class_='book_list')[0].find_all('a')

# Collect all chapter links; urljoin resolves relative hrefs
# against the index URL and leaves absolute ones unchanged
chapter_links = []
for chapter in chapter_list:
    chapter_links.append(urljoin(url, chapter['href']))

# Fetch and print the content of each chapter
for link in chapter_links:
    response = requests.get(link, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    content = soup.find_all('div', class_='book_content')[0].text
    print(content)
    time.sleep(1)  # pause between requests to avoid an IP ban
```
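The link-collection step can be tried offline before any request is sent to the site. The sketch below is not the crawler above: it swaps BeautifulSoup for the standard-library `html.parser` so it runs with no third-party packages, and it resolves each `href` with `urllib.parse.urljoin` in case the site uses relative links. The names `ChapterLinkParser`, `extract_chapter_links`, and `SAMPLE_HTML` are hypothetical, and the sample markup is an assumption about what the index page might look like, not a copy of the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChapterLinkParser(HTMLParser):
    """Collect hrefs of <a> tags nested inside <div class="book_list">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0   # div nesting depth inside the book_list container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Start counting at the book_list div, then track nested divs
            if self.depth > 0 or "book_list" in classes:
                self.depth += 1
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            # Resolve relative hrefs against the index page URL
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_chapter_links(html, base_url):
    """Return absolute chapter URLs found in the book_list container."""
    parser = ChapterLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Assumed sample markup (hypothetical, for illustration only)
SAMPLE_HTML = """
<div class="nav"><a href="/index.html">Home</a></div>
<div class="book_list">
  <a href="1.html">Chapter 1</a>
  <a href="/hlm/2.html">Chapter 2</a>
  <a href="http://www.xiangcunxiaoshuo.com/hlm/3.html">Chapter 3</a>
</div>
"""

links = extract_chapter_links(SAMPLE_HTML, "http://www.xiangcunxiaoshuo.com/hlm/")
for link in links:
    print(link)
```

Links outside the `book_list` container, such as the navigation link in the sample, are skipped, so the result contains only chapter pages.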