
Scraping 00ksw with Python

Date: 2023-11-12 14:59:07 Views: 210

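Python's requests and BeautifulSoup libraries can be used to scrape content from the 00ksw website. The steps are:

1. Use requests to send an HTTP request and fetch the page's HTML.
2. Use BeautifulSoup to parse the HTML and extract the information you need.
3. Save the extracted information to a local file or a database.

Here is a simple example:

```python
import requests
from bs4 import BeautifulSoup

url = 'http://www.00ksw.net/html/1/1696/'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
response.encoding = 'gbk'  # set the encoding, otherwise the Chinese text is garbled
html = response.text

soup = BeautifulSoup(html, 'html.parser')
# These lookups assume the page contains <div class="bookname"><h1> and <div id="content">
title = soup.find('div', class_='bookname').h1.text
content = soup.find('div', id='content').text

with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(title + '\n\n')
    f.write(content)
```

This code is meant to scrape the first chapter of 《斗破苍穹》 from the 00ksw website and save the title and body text to output.txt. If the page lacks either element, `soup.find` returns `None` and the attribute access on the next step raises `AttributeError`, so check the page structure first when adapting the URL.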
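The `response.encoding = 'gbk'` line in the example is easy to overlook, so here is a minimal standard-library sketch of what goes wrong without it. It assumes, as the example does, that the site sends GBK-encoded bytes. When a `text/*` response declares no charset, requests falls back to ISO-8859-1, which decodes any byte sequence without error but produces garbled characters. The sample string and the `decode_page` helper are hypothetical, introduced only for illustration.

```python
# Sample chapter heading, standing in for text on the page (an assumption).
title = "第一章"

# The bytes a GBK-serving site would put on the wire for that heading.
raw = title.encode("gbk")

# Decoding with the wrong codec never raises, it just yields mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding with the codec the site actually uses restores the text.
right = raw.decode("gbk")


def decode_page(raw_bytes, encoding="gbk"):
    """Hypothetical helper: decode a page body, replacing undecodable bytes."""
    return raw_bytes.decode(encoding, errors="replace")


print(repr(wrong))
print(right)
print(decode_page(raw))
```

With requests, the same bytes are available as `response.content`, so `decode_page(response.content)` would be an alternative to setting `response.encoding` before reading `response.text`.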
