Write a Python crawler that scrapes Qidian (起点中文网)
Posted: 2024-04-23 11:15:11 Views: 28
Sure. Here is a simple Python crawler that scrapes novel information from Qidian:
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.qidian.com/all'
# Send a browser-like User-Agent with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, 'html.parser')

# Each .book-mid-info block is expected to hold one novel's details.
for novel in soup.select('.book-mid-info'):
    name = novel.select('h4')[0].text.strip()
    author = novel.select('.name')[0].text.strip()
    # Text of the whole .author element, which this answer treats as the genre field.
    style = novel.select('.author')[0].text.strip().replace('\n', '')
    status = novel.select('.author span')[0].text.strip()
    intro = novel.select('.intro')[0].text.strip().replace('\n', '')
    print(name, author, style, status, intro)
```
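The code above uses two libraries: requests sends the HTTP request, and BeautifulSoup parses the returned HTML document. It fetches the HTML of Qidian's novel listing page, parses it with BeautifulSoup, extracts each novel's name, author, genre, status, and introduction, and prints them to the console.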
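
The extraction step can also be tried without touching the network. The following is a minimal sketch, not part of the original answer: `SAMPLE_HTML`, `text_or_empty`, and `parse_novels` are hypothetical names, and the sample markup is only modeled on the selectors used above, so the live page may be structured differently. It uses `select_one` so that a block with a missing field yields an empty string instead of raising `IndexError`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the selectors used in the answer;
# the real page may differ.
SAMPLE_HTML = """
<div class="book-mid-info">
  <h4><a href="#">Sample Novel</a></h4>
  <p class="author">
    <a class="name" href="#">Sample Author</a>
    <a href="#">Fantasy</a>
    <span>Ongoing</span>
  </p>
  <p class="intro">
    A short introduction.
  </p>
</div>
<div class="book-mid-info">
  <h4><a href="#">Untitled Draft</a></h4>
</div>
"""


def text_or_empty(node, selector):
    """Return the stripped text of the first match, or '' if nothing matches."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found else ""


def parse_novels(html):
    """Extract one dict per .book-mid-info block (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "name": text_or_empty(block, "h4"),
            "author": text_or_empty(block, ".name"),
            "status": text_or_empty(block, ".author span"),
            "intro": text_or_empty(block, ".intro"),
        }
        for block in soup.select(".book-mid-info")
    ]


novels = parse_novels(SAMPLE_HTML)
for novel in novels:
    print(novel)
```

Keeping the parsing in its own function means the same `parse_novels` call could be given `response.text` from the request above, while the selectors can be checked against saved HTML first.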