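Use the requests library to scrape information about Python-related books from Dangdang (当当网). Approach: identify the URL, pass the query parameters, set the User-Agent request header, fetch the page with requests, and print the retrieved page source.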

Time: 2024-09-19 15:06:18

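In Python, the `requests` library can be used to scrape site content such as Dangdang's book listings. First determine the target URL; search pages like this usually take specific query parameters that filter the results. The general steps for scraping Python-related book information are:

1. Import the required libraries:

```python
import requests
from bs4 import BeautifulSoup  # parses the HTML content
```

2. Define the URL and its parameters. Assuming you want to search for the keyword "Python", you could construct a URL like the one below. Treat it as an example and confirm the real search address and parameter name in your browser's address bar:

```python
url = "https://book.dangdang.com/searchkeyword?keyWords=Python"
```

3. Set the request headers, including User-Agent, so the server treats the request as coming from a browser rather than a bot:

```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
```

4. Send the GET request and get the response:

```python
response = requests.get(url, headers=headers)
```

5. Check the status code (200 means success), then parse the HTML. The `'lxml'` parser requires the `lxml` package; the built-in `'html.parser'` works without it. A bare `return` is only valid inside a function, so this script-level version exits instead:

```python
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'lxml')
else:
    print("Request failed, status code:", response.status_code)
    raise SystemExit(1)  # stop here; there is nothing to parse
```

6. Walk the parsed HTML and pull out the title, author, and other fields. The tag and class names here are placeholders; inspect the actual page to find the elements that hold the book information:

```python
books_info = []  # collected records
for item in soup.find_all('div', class_='list-item'):  # or whichever element wraps each book
    title_tag = item.find('a', class_='name')        # title
    author_tag = item.find('span', class_='author')  # author
    if title_tag is None or author_tag is None:
        continue  # skip entries missing a field instead of raising AttributeError
    # ...continue parsing other fields
    books_info.append({'title': title_tag.text.strip(), 'author': author_tag.text.strip()})
```

7. Print or save the scraped information:

```python
for book in books_info:
    print(f"Title: {book['title']}, Author: {book['author']}")
    # the records could also be written to a file or a database
```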
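The task statement asks for the query parameters to be passed explicitly and for the page source to be printed, whereas the steps above hard-code the keyword into the URL. A minimal sketch of that part follows. It assumes the same search URL and `keyWords` parameter used above, neither of which has been verified against the live site. `prepare_search` and `fetch_page_source` are hypothetical helper names introduced only for illustration.

```python
import requests

# Assumption: same endpoint and parameter name as in the steps above.
# Confirm both in your browser before relying on them.
SEARCH_URL = "https://book.dangdang.com/searchkeyword"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    )
}


def prepare_search(keyword):
    """Build (but do not send) the GET request so the final URL can be inspected."""
    request = requests.Request(
        "GET", SEARCH_URL, params={"keyWords": keyword}, headers=HEADERS
    )
    return request.prepare()  # requests URL-encodes the params for us


def fetch_page_source(keyword, timeout=10):
    """Send the search request and return the page source as text."""
    with requests.Session() as session:
        response = session.send(prepare_search(keyword), timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    # Assumption: the declared charset may be missing or wrong, so let
    # requests guess the encoding from the response body.
    response.encoding = response.apparent_encoding
    return response.text
```

Calling `print(fetch_page_source("Python"))` would print the fetched page source, while `prepare_search("Python").url` shows the final URL without touching the network. Passing the keyword through `params` rather than concatenating it into the string means non-ASCII keywords are percent-encoded automatically.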
