Scraping Dangdang pages with the requests library
Posted: 2023-07-29 20:11:47
The steps for scraping a Dangdang page with the requests library are as follows:
1. Import the requests library
```python
import requests
```
2. Send a GET request to fetch the page
```python
url = "http://search.dangdang.com/?key=python&act=input"
response = requests.get(url)
```
3. Read the response body as HTML text
```python
html = response.text
```
4. Parse the HTML with a third-party library (such as BeautifulSoup) and extract the information you need
```python
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
book_list = soup.find_all('a', class_='pic')
for book in book_list:
    print(book['title'])
```
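The parsing step can be tried offline before touching the live site. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup (the real Dangdang page structure may differ), and `extract_titles` is a hypothetical helper name. It uses `link.get("title")` rather than `link["title"]` so that a link without a `title` attribute is skipped instead of raising `KeyError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page structure may differ.
SAMPLE_HTML = """
<ul>
  <li><a class="pic" title="Book A" href="/a"><img src="a.jpg"></a></li>
  <li><a class="pic" href="/b"><img src="b.jpg"></a></li>
  <li><a class="name" title="Not a cover link" href="/c">c</a></li>
</ul>
"""


def extract_titles(html):
    """Return the title attribute of every <a class="pic"> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for link in soup.find_all("a", class_="pic"):
        title = link.get("title")  # None when the attribute is missing
        if title:
            titles.append(title.strip())
    return titles


print(extract_titles(SAMPLE_HTML))
```

Only the first link qualifies here: the second has no `title`, and the third does not carry the `pic` class.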
The complete code is as follows:
```python
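import requests
from bs4 import BeautifulSoup

# Search results page for the keyword "python"
url = "http://search.dangdang.com/?key=python&act=input"
response = requests.get(url)
html = response.text

# Parse the HTML and collect the <a class="pic"> links
soup = BeautifulSoup(html, 'html.parser')
book_list = soup.find_all('a', class_='pic')

# Print the title attribute of each link
for book in book_list:
    print(book['title'])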
```
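A slightly more defensive variant of the request step might look like the sketch below. Everything beyond the original URL is an assumption: the `User-Agent` value, the 10-second timeout, and the helper names `build_search_request`, `page_text`, and `fetch_search_html` are illustrative, and whether the site needs any of this is untested here. The encoding check relies on the documented requests behavior of falling back to ISO-8859-1 for text responses that declare no charset, which can garble Chinese text.

```python
import requests

SEARCH_URL = "http://search.dangdang.com/"
# Assumed header value for illustration; the original code sends no custom headers.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; study-notes-example)"}


def build_search_request(keyword):
    """Prepare, without sending, the GET request for a search keyword."""
    params = {"key": keyword, "act": "input"}
    request = requests.Request("GET", SEARCH_URL, params=params, headers=HEADERS)
    return request.prepare()


def page_text(response):
    """Return response.text, guessing the encoding when none was declared."""
    # requests falls back to ISO-8859-1 for text/* responses without a charset;
    # apparent_encoding guesses the encoding from the raw bytes instead.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text


def fetch_search_html(keyword, timeout=10):
    """Send the request and return decoded HTML, raising on HTTP error statuses."""
    with requests.Session() as session:
        response = session.send(build_search_request(keyword), timeout=timeout)
    response.raise_for_status()
    return page_text(response)
```

Calling `fetch_search_html("python")` would then take the place of the `requests.get(url)` and `response.text` lines, with the parsing code unchanged.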
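The complete code scrapes the list of books that Dangdang returns for the search keyword "python" and prints the title of each one. Note that when scraping a site you should respect its robots protocol (robots.txt) and avoid putting excessive load on its servers.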
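One way to act on that advice is to check each URL against robots.txt and pause between requests, using the standard library's `urllib.robotparser`. The sketch below is a minimal illustration: the rules in `SAMPLE_ROBOTS_TXT` are made up and are not Dangdang's actual robots.txt, the `/private/page` path is hypothetical, and the one-second default delay is an arbitrary choice.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; these are NOT Dangdang's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def parser_from_text(robots_txt):
    """Build a RobotFileParser from robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(urls, parser, user_agent="*", delay_seconds=1.0):
    """Yield the URLs robots.txt permits, sleeping between consecutive yields."""
    first = True
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # skip anything the rules disallow
        if not first:
            time.sleep(delay_seconds)  # spread requests out to limit server load
        first = False
        yield url


parser = parser_from_text(SAMPLE_ROBOTS_TXT)
candidates = [
    "http://search.dangdang.com/?key=python&act=input",
    "http://search.dangdang.com/private/page",  # hypothetical path
]
for url in allowed_urls(candidates, parser, delay_seconds=0):
    print(url)
```

Against a live site, the parser would instead be pointed at the site's own file with `parser.set_url(...)` followed by `parser.read()`, and the delay left at a non-zero value.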