Optimize this code so that it can scrape all of the computer books:

```python
import requests
from bs4 import BeautifulSoup

url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input'
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'lxml')
books = soup.find('ul', class_='bigimg')
for book in books.find_all('li'):
    title = book.find('a', class_='pic').get('title')
    author = book.find('p', class_='search_book_author').text
    price = book.find('p', class_='price').find('span', class_='search_now_price').text
    print(title, author, price)
```
Posted: 2023-12-13 10:03:28
Scraping books with BeautifulSoup (a must-read for beginners)
The following optimizations can be added:
1. Use a `requests.Session` to reuse the connection and reduce network latency
2. Use exception handling so one malformed entry does not crash the program
3. Use CSS selectors instead of chained `find` calls for cleaner lookups

Here is the optimized code:
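Point 3 can be illustrated offline with a small HTML snippet. The markup below is a simplified, hypothetical stand-in for Dangdang's real result structure (the book title and author are made-up sample data), showing that a single CSS selector reaches the same node as a chain of `find` calls:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mimicking Dangdang's result list structure
html = '''
<ul class="bigimg">
  <li>
    <a class="pic" title="Python Crash Course"></a>
    <p class="search_book_author">Eric Matthes</p>
    <p class="price"><span class="search_now_price">&yen;49.00</span></p>
  </li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# find-style chain: one Python-level call per nesting level
li = soup.find('ul', class_='bigimg').find('li')
title_find = li.find('a', class_='pic').get('title')

# CSS selector: the whole path expressed in one string
title_css = soup.select_one('ul.bigimg li a.pic').get('title')

print(title_find == title_css)  # both yield 'Python Crash Course'
```

Both approaches return the same element; the selector version is simply more compact and easier to keep in sync with the page structure.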
```python
import requests
from bs4 import BeautifulSoup
url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input'
# 1. Use a session to keep the connection alive
session = requests.Session()
response = session.get(url)
page = response.text
soup = BeautifulSoup(page, 'lxml')
# 3. A CSS selector replaces the chained find calls
books = soup.select('ul.bigimg li')
for book in books:
    # 2. Exception handling: skip entries missing an expected element
    try:
        title = book.select_one('a.pic').get('title')
        author = book.select_one('p.search_book_author').text
        price = book.select_one('p.price span.search_now_price').text
        print(title, author, price)
    except AttributeError:
        print('Error occurred while parsing book information')
```
Note: if the crawl runs into CAPTCHAs or other anti-scraping measures, further handling (e.g. request headers, delays, proxies) is needed.
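The original question asks for *all* computer books, but the code above only fetches the first page of search results. The sketch below extends it with pagination; it assumes Dangdang's search accepts a `page_index` query parameter and that an empty result list marks the end (both are observed conventions of the site, not a documented API, so verify before relying on them):

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input'

def page_url(page_index):
    # Assumption: Dangdang search paginates via a page_index parameter
    return f'{BASE_URL}&page_index={page_index}'

def crawl_all(max_pages=100):
    session = requests.Session()  # reuse one connection across all pages
    for page in range(1, max_pages + 1):
        response = session.get(page_url(page), timeout=10)
        soup = BeautifulSoup(response.text, 'lxml')
        items = soup.select('ul.bigimg li')
        if not items:  # empty page: assume we have run past the last page
            break
        for book in items:
            try:
                title = book.select_one('a.pic').get('title')
                author = book.select_one('p.search_book_author').text
                price = book.select_one('p.price span.search_now_price').text
                print(title, author, price)
            except AttributeError:
                print('Error occurred while parsing book information')
```

Calling `crawl_all()` walks the result pages in order until one comes back empty. The `max_pages` cap is a safety limit so a change in the site's markup (which would make every page look "non-empty" or loop forever) cannot turn this into an unbounded crawl.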