Optimize this code so that it crawls all 100 pages of computer books:
```python
import requests
from bs4 import BeautifulSoup

for i in range(1, 101):  # crawl 100 pages in a loop
    url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input&page_index={}'.format(i)
    response = requests.get(url)
    page = response.text
    soup = BeautifulSoup(page, 'lxml')
    books = soup.find('ul', class_='bigimg')
    for book in books.find_all('li'):
        title = book.find('a', class_='pic').get('title')
        author = book.find('p', class_='search_book_author').text
        price = book.find('p', class_='price').find('span', class_='search_now_price').text
        print(title, author, price)
```
You can speed up the crawl with multithreading or asynchronous requests.
Below is example code optimized with multithreading:
```python
import requests
from bs4 import BeautifulSoup
from threading import Thread

def crawl_books(start, end):
    """Crawl search result pages in [start, end) and print each book's info."""
    for i in range(start, end):
        url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input&page_index={}'.format(i)
        response = requests.get(url)
        # The search key in the URL is GBK percent-encoded, so the pages are
        # GBK-encoded; set the encoding explicitly to avoid mojibake.
        response.encoding = 'gbk'
        soup = BeautifulSoup(response.text, 'lxml')
        books = soup.find('ul', class_='bigimg')
        if books is None:  # skip pages whose layout doesn't match
            continue
        for book in books.find_all('li'):
            title = book.find('a', class_='pic').get('title')
            author = book.find('p', class_='search_book_author').text
            price = book.find('p', class_='price').find('span', class_='search_now_price').text
            print(title, author, price)

threads = []
# Split the 100 pages into chunks of 10 and start one thread per chunk.
for i in range(1, 101, 10):
    t = Thread(target=crawl_books, args=(i, i + 10))
    threads.append(t)
    t.start()

# Wait for all threads to finish.
for t in threads:
    t.join()
```
The code above treats every 10 pages as a group and starts one thread per group, 10 threads in total, which greatly shortens the total crawl time.
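As mentioned above, an asynchronous approach is the other option. Below is a minimal sketch of that alternative using asyncio with the third-party aiohttp library (an assumption; it is not part of the answer above). It fetches all 100 pages concurrently, then parses them with the same BeautifulSoup selectors:

```python
import asyncio

import aiohttp  # assumed third-party dependency: pip install aiohttp
from bs4 import BeautifulSoup

async def fetch_page(session, i):
    """Fetch one search result page; the pages are GBK-encoded."""
    url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input&page_index={}'.format(i)
    async with session.get(url) as response:
        return await response.text(encoding='gbk')

def parse_books(page):
    """Extract and print title/author/price with the same selectors as above."""
    soup = BeautifulSoup(page, 'lxml')
    books = soup.find('ul', class_='bigimg')
    if books is None:  # layout mismatch or blocked request
        return
    for book in books.find_all('li'):
        title = book.find('a', class_='pic').get('title')
        author = book.find('p', class_='search_book_author').text
        price = book.find('p', class_='price').find('span', class_='search_now_price').text
        print(title, author, price)

async def main():
    # One shared session; gather() runs all 100 fetches concurrently.
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(fetch_page(session, i) for i in range(1, 101)))
    for page in pages:
        parse_books(page)

asyncio.run(main())
```

Note that firing all 100 requests at once may trip the site's rate limiting; wrapping `fetch_page` in an `asyncio.Semaphore` is a simple way to cap concurrency.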