Optimize this code so that it can scrape all 100 pages of computer-book results:

import requests
from bs4 import BeautifulSoup

url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input'
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'lxml')
books = soup.find('ul', class_='bigimg')
for book in books.find_all('li'):
    title = book.find('a', class_='pic').get('title')
    author = book.find('p', class_='search_book_author').text
    price = book.find('p', class_='price').find('span', class_='search_now_price').text
    print(title, author, price)
Scraping book listings with BeautifulSoup (a beginner's guide)
import requests
from bs4 import BeautifulSoup

for i in range(1, 101):  # iterate over all 100 result pages
    url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input&page_index={}'.format(i)
    response = requests.get(url)
    response.encoding = 'gbk'  # the search key in the URL is GBK-encoded, so decode the page as GBK
    soup = BeautifulSoup(response.text, 'lxml')
    books = soup.find('ul', class_='bigimg')
    if books is None:  # skip a page whose result list is missing instead of crashing
        continue
    for book in books.find_all('li'):
        title = book.find('a', class_='pic').get('title')
        author = book.find('p', class_='search_book_author').text
        price = book.find('p', class_='price').find('span', class_='search_now_price').text
        print(title, author, price)
The main changes are:
1. Looping over all 100 pages: a for loop substitutes page numbers 1 through 100 into the url via the page_index query parameter, so every result page is fetched.
2. Guarding the parse: the response is decoded as GBK, and a page whose result list cannot be found is skipped rather than raising an AttributeError. For better reusability, the per-page logic could further be wrapped in a function, as sketched below.
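A minimal sketch of that refactor, assuming the same page structure as above (the function name crawl_page is illustrative, not part of the original answer):

import requests
from bs4 import BeautifulSoup

def crawl_page(page_index):
    # Fetch one Dangdang search-result page and print title, author and price per book.
    url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input&page_index={}'.format(page_index)
    response = requests.get(url)
    response.encoding = 'gbk'  # pages are GBK-encoded, matching the GBK key in the URL
    soup = BeautifulSoup(response.text, 'lxml')
    books = soup.find('ul', class_='bigimg')
    if books is None:
        return
    for book in books.find_all('li'):
        title = book.find('a', class_='pic').get('title')
        author = book.find('p', class_='search_book_author').text
        price = book.find('p', class_='price').find('span', class_='search_now_price').text
        print(title, author, price)

for i in range(1, 101):
    crawl_page(i)

Keeping the fetch-and-parse logic in one function makes it easy to test the scraper on a single page, or to replace print with code that collects the rows into a list.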