Optimize this code so that it can crawl all 100 pages of computer books:

import requests
from bs4 import BeautifulSoup

url = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input'
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'lxml')
books = soup.find('ul', class_='bigimg')
for book in books.find_all('li'):
    title = book.find('a', class_='pic').get('title')
    author = book.find('p', class_='search_book_author').text
    price = book.find('p', class_='price').find('span', class_='search_now_price').text
    print(title, author, price)

Posted: 2023-12-13 10:03:18

Using BeautifulSoup to build a book crawler (a must-read for beginners)

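Beautiful Soup (bs4 for short) parses HTML and XML and extracts data from them. It automatically converts input documents to Unicode and output documents to UTF-8, so you usually don't have to think about encodings. The exception is a document that does not declare its encoding: Beautiful Soup then has to guess, and the guess can be wrong, so pass the original encoding explicitly with the from_encoding argument. Compared with lxml and its XPath queries, Beautiful Soup is slower, but its interface is more user-friendly and easier to use, especially for front-end developers.

A few practical tips: get_text() is recommended for tags that contain nested tags, because it returns all the text inside them. Beautiful Soup can parse both local files and the responses returned by a server. When parsing a local file, open it with encoding='utf-8', because Python's open() otherwise uses the system's default encoding (GBK on Chinese-language Windows), and always pass the parser name, such as 'lxml', as the second argument. Finally, keep track of whether a call returns a single node object or a list: find() returns one tag (or None), while find_all() returns a list, from which you can pick the n-th tag by index.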
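The sketch below illustrates three of the tips above on a tiny hand-written HTML snippet: decoding GBK bytes with from_encoding, the difference between find() (one tag or None) and find_all() (a list you can index), and get_text() on a tag with nested children. It assumes beautifulsoup4 is installed and uses the built-in 'html.parser' so that lxml is not required. The markup, titles and author names are made up for illustration and only mimic the class names used in the crawler code on this page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class names used on the search page
html = (
    '<ul class="bigimg">'
    '<li><a class="pic" title="计算机网络"></a>'
    '<p class="search_book_author"><span>Author A</span> / <span>Press B</span></p></li>'
    '<li><a class="pic" title="操作系统"></a>'
    '<p class="search_book_author"><span>Author C</span></p></li>'
    '</ul>'
)

# The bytes declare no charset, so tell Beautiful Soup how they are encoded
soup = BeautifulSoup(html.encode('gbk'), 'html.parser', from_encoding='gbk')

book_list = soup.find('ul', class_='bigimg')       # find(): a single Tag
missing = soup.find('ul', class_='no-such-class')  # find(): None when nothing matches
items = book_list.find_all('li')                   # find_all(): a list of Tags
second = items[1]                                  # index the list to get the n-th tag

first_title = items[0].find('a', class_='pic').get('title')
# get_text() joins the text of every nested tag inside the <p>
first_author = items[0].find('p', class_='search_book_author').get_text()

print(first_title, first_author)
print(second.find('a', class_='pic').get('title'))
```

Checking for None after find() is what keeps a crawler from crashing with AttributeError when a page does not contain the expected element.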

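import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://search.dangdang.com/?key=%BC%C6%CB%E3%BB%FA&act=input&page_index={}'


def crawl_page(page_index):
    """Fetch one page of search results and print each book's title, author and price."""
    response = requests.get(BASE_URL.format(page_index), timeout=10)
    soup = BeautifulSoup(response.text, 'lxml')
    books = soup.find('ul', class_='bigimg')
    if books is None:  # result list not found, e.g. the request was blocked or the page does not exist
        return
    for book in books.find_all('li'):
        title = book.find('a', class_='pic').get('title')
        author = book.find('p', class_='search_book_author').text
        price = book.find('p', class_='price').find('span', class_='search_now_price').text
        print(title, author, price)


for i in range(1, 101):  # crawl pages 1 through 100
    crawl_page(i)

The main changes are:
1. Crawl all 100 pages. A for loop substitutes page numbers 1 to 100 into the page_index parameter of the URL, so every results page is fetched.
2. Wrap the scraping logic in a function, crawl_page(). This makes the code reusable and easy to call again later.
3. Skip a page when the result list is missing instead of crashing with AttributeError on books.find_all().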
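The URL above hard-codes the keyword as %BC%C6%CB%E3%BB%FA, which appears to be 计算机 ("computer") percent-encoded in GBK. If you want to change the keyword or build the page URLs more explicitly, a small standard-library helper could generate them with urllib.parse.urlencode. The function name build_page_url is hypothetical, and the assumption that the site expects a GBK-encoded key parameter is inferred only from the URL in the original code.

```python
from urllib.parse import urlencode

BASE = 'http://search.dangdang.com/'


def build_page_url(keyword, page_index):
    """Hypothetical helper: build the search URL for one results page.

    Assumption: the site expects the keyword percent-encoded as GBK,
    as the key value in the original URL suggests.
    """
    query = {'key': keyword, 'act': 'input', 'page_index': page_index}
    return BASE + '?' + urlencode(query, encoding='gbk')


# One URL per results page, pages 1 through 100
urls = [build_page_url('计算机', i) for i in range(1, 101)]
print(urls[0])
```

Each URL in the list can then be passed to requests.get() in place of BASE_URL.format(page_index).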
