import requests
from bs4 import BeautifulSoup
import pandas as pd

# Send a GET request and fetch the page source
url = 'https://book.douban.com/top250?start=0'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
html = response.text

# Parse the page source and extract the book information
soup = BeautifulSoup(html, 'html.parser')
book_list = soup.find('div', {'class': 'article'}).find_all('table')
data = []
for book in book_list:
    title = book.find('div', {'class': 'pl2'}).find('a')['title']
    author = book.find('p', {'class': 'pl'}).get_text()
    rating = book.find('span', {'class': 'rating_nums'}).get_text()
    data.append([title, author, rating])

# Save the book information as a table
df = pd.DataFrame(data, columns=['书名', '作者', '评分'])
df.to_csv('douban_book_top250.csv', index=False)
Time: 2023-06-08 19:04:34 Views: 325
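The first three lines of this Python code import requests, BeautifulSoup (from the bs4 package), and pandas (under the alias pd) so the rest of the program can use them. requests sends HTTP requests and receives the responses, BeautifulSoup parses HTML pages, and pandas provides data analysis and processing features. Here pandas is used to build a DataFrame and write it to a CSV file.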
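To make the division of labor between the three libraries concrete, here is a minimal sketch that runs without network access. The `SAMPLE_HTML` markup, the helper names `fetch_html` and `parse_books`, and the English column names are assumptions made for illustration; they are not part of the original code, and the sample only imitates the tag and class names that the original selectors look for. The sketch also guards against missing tags, since chained `find(...)` calls raise an error as soon as one of them returns `None`.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup: it only mimics the tags and classes that
# the selectors in the question look for. It is not real page content.
SAMPLE_HTML = """
<div class="article">
  <table>
    <tr><td>
      <div class="pl2"><a href="#" title="Book A">Book A</a></div>
      <p class="pl">Author A / Publisher A / 2001</p>
      <span class="rating_nums">9.1</span>
    </td></tr>
  </table>
  <table>
    <tr><td>
      <div class="pl2"><a href="#" title="Book B">Book B</a></div>
      <p class="pl">Author B / Publisher B / 2002</p>
    </td></tr>
  </table>
</div>
"""


def fetch_html(url, headers=None, timeout=10):
    """requests: download a page and fail loudly on HTTP errors."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_books(html):
    """BeautifulSoup: extract [title, author, rating] rows from the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "article"})
    if container is None:
        return []
    rows = []
    for book in container.find_all("table"):
        title_box = book.find("div", {"class": "pl2"})
        link = title_box.find("a") if title_box else None
        info = book.find("p", {"class": "pl"})
        rating = book.find("span", {"class": "rating_nums"})
        rows.append([
            link.get("title") if link else None,
            info.get_text(strip=True) if info else None,
            rating.get_text(strip=True) if rating else None,  # may be absent
        ])
    return rows


# pandas: turn the extracted rows into a table.
rows = parse_books(SAMPLE_HTML)
df = pd.DataFrame(rows, columns=["title", "author", "rating"])
print(df)
```

To run the same parsing against a live page, `parse_books(fetch_html(url, headers=headers))` would replace the sample string, assuming the page still uses the same markup.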