首页这里我提到的是采集该网站广州所有城区的前5页的租房信息

这里我提到的是采集该网站广州所有城区的前5页的租房信息

时间: 2024-03-11 10:45:51 浏览: 27

非常抱歉，我之前理解有误。针对你的要求，以下是修改后的代码： ```python import requests from bs4 import BeautifulSoup import openpyxl # 需要采集的城区列表 districts = ['tianhe', 'yuexiu', 'liwan', 'haizhu', 'panyu', 'baiyun', 'huangpugz', 'conghua', 'zengcheng', 'huadou'] # 创建 Excel 文件并设置表头 workbook = openpyxl.Workbook() worksheet = workbook.active worksheet.append(['城区', '房源标题', '房源面积', '房源朝向', '价格']) # 循环遍历每个城区和每一页，并采集信息 for district in districts: for page in range(1, 6): url = f'https://gz.lianjia.com/zufang/{district}/pg{page}/' response = requests.get(url) soup = BeautifulSoup(response.text, 'html.parser') items = soup.find_all('div', class_='content__list--item') for item in items: title = item.find('a', class_='content__list--item--aside')['title'] area = item.find('p', class_='content__list--item--des').text.split('/')[1] direction = item.find('p', class_='content__list--item--des').text.split('/')[2] price = item.find('span', class_='content__list--item-price').text worksheet.append([district, title, area, direction, price]) # 保存 Excel 文件 workbook.save('lianjia.xlsx') ``` 这个程序与之前的程序类似，只是修改了循环遍历的范围，采集了每个城区的前 5 页信息。程序首先定义需要采集的城区列表，然后循环遍历每个城区和每一页，采集信息并将其添加到 Excel 文件中。最后，程序保存 Excel 文件并结束。同样需要注意：在实际使用中，请遵守网站的爬虫规则并尊重网站的服务。不要频繁地发送请求或采集大量的数据，以免对网站造成不必要的负担。