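Use requests and XPath to scrape Shanghai second-hand housing listings from Soufun (搜房网). Requirements: collect the listings on every results page, and for each listing record the title, estate, location, latitude/longitude, floor area, layout, floor, orientation, year built, unit price, total price, agent, and contact phone number, leaving any missing field blank. Please write complete code.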

First, install the required libraries: `requests` to download pages and `lxml` to parse the HTML and evaluate XPath expressions (`pip install requests lxml`). BeautifulSoup does not support XPath, so it is not needed here. The script below requests each listing page, extracts the fields of every listing with XPath, and then follows the "下一页" (next page) link until there is none. The class names are placeholders: open the page in a browser, inspect the listing markup, and replace them with the real ones. If the list is loaded by JavaScript, the next-page link will not appear in the HTML that requests receives; in that case, find the underlying data request in the browser's Network panel or fall back to Selenium. Latitude and longitude are usually not part of the list markup, so that field is left blank here.

```python
import time
from urllib.parse import urljoin

import requests
from lxml import html

# Assumed list-page URL for Shanghai second-hand homes on Soufun (搜房网)
BASE_URL = "https://sh.fang.com/touch/ershoufang/"

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    )
}

# Output field -> class name of the element that holds it inside one listing.
# These class names are placeholders; replace them after inspecting the page.
FIELD_CLASSES = {
    "title": "title",
    "district": "district",  # estate (楼盘) name
    "location": "location",
    "area": "area",
    "house_type": "house-type",
    "floor": "floor",
    "orientation": "orientation",
    "year_of_construction": "year-of-construction",
    "price_per_square_meter": "price-per-square-meter",
    "total_price": "total-price",
    "agent_name": "agent-name",
    "phone_number": "phone-number",
}


def class_xpath(name):
    """XPath equivalent of the CSS selector `.name` (matches one class token)."""
    return f'.//*[contains(concat(" ", normalize-space(@class), " "), " {name} ")]'


def get_html(url):
    """Download a page and return its decoded text."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    # Guess the charset from the content so Chinese text is decoded correctly
    response.encoding = response.apparent_encoding
    return response.text


def first_text(node, xpath):
    """Text of the first element matched by xpath, or "" if there is none."""
    matches = node.xpath(xpath)
    return matches[0].text_content().strip() if matches else ""


def parse_page(tree):
    """Return one dict per listing on the page; missing fields stay blank."""
    houses = []
    for item in tree.xpath(class_xpath("list-item")):
        house = {field: first_text(item, class_xpath(cls))
                 for field, cls in FIELD_CLASSES.items()}
        # Coordinates are not in the list markup; left blank (see note above)
        house["lat_long"] = ""
        houses.append(house)
    return houses


def fetch_all_houses(start_url=BASE_URL, max_pages=100, delay=1.0):
    """Parse every page by following the "下一页" (next page) link."""
    houses, url, visited = [], start_url, set()
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        try:
            tree = html.fromstring(get_html(url))
        except requests.RequestException as e:
            print(f"Failed to load {url}: {e}")
            break
        houses.extend(parse_page(tree))
        # Resolve the (possibly relative) next-page link; None on the last page
        next_href = tree.xpath('//a[contains(., "下一页")]/@href')
        url = urljoin(url, next_href[0]) if next_href else None
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return houses


if __name__ == "__main__":
    for house in fetch_all_houses():
        print(house)
```
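Selectors such as `.title` in the answer are CSS class selectors, and XPath has no shorthand for them. The small sketch below, run against a made-up HTML snippet (not real Soufun markup), shows why the usual token-matching idiom is needed when porting them: an exact `@class="title"` comparison misses elements that carry several classes, while a plain `contains(@class, "title")` also matches unrelated classes such as `subtitle`.

```python
from lxml import html


def by_class(name):
    """XPath matching descendants whose class list contains the token `name`,
    i.e. the XPath equivalent of the CSS selector `.name`."""
    return f'.//*[contains(concat(" ", normalize-space(@class), " "), " {name} ")]'


# Hypothetical sample markup for illustration only
sample = html.fromstring(
    '<ul>'
    '<li class="list-item"><h3 class="title ellipsis">Room A</h3></li>'
    '<li class="list-item"><h3 class="subtitle">Room B</h3></li>'
    '</ul>'
)

# Exact attribute comparison misses "title ellipsis"
exact = sample.xpath('.//*[@class="title"]')
# Naive substring test wrongly matches "subtitle" as well
naive = sample.xpath('.//*[contains(@class, "title")]')
# Token-based test matches only the element that really has the "title" class
token = sample.xpath(by_class("title"))

print(len(exact), len(naive), len(token))  # 0 2 1
print(token[0].text_content())             # Room A
```

Padding both the attribute value and the searched name with spaces is what restricts the match to whole class tokens.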
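The answer leaves latitude and longitude out because they are rarely shown in the listing markup. If a listing's detail page happens to embed coordinates in an inline script, a regular expression can pull them out. This is only a sketch: the variable names `lat`/`lng` and the `key: "value"` or `key = value` formats it matches are assumptions, not confirmed Soufun markup, so check the real page source and adjust the patterns. If no coordinates are present at all, geocoding the address with a map service remains the fallback.

```python
import re

# Hypothetical patterns: match assignments such as lng: "121.47" or "lat":31.23
# inside inline <script> blocks. \b keeps "lat" from matching words like "plate".
_COORD_RE = {
    "lng": re.compile(r'["\']?\blng\b["\']?\s*[:=]\s*["\']?(-?\d+(?:\.\d+)?)'),
    "lat": re.compile(r'["\']?\blat\b["\']?\s*[:=]\s*["\']?(-?\d+(?:\.\d+)?)'),
}


def extract_lat_long(page_text):
    """Return "lat,lng" found in the page text, or "" if either value is missing."""
    found = {}
    for key, pattern in _COORD_RE.items():
        match = pattern.search(page_text)
        if not match:
            return ""  # leave the field blank, as the question requires
        found[key] = match.group(1)
    return f'{found["lat"]},{found["lng"]}'
```

For example, `extract_lat_long('<script>var pos = {lng: "121.4737", lat: "31.2304"};</script>')` returns `"31.2304,121.4737"`.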
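The question asks for missing data to be left blank, while the answer only prints each record. A minimal way to keep the results is to write them to a CSV file with a fixed column order. This sketch assumes the record keys used in the answer (`title`, `district`, `lat_long`, and so on); the default filename `sh_ershoufang.csv` is a hypothetical choice.

```python
import csv

# Column order of the output file; mirrors the keys of the answer's house dicts
FIELDS = [
    "title", "district", "location", "lat_long", "area", "house_type",
    "floor", "orientation", "year_of_construction",
    "price_per_square_meter", "total_price", "agent_name", "phone_number",
]


def save_houses(houses, path="sh_ershoufang.csv"):
    """Write listing dicts to CSV, leaving missing or None fields as empty cells."""
    # utf-8-sig adds a BOM so Excel detects UTF-8 and shows Chinese text correctly
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        # restval="" fills absent keys; extrasaction="ignore" drops unexpected keys
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="", extrasaction="ignore")
        writer.writeheader()
        # The csv module writes None values as empty strings
        writer.writerows(houses)
    return path
```

Usage: `save_houses(fetch_all_houses())`. Using `restval` keeps every row the same width even when a listing lacks some fields.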
