Using Python with XPath, scrape each second-hand housing listing from the first ten pages of the 58同城 (58.com) real-estate site for the Keqiao (柯桥) district; on each detail page extract the title, community name, unit price, total price, layout, floor, and area; write the results to a CSV file, and add anti-scraping measures.
Posted: 2024-06-08 13:11:50
First, we need to install the `requests` and `lxml` libraries.
```bash
pip install requests lxml
```
Next, we send a request with `requests` and parse the HTML with `lxml`'s `etree` module.
```python
import requests
from lxml import etree
# NOTE: this path points at hz.58.com's 'kaifaqu' district, not the Keqiao
# district the question asks for; substitute Keqiao's actual 58.com path here.
url = 'https://hz.58.com/ershoufang/kaifaqu/pn1/'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
```
Next, we use XPath to collect each listing's link from the results page and visit its detail page to extract the information.
```python
for i in range(1, 11):  # first ten pages
    url = f'https://hz.58.com/ershoufang/kaifaqu/pn{i}/'
    response = requests.get(url, headers=headers)
    html = etree.HTML(response.text)
    house_list = html.xpath('//ul[@class="house-list-wrap"]/li')
    for house in house_list:
        # The class names below assume 58.com's current markup, which may change.
        detail_url = house.xpath('./div[@class="list-info"]/h2/a/@href')[0]
        detail_response = requests.get(detail_url, headers=headers)
        detail_html = etree.HTML(detail_response.text)
        title = detail_html.xpath('//div[@class="title"]/h1/text()')[0]
        community = detail_html.xpath('//div[@class="communityName"]/a/text()')[0]
        price = detail_html.xpath('//span[@class="total"]/text()')[0]
        unit_price = detail_html.xpath('//span[@class="unit"]/text()')[0]
        layout = detail_html.xpath('//div[@class="room"]/ul/li[1]/text()')[0]
        floor = detail_html.xpath('//div[@class="room"]/ul/li[2]/text()')[0]
        area = detail_html.xpath('//div[@class="room"]/ul/li[3]/text()')[0]
```
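The `[0]` indexing above raises `IndexError` whenever a selector matches nothing, and the class names are assumptions about 58.com's current markup, which changes often. A defensive helper (a sketch; `first` is a hypothetical name) keeps one missing field from aborting the whole crawl:

```python
from lxml import etree

def first(node, path, default=''):
    """Return the first XPath match, stripped of whitespace,
    or a default value when the selector matches nothing."""
    results = node.xpath(path)
    return results[0].strip() if results else default
```

Each extraction then becomes, for example, `title = first(detail_html, '//div[@class="title"]/h1/text()')`, with a blank cell instead of a crash when the page layout differs.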
Finally, we write the collected information to a CSV file.
```python
import csv
with open('houses.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['标题', '小区名称', '单价', '总价', '户型', '所在楼层', '面积'])
    for i in range(1, 11):  # first ten pages
        url = f'https://hz.58.com/ershoufang/kaifaqu/pn{i}/'
        response = requests.get(url, headers=headers)
        html = etree.HTML(response.text)
        house_list = html.xpath('//ul[@class="house-list-wrap"]/li')
        for house in house_list:
            detail_url = house.xpath('./div[@class="list-info"]/h2/a/@href')[0]
            detail_response = requests.get(detail_url, headers=headers)
            detail_html = etree.HTML(detail_response.text)
            title = detail_html.xpath('//div[@class="title"]/h1/text()')[0]
            community = detail_html.xpath('//div[@class="communityName"]/a/text()')[0]
            price = detail_html.xpath('//span[@class="total"]/text()')[0]
            unit_price = detail_html.xpath('//span[@class="unit"]/text()')[0]
            layout = detail_html.xpath('//div[@class="room"]/ul/li[1]/text()')[0]
            floor = detail_html.xpath('//div[@class="room"]/ul/li[2]/text()')[0]
            area = detail_html.xpath('//div[@class="room"]/ul/li[3]/text()')[0]
            writer.writerow([title, community, unit_price, price, layout, floor, area])
```
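Once the file is written, a quick sanity check with `csv.DictReader` confirms that the header row and the `utf-8-sig` encoding round-trip correctly (`read_rows` is a hypothetical helper name):

```python
import csv

def read_rows(path):
    """Read the CSV back as a list of dicts keyed by the header row."""
    with open(path, newline='', encoding='utf-8-sig') as f:
        return list(csv.DictReader(f))
```

For example, `read_rows('houses.csv')[0]['标题']` should return the first listing's title.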
To avoid having our IP blocked by anti-scraping mechanisms, we should add countermeasures such as rotating the User-Agent and pausing between requests.
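A minimal sketch of those two measures, assuming a small hand-picked User-Agent pool (the strings below are example values, not a maintained list) and a random delay before each request; `polite_get` takes a `requests.Session`, which also reuses connections and keeps cookies across requests:

```python
import random
import time

# Example User-Agent strings to rotate; extend with real browser UAs as needed.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
]

def random_headers():
    """Build request headers with a randomly chosen User-Agent."""
    return {'User-Agent': random.choice(USER_AGENTS)}

def polite_get(session, url, min_delay=1.0, max_delay=3.0):
    """Sleep a random interval, then GET the URL with rotated headers."""
    time.sleep(random.uniform(min_delay, max_delay))
    return session.get(url, headers=random_headers(), timeout=10)
```

In the loops above, create `session = requests.Session()` once, then replace each `requests.get(url, headers=headers)` with `polite_get(session, url)`.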