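Use Python with XPath to scrape every second-hand housing listing on the first ten pages of the Keqiao area on the Beike (ke.com) second-hand housing site. On each detail page, collect the title, community name, unit price, total price, layout, floor, area, decoration, orientation, year built, elevator availability, community name, district, property term, and ownership type, then write everything to a CSV file. Include anti-scraping measures.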

First install the third-party libraries `requests` and `lxml`; `csv` and `time` are part of the Python standard library. The code is as follows:

```python
import csv
import time

import requests
from lxml import etree

# Request headers: a browser-like User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Anti-scraping measure: reuse one session so cookies persist across requests
session = requests.Session()
session.headers.update(headers)

# URL template for the listing pages.
# Note: Keqiao is a district of Shaoxing, so the city subdomain ("hz") may need
# to be changed; open the URL in a browser to confirm it before running.
url_template = 'https://hz.ke.com/ershoufang/keqiao/pg{}/'

# CSV column name -> XPath on the detail page.
# These selectors have not been checked against the live page and may need
# adjusting if the site layout differs.
FIELDS = [
    ('Title', '//h1[@class="main"]/text()'),
    ('Community', '//div[@class="communityName"]/a[1]/text()'),
    ('Unit price', '//span[@class="unitPriceValue"]/text()'),
    ('Total price', '//span[@class="total"]/text()'),
    ('Layout', '//div[@class="room"]/div[@class="mainInfo"]/text()'),
    ('Floor', '//div[@class="room"]/div[@class="subInfo"]/text()'),
    ('Area', '//div[@class="area"]/div[@class="mainInfo"]/text()'),
    ('Decoration', '//div[@class="subInfo"][1]/text()'),
    ('Orientation', '//div[@class="subInfo"][2]/text()'),
    ('Year built', '//div[@class="subInfo"][3]/text()'),
    ('Elevator', '//div[@class="subInfo"][4]/text()'),
    ('Community name', '//div[@class="areaName"]/span[@class="info"]/a[1]/text()'),
    ('District', '//div[@class="areaName"]/span[@class="info"]/a[2]/text()'),
    ('Property term', '//div[@class="base"]/div[@class="content"][4]/text()'),
    ('Ownership type', '//div[@class="base"]/div[@class="content"][5]/text()'),
]


def first_text(tree, xpath):
    """Return the first stripped text match for xpath, or '' if nothing matches."""
    result = tree.xpath(xpath)
    return result[0].strip() if result else ''


# Rows collected from the detail pages
data_list = []

# Walk the first 10 listing pages
for page in range(1, 11):
    response = session.get(url_template.format(page), timeout=10)
    listing = etree.HTML(response.text)

    # URL of each listing's detail page
    detail_urls = listing.xpath('//div[@class="info clear"]/div[@class="title"]/a/@href')

    for detail_url in detail_urls:
        response = session.get(detail_url, timeout=10)
        detail = etree.HTML(response.text)

        # A missing field becomes an empty string instead of raising IndexError
        data_list.append([first_text(detail, xpath) for _, xpath in FIELDS])

        # Pause between requests to avoid having the IP blocked
        time.sleep(1)

# Write the data to a CSV file (utf-8-sig so Excel detects the encoding)
with open('keqiao.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow([name for name, _ in FIELDS])
    writer.writerows(data_list)
```

The code above uses XPath to extract the page data. As anti-scraping measures, it sends a browser-like User-Agent, keeps one session across requests, and waits between requests. Finally, it writes the data to a CSV file.
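
The session reuse and fixed one-second delay described above are fairly basic anti-scraping measures. A common extension is to randomize the delay and retry failed requests with a growing pause. The sketch below is one way to do that; the helper name `polite_get` and its parameters are hypothetical and not part of the original answer. It takes the fetch function as an argument, so it can wrap `session.get` without depending on `requests` itself.

```python
import random
import time


def polite_get(fetch, url, retries=3, base_delay=1.0, jitter=1.0, sleep=time.sleep):
    """Call fetch(url) with a randomized pause and simple exponential backoff.

    fetch: any callable that takes a URL and returns a response-like object
    with a status_code attribute (for example, session.get).
    Hypothetical helper for illustration only.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")

    last_error = None
    for attempt in range(retries):
        # Pause before every request; the pause doubles on each retry and the
        # random jitter avoids a fixed, easily detected rhythm.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, jitter))
        try:
            response = fetch(url)
        except Exception as exc:  # network errors, timeouts, etc.
            last_error = exc
            continue
        if response.status_code == 200:
            return response
        last_error = RuntimeError(f"HTTP {response.status_code} for {url}")

    # Every attempt failed: surface the most recent problem
    raise last_error
```

In the scraper, a call such as `polite_get(session.get, detail_url)` could replace the direct `session.get(detail_url)` call together with the `time.sleep(1)` line, since the helper already pauses before each request.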
