Code for scraping Lianjia second-hand housing prices with Scrapy-redis
Posted: 2023-10-08 13:08:01
The following code uses Scrapy-redis to scrape second-hand housing prices from Lianjia:
1. Create a Scrapy project
```
$ scrapy startproject lianjia
```
2. Edit the settings.py configuration file to enable Redis
```
# Use the Redis-backed scheduler
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Use the Redis-backed duplicate filter
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
# Use the Redis-backed priority queue
SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.SpiderPriorityQueue"
# Redis connection settings
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
```
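To give a feel for what a shared duplicate filter like the one configured above does, here is a simplified in-memory sketch. It is not scrapy-redis's actual implementation: `fingerprint` and `SeenSet` are hypothetical names, a Python `set` stands in for the Redis set that all workers would share, and the second page URL is made up for illustration.

```python
import hashlib


def fingerprint(method: str, url: str) -> str:
    """Hash the request method and URL into a fixed-length key (simplified)."""
    return hashlib.sha1(f"{method.upper()} {url}".encode("utf-8")).hexdigest()


class SeenSet:
    """In-memory stand-in for a shared Redis set of seen fingerprints."""

    def __init__(self):
        self._seen = set()

    def request_seen(self, method: str, url: str) -> bool:
        """Return True if the request was already recorded, else record it."""
        fp = fingerprint(method, url)
        if fp in self._seen:
            return True
        self._seen.add(fp)
        return False


seen = SeenSet()
first = seen.request_seen("GET", "https://sh.lianjia.com/ershoufang/")
second = seen.request_seen("GET", "https://sh.lianjia.com/ershoufang/")
# Hypothetical second-page URL, used only to show a distinct request.
other = seen.request_seen("GET", "https://sh.lianjia.com/ershoufang/pg2/")
print(first, second, other)
```

Because the set of fingerprints lives in one place, a URL already handled by one worker is skipped by every other worker as well.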
3. Create a spider (run this inside the project directory)
```
$ scrapy genspider lianjia_spider lianjia.com
```
4. Edit lianjia_spider.py
```
# Import Scrapy and the Redis-backed spider base class
import scrapy
from scrapy_redis.spiders import RedisSpider


class LianjiaSpider(RedisSpider):
    name = 'lianjia_spider'
    allowed_domains = ['lianjia.com']

    # Seed the crawl with the first list page
    def start_requests(self):
        url = 'https://sh.lianjia.com/ershoufang/'
        yield scrapy.Request(url=url, callback=self.parse)

    # Parse a second-hand housing list page
    def parse(self, response):
        # Find the next-page link and push it onto the Redis start-URL list
        next_page = response.css('div.page-box a.next::attr(href)').get()
        if next_page is not None:
            next_page_url = response.urljoin(next_page)
            self.server.rpush('lianjia_spider:start_urls', next_page_url)

        # Parse each listing on the page
        house_list = response.css('ul.sellListContent li.clear')
        for house in house_list:
            # Listing title and link
            name = house.css('div.title a::text').get()
            link = house.css('div.title a::attr(href)').get()
            # Total price
            price = house.css('div.totalPrice span::text').get()
            # Layout and area; guard against listings with fewer fields
            info = house.css('div.houseInfo a::text').getall()
            layout = info[0] if len(info) > 0 else None
            area = info[1] if len(info) > 1 else None
            # Print the listing
            print(name, link, price, area, layout)
```
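Listing markup can vary between entries, so it may help to normalize the extracted strings before storing or printing them. The sketch below shows one possible way to do that with the standard library only. `clean` and `build_record` are hypothetical helpers, the sample values are made up, and treating the first two `houseInfo` strings as layout and area simply mirrors the order the spider assumes.

```python
from typing import List, Optional


def clean(text: Optional[str]) -> Optional[str]:
    """Strip surrounding whitespace; map None or empty strings to None."""
    if text is None:
        return None
    text = text.strip()
    return text or None


def build_record(name, link, price, house_info: List[str]) -> dict:
    """Combine raw selector results into one record, tolerating missing fields."""
    info = [clean(part) for part in house_info]
    return {
        "name": clean(name),
        "link": clean(link),
        "price": clean(price),
        "layout": info[0] if len(info) > 0 else None,
        "area": info[1] if len(info) > 1 else None,
    }


# Made-up sample values, for illustration only.
record = build_record(
    " Sample listing ",
    "https://example.com/1.html",
    "350",
    ["2 bed 1 living"],
)
print(record)
```

Inside `parse`, such a record could be yielded as an item instead of being printed, which would let Scrapy's feed exports or pipelines handle storage.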
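5. Start the Redis server (it must be running before the spider starts)
```
$ redis-server
```
6. Run the spider
```
$ scrapy runspider lianjia_spider.py
```
7. To distribute the crawl, start further instances of the Redis spider with the same command in other terminals or on other machines; on other machines, point REDIS_HOST at the shared Redis server
```
$ scrapy runspider lianjia_spider.py
```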
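When it runs, Scrapy adds the initial URL to the Redis queue and pulls URLs from that queue to crawl. Each time a list page is parsed, the next page's URL is pushed onto the Redis queue, so any spider instance connected to the same Redis server can pick it up; this is what makes the crawl distributed. The scraped listing details are printed to the console.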
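The queue behaviour described above can be imitated with the standard library alone. The following is a simulation under stated assumptions, not scrapy-redis code: a `deque` stands in for the Redis list, `fake_fetch` is a hypothetical function that pretends each list page links to the next, and the page URLs and worker names are made up.

```python
from collections import deque

# Hypothetical site map: each list page points to the next one (None = last page).
PAGES = {
    "https://example.com/list/1": "https://example.com/list/2",
    "https://example.com/list/2": "https://example.com/list/3",
    "https://example.com/list/3": None,
}


def fake_fetch(url):
    """Pretend to download a list page and return its next-page link."""
    return PAGES[url]


def run_workers(start_url, worker_names):
    """Let workers take turns popping URLs from one shared queue."""
    queue = deque([start_url])  # stands in for the shared Redis list
    seen = {start_url}          # stands in for the shared duplicate filter
    log = []
    turn = 0
    while queue:
        worker = worker_names[turn % len(worker_names)]
        turn += 1
        url = queue.popleft()
        log.append((worker, url))
        next_url = fake_fetch(url)
        if next_url is not None and next_url not in seen:
            seen.add(next_url)
            queue.append(next_url)  # the next page is now available to any worker
    return log


crawl_log = run_workers("https://example.com/list/1", ["worker-a", "worker-b"])
for worker, url in crawl_log:
    print(worker, url)
```

In the real setup the workers are separate processes and Redis holds the queue, so no worker needs to know about the others; they only need to agree on the Redis server and key.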