Python crawler example: code for scraping 51job
Time: 2024-09-23 12:01:46
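Scraping job listings from the 51job website is a common hands-on exercise for Python web crawling. Below is a simple example that uses Scrapy, a powerful web crawling framework, to scrape job postings from 51job: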
```python
# Import the required library
import scrapy


# Define the spider class
class JobSpider(scrapy.Spider):
    name = '51job_spider'
    # Use the registered domain so that subdomains such as search.51job.com are not filtered out
    allowed_domains = ['51job.com']
    start_urls = ['http://search.51job.com/list/000000,000000,0000,00,9,99,python,2.html']  # Start URL

    def parse(self, response):
        # Parse the HTML response and locate the job listings
        # (the selectors below depend on the page's actual markup)
        job_elements = response.css('div.job-item')
        for job in job_elements:
            title = job.css('.title a::text').get()  # Job title
            company = job.css('.company::text').get()  # Company name
            salary = job.css('.salary span::text').get()  # Salary range
            location = job.css('.location::text').get()  # Work location
            yield {
                'title': title,
                'company': company,
                'salary': salary,
                'location': location,
            }

        # Pagination: if a next-page link exists, request it and parse it the same way
        next_page = response.css('a.next::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
```
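
In Scrapy, `.get()` returns `None` when a selector matches nothing, and extracted text often carries leading or trailing whitespace, so it can help to normalize each item before yielding it. The following is a minimal sketch and not part of the original spider: `clean_item` is a hypothetical helper name, and dropping records that have no title is an assumption about what counts as a usable record.

```python
from typing import Optional


def clean_item(raw: dict) -> Optional[dict]:
    """Strip whitespace from scraped fields; return None if the title is missing.

    Hypothetical helper for illustration; field names follow the spider's dict.
    """
    cleaned = {}
    for key, value in raw.items():
        # Non-string values (such as None from a failed selector) become None.
        text = value.strip() if isinstance(value, str) else None
        # Treat whitespace-only strings as missing as well.
        cleaned[key] = text or None
    # Assumption: a record without a title is not worth keeping.
    if cleaned.get('title') is None:
        return None
    return cleaned


if __name__ == '__main__':
    raw = {
        'title': '  Python Developer\n',
        'company': None,
        'salary': ' ',
        'location': 'Shanghai ',
    }
    print(clean_item(raw))
```

Inside `parse`, one way to use it would be to build the dict first, pass it through `clean_item`, and yield the result only when it is not `None`.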