51job crawler 2023
Posted: 2023-12-17 17:28:24
A course project on crawling the 51job website
The following steps use the Python Scrapy framework to scrape job listings from 51Job:
1. Install the Scrapy framework
```shell
pip install scrapy
```
2. Create a Scrapy project
```shell
scrapy startproject job_spider
```
3. Create the spider
```shell
cd job_spider
scrapy genspider job 51job.com
```
4. Write the spider code in job_spider/spiders/job.py:
```python
import scrapy


class JobSpider(scrapy.Spider):
    name = 'job'
    allowed_domains = ['51job.com']
    # Search results for the keyword "python", first page
    start_urls = ['https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html']

    def parse(self, response):
        # Each job listing is rendered as a <div class="el"> row
        job_list = response.xpath('//div[@class="el"]')
        for job in job_list:
            item = {}
            item['position'] = job.xpath('.//p/span/a/@title').get()
            item['company'] = job.xpath('.//span[@class="t2"]/a/@title').get()
            item['location'] = job.xpath('.//span[@class="t3"]/text()').get()
            item['salary'] = job.xpath('.//span[@class="t4"]/text()').get()
            item['release_date'] = job.xpath('.//span[@class="t5"]/text()').get()
            yield item

        # Follow the "next page" link (the last pagination entry), if there is one
        next_page = response.xpath('//div[@class="p_in"]/ul/li[last()]/a/@href').get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```
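The start URL in the spider hardcodes the keyword `python` and ends in `2,1.html`. The sketch below builds such URLs for other keywords and pages. It assumes that the final number in the URL is the page index and that the keyword can simply be percent-encoded; both are inferred from the single URL above and are not confirmed by the site. The names `SEARCH_URL`, `build_search_url`, and `build_start_urls` are hypothetical helpers.

```python
from urllib.parse import quote

# Assumed template, inferred from the start URL used in the spider:
# the keyword sits before ",2," and the last number is taken to be the page index.
SEARCH_URL = (
    "https://search.51job.com/list/"
    "000000,000000,0000,00,9,99,{keyword},2,{page}.html"
)


def build_search_url(keyword: str, page: int = 1) -> str:
    """Return the search URL for one keyword and one result page."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # Percent-encode the keyword so spaces and non-ASCII text are URL-safe.
    return SEARCH_URL.format(keyword=quote(keyword, safe=""), page=page)


def build_start_urls(keyword: str, pages: int) -> list:
    """Return the URLs for pages 1..pages of a keyword search."""
    return [build_search_url(keyword, page) for page in range(1, pages + 1)]


if __name__ == "__main__":
    for url in build_start_urls("python", 3):
        print(url)
```

The resulting list could be assigned to `start_urls` in `JobSpider` in place of the single hardcoded URL, in which case the "next page" link would no longer need to be followed.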
5. Run the spider
```shell
scrapy crawl job -o job.csv
```
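The `-o job.csv` flag can also be expressed as a project setting, so the export is configured once rather than on every run. The following is a minimal sketch, assuming it is placed in the project's `settings.py`; the `utf-8-sig` encoding and the column order are illustrative choices, not part of the original steps.

```python
# settings.py (fragment): feed export configuration, equivalent in spirit to "-o job.csv"
FEEDS = {
    "job.csv": {
        "format": "csv",
        # Assumption: utf-8-sig adds a BOM so spreadsheet programs detect UTF-8 text
        "encoding": "utf-8-sig",
        # Replace the file on each run instead of appending to it
        "overwrite": True,
        # Column order, matching the keys set in the spider's parse() method
        "fields": ["position", "company", "location", "salary", "release_date"],
    },
}
```

With this setting in place, `scrapy crawl job` should write the file without the `-o` flag.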
With the selectors above, the spider scrapes Python job listings from 51Job and saves the results to job.csv.
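Values extracted with `text()` often carry stray whitespace, and some `div.el` rows may have no job link at all. The sketch below post-processes job.csv with the standard library only. It assumes the file has the five columns produced by the spider; `clean_rows` and `load_jobs` are hypothetical helper names.

```python
import csv
from typing import Dict, Iterable, List


def clean_rows(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Strip whitespace from every field and drop rows without a position."""
    cleaned = []
    for row in rows:
        # Missing values become empty strings; keys of None (extra columns) are skipped.
        tidy = {
            key: (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Assumption: a row with an empty "position" is not a real job listing.
        if tidy.get("position"):
            cleaned.append(tidy)
    return cleaned


def load_jobs(path: str = "job.csv") -> List[Dict[str, str]]:
    """Read the CSV written by the spider and return cleaned rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return clean_rows(csv.DictReader(handle))
```

After a crawl, `load_jobs()` would return a list of dictionaries, for example to print `job["position"]` and `job["salary"]` for each entry.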