Please help me write a crawler to scrape job listings from Lagou (拉勾网).
Sure, I can help you write a Python crawler that uses the requests and BeautifulSoup libraries to scrape job listings from Lagou. Here is the code:
```python
import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent makes the requests less likely to be rejected
# outright; Lagou, like many job boards, filters obvious bot traffic.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

def get_job_info(url):
    """Fetch one job-detail page and extract the fields we care about."""
    response = requests.get(url, headers=HEADERS)
    soup = BeautifulSoup(response.text, 'html.parser')
    # These CSS selectors assume Lagou's server-rendered detail-page markup.
    job_name = soup.select('.job-name')[0].text.strip()
    salary = soup.select('.job_request .salary')[0].text.strip()
    experience = soup.select('.job_request p')[1].text.strip()
    education = soup.select('.job_request p')[2].text.strip()
    tags = [tag.text.strip() for tag in soup.select('.position-label li')]
    company = soup.select('.company')[0].text.strip()
    return {'Job': job_name, 'Salary': salary, 'Experience': experience,
            'Education': education, 'Tags': tags, 'Company': company}

def get_jobs_list(city, keyword, pages=1):
    """Walk the search-result pages and collect details for each listing."""
    jobs_list = []
    for page in range(1, pages + 1):
        url = f'https://www.lagou.com/{city}/zhaopin/{keyword}/{page}/?filterOption=3'
        response = requests.get(url, headers=HEADERS)
        soup = BeautifulSoup(response.text, 'html.parser')
        job_items = soup.select('.itemcon')
        for job in job_items:
            # Each result item links to the detail page scraped above.
            job_url = job.select('.positionlink')[0].get('href')
            jobs_list.append(get_job_info(job_url))
    return jobs_list

jobs = get_jobs_list('beijing', 'python', pages=2)
for job in jobs:
    print(job)
```
This crawler scrapes Python job listings in the Beijing area from Lagou. You can pass different values for the city and keyword arguments to search other cities and positions. The pages parameter is optional and defaults to 1; setting it to 2, as above, scrapes the first two pages of results.
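For example, here is a minimal sketch of calling get_jobs_list with different arguments and saving the results to a CSV file with the standard-library csv module. The city 'shanghai', keyword 'java', and the output filename are just illustrative choices; the column names mirror the keys of the dictionaries returned by get_job_info:
```python
import csv

# Illustrative search: Java positions in Shanghai, first three result pages.
jobs = get_jobs_list('shanghai', 'java', pages=3)

# Flatten the tag list so each row is plain text in the CSV output.
for job in jobs:
    job['Tags'] = ';'.join(job['Tags'])

# Column names match the keys returned by get_job_info.
with open('lagou_jobs.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(
        f, fieldnames=['Job', 'Salary', 'Experience', 'Education', 'Tags', 'Company'])
    writer.writeheader()
    writer.writerows(jobs)
```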
That covers the implementation of the crawler. If you have any other questions, feel free to ask.