Python code for scraping Lagou (拉勾网)
Date: 2023-10-29 14:03:02
Lagou can be scraped with Python's requests and BeautifulSoup libraries. First, import both:
```python
import requests
from bs4 import BeautifulSoup
```
Next, use requests to send an HTTP request and fetch the content of a Lagou page:
```python
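def get_page_content(url):
    """Fetch a page and return its HTML, or None on a non-200 response."""
    # Send a browser-like User-Agent, since some sites reject the default one.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    # requests has no default timeout, so set one to avoid hanging forever.
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        return response.text
    return None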
```
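Because `get_page_content` signals failure by returning `None`, a caller may want to retry a few times before giving up. The following is a minimal sketch of that idea, not part of the original code: the helper name `fetch_with_retry` and its `retries` and `delay` parameters are hypothetical, and it accepts any fetch function that returns `None` on failure.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url) up to `retries` times; return the first non-None result.

    `fetch` is any callable that returns None on failure,
    such as get_page_content above. (Hypothetical helper.)
    """
    for attempt in range(retries):
        content = fetch(url)
        if content is not None:
            return content
        if attempt < retries - 1:
            time.sleep(delay)  # pause before the next attempt
    return None
```

It could be used as `content = fetch_with_retry(get_page_content, url)`. The pause between attempts also keeps the request rate low, which is gentler on the target site.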
Then use BeautifulSoup to parse the page content and extract the fields we need:
```python
def parse_page(content):
    """Print the title, salary, and company of each job listing on the page."""
    soup = BeautifulSoup(content, 'lxml')  # the 'lxml' parser needs the lxml package
    # Listings are assumed to be <li class="con_list_item"> elements.
    job_list = soup.find_all('li', attrs={'class': 'con_list_item'})
    for job in job_list:
        job_title = job.find('h3').find('a').text.strip()
        job_salary = job.find('span', attrs={'class': 'money'}).text.strip()
        job_company = job.find('div', attrs={'class': 'company'}).find('a').text.strip()
        print('Title:', job_title)
        print('Salary:', job_salary)
        print('Company:', job_company)
        print('-------------------------------------')
```
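The parsing step can be tried without touching the network by running the same selectors against a small HTML string. The sketch below is an illustration under assumptions: `extract_jobs` and `SAMPLE_HTML` are hypothetical names, the sample markup is made up to match the selectors above rather than copied from Lagou, and the standard-library `html.parser` is used so that lxml is not required. It returns a list of dicts instead of printing and skips items that lack a field, since `find` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup


def extract_jobs(content):
    """Return job listings as a list of dicts; skip incomplete items."""
    soup = BeautifulSoup(content, 'html.parser')
    jobs = []
    for item in soup.find_all('li', class_='con_list_item'):
        title = item.select_one('h3 a')
        salary = item.find('span', class_='money')
        company = item.select_one('div.company a')
        # Any of these is None when the element is missing from the item.
        if title is None or salary is None or company is None:
            continue
        jobs.append({
            'title': title.get_text(strip=True),
            'salary': salary.get_text(strip=True),
            'company': company.get_text(strip=True),
        })
    return jobs


# Made-up markup shaped to fit the selectors, not real Lagou HTML.
SAMPLE_HTML = """
<ul>
  <li class="con_list_item">
    <h3><a href="#">Python Developer</a></h3>
    <span class="money">15k-25k</span>
    <div class="company"><a href="#">Example Co</a></div>
  </li>
  <li class="con_list_item"><h3><a href="#">Incomplete</a></h3></li>
</ul>
"""

for job in extract_jobs(SAMPLE_HTML):
    print(job['title'], job['salary'], job['company'])
```

Returning data rather than printing makes it easier to save the results to a file or database later, and the guard keeps one malformed listing from stopping the whole run.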
Finally, combine the two functions to run the scrape:
```python
if __name__ == '__main__':
    url = 'https://www.lagou.com/zhaopin/Python/?labelWords=label'
    content = get_page_content(url)
    if content:
        parse_page(content)
```
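With this, the code above can scrape Python job postings from Lagou. The class names used in the parser reflect the page markup assumed here and may need adjusting if the site changes. Be sure to follow the site's crawling rules and avoid any abusive behavior toward the site, so as not to break applicable laws.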
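One concrete way to follow a site's crawling rules is to consult its robots.txt before requesting a URL. The sketch below uses the standard library's `urllib.robotparser`. The helper name `is_allowed`, the user agent `my-crawler`, and the rules in `SAMPLE_ROBOTS` are all hypothetical and are not Lagou's actual rules.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules that are already in memory
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
SAMPLE_ROBOTS = [
    'User-agent: *',
    'Disallow: /private/',
]

print(is_allowed(SAMPLE_ROBOTS, 'my-crawler', 'https://example.com/jobs/python'))
print(is_allowed(SAMPLE_ROBOTS, 'my-crawler', 'https://example.com/private/page'))
```

Against a live site, the parser can instead load the real file with `set_url(...)` followed by `read()`, and the scraper would skip any URL for which `can_fetch` returns `False`.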