Scraping 1,000 job postings from Nowcoder (牛客网) with Python — fields include: ID, company name, education requirement, job type, job title, salary, publish time, deadline, city code, company size, benefits, job responsibilities, region, work experience, etc. — and writing them out in tabular form
OK. This is a fairly involved crawling task that uses several Python libraries and techniques. Below is one possible implementation:
```python
import requests
from bs4 import BeautifulSoup
import csv

# Scrape one page of job postings and return a list of dicts
def scrape_page(page_num):
    url = f'https://www.nowcoder.com/recommend/intern?skillType=0&city=0&jobType=0&page={page_num}'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    job_list = soup.find_all('li', class_='recommend-intern-item')
    # List that collects every posting found on this page
    job_info_list = []
    # Walk each posting and pull out the fields we need
    for job in job_list:
        job_id = job.get('data-recommend-id')
        company_name = job.find('a', class_='company-name').text.strip()
        education = job.find('span', class_='education').text.strip()
        job_type = job.find('span', class_='job-type').text.strip()
        job_name = job.find('a', class_='job-name').text.strip()
        salary = job.find('span', class_='salary').text.strip()
        publish_time = job.find('span', class_='publish-time').text.strip()
        end_time = job.find('span', class_='end-time').text.strip()
        city_code = job.find('span', class_='city-code').text.strip()
        company_scale = job.find('span', class_='company-scale').text.strip()
        welfare = job.find('span', class_='welfare').text.strip()
        job_responsibility = job.find('div', class_='job-responsibility').text.strip()
        area = job.find('span', class_='area').text.strip()
        work_experience = job.find('span', class_='work-experience').text.strip()
        # Bundle the extracted fields into a dict
        job_info = {
            'job_id': job_id,
            'company_name': company_name,
            'education': education,
            'job_type': job_type,
            'job_name': job_name,
            'salary': salary,
            'publish_time': publish_time,
            'end_time': end_time,
            'city_code': city_code,
            'company_scale': company_scale,
            'welfare': welfare,
            'job_responsibility': job_responsibility,
            'area': area,
            'work_experience': work_experience
        }
        # Append the dict to the page's result list
        job_info_list.append(job_info)
    return job_info_list

# Scrape the first 100 pages of job postings
all_job_info = []
for page_num in range(1, 101):
    job_info = scrape_page(page_num)
    all_job_info.extend(job_info)

# Write all postings to a CSV file
with open('jobs.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(all_job_info[0].keys())  # header row
    for job_info in all_job_info:
        writer.writerow(job_info.values())
```
This code uses the requests and BeautifulSoup libraries to scrape job postings from Nowcoder. It first defines a `scrape_page` function that scrapes a single page and returns a list of dicts. The main program then calls that function in a loop and accumulates every posting in the `all_job_info` list. Finally, the csv module writes all of the postings to a CSV file.
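Note that the CSS class names used above (e.g. `recommend-intern-item`, `company-name`) are assumptions about Nowcoder's markup and may not match the live page, which is partly rendered by JavaScript, so you should inspect the actual HTML before relying on them. As a minimal sketch, the collection and writing step could also throttle requests, stop once 1,000 records have been gathered (as the original task asks), and use `csv.DictWriter` so the header and columns come straight from the field names. This assumes a `scrape_page` function defined as above:

```python
import csv
import time

# Field names mirror the keys of the dicts returned by scrape_page (defined above).
FIELDS = ['job_id', 'company_name', 'education', 'job_type', 'job_name', 'salary',
          'publish_time', 'end_time', 'city_code', 'company_scale', 'welfare',
          'job_responsibility', 'area', 'work_experience']

all_job_info = []
page_num = 1
while len(all_job_info) < 1000:           # stop once 1,000 postings are collected
    page_jobs = scrape_page(page_num)     # assumes scrape_page() from the code above
    if not page_jobs:                     # an empty page means no more results
        break
    all_job_info.extend(page_jobs)
    page_num += 1
    time.sleep(1)                         # be polite: throttle requests

with open('jobs.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()                  # header row from FIELDS
    writer.writerows(all_job_info[:1000])
```

`DictWriter` keeps the columns aligned with the header even if an individual dict is missing a field (missing values are written as empty strings by default), which is safer than writing `dict.values()` positionally.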