Python crawler: scrape the first five pages of Java engineer listings from 51job (前程无忧) and write the results to a CSV file
Below is a Python example that scrapes Java engineer job listings from 51job (前程无忧) and writes them to a CSV file:
```python
import csv

import requests
from bs4 import BeautifulSoup

# Request headers that mimic a browser visit
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}


def get_job_info(page):
    """Fetch one page of 51job search results for Java engineer positions."""
    # The URL-encoded segment is the search keyword; {page} selects the result page
    url = f'https://search.51job.com/list/010000,000000,0000,00,9,99,Java%25E5%25BC%2580%25E5%258F%2591%25E5%25B7%25A5%25E7%25A8%258B%25E5%25B8%2588,2,{page}.html'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    job_list = soup.select('div.el')  # each div.el is one row of the result list
    job_info_list = []
    for job in job_list:
        try:
            job_name = job.select_one('a').get('title')
            company_name = job.select_one('span.t2').get('title')
            location = job.select_one('span.t3').get_text()
            salary = job.select_one('span.t4').get_text()
            publish_date = job.select_one('span.t5').get_text()
            job_info_list.append([job_name, company_name, location, salary, publish_date])
        except Exception as e:
            # Header rows and malformed entries are simply skipped
            print(e)
    return job_info_list


def write_to_csv(job_info_list):
    """Write the collected job records to job_info.csv."""
    with open('job_info.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        # Header row: job title, company name, location, salary, publish date
        writer.writerow(['职位名称', '公司名称', '工作地点', '薪资', '发布日期'])
        writer.writerows(job_info_list)


# Scrape the first five pages and save the results
job_info_list = []
for i in range(1, 6):
    job_info_list += get_job_info(i)
write_to_csv(job_info_list)
print('Job information saved to job_info.csv')
```
The code above scrapes the first five pages of Java engineer listings from 51job and writes each position's job title, company name, work location, salary, and publish date to a CSV file. Note that page structure and anti-scraping measures change over time, so this code is for reference only and will need to be adapted to the current site.
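If the site throttles or blocks rapid requests, a common adjustment is to pause between page fetches, check the HTTP status, and fix the response encoding before parsing. The sketch below shows one way to wrap the fetching step; the 2-second delay and the use of `apparent_encoding` are assumptions rather than anything 51job specifically requires.
```python
import time

import requests
from bs4 import BeautifulSoup


def fetch_page(url, headers, delay=2.0):
    """Fetch one page politely: wait, verify the status, and fix the encoding.

    The delay value and reliance on apparent_encoding are assumptions; tune them
    to whatever the target site actually tolerates and serves.
    """
    time.sleep(delay)                                    # pause between requests
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()                          # fail loudly on 4xx/5xx
    response.encoding = response.apparent_encoding       # avoid mojibake on non-UTF-8 pages
    return BeautifulSoup(response.text, 'lxml')
```
Inside `get_job_info`, the two lines that build `response` and `soup` could then be replaced by a single call to `fetch_page(url, headers)`.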