    import requests
    from bs4 import BeautifulSoup

    url = 'https://search.51job.com/list/010000,000000,0000,00,9,99,python,2,1.html'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    job_list = soup.find_all('div', class_='el')
    for job in job_list:
        job_name = job.find('a').text.strip()
        company_name = job.find('span', class_='t2').text.strip()
        location = job.find('span', class_='t3').text.strip()
        salary = job.find('span', class_='t4').text.strip()
        print(f'{job_name} | {company_name} | {location} | {salary}')

Posted: 2023-10-04 11:08:34 | Views: 171

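This is the code I gave earlier for scraping data from the 51job site. It fetches the job postings for the keyword python and prints each posting's job title, company name, location, and salary. You can change the keyword or the printed fields as needed. When scraping a site, follow its robots.txt file and do not request the same page too frequently.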
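The robots.txt and request-frequency advice can be sketched with the standard library's `urllib.robotparser`. The rules text, the `my-scraper` user agent, the `example.com` URLs, and the two-second delay below are illustrative assumptions, not values taken from 51job; in practice the rules would be downloaded from the target site's own `/robots.txt`.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

USER_AGENT = "my-scraper"  # assumed name for this example


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_delay_seconds: float) -> None:
        self.min_delay_seconds = min_delay_seconds
        self._last_request = None  # monotonic timestamp of the previous request

    def wait(self) -> None:
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            if elapsed < self.min_delay_seconds:
                time.sleep(self.min_delay_seconds - elapsed)
        self._last_request = time.monotonic()


parser = build_parser(SAMPLE_ROBOTS_TXT)
throttle = Throttle(min_delay_seconds=2.0)

for path in ["/list/python.html", "/private/admin.html"]:
    url = "https://example.com" + path
    if parser.can_fetch(USER_AGENT, url):
        throttle.wait()
        print("allowed:", url)  # the actual requests.get(url, ...) call would go here
    else:
        print("blocked by robots.txt:", url)
```

Keeping the check and the delay in small helpers means the scraping loop itself stays unchanged; only the place where the request is sent needs the two extra calls.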

Optimize this code:

    #coding=utf-8
    import requests
    from bs4 import BeautifulSoup

    url = 'https://coi.hzau.edu.cn/'
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)' +\
              'AppleWebKit/537.36 (KHTML, like Gecko)' +\
              'Chrome/113.0.0.0' +\
              'Safari/537.36 Edg/113.0.1774.57'}
    response = requests.get(url, headers=header)
    soup = BeautifulSoup(response.text, 'html.parser')
    articles = soup.select(".list_right .list_right_list li")
    teacher_list = soup.find_all("li", class_="name")
    for teacher in teacher_list:
        name = teacher.a.text.strip()  # get the teacher's name
        position = teacher.span.text.strip()  # get the teacher's position
        print(f"Name: {name}")
        print(f"Position: {position}")
        print("-------------")

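Here is the optimized code:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://coi.hzau.edu.cn/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.57'
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# Matches the same elements as find_all("li", class_="name")
teacher_list = soup.select("li.name")
for teacher in teacher_list:
    name = teacher.a.text.strip()
    position = teacher.span.text.strip()
    print(f"Name: {name}")
    print(f"Position: {position}")
    print("-------------")
```

The changes are:

1. The User-Agent is written as a single string literal in the `headers` dict instead of being built by concatenation, which is shorter and keeps the spaces between its parts.
2. The unused `articles` selector is removed, since nothing in the script reads its result.
3. A CSS selector passed to `select()` replaces the `find_all()` call. The benefit is a more compact lookup rather than speed.
4. The teacher selector is `li.name`, which matches every `li` element with the class `name`, so all teacher entries are picked up in one call.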
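One case the loop does not cover: `teacher.a` and `teacher.span` evaluate to `None` when a matched `li` item lacks that child, and `.text` then raises `AttributeError`. A minimal sketch of a guarded version, run against inline sample HTML; the markup and the `extract_teachers` helper are assumptions for illustration and do not reflect the actual structure of coi.hzau.edu.cn.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only.
SAMPLE_HTML = """
<ul>
  <li class="name"><a href="/t/1">Zhang San</a><span>Professor</span></li>
  <li class="name"><a href="/t/2">Li Si</a></li>
  <li class="other"><a href="/x">Not a teacher</a></li>
</ul>
"""


def extract_teachers(html):
    """Return (name, position) pairs; position is None when the span is missing."""
    soup = BeautifulSoup(html, "html.parser")
    teachers = []
    for item in soup.select("li.name"):
        link = item.find("a")
        if link is None:
            continue  # skip items that have no name link
        span = item.find("span")
        position = span.get_text(strip=True) if span is not None else None
        teachers.append((link.get_text(strip=True), position))
    return teachers


for name, position in extract_teachers(SAMPLE_HTML):
    print(f"Name: {name}")
    print(f"Position: {position or 'unknown'}")
    print("-------------")
```

Running the extraction on a saved or inline snippet like this also makes it possible to check a selector before sending any request to the live site.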

    import requests
    from bs4 import BeautifulSoup
    import re
    import pandas as pd

    # URL of the target job site (the original comment names Zhaopin as the example; this URL points to 58.com)
    url = "https://www.58.com/ppezp2023083158986/"

    # Request headers that imitate a browser
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
    }

    # Send the HTTP request and get the page content
    response = requests.get(url, headers=headers)
    html_content = response.text

    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(html_content, "html.parser")

    # Find all job-posting elements (adjust to the actual page structure)
    job_list = soup.find_all("div", class_="joblist__item")
    print(job_list)

    # Empty list that collects the extracted records
    jobs_data = []

    # Regular expression for the salary range (example)
    salary_pattern = re.compile(r"(\d+-\d+)千/月")

    # Iterate over the postings
    for job in job_list:
        # Job title
        job_title = job.find("span", class_="jobname__title").text.strip()
        print(job_title)

        # Company name
        company_name = job.find("a", class_="company__title").text.strip()

        # Location
        location = job.find("span", class_="job__location").text.strip()

        # Salary range (via the regular expression)
        salary_text = job.find("span", class_="job__salary").text.strip()
        salary_match = salary_pattern.search(salary_text)
        salary = salary_match.group(1) if salary_match else "Negotiable"

        # Required work experience
        experience = job.find("span", class_="job__experience").text.strip()

        # Required education
        education = job.find("span", class_="job__education").text.strip()

        # Job description
        description = job.find("div", class_="job__desc").text.strip()

        # Store the extracted fields as a dictionary
        job_info = {
            "Job title": job_title,
            "Company": company_name,
            "Location": location,
            "Salary range": salary,
            "Experience": experience,
            "Education": education,
            "Description": description
        }

        # Append the dictionary to the list
        jobs_data.append(job_info)
        print(job_info)

    # Load the records into a DataFrame
    df = pd.DataFrame(jobs_data)

    # Save to an Excel file
    df.to_excel("job_postings.xlsx", index=False)
    print("Scraping finished; data saved to job_postings.xlsx")

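### Scraping job postings with Python and saving them to Excel

To scrape job postings from a recruitment site and save them to an Excel file, you can send HTTP requests with the `requests` library instead of `urllib.request`, since it is easier to use; parse the HTML with `BeautifulSoup` to extract the fields you need; and write the data to a spreadsheet with `openpyxl` or `pandas`. Here is a complete example:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd


def fetch_job_listings(url):
    """Download the page at `url` and return a list of job dictionaries."""
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    jobs = []
    for job_card in soup.find_all('div', class_='job-card'):
        title_element = job_card.select_one('.title a')
        company_element = job_card.select_one('.company-name span')
        location_element = job_card.select_one('.location')

        # Keep only the cards where all three fields were found
        if all([title_element, company_element, location_element]):
            job_info = {
                'Title': title_element.get_text(strip=True),
                'Company': company_element.get_text(strip=True),
                'Location': location_element.get_text(strip=True)
            }
            jobs.append(job_info)

    return jobs


url = "https://example.com/jobs"  # replace with the actual URL of the target job site
jobs_data = fetch_job_listings(url)

df = pd.DataFrame(jobs_data)
output_file_path = './recruitment_information.xlsx'
# Writing .xlsx needs an Excel engine such as openpyxl to be installed
df.to_excel(output_file_path, index=False)
print(f'Jobs data has been saved to {output_file_path}')
```

The script defines a function `fetch_job_listings()` that takes a URL as input and returns a list of dictionaries holding the job title, company name, and location. It then builds a pandas DataFrame for further processing and export, and finally calls `.to_excel()` to write the DataFrame to an Excel file on the local disk[^1]. Before using this code in practice, adjust the CSS selectors (`class_`, `select_one`) to match the structure of the target page. Also make sure you comply with the target site's terms of service and set a reasonable request rate so that you do not put excessive load on the server[^2].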
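The function above leaves out the salary field, which the question's script extracts with the pattern `(\d+-\d+)千/月`. A minimal sketch of that step in isolation, so the pattern can be checked against sample strings before the scraper runs. The `parse_salary_range` helper, the split into two capture groups, and the sample strings are assumptions for illustration; real listings may format salaries differently.

```python
import re

# Adapted from the question's pattern: "<low>-<high>千/月" (thousands per month),
# with the two numbers captured separately.
SALARY_PATTERN = re.compile(r"(\d+)-(\d+)千/月")


def parse_salary_range(text):
    """Return (low, high) per month as integers, or None if the text does not match."""
    match = SALARY_PATTERN.search(text)
    if match is None:
        return None
    low, high = (int(value) * 1000 for value in match.groups())
    return low, high


# Hypothetical salary strings, assumed for illustration.
samples = ["8-12千/月", "面议", "薪资 15-20千/月 (13薪)"]
for sample in samples:
    print(sample, "->", parse_salary_range(sample))
```

Returning `None` for non-matching text keeps the decision about a fallback label in the caller, which is where the question's script applies its default value.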
