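Python crawler: 1. Scrape job postings from the 58 part-time jobs site: (1) given a keyword, list each job's name, description, and pay (60%); (2) list any additional requirements for the job, such as a deadline (10%). 2. Output the crawl results as JSON (10%)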
Time: 2024-06-10 12:05:33 Views: 77
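ChitGPT: Understood. You want a Python crawler that collects job postings from the 58 part-time jobs site and outputs the results as JSON. One workable approach uses the Requests and BeautifulSoup modules: Requests sends the HTTP request, BeautifulSoup parses the returned HTML, and the extracted data is then serialized to JSON. The rough steps are:
1. Import the requests and BeautifulSoup modules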
```python
import requests
from bs4 import BeautifulSoup
```
2. Build the request URL and send the request to fetch the page content
```python
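def spider(keyword):
    """Fetch the search results page for the given keyword and return its HTML."""
    url = "https://jz.58.com/job/"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"}
    # Passing the keyword via params lets requests URL-encode it
    response = requests.get(url, params={"key": keyword}, headers=headers, timeout=10)
    response.raise_for_status()  # raise on HTTP error status codes
    return response.text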
```
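The keyword in step 2 will often be Chinese, so it has to be percent-encoded before it goes into the query string. Requests does this automatically, but a minimal standard-library sketch makes the resulting URL visible, which can help when debugging an empty result page. The helper name `build_search_url` is an assumption for illustration, and the `key` parameter is taken from the URL above rather than verified against the live site.

```python
from urllib.parse import urlencode

BASE_URL = "https://jz.58.com/job/"  # endpoint taken from step 2, not verified


def build_search_url(keyword):
    """Return the search URL with the keyword percent-encoded as UTF-8."""
    return BASE_URL + "?" + urlencode({"key": keyword})


print(build_search_url("兼职"))
# https://jz.58.com/job/?key=%E5%85%BC%E8%81%8C
```

Printing the URL before sending the request is a cheap way to confirm that the keyword was encoded as expected.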
3. Parse the page content and extract the job postings
```python
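def parse(html):
    """Extract the title, price, and detail link of each job listing."""
    soup = BeautifulSoup(html, "html.parser")
    # These class names must match the site's actual markup
    items = soup.find_all("li", {"class": "job_item"})
    result = []
    for item in items:
        title = item.find("span", {"class": "title"})
        price = item.select_one("div.price span")
        detail = item.find("a", {"class": "detail"})
        # Skip entries that lack any of the expected elements
        if title is None or price is None or detail is None:
            continue
        result.append({
            "title": title.text.strip(),
            "price": price.text.strip(),
            "detail": detail.get("href")
        })
    return result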
```
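The `detail` value collected in step 3 is whatever the `href` attribute holds, which may be a relative or protocol-relative path rather than a full URL. A minimal sketch of normalizing it with the standard library's `urljoin` follows; the helper name `absolutize` and the sample records and paths are made up for illustration.

```python
from urllib.parse import urljoin

PAGE_URL = "https://jz.58.com/job/"  # page the links were scraped from


def absolutize(records, base=PAGE_URL):
    """Return copies of the records with 'detail' resolved against base."""
    return [{**r, "detail": urljoin(base, r["detail"])} for r in records]


# Hypothetical records covering the three common href shapes
sample = [
    {"title": "A", "price": "100", "detail": "/detail/1.html"},
    {"title": "B", "price": "120", "detail": "//jz.58.com/detail/2.html"},
    {"title": "C", "price": "90", "detail": "https://jz.58.com/detail/3.html"},
]
for record in absolutize(sample):
    print(record["detail"])
```

Resolving links at parse time means any later step that fetches a detail page can use the value directly.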
4. Convert the extracted job postings to JSON and output them
```python
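import json

def output_json(result):
    """Serialize the results to JSON and print them."""
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
    json_str = json.dumps(result, ensure_ascii=False)
    print(json_str)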
```
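To make the effect of `ensure_ascii=False` in step 4 concrete, here is a minimal sketch that compares the two settings and also writes the result to a file. The names `to_json` and `save_json`, the `indent=2` choice, and the sample record are assumptions for illustration, not real data from the site.

```python
import json


def to_json(records, ascii_only=False):
    """Serialize records; ascii_only=True escapes non-ASCII characters."""
    return json.dumps(records, ensure_ascii=ascii_only, indent=2)


def save_json(records, path):
    """Write records as UTF-8 JSON so Chinese text stays readable in the file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


# Made-up record in the same shape as the output of parse()
sample = [{"title": "传单派发", "price": "100元/天", "detail": "/detail/1.html"}]

print(to_json(sample))                   # Chinese characters printed as-is
print(to_json(sample, ascii_only=True))  # same data with \uXXXX escapes
```

Both forms decode to the same data; the unescaped form is simply easier to read. Opening the file with `encoding="utf-8"` matters on platforms whose default encoding is not UTF-8.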
5. The complete code is as follows:
```python
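import json

import requests
from bs4 import BeautifulSoup


def spider(keyword):
    """Fetch the search results page for the given keyword and return its HTML."""
    url = "https://jz.58.com/job/"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"}
    # Passing the keyword via params lets requests URL-encode it
    response = requests.get(url, params={"key": keyword}, headers=headers, timeout=10)
    response.raise_for_status()  # raise on HTTP error status codes
    return response.text


def parse(html):
    """Extract the title, price, and detail link of each job listing."""
    soup = BeautifulSoup(html, "html.parser")
    # These class names must match the site's actual markup
    items = soup.find_all("li", {"class": "job_item"})
    result = []
    for item in items:
        title = item.find("span", {"class": "title"})
        price = item.select_one("div.price span")
        detail = item.find("a", {"class": "detail"})
        # Skip entries that lack any of the expected elements
        if title is None or price is None or detail is None:
            continue
        result.append({
            "title": title.text.strip(),
            "price": price.text.strip(),
            "detail": detail.get("href")
        })
    return result


def output_json(result):
    """Serialize the results to JSON and print them."""
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
    json_str = json.dumps(result, ensure_ascii=False)
    print(json_str)


if __name__ == '__main__':
    keyword = "兼职"  # search keyword ("part-time job")
    html = spider(keyword)
    result = parse(html)
    output_json(result)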
```
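The complete code collects each job's title, price, and detail link, but not the additional requirements (such as a deadline) that the question also asks for. Those would normally have to come from each job's detail page. As a rough sketch, assuming the detail page text contains labelled lines such as `期限: …`, a regular expression can pull them out. The labels, the function name `extract_requirements`, and the sample text are all hypothetical and would need checking against the real page.

```python
import re

# Hypothetical labels ("deadline", "requirements", "working hours");
# the real detail page may use different wording.
LABELS = ("期限", "要求", "工作时间")


def extract_requirements(text, labels=LABELS):
    """Return {label: value} for each 'label: value' line found in text."""
    found = {}
    for label in labels:
        # Accept both the ASCII colon and the full-width Chinese colon
        match = re.search(rf"{re.escape(label)}\s*[:：]\s*(.+)", text)
        if match:
            found[label] = match.group(1).strip()
    return found


# Made-up detail text, used only to exercise the function
sample_text = "职位: 传单派发\n期限: 2024-06-30 前\n要求: 18岁以上\n"
print(extract_requirements(sample_text))
```

In practice the text would come from fetching each `detail` link and calling `get_text()` on the parsed page, and the resulting dictionary could be merged into that job's record before the JSON output step.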
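The above is a simple Python crawler example; adapt the code to the actual situation as needed. Because ChitGPT cannot execute code, it is provided for reference only.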