Batch-scrape https://www.kuaidaili.com/free/ and save the results as a JSON file

Time: 2023-11-13 21:44:52

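Sure. You can write a crawler in Python for this. Here is a simple implementation (it needs `pip install requests beautifulsoup4`):

```python
import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.kuaidaili.com/free/'
# Many sites reject the default requests User-Agent, so send a browser-like one
headers = {'User-Agent': 'Mozilla/5.0'}

# Send the request and fetch the page content
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = response.apparent_encoding  # avoid garbled Chinese text

# Parse the page content
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')
if table is None:
    raise SystemExit('No <table> found; the page layout may have changed')

# Extract the data from the table rows (skip the header row)
results = []
for tr in table.find_all('tr')[1:]:
    tds = tr.find_all('td')
    if len(tds) < 4:
        continue  # skip malformed or empty rows
    ip = tds[0].text.strip()
    port = tds[1].text.strip()
    protocol = tds[3].text.strip()
    results.append({'ip': ip, 'port': port, 'protocol': protocol})

# Save the results as a JSON file (UTF-8, keeping non-ASCII text readable)
with open('proxies.json', 'w', encoding='utf-8') as f:
    json.dump(results, f, ensure_ascii=False, indent=2)

print(f'Saved {len(results)} proxies to proxies.json')
```

The program uses the requests library to send a request to the target site, then uses BeautifulSoup to parse the page and extract each proxy server's IP address, port, and protocol type, and finally saves the results as a JSON file. Save the code as proxy_spider.py and run it to generate proxies.json locally. Note that it only reads the first page of the list; true batch scraping requires looping over the site's paginated pages.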
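The code above reads only the first page, while the question asks for batch scraping. Below is a minimal sketch of a multi-page loop with deduplication, under stated assumptions. The page URL template `PAGE_URL` (`/free/inha/{page}/`) is hypothetical and should be checked against the site's actual pagination links. The column order (IP, port, and the protocol type in the fourth column) is carried over from the code above. The sketch uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs. `fetch`, `parse_proxies`, and `crawl` are illustrative names. If the site builds its list with JavaScript, the raw HTML may contain no table rows, and this sketch, like the code above, would return an empty list.

```python
import json
import time
import urllib.request
from html.parser import HTMLParser

# ASSUMPTION: hypothetical pagination pattern; verify against the live site.
PAGE_URL = "https://www.kuaidaili.com/free/inha/{page}/"
HEADERS = {"User-Agent": "Mozilla/5.0"}


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside <td> cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they stay empty and are skipped
                self.rows.append(self._row)
            self._row = None


def parse_proxies(html):
    """Extract ip/port/protocol, assuming the column order used above."""
    parser = TableRowParser()
    parser.feed(html)
    return [
        {"ip": r[0], "port": r[1], "protocol": r[3]}
        for r in parser.rows
        if len(r) >= 4
    ]


def fetch(url):
    """Download one page with a browser-like User-Agent and a timeout."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(pages, fetch_page=fetch, delay=1.0):
    """Fetch pages 1..pages, parse each, and drop duplicate ip:port pairs."""
    seen, results = set(), []
    for page in range(1, pages + 1):
        html = fetch_page(PAGE_URL.format(page=page))
        for item in parse_proxies(html):
            key = (item["ip"], item["port"])
            if key not in seen:
                seen.add(key)
                results.append(item)
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return results


if __name__ == "__main__":
    proxies = crawl(pages=3)
    with open("proxies.json", "w", encoding="utf-8") as f:
        json.dump(proxies, f, ensure_ascii=False, indent=2)
    print(f"Saved {len(proxies)} proxies to proxies.json")
```

Passing the downloader in as `fetch_page` lets you test parsing and deduplication offline against saved HTML. The `delay` between requests keeps the crawl polite toward the server.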