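2023 Administrative Division Data Collection
Time: 2024-01-12 10:20:40 Views: 29
To collect the 2023 administrative division data, we can use a Python scraping library. Here is a simple example: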
```python
import json
import re
from urllib.parse import urljoin

import requests

# NOTE: this URL points at the 2020 edition. Change the year segment to target
# another edition (the path and page encoding may differ between editions).
BASE_URL = 'http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2020/index.html'

# Assumption: each table row carries a class such as "citytr", "countytr",
# "towntr" or "villagetr", and province links look like "11.html".
# Adjust the patterns if the live markup differs.
ROW_RE = re.compile(r"<tr class=['\"](city|county|town|village)tr['\"]>(.*?)</tr>", re.S | re.I)
CELL_RE = re.compile(r"<td>(.*?)</td>", re.S | re.I)
HREF_RE = re.compile(r"href=['\"]([^'\"]+)['\"]", re.I)
PROVINCE_RE = re.compile(r"href=['\"](\d+)\.html['\"]", re.I)
TAG_RE = re.compile(r"<[^>]+>")
LEVELS = {'city': 2, 'county': 3, 'town': 4, 'village': 5}


def fetch(url):
    """Download a page and decode it as GBK."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    response.encoding = 'gbk'
    return response.text


def crawl(url, parent_id, data):
    """Record every division row on the page, then follow its child link."""
    html = fetch(url)
    for kind, row in ROW_RE.findall(html):
        # Strip nested tags (such as <a>) to get the plain cell text
        cells = [TAG_RE.sub('', cell).strip() for cell in CELL_RE.findall(row)]
        if len(cells) < 2:
            continue
        # The code is the first cell and the name the last one
        # (village rows are assumed to have an extra middle column)
        code, name = cells[0], cells[-1]
        data.append({'id': code, 'pid': parent_id, 'aname': name, 'lv': LEVELS[kind]})
        link = HREF_RE.search(row)
        if link:  # rows without a link have no lower level
            crawl(urljoin(url, link.group(1)), code, data)


# Collect the data for every level below each province
data = []
for province_code in PROVINCE_RE.findall(fetch(BASE_URL)):
    crawl(urljoin(BASE_URL, province_code + '.html'), province_code, data)

# Save the data to a file (a database would work as well)
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)
```
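In this example, we first use the requests library to fetch the index page of the 2020 administrative division data, then parse the HTML to obtain the links for each level of division. Next, we visit the links at each level in turn, parse the HTML to extract the code and name of each division, and append each record to a list. Finally, we save the data to a file or a database. Because the URL targets the 2020 edition, it would need to point at the 2023 pages to collect the 2023 data.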
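The explanation mentions saving to a file or a database, while the example itself only writes JSON. Below is a minimal sketch of the database option, assuming SQLite through Python's standard `sqlite3` module. The table name `division`, the file name `divisions.db`, and the helper names `save_to_sqlite` and `children_of` are hypothetical choices for illustration and are not part of the original answer.

```python
import sqlite3


def save_to_sqlite(records, db_path='divisions.db'):
    """Store division records (dicts with id, pid, aname, lv) in SQLite."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                'CREATE TABLE IF NOT EXISTS division ('
                'id TEXT PRIMARY KEY, pid TEXT, aname TEXT, lv INTEGER)'
            )
            # INSERT OR REPLACE keeps re-runs from failing on duplicate ids
            conn.executemany(
                'INSERT OR REPLACE INTO division (id, pid, aname, lv) '
                'VALUES (:id, :pid, :aname, :lv)',
                records,
            )
    finally:
        conn.close()


def children_of(parent_id, db_path='divisions.db'):
    """Return (id, aname) pairs for the direct children of parent_id."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            'SELECT id, aname FROM division WHERE pid = ? ORDER BY id',
            (parent_id,),
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Calling `save_to_sqlite(data)` with the list built by the scraper would persist every record, and `children_of` could then look up the divisions directly under a given code by its `pid`. Storing `id` as text rather than as an integer is a deliberate choice here, since the codes are identifiers and not quantities.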