Scraping provinces, cities, and districts from the National Bureau of Statistics with Python
Date: 2023-11-05 16:52:09
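Scraping China's province, city, and district data with Python
You can scrape the province, city, and district information published by the National Bureau of Statistics with Python's requests and BeautifulSoup libraries. Sample code: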
```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2019/index.html'


def fetch_soup(url):
    """Download a page and parse it, decoding the body as GBK."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    response.encoding = 'gbk'
    return BeautifulSoup(response.text, 'html.parser')


def code_of(href):
    """Take the code from a link target, e.g. '11/1101.html' -> '1101'."""
    return PurePosixPath(href).stem


def row_links(soup, row_class):
    """Yield (href, name) once per table row that contains links.

    Assumes the last link in a row carries the name, so a row that links
    both its code and its name is reported once instead of twice.
    """
    for row in soup.select(f'.{row_class}'):
        links = row.select('a')
        if links:
            yield links[0]['href'], links[-1].text.strip()


for province in fetch_soup(BASE_URL).select('.provincetr a'):
    province_name = province.text.strip()
    province_code = code_of(province['href'])
    print(province_code, province_name)

    # Resolve each href against the current page instead of rebuilding paths by hand.
    city_url = urljoin(BASE_URL, province['href'])
    for city_href, city_name in row_links(fetch_soup(city_url), 'citytr'):
        print('\t', code_of(city_href), city_name)

        county_url = urljoin(city_url, city_href)
        county_soup = fetch_soup(county_url)
        # Fall back to town rows when a city page has no county rows.
        county_list = list(row_links(county_soup, 'countytr')) or list(row_links(county_soup, 'towntr'))
        for county_href, county_name in county_list:
            print('\t\t', code_of(county_href), county_name)
```
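The code above prints the name and code of each province, city, and district. Note that the page encoding must be set to `gbk` explicitly.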
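To illustrate why the encoding has to be set by hand, here is a minimal sketch that works on a hypothetical sample string rather than a real response. It assumes the page bytes are GBK and compares decoding them as GBK with decoding them as ISO-8859-1, which `requests` may fall back to when a text response declares no charset. The helper name `decode_page` is made up for this illustration.

```python
# Hypothetical sample: the bytes a GBK-encoded page would carry for one name.
raw = '北京市'.encode('gbk')


def decode_page(raw_bytes, encoding='gbk'):
    """Decode raw response bytes with an explicit encoding (illustrative helper)."""
    return raw_bytes.decode(encoding)


correct = decode_page(raw)                # decoded as GBK, the assumed page encoding
garbled = decode_page(raw, 'iso-8859-1')  # decoded with a wrong single-byte guess

print(correct)
print(garbled)
print(correct == garbled)
```

Setting `response.encoding = 'gbk'` before reading `response.text` has the same effect as the explicit GBK decode here. If a page turns out to use a different encoding, the names will come out garbled in the same way and the value needs adjusting.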