Scraping Guangdong Statistical Yearbook data with Python
Posted: 2023-06-19 19:08:09
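You can scrape Guangdong Statistical Yearbook data with Python's requests and BeautifulSoup libraries. The steps are as follows:
1. Open the data source site http://tjj.gd.gov.cn/data/yearbook/2021/index.html
2. Use requests to send a GET request and fetch the page source: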
```python
import requests

url = 'http://tjj.gd.gov.cn/data/yearbook/2021/index.html'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404
html = response.text
```
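requests guesses the text encoding mainly from the HTTP headers, so `response.text` can come out garbled when a Chinese page does not declare its charset. The following is a minimal sketch of one way to decode the raw bytes yourself. It assumes the page is either UTF-8 or a GBK-family encoding, which has not been confirmed for this site, and `decode_html` is a hypothetical helper name.

```python
def decode_html(raw: bytes, encodings=("utf-8", "gb18030")) -> str:
    """Decode raw page bytes, trying each candidate encoding in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with replacement characters instead of failing.
    return raw.decode(encodings[0], errors="replace")


# With a real response, pass the raw bytes: html = decode_html(response.content)
sample = "广东统计年鉴".encode("gbk")  # stand-in bytes for a GBK-encoded page
print(decode_html(sample))
```

UTF-8 is tried first because GBK-encoded Chinese text is rarely valid UTF-8, so a wrong guess usually fails loudly and falls through to the next candidate.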
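3. Parse the page source with BeautifulSoup and extract the data you need. The selector assumes the page contains a table with these CSS classes: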
```python
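from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# Assumes the page contains a table with exactly this class attribute
table = soup.find('table', class_='table table-bordered table-hover')
if table is None:
    raise ValueError('target table not found; check the selector against the page')
thead = table.thead.tr
tbody = table.tbody
# Header cells after the first one are the years
years = [th.get_text(strip=True) for th in thead.find_all('th')[1:]]
data = {}
for tr in tbody.find_all('tr'):
    tds = tr.find_all('td')
    if not tds:
        continue  # skip rows without data cells
    topic = tds[0].get_text(strip=True)
    # Note: [1:-1] skips the last cell of each row; use tds[1:] if it holds data
    values = [td.get_text(strip=True) for td in tds[1:-1]]
    data[topic] = dict(zip(years, values))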
```
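The parsing logic is easier to check when it runs against a small inline table rather than the live page. The sketch below does that. `SAMPLE_HTML` and `parse_table` are hypothetical names, the sample markup is invented for illustration, and unlike the loop above this version keeps every cell after the first one in each row.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<table class="table table-bordered table-hover">
  <thead><tr><th>Indicator</th><th>2019</th><th>2020</th></tr></thead>
  <tbody>
    <tr><td>GDP</td><td>100</td><td>110</td></tr>
    <tr><td>Population</td><td>50</td><td>51</td></tr>
  </tbody>
</table>
"""


def parse_table(html):
    """Return {row label: {column header: cell text}} for the first <table>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return {}  # no table in the markup, e.g. the layout differs
    rows = table.find_all("tr")
    if not rows:
        return {}
    # Header cells after the first one are treated as column labels.
    headers = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])][1:]
    data = {}
    for tr in rows[1:]:
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        if len(cells) < 2:
            continue  # skip spacer or empty rows
        data[cells[0]] = dict(zip(headers, cells[1:]))
    return data


print(parse_table(SAMPLE_HTML))
```

Returning an empty dict when no table is found lets the caller decide how to react, which is helpful if the real page turns out to be structured differently from what the selector expects.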
4. Save the data to a local file:
```python
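import json

# ensure_ascii=False keeps Chinese text readable in the output file
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)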
```
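To confirm the saved file is usable, it can help to read it back and compare it with what was written. This is a minimal sketch of that round trip, using a hypothetical `sample` dictionary in the same shape as `data` and a temporary directory so that it does not overwrite a real `data.json`.

```python
import json
import os
import tempfile

# Hypothetical sample in the same shape as the scraped `data` dictionary.
sample = {"地区生产总值": {"2019": "100", "2020": "110"}}

path = os.path.join(tempfile.mkdtemp(), "data.json")

# ensure_ascii=False writes Chinese characters as-is instead of \uXXXX escapes;
# encoding="utf-8" makes the file encoding explicit on every platform.
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=2)

# Read the file back to confirm the round trip preserves the data.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == sample)
```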
The complete code is as follows:
```python
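import json

import requests
from bs4 import BeautifulSoup

url = 'http://tjj.gd.gov.cn/data/yearbook/2021/index.html'

# Step 2: fetch the page source
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404
html = response.text

# Step 3: parse the table (assumes a table with exactly this class attribute)
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', class_='table table-bordered table-hover')
if table is None:
    raise ValueError('target table not found; check the selector against the page')
thead = table.thead.tr
tbody = table.tbody
# Header cells after the first one are the years
years = [th.get_text(strip=True) for th in thead.find_all('th')[1:]]
data = {}
for tr in tbody.find_all('tr'):
    tds = tr.find_all('td')
    if not tds:
        continue  # skip rows without data cells
    topic = tds[0].get_text(strip=True)
    # Note: [1:-1] skips the last cell of each row; use tds[1:] if it holds data
    values = [td.get_text(strip=True) for td in tds[1:-1]]
    data[topic] = dict(zip(years, values))

# Step 4: save to a local file, keeping Chinese text readable
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)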
```
This scrapes the Guangdong Statistical Yearbook data and saves it to the local file data.json.