
Using Python to scrape the data at http://www.stats.gov.cn/sj/zxfb/202302/t20230228_1919011.html and store it in a database

Posted: 2023-07-19 14:47:49 Views: 285

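Sure, here is a basic scraper that you can modify and extend to suit your needs. First, install the Python requests and BeautifulSoup libraries with the following commands:

```
pip install requests
pip install beautifulsoup4
```

Next, you can use the following code to scrape the data and store it in a database:

```python
import sqlite3

import requests
from bs4 import BeautifulSoup

# Fetch the page
url = 'http://www.stats.gov.cn/sj/zxfb/202302/t20230228_1919011.html'
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on HTTP errors
# requests can guess the wrong charset for Chinese pages; use the detected one
response.encoding = response.apparent_encoding
html = response.text

# Parse the first table on the page
soup = BeautifulSoup(html, 'html.parser')
table = soup.find_all('table')[0]
rows = table.find_all('tr')
data = []
for row in rows:
    cols = row.find_all('td')
    cols = [col.text.strip() for col in cols]
    data.append(cols)

# Store the data in the database
conn = sqlite3.connect('data.db')
c = conn.cursor()
c.execute('CREATE TABLE IF NOT EXISTS stats (id INTEGER PRIMARY KEY AUTOINCREMENT, year INTEGER, month INTEGER, value REAL)')
for cols in data[1:]:
    # Skip rows without the three expected cells (e.g. header or merged rows)
    if len(cols) < 3:
        continue
    year, month, value = cols[0], cols[1], cols[2]
    c.execute('INSERT INTO stats (year, month, value) VALUES (?, ?, ?)', (year, month, value))
conn.commit()
conn.close()
```

This code uses the requests library to fetch the page's HTML, parses it with BeautifulSoup, and stores the data in a SQLite database. It assumes that the first three columns of the first table on the page hold a year, a month, and a value, so check the page's actual table layout and adjust the column mapping and the `stats` schema accordingly. You can also change how the data is stored or switch to a different database as needed.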
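If the table's layout is not known in advance, one alternative way to store it is to keep every cell as text together with its row and column index, and decide on typed columns later. The following is only a minimal sketch of that idea: the `cells` table, the `store_cells` and `load_rows` helpers, and the sample rows are all hypothetical names and placeholder values introduced for illustration, and it uses an in-memory SQLite database instead of `data.db`.

```python
import sqlite3


def store_cells(conn, rows):
    """Store each cell as (row_idx, col_idx, text) and return the cell count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        "row_idx INTEGER, col_idx INTEGER, text TEXT, "
        "PRIMARY KEY (row_idx, col_idx))"
    )
    cells = [
        (r, c, text)
        for r, row in enumerate(rows)
        for c, text in enumerate(row)
    ]
    # INSERT OR REPLACE overwrites existing cells, so re-running is safe.
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO cells VALUES (?, ?, ?)", cells)
    return len(cells)


def load_rows(conn):
    """Rebuild the list-of-lists layout from the cells table."""
    rows = {}
    query = "SELECT row_idx, col_idx, text FROM cells ORDER BY row_idx, col_idx"
    for r, c, text in conn.execute(query):
        rows.setdefault(r, []).append(text)
    return [rows[r] for r in sorted(rows)]


# Placeholder rows standing in for the parsed `data` list (not real page data).
sample = [["indicator", "value"], ["A", "1.5"], ["B"]]
conn = sqlite3.connect(":memory:")
store_cells(conn, sample)
print(load_rows(conn))
conn.close()
```

Rows of different lengths are stored without any special handling, but a row with no cells at all leaves no trace in the table, so filter or count such rows separately if they matter.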
