Scraping the National Bureau of Statistics with Selenium
Date: 2024-01-26 15:14:02
The following example code uses Selenium and BeautifulSoup to scrape data from the National Bureau of Statistics website:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd

# Configure the Chrome browser options
chrome_options = Options()
chrome_options.add_argument('--headless')  # run without a visible window
chrome_options.add_argument('--disable-gpu')  # disable GPU acceleration

# Create the Chrome browser instance
driver = webdriver.Chrome(options=chrome_options)
try:
    # Open the National Bureau of Statistics website
    driver.get('http://www.stats.gov.cn/')

    # Wait up to 10 seconds for the element with id 'dataList' to appear.
    # The id is the one used in this example; check it against the actual page.
    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located((By.ID, 'dataList')))

    # Get the rendered page source
    html = driver.page_source
finally:
    # Close the browser even if loading or waiting fails
    driver.quit()

# Parse the page with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Extract the data row by row
data_list = []
table = soup.find('table', id='dataList')
rows = table.find_all('tr') if table is not None else []
for row in rows:
    cols = row.find_all('td')
    if len(cols) > 0:
        data = [col.get_text(strip=True) for col in cols]
        data_list.append(data)

# Convert the data to a DataFrame
df = pd.DataFrame(data_list)

# Print the data
print(df)
```
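The table-extraction step can be tried on its own, without starting a browser, by running it against a static HTML string. The sketch below is a minimal illustration under assumptions: the helper name `extract_table` and the `SAMPLE_HTML` markup are hypothetical and not taken from the real site, and a row made up only of `<th>` cells is assumed to be the header row.

```python
from bs4 import BeautifulSoup


def extract_table(html, table_id):
    """Return (headers, rows) for the table with the given id.

    Hypothetical helper for illustration; returns ([], []) if no such table exists.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return [], []

    headers = []
    rows = []
    for tr in table.find_all("tr"):
        th_cells = tr.find_all("th")
        td_cells = tr.find_all("td")
        if th_cells and not td_cells:
            # Assumption: a row containing only <th> cells is the header row.
            headers = [th.get_text(strip=True) for th in th_cells]
        elif td_cells:
            # Strip surrounding whitespace from every data cell.
            rows.append([td.get_text(strip=True) for td in td_cells])
    return headers, rows


# Hypothetical sample markup, not copied from the real website.
SAMPLE_HTML = """
<table id="dataList">
  <tr><th>Indicator</th><th>Value</th></tr>
  <tr><td> GDP </td><td>100</td></tr>
  <tr><td>CPI</td><td> 2.5 </td></tr>
</table>
"""

headers, rows = extract_table(SAMPLE_HTML, "dataList")
print(headers)
print(rows)
```

With real data, `driver.page_source` would take the place of `SAMPLE_HTML`, and the result could be passed to `pd.DataFrame(rows, columns=headers)` so that the columns carry names.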