Scraping Web Page Table Data with Python
Date: 2023-10-08 20:08:30
To scrape table data from a web page, you can use Python's requests and BeautifulSoup libraries.
First, fetch the page content with requests:
```python
import requests

url = "https://example.com/table-page"
response = requests.get(url)  # send an HTTP GET request
html = response.content  # response body as raw bytes
```
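The request above assumes the server answers promptly and successfully. A slightly more defensive version might look like the sketch below. The `fetch_html` helper, the 10-second timeout, and the User-Agent string are illustrative assumptions, not part of the original snippet.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw bytes of the page at `url` (hypothetical helper)."""
    # Illustrative User-Agent; some sites reject the default one sent by requests.
    headers = {"User-Agent": "table-scraper-example/0.1"}
    # Without a timeout, requests can wait indefinitely on an unresponsive server.
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

It would be called as `html = fetch_html("https://example.com/table-page")`, after which the parsing step works unchanged.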
Next, parse the HTML with BeautifulSoup and locate the table:
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
# find() returns the first <table> on the page, or None if there is none.
table = soup.find("table")
```
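`soup.find("table")` returns only the first table in the document, and `None` when nothing matches. When a page contains several tables, one option is to select the table by an attribute. The sketch below uses an inline HTML string and a made-up `id` of `prices` purely for illustration.

```python
from bs4 import BeautifulSoup

# Inline sample HTML standing in for a downloaded page (made up for this example).
html = """
<table id="nav"><tr><td>Home</td></tr></table>
<table id="prices"><tr><td>Apple</td><td>3</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

first_table = soup.find("table")                    # first <table> in document order
prices_table = soup.find("table", id="prices")      # match on the id attribute
all_tables = soup.find_all("table")                 # every <table>, as a list
missing = soup.find("table", id="does-not-exist")   # None when nothing matches

# Fail with a clear message rather than an AttributeError later on.
if prices_table is None:
    raise ValueError("expected table not found")
```

Checking for `None` before calling `find_all` on the result avoids a confusing `AttributeError` when the page layout changes.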
Then iterate over the table's rows and cells, collecting the data in a list:
```python
data = []
rows = table.find_all("tr")
for row in rows:
    # Include <th> cells so header rows are not written out as empty rows.
    cols = row.find_all(["td", "th"])
    row_data = []
    for col in cols:
        row_data.append(col.text.strip())
    data.append(row_data)
```
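Many tables put their column names in `<th>` cells, which a search for `<td>` alone skips. A compact variant that keeps the header separate from the data rows could look like this. The `table_to_rows` helper and the sample HTML are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def table_to_rows(table):
    """Split a parsed <table> into (header, rows); hypothetical helper."""
    header, rows = [], []
    for tr in table.find_all("tr"):
        th_cells = tr.find_all("th")
        td_cells = tr.find_all("td")
        if th_cells and not td_cells:
            # A row made only of <th> cells is treated as the header row.
            header = [th.get_text(strip=True) for th in th_cells]
        elif td_cells:
            rows.append([td.get_text(strip=True) for td in td_cells])
    return header, rows


# Sample markup standing in for a real page (made up for this example).
sample = """
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td> Apple </td><td>3</td></tr>
  <tr><td>Pear</td><td>5</td></tr>
</table>
"""
table = BeautifulSoup(sample, "html.parser").find("table")
header, rows = table_to_rows(table)
print(header)
print(rows)
```

`get_text(strip=True)` trims surrounding whitespace from each cell, which plays the same role as `col.text.strip()` for simple cells.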
Finally, write the data to a CSV file:
```python
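import csv

# newline="" stops the csv module from writing blank lines between rows on Windows.
with open("table.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(data)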
```
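If the cells contain non-ASCII text, it helps to set the file encoding explicitly rather than rely on the platform default. The sketch below writes a header row followed by the data, using `utf-8-sig` so that a byte order mark is written, which Excel uses to detect UTF-8. The sample rows and the temporary file location are made up for illustration.

```python
import csv
import os
import tempfile

# Sample data standing in for scraped rows (made up for this example).
header = ["Name", "Qty"]
data = [["苹果", "3"], ["Pear, green", "5"]]

path = os.path.join(tempfile.mkdtemp(), "table.csv")
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(header)   # single header row
    writer.writerows(data)    # all data rows; cells containing commas are quoted

# Read the file back to confirm that the rows round-trip unchanged.
with open(path, newline="", encoding="utf-8-sig") as f:
    read_back = list(csv.reader(f))
```

Plain `utf-8` is usually the better choice when the file is consumed by other programs rather than opened in a spreadsheet.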
The complete code:
```python
import csv

import requests
from bs4 import BeautifulSoup

url = "https://example.com/table-page"

# Fetch the page.
response = requests.get(url)
html = response.content

# Parse the HTML and locate the first table.
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Collect the text of every cell, row by row.
data = []
rows = table.find_all("tr")
for row in rows:
    cols = row.find_all(["td", "th"])  # <th> keeps header rows
    row_data = []
    for col in cols:
        row_data.append(col.text.strip())
    data.append(row_data)

# Write the rows to a CSV file.
with open("table.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(data)
```