Scraping a web page table with Python: extracting the table and saving it as CSV
Time: 2024-04-26 21:20:42
You can use Python's requests and Beautiful Soup libraries to scrape a table from a web page and save it as a CSV file.
First, use the requests library to fetch the page's HTML content:
```
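import requests

# Download the page; the URL is a placeholder for the page that holds the table.
url = 'http://example.com/table.html'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404 or 500
html_content = response.content  # raw bytes; Beautiful Soup detects the encoding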
```
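If the request can fail (a slow server, a 404, a dropped connection), it may help to wrap the fetch step in a small function. The following is a minimal sketch, not part of the original answer: the name `fetch_html` and the 10-second default timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the raw bytes of the page at `url`.

    Hypothetical helper: raises a requests.RequestException subclass
    on connection problems, timeouts, or HTTP error status codes.
    """
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into exceptions instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

A caller could then write `html_content = fetch_html('http://example.com/table.html')` and catch `requests.RequestException` to report a failed download.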
Next, use Beautiful Soup to parse the HTML, locate the table, and extract its data:
```
from bs4 import BeautifulSoup
import csv

soup = BeautifulSoup(html_content, 'html.parser')
table = soup.find('table')  # first <table> on the page; None if there is none

rows = []
for row in table.find_all('tr'):
    cells = []
    # Header (<th>) and data (<td>) cells are collected in document order.
    for cell in row.find_all(['td', 'th']):
        cells.append(cell.get_text())
    rows.append(cells)
```
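The parsing step can be tried without any network access by feeding Beautiful Soup an inline HTML string. The sketch below is illustrative only: the function name `table_to_rows`, its `index` parameter, and the sample markup are assumptions, and `get_text(strip=True)` is used here to trim whitespace around each cell, which the code above does not do.

```python
from bs4 import BeautifulSoup


def table_to_rows(html, index=0):
    """Return the rows of the `index`-th <table> in `html` as lists of strings.

    Hypothetical helper: raises ValueError when the page has no such table,
    instead of failing later with an AttributeError on None.
    """
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if index >= len(tables):
        raise ValueError(f"table {index} not found ({len(tables)} on page)")
    rows = []
    for tr in tables[index].find_all("tr"):
        # strip=True removes leading and trailing whitespace in each cell.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:  # skip rows that contain no cells at all
            rows.append(cells)
    return rows


# Made-up sample markup, used only to exercise the function.
sample = """<table>
<tr><th>Name</th><th>Price</th></tr>
<tr><td> Apple </td><td>1.20</td></tr>
<tr><td>Pear</td><td>0.80</td></tr>
</table>"""

rows = table_to_rows(sample)
```

Note that `find_all("tr")` also descends into nested tables, so a page that nests tables inside cells would need extra filtering.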
Finally, save the data to a CSV file:
```
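# newline='' lets the csv module control line endings itself;
# an explicit encoding avoids depending on the platform default.
with open('table.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(rows)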
```
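To see what `csv.writer` actually produces, the same write can be done into an in-memory buffer and read back. This is a small sketch with made-up sample rows; the helper name `rows_to_csv_text` is an assumption for illustration. It shows that a cell containing a comma is quoted automatically, so it survives the round trip.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize `rows` (a list of lists of strings) to CSV text in memory."""
    buffer = io.StringIO(newline="")  # no newline translation, as with open(..., newline='')
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data; the second row has a cell that contains a comma.
rows = [["Name", "Note"], ["Apple", "red, sweet"]]
text = rows_to_csv_text(rows)

# Reading the text back yields the original rows.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

With the default dialect, rows end in `\r\n` and only the fields that need it are wrapped in double quotes.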
The complete code is as follows:
```
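import csv

import requests
from bs4 import BeautifulSoup

# 1. Fetch the page (placeholder URL).
url = 'http://example.com/table.html'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
html_content = response.content

# 2. Parse the HTML and collect the cells of the first table, row by row.
soup = BeautifulSoup(html_content, 'html.parser')
table = soup.find('table')  # None if the page has no <table>
rows = []
for row in table.find_all('tr'):
    cells = []
    for cell in row.find_all(['td', 'th']):
        cells.append(cell.get_text())
    rows.append(cells)

# 3. Write the rows to a CSV file.
with open('table.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(rows)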
```