Using beautifulsoup4 to scrape the contents of the BalanceSheetNewTable0 > tbody table at http://vip.stock.finance.sina.com.cn/corp/go.php/vFD_BalanceSheet/stockid/000568/ctrl/part/displaytype/4.phtml and save it to an Excel file with the pandas library
Posted: 2023-08-23 22:05:32
Sure. This is a web-scraping and data-processing task using the beautifulsoup4 and pandas libraries. Here are the implementation steps:
1. Import the required libraries
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```
2. Fetch the page content
```python
url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vFD_BalanceSheet/stockid/000568/ctrl/part/displaytype/4.phtml'
response = requests.get(url)
html = response.content
```
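One thing to watch for: Sina finance pages are often served in GB2312/GBK rather than UTF-8. Passing the raw bytes to BeautifulSoup usually works because it sniffs the encoding, but you can make the decoding explicit with `from_encoding`. A minimal sketch using a locally constructed GBK byte string (the exact encoding of the live page is an assumption here):

```python
from bs4 import BeautifulSoup

# Simulate a GBK-encoded response body, as Sina pages often are
raw = '<html><body><p>货币资金</p></body></html>'.encode('gbk')

# from_encoding tells BeautifulSoup how to decode the bytes
soup = BeautifulSoup(raw, 'html.parser', from_encoding='gbk')
print(soup.p.text)  # 货币资金
```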
3. Parse the page
```python
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', attrs={'id': 'BalanceSheetNewTable0'})
```
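A caveat on the `tbody` in the question's selector: Python's built-in `html.parser` does not synthesize a `<tbody>` element when the page's HTML omits it, so `table.tbody` can be `None` even though the browser inspector shows one. A small self-contained check:

```python
from bs4 import BeautifulSoup

# A table with no explicit <tbody>, as commonly found in page source
html = '<table id="BalanceSheetNewTable0"><tr><td>流动资产</td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', attrs={'id': 'BalanceSheetNewTable0'})

print(table.tbody)                # None: html.parser added no <tbody>
print(len(table.find_all('tr')))  # 1: query rows on the table itself
```

This is why the row loop below falls back to the table itself when no `<tbody>` is present.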
4. Extract the table rows into a pandas DataFrame
```python
data = []
for row in (table.tbody or table).find_all('tr'):  # html.parser may not expose a <tbody>
    cols = []
    for col in row.find_all(['td', 'th']):
        cols.append(col.text.strip())
    data.append(cols)
df = pd.DataFrame(data[1:], columns=data[0])
```
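As an alternative to the manual loop, pandas can parse HTML tables directly with `pd.read_html` (it needs lxml, or bs4 plus html5lib, installed). A sketch on an inline table, since the live page's exact layout is not reproduced here:

```python
from io import StringIO
import pandas as pd

html = '''<table id="BalanceSheetNewTable0">
  <tr><th>报表日期</th><th>2023-06-30</th></tr>
  <tr><td>货币资金</td><td>1000</td></tr>
</table>'''

# Newer pandas versions expect a file-like object rather than a literal string
dfs = pd.read_html(StringIO(html))
df = dfs[0]
print(df.shape)  # one data row, two columns; the <th> row became the header
```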
5. Save the data to an Excel file
```python
df.to_excel('balance_sheet.xlsx', index=False)
```
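Note that `to_excel` relies on an Excel engine (openpyxl for `.xlsx`), which is installed separately from pandas (`pip install openpyxl`). A quick round-trip sketch, written to an in-memory buffer so no file is left behind:

```python
from io import BytesIO
import pandas as pd

df = pd.DataFrame({'科目': ['货币资金'], '金额': [1000]})

buf = BytesIO()
df.to_excel(buf, index=False)  # requires openpyxl for .xlsx output
buf.seek(0)
df_back = pd.read_excel(buf)   # and for reading it back
print(df_back.shape)
```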
The complete script:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vFD_BalanceSheet/stockid/000568/ctrl/part/displaytype/4.phtml'
# Some servers reject the default requests User-Agent, so send a browser-like one
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
response.raise_for_status()

soup = BeautifulSoup(response.content, 'html.parser')
table = soup.find('table', attrs={'id': 'BalanceSheetNewTable0'})
if table is None:
    raise RuntimeError('BalanceSheetNewTable0 not found in the page')

data = []
for row in (table.tbody or table).find_all('tr'):  # html.parser may not expose a <tbody>
    cols = [col.text.strip() for col in row.find_all(['td', 'th'])]
    data.append(cols)

df = pd.DataFrame(data[1:], columns=data[0])
df.to_excel('balance_sheet.xlsx', index=False)
```
Save the code to a Python file and run it; the table contents will be written to the Excel file. Remember that `to_excel` needs an Excel engine such as openpyxl installed.