
Web scraping with the requests, bs4, and pandas libraries

Posted: 2023-07-09 20:51:49

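Data for a scraper can be fetched with the requests library, parsed with BeautifulSoup (bs4), and then processed with pandas and written to an Excel file. Here is an example:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Request the data
url = 'https://www.example.com/data'
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an exception on a 4xx/5xx status
html = response.text

# Parse the data
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')
if table is None:
    raise ValueError('No <table> element found on the page')
rows = table.find_all('tr')
data = []
for row in rows:
    cols = row.find_all('td')
    cols = [col.text.strip() for col in cols]
    if cols:  # skip header rows, which contain <th> rather than <td> cells
        data.append(cols)

# Convert to a pandas DataFrame
df = pd.DataFrame(data)

# Write to an Excel file (requires an Excel engine such as openpyxl)
df.to_excel('data.xlsx', index=False)
```

Here, `url` is the address of the page to request. `requests.get(url, timeout=10)` sends an HTTP GET request, `response.raise_for_status()` raises an exception if the server returns an error status, and `response.text` holds the HTML text of the response. `BeautifulSoup(html, 'html.parser')` parses that HTML, `soup.find('table')` returns the first table in the document (or `None` if there is none), and `table.find_all('tr')` returns all of the table's rows. `[col.text.strip() for col in cols]` extracts the text of each cell in a row and strips leading and trailing whitespace; rows without `<td>` cells (typically the header row, which uses `<th>`) are skipped so they do not produce empty records. Finally, `pd.DataFrame(data)` converts the list of rows into a pandas DataFrame, and `df.to_excel('data.xlsx', index=False)` writes it, without the row index, to an Excel file named `data.xlsx`. Writing `.xlsx` files requires an engine such as openpyxl to be installed.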
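Because the example depends on a live URL, the parsing step is hard to try out on its own. The following is a minimal offline sketch that feeds a hard-coded HTML string (hypothetical sample data standing in for `response.text`) to BeautifulSoup and additionally uses the table's `<th>` cells as column names. The helper name `table_to_dataframe` and the `Name`/`Price` columns are hypothetical, and the sketch assumes the header cells, if present, line up with the data columns.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample page used in place of a real HTTP response
SAMPLE_HTML = """
<html><body>
  <table>
    <tr><th>Name</th><th>Price</th></tr>
    <tr><td> Apple </td><td> 3.5 </td></tr>
    <tr><td> Banana </td><td> 2.0 </td></tr>
  </table>
</body></html>
"""


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in `html` into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    if table is None:
        raise ValueError('No <table> element found')

    # Column names come from the <th> cells, if any
    headers = [th.get_text(strip=True) for th in table.find_all('th')]

    data = []
    for row in table.find_all('tr'):
        # get_text(strip=True) trims surrounding whitespace from the cell text
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if cells:  # header rows have no <td> cells, so skip them
            data.append(cells)

    # Use the headers only if they match the width of the data rows
    if headers and data and len(headers) == len(data[0]):
        return pd.DataFrame(data, columns=headers)
    return pd.DataFrame(data)


df = table_to_dataframe(SAMPLE_HTML)
# Cell text is still a string; convert numeric columns explicitly
df['Price'] = pd.to_numeric(df['Price'])
print(df)
```

Converting types with `pd.to_numeric` as a separate step keeps the parser from guessing; when there is no header row, or its width differs from the data rows, the helper falls back to pandas' default integer column labels, as the original example does.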