
Example: Scraping Web Page Data with Jupyter Notebook

Date: 2024-07-15 08:00:50

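Scraping web page data with Python in a Jupyter Notebook is a common data-acquisition task. A typical approach sends an HTTP request with the `requests` library, then parses the returned HTML with `BeautifulSoup` or with `pandas`'s `read_html`. Here is a simple example.

First, make sure the required libraries are installed (`lxml` is included because `pandas.read_html` needs an HTML parser backend):

```bash
pip install requests beautifulsoup4 pandas lxml
```

Next, create a Jupyter Notebook and start writing code:

```python
# Import the required libraries
from io import StringIO

import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL to scrape
url = "https://www.example.com"  # replace with the site you want to scrape

# Send a GET request (the timeout keeps the notebook cell from hanging)
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Locate the data you need (for example a table, paragraphs, or links)
    # Here we assume the data lives in a table
    table = soup.find('table')  # adjust to the actual page structure

    # Convert the HTML table into a pandas DataFrame
    if table:
        # read_html returns a list of DataFrames, so take the first one;
        # StringIO avoids the deprecation warning for literal HTML strings
        data = pd.read_html(StringIO(str(table)))[0]
    else:
        data = None  # no table was found

    # Print or save the data
    if data is not None:
        print(data.head())  # show the first few rows
        # Or save it to a file
        data.to_csv('web_data.csv', index=False)
    else:
        print("No table found on the page.")
else:
    print(f"Request failed, status code: {response.status_code}")
```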
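The example above only handles a table, although it mentions paragraphs and links as other possible targets. The following is a minimal sketch of how those could be pulled out with `BeautifulSoup` alone. The `SAMPLE_HTML` string and the helper names `extract_links` and `extract_table_rows` are hypothetical, introduced here so the cell runs without network access; on a real page the tag names and structure will likely differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used so the example runs without network access
SAMPLE_HTML = """
<html><body>
  <p>Quarterly figures</p>
  <a href="/reports/q1">Q1 report</a>
  <a href="/reports/q2">Q2 report</a>
  <table>
    <tr><th>Quarter</th><th>Revenue</th></tr>
    <tr><td>Q1</td><td>100</td></tr>
    <tr><td>Q2</td><td>120</td></tr>
  </table>
</body></html>
"""


def extract_links(soup):
    """Return (text, href) pairs for every <a> tag that has an href."""
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]


def extract_table_rows(soup):
    """Return the first table as a list of dicts keyed by its header cells."""
    table = soup.find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Treat the first row as the header row
    headers = [cell.get_text(strip=True)
               for cell in rows[0].find_all(["th", "td"])]
    return [
        dict(zip(headers,
                 (cell.get_text(strip=True) for cell in row.find_all("td"))))
        for row in rows[1:]
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = extract_links(soup)
rows = extract_table_rows(soup)
print(links)
print(rows)
```

If you want a DataFrame afterwards, `pd.DataFrame(rows)` accepts the list of dicts directly. Note that every cell value here is a string, so numeric columns would still need converting.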
