Using Python to scrape the dataset at https://www.kaggle.com/datasets/sudhanshu2198/oil-spill-detection
Date: 2023-07-19 10:52:40
Sure. You can follow the steps below to scrape the dataset at https://www.kaggle.com/datasets/sudhanshu2198/oil-spill-detection with Python:
1. Import the required libraries
```python
import requests
from bs4 import BeautifulSoup
import os
```
2. Send the request and parse the page
```python
url = "https://www.kaggle.com/datasets/sudhanshu2198/oil-spill-detection"
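response = requests.get(url, timeout=10)  # avoid hanging forever on a stalled connection
response.raise_for_status()  # stop here on a 4xx/5xx response
soup = BeautifulSoup(response.content, "html.parser")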
```
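Step 2 can also be wrapped in a small helper, which makes it easier to reuse and to test without touching the network. This is only a sketch: the name `fetch_soup`, the `session` parameter, and the `User-Agent` value are assumptions for illustration and do not come from the original answer.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, session=None, timeout=10):
    """Fetch a page and return it parsed; raise on HTTP errors.

    `session` is a hypothetical hook: anything with a requests-style
    `get` method (for example a `requests.Session`) can be passed in.
    """
    http = session or requests
    # Some sites reject requests without a browser-like User-Agent;
    # the value below is an arbitrary placeholder.
    response = http.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout)
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")
```

Calling `fetch_soup(url)` with the `url` from step 2 returns the same kind of `soup` object that the later steps use.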
3. Find the download link
```python
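# Assumes the static HTML has an <a id="downloadButton">; if not, find() returns None and this line raises TypeError.
download_link = soup.find("a", attrs={"id": "downloadButton"})['href']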
```
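The lookup in step 3 fails with a `TypeError` when the element is absent, which can happen if the page builds its download button with JavaScript. A more defensive version might look like the sketch below. The helper name `find_download_link` and the sample HTML are made up for illustration; whether the real page contains such an element is not confirmed here.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_download_link(soup, base_url, element_id="downloadButton"):
    """Return the absolute href of the <a> with the given id, or None."""
    anchor = soup.find("a", attrs={"id": element_id})
    if anchor is None or not anchor.get("href"):
        return None
    # Relative hrefs such as "/files/data.zip" are resolved against the page URL.
    return urljoin(base_url, anchor["href"])


# Illustrative HTML only, not copied from the Kaggle page.
sample = BeautifulSoup(
    '<a id="downloadButton" href="/files/data.zip">Download</a>', "html.parser"
)
print(find_download_link(sample, "https://example.com/datasets/demo"))
# → https://example.com/files/data.zip
```

Returning `None` lets the caller decide what to do when no link is found, instead of crashing in the middle of the lookup.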
4. Download the dataset
```python
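# Stream the response so the archive is not held in memory all at once.
response = requests.get(download_link, stream=True)
file_size = int(response.headers.get("Content-Length", 0))  # 0 if the header is missing
filename = os.path.join(os.getcwd(), "oil_spill_detection.zip")
with open(filename, "wb") as f:
    # Write the body to disk in 1 KiB chunks.
    for data in response.iter_content(1024):
        f.write(data)
```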
```
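After the download it is worth checking that the saved file really is a ZIP archive, since a server may return an HTML page (for example a sign-in page) in place of the data. The following is a rough sketch under that assumption. `download_file`, `looks_like_zip`, and the `session` parameter are hypothetical names introduced here, not part of the original answer.

```python
import zipfile

import requests


def download_file(link, filename, session=None, chunk_size=8192, timeout=30):
    """Stream `link` to `filename`; return (bytes_written, declared_size)."""
    http = session or requests
    response = http.get(link, stream=True, timeout=timeout)
    response.raise_for_status()  # do not save a 4xx/5xx error page
    # declared_size is 0 when the server sends no Content-Length header.
    declared_size = int(response.headers.get("Content-Length", 0))
    bytes_written = 0
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size):
            f.write(chunk)
            bytes_written += len(chunk)
    return bytes_written, declared_size


def looks_like_zip(filename):
    """True if the file is a ZIP archive rather than, say, a saved HTML page."""
    return zipfile.is_zipfile(filename)
```

A caller could run `download_file(download_link, filename)` and then treat the result as usable only when `looks_like_zip(filename)` is true.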
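With these steps you can use Python to download the dataset at https://www.kaggle.com/datasets/sudhanshu2198/oil-spill-detection, provided the page's HTML exposes a direct download link and the file can be fetched without signing in.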