
Write a web crawler demo for Ubuntu

Date: 2024-04-28 10:21:35 Views: 50

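Sure, here is a simple crawler demo based on Python and BeautifulSoup that runs on Ubuntu.

First, install the required dependencies from a terminal:

```
sudo apt-get update
sudo apt-get install python3 python3-pip
pip3 install requests beautifulsoup4
```

Then create a Python file named `crawler.py` and put the following code in it:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com'
# Set a timeout so the request cannot hang indefinitely
response = requests.get(url, timeout=10)

if response.status_code == 200:
    # Parse the HTML and collect every <a> tag
    soup = BeautifulSoup(response.content, 'html.parser')
    links = soup.find_all('a')
    for link in links:
        href = link.get('href')
        # Skip anchors that have no href attribute
        if href:
            print(href)
else:
    print('Error:', response.status_code)
```

In this demo, the `requests` library sends the HTTP request and fetches the page content, `BeautifulSoup` parses the HTML and extracts the links on the page, and each link is then printed to the terminal.

Save the file and run `crawler.py`:

```
python3 crawler.py
```

You can replace `https://www.example.com` with the address of the site you want to crawl, and adjust the code to fit your needs.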
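One adjustment that is often useful: the `href` values the crawler prints are taken straight from the page, so they can be relative paths, in-page anchors, or non-HTTP links such as `mailto:`. The sketch below shows one possible way to turn them into absolute, deduplicated URLs using only the standard library. The function name `normalize_links` and the sample values are hypothetical and not part of the original demo.

```python
from urllib.parse import urljoin, urldefrag


def normalize_links(hrefs, base_url):
    """Resolve hrefs against base_url, drop fragments and non-HTTP links,
    and remove duplicates while keeping the original order.

    Hypothetical helper for illustration; not part of the original demo.
    """
    seen = set()
    result = []
    for href in hrefs:
        # Resolve relative paths against the page URL, then strip "#fragment"
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        # Keep only http(s) links (skips mailto:, javascript:, and similar)
        if not absolute.startswith(('http://', 'https://')):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == '__main__':
    # Sample hrefs, assumed for illustration only
    sample = [
        '/about',
        'news.html#top',
        'https://www.example.com/about',
        'mailto:someone@example.com',
    ]
    for link in normalize_links(sample, 'https://www.example.com/index.html'):
        print(link)
```

In `crawler.py`, you could collect the `href` values into a list instead of printing them directly, then pass that list together with `url` to this function before printing.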
