Write a web crawler framework in Python
Date: 2023-07-19 07:48:06
Sure, here is a simple Python crawler framework:
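```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


class Spider:
    def __init__(self, url, timeout=10):
        self.url = url
        self.timeout = timeout  # seconds to wait before giving up on a request

    def fetch(self):
        # Download the page; raise an exception on 4xx/5xx status codes
        response = requests.get(self.url, timeout=self.timeout)
        response.raise_for_status()
        return response.text

    def parse(self, html):
        # Collect the href of every <a> tag, skipping anchors without an href,
        # and turn relative links into absolute URLs
        soup = BeautifulSoup(html, 'html.parser')
        return [urljoin(self.url, a['href']) for a in soup.find_all('a', href=True)]

    def run(self):
        html = self.fetch()
        links = self.parse(html)
        for link in links:
            print(link)


# Test
if __name__ == '__main__':
    url = 'https://www.example.com'
    spider = Spider(url)
    spider.run()
```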
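This framework uses the requests library to download the page and BeautifulSoup to parse the HTML and extract every link. Both are third-party packages, installable with `pip install requests beautifulsoup4`. To try it, replace the value of the `url` variable with the page you want to crawl. If you need a more capable crawler framework, consider Scrapy, which provides request scheduling, concurrent downloads, and item pipelines out of the box.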
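The `Spider` class above only lists the links found on a single page. As a rough sketch of how it could grow into an actual crawler, the code below follows those links breadth-first using a queue and a set of already-seen URLs, stays on the starting host, and stops after `max_pages` pages. To keep it runnable without third-party packages, it swaps in the standard library's `html.parser.HTMLParser` and `urllib.request.urlopen` for BeautifulSoup and requests. The names `LinkExtractor`, `extract_links`, `default_fetch`, `crawl`, and `max_pages` are hypothetical, not part of any library. The `fetch` parameter is injectable so the loop can be tried against an in-memory fake site instead of the network.

```python
from __future__ import annotations

from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs, with #fragments removed, for each <a href>."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def default_fetch(url: str) -> str:
    """Download a page with urllib; network errors propagate as OSError."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url: str, fetch: Callable[[str], str] = default_fetch,
          max_pages: int = 20) -> list[str]:
    """Breadth-first crawl limited to start_url's host (hypothetical helper).

    Returns the successfully fetched URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])   # URLs waiting to be fetched (FIFO = BFS)
    seen = {start_url}           # URLs already queued, to avoid duplicates
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError) as exc:  # skip pages that fail to load
            print(f"skip {url}: {exc}")
            continue
        visited.append(url)
        for link in extract_links(html, url):
            parts = urlparse(link)
            if (parts.scheme in ("http", "https") and parts.netloc == host
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # Demo against a fake in-memory site, so no network access is needed
    site = {
        "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">ext</a>',
        "https://example.com/a": '<a href="/">home</a> <a href="b#top">B</a>',
        "https://example.com/b": "no links here",
    }

    def fake_fetch(url: str) -> str:
        if url not in site:
            raise OSError("404 Not Found")
        return site[url]

    print(crawl("https://example.com/", fetch=fake_fetch))
    # → ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Politeness controls such as honoring robots.txt (the standard library offers `urllib.robotparser`), a delay between requests, and a descriptive User-Agent are left out of this sketch; Scrapy exposes these as settings (`ROBOTSTXT_OBEY`, `DOWNLOAD_DELAY`, `USER_AGENT`).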