How to implement a web crawler with the requests and BeautifulSoup libraries

Date: 2024-02-25 08:58:39 Views: 82

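The requests and BeautifulSoup libraries make it easy to implement a web crawler. Here is a simple example:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'http://www.example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the URL and text of every link
for link in soup.find_all('a'):
    print(link.get('href'), link.text)

# Extract image URLs and download each image
for i, img in enumerate(soup.find_all('img')):
    src = img.get('src')
    if not src:
        continue  # skip <img> tags without a src attribute
    img_url = urljoin(url, src)  # resolve relative paths against the page URL
    img_data = requests.get(img_url, timeout=10).content
    # Number the files so one image does not overwrite another
    with open(f'image_{i}.jpg', 'wb') as f:
        f.write(img_data)
```

The code first uses requests to send a GET request and fetch the page content, then uses BeautifulSoup to parse the HTML document. `find_all()` finds all `<a>` and `<img>` tags; `get()` reads a tag attribute, and `.text` returns the tag's text. Because an image `src` is often a relative path, it is resolved against the page URL with `urljoin()` before `requests.get()` fetches the image content, and `open()` then writes each image to its own local file. Real-world use also has to account for anti-crawling mechanisms, exception handling, and data storage.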
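For the data storage point, the following is a minimal sketch that uses only the standard library. It assumes the extracted links should be resolved to absolute URLs, de-duplicated, and written to a CSV file; the function names `normalize_links` and `save_links_csv` and the column names `url` and `text` are hypothetical choices for illustration, not part of either library.

```python
import csv
from urllib.parse import urljoin


def normalize_links(base_url, links):
    """Resolve each href against base_url and drop empty or duplicate entries."""
    seen = set()
    rows = []
    for href, text in links:
        if not href:
            continue  # <a> tags without an href yield None
        absolute = urljoin(base_url, href)
        if absolute in seen:
            continue  # keep only the first occurrence of each URL
        seen.add(absolute)
        rows.append((absolute, text.strip()))
    return rows


def save_links_csv(rows, path):
    """Write (url, text) rows to a CSV file with a header line."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['url', 'text'])
        writer.writerows(rows)
```

With the `soup` object from the example, the input list could be built as `[(a.get('href'), a.text) for a in soup.find_all('a')]` and passed to `normalize_links(url, ...)` before saving.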
