How to scrape logos from multiple web pages with Python
Date: 2024-04-20 14:24:55
You can use Python's requests and BeautifulSoup libraries to scrape the logos of multiple web pages. Here is a simple example:
```python
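import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse


def get_logo_url(url):
    """Return the absolute URL of the page's logo image, or None if not found."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # Locate the logo image based on the page structure.
    # This is only a simple example; adjust the lookup to each site's markup.
    logo = soup.find('img', {'class': 'logo'})
    if logo and logo.get('src'):
        # Resolve relative paths against the page URL
        return urljoin(url, logo['src'])
    return None


# List of page URLs to scrape
urls = ['https://www.example1.com', 'https://www.example2.com', 'https://www.example3.com']

for url in urls:
    try:
        logo_url = get_logo_url(url)
        if not logo_url:
            print(f'No logo found on {url}')
            continue
        # Download the logo image
        response = requests.get(logo_url, timeout=10)
    except requests.RequestException as exc:
        print(f'Request failed for {url}: {exc}')
        continue
    if response.status_code == 200:
        # Name the file after the host; the .png extension is assumed regardless of the real format
        with open(f'{urlparse(url).netloc}.png', 'wb') as f:
            f.write(response.content)
        print(f'Successfully downloaded logo from {url}')
    else:
        print(f'Failed to download logo from {url}')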
```
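The code above assumes that the logo's URL is in the `src` attribute of an `<img>` tag and that the image has the class name `logo`. Many sites mark up their logo differently, so adjust the lookup to match the structure of each page you scrape.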
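As one way to adapt the lookup to different page structures, the sketch below tries several candidates in order: an `<img>` whose `class` or `id` mentions "logo", then an `og:image` meta tag, then a favicon link. The names `LogoCandidateParser` and `find_logo_url` are hypothetical, and the fallback order is an assumption rather than a rule. To keep the example free of extra dependencies it uses the standard library's `html.parser` instead of BeautifulSoup; the same checks could be written with `soup.find`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LogoCandidateParser(HTMLParser):
    """Collect possible logo URLs from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.img_logo = None  # first <img> whose class or id mentions "logo"
        self.og_image = None  # first <meta property="og:image" content="...">
        self.icon = None      # first <link rel="icon" href="...">

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; normalize them to ''
        a = {k: (v or '') for k, v in attrs}
        if tag == 'img' and self.img_logo is None:
            marker = (a.get('class', '') + ' ' + a.get('id', '')).lower()
            if 'logo' in marker and a.get('src'):
                self.img_logo = a['src']
        elif tag == 'meta' and self.og_image is None:
            if a.get('property', '').lower() == 'og:image' and a.get('content'):
                self.og_image = a['content']
        elif tag == 'link' and self.icon is None:
            # rel may hold several tokens, e.g. "shortcut icon"
            if 'icon' in a.get('rel', '').lower().split() and a.get('href'):
                self.icon = a['href']


def find_logo_url(html, base_url):
    """Return the first matching candidate as an absolute URL, or None."""
    parser = LogoCandidateParser()
    parser.feed(html)
    for candidate in (parser.img_logo, parser.og_image, parser.icon):
        if candidate:
            return urljoin(base_url, candidate)
    return None
```

A function like this could be called with `response.text` and the page URL in place of the single `soup.find` lookup. Whether the favicon is an acceptable stand-in for the logo depends on your use case.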
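Note that scraping web content must comply with each site's rules and with applicable law; make sure you have permission to collect the content.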
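One common place where a site publishes such rules is its `robots.txt` file. The sketch below, which assumes a hypothetical helper `is_allowed` and a made-up user agent name `logo-bot`, checks a URL against robots.txt text using the standard library's `urllib.robotparser`. Passing this check does not by itself establish legal permission.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: everything under /private/ is off limits to all crawlers
sample_rules = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(sample_rules, 'logo-bot', 'https://www.example1.com/index.html'))
print(is_allowed(sample_rules, 'logo-bot', 'https://www.example1.com/private/logo.png'))
```

Against a live site, `RobotFileParser` can also download the file itself through `set_url()` followed by `read()`.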