Crawling five websites with a `def` function and a `for` loop
Posted: 2024-09-16 15:04:01
In Python, you can use the `requests` library to send HTTP requests and fetch page content, and the `BeautifulSoup` library to parse the returned HTML. The example below defines a `def` function and uses a `for` loop to crawl a fixed number of websites (five in this case). Note that real-world scraping must respect each site's robots.txt rules and data privacy.
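Before fetching a site, its robots.txt rules can be checked with the standard library's `urllib.robotparser`. The sketch below parses the rules from an inline string so it runs offline; in practice you would call `rp.set_url(...)` and `rp.read()` against the live site. The example rules here are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Parse a (hypothetical) robots.txt from a list of lines instead of
# fetching it over the network, so the sketch is self-contained.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch(agent, url) tells us whether crawling a URL is allowed
print(rp.can_fetch("*", "http://example.com/index.html"))  # True
print(rp.can_fetch("*", "http://example.com/private/x"))   # False
```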
```python
import requests
from bs4 import BeautifulSoup

# Fetch a page and return its parsed content, or None on failure
def fetch_webpage(url):
    try:
        # a timeout keeps one slow site from hanging the whole loop
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            return soup
        else:
            print(f"Failed to fetch {url}, status code: {response.status_code}")
            return None
    except Exception as e:
        print(f"Error occurred while fetching {url}: {str(e)}")
        return None

# Crawl five websites
websites = ["http://example1.com", "http://example2.com", "http://example3.com",
            "http://example4.com", "http://example5.com"]

for i, url in enumerate(websites, start=1):
    webpage = fetch_webpage(url)
    if webpage is not None:
        # Further processing of the fetched page goes here,
        # e.g. extracting titles, links, or other data
        print(f"Successfully fetched website {i}: {url}")
```
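The "further processing" step inside the loop could look like the sketch below: a hypothetical helper `extract_title_and_links` that pulls the `<title>` text and link targets out of a parsed `BeautifulSoup` object. A small inline HTML document stands in for a fetched page so the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical helper: extract the page title and all link hrefs
# from a BeautifulSoup object such as the one fetch_webpage returns.
def extract_title_and_links(soup):
    title = soup.title.get_text(strip=True) if soup.title else None
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return title, links

# Inline stand-in for a fetched page, so the sketch needs no network
html = '<html><head><title>Example</title></head><body><a href="/a">A</a></body></html>'
soup = BeautifulSoup(html, "html.parser")
title, links = extract_title_and_links(soup)
print(title, links)  # → Example ['/a']
```

Inside the crawling loop you would call `extract_title_and_links(webpage)` instead of parsing an inline string.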