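Although the Requests library is friendly and simple to develop with (apart from the import, only one main line of code is needed), write a small program that picks "any" URL and measures the time taken to successfully fetch the page 100 times. (Some websites block the IP address of clients that crawl pages continuously, so avoid such sites.)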
Posted: 2024-10-07 11:04:51 · Views: 33
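In Python, we can combine the `requests` library with the `time` module to implement this simple program. First, we need to avoid requesting any single URL too frequently, so that we are not blocked. Here we pick URLs at random from a list and pause for a reasonable interval between requests. A basic example follows: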
```python
import time
from random import choice

import requests

# URLs to avoid (adjust to your situation), e.g. sites that block repeated requests
blacklist_urls = {'https://example.com', 'https://api.example.com'}

candidate_urls = [
    "http://random.org",  # replace with your own (possibly dynamically generated) list, e.g. "https://news.ycombinator.com/"
    "https://github.com",
    "https://www.reddit.com",
    "https://openweathermap.org",
]
# Drop blacklisted URLs; a set cannot be subtracted from a list directly
urls = [url for url in candidate_urls if url not in blacklist_urls]

success_count = 0
start_time = time.time()
for _ in range(100):
    url = choice(urls)
    try:
        response = requests.get(url, timeout=5)  # give up after 5 seconds instead of hanging
        if response.status_code == 200:
            success_count += 1
            print(f"Successfully fetched URL: {url}")
        else:
            print(f"Failed to fetch URL: {url}, status code: {response.status_code}")
        # Pause between requests to lower the request rate
        time.sleep(1)
    except requests.exceptions.RequestException as e:
        print(f"Error occurred: {e}")
end_time = time.time()

total_time = end_time - start_time  # note: this total includes the 1-second pauses
print(f"\nTotal time taken for 100 requests ({success_count} successful): {total_time:.2f} seconds")
```
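The program above makes 100 attempts and folds the pauses into its total, whereas the question asks for the time spent on 100 successful fetches. The following is a minimal sketch of one way to separate the two, not a definitive implementation. The names `time_successful_fetches` and `make_requests_fetcher`, the `max_attempts` cap, and the choice of a `requests.Session` are all assumptions introduced here for illustration.

```python
import time
from typing import Callable, Tuple


def time_successful_fetches(
    fetch: Callable[[], bool],
    target: int = 100,
    max_attempts: int = 300,
    delay: float = 0.0,
) -> Tuple[int, int, float]:
    """Call fetch() until it has succeeded `target` times or attempts run out.

    Returns (successes, attempts, seconds spent inside fetch calls).
    The pause between attempts is deliberately excluded from the total.
    """
    successes = 0
    attempts = 0
    elapsed = 0.0
    while successes < target and attempts < max_attempts:
        attempts += 1
        started = time.perf_counter()
        ok = fetch()
        elapsed += time.perf_counter() - started
        if ok:
            successes += 1
        if delay > 0:
            time.sleep(delay)  # politeness pause, not counted in `elapsed`
    return successes, attempts, elapsed


def make_requests_fetcher(url: str, timeout: float = 5.0) -> Callable[[], bool]:
    """Build a fetch() callable backed by requests (hypothetical helper)."""
    # Imported here so the timing helper above has no third-party dependency.
    import requests

    session = requests.Session()  # assumption: reuse one connection pool across calls

    def fetch() -> bool:
        try:
            response = session.get(url, timeout=timeout)
        except requests.exceptions.RequestException:
            return False
        return response.status_code == 200

    return fetch
```

Calling `time_successful_fetches(make_requests_fetcher(url), target=100, delay=1.0)` with a URL you are permitted to poll returns the success count, the attempt count, and the seconds spent in requests alone. Reusing a `Session` usually lowers per-request time compared with bare `requests.get`, so the two approaches measure slightly different things.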