I run the following code:

```python
import requests
from bs4 import BeautifulSoup

codes = ['600887', '002027']
for code in codes:
    url = f'https://www.sse.com.cn/home/search/index.shtml?webswd={code}'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    name_elem = soup.find('div', class_='company_info').find('h2')
    if not name_elem:
        continue
    name = name_elem.text
    price_elem = soup.find('span', class_='last')
    if not price_elem:
        continue
    price = price_elem.text
    print(f'Price of {name}: {price}')
```

Running it raises:

```
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.sse.com.cn', port=443): Max retries exceeded with url: /home/search/index.shtml?webswd=600887 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x00000278705B05B0>, 'Connection to www.sse.com.cn timed out. (connect timeout=None)'))
```

How can the code be improved based on this error?
The error message indicates a connection timeout. We can fix this by passing a timeout parameter to requests.get and catching request exceptions, so that one failed request does not crash the whole loop. Modified code:
```python
import requests
from bs4 import BeautifulSoup

codes = ['600887', '002027']
for code in codes:
    url = f'https://www.sse.com.cn/home/search/index.shtml?webswd={code}'
    try:
        response = requests.get(url, timeout=5)  # fail after 5 seconds instead of hanging
        soup = BeautifulSoup(response.content, 'html.parser')
        company_div = soup.find('div', class_='company_info')
        if not company_div:  # guard: .find('h2') on None would raise AttributeError
            continue
        name_elem = company_div.find('h2')
        if not name_elem:
            continue
        name = name_elem.text
        price_elem = soup.find('span', class_='last')
        if not price_elem:
            continue
        price = price_elem.text
        print(f'Price of {name}: {price}')
    except requests.exceptions.RequestException as e:
        print(f'Failed to fetch {url}: {e}')
```
In this version, a try-except block catches any requests.exceptions.RequestException, and timeout=5 makes a request raise an exception if no response arrives within 5 seconds instead of blocking indefinitely. On failure, the URL and the reason are printed, which makes debugging easier. The lookup of the company_info div is also guarded, since calling .find('h2') on a missing element would raise an AttributeError that the except clause would not catch.
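Because the traceback also says "Max retries exceeded", it can be worth retrying transient connection failures automatically. Below is a minimal sketch using requests' HTTPAdapter together with urllib3's Retry; the retry count, backoff factor, and the browser-like User-Agent header are illustrative assumptions, not values from the original post:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times with exponential backoff; the exact numbers are
# assumptions and should be tuned for the target site.
retry = Retry(total=3, backoff_factor=0.5,
              status_forcelist=[429, 500, 502, 503, 504])

session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retry))
# Assumption: some sites reject the default python-requests User-Agent,
# so a browser-like one is set here; it may not be necessary.
session.headers.update({'User-Agent': 'Mozilla/5.0'})

response = session.get(
    'https://www.sse.com.cn/home/search/index.shtml?webswd=600887',
    timeout=(5, 10),  # (connect timeout, read timeout) in seconds
)
print(response.status_code)
```

Reusing one Session across the loop also keeps the TCP connection alive between requests, which is cheaper than opening a new connection for every stock code.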