Use a regular expression to find all link URLs in the text: Http://www.python.orghttp://python.orgdfsdfadfasdwww.python.comhtttttttp://python.com.cn

Time: 2023-08-12 22:04:14 Views: 112

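Sure. The following code uses Python's regular expression module `re` to look for link URLs in the text:

```python
import re

text = "Http://www.python.orghttp://python.orgdfsdfadfasdwww.python.comhtttttttp://python.com.cn"

# Match http:// or https://, then one or more URL characters or %XX escapes.
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
print(urls)
```

The output is:

```
['http://python.orgdfsdfadfasdwww.python.comhtttttttp://python.com.cn']
```

How the regular expression works:

- `http[s]?://`: matches `http://` or `https://`. The match is case-sensitive, so the leading `Http://` is skipped, and neither a bare `www.` nor the malformed `htttttttp://` can start a match.
- `(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+`: matches the characters that make up the link, including upper- and lower-case letters, digits, and some special characters. `(?:)` is a non-capturing group, `[!*,]` matches a few punctuation marks, and `%[0-9a-fA-F][0-9a-fA-F]` matches percent-encoded (URL-encoded) characters. Inside `[$-_@.&+]`, `$-_` is a character range running from `$` to `_` in ASCII order, so this class also matches `/` and `:`.

Note that the URLs in this text are not separated by spaces and the pattern is greedy, so the match that starts at the first lowercase `http://` runs to the end of the string. `findall` therefore returns one merged string here, not four separate links. The pattern contains nothing that matches whitespace (`\s`), so in text where links are separated by spaces each match stops at the first space. If a URL contains characters outside the classes above, this regular expression will match only part of it.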
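Because the sample text has no delimiters between links, splitting it requires some rule for where a host name ends. The following is a minimal sketch of one possible approach, not a general URL parser. It assumes that a link starts with a well-formed `http://` or `https://` (in any letter case) or a bare `www.`, and that it ends at one of a small, hand-picked set of domain suffixes. The names `TLDS`, `URL_RE`, and `find_urls` and the suffix list itself are hypothetical choices made for this illustration.

```python
import re

text = "Http://www.python.orghttp://python.orgdfsdfadfasdwww.python.comhtttttttp://python.com.cn"

# Assumption: only these suffixes end a host name. Longer suffixes come first
# so that "com.cn" is tried before "com".
TLDS = r"(?:com\.cn|com|org|cn)"

URL_RE = re.compile(
    r"(?:https?://|www\.)"    # a well-formed scheme, or a bare "www." prefix
    r"(?:[a-z0-9-]+\.)+?"     # one or more dot-terminated host labels, lazily
    + TLDS,                   # stop at the first known suffix
    re.IGNORECASE,            # accept "Http://" as well as "http://"
)


def find_urls(s):
    """Return every substring of s that matches URL_RE, in order."""
    return URL_RE.findall(s)


print(find_urls(text))
# ['Http://www.python.org', 'http://python.org', 'www.python.com']
```

Under these assumptions the malformed `htttttttp://python.com.cn` is not reported, because `htttttttp` is not a valid scheme. Whether that token should count as a link depends on the intent of the exercise. Real text with arbitrary domains would need a fuller suffix list or actual delimiters between links.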
