How to extract the URL and the time separately from `<li class="news_other">http://www.chinanews.com.cn/cj/2023/06-07/10020478.shtml 2023-06-07 07:24:00 </li>`


Posted: 2023-08-07 12:07:03 Views: 70

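You can use regular expressions to extract the URL and the time.

In the sample text the URL is plain text directly after the opening `<li>` tag; there is no `<a href="...">` element. A pattern for the URL can therefore be `<li class="news_other">\s*(\S+)`:

- `<li class="news_other">` matches the opening `<li class="news_other">` tag.
- `\s*` matches zero or more whitespace characters.
- `(\S+)` captures a run of non-whitespace characters, which is the URL; it stops at the space before the date.

If the actual page wraps the URL in a link, the pattern `<li class="news_other">\s*<a href="(.*?)">` applies instead: `<a href="` matches that literal string, `(.*?)` matches any characters non-greedily up to the next part of the pattern, and `">` matches the literal `">`. That pattern finds nothing in the sample text above, because the sample contains no `<a>` tag.

A pattern for the time can be `(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})`:

- `(` opens a capture group.
- `\d{4}-\d{2}-\d{2}` matches the date, for example `2023-06-07`.
- `\s+` matches one or more whitespace characters.
- `\d{2}:\d{2}:\d{2}` matches the time of day, for example `07:24:00`.
- `)` closes the capture group.

In Python this can be implemented as follows:

```python
import re

text = '<li class="news_other">http://www.chinanews.com.cn/cj/2023/06-07/10020478.shtml 2023-06-07 07:24:00 </li>'

# The URL is the first run of non-whitespace characters after the opening tag.
url_pattern = r'<li class="news_other">\s*(\S+)'
# Date and time in the form YYYY-MM-DD HH:MM:SS.
time_pattern = r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})'

url_match = re.search(url_pattern, text)
if url_match:
    url = url_match.group(1)
    print(url)

time_match = re.search(time_pattern, text)
if time_match:
    timestamp = time_match.group(1)
    print(timestamp)
```

The output is:

```
http://www.chinanews.com.cn/cj/2023/06-07/10020478.shtml
2023-06-07 07:24:00
```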
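
When a page contains many such list items, it may be convenient to capture both fields in one pass. The following is a minimal sketch, not a definitive implementation: it assumes every item has the shape "URL, whitespace, timestamp" inside `<li class="news_other">`, as in the sample from the question. The name `extract_items`, the named groups `url` and `time`, and the second list item are hypothetical and only serve the illustration.

```python
import re

# Hypothetical input: the first item is the sample from the question,
# the second one is made up to show that several items are handled.
html = (
    '<li class="news_other">http://www.chinanews.com.cn/cj/2023/06-07/10020478.shtml 2023-06-07 07:24:00 </li>\n'
    '<li class="news_other">http://example.com/a.shtml 2023-06-08 09:30:15 </li>'
)

# One pattern with two named groups: the URL (non-whitespace run)
# followed by a timestamp in the form YYYY-MM-DD HH:MM:SS.
ITEM_PATTERN = re.compile(
    r'<li class="news_other">\s*'
    r'(?P<url>\S+)\s+'
    r'(?P<time>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s*'
    r'</li>'
)


def extract_items(text):
    """Return a list of (url, time) tuples, one per matching <li> element."""
    return [(m.group("url"), m.group("time")) for m in ITEM_PATTERN.finditer(text)]


for url, timestamp in extract_items(html):
    print(url, timestamp)
```

Items that do not follow the assumed shape are skipped silently. For markup that varies more than this, an HTML parser is usually a sturdier choice than a regular expression.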
