Regular expressions for extracting the date, read count, and comment

Time: 2024-03-30 18:37:11 Views: 141

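The following regular expressions extract the date, the read count, and the comment:

```python
import re

text = "On Tuesday, July 31, 2018 23:10:00, we crawl a comment: '都醒醒吧！此烂股必跌到40元！！！不要问我为什么！我是中信上海他爷爷！！！'，and the readCount of this comment is 1671."

# Weekday, month name, day of month, four-digit year
date_pattern = r'\w+,\s\w+\s\d+,\s\d{4}'
# "readCount", any run of non-digits, then the number (group 1)
readcount_pattern = r'readCount\D*(\d+)'
# Everything between the single quotes that follow "comment:" (group 1)
comment_pattern = r"comment:\s'(.*)'"

date_match = re.search(date_pattern, text)
readcount_match = re.search(readcount_pattern, text)
comment_match = re.search(comment_pattern, text)

if date_match:
    print(date_match.group(0))
if readcount_match:
    print(readcount_match.group(1))
if comment_match:
    print(comment_match.group(1))
```

The output is:

```
Tuesday, July 31, 2018
1671
都醒醒吧！此烂股必跌到40元！！！不要问我为什么！我是中信上海他爷爷！！！
```

What each pattern means:

- `date_pattern`: a word (the weekday), followed by a comma and a whitespace character, then a word for the month, a whitespace character, one or more digits for the day, a comma and a whitespace character, and finally a four-digit year. The time of day (`23:10:00`) is not part of the match.
- `readcount_pattern`: starts with `readCount`, skips any non-digit characters with `\D*` (here, ` of this comment is `), and captures the digits that follow in group 1. A pattern such as `readCount\s\w+\s\d+\.?` does not match this text, because four words sit between `readCount` and the number, not one.
- `comment_pattern`: starts with `comment:`, followed by a whitespace character and a single quote, then any number of characters (including Chinese characters and punctuation), and a closing single quote. `(.*)` captures the text between the quotes, which `group(1)` returns. Because `.*` is greedy, the match runs to the last single quote on the line.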
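When all three fields are needed together, one way to go is a single compiled pattern with named groups that also converts the values to Python types. The sketch below is an illustration, not part of the original answer: the names `RECORD_PATTERN` and `parse_record` and the group names are hypothetical. It assumes that every line follows the same "date, comment, readCount" order as the sample, and that the process runs under the default C/English locale so that `%A` and `%B` recognise English weekday and month names.

```python
import re
from datetime import datetime

# Hypothetical combined pattern; group names are illustrative only.
RECORD_PATTERN = re.compile(
    r"(?P<date>\w+,\s\w+\s\d+,\s\d{4}\s\d{2}:\d{2}:\d{2})"  # weekday, date, time
    r".*?comment:\s'(?P<comment>.*)'"                        # text inside the quotes
    r".*?readCount\D*(?P<read_count>\d+)",                   # digits after readCount
    re.DOTALL,  # let "." cross newlines in case a comment spans lines
)


def parse_record(line):
    """Return a dict with typed fields, or None if the line does not match."""
    match = RECORD_PATTERN.search(line)
    if match is None:
        return None
    return {
        # Assumes English weekday/month names (default C locale).
        "date": datetime.strptime(match.group("date"), "%A, %B %d, %Y %H:%M:%S"),
        "comment": match.group("comment"),
        "read_count": int(match.group("read_count")),
    }


sample = (
    "On Tuesday, July 31, 2018 23:10:00, we crawl a comment: "
    "'都醒醒吧！此烂股必跌到40元！！！不要问我为什么！我是中信上海他爷爷！！！'"
    "，and the readCount of this comment is 1671."
)
record = parse_record(sample)
print(record["date"].isoformat())  # 2018-07-31T23:10:00
print(record["read_count"])        # 1671
```

Returning `None` for lines that do not match lets the caller decide whether to skip or log them. Converting the read count to `int` and the date to `datetime` makes later sorting and filtering simpler than working with the raw strings.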