Python: without any third-party libraries, use the maximum matching method to find the ten most frequent words, excluding punctuation

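Suppose the text to process is stored in a file named `text.txt`. First, read the file into a string:

```python
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()
```

Next, define a forward maximum matching function to segment the text:

```python
def max_match(text, dictionary, max_len):
    words = []
    pos = 0
    while pos < len(text):
        # Start with the longest candidate and shrink it until it is in the dictionary
        word = text[pos:pos + max_len]
        while word not in dictionary and len(word) > 1:
            word = word[:-1]
        # A single character that is not in the dictionary is kept as its own token
        words.append(word)
        pos += len(word)
    return words
```

Here `text` is the text to segment, `dictionary` is the collection of known words (a `set` keeps membership tests fast), and `max_len` is the longest word length to try. `max_len` should be at least as long as the longest dictionary entry, otherwise longer entries can never match. The dictionary has to be defined in the program or loaded from another source.

Then define a function that counts how often each word occurs:

```python
def count_words(words):
    word_counts = {}
    for word in words:
        # str.isalpha() is False for punctuation, digits and whitespace, so those tokens are skipped
        if word.isalpha():
            word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts
```

Here `words` is the list of tokens produced by the segmentation, and `word_counts` is a dict that maps each word to its number of occurrences.

Finally, put the pieces together to print the ten most frequent words:

```python
# Read the text
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment with maximum matching
dictionary = {'apple', 'banana', 'orange', 'peach', 'pear'}
max_len = max(len(word) for word in dictionary)  # 6, so 'banana' and 'orange' can match
words = max_match(text, dictionary, max_len)

# Count occurrences
word_counts = count_words(words)

# Print the ten most frequent words
top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:10]
for word, count in top_words:
    print(word, count)
```

Note that the `dictionary` here contains only a few sample words; in practice it needs to be adjusted and extended for the text at hand. Any character the dictionary does not cover comes out as a single-character token and is counted like any other word, so the quality of the result depends directly on the dictionary.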
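Since the sample dictionary above is only a placeholder, here is a minimal sketch of how a word list could be built from lines of text and then used on a short Chinese sentence. The helper names `build_dictionary`, `forward_max_match` and `top_words`, as well as the sample sentence and word list, are assumptions made for illustration and are not part of the original answer.

```python
def build_dictionary(lines):
    """Build a word set from an iterable of lines (one word per line).

    Returns the set and the length of its longest word, which serves as max_len.
    """
    words = {line.strip() for line in lines if line.strip()}
    max_len = max((len(word) for word in words), default=1)
    return words, max_len


def forward_max_match(text, dictionary, max_len):
    """Greedy forward maximum matching; unknown characters become one-character tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


def top_words(tokens, n=10):
    """Count alphabetic tokens (punctuation is skipped) and return the n most frequent."""
    counts = {}
    for token in tokens:
        if token.isalpha():
            counts[token] = counts.get(token, 0) + 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical sample data, used only to illustrate the behaviour.
sample_lines = ["北京\n", "欢迎\n", "我\n", "爱\n", "你\n", "\n"]
sample_text = "我爱北京，北京爱我。北京欢迎你！"

dictionary, max_len = build_dictionary(sample_lines)
tokens = forward_max_match(sample_text, dictionary, max_len)
ranking = top_words(tokens)

# The most frequent token here is '北京' with 3 occurrences.
for word, count in ranking:
    print(word, count)
```

Because `build_dictionary` accepts any iterable of strings, an open file works as well as a list, for example `with open('dict.txt', encoding='utf-8') as f: dictionary, max_len = build_dictionary(f)`, where `dict.txt` is a hypothetical one-word-per-line file. Deriving `max_len` from the dictionary avoids setting it shorter than the longest entry.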
