Using the jieba library to count word frequencies in a store review file and analyze customer reviews

Posted: 2024-05-16 07:17:38

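Sure, here is a code example that uses the jieba library for word-frequency counting:

```python
import jieba
from collections import Counter

# Read the review file and segment it into words
with open('evaluation.txt', 'r', encoding='utf-8') as f:
    content = f.read()
words = jieba.cut(content)

# Load the stop words, one per line
stopwords = set()
with open('stopwords.txt', 'r', encoding='utf-8') as f:
    for line in f:
        stopwords.add(line.strip())

# Drop stop words and whitespace-only tokens (spaces, newlines)
filtered_words = [word for word in words if word.strip() and word not in stopwords]

# Count word frequencies and print the 20 most common words
counter = Counter(filtered_words)
print(counter.most_common(20))
```

The code first segments the review file with jieba, then reads a stop-word file into a set and filters the stop words and whitespace tokens out of the segmentation result. It then uses the `Counter` class from the standard-library `collections` module to count word frequencies and prints the 20 most frequent words with their counts.

Note that `evaluation.txt` is the store review file and `stopwords.txt` is the stop-word file; change both paths to match your setup. Also, `jieba.cut` uses accurate mode by default (`cut_all=False`). Depending on your needs you can switch to full mode with `cut_all=True`, or to search-engine mode with `jieba.cut_for_search`.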
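The filtering and counting steps can also be wrapped in small reusable helpers. The following is a minimal sketch, not part of the original answer: the names `segment` and `top_words` and the `min_len` parameter are hypothetical, and dropping single-character tokens is an assumption that often suits Chinese review text but may remove words you care about.

```python
from collections import Counter


def segment(text, mode="accurate"):
    """Segment text with jieba in the chosen mode (needs `pip install jieba`)."""
    import jieba  # imported lazily so top_words() works without jieba installed

    if mode == "full":
        return jieba.lcut(text, cut_all=True)   # full mode: every possible word
    if mode == "search":
        return jieba.lcut_for_search(text)      # search-engine mode
    return jieba.lcut(text)                     # accurate mode, jieba's default


def top_words(tokens, stopwords=frozenset(), n=20, min_len=2):
    """Return the n most frequent tokens after filtering.

    min_len is a hypothetical knob: tokens shorter than this (after
    stripping whitespace) are dropped, which also removes punctuation
    and empty strings.
    """
    stripped = (t.strip() for t in tokens)
    counter = Counter(
        t for t in stripped
        if len(t) >= min_len and t not in stopwords
    )
    return counter.most_common(n)


if __name__ == "__main__":
    # Pre-segmented sample, so this demo runs without jieba
    sample = ["物流", "很", "快", "，", "质量", "不错", "，",
              "物流", "给力", "\n", "质量", "好", "物流"]
    print(top_words(sample, stopwords={"不错"}, n=2))
```

With a real file, you would pass `segment(content)` as `tokens`. Keeping segmentation separate from counting makes it easy to compare how the three modes change the resulting frequencies.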
