Using Python and the TF-IDF algorithm to draw a bar chart of high-frequency words

Date: 2023-07-25 12:45:47 Views: 153

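Sure. Here is Python code that tokenizes a text file, counts word frequencies, computes TF-IDF weights, and draws a bar chart of the most frequent words:

```python
import jieba
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer

# Read the file
with open('your_file.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Tokenize; build a list because jieba.cut returns a one-shot generator
words = [w for w in jieba.cut(text) if len(w.strip()) > 1]

# Count word frequencies
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

# Convert to a DataFrame
df = pd.DataFrame.from_dict(word_counts, orient='index', columns=['count'])

# Compute TF-IDF weights on the space-joined tokens, because
# TfidfVectorizer does not segment Chinese text by itself
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform([' '.join(words)])
weights = np.asarray(tfidf_matrix.mean(axis=0)).ravel().tolist()
# get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() requires scikit-learn 1.0 or later
weights_df = pd.DataFrame({'term': tfidf.get_feature_names_out(), 'weight': weights})
weights_df = weights_df.sort_values('weight', ascending=False)

# Draw the bar chart
plt.rcParams['font.sans-serif'] = ['SimHei']  # assumption: a CJK font such as SimHei is installed
top_n = 10  # number of top words to show
top_words = df.sort_values('count', ascending=False).head(top_n)
top_words.plot(kind='bar', y='count', color='blue', legend=False)
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('Top {} Words'.format(top_n))
plt.show()
```

Replace `your_file.txt` with the path to your text file. The code reads the file, tokenizes it, counts word frequencies, computes TF-IDF weights, and plots the most frequent words; change `top_n` to control how many words are plotted. Note that the chart is drawn from the raw counts in `df`, while the TF-IDF weights are stored in `weights_df`. Because only one document is passed to the vectorizer, the IDF factor is the same for every term, so those weights rank terms in essentially the same order as the counts.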
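To make the weighting itself concrete, here is a minimal plain-Python sketch of TF-IDF, not the answer's own code. It assumes the smoothed IDF `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which are the defaults of scikit-learn's `TfidfVectorizer`. The function name `tfidf_scores` and the sample documents are hypothetical, and the documents are assumed to be tokenized already (for example with `jieba.cut`).

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: smoothed IDF ln((1 + n) / (1 + df)) + 1,
    then L2 normalization of each document vector.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        raw = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        scores.append({t: w / norm for t, w in raw.items()})
    return scores


# Hypothetical, already-tokenized documents.
docs = [
    ["数据", "分析", "数据", "模型"],
    ["模型", "训练", "模型"],
    ["数据", "可视化"],
]
scores = tfidf_scores(docs)

# Terms of the first document, highest weight first.
top_terms = sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True)
```

In the first document, "分析" and "模型" each occur once, but "分析" appears in fewer documents and therefore gets the higher weight. With a single document every term has the same IDF, so the weights reduce to normalized counts; TF-IDF only separates terms when several documents are compared.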
