
Python code for data preprocessing, jieba word segmentation, stop word removal, word frequency analysis, and word cloud generation

Date: 2023-04-02 10:03:55 | Views: 323

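The following Python code reads a text file, segments it with jieba, removes stop words, counts word frequencies, and renders a word cloud:

```python
from collections import Counter

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the input text file
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text into words (jieba.cut returns a generator)
words = jieba.cut(text)

# Remove stop words; also drop whitespace-only tokens such as spaces and
# newlines, which jieba.cut yields as separate tokens
stopwords = {'的', '了', '是', '我', '你', '他', '她', '它', '我们', '你们', '他们', '她们', '它们'}
filtered_words = [word for word in words if word.strip() and word not in stopwords]

# Count word frequencies
word_freq = Counter(filtered_words)

# Generate the word cloud. font_path must point to a font that contains Chinese
# glyphs; 'msyh.ttc' is Microsoft YaHei on Windows, so use another font file
# on other systems
wordcloud = WordCloud(font_path='msyh.ttc', background_color='white', width=800, height=600).generate_from_frequencies(word_freq)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```

Note that punctuation is not filtered unless you add it to the stop word list. Hope this helps!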
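The stop word list above is hard-coded and short. In practice a stop word list is often kept in a separate file, and it helps to inspect the most frequent words before drawing the cloud. The following is a minimal sketch of both steps using only the standard library. The function names `load_stopwords` and `count_words` are hypothetical helpers, not part of jieba or wordcloud, and the one-word-per-line file layout is an assumption.

```python
from collections import Counter
from pathlib import Path


def load_stopwords(path):
    """Read one stop word per line from a UTF-8 file (assumed layout).

    Blank lines are skipped and surrounding whitespace is stripped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def count_words(tokens, stopwords):
    """Count tokens, skipping stop words and whitespace-only tokens."""
    return Counter(
        tok for tok in tokens
        if tok.strip() and tok not in stopwords
    )


if __name__ == "__main__":
    # Tokens similar to what jieba.cut might yield; hard-coded here so the
    # sketch runs without jieba installed.
    tokens = ["我", "喜欢", "自然", "语言", "处理", "，", "我", "喜欢", "分词"]
    freq = count_words(tokens, stopwords={"我", "，"})
    # Print the three most frequent words with their counts
    for word, n in freq.most_common(3):
        print(word, n)
```

To combine this with the answer's code, one could pass `jieba.cut(text)` as `tokens` and a set returned by `load_stopwords('stopwords.txt')` (a hypothetical file name) as `stopwords`, then hand the resulting `Counter` to `generate_from_frequencies`.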
