
English word frequency counting in Python

Date: 2023-07-01 21:16:00

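Hello! To count English word frequencies, you can use Python's nltk library together with the collections module. Here is a simple example:

```python
import nltk
from collections import Counter

# Download nltk's stopword list and the Punkt models that word_tokenize needs
nltk.download('stopwords')
nltk.download('punkt')
# Recent NLTK releases look for 'punkt_tab' instead of 'punkt'
nltk.download('punkt_tab')

# Load the English stopwords into a set for fast membership checks
stopwords = set(nltk.corpus.stopwords.words('english'))

# Open the file and read its contents
with open('sample.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Convert all letters to lowercase
text = text.lower()

# Split the text into tokens (words and punctuation)
words = nltk.tokenize.word_tokenize(text)

# Keep only alphabetic tokens that are not stopwords (drops ',', '.', etc.)
words = [word for word in words if word.isalpha() and word not in stopwords]

# Count word frequencies
word_freq = Counter(words)

# Print the 10 most common words and their counts
print(word_freq.most_common(10))
```

In this example, we open a text file and read its contents. We then convert all letters to lowercase and use nltk's word_tokenize function to split the text into tokens. Next, we drop stopwords and punctuation tokens, and use the Counter class from the collections module to count how many times each word occurs. Finally, we print the 10 most common words along with their counts. This is only a simple example; you can modify and extend it to suit your needs.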
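If installing nltk or downloading its data is not an option, the same lowercase → tokenize → filter → count pipeline can be sketched with only the standard library. This is a swapped-in technique, not nltk's tokenizer: the regular expression below is an assumption that a "word" is a run of ASCII letters with at most one apostrophe part (e.g. "don't"), and `STOPWORDS`, `word_frequencies` and `top_words` are hypothetical names, with a deliberately tiny stopword set rather than nltk's English list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; NOT nltk's English stopword list
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

# Assumption: a word is a run of ASCII letters, optionally with one apostrophe part
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, extract words with a regex, drop stopwords, count the rest."""
    words = WORD_RE.findall(text.lower())
    return Counter(word for word in words if word not in stopwords)


def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most common (word, count) pairs; ties keep first-seen order."""
    return word_frequencies(text, stopwords).most_common(n)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat ate the rat, and the rat ran."
    print(top_words(sample, 3))  # → [('cat', 2), ('rat', 2), ('sat', 1)]
```

Because the regex only ever matches letters and apostrophes, punctuation never reaches the counter, so no separate `isalpha()` filter is needed. To process a file, read it as in the nltk example and pass the resulting string to `top_words`.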
