
English Word Frequency Counting in Python

Time: 2024-12-28 20:28:26

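In Python, word frequency counting for English text is usually done with a natural language processing (NLP) library such as NLTK (Natural Language Toolkit) for tokenizing and stop-word filtering, together with `collections.Counter` from the standard library for the counting itself. Related tools include the separate third-party package `wordfreq`. The basic steps are:

1. **Install the library**: if `nltk` is not installed yet, install it with `pip install nltk`.
2. **Download the data**: NLTK needs the punkt tokenizer models and the stop-word list. Run `import nltk; nltk.download('punkt')` and `nltk.download('stopwords')`. Recent NLTK releases may ask for the `punkt_tab` resource instead of `punkt`; if `word_tokenize` raises a `LookupError`, download the resource named in the error message.
3. **Load and preprocess the text**: read the English text file, convert everything to lowercase, and remove punctuation and numbers so that variants of the same word are counted together.

   ```python
   import string

   from nltk.corpus import stopwords
   from nltk.tokenize import word_tokenize

   def preprocess_text(text):
       text = text.lower()
       # Remove punctuation
       text = text.translate(str.maketrans('', '', string.punctuation))
       # Tokenize
       tokens = word_tokenize(text)
       # Drop stop words and tokens that are not purely alphabetic (e.g. numbers)
       stop_words = set(stopwords.words('english'))
       filtered_tokens = [token for token in tokens
                          if token.isalpha() and token not in stop_words]
       return filtered_tokens

   # Usage
   with open('your_file.txt', 'r', encoding='utf-8') as file:
       text = file.read()
   tokens = preprocess_text(text)
   ```

4. **Count word frequencies**: use `collections.Counter` to count the tokens.

   ```python
   from collections import Counter

   word_freq = Counter(tokens)
   # Print (word, count) pairs, most frequent first;
   # pass a number, e.g. most_common(10), to limit the output
   print(word_freq.most_common())
   ```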
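For a quick check without downloading any NLTK data, the same pipeline can be roughly sketched with the standard library alone. This is only an illustration under stated assumptions: the function name `count_words`, the sample sentence, and the tiny `STOP_WORDS` set are all hypothetical, and a regular expression stands in for NLTK's tokenizer, so results may differ from `word_tokenize` on contractions and hyphenated words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only;
# NLTK's stopwords.words('english') is far more complete.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}

def count_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, keep runs of letters, drop stop words, count the rest."""
    # [a-z]+ matches only runs of ASCII letters, so punctuation and digits are discarded.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)

# Made-up sample text, not taken from any real file.
sample = "The cat sat on the mat. The cat saw 2 birds, and the birds saw the cat!"
freq = count_words(sample)
print(freq.most_common(3))
```

Here `freq["cat"]` is 3, and neither `"the"` nor the digit `2` appears in the result. For real text, the NLTK tokenizer and stop-word list are likely to give cleaner results than this regular expression.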
