
Word Frequency Counting in Python

Time: 2024-06-14 13:02:52 Views: 150

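Word frequency counting in Python is typically used to analyze text data and find the most common words, which is useful in text mining, natural language processing (NLP), and data analysis. Several Python libraries can help with this; the most commonly used are the `Counter` class from the `collections` module and `nltk` (the Natural Language Toolkit). The basic steps with `collections.Counter` are:

1. **Import the library**:
   ```python
   from collections import Counter
   ```
2. **Read the text**:
   ```python
   # Assumes the file is UTF-8; adjust the encoding if yours differs
   with open('your_text_file.txt', 'r', encoding='utf-8') as file:
       text = file.read()
   ```
3. **Preprocess the text**:
   - Tokenization
   - Remove punctuation, digits, and stop words (optional)
   ```python
   import string
   from nltk.corpus import stopwords  # needs a one-time nltk.download('stopwords')

   stop_words = set(stopwords.words('english'))  # use the English stop-word list
   # Strip punctuation and digits, lowercase, then split on whitespace
   table = str.maketrans('', '', string.punctuation + string.digits)
   words = text.translate(table).lower().split()
   filtered_words = [word for word in words if word not in stop_words]
   ```
4. **Create the word frequency counter**:
   ```python
   word_freq = Counter(filtered_words)
   ```
5. **View the most frequent words**:
   ```python
   most_common_words = word_freq.most_common(10)  # the 10 most common words with their counts
   ```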
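The steps above can be combined into one small function. The following is a minimal sketch rather than a definitive implementation: to stay runnable without downloading NLTK data, it swaps the NLTK stop-word list for a tiny hand-written `STOP_WORDS` set, and both that set and the `word_frequencies` helper name are assumptions made for illustration. It also works on an inline sample string instead of a file.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only;
# a real list such as NLTK's English list is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "in", "is", "to"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase words, ignoring punctuation, digits, and stop words."""
    # Translation table that deletes every punctuation character and digit
    table = str.maketrans('', '', string.punctuation + string.digits)
    # Whitespace tokenization after cleaning and lowercasing
    words = text.translate(table).lower().split()
    return Counter(word for word in words if word not in stop_words)


sample = "The cat sat on the mat. In 2024 the cat saw the dog, and the dog saw the cat."
freq = word_frequencies(sample)
print(freq.most_common(1))  # [('cat', 3)]
```

Whitespace splitting is the simplest tokenizer and is only adequate for space-delimited languages such as English; text in a language like Chinese would need a dedicated word segmenter before counting.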
