
Word frequency counting for a txt file in Python

Date: 2024-11-04 08:06:19 Views: 20

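In Python, word frequencies for a text file (such as a .txt file) can be counted with the `collections` module from the standard library or with a third-party library such as `nltk` (Natural Language Toolkit). The basic steps for both approaches are as follows:

**Using the collections module**:

```python
from collections import Counter
import re

def word_frequency(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
    # Lowercase the text, then extract runs of word characters;
    # this drops punctuation and line breaks
    words = re.findall(r'\b\w+\b', text.lower())
    # Count how often each word occurs
    word_counts = Counter(words)
    return word_counts

# Example
word_counts = word_frequency('your_text_file.txt')
```

**Using the nltk library**:

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads of the tokenizer data and the stop word lists.
# Recent NLTK releases may ask for 'punkt_tab' instead of 'punkt'.
nltk.download('punkt')
nltk.download('stopwords')

def word_freq_from_nltk(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
    # Pick the stop word list that matches the language of your text.
    # word_tokenize does not segment Chinese, so Chinese text must be
    # split into words by a dedicated segmenter first.
    stop_words = set(stopwords.words('english'))
    # Lowercase before filtering because the stop word list is lowercase
    words = [word for word in word_tokenize(text.lower())
             if word.isalnum() and word not in stop_words]
    freq_dist = nltk.FreqDist(words)
    return freq_dist

# Example
freq_distribution = word_freq_from_nltk('your_text_file.txt')
```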
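Once the counts exist, the usual next step is to list the most frequent words. The following is a minimal sketch of that step, assuming an in-memory sample string stands in for the file contents; `top_words` and `sample` are hypothetical names that do not appear in the original answer.

```python
from collections import Counter
import re

def top_words(text, n=3):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Same tokenization as the collections-based approach: lowercase,
    # then keep runs of word characters
    words = re.findall(r'\b\w+\b', text.lower())
    # most_common sorts by count, highest first
    return Counter(words).most_common(n)

# Hypothetical sample text used in place of a file
sample = "The cat sat on the mat. The dog sat too."
top = top_words(sample, 2)
print(top)  # [('the', 3), ('sat', 2)]
```

Because `nltk.FreqDist` is a subclass of `Counter`, `most_common` can be called the same way on the result of the nltk-based approach.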
