Source file and code for word frequency counting
Date: 2023-07-10 14:38:48
Below are a sample source file and example code for counting word frequencies:
Source file example.txt:
```
Python is an interpreted high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
```
Example word frequency code word_count.py:
```python
import string
from collections import Counter


def get_word_frequency(file_path):
    # Read the whole file as text
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    # Convert the text to lowercase
    text = text.lower()
    # Split the text into words on whitespace
    words = text.split()
    # Strip leading and trailing punctuation from each word
    words = [word.strip(string.punctuation) for word in words]
    # Remove stop words
    stopwords = ['a', 'an', 'the', 'and', 'or', 'in', 'on', 'with', 'for', 'to', 'at', 'of']
    words = [word for word in words if word not in stopwords]
    # Count word frequencies
    word_counts = Counter(words)
    # Print the words from most to least frequent
    for word, count in word_counts.most_common():
        print(f'{word}: {count}')


# Call the function with the path to the file
get_word_frequency('example.txt')
```
Running the script prints:
```
programming: 2
language: 2
code: 2
its: 2
python: 1
is: 1
interpreted: 1
high-level: 1
general-purpose: 1
created: 1
by: 1
guido: 1
van: 1
rossum: 1
first: 1
released: 1
1991: 1
python's: 1
design: 1
philosophy: 1
emphasizes: 1
readability: 1
notable: 1
use: 1
significant: 1
whitespace: 1
constructs: 1
object-oriented: 1
approach: 1
aim: 1
help: 1
programmers: 1
write: 1
clear: 1
logical: 1
small: 1
large-scale: 1
projects: 1
```
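The example first reads example.txt and converts its contents to lowercase. It then splits the text into words with split() and removes leading and trailing punctuation from each word with strip(); characters inside a word are left alone, so hyphenated words such as high-level stay intact. Next it drops a handful of common stop words and counts the remaining words with the Counter class from the collections module. Finally, it prints the words in descending order of frequency.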
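The same steps can also be packaged as a function that takes a string and returns the Counter instead of printing, which makes the logic easier to reuse and test. The following is only a sketch of that idea: the names `count_words` and `STOPWORDS` and the sample sentence are made up for illustration and do not appear in the script above.

```python
import string
from collections import Counter

# Same stop words as in the script above, stored as a set for fast lookups.
# STOPWORDS and count_words are hypothetical names used only for this sketch.
STOPWORDS = {'a', 'an', 'the', 'and', 'or', 'in', 'on', 'with', 'for', 'to', 'at', 'of'}


def count_words(text, stopwords=STOPWORDS):
    """Return a Counter of the words in text, ignoring case and stop words."""
    # Lowercase, split on whitespace, then strip punctuation from both ends of each token.
    words = [word.strip(string.punctuation) for word in text.lower().split()]
    # Drop stop words and tokens that were only punctuation (now empty strings).
    return Counter(word for word in words if word and word not in stopwords)


# Illustrative input, not taken from example.txt.
sample = "Clear, logical code for small and large-scale projects. Code matters!"
counts = count_words(sample)

# Print only the three most frequent words.
for word, count in counts.most_common(3):
    print(f'{word}: {count}')
```

Passing a number to most_common() limits the output to that many entries, which is handy when the input is much longer than a single paragraph.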