import jieba with open('123.txt', 'r', encoding='utf-8') as f: text = f.read() words = jieba.lcut(text) word_counts = {} for word in words: if len(word) < 2: continue if word in word_counts: word_counts[word] += 1 else: word_counts[word] = 1 sorted_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True) nouns = [] for word, count in sorted_words: pos = jieba.lcut(word)[0].split('/')[1] if pos == 'n': nouns.append(word) if len(nouns) == 10: break print(nouns)
时间: 2024-03-16 12:47:47 浏览: 24
这段代码可以计算指定文本文件中出现频率排名前10的名词。你需要将文本文件命名为“123.txt”,并将其放在与代码文件相同的目录下,然后运行这段代码即可。请注意,代码中使用的是jieba分词库,所以需要确保该库已经安装。如果没有安装,可以使用以下命令进行安装:
```
pip install jieba
```
另外,在运行代码之前,你需要将文本文件中的内容替换为你想要分析的实际文本。
相关问题
修改 import jieba import jieba.posseg as pseg path = r'D:/明朝collection.txt' with open(path, 'r', encoding='utf-8') as f: text = f.read() jieba.enable_paddle() words = pseg.cut(text, use_paddle=True) counts = {} for word in words: if len(word) == 1: continue else: counts[word] = counts.get(word, 0) + 1 items = list(counts.items()) items.sort(key=lambda x: x[1], reverse=True) for i in range(500): word, count = items[i] print("{0:<5}{1:>5}".format(word, count))
import jieba.posseg as pseg
path = r'D:/明朝collection.txt'
with open(path, 'r', encoding='utf-8') as f:
text = f.read()
jieba.enable_paddle()
words = pseg.cut(text, use_paddle=True)
counts = {}
for word, flag in words:
if len(word) == 1:
continue
else:
counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(500):
word, count = items[i]
print("{0:<5}{1:>5}".format(word, count))
import csv import jieba.posseg as pseg import jieba import paddle path = r'D:/明朝collection.txt' with open(path, 'r', encoding='utf-8') as f: text = f.read() jieba.enable_paddle() words = pseg.cut(text, use_paddle=True) counts = {} for word, flag in words: if len(word) == 1: continue else: counts[word] = counts.get(word, 0) + 1 items = list(counts.items()) items.sort(key=lambda x: x[1], reverse=True) # 将结果写入CSV文件 with open('D:/output.csv', 'w', encoding='utf-8-sig', newline='') as f: writer = csv.writer(f) writer.writerow(['Word', 'Count', 'Flag']) # 写入表头 for i in range(500): word, count = items[i] print("{0:<5}{1:>5}".format(word, count)) words_with_flag = pseg.cut(word, use_paddle=True) for w, f in words_with_flag: print(f) writer.writerow([word, count,flag])
您修改的代码已经添加了将词性输出到CSV文件的功能,并且将文件路径修改为了`D:/output.csv`。修改后的代码如下:
```python
import csv
import jieba.posseg as pseg
import jieba
import paddle
path = r'D:/明朝collection.txt'
with open(path, 'r', encoding='utf-8') as f:
text = f.read()
jieba.enable_paddle()
words = pseg.cut(text, use_paddle=True)
counts = {}
for word, flag in words:
if len(word) == 1:
continue
else:
counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# 将结果写入CSV文件
with open('D:/output.csv', 'w', encoding='utf-8-sig', newline='') as f:
writer = csv.writer(f)
writer.writerow(['Word', 'Count', 'Flag']) # 写入表头
for i in range(500):
word, count = items[i]
print("{0:<5}{1:>5}".format(word, count))
words_with_flag = pseg.cut(word, use_paddle=True)
for w, flag in words_with_flag:
print(flag)
writer.writerow([word, count, flag])
```
这样修改后,代码将会将词语、词频以及词性信息输出到`D:/output.csv`文件中,并且在控制台中打印出词性信息。