```python
import jieba
excludes = {"将军", "却说", "这样", "他们", "东汉", "", "然而", "自己", "这个", "没有"}
txt = open("C:\python\三国演义.txt", "r", encoding='utf-8').read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
```
Posted: 2023-06-17 10:02:04 · Views: 53
This code uses the jieba library to segment the text of Romance of the Three Kingdoms (三国演义), counts how often each word appears, and prints the 15 most frequent words with their counts. A dictionary `counts` stores each word's frequency, `items()` converts the dictionary into a list, `sort()` orders that list by frequency in descending order, and `format()` aligns the output. Note two issues: the `excludes` set is defined but never actually used anywhere in the code (jieba's segmentation functions take no such parameter, so excluded words must be filtered out manually after segmentation), and the backslashes in the file path should be doubled (`\\`) or the string made a raw string (`r"..."`), otherwise the path may be misinterpreted.
Related questions
```python
import jieba
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}
```
The statement `import jieba` imports the jieba library, a Chinese text-segmentation library used to split Chinese sentences into individual words. `excludes` is an ordinary Python set defined alongside it, listing words ("将军", "却说", "荆州", "二人", "不可", "不能", and "如此") that the surrounding script intends to filter out of its word-frequency results; it is not a parameter of jieba itself, so the script must apply it manually after segmentation.
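A minimal sketch of how such an exclusion set is typically applied. The word list here is hypothetical, standing in for what `jieba.lcut()` would return, so the snippet runs without the library installed:

```python
# "excludes" is a plain Python set, not a jieba parameter
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}

# Hypothetical segmentation output; in a real script this would be
# produced by jieba.lcut(txt)
words = ["却说", "曹操", "将军", "荆州", "大军", "曹操"]

# Filter out the excluded words after segmentation
filtered = [w for w in words if w not in excludes]
print(filtered)  # ['曹操', '大军', '曹操']
```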
```python
import jieba
excludes = {"将军", "却说", "这样", "他们", "东汉", "", "然而", "自己", "这个", "没有"}
txt = open("C:\python\三国演义.txt", "r", encoding='utf-8').read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
```
Is there anything wrong with this code?
This code has no syntax errors, but it has the following problems:
1. The `excludes` set is defined but never used. `jieba.lcut()` does not accept an exclusion parameter, so the excluded words must be filtered out manually after segmentation, e.g. by adding `or word in excludes` to the `continue` condition inside the loop.
2. The backslashes `\` in the file path should be written as double backslashes `\\`, or the string should be a raw string, i.e. `txt = open(r"C:\python\三国演义.txt", "r", encoding='utf-8').read()`.
3. The sorting itself is already correct: `items.sort(key=lambda x: x[1], reverse=True)` orders the word/count pairs by frequency. Equivalently, the list conversion and sort can be collapsed into one step with `items = sorted(counts.items(), key=lambda x: x[1], reverse=True)`.
4. The `excludes` set contains an empty string `""`, which serves no purpose (an empty string has length 0 and is never produced as a meaningful word) and can be removed.
The revised code:
```python
import jieba

# Words to drop from the frequency count; filtered manually after
# segmentation, since jieba.lcut() has no exclusion parameter
excludes = {"将军", "却说", "这样", "他们", "东汉", "然而", "自己", "这个", "没有"}

# Raw string so the backslashes in the Windows path are taken literally
txt = open(r"C:\python\三国演义.txt", "r", encoding='utf-8').read()
words = jieba.lcut(txt)

counts = {}
for word in words:
    # Skip single characters and excluded words
    if len(word) == 1 or word in excludes:
        continue
    counts[word] = counts.get(word, 0) + 1

# Sort word/count pairs by frequency, highest first
items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
```
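The count-filter-sort pattern above can also be expressed with `collections.Counter`, which handles the counting and the descending sort (`most_common`) in one place. The word list below is a hypothetical stand-in for `jieba.lcut(txt)` output, so this sketch runs without jieba installed:

```python
from collections import Counter

excludes = {"将军", "却说", "这样", "他们", "东汉", "然而", "自己", "这个", "没有"}

# Hypothetical segmentation output; a real script would use jieba.lcut(txt)
words = ["曹操", "却说", "孔明", "曹操", "之", "孔明", "曹操", "玄德"]

# Count only multi-character words that are not in the exclusion set
counts = Counter(w for w in words if len(w) > 1 and w not in excludes)

# most_common(n) returns the n highest-frequency pairs, already sorted
for word, count in counts.most_common(15):
    print("{0:<10}{1:>5}".format(word, count))
```

`most_common(15)` simply returns fewer pairs when the text yields fewer than 15 distinct words, which also avoids the `IndexError` that `items[i]` would raise in that case.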