```python
import jieba.posseg as pseg

texts1 = []
for text in texts:
    pairs = pseg.cut(text)
    result = [[pair.word, pair.flag] for pair in pairs]
    texts1.append(result)
```
Posted: 2024-03-07 09:52:29
This code uses the posseg module of the Chinese word-segmentation library jieba to part-of-speech-tag every text in the list texts, storing each result in a second list texts1. Concretely, the for loop iterates over texts; for each text, pseg.cut() segments it and tags each word, yielding pairs. A list comprehension then turns each pair into a [word, flag] list and collects them in result, which is finally appended to texts1.
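The shape of what pseg.cut yields can be sketched without jieba installed by using a stand-in pair type with the same .word and .flag attributes (the sample words and tags below are illustrative assumptions, not actual jieba output):

```python
from collections import namedtuple

# Stand-in for jieba's pair objects, which expose .word and .flag.
Pair = namedtuple("Pair", ["word", "flag"])

# Hypothetical tagged output for one short sentence.
pairs = [Pair("我", "r"), Pair("爱", "v"), Pair("北京", "ns")]

# The same comprehension as in the snippet above.
result = [[pair.word, pair.flag] for pair in pairs]
print(result)  # [['我', 'r'], ['爱', 'v'], ['北京', 'ns']]
```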
Related questions
Help me fix this code:

```python
import jieba.posseg as pseg
import nltk

sentence = "那个地方小的黄色的细腻的狮子"
words = pseg.cut(sentence)
grammar = "NP:{<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(words)
result.draw()
```
Sure, here is the modified code:
```python
import jieba.posseg as pseg
import nltk
sentence = "那个地方小的黄色的细腻的狮子"
words = [(word, tag) for word, tag in pseg.cut(sentence)]
grammar = "NP:{<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(words)
result.draw()
```
The modified code converts the segmentation result from a generator into a list of (word, tag) tuples, which is the input format nltk.RegexpParser expects for its regex-based chunking. Note also that jieba uses its own tag set (r, a, n, uj, ...), so a grammar written with Penn Treebank tags such as <DT>, <JJ>, <NN> will never match jieba output; rewrite the grammar with jieba's tags, e.g. "NP:{<r>?<a>*<n>}".
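The chunking idea behind that grammar can be illustrated without nltk by a small greedy chunker over (word, tag) pairs. The sample pairs and tags below are illustrative assumptions about what pseg.cut might return for this sentence, not verified jieba output:

```python
def chunk_np(pairs):
    """Greedy chunker for NP = optional pronoun (r), any adjectives (a), one noun (n),
    mirroring the pattern <r>?<a>*<n>."""
    chunks, i = [], 0
    while i < len(pairs):
        j = i
        if j < len(pairs) and pairs[j][1] == "r":   # optional pronoun
            j += 1
        while j < len(pairs) and pairs[j][1] == "a":  # zero or more adjectives
            j += 1
        if j < len(pairs) and pairs[j][1] == "n":     # required noun
            chunks.append([w for w, _ in pairs[i:j + 1]])
            i = j + 1
        else:
            i += 1   # no NP starting here; move on
    return chunks

# Hypothetical tagged pairs for "那个地方小的黄色的细腻的狮子".
pairs = [("那个", "r"), ("地方", "n"), ("小", "a"), ("的", "uj"),
         ("黄色", "n"), ("的", "uj"), ("细腻", "a"), ("的", "uj"),
         ("狮子", "n")]
chunks = chunk_np(pairs)
print(chunks)  # [['那个', '地方'], ['黄色'], ['狮子']]
```

The particle 的 (tag uj) is not part of the pattern, so it breaks chunks; a fuller grammar would need to include it.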
Fix this code:

```python
import jieba
import jieba.posseg as pseg

path = r'D:/明朝collection.txt'
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()
jieba.enable_paddle()
words = pseg.cut(text, use_paddle=True)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(500):
    word, count = items[i]
    print("{0:<5}{1:>5}".format(word, count))
```
The loop iterates over pair objects, not word strings; unpack each pair into word and flag so that the word itself is counted. Note that `import jieba.posseg as pseg` alone does not bind the name jieba, so `import jieba` is also needed for jieba.enable_paddle():

```python
import jieba
import jieba.posseg as pseg

path = r'D:/明朝collection.txt'
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()
jieba.enable_paddle()
words = pseg.cut(text, use_paddle=True)
counts = {}
for word, flag in words:
    if len(word) == 1:   # skip single-character tokens
        continue
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Slicing avoids an IndexError when there are fewer than 500 distinct words.
for word, count in items[:500]:
    print("{0:<5}{1:>5}".format(word, count))
```
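The counting-and-sorting pattern above can be sketched with the standard library alone, using collections.Counter on a hypothetical token list (the sample tokens are made up, not taken from the file in the question):

```python
from collections import Counter

# Hypothetical segmented tokens; the filter skips single-character tokens
# just as the loop in the answer does.
tokens = ["明朝", "皇帝", "的", "明朝", "大臣", "之", "皇帝", "明朝"]
counts = Counter(t for t in tokens if len(t) > 1)

# most_common(n) replaces the manual list(...) + sort(reverse=True) + slice.
top = counts.most_common(2)
print(top)  # [('明朝', 3), ('皇帝', 2)]
```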