Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2023.1.1\plugins\python-ce\helpers\pydev\pydevconsole.py", line 364, in runcode
    coro = func()
  File "<input>", line 3, in <module>
  File "C:\Users\13974\AppData\Local\Programs\Python\Python310\lib\site-packages\gensim\corpora\dictionary.py", line 78, in __init__
    self.add_documents(documents, prune_at=prune_at)
  File "C:\Users\13974\AppData\Local\Programs\Python\Python310\lib\site-packages\gensim\corpora\dictionary.py", line 204, in add_documents
    self.doc2bow(document, allow_update=True)  # ignore the result, here we only care about updating token ids
  File "C:\Users\13974\AppData\Local\Programs\Python\Python310\lib\site-packages\gensim\corpora\dictionary.py", line 241, in doc2bow
    raise TypeError("doc2bow expects an array of unicode tokens on input, not a single string")
TypeError: doc2bow expects an array of unicode tokens on input, not a single string

Posted: 2023-06-28 12:05:41


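This error occurs because `doc2bow` expects a list of unicode tokens as input, not a single string. The traceback shows the exception being raised inside the `Dictionary` constructor, which means the `documents` argument was a list of raw strings (or one string) rather than a list of token lists. Tokenize each document first, then pass the token lists to `Dictionary` and `doc2bow`:

```python
from gensim.corpora.dictionary import Dictionary

# Assume your documents are stored in a list named `documents`,
# one raw string per document (placeholder data for illustration).
documents = ["first example document", "second example document"]

# Tokenize each document into a list of tokens
tokens_list = [doc.split() for doc in documents]

# Build the dictionary from the token lists
dictionary = Dictionary(tokens_list)

# Convert each document to its bag-of-words representation
corpus = [dictionary.doc2bow(tokens) for tokens in tokens_list]
```

Each document is now converted to a bag-of-words representation. Note that `Dictionary` is a class from the gensim library and `doc2bow` is one of its methods.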
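If the input may be a mix of raw strings and already-tokenized documents, a small normalizing step before building the dictionary can avoid the `TypeError`. The following is a minimal sketch, not part of gensim: `ensure_tokenized` is a hypothetical helper name, and the default tokenizer `str.split` is an assumption that only suits whitespace-delimited text. For Chinese text, a word segmenter such as `jieba.lcut` could be passed as `tokenizer` instead.

```python
from typing import Callable, Iterable, List, Sequence, Union

Doc = Union[str, Sequence[str]]


def ensure_tokenized(
    documents: Union[str, Iterable[Doc]],
    tokenizer: Callable[[str], List[str]] = str.split,
) -> List[List[str]]:
    """Return a list of token lists (hypothetical helper, not a gensim API).

    Raw strings are split with `tokenizer`; documents that are already
    sequences of tokens are copied as lists of str.
    """
    if isinstance(documents, str):
        # A single string is treated as one document, not as a corpus.
        documents = [documents]

    tokenized: List[List[str]] = []
    for doc in documents:
        if isinstance(doc, str):
            tokenized.append(tokenizer(doc))
        else:
            tokenized.append([str(tok) for tok in doc])
    return tokenized


# Illustrative input: one raw string and one pre-tokenized document.
raw_docs = ["human machine interface", ["graph", "minors", "survey"]]
tokens_list = ensure_tokenized(raw_docs)

# gensim is only needed for the final step; skip it if it is not installed.
try:
    from gensim.corpora.dictionary import Dictionary
except ImportError:
    Dictionary = None

if Dictionary is not None:
    dictionary = Dictionary(tokens_list)
    corpus = [dictionary.doc2bow(tokens) for tokens in tokens_list]
```

Normalizing up front keeps the calls to `Dictionary` and `doc2bow` unchanged, so the fix stays in one place even if the documents come from several sources.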
