2023-06-06 14:02:21,310 - INFO - running WikiExtractor.py: parse the chinese corpus
D:\软件\python\lib\site-packages\gensim\utils.py:1333: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected %s; aliasing chunkize to chunkize_serial" % entity)
Traceback (most recent call last):
  File "D:\pythonFiles\图灵\Python_project\self_learn\大语言模型\WikiExtractor.py", line 52, in <module>
    parse_corpus(infile, outfile)
  File "D:\pythonFiles\图灵\Python_project\self_learn\大语言模型\WikiExtractor.py", line 29, in parse_corpus
    for text in wiki.get_texts():
  File "D:\软件\python\lib\site-packages\gensim\corpora\wikicorpus.py", line 693, in get_texts
    for tokens, title, pageid in pool.imap(_process_article, group):
  File "D:\软件\python\lib\multiprocessing\pool.py", line 870, in next
    raise value
  File "D:\软件\python\lib\multiprocessing\pool.py", line 537, in _handle_tasks
    put(task)
  File "D:\软件\python\lib\multiprocessing\connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "D:\软件\python\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'parse_corpus.<locals>.tokenize'

How do I fix this?
Time: 2023-08-24 12:09:09 Views: 173
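This error occurs because multiprocessing has to pickle every object it sends to a worker process, and a function defined inside another function (a local object) cannot be pickled. In the traceback, gensim's `get_texts` hands articles to a worker pool through `pool.imap`, and the task it tries to send includes `tokenize`, which is defined inside `parse_corpus`. The fix is to move `tokenize` out of `parse_corpus` to the top level of the module, so that pickle can look it up by name; `parse_corpus` can then use it directly or receive it as an argument. Alternatively, `pathos.multiprocessing` can serialize local functions because it uses `dill` instead of `pickle`. Note, however, that the pool here is created inside gensim rather than in your own code, so switching your own imports to `pathos` will not change what gensim uses; moving the function to module level is the more direct fix.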
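The fix described above can be sketched as follows. This is a minimal illustration, not the asker's actual script: the character-level tokenizer body, the helper names `make_local_tokenizer` and `is_picklable`, and the way the output file is written are all assumptions. It also assumes the tokenizer is passed to gensim through `WikiCorpus`'s `tokenizer_func` parameter, which the traceback does not show explicitly.

```python
import pickle


def tokenize(text, token_min_len=1, token_max_len=15, lower=False):
    """Module-level tokenizer: picklable because pickle can find it by name.

    The body is a placeholder (one token per non-whitespace character);
    substitute your own segmentation logic here.
    """
    return [ch for ch in text if not ch.isspace()]


def make_local_tokenizer():
    """Reproduces the problem: the returned function is a local object."""
    def tokenize_local(text, token_min_len=1, token_max_len=15, lower=False):
        return [ch for ch in text if not ch.isspace()]
    return tokenize_local


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def parse_corpus(infile, outfile):
    # Imported here so the pickling checks above run without gensim installed.
    from gensim.corpora import WikiCorpus

    # Assumption: the tokenizer is supplied via tokenizer_func.
    # dictionary={} skips the separate vocabulary-building pass over the dump.
    wiki = WikiCorpus(infile, tokenizer_func=tokenize, dictionary={})
    with open(outfile, "w", encoding="utf-8") as out:
        for tokens in wiki.get_texts():
            out.write(" ".join(tokens) + "\n")


if __name__ == "__main__":
    # Quick check before running the real job.
    print("module-level tokenize picklable:", is_picklable(tokenize))
    print("local tokenize picklable:", is_picklable(make_local_tokenizer()))
```

On Windows, worker processes are started by spawning a fresh interpreter that re-imports the main module, so the call to `parse_corpus(infile, outfile)` should also sit under the `if __name__ == "__main__":` guard rather than at the top level of the script.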