How do I remove stop words from a folder of files that have already been segmented? Please write the code in Python


Time: 2024-05-09 16:18:03 Views: 70

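Sure, here is the Python code:

```python
import os
import jieba

# Stop-word list (a set makes the membership test fast)
stop_words = {'的', '了', '是', '在', '他', '她', '你', '我', '们', '和', '与'}

# Segment one file and remove stop words
def cut_words(file_name):
    with open(file_name, mode='r', encoding='utf-8') as f:
        content = f.read()
    # Segment the text
    seg_list = jieba.cut(content)
    # Remove stop words
    words = [word for word in seg_list if word not in stop_words]
    return ' '.join(words).strip()

# Collect the paths of all regular files in a folder
def get_file_paths(dir_path):
    file_paths = []
    for file_name in os.listdir(dir_path):
        file_path = os.path.join(dir_path, file_name)
        if os.path.isfile(file_path):
            file_paths.append(file_path)
    return file_paths

# Segment every file in the folder and remove stop words
def batch_cut_words(dir_path):
    for file_path in get_file_paths(dir_path):
        result = cut_words(file_path)
        # Write the result to a new file next to the original
        with open(file_path + '_new.txt', mode='w', encoding='utf-8') as f:
            f.write(result)
```

The code above defines `cut_words()`, which segments a single text file and removes stop words, plus `get_file_paths()` and `batch_cut_words()`, which together process every file in a folder. Pass the folder path to `batch_cut_words()` and each text file in that folder is segmented and filtered. The result is saved to a new file whose name is the original name with `_new.txt` appended; for example, the output for `test.txt` is written to `test.txt_new.txt` in the same folder.

Two things to keep in mind: `cut_words()` runs `jieba.cut()` on the file contents itself, so it does not rely on the files having been segmented beforehand, and because the output files land in the same folder, calling `batch_cut_words()` a second time would also process the `_new.txt` files.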
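Since the question is about files that are already segmented, a variant that skips jieba entirely may be closer to what is needed. The following is only a minimal sketch under a few assumptions that are not stated in the question: tokens in each file are separated by whitespace, files are UTF-8, and the stop words live in a text file with one word per line. The function names `load_stop_words`, `remove_stop_words`, and `batch_remove_stop_words` are made up for illustration.

```python
import os

def load_stop_words(path):
    """Read one stop word per line from a UTF-8 file (assumed format)."""
    with open(path, encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}

def remove_stop_words(text, stop_words):
    """Filter whitespace-separated tokens line by line, keeping line breaks."""
    cleaned = []
    for line in text.splitlines():
        kept = [tok for tok in line.split() if tok not in stop_words]
        cleaned.append(' '.join(kept))
    return '\n'.join(cleaned)

def batch_remove_stop_words(src_dir, dst_dir, stop_words):
    """Process every regular file in src_dir and write results to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):
            continue  # skip sub-folders
        with open(src_path, encoding='utf-8') as f:
            text = f.read()
        # Same file name, different folder, so the originals stay untouched
        with open(os.path.join(dst_dir, name), 'w', encoding='utf-8') as f:
            f.write(remove_stop_words(text, stop_words))
```

A call might look like `batch_remove_stop_words('segmented', 'segmented_clean', load_stop_words('stopwords.txt'))`, where the folder and file names are placeholders. Writing to a separate output folder means a second run does not pick up its own output files.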
