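How to write code that segments each txt file under the data path into words, removes stopwords, and generates a corresponding new text file in the out folder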

Date: 2023-03-29 14:04:45 Views: 69

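Hello, the following code segments each txt file under the data path into words, removes stopwords, and writes a corresponding new text file to the out folder:

```python
import os

import jieba

# Path to the stopword list (one word per line)
stopwords_path = 'stopwords.txt'
# Load the stopwords into a set so membership tests are fast
with open(stopwords_path, 'r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

# Folder containing the files to process
data_path = 'data/'
# Folder for the processed files; create it if it does not exist yet
out_path = 'out/'
os.makedirs(out_path, exist_ok=True)

# Iterate over all txt files in the input folder
for file_name in os.listdir(data_path):
    if file_name.endswith('.txt'):
        # Read the file content
        with open(os.path.join(data_path, file_name), 'r', encoding='utf-8') as f:
            content = f.read()
        # Segment the text into words
        seg_list = jieba.cut(content)
        # Remove stopwords
        filtered_words = [word for word in seg_list if word not in stopwords]
        # Save the result under the same file name in the output folder
        with open(os.path.join(out_path, file_name), 'w', encoding='utf-8') as f:
            f.write(' '.join(filtered_words))
```

Hope this helps!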
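One detail worth noting about the stopword filtering step: `jieba.cut` also yields spaces and line breaks as separate tokens, so they pass the stopword check and end up in the output unless the stopword list happens to contain them. Below is a minimal sketch of a stricter filter that drops whitespace-only tokens as well. The helper name `filter_tokens` and the sample data are hypothetical and not part of the original answer; the sample list is hand-written to stand in for real `jieba.cut` output so the snippet runs without jieba installed.

```python
def filter_tokens(tokens, stopwords):
    """Drop stopwords and whitespace-only tokens, preserving order.

    Hypothetical helper for illustration; not part of the original answer.
    """
    cleaned = []
    for token in tokens:
        word = token.strip()
        # Skip empty tokens (spaces, newlines) and anything in the stopword set
        if word and word not in stopwords:
            cleaned.append(word)
    return cleaned


# Hand-written tokens standing in for the output of jieba.cut()
sample_tokens = ["我", "喜欢", " ", "自然", "语言", "处理", "\n", "的", "课程"]
sample_stopwords = {"我", "的"}

print(' '.join(filter_tokens(sample_tokens, sample_stopwords)))
```

To use it in the loop above, the list comprehension could be replaced with `filtered_words = filter_tokens(seg_list, stopwords)`, since the helper accepts any iterable of strings, including the generator returned by `jieba.cut`.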