首页把上述要求改成pdf转txt，其他不变

把上述要求改成pdf转txt，其他不变

时间: 2024-02-04 10:01:49 浏览: 17

好的，将代码中的 "html" 和 "HTML" 替换为 "pdf" 和 "PDF"，同时需要使用相应的 Python 库来处理 PDF 文件。下面是修改后的示例代码： ```python import os import PyPDF2 source_dir = r'C:\Users\74289\文本挖掘\词频统计\爬pdf\pdf版文献' # 原始文件夹路径 target_dir = r'C:\Users\74289\文本挖掘\词频统计\pdf转文本版文献' # 目标文件夹路径 if not os.path.exists(target_dir): os.mkdir(target_dir) for root, dirs, files in os.walk(source_dir): for file in files: if file.endswith('.pdf') or file.endswith('.PDF'): pdf_file = os.path.join(root, file) with open(pdf_file, 'rb') as f: reader = PyPDF2.PdfFileReader(f) content = '' for i in range(reader.numPages): page = reader.getPage(i) content += page.extractText() txt_file = os.path.join(target_dir, file[:-4] + '.txt') with open(txt_file, 'w', encoding='utf-8') as f: f.write(content) ``` 这个代码将从指定的源文件夹中遍历所有的 PDF 文件，并将其转换为文本文件，存储到指定的目标文件夹中。