docx file:

    import re

    # Read the file contents
    with open("example_output.txt", "r", encoding="utf-8") as f:
        content = f.read()

    # Use a regular expression to remove duplicates
    pattern = r'^(\d+)\.\s―.*\[答案\]\s(\w).*$'
    result = {}
    for line in content.split('\n'):
        match = re.match(pattern, line)
        if match:
            q_num = match.group(1)
            answer = match.group(2)
            result[q_num] = answer

    # Write the deduplicated content to a new file
    with open("example_output_no_duplicate.txt", "w", encoding="utf-8") as f:
        for q_num, answer in result.items():
            f.write(f"{q_num}. ―___________________ [答案] {answer}\n")
Date: 2024-01-02 18:04:22
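For a docx file, you can read the text content with the third-party `docx2txt` library (installable with `pip install docx2txt`). Here is the modified code: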
```python
import re

import docx2txt

# Read the text content of the docx file as a single string
text = docx2txt.process("example.docx")

# Match lines such as "1. ―____ [答案] A" and capture the question number and the answer
pattern = r'^(\d+)\.\s―.*\[答案\]\s(\w).*$'

# Keyed by question number, so a repeated question keeps only its last answer
result = {}
for line in text.split('\n'):
    match = re.match(pattern, line)
    if match:
        q_num = match.group(1)
        answer = match.group(2)
        result[q_num] = answer

# Write the deduplicated content to a new file
with open("example_output_no_duplicate.txt", "w", encoding="utf-8") as f:
    for q_num, answer in result.items():
        f.write(f"{q_num}. ―___________________ [答案] {answer}\n")
```
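The code reads the docx file with `docx2txt.process()`, which returns the document text as a string. It then matches each line against the regular expression and stores the answer in a dictionary keyed by question number. The dictionary is what removes the duplicates: when a question number appears more than once, the later answer overwrites the earlier one. Lines that do not match the pattern are skipped. Finally, the deduplicated entries are written to a new file.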
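To see the deduplication step on its own, without needing a docx file, here is a minimal sketch that runs the same regular expression over an in-memory string. The function name `dedupe_answers` and the sample text are hypothetical, chosen only for illustration; the pattern is the one from the code above.

```python
import re

# Same pattern as above: question number, a "―" marker, then "[答案]" and a one-character answer
PATTERN = re.compile(r'^(\d+)\.\s―.*\[答案\]\s(\w).*$')


def dedupe_answers(text):
    """Return {question number: answer}, keeping the last answer seen for each number.

    Hypothetical helper for illustration; lines that do not match are ignored.
    """
    result = {}
    for line in text.split('\n'):
        match = PATTERN.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


# Made-up sample input: question 1 appears twice with different answers
sample = (
    "1. ―____ [答案] A\n"
    "2. ―____ [答案] B\n"
    "1. ―____ [答案] C\n"
    "a line that is not a question"
)

answers = dedupe_answers(sample)
for q_num, answer in answers.items():
    print(f"{q_num}. ―___________________ [答案] {answer}")
```

Because a Python dictionary keeps the position of a key when its value is overwritten, question 1 stays first in the output even though its final answer came from a later line. If the first occurrence should win instead, one option is to assign only when `match.group(1)` is not already in `result`.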