with open(csv_file, "w", newline="", encoding="utf-8-sig") as f: writer = csv.writer(f) writer.writerow(["Talk ID", "Time", "Sender ID", "Receive ID", "Talk Type", "Content", "File Name"])
Time: 2023-07-17 20:12:25 Views: 188
Based on your code, you want to use a CSV writer (`csv.writer`) to write a header row to a CSV file. Here is a revised code example:
```python
import os
from bs4 import BeautifulSoup
import re
import csv

# Folder containing the HTML files
folder_path = "C:/Users/test/Desktop/DIDItest"
# Regex pattern capturing everything between <body> and </body>
pattern = r'<body>(.*?)<\/body>'
# Path of the CSV file to write
csv_file = "output.csv"

# Open the CSV file for writing
with open(csv_file, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    # Write the header row
    writer.writerow(["Talk ID", "Time", "Sender ID", "Receive ID", "Talk Type", "Content", "File Name"])
    # Walk through every file under the folder
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            # Read the HTML file (separate handle name so the CSV handle `f` is not shadowed)
            file_path = os.path.join(root, file)
            with open(file_path, "r", encoding="utf-8-sig") as html_f:
                html_code = html_f.read()
            # Create a BeautifulSoup object (not used by the regex-based extraction below)
            soup = BeautifulSoup(html_code, 'html.parser')
            # Use the regex to match the data inside the <body> tag
            body_data = re.findall(pattern, html_code, re.DOTALL)
            if not body_data:
                # Skip files without a <body> section instead of raising IndexError
                continue
            # Strip "<p>" and "()</p>"
            body_text = body_data[0].replace("<p>", "").replace("()</p>", "")
            # Find each "中发言"/"发送" marker and capture the optional [...] text after it
            matches2 = re.findall(r'(?:中发言|发送)\s*(.*?)\s*(?:音频 :|图片 :)?(?:\[([^\]]+)\])?', body_text)
            for match in matches2:
                # Extract the quoted link address from the bracketed text
                file_text = match[1]
                matches = re.findall(r'"([^"]*)"', file_text)
                # Use an empty string when no quoted value is found
                file_name = matches[0] if matches else ""
                # Write the extracted data to the CSV file
                writer.writerow(["", "", "", "", "", "", file_name])

print("Data successfully written to the CSV file")
```
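In the code above, I used `csv.writer` to write the header row to the CSV file. The code then walks through all files in the folder and writes the extracted file names to the CSV file.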
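To try the header-then-rows pattern without touching the disk, the following minimal sketch writes to an in-memory buffer. The helper names `extract_file_name` and `write_file_names` and the sample inputs are hypothetical and only illustrate the idea; they are not part of the original code.

```python
import csv
import io
import re

HEADER = ["Talk ID", "Time", "Sender ID", "Receive ID", "Talk Type", "Content", "File Name"]


def extract_file_name(file_text):
    """Return the first double-quoted substring in file_text, or "" if there is none."""
    quoted = re.findall(r'"([^"]*)"', file_text)
    return quoted[0] if quoted else ""


def write_file_names(out, file_texts):
    """Write the header row, then one row per input with only "File Name" filled in."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for file_text in file_texts:
        writer.writerow([""] * (len(HEADER) - 1) + [extract_file_name(file_text)])


# Hypothetical sample inputs, not taken from real data
buffer = io.StringIO(newline="")
write_file_names(buffer, ['href="a.mp3"', "no quotes here"])

# Read the buffer back to inspect what was written
rows = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
```

Passing a real file opened with `newline=""` instead of the buffer gives the same rows on disk.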
Make sure to provide the correct folder path and CSV file path, and adjust the code as needed.
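One way to catch a wrong path early is to check both paths before processing starts. This is a small sketch under the assumption that the input folder must already exist and that the CSV file may be new but its parent folder must exist; `check_paths` is a hypothetical helper, not something from the original code.

```python
from pathlib import Path


def check_paths(folder_path, csv_file):
    """Return a list of problems found with the two paths (empty if none)."""
    problems = []
    if not Path(folder_path).is_dir():
        problems.append(f"input folder not found: {folder_path}")
    # The CSV file itself may not exist yet, but its parent folder must
    if not Path(csv_file).resolve().parent.is_dir():
        problems.append(f"output folder not found for: {csv_file}")
    return problems
```

Calling `check_paths(folder_path, csv_file)` before opening the CSV file and stopping when the returned list is non-empty avoids creating an empty output file for a mistyped folder.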