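How to efficiently read and parse all XML files in a folder and convert the data to an Excel-friendly format (CSV or xlsx), with each XML file's name in the first column
Posted: 2024-10-25 20:17:36 Views: 13
To read and parse every XML file in a folder, convert the data to an Excel-friendly format (CSV or xlsx), and put the XML file name in the first column, you can follow these steps: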
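1. **Install the required libraries**:
Use `lxml` to parse the XML and `pandas` to handle the data. `openpyxl` is needed only for xlsx output; CSV output needs no extra package, because pandas writes CSV directly.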
```
pip install lxml pandas openpyxl
```
2. **Define the reading and parsing function**:
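```python
import os
from lxml import etree
import pandas as pd


def read_and_parse_xml(xml_folder, save_folder=None, extension='xml'):
    """Return one dict per XML file; the file name is stored under 'File Name'.

    save_folder is accepted only so that existing two-argument calls keep
    working; it is not used while parsing.
    """
    data = []
    for filename in sorted(os.listdir(xml_folder)):
        # Skip anything that is not an XML file
        if not filename.lower().endswith(f'.{extension}'):
            continue
        # Let lxml open the file so the encoding declared in the XML is honored
        tree = etree.parse(os.path.join(xml_folder, filename))
        file_name = os.path.splitext(filename)[0]  # name without the extension
        record = {'File Name': file_name}  # first key becomes the first column
        for child in tree.getroot().iter():
            # Skip comments and processing instructions (their tag is not a string)
            if not isinstance(child.tag, str):
                continue
            # Repeated tags overwrite each other: only the last value is kept
            record[child.tag] = child.text
        data.append(record)
    return data
```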
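Because the function above stores values by tag name, a file with repeated or nested elements keeps only the last value for each tag. One possible workaround, sketched here under assumptions, is to build path-style keys instead. The helper name `flatten_element` and the `item[2]` key format are hypothetical, and the sketch uses the standard library's `xml.etree.ElementTree` rather than `lxml` so that it runs without extra installs.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten_element(element, prefix=''):
    """Return {path: text} for every descendant that holds text (hypothetical helper)."""
    flat = {}
    seen = Counter()  # how many times each child tag has appeared so far
    for child in element:
        seen[child.tag] += 1
        # Add an index only from the second occurrence on: item, item[2], ...
        suffix = '' if seen[child.tag] == 1 else f'[{seen[child.tag]}]'
        path = f'{prefix}{child.tag}{suffix}'
        text = (child.text or '').strip()
        if text:
            flat[path] = text
        # Recurse so nested elements get keys such as item/name
        flat.update(flatten_element(child, prefix=path + '/'))
    return flat


# Made-up sample document with a repeated <item> element
sample = ET.fromstring(
    '<order><id>7</id><item><name>pen</name></item><item><name>ink</name></item></order>'
)
record = {'File Name': 'order_7', **flatten_element(sample)}
print(record)
```

Each repeated element gets its own column this way, at the cost of wider tables when files contain many repetitions.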
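3. **Build the DataFrame and write it to Excel**:
```python
def save_to_excel(data, excel_filename, folder='.'):
    os.makedirs(folder, exist_ok=True)  # create the output folder if needed
    df = pd.DataFrame(data)
    # 'File Name' is the first key of every record, so it is already the first
    # column; index=False only drops the numeric row index
    df.to_excel(os.path.join(folder, excel_filename), index=False)
```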
4. **Apply the functions to a folder**:
```python
xml_input_folder = 'your/xml/folder'
output_folder = 'output/excel/folder'
save_to_excel(read_and_parse_xml(xml_input_folder, output_folder), 'data_with_xml_names.xlsx', folder=output_folder)
```
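To try the folder traversal without real data, a throwaway folder with a few sample files can be created first. The sketch below is an illustration only: `collect_records` is a hypothetical standard-library stand-in for the parsing step (it uses `xml.etree.ElementTree` instead of `lxml`), and the file names and contents are made up.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_records(folder):
    """Parse every *.xml file in folder into a dict (hypothetical stdlib stand-in)."""
    records = []
    for path in sorted(Path(folder).glob('*.xml')):
        root = ET.parse(path).getroot()
        record = {'File Name': path.stem}  # file name without the extension
        for element in root.iter():
            record[element.tag] = (element.text or '').strip()
        records.append(record)
    return records


with tempfile.TemporaryDirectory() as tmp:
    # Two sample XML files plus one file that should be ignored
    Path(tmp, 'book_a.xml').write_text('<book><title>First</title></book>', encoding='utf-8')
    Path(tmp, 'book_b.xml').write_text('<book><title>Second</title></book>', encoding='utf-8')
    Path(tmp, 'notes.txt').write_text('not XML', encoding='utf-8')
    records = collect_records(tmp)

print(records)
```

Only the two `.xml` files produce records, and each record starts with the file name, which is what places it in the first column once the records are written out.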
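5. **Convert to CSV** (if you want CSV output instead):
```python
def save_to_csv(data, csv_filename, folder='.'):
    os.makedirs(folder, exist_ok=True)  # create the output folder if needed
    df = pd.DataFrame(data)
    # index=False keeps 'File Name' as the first column;
    # utf-8-sig lets Excel detect UTF-8 when it opens the CSV
    df.to_csv(os.path.join(folder, csv_filename), index=False, encoding='utf-8-sig')

save_to_csv(read_and_parse_xml(xml_input_folder, output_folder), 'data_with_xml_names.csv', folder=output_folder)
```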
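If only CSV output is needed, the standard library's `csv` module may be enough on its own. The following is a minimal sketch, assuming the records are dictionaries that each contain a `'File Name'` key; the helper name `records_to_csv` and the sample records are hypothetical.

```python
import csv
import io


def records_to_csv(records, stream):
    """Write dict records as CSV with 'File Name' first (hypothetical helper)."""
    # Union of all keys in first-seen order, so no file's tags are dropped
    columns = ['File Name']
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # restval fills in cells for tags that a given file does not have
    writer = csv.DictWriter(stream, fieldnames=columns, restval='', lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)


# Made-up records with partly different tags
records = [
    {'File Name': 'a', 'title': 'First', 'price': '10'},
    {'File Name': 'b', 'title': 'Second', 'author': 'Li'},
]
buffer = io.StringIO()
records_to_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, the stream can come from `open(path, 'w', newline='', encoding='utf-8-sig')`.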