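Write Python code for a binary classification problem. The input data is two Excel spreadsheets, each with four feature columns. The data to be classified is also stored in Excel, and the classification results should be saved in the same Excel file.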
Posted: 2024-09-17 19:03:29
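To do this in Python, we can use pandas to read the Excel files and scikit-learn to train a binary classifier, then write the results to Excel. Assume you have a training set (train.xlsx) and a test set (test.xlsx), both containing the same four feature columns, and that the target variable (label) in the training set is a column named `target_column`. The test set may or may not include that column. Here is a simple example that writes the original test data and the predictions to a new Excel file: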
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Name of the label column; change it to match your data
TARGET = 'target_column'

# Read the data (reading/writing .xlsx requires openpyxl to be installed)
def load_data(file_name):
    return pd.read_excel(file_name)

train_df = load_data('train.xlsx')
test_df = load_data('test.xlsx')

# The training set must contain the target variable
if TARGET not in train_df.columns:
    raise ValueError(f"Target column is missing. Please make sure it's named '{TARGET}'.")

X = train_df.drop(columns=[TARGET])
y = train_df[TARGET]

# Hold out 20% of the training data for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Compute accuracy on the validation split
accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {accuracy:.4f}")

# Predict the test data, using the same feature columns as in training
y_pred = model.predict(test_df[X.columns])

# If the test set also has labels, report its accuracy too
if TARGET in test_df.columns:
    print(f"Test accuracy: {accuracy_score(test_df[TARGET], y_pred):.4f}")

# Write the original test data and the predictions to a new Excel file
output_file = "classification_results.xlsx"
results_df = pd.DataFrame({'ID': test_df.index, 'Predicted Label': y_pred})
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
    test_df.to_excel(writer, sheet_name='Original Data', index=False)
    results_df.to_excel(writer, sheet_name='Classification Results', index=False)
    # Widen the first column (openpyxl worksheet API)
    writer.sheets['Classification Results'].column_dimensions['A'].width = 15
```
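The question asks for the results to be stored in the same Excel file as the data being classified, rather than in a new file. The following is a minimal sketch of one way to do that: it adds the predictions as a new column and saves the sheet back into the existing workbook, leaving any other sheets untouched. It assumes pandas 1.3 or later (for `if_sheet_exists`) and openpyxl. The helper name `write_predictions_to_same_file` and the column name `Predicted Label` are hypothetical.

```python
import pandas as pd


def write_predictions_to_same_file(path, predictions, sheet_name=0, column='Predicted Label'):
    """Add predictions as a new column and save them back into the same workbook.

    Hypothetical helper; assumes pandas >= 1.3 and openpyxl are installed.
    Only the target sheet is rewritten; other sheets in the workbook are kept.
    """
    df = pd.read_excel(path, sheet_name=sheet_name)
    if len(df) != len(predictions):
        raise ValueError("Number of predictions does not match the number of rows.")
    df[column] = list(predictions)

    # Resolve a positional sheet index to the actual sheet name
    if isinstance(sheet_name, int):
        with pd.ExcelFile(path) as xls:
            sheet_name = xls.sheet_names[sheet_name]

    # mode='a' opens the existing workbook; 'replace' overwrites only this sheet
    with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
    return df


# Usage with y_pred from the model above (file name assumed):
# write_predictions_to_same_file('test.xlsx', y_pred)
```

Rewriting only the one sheet, instead of calling `to_excel` on the path directly, avoids silently discarding other sheets the workbook may contain.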
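The question can also be read another way: each of the two input tables might hold the samples of a single class, with no label column at all. Under that assumption, the labels can be built by tagging each table's rows with 0 or 1 before fitting the same logistic regression model. The following is a minimal sketch. The helper `train_from_two_tables` and the file names `class0.xlsx`, `class1.xlsx` and `to_classify.xlsx` are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression


def train_from_two_tables(df_class0, df_class1, feature_cols):
    """Fit a binary classifier when each table holds the samples of one class.

    Hypothetical setup: rows of df_class0 get label 0, rows of df_class1 get label 1.
    """
    X = pd.concat([df_class0[feature_cols], df_class1[feature_cols]], ignore_index=True)
    y = [0] * len(df_class0) + [1] * len(df_class1)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model


# Usage with hypothetical file names:
# class0 = pd.read_excel('class0.xlsx')
# class1 = pd.read_excel('class1.xlsx')
# features = list(class0.columns[:4])  # the four feature columns
# model = train_from_two_tables(class0, class1, features)
# new_df = pd.read_excel('to_classify.xlsx')
# new_df['Predicted Label'] = model.predict(new_df[features])
# new_df.to_excel('to_classify.xlsx', index=False)  # rewrites the whole file
```

Calling `to_excel` on an existing path replaces the entire workbook, so any other sheets in `to_classify.xlsx` would be lost. `pd.ExcelWriter(..., mode='a', if_sheet_exists='replace')` can be used instead when those sheets must be kept.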