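This is a binary classification problem: each sample is classified as normal or abnormal. The normal data and the abnormal data are stored in two separate Excel files, and the data to be classified is in a third Excel file. Please write Python code that implements this.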
Date: 2024-09-17 12:03:34 Views: 34
Binary-classification tumor gene data: small sample size, high dimensionality, used to train classification algorithms.
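For this task, we can read the data with pandas, use sklearn for preprocessing and model training, and then write the results to a new Excel file. Here is a basic Python script example: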
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


# Read the normal and abnormal data and label each row by its source file.
# Assumes both files hold numeric feature columns only, with identical headers.
def load_data(normal_file, abnormal_file):
    normal_df = pd.read_excel(normal_file)
    abnormal_df = pd.read_excel(abnormal_file)
    normal_df['target_column'] = 'normal'
    abnormal_df['target_column'] = 'abnormal'
    # Merge the two parts
    all_data = pd.concat([normal_df, abnormal_df], ignore_index=True)
    return all_data


# Encode the class labels as integers
def encode_labels(data):
    le = LabelEncoder()
    data['target_column'] = le.fit_transform(data['target_column'])
    return data, le


# Split the data
data = load_data('normal_data.xlsx', 'abnormal_data.xlsx')
data, le = encode_labels(data)
X = data.drop('target_column', axis=1)
y = data['target_column']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Train the model (Logistic Regression as an example)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Write the results to a new Excel file (requires openpyxl to be installed)
output_file = "prediction_results.xlsx"
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
    # Write the original test data
    test_df = data.loc[y_test.index]
    test_df.to_excel(writer, sheet_name='Original Data', index=False)
    # Write the predictions to a second sheet
    results_df = pd.DataFrame({
        'ID': y_test.index,
        'Actual Label': le.inverse_transform(y_test),
        'Predicted Label': le.inverse_transform(y_pred),
    })
    results_df.to_excel(writer, sheet_name='Prediction Results', index=False)
    # Widen the first column
    writer.sheets['Prediction Results'].column_dimensions['A'].width = 15

# Print the classification report as a performance metric
print(classification_report(y_test, y_pred, target_names=le.classes_))
```
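The question also asks for predictions on data stored in a separate Excel file, which the script above does not cover: it only scores a held-out split of the labelled data. Below is a minimal sketch of that extra step, not a definitive implementation. It assumes all three files share the same numeric feature columns. The function names `train_classifier`, `predict_new`, and `classify_excel`, the file-name arguments, and the `StandardScaler` step are illustrative additions that do not appear in the original answer.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_classifier(normal_df, abnormal_df):
    """Fit a scaled logistic regression on rows labelled by their source table."""
    features = pd.concat([normal_df, abnormal_df], ignore_index=True)
    labels = ['normal'] * len(normal_df) + ['abnormal'] * len(abnormal_df)
    # Scaling is an assumption here; it often helps with gene-expression-like features.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(features, labels)
    return model


def predict_new(model, new_df, feature_columns):
    """Return a copy of new_df with an added 'Predicted Label' column."""
    result = new_df.copy()
    result['Predicted Label'] = model.predict(new_df[feature_columns])
    return result


def classify_excel(normal_file, abnormal_file, new_file, output_file):
    """Excel wrapper; all four file names are placeholders (assumption)."""
    normal_df = pd.read_excel(normal_file)
    abnormal_df = pd.read_excel(abnormal_file)
    model = train_classifier(normal_df, abnormal_df)
    result = predict_new(model, pd.read_excel(new_file), list(normal_df.columns))
    result.to_excel(output_file, index=False)
    return result
```

A call such as `classify_excel('normal_data.xlsx', 'abnormal_data.xlsx', 'new_data.xlsx', 'prediction_results.xlsx')` would then write the new rows plus a `Predicted Label` column; `new_data.xlsx` is a hypothetical file name. With few samples and many features, stronger regularisation (a smaller `C` in `LogisticRegression`) and cross-validation are worth considering before trusting the predictions.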