Breast cancer dataset analysis code
Date: 2023-09-11 16:11:20
The following code analyzes the breast cancer dataset using Python and the Scikit-learn library:
```python
# Import the required libraries and functions
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
# Load the breast cancer dataset
cancer = load_breast_cancer()
# Convert the dataset to a DataFrame (30 feature columns plus a 'target' column)
df = pd.DataFrame(np.c_[cancer['data'], cancer['target']], columns=np.append(cancer['feature_names'], ['target']))
# Split the dataset into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(df[cancer['feature_names']], df['target'], test_size=0.20, random_state=42)
# Define and train the logistic regression model
# max_iter is raised from the default of 100, which is often too few for the solver to converge on unscaled features
logreg = LogisticRegression(max_iter=10000)
logreg.fit(X_train, y_train)
# Predict on the test set
y_pred = logreg.predict(X_test)
# Compute the model's accuracy and confusion matrix
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}")
```
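This code loads the breast cancer dataset into a DataFrame and splits it into a training set and a test set (80% and 20% of the samples). It then defines a logistic regression model and fits it to the training data. Finally, it makes predictions on the test set and computes the model's accuracy and confusion matrix.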
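To make the last step more concrete, here is a minimal sketch of how the confusion matrix can be read and how it relates to the accuracy. It is an illustration, not part of the original code: the variable names (`mal_as_mal`, `accuracy_from_cm`, and so on) are made up for this example, and `max_iter=10000` is an assumed setting chosen so the solver has enough iterations on unscaled features. It relies on the fact that in this dataset label 0 means malignant and label 1 means benign, and that `confusion_matrix` puts true labels in rows and predicted labels in columns.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Same data and split settings as the code above
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Assumption: max_iter is raised above the default for this illustration
logreg = LogisticRegression(max_iter=10000)
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)

# Rows are true labels, columns are predicted labels,
# both in sorted label order: 0 = malignant, 1 = benign
cm = confusion_matrix(y_test, y_pred)
(mal_as_mal, mal_as_ben), (ben_as_mal, ben_as_ben) = cm

# Accuracy is the share of samples on the diagonal of the matrix
accuracy_from_cm = (mal_as_mal + ben_as_ben) / cm.sum()

# Per-class recall: the share of each true class that was predicted correctly
malignant_recall = mal_as_mal / (mal_as_mal + mal_as_ben)
benign_recall = ben_as_ben / (ben_as_mal + ben_as_ben)

print(f"Accuracy from confusion matrix: {accuracy_from_cm}")
print(f"Accuracy from accuracy_score:   {accuracy_score(y_test, y_pred)}")
print(f"Malignant recall: {malignant_recall}")
print(f"Benign recall:    {benign_recall}")
```

The two accuracy values should agree, since both count the correctly classified test samples. The per-class recalls are worth checking alongside accuracy here, because a malignant case predicted as benign (`mal_as_ben`) is usually the more costly mistake.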