Classifying breast cancer data with Naive Bayes and logistic regression: a code implementation
Time: 2024-11-25 10:33:14
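In Python, we can use the scikit-learn library to classify breast cancer data with Naive Bayes and Logistic Regression. Here is a simple example. It assumes a data file named `breast_cancer_data.csv` that contains the feature columns and a label column called `target`. (The data here is breast cancer data, not the iris dataset; scikit-learn also ships a comparable built-in dataset through `load_breast_cancer`, which is imported below.)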
First, import the necessary libraries:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
Then load and preprocess the data:
```python
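# Load the data
data = pd.read_csv('breast_cancer_data.csv')
X = data.drop('target', axis=1)  # feature columns
y = data['target']  # label column
# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)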
```
Next, train the Naive Bayes and logistic regression models separately:
```python
# Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Logistic Regression
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
```
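The scaling and training steps can also be bundled into a single estimator and checked with cross-validation, which is less sensitive to one particular train/test split. The following is a minimal sketch, not part of the original walkthrough. It assumes that scikit-learn's built-in `load_breast_cancer` dataset stands in for `breast_cancer_data.csv`; the names `X_demo`, `y_demo`, `pipelines`, and `cv_scores` are illustrative, and `max_iter=1000` is an arbitrary choice to give the solver room to converge.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in dataset stands in for breast_cancer_data.csv.
X_demo, y_demo = load_breast_cancer(return_X_y=True)

# Each pipeline standardizes the features and then fits one classifier.
pipelines = {
    "Naive Bayes": make_pipeline(StandardScaler(), GaussianNB()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

cv_scores = {}
for name, pipe in pipelines.items():
    # In every fold the scaler is refit on that fold's training part only,
    # so no statistics from the held-out part leak into preprocessing.
    cv_scores[name] = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {cv_scores[name].mean():.3f}")
```

Using a pipeline keeps the "fit on training data, transform everything else" rule in one place, so it cannot be forgotten when the evaluation scheme changes.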
Finally, evaluate the models' performance:
```python
# Predict on the test set and compute accuracy
gnb_pred = gnb.predict(X_test)
log_reg_pred = log_reg.predict(X_test)
accuracy_gnb = accuracy_score(y_test, gnb_pred)
accuracy_logreg = accuracy_score(y_test, log_reg_pred)
print(f"Naive Bayes Accuracy: {accuracy_gnb}")
print(f"Logistic Regression Accuracy: {accuracy_logreg}")
```
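Accuracy alone hides which kind of mistake a model makes, and for a diagnostic task a missed malignant case usually matters more than a false alarm. As a hedged sketch of a fuller evaluation, the block below prints a confusion matrix and a per-class report. It assumes the built-in `load_breast_cancer` dataset in place of the CSV file (there, label 0 is malignant and label 1 is benign); the variable names and the `stratify` and `max_iter` settings are illustrative choices, not taken from the original.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in dataset stands in for breast_cancer_data.csv.
bunch = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    bunch.data, bunch.target, test_size=0.2, random_state=42, stratify=bunch.target
)

# Standardize, then fit logistic regression (the same idea works for GaussianNB).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

# Rows are true classes, columns are predicted classes (order: 0, 1).
cm = confusion_matrix(y_te, pred)
print(cm)

# Precision, recall, and F1 for each class, labeled with the dataset's class names.
print(classification_report(y_te, pred, target_names=bunch.target_names))
```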
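If your feature and label columns have different names, remember to adjust the lines that select `X` and `y` accordingly. Likewise, if your actual dataset is not a structured CSV file, you need to convert it into a DataFrame first.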
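If no CSV file is at hand, one possible way to get a DataFrame with the same layout (feature columns plus a `target` column) is scikit-learn's built-in dataset. This is a sketch under that assumption; whether its columns match your own `breast_cancer_data.csv` is not something the example can guarantee.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns pandas objects; .frame holds the features plus 'target'.
bunch = load_breast_cancer(as_frame=True)
data = bunch.frame

# Same selection as in the walkthrough: everything except 'target' is a feature.
X = data.drop('target', axis=1)
y = data['target']

print(data.shape)        # (rows, feature columns + 1 label column)
print(y.value_counts())  # number of samples per class
```

From here, `X` and `y` can be passed to `train_test_split` exactly as in the CSV-based version.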