Implement SVM binary classification with the scikit-learn machine learning library, choosing the dataset yourself. Choose from scikit-learn's breast_cancer, ...
Time: 2024-12-11 08:24:15
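In Python, binary classification can be done with the support vector machine (SVM) implementation in the scikit-learn library. This example uses the well-known Iris dataset, which ships with scikit-learn and is convenient for demonstration. Note that Iris has three classes, so it has to be reduced to two of them to form a binary problem; the breast_cancer dataset (`load_breast_cancer`) is binary as it stands. The basic steps for binary classification with an SVM are:
1. Import the required libraries: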
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
```
2. Load the dataset and look at the first few rows:
```python
iris = datasets.load_iris()
print(iris.data[:5])
```
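Before training, it can help to check how many samples, features, and classes the dataset has. The following is a minimal sketch; `class_counts` is a name introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Shape is (n_samples, n_features)
print(iris.data.shape)
# Human-readable feature and class names
print(iris.feature_names)
print(iris.target_names)

# Number of samples per integer class label
class_counts = np.bincount(iris.target)
print(class_counts)
```

The class count shows that Iris contains three classes, so a binary task needs a subset of two.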
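3. Preprocess the data: split it into training and test sets, then standardize the features:
```python
X = iris.data
y = iris.target
# Iris has three classes; keep classes 0 and 1 so the task is binary.
# The labels are already integers, so no LabelEncoder is needed.
binary_mask = y < 2
X, y = X[binary_mask], y[binary_mask]
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```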
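The scaling step and the classifier can also be bundled into one estimator, so the scaler is always fitted on training data only. The sketch below uses scikit-learn's `make_pipeline` and assumes the two-class subset of Iris (classes 0 and 1); `model` and `score` are illustrative names.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
mask = iris.target < 2                      # assumption: keep classes 0 and 1 only
X, y = iris.data[mask], iris.target[mask]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The pipeline standardizes the features, then fits a linear SVM
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)                 # scaler statistics come from X_train only

score = model.score(X_test, y_test)         # mean accuracy on the held-out set
print(f"Test accuracy: {score:.3f}")
```

With a pipeline, `predict` and `score` apply the same scaling automatically, which removes the need to track separate `*_scaled` arrays.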
4. Create and train the SVM model:
```python
svm = SVC(kernel='linear')  # other kernels include 'poly', 'rbf', and 'sigmoid'
svm.fit(X_train_scaled, y_train)
```
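One way to choose among the kernels is to compare them with cross-validation. This is a rough sketch rather than a tuning recipe: it assumes the two-class Iris subset, leaves every other `SVC` hyperparameter at its default, and uses `kernel_scores` as an illustrative name.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
mask = iris.target < 2                      # assumption: keep classes 0 and 1 only
X, y = iris.data[mask], iris.target[mask]

kernel_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Scaling sits inside the pipeline so each fold is scaled on its own training part
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    kernel_scores[kernel] = scores.mean()
    print(f"{kernel}: {scores.mean():.3f}")
```

On harder datasets the kernels usually differ more, and `C` and `gamma` are worth searching as well, for example with `GridSearchCV`.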
5. Predict and evaluate model performance:
```python
y_pred = svm.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```
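The same five steps carry over to the breast_cancer dataset named in the question, which is binary without any filtering. The sketch below is one possible version, not the only way to set it up: the linear kernel, the 70/30 stratified split, and `random_state=42` are choices made here for illustration. It reports a confusion matrix and per-class metrics alongside accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X, y = data.data, data.target               # labels: 0 = malignant, 1 = benign

# Stratify so both classes keep their proportions in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Standardize the features, then fit a linear SVM
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)       # rows: true class, columns: predicted class
print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

For a medical dataset like this one, accuracy alone can hide what matters: the confusion matrix shows how many malignant cases were predicted as benign.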