Machine Learning Final Exam Short-Answer Questions and Answers
### Machine Learning Final Exam Short-Answer Questions and Reference Answers
#### Question 1: Explain the concept of supervised learning and list common supervised learning algorithms.
In supervised learning, the training set consists of input feature vectors together with their corresponding labels. The goal is to learn a mapping function from this labeled data so that the class or numeric output of new, unseen samples can be predicted[^2]. Common types of supervised learning include, but are not limited to:
- **Regression analysis**: models the relationship between continuous variables (a minimal regression sketch follows the classification example below);
- **Classifier construction**: e.g. decision trees, support vector machines (SVM), and neural networks;
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Load the iris dataset as an example
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print(f"Accuracy on the test set: {clf.score(X_test, y_test)}")
```
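The decision tree above illustrates the classification case. For the regression case mentioned in the first bullet, a minimal sketch using scikit-learn's `LinearRegression` could look as follows; the synthetic data-generating line y = 3x + 2 and the noise level are assumptions made purely for illustration.
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data: y = 3x + 2 plus Gaussian noise (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + 2 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)
print(f"R^2 on the test set: {reg.score(X_test, y_test):.3f}")
print(f"Learned coefficient: {reg.coef_[0]:.2f}, intercept: {reg.intercept_:.2f}")
```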
---
#### Question 2: Describe the characteristics of unsupervised learning and give two concrete approaches.
Unlike supervised learning, unsupervised learning provides no explicit target values to guide the model. These methods aim to discover pattern structure hidden in high-dimensional data, or to reduce its dimensionality so that the underlying data distribution is easier to understand. Concretely:
- K-means clustering partitions the data into clusters by iteratively updating centroid positions (see the sketch after the GMM example below);
- Probabilistic clustering instead assumes, on statistical grounds, that the observations are random samples drawn from a mixture density;
```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import multivariate_normal

def gmm_clustering(data, n_components=3, n_iters=10):
    """Gaussian Mixture Model clustering via a simple EM loop."""
    n_samples, n_features = data.shape
    # Initialisation: K-means for the means, the overall sample covariance
    # for every component, and uniform mixture weights
    means = KMeans(n_clusters=n_components, n_init=10).fit(data).cluster_centers_
    covariances = np.array([np.cov(data.T) + 1e-6 * np.eye(n_features)
                            for _ in range(n_components)])
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iters):  # number of EM iterations
        # E-step: responsibilities r[i, k] proportional to weight_k * N(x_i | mean_k, cov_k)
        log_prob = np.column_stack([
            np.log(weights[k]) + multivariate_normal(means[k], covariances[k]).logpdf(data)
            for k in range(n_components)])
        log_prob -= log_prob.max(axis=1, keepdims=True)      # numerical stability
        responsibilities = np.exp(log_prob)
        responsibilities /= responsibilities.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and covariances from the responsibilities
        Nk = responsibilities.sum(axis=0)                     # effective cluster sizes
        weights = Nk / n_samples
        means = (responsibilities.T @ data) / Nk[:, None]
        for k in range(n_components):
            diff = data - means[k]
            covariances[k] = (responsibilities[:, k, None] * diff).T @ diff / Nk[k]
            covariances[k] += 1e-6 * np.eye(n_features)       # keep covariances well-conditioned

    return {'means': means.tolist(),
            'covariances': covariances.tolist(),
            'weights': weights.tolist()}
```
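For the K-means bullet above, a minimal sketch using scikit-learn's `KMeans` is shown below; reusing the iris features and choosing three clusters are assumptions made only for illustration.
```python
from sklearn import datasets
from sklearn.cluster import KMeans

# Cluster the iris features without using the labels (unsupervised)
X = datasets.load_iris().data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("Centroids:\n", kmeans.cluster_centers_)            # final centroid positions
print("First 10 cluster assignments:", kmeans.labels_[:10])
```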
---
#### Question 3: Explain the application scenarios of semi-supervised learning and its advantages.
When only a limited number of labeled samples is available but a large pool of unlabeled data exists, a semi-supervised framework can exploit this extra resource to improve generalization. Compared with a purely supervised approach that relies entirely on manual annotation, it not only reduces labeling cost but can also yield better results, because the unlabeled data carries additional information about the overall data distribution, which helps resolve uncertainty near decision boundaries.
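As a concrete illustration, one common semi-supervised technique is self-training (pseudo-labeling). The sketch below uses scikit-learn's `SelfTrainingClassifier` on the iris data with most labels hidden; the dataset choice and the roughly 20% labeled fraction are assumptions made purely for illustration.
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pretend only ~20% of the training labels are known; the rest are marked -1 (unlabeled)
rng = np.random.default_rng(0)
y_partial = y_train.copy()
y_partial[rng.random(len(y_partial)) > 0.2] = -1

# Self-training: the base classifier iteratively pseudo-labels confident unlabeled points
self_training = SelfTrainingClassifier(SVC(probability=True)).fit(X_train, y_partial)
print(f"Accuracy on the test set: {self_training.score(X_test, y_test):.3f}")
```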
---