How to Build Decision Trees for the Weather Problem and the Iris Dataset in MATLAB
To build decision trees for the weather problem and the Iris dataset in MATLAB, you can follow the steps below:
### Experiment Preparation
- **Hardware**: a laptop or desktop computer.
- **Software environment**: Windows 7 or Windows 10, MATLAB R2012a or later (note that `histcounts`, used in the code below, requires R2014b or later).
### Experiment Steps
#### 1. Setting Up the Environment
Make sure MATLAB is installed and runs correctly.
#### 2. Loading the Data
- **Weather problem data**: load the `weatherNumeric.mat` file.
- **Iris dataset**: load the `fisheriris.mat` file, which ships with the Statistics and Machine Learning Toolbox.
```matlab
% Load the weather problem data
load('weatherNumeric.mat');
% Load the Iris data (provides meas and species)
load('fisheriris.mat');
```
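As a quick sanity check of what each file provides (the variable layout of `weatherNumeric.mat` depends on how your course materials prepared it, so it is worth confirming before step 3):
```matlab
% List the variables stored in each .mat file
whos('-file', 'fisheriris.mat')      % should show meas (150x4 double) and species (150x1 cell)
whos('-file', 'weatherNumeric.mat')  % layout depends on how the file was prepared
```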
#### 3. Attribute Selection
Select the attributes that will be used to build the decision tree.
```matlab
% Weather problem data (this assumes weatherNumeric is a table containing
% these four attribute columns plus a Label column; adjust to your file)
attributes_weather = {'Outlook', 'Temperature', 'Humidity', 'Wind'};
data_weather = weatherNumeric{:, attributes_weather};   % extract as a numeric matrix
labels_weather = weatherNumeric.Label;
% Iris data
attributes_iris = {'Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'};
data_iris = meas;       % meas is the 150x4 measurement matrix from fisheriris.mat
labels_iris = species;  % species is the cell array of Iris class names
```
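One caveat before building the trees: the ID3/C4.5 implementations in step 4 treat every feature value as a discrete category and expect numeric labels, while the Iris measurements are continuous and `species` holds strings. Below is a minimal preprocessing sketch, assuming three equal-width bins per feature, using `grp2idx` from the Statistics and Machine Learning Toolbox and the built-in `discretize` (the weather data is assumed to be numerically coded already):
```matlab
% Encode the string class labels as integers (1 = setosa, 2 = versicolor, 3 = virginica)
[labels_iris_num, irisClassNames] = grp2idx(labels_iris);

% Bin each continuous measurement into 3 equal-width categories
data_iris_disc = zeros(size(data_iris));
for j = 1:size(data_iris, 2)
    edges = linspace(min(data_iris(:, j)), max(data_iris(:, j)), 4);  % 4 edges -> 3 bins
    data_iris_disc(:, j) = discretize(data_iris(:, j), edges);
end
```
`data_iris_disc` and `labels_iris_num` are then used in place of the raw Iris variables when the trees are built in step 5.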
#### 4. Building the Decision Tree
Build the tree with the ID3 or C4.5 algorithm.
##### ID3 Algorithm
```matlab
function tree = buildID3Tree(data, labels, featureIdx)
% ID3 decision tree (assumes discrete numeric feature values and numeric labels)
% data:       n-by-m matrix of training samples
% labels:     n-by-1 vector of class labels
% featureIdx: original column indices of the remaining features (internal use)
% Returns a nested struct. Save as buildID3Tree.m; the helpers below can stay
% as local functions in the same file or be saved as separate .m files.
    if nargin < 3
        featureIdx = 1:size(data, 2);
    end
    % Stopping condition 1: all samples carry the same label -> leaf node
    if numel(unique(labels)) == 1
        tree = struct('label', labels(1));
        return;
    end
    % Stopping condition 2: no features left -> leaf with the majority label
    if isempty(featureIdx)
        tree = struct('label', mode(labels));
        return;
    end
    % Pick the feature with the largest information gain
    best = chooseBestFeatureToSplit(data, labels);
    % Note: the cell array must be wrapped in {} so struct() does not expand it
    tree = struct('feature', featureIdx(best), 'values', [], 'children', {{}});
    % One branch per distinct value of the chosen feature
    uniqueValues = unique(data(:, best));
    remaining = setdiff(1:size(data, 2), best);
    for k = 1:numel(uniqueValues)
        mask = data(:, best) == uniqueValues(k);
        % Recurse on the subset, dropping the feature just used
        subtree = buildID3Tree(data(mask, remaining), labels(mask), featureIdx(remaining));
        tree.values(k) = uniqueValues(k);
        tree.children{k} = subtree;
    end
end

function index = chooseBestFeatureToSplit(data, labels)
% Index (into data's columns) of the feature with the maximum information gain
    numFeatures = size(data, 2);
    maxInfoGain = -inf;
    index = 1;
    baseEntropy = calcEntropy(labels);
    for i = 1:numFeatures
        infoGain = baseEntropy - calcConditionalEntropy(data(:, i), labels);
        if infoGain > maxInfoGain
            maxInfoGain = infoGain;
            index = i;
        end
    end
end

function entropy = calcEntropy(labels)
% Shannon entropy of the label distribution, in bits
    labelCounts = histcounts(categorical(labels));
    probabilities = labelCounts / sum(labelCounts);
    entropy = -sum(probabilities .* log2(probabilities + eps));
end

function condEntropy = calcConditionalEntropy(feature, labels)
% Weighted average entropy of the labels after splitting on one feature
    uniqueValues = unique(feature);
    totalSamples = length(labels);
    condEntropy = 0;
    for k = 1:numel(uniqueValues)
        subLabels = labels(feature == uniqueValues(k));
        weight = length(subLabels) / totalSamples;
        condEntropy = condEntropy + weight * calcEntropy(subLabels);
    end
end
```
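As a quick check of `calcEntropy` (assuming it is saved as its own `calcEntropy.m` so it can be called from the command line): the classic 14-sample weather/play-tennis data contains 9 positive and 5 negative examples, so the base entropy should come out to about 0.940 bits:
```matlab
% -(9/14)*log2(9/14) - (5/14)*log2(5/14) is approximately 0.9403
exampleLabels = [ones(9, 1); zeros(5, 1)];
calcEntropy(exampleLabels)   % ans ≈ 0.9403
```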
##### C4.5 Algorithm
C4.5 follows the same recursive procedure as ID3 but selects the splitting feature by the gain ratio rather than the raw information gain: GainRatio(A) = Gain(A) / SplitInfo(A), where SplitInfo(A) = -Σ_v (|D_v|/|D|) · log2(|D_v|/|D|) is the entropy of feature A's own value distribution; this penalizes features that split the data into many small branches.
```matlab
function tree = buildC45Tree(data, labels, featureIdx)
% C4.5 decision tree: identical structure to ID3, but splits on the gain ratio.
% Requires calcEntropy and calcConditionalEntropy from the ID3 section to be
% on the MATLAB path (e.g. saved as their own .m files).
    if nargin < 3
        featureIdx = 1:size(data, 2);
    end
    % Stopping conditions: pure node, or no features left
    if numel(unique(labels)) == 1
        tree = struct('label', labels(1));
        return;
    end
    if isempty(featureIdx)
        tree = struct('label', mode(labels));
        return;
    end
    % Pick the feature with the largest gain ratio
    best = chooseBestFeatureToSplitByGainRatio(data, labels);
    tree = struct('feature', featureIdx(best), 'values', [], 'children', {{}});
    uniqueValues = unique(data(:, best));
    remaining = setdiff(1:size(data, 2), best);
    for k = 1:numel(uniqueValues)
        mask = data(:, best) == uniqueValues(k);
        subtree = buildC45Tree(data(mask, remaining), labels(mask), featureIdx(remaining));
        tree.values(k) = uniqueValues(k);
        tree.children{k} = subtree;
    end
end

function index = chooseBestFeatureToSplitByGainRatio(data, labels)
% Index of the feature with the maximum gain ratio
    numFeatures = size(data, 2);
    maxGainRatio = -inf;
    index = 1;
    baseEntropy = calcEntropy(labels);
    for i = 1:numFeatures
        gain = baseEntropy - calcConditionalEntropy(data(:, i), labels);
        splitInfo = calcSplitInformation(data(:, i));
        gainRatio = gain / (splitInfo + eps);   % eps guards against division by zero
        if gainRatio > maxGainRatio
            maxGainRatio = gainRatio;
            index = i;
        end
    end
end

function splitInfo = calcSplitInformation(feature)
% Split information: entropy of the feature's own value distribution
    uniqueValues = unique(feature);
    totalSamples = length(feature);
    splitInfo = 0;
    for k = 1:numel(uniqueValues)
        prob = sum(feature == uniqueValues(k)) / totalSamples;
        splitInfo = splitInfo - prob * log2(prob + eps);
    end
end
```
#### 5. Plotting the Decision Tree
Plot the resulting trees with MATLAB's `treeplot` function or another visualization tool.
```matlab
% Build the decision trees; the Iris trees use the binned features and
% encoded labels from step 3 (data_iris_disc, labels_iris_num)
id3_tree_weather = buildID3Tree(data_weather, labels_weather);
c45_tree_weather = buildC45Tree(data_weather, labels_weather);
id3_tree_iris = buildID3Tree(data_iris_disc, labels_iris_num);
c45_tree_iris = buildC45Tree(data_iris_disc, labels_iris_num);
% Plot the four trees in one figure
figure;
subplot(2, 2, 1);
plotDecisionTree(id3_tree_weather, attributes_weather);
title('ID3 Tree - Weather Data');
subplot(2, 2, 2);
plotDecisionTree(c45_tree_weather, attributes_weather);
title('C4.5 Tree - Weather Data');
subplot(2, 2, 3);
plotDecisionTree(id3_tree_iris, attributes_iris);
title('ID3 Tree - Iris Data');
subplot(2, 2, 4);
plotDecisionTree(c45_tree_iris, attributes_iris);
title('C4.5 Tree - Iris Data');
```
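`plotDecisionTree`, as used above, is not a built-in MATLAB function and has to be supplied. Below is a minimal sketch (an assumption of this answer, not part of the original materials) that flattens the tree struct produced by `buildID3Tree`/`buildC45Tree` into the parent-pointer vector that `treeplot` expects, labelling internal nodes with feature names and leaves with class codes:
```matlab
function plotDecisionTree(tree, attributeNames)
% Save as plotDecisionTree.m; flattenTree below is a local helper in the same file.
% Flatten the nested tree struct into a parent vector and hand it to treeplot.
    [parents, labels] = flattenTree(tree, 0, {}, [], attributeNames);
    treeplot(parents);
    % Overlay node labels at the positions computed by treelayout
    [x, y] = treelayout(parents);
    text(x, y, labels, 'HorizontalAlignment', 'center', ...
         'VerticalAlignment', 'bottom', 'Interpreter', 'none');
end

function [parents, labels] = flattenTree(node, parentIdx, labels, parents, attributeNames)
% Depth-first walk that assigns each node an index and records its parent
    parents(end + 1) = parentIdx;            %#ok<AGROW>
    thisIdx = numel(parents);
    if isfield(node, 'label')                % leaf node: show the class code
        labels{thisIdx} = num2str(node.label);
    else                                     % internal node: show the split feature name
        labels{thisIdx} = attributeNames{node.feature};
        for k = 1:numel(node.children)
            [parents, labels] = flattenTree(node.children{k}, thisIdx, ...
                                            labels, parents, attributeNames);
        end
    end
end
```
For the Iris trees, the numeric leaf codes can be mapped back to species names via `irisClassNames` from step 3.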
### Result Analysis
- Compare the trees produced by the different algorithms and datasets and evaluate how well each model performs.
- Metrics such as accuracy, recall, and the F1 score can be used to quantify performance, as in the sketch below.
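For example, once predicted labels are available for a set of samples (obtaining them requires a routine that walks the tree for each row, which is not shown here, so `predictedLabels` below is an assumed variable), `confusionmat` from the Statistics and Machine Learning Toolbox provides everything needed:
```matlab
% trueLabels and predictedLabels are assumed numeric class-code vectors of equal length
C = confusionmat(trueLabels, predictedLabels);         % rows: true class, columns: predicted
accuracy  = sum(diag(C)) / sum(C(:));                  % overall accuracy
precision = diag(C) ./ max(sum(C, 1)', 1);             % per-class precision
recall    = diag(C) ./ max(sum(C, 2), 1);              % per-class recall
f1        = 2 * precision .* recall ./ max(precision + recall, eps);  % per-class F1
```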
### Notes
- Follow the laboratory rules during the experiment, take care of the lab equipment, and keep the workspace clean and tidy.
- Return all instruments and equipment to their original places after use; do not damage them or leave them lying around.
Hopefully these steps help you build and visualize decision trees in MATLAB successfully!