Decision tree classification code
Posted: 2023-07-05 21:20:47
Here is an example of decision tree classification implemented in Python:
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Define the decision tree classifier
clf = DecisionTreeClassifier()
# Train the model
clf.fit(X_train, y_train)
# Predict on the test set
y_pred = clf.predict(X_test)
# Print the model's accuracy
print("Accuracy:", clf.score(X_test, y_test))
```
In this code, we use the DecisionTreeClassifier class from the sklearn library to perform decision tree classification. First, we load the iris dataset and split it into training and test sets. Then we define the classifier and train it on the training set. Finally, we predict on the test set and print the model's accuracy.
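Accuracy alone can hide per-class behavior. As a possible extension (a sketch that is not part of the original answer, using scikit-learn's metrics module), you can also print a confusion matrix and a per-class precision/recall report:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Same setup as above; random_state fixes the split and the tree for reproducibility
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
# Precision, recall, and F1 broken down per class
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The report makes it easy to see, for instance, whether the model confuses versicolor and virginica while classifying setosa perfectly.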
Related questions
C4.5 decision tree classification code
C4.5 is a classic decision tree algorithm for classification problems. Below is a simple example of a C4.5-style decision tree classifier. (Note that, for simplicity, it splits on raw information gain, which is ID3's criterion; true C4.5 uses the gain ratio.)
```python
import numpy as np

class Node:
    def __init__(self, feature=None, threshold=None, label=None):
        self.feature = feature
        self.threshold = threshold
        self.label = label
        self.left = None
        self.right = None

class C45DecisionTree:
    def __init__(self):
        self.root = None

    def entropy(self, labels):
        # Empty partitions contribute zero entropy (avoids division by zero)
        if len(labels) == 0:
            return 0.0
        _, counts = np.unique(labels, return_counts=True)
        probabilities = counts / len(labels)
        return -np.sum(probabilities * np.log2(probabilities))

    def information_gain(self, data, labels, feature, threshold):
        left_mask = data[:, feature] <= threshold
        right_mask = ~left_mask
        left_labels = labels[left_mask]
        right_labels = labels[right_mask]
        parent_entropy = self.entropy(labels)
        return (parent_entropy
                - (len(left_labels) / len(labels)) * self.entropy(left_labels)
                - (len(right_labels) / len(labels)) * self.entropy(right_labels))

    def choose_best_split(self, data, labels):
        best_gain = 0.0
        best_feature = None
        best_threshold = None
        num_features = data.shape[1]
        for feature in range(num_features):
            for threshold in np.unique(data[:, feature]):
                gain = self.information_gain(data, labels, feature, threshold)
                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature
                    best_threshold = threshold
        return best_feature, best_threshold

    def build_tree(self, data, labels):
        # Pure node: stop and return a leaf
        if len(np.unique(labels)) == 1:
            return Node(label=labels[0])
        best_feature, best_threshold = self.choose_best_split(data, labels)
        # No split improves entropy: return a majority-vote leaf
        if best_feature is None:
            values, counts = np.unique(labels, return_counts=True)
            return Node(label=values[np.argmax(counts)])
        left_mask = data[:, best_feature] <= best_threshold
        right_mask = ~left_mask
        node = Node(feature=best_feature, threshold=best_threshold)
        node.left = self.build_tree(data[left_mask], labels[left_mask])
        node.right = self.build_tree(data[right_mask], labels[right_mask])
        return node

    def fit(self, data, labels):
        self.root = self.build_tree(data, labels)

    def predict(self, data):
        predictions = []
        for sample in data:
            node = self.root
            # Walk down until reaching a leaf (a node with a label)
            while node.label is None:
                node = node.left if sample[node.feature] <= node.threshold else node.right
            predictions.append(node.label)
        return predictions

# Example usage
data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
labels = np.array([0, 0, 1, 1])
tree = C45DecisionTree()
tree.fit(data, labels)
test_data = np.array([[1.5, 2.5], [3.5, 4.5]])
predictions = tree.predict(test_data)
print(predictions)
```
This code implements the construction and prediction steps of the decision tree. You can adapt it to your own dataset as needed.
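What actually distinguishes C4.5 from ID3 is that it normalizes information gain by the split's intrinsic information, giving the gain ratio. A minimal sketch of that criterion (my own addition, reusing the same NumPy mask style as the code above):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array; empty arrays contribute zero."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / len(labels)
    return -np.sum(p * np.log2(p))

def gain_ratio(data, labels, feature, threshold):
    """C4.5's criterion: information gain divided by split information."""
    left = labels[data[:, feature] <= threshold]
    right = labels[data[:, feature] > threshold]
    n = len(labels)
    gain = (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
    # Split information penalizes uneven or trivial partitions
    probs = np.array([len(left) / n, len(right) / n])
    probs = probs[probs > 0]
    split_info = -np.sum(probs * np.log2(probs))
    return gain / split_info if split_info > 0 else 0.0

data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
labels = np.array([0, 0, 1, 1])
print(gain_ratio(data, labels, 0, 2))  # perfect 50/50 split -> 1.0
```

To turn the code above into proper C4.5, you would replace the `information_gain` call in `choose_best_split` with this `gain_ratio`.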
Decision tree classification code with tree diagram
When classifying with a decision tree, you can use Python's Scikit-learn library. Here is an example that also draws the tree:
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt

# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
# Create the decision tree classifier
clf = DecisionTreeClassifier()
# Fit the model
clf.fit(X, y)
# Visualize the decision tree
tree.plot_tree(clf)
plt.show()
```
The code above first loads the classic iris dataset (load_iris), then creates a decision tree classifier (DecisionTreeClassifier) and fits it to the data. Finally, the tree.plot_tree function renders the fitted tree.
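If you want a plain-text rendering of the tree instead of (or alongside) the plot, handy for logs or terminals, scikit-learn also provides export_text. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# A shallow tree keeps the printed rules readable
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# Text rendering of the fitted tree; feature_names labels each split rule
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each line shows one split (e.g. a threshold on petal width) and each leaf shows the predicted class.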
Note that running this code requires the Scikit-learn library (and matplotlib for the plot). You can install them from a terminal or command prompt with:
```
pip install -U scikit-learn matplotlib
```
Hope this helps! Feel free to ask if you have any other questions.