There is an Excel file named Health_Data.csv with two columns named cholesterol and BMICat. Use a decision tree in Jupyter Notebook to build a predictive model that ascertains whether a person with a cholesterol level of 3 is most likely overweight.
Time: 2023-07-02 21:08:53 Views: 176
Sure, here are the steps to build a decision tree model in Jupyter Notebook that predicts whether a person with a cholesterol level of 3 is most likely overweight, using the Health_Data.csv file:
1. Import necessary libraries:
```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import graphviz
```
2. Load the dataset:
```python
data = pd.read_csv('Health_Data.csv')
```
3. Prepare the data for modeling:
```python
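# Select the columns of interest; drop missing rows and copy to avoid SettingWithCopyWarning
df = data[['cholesterol', 'BMICat']].dropna().copy()
# Convert the categorical column to integer codes; bmi_labels[i] is the category name for code i
df['BMICat'], bmi_labels = pd.factorize(df['BMICat'], sort=True)
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(df[['cholesterol']], df['BMICat'], test_size=0.2, random_state=42)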
```
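Because `pd.factorize` replaces category names with integer codes, it helps to keep the mapping from each code back to its name. The following is a minimal sketch on a small made-up Series; the category names are hypothetical and may not match the values actually stored in `BMICat`.

```python
import pandas as pd

# Hypothetical BMI categories, for illustration only
bmi = pd.Series(['Overweight', 'Normal', 'Obese', 'Normal', 'Overweight'])

# sort=True assigns codes in sorted label order rather than order of appearance
codes, labels = pd.factorize(bmi, sort=True)

# Map each integer code back to its category name
code_to_label = dict(enumerate(labels))

print(codes)
print(code_to_label)
```

Without `sort=True`, the codes follow the order in which categories first appear in the file, so the same category can receive a different code on a differently ordered dataset.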
4. Train the decision tree model:
```python
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
```
5. Make predictions on the test set:
```python
y_pred = model.predict(X_test)
```
6. Evaluate the model's accuracy:
```python
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```
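An accuracy figure is easier to interpret next to a baseline that always predicts the most frequent class. The sketch below compares the two on made-up data (the `toy` DataFrame and its values are assumptions for illustration, not taken from Health_Data.csv), using scikit-learn's `DummyClassifier`.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up data for illustration only; replace with the real columns
toy = pd.DataFrame({
    'cholesterol': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3] * 5,
    'BMICat':      [0, 0, 0, 0, 1, 0, 1, 1, 1, 0] * 5,
})

X_train, X_test, y_train, y_test = train_test_split(
    toy[['cholesterol']], toy['BMICat'], test_size=0.2, random_state=42)

# Decision tree versus a baseline that always predicts the majority class
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)

tree_acc = accuracy_score(y_test, tree.predict(X_test))
base_acc = accuracy_score(y_test, baseline.predict(X_test))
print('Tree accuracy:', tree_acc)
print('Baseline accuracy:', base_acc)
```

If the tree's accuracy is not clearly above the baseline, cholesterol alone probably carries little information about the BMI category.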
7. Visualize the decision tree:
```python
dot_data = export_graphviz(
    model, out_file=None,
    feature_names=['cholesterol'],
    # Use the classes the model actually saw; hard-coded names can mismatch the factorized codes
    class_names=[str(c) for c in model.classes_],
    filled=True, rounded=True,
    special_characters=True)
graph = graphviz.Source(dot_data)
graph
```
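This displays the decision tree with the split rules the model learned. To answer the original question, follow the branch that a cholesterol value of 3 falls into and read the majority class in that leaf, or call `model.predict` on a one-row DataFrame whose `cholesterol` column is 3.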
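The question itself can also be answered directly by querying the model for a cholesterol level of 3. The sketch below is self-contained and uses made-up data (the `toy` DataFrame, its category names, and the name `clf` are assumptions for illustration), so its result says nothing about the real Health_Data.csv. It uses `predict`, `predict_proba`, and `export_text`, which prints the rules as plain text without requiring Graphviz.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data for illustration only
toy = pd.DataFrame({
    'cholesterol': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    'BMICat': ['Normal', 'Normal', 'Normal', 'Normal', 'Overweight',
               'Normal', 'Overweight', 'Overweight', 'Overweight', 'Normal'],
})

clf = DecisionTreeClassifier(random_state=42)
clf.fit(toy[['cholesterol']], toy['BMICat'])

# Query: a single row with cholesterol level 3
query = pd.DataFrame({'cholesterol': [3]})
predicted = clf.predict(query)[0]

# Share of each class among training samples in the leaf the query lands in
proba = dict(zip(clf.classes_, clf.predict_proba(query)[0]))

print('Predicted category:', predicted)
print('Class proportions in the matching leaf:', proba)

# Plain-text view of the learned rules
print(export_text(clf, feature_names=['cholesterol']))
```

With the real data, the same calls on the fitted `model` return the encoded class for cholesterol 3. The proportions from `predict_proba` indicate how dominant that class is among training rows with similar cholesterol values.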