cholesterol: 1 3 3 1 1 2 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 3 2 1 1 1 1 3 3 1 1 3 1 1 1 1 1 1 1 1 1 3 1 2 1 2 1

BMICat: Normal, Obese, Normal, Over Weight, Normal, Over Weight, Obese, Obese, Over Weight, Over Weight, Over Weight, Normal, Normal, Obese, Over Weight, Obese, Over Weight, Normal, Over Weight, Normal, Obese, Over Weight, Obese, Normal, Normal, Over Weight, Normal, Obese, Normal, Over Weight, Obese, Normal, Normal, Over Weight, Over Weight, Normal, Obese, Over Weight, Over Weight, Over Weight, Over Weight, Normal, Normal, Normal, Normal, Normal, Obese, Normal, Normal, Normal

Using a decision tree in a Jupyter notebook, build a predictive model to determine whether a person with a cholesterol level of 3 is mostly overweight.
Date: 2023-11-22 17:55:58
To build a predictive model with a decision tree in a Jupyter notebook, follow these steps:
1. Load the data into a pandas dataframe.
2. Preprocess the data by encoding the categorical target, `BMICat`, as integers.
3. Split the dataset into training and testing data.
4. Train the decision tree classifier on the training data.
5. Evaluate the performance of the model on the testing data.
6. Use the trained model to predict the most likely BMI category for a person with a cholesterol level of 3.
Here's the code to implement these steps:
```python
# Step 1: Load the data into a pandas dataframe
import pandas as pd
data = {'Cholesterol': [1, 3, 3, 1, 1, 2, 3, 3, 1, 1,
                        1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                        1, 1, 2, 1, 3, 2, 1, 1, 1, 1,
                        3, 3, 1, 1, 3, 1, 1, 1, 1, 1,
                        1, 1, 1, 1, 3, 1, 2, 1, 2, 1],
        # One BMI category per person, in the same order as the cholesterol values (50 each)
        'BMICat': ['Normal', 'Obese', 'Normal', 'Over Weight', 'Normal',
                   'Over Weight', 'Obese', 'Obese', 'Over Weight', 'Over Weight',
                   'Over Weight', 'Normal', 'Normal', 'Obese', 'Over Weight',
                   'Obese', 'Over Weight', 'Normal', 'Over Weight', 'Normal',
                   'Obese', 'Over Weight', 'Obese', 'Normal', 'Normal',
                   'Over Weight', 'Normal', 'Obese', 'Normal', 'Over Weight',
                   'Obese', 'Normal', 'Normal', 'Over Weight', 'Over Weight',
                   'Normal', 'Obese', 'Over Weight', 'Over Weight', 'Over Weight',
                   'Over Weight', 'Normal', 'Normal', 'Normal', 'Normal',
                   'Normal', 'Obese', 'Normal', 'Normal', 'Normal']}
df = pd.DataFrame(data)
# Step 2: Preprocess the data by converting categorical variables to numerical variables
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['BMICat'] = le.fit_transform(df['BMICat'])
# Step 3: Split the dataset into training and testing data
from sklearn.model_selection import train_test_split
X = df.drop('BMICat', axis=1)
y = df['BMICat']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Train the decision tree classifier on the training data
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier(random_state=42)  # fixed seed so repeated runs build the same tree
dtc.fit(X_train, y_train)
# Step 5: Evaluate the performance of the model on the testing data
from sklearn.metrics import accuracy_score
y_pred = dtc.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
# Step 6: Use the trained model to predict if a person with cholesterol level 3 is mostly overweight
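cholesterol_level = 3
# Pass the query as a DataFrame with the training column name; a bare list triggers a feature-name warning
new_person = pd.DataFrame({'Cholesterol': [cholesterol_level]})
predicted_category = le.inverse_transform(dtc.predict(new_person))[0]
print(f"A person with cholesterol level {cholesterol_level} is most likely {predicted_category}.")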
```
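The exact output depends on the random train/test split: with `test_size=0.2`, the test set holds only 10 of the 50 rows, so the printed accuracy can shift noticeably from one split to another. The data themselves indicate what the prediction should look like. Of the 9 people with a cholesterol level of 3, 4 are Obese, 4 are Normal and only 1 is Over Weight, so the tree is unlikely to answer "Over Weight" for this group; depending on which of those rows end up in the training set, it will usually predict Obese or Normal. If "overweight" is read broadly as Over Weight or Obese, 5 of the 9 (about 56%) fall in that range.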
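Because the model has a single feature with only three values, you can check what it ought to predict by counting BMI categories at each cholesterol level directly. The sketch below re-creates the question's 50 records under illustrative variable names (`cholesterol`, `bmi_cat`) and uses `pandas.crosstab` for the counts and `value_counts(normalize=True)` for the shares at cholesterol level 3.

```python
import pandas as pd

# Illustrative variable names; the values are the 50 records from the question
cholesterol = [1, 3, 3, 1, 1, 2, 3, 3, 1, 1,
               1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
               1, 1, 2, 1, 3, 2, 1, 1, 1, 1,
               3, 3, 1, 1, 3, 1, 1, 1, 1, 1,
               1, 1, 1, 1, 3, 1, 2, 1, 2, 1]
bmi_cat = ['Normal', 'Obese', 'Normal', 'Over Weight', 'Normal',
           'Over Weight', 'Obese', 'Obese', 'Over Weight', 'Over Weight',
           'Over Weight', 'Normal', 'Normal', 'Obese', 'Over Weight',
           'Obese', 'Over Weight', 'Normal', 'Over Weight', 'Normal',
           'Obese', 'Over Weight', 'Obese', 'Normal', 'Normal',
           'Over Weight', 'Normal', 'Obese', 'Normal', 'Over Weight',
           'Obese', 'Normal', 'Normal', 'Over Weight', 'Over Weight',
           'Normal', 'Obese', 'Over Weight', 'Over Weight', 'Over Weight',
           'Over Weight', 'Normal', 'Normal', 'Normal', 'Normal',
           'Normal', 'Obese', 'Normal', 'Normal', 'Normal']
df = pd.DataFrame({'Cholesterol': cholesterol, 'BMICat': bmi_cat})

# Rows: cholesterol level; columns: BMI category; cells: number of people
counts = pd.crosstab(df['Cholesterol'], df['BMICat'])
print(counts)

# Fraction of each BMI category among people with cholesterol level 3
share_at_3 = df.loc[df['Cholesterol'] == 3, 'BMICat'].value_counts(normalize=True)
print(share_at_3)
```

For the question's data, the cholesterol-3 row of the table reads 4 Normal, 4 Obese and 1 Over Weight, which is the distribution a tree leaf for that level would be built from when all rows are used for training.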
Read the printed accuracy with caution: because it comes from only 10 test rows, it is a noisy estimate. More fundamentally, the model has a single feature that takes only three values (1, 2 and 3), so the tree can do little more than learn the most common BMI category at each cholesterol level, and in this data cholesterol alone separates the three categories poorly. To build a more reliable model, include more features in the dataset and estimate performance with cross-validation rather than a single train/test split.
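A single 80/20 split scores the model on just 10 rows. A steadier estimate, sketched here under the assumption that scikit-learn's `StratifiedKFold` and `cross_val_score` are acceptable substitutes for that split, is 5-fold stratified cross-validation, compared against a `DummyClassifier` that ignores cholesterol and always predicts the most frequent category. The fold count and seed are arbitrary choices, not part of the original answer.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# The 50 records from the question
cholesterol = [1, 3, 3, 1, 1, 2, 3, 3, 1, 1,
               1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
               1, 1, 2, 1, 3, 2, 1, 1, 1, 1,
               3, 3, 1, 1, 3, 1, 1, 1, 1, 1,
               1, 1, 1, 1, 3, 1, 2, 1, 2, 1]
bmi_cat = ['Normal', 'Obese', 'Normal', 'Over Weight', 'Normal',
           'Over Weight', 'Obese', 'Obese', 'Over Weight', 'Over Weight',
           'Over Weight', 'Normal', 'Normal', 'Obese', 'Over Weight',
           'Obese', 'Over Weight', 'Normal', 'Over Weight', 'Normal',
           'Obese', 'Over Weight', 'Obese', 'Normal', 'Normal',
           'Over Weight', 'Normal', 'Obese', 'Normal', 'Over Weight',
           'Obese', 'Normal', 'Normal', 'Over Weight', 'Over Weight',
           'Normal', 'Obese', 'Over Weight', 'Over Weight', 'Over Weight',
           'Over Weight', 'Normal', 'Normal', 'Normal', 'Normal',
           'Normal', 'Obese', 'Normal', 'Normal', 'Normal']

X = pd.DataFrame({'Cholesterol': cholesterol})
y = pd.Series(bmi_cat, name='BMICat')  # scikit-learn classifiers accept string labels directly

# 5 stratified folds: every row is tested exactly once, and each fold
# keeps roughly the overall class proportions; the fixed seed makes both
# calls below use the same folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

tree_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y,
                              cv=cv, scoring='accuracy')
# Baseline that always predicts the most frequent category in the training folds
baseline_scores = cross_val_score(DummyClassifier(strategy='most_frequent'), X, y,
                                  cv=cv, scoring='accuracy')

print(f'Decision tree:          {tree_scores.mean():.2f} +/- {tree_scores.std():.2f}')
print(f'Most-frequent baseline: {baseline_scores.mean():.2f} +/- {baseline_scores.std():.2f}')
```

If the tree's mean accuracy is not clearly above the baseline's, cholesterol on its own carries little information about BMI category in this data, which supports adding more features before trusting the prediction for cholesterol level 3.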