# Changing 'character' columns to factor/numeric
sales_dataset$Platform = as.factor(sales_dataset$Platform)
sales_dataset$Year_of_Release = as.numeric(sales_dataset$Year_of_Release)
sales_dataset$Genre = as.factor(sales_dataset$Genre)
sales_dataset$Publisher = as.factor(sales_dataset$Publisher)
sales_dataset$Developer = as.factor(sales_dataset$Developer)
sales_dataset$Rating = as.factor(sales_dataset$Rating)
Time: 2023-12-05 18:03:12 Views: 94
sys:1: DtypeWarning: Columns (7) have mixed types.Specify dtype option on import or set low_memory=False.
This warning means that pandas found more than one data type (for example, numbers and strings) in the column at index 7 while reading the file. Mixed types can cause problems in later analysis and modeling, so either specify the data type for that column with the "dtype" option when importing the dataset, or set the "low_memory" parameter to False so that pandas reads the whole file before inferring column types.
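Both remedies can be sketched in a few lines. This is a minimal illustration, not the original poster's code: the CSV text and the column names "id" and "code" are made up, and a file this small would not trigger the warning on its own. Only the read_csv parameters "dtype" and "low_memory" come from the warning message.

```python
import io

import pandas as pd

# Hypothetical CSV content: the "code" column mixes numbers and text.
csv_text = "id,code\n1,100\n2,A7\n3,205\n"

# Option 1: declare the column type up front so pandas does not have to guess.
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})

# Option 2: let pandas process the file in one pass before inferring types.
df_full = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Every value in the explicitly typed column is a string.
print(df_typed["code"].tolist())
print(df_typed.dtypes)
```

Declaring "dtype" is usually the more predictable choice, because the column type no longer depends on what the data happens to contain.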
There is a CSV file named Health_Data.csv with two columns, cholesterol and BMICat. Use a decision tree in Jupyter Notebook to build a predictive model that determines whether a person with a cholesterol level of 3 is most likely overweight.
Sure, here are the steps to build a decision tree model in Jupyter Notebook to predict if a person with a cholesterol level of 3 is mostly overweight using the Health_Data.csv file:
1. Import necessary libraries:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import graphviz
2. Load the dataset:
data = pd.read_csv('Health_Data.csv')
3. Prepare the data for modeling:
# Select the columns of interest
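# Copy the selection so the assignment below does not trigger SettingWithCopyWarning
df = data[['cholesterol', 'BMICat']].copy()
# Convert the categorical column to integer codes (assigned in order of first appearance)
df['BMICat'] = pd.factorize(df['BMICat'])[0]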
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['cholesterol']], df['BMICat'], test_size=0.2, random_state=42)
4. Train the decision tree model:
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
5. Make predictions on the test set:
y_pred = model.predict(X_test)
6. Evaluate the model's accuracy:
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
7. Visualize the decision tree:
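dot_data = export_graphviz(model, out_file=None,
                           feature_names=['cholesterol'],
                           # Assumes BMICat has exactly these two categories, in the order pd.factorize encoded them
                           class_names=['Normal Weight', 'Overweight'],
                           filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph  # as the last expression in a Jupyter cell, this renders the tree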
This will display a decision tree showing the rules the model learned for predicting BMICat from cholesterol. To answer the original question, pass a cholesterol value of 3 to model.predict and map the returned code back to its BMICat label.
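Querying the model for a cholesterol level of 3 can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: the "toy" data frame is made up and stands in for Health_Data.csv, and the category names are taken from the class_names used above. It keeps the labels returned by pd.factorize so that the predicted integer code can be mapped back to a readable category, and it uses export_text as a plain-text alternative to the Graphviz rendering.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data standing in for Health_Data.csv.
toy = pd.DataFrame({
    "cholesterol": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "BMICat": [
        "Normal Weight", "Normal Weight", "Overweight",
        "Normal Weight", "Normal Weight", "Overweight",
        "Overweight", "Overweight", "Overweight", "Normal Weight",
    ],
})

# factorize returns integer codes plus the labels in order of first appearance.
codes, labels = pd.factorize(toy["BMICat"])

model = DecisionTreeClassifier(random_state=42)
model.fit(toy[["cholesterol"]], codes)

# Ask the model about a person with a cholesterol level of 3.
query = pd.DataFrame({"cholesterol": [3]})
predicted_label = labels[model.predict(query)[0]]

# predict_proba gives the class proportions in the leaf the query falls into.
probabilities = dict(zip(labels, model.predict_proba(query)[0]))

print("Predicted category:", predicted_label)
print("Class proportions:", probabilities)

# Plain-text view of the learned rules; no Graphviz installation needed.
print(export_text(model, feature_names=["cholesterol"]))
```

In this toy data, three of the four rows with a cholesterol level of 3 are labeled Overweight, so the model predicts Overweight with a proportion of 0.75. With the real file, the proportions depend entirely on the data, and a single feature may not separate the categories well.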