Improve the code to bin the following features: 'dti', 'installment', 'employmentTitle', 'interestRate', 'annualIncome', 'revolBal', 'revolUtil', 'Creditage', 'loanAmnt', 'avg_income', 'avg_loanAmnt', 'mean_interestRate', 'rest_money', 'ficoRange_mean', 'rest_Revol'.

The following code bins these features:

```python
import numpy as np
import pandas as pd


def binning(col, cut_points, labels=None):
    """Bin a numeric Series, using cut_points as the complete list of bin edges."""
    # The edges already run from -1 (assumes non-negative values) to np.inf,
    # so they are passed to pd.cut unchanged. Adding the column min and max
    # around them would make the edges non-monotonic and raise a ValueError.
    if labels is None:
        # n edges define n - 1 bins, labelled 0 ... n - 2
        labels = list(range(len(cut_points) - 1))
    return pd.cut(col, bins=cut_points, labels=labels, include_lowest=True)


# Bin edges for each raw feature
CUT_POINTS = {
    'dti': [-1, 10, 20, 30, 40, 50, np.inf],
    'installment': [-1, 200, 400, 600, 800, 1000, 1200, 1400, np.inf],
    'interestRate': [-1, 5, 10, 15, 20, 25, 30, np.inf],
    'annualIncome': [-1, 20000, 40000, 60000, 80000, 100000, 120000,
                     140000, 160000, 180000, 200000, np.inf],
    'revolBal': [-1, 5000, 10000, 15000, 20000, 25000, 30000, 35000,
                 40000, 45000, 50000, np.inf],
    'revolUtil': [-1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, np.inf],
    'Creditage': [-1, 10, 20, 30, 40, 50, 60, 70, 80, np.inf],
    'loanAmnt': [-1, 5000, 10000, 15000, 20000, 25000, 30000, 35000,
                 40000, 45000, 50000, np.inf],
}


def apply_binning(data):
    # New DataFrame that holds only the binned features
    binned_data = pd.DataFrame(index=data.index)

    # Bin each raw feature with its own edges
    for feature, cut_points in CUT_POINTS.items():
        binned_data[feature] = binning(data[feature], cut_points)

    # Coarser bins of the raw income, loan amount and interest rate columns
    binned_data['avg_income'] = pd.cut(
        data['annualIncome'], bins=[-1, 40000, 80000, 120000, 160000, np.inf],
        labels=[1, 2, 3, 4, 5], include_lowest=True)
    binned_data['avg_loanAmnt'] = pd.cut(
        data['loanAmnt'], bins=[-1, 10000, 20000, 30000, 40000, 50000, np.inf],
        labels=[1, 2, 3, 4, 5, 6], include_lowest=True)
    # Rates above 30 fall outside the last edge and become NaN
    binned_data['mean_interestRate'] = pd.cut(
        data['interestRate'], bins=[-1, 10, 15, 20, 25, 30],
        labels=[1, 2, 3, 4, 5], include_lowest=True)

    # Remaining money: income minus loan amount. The difference can be
    # negative, so the lowest edge is -inf rather than -1.
    binned_data['rest_money'] = pd.cut(
        data['annualIncome'] - data['loanAmnt'],
        bins=[-np.inf, 0, 10000, 20000, 30000, 40000, 50000, np.inf],
        labels=[1, 2, 3, 4, 5, 6, 7], include_lowest=True)

    # Mean of the FICO range bounds (left unbinned: no cut points are given)
    binned_data['ficoRange_mean'] = (data['ficoRangeLow'] + data['ficoRangeHigh']) / 2

    # Remaining revolving balance: revolving balance minus loan amount,
    # again with -inf as the lowest edge to cover negative differences.
    binned_data['rest_Revol'] = pd.cut(
        data['revolBal'] - data['loanAmnt'],
        bins=[-np.inf, 0, 5000, 10000, 15000, 20000, 25000, 30000, 35000,
              40000, 45000, 50000, np.inf],
        labels=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], include_lowest=True)

    return binned_data
```
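The cut points in the code are chosen by hand, and `ficoRange_mean` is not actually binned there. Where no domain thresholds are known, equal-frequency (quantile) binning is a common alternative. The following is a minimal sketch using `pd.qcut`; the helper name `quantile_bin`, the choice of three bins, the toy values, and the definition of `ficoRange_mean` as the midpoint of `ficoRangeLow` and `ficoRangeHigh` are all assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def quantile_bin(col: pd.Series, n_bins: int = 5) -> pd.Series:
    """Bin a numeric column into at most n_bins equal-frequency bins.

    Returns integer bin codes starting at 0; missing values stay missing.
    (Hypothetical helper, not from the original answer.)
    """
    # duplicates="drop" merges bins whose quantile edges coincide, which
    # can happen for heavily skewed or discrete columns.
    return pd.qcut(col, q=n_bins, labels=False, duplicates="drop")


# Toy data, invented only to show the call.
demo = pd.DataFrame({
    "ficoRangeLow": [660, 700, 740, 680, 720, 800],
    "ficoRangeHigh": [664, 704, 744, 684, 724, 804],
})

# Assumption: ficoRange_mean is the midpoint of the two FICO bounds.
demo["ficoRange_mean"] = (demo["ficoRangeLow"] + demo["ficoRangeHigh"]) / 2
demo["ficoRange_mean_bin"] = quantile_bin(demo["ficoRange_mean"], n_bins=3)
```

Quantile bins adapt to each column's distribution, so every bin holds roughly the same number of rows. The trade-off is that the edges depend on the data: they should be learned on the training set and reused on the test set rather than recomputed.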
