Given the results of binning, provide Python code to compute their IV value
Time: 2024-05-01 07:24:10
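IV (Information Value) is a metric for evaluating the predictive power of a single predictor variable, and is commonly used for feature selection. Below is a simple Python function that computes the IV value: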
```python
import pandas as pd
import numpy as np


def calc_iv(df, feature, target):
    """
    Compute the IV value
    :param df: DataFrame
    :param feature: feature whose IV is to be computed
    :param target: target variable (binary: 1 = bad, 0 = good)
    :return: IV value
    """
    # Total and bad sample counts for each feature value (bin), in one pass
    data = df.groupby(feature, observed=True)[target].agg(['count', 'sum'])
    data.columns = ['All', 'Bad']
    data['Good'] = data['All'] - data['Bad']
    data['Share'] = data['All'] / data['All'].sum()
    data['Bad Rate'] = data['Bad'] / data['All']
    # Distributions use the totals over ALL bins, before any filtering
    data['Distribution Good'] = data['Good'] / data['Good'].sum()
    data['Distribution Bad'] = data['Bad'] / data['Bad'].sum()
    # WoE is infinite when a bin has no good or no bad samples; skip such bins
    data = data[(data['Good'] > 0) & (data['Bad'] > 0)].copy()
    data['WoE'] = np.log(data['Distribution Good'] / data['Distribution Bad'])
    # Per-bin IV contribution; the feature's IV is their sum
    data['IV'] = data['WoE'] * (data['Distribution Good'] - data['Distribution Bad'])
    return data['IV'].sum()
```
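Here `df` is the DataFrame, `feature` is the feature whose IV is computed, and `target` is the target variable, coded as 1 for a bad sample and 0 for a good one. The function first counts the total and bad samples for each feature value, then computes each value's share, bad rate, good distribution, bad distribution, WoE, and IV contribution, and finally returns the IV.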
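For reference, the per-bin quantities above follow the usual WoE/IV definitions. Writing $G_i$ and $B_i$ for the good and bad counts in bin $i$, and $G$ and $B$ for their totals over all bins (so that "Distribution Good" is $G_i/G$ and "Distribution Bad" is $B_i/B$):

$$
\mathrm{WoE}_i = \ln\frac{G_i / G}{B_i / B}, \qquad
\mathrm{IV} = \sum_i \left(\frac{G_i}{G} - \frac{B_i}{B}\right)\mathrm{WoE}_i
$$

Some references define WoE with the ratio inverted ($B_i/B$ over $G_i/G$); both factors in each term then flip sign, so the IV is unchanged.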
To compute the IV of a binned feature with this function:
```python
iv = calc_iv(df, 'feature', 'target')
print(iv)
```
Here `df` is the DataFrame after binning, `feature` is the (binned) feature whose IV is computed, and `target` is the target variable.
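The following is a minimal end-to-end sketch of the whole flow, from binning to IV, on a small made-up dataset. The column names `age`, `age_bin`, and `target`, the bin edges, and the helper name `iv_from_bins` are all hypothetical, chosen only for illustration. Binning uses `pd.cut`, and the IV is computed with the same formula as above.

```python
import numpy as np
import pandas as pd


def iv_from_bins(df, bin_col, target):
    """Hypothetical helper: IV of an already-binned column (target: 1 = bad, 0 = good)."""
    # Per-bin totals and bad counts; observed=True ignores empty categorical bins
    stats = df.groupby(bin_col, observed=True)[target].agg(['count', 'sum'])
    bad = stats['sum']
    good = stats['count'] - stats['sum']
    dist_good = good / good.sum()
    dist_bad = bad / bad.sum()
    # Skip bins where either distribution is zero (WoE would be infinite)
    mask = (dist_good > 0) & (dist_bad > 0)
    woe = np.log(dist_good[mask] / dist_bad[mask])
    return float(((dist_good[mask] - dist_bad[mask]) * woe).sum())


# Made-up toy data: 12 samples, bad counts of 1, 2, 3 in the three age bins
df = pd.DataFrame({
    'age':    [20, 22, 25, 28, 35, 40, 42, 45, 55, 60, 65, 70],
    'target': [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0],
})
df['age_bin'] = pd.cut(df['age'], bins=[0, 30, 50, 100])

iv = iv_from_bins(df, 'age_bin', 'target')
print(round(iv, 4))  # 0.7324
```

With 6 good and 6 bad samples in total, the first and last bins each contribute $\tfrac{1}{3}\ln 3$ and the middle bin contributes 0, so the IV is $\tfrac{2}{3}\ln 3 \approx 0.7324$.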