metrics.roc_auc_score
metrics.roc_auc_score computes the area under the receiver operating characteristic curve (ROC AUC) and is commonly used to evaluate binary classification models. ROC AUC lies between 0 and 1, and larger values indicate better performance: a score of 0.5 means the model ranks no better than random guessing, while a score of 1 means it ranks every positive sample above every negative one.
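For intuition, the following sketch (assuming a fixed random seed of 0, purely for reproducibility) contrasts uninformative random scores, which land near 0.5, with perfectly separating scores, which reach exactly 1.0:
```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)  # random binary ground-truth labels

random_scores = rng.random(1000)        # scores unrelated to the labels
perfect_scores = y_true.astype(float)   # scores that separate the classes exactly

print(roc_auc_score(y_true, random_scores))   # close to 0.5: random guessing
print(roc_auc_score(y_true, perfect_scores))  # 1.0: perfect ranking
```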
Related questions
sklearn.metrics.roc_auc_score
`sklearn.metrics.roc_auc_score` is the Scikit-learn function for computing the area under the ROC curve (AUC) of a binary classification model.
To use it, you pass the true labels `y_true` and the predicted scores `y_pred`. `y_true` is a one-dimensional array of length `n_samples` containing each sample's true label; `y_pred` is a one-dimensional array of the same length containing the model's predicted score for each sample, such as a probability estimate or decision-function value, rather than a hard class label.
Here is a simple example:
```python
from sklearn.metrics import roc_auc_score
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]
roc_auc = roc_auc_score(y_true, y_pred)
print("ROC AUC score:", roc_auc)
```
In this example, `y_true` holds the true labels of four samples (0, 0, 1, 1) and `y_pred` holds the model's predicted probabilities for them (0.1, 0.4, 0.35, 0.8). The return value `roc_auc` is the area under the ROC curve, in this case 0.75.
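To see where the 0.75 comes from: ROC AUC equals the fraction of (positive, negative) sample pairs that the model ranks correctly. A minimal manual check, using the same data as above:
```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0.1, 0.4, 0.35, 0.8])

pos = y_pred[y_true == 1]  # scores of the positive samples: 0.35, 0.8
neg = y_pred[y_true == 0]  # scores of the negative samples: 0.1, 0.4

# Count pairs where the positive outscores the negative; ties count half.
diff = pos[:, None] - neg[None, :]
auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
print(auc)  # 0.75 -- 3 of the 4 (positive, negative) pairs are ordered correctly
```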
Source code of the sklearn.metrics.roc_auc_score module
The following is a lightly simplified version of the roc_auc_score source, showing the public function and its dispatch logic from sklearn/metrics/_ranking.py; the private helpers it calls (_binary_roc_auc_score, _multiclass_roc_auc_score, _average_binary_score) are defined elsewhere in scikit-learn and are not reproduced here:
```python
# Imports as used by sklearn/metrics/_ranking.py (shown here in absolute
# form); _binary_roc_auc_score and _multiclass_roc_auc_score live in the
# same module and are omitted for brevity.
from functools import partial

import numpy as np

from sklearn.metrics._base import _average_binary_score
from sklearn.preprocessing import label_binarize
from sklearn.utils import check_array
from sklearn.utils.multiclass import type_of_target


def roc_auc_score(y_true, y_score, average='macro', sample_weight=None,
                  max_fpr=None, multi_class='raise', labels=None):
"""Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)
from prediction scores.
Note: this implementation can be used with binary, multiclass and
multilabel classification, but some restrictions apply (see Parameters).
Read more in the :ref:`User Guide <roc_metrics>`.
Parameters
----------
y_true : array-like of shape (n_samples,) or (n_samples, n_classes)
True labels or binary label indicators. The binary and multiclass cases
expect labels with shape (n_samples,) while the multilabel case expects
binary label indicators with shape (n_samples, n_classes).
y_score : array-like of shape (n_samples,) or (n_samples, n_classes)
Target scores. In the binary and multilabel cases, these can be either
probability estimates or non-thresholded decision values (as returned
by `decision_function` on some classifiers). In the multiclass case,
these must be probability estimates which sum to 1. The binary case
expects a shape (n_samples,), and the scores must be the scores of
the class with the greater label. The multiclass and multilabel
cases expect a shape (n_samples, n_classes).
average : {'micro', 'macro', 'samples', 'weighted'} or None, \
default='macro'
If ``None``, the scores for each class are returned. Otherwise,
this determines the type of averaging performed on the data:
``'micro'``:
Calculate metrics globally by counting the total true positives,
false negatives and false positives.
``'macro'``:
Calculate metrics for each label, and find their unweighted
mean. This does not take label imbalance into account.
``'weighted'``:
Calculate metrics for each label, and find their average, weighted
by support (the number of true instances for each label). This
alters 'macro' to account for label imbalance; it can result in an
F-score that is not between precision and recall.
``'samples'``:
Calculate metrics for each instance, and find their average
(only meaningful for multilabel classification).
sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
max_fpr : float or None, default=None
If not ``None``, the standardized partial AUC [2]_ over the range
[0, max_fpr] is returned. For the multiclass case, ``max_fpr``
should be either ``None`` or ``1.0`` as partial AUC makes
sense for binary classification only.
multi_class : {'raise', 'ovr', 'ovo'}, default='raise'
Multiclass only. Determines the type of configuration to use.
The default value raises an error, so either ``'ovr'`` or
``'ovo'`` must be passed explicitly.
        ``'ovr'``:
            Stands for One-vs-rest. Computes the AUC of each class
            against the rest. This treats the multiclass case in the
            same way as the multilabel case. Sensitive to class
            imbalance even when ``average == 'macro'``, because class
            imbalance affects the composition of each of the 'rest'
            groupings.
        ``'ovo'``:
            Stands for One-vs-one. Computes the average AUC of all
            possible pairwise combinations of classes. Insensitive to
            class imbalance when ``average == 'macro'``.
    labels : array-like of shape (n_classes,), default=None
        Multiclass only. List of labels that index the classes in
        ``y_score``. If ``None``, the numerical or lexicographical
        order of the labels in ``y_true`` is used.
Returns
-------
    auc : float
        Area Under the Curve score.
See also
--------
roc_curve : Compute Receiver operating characteristic (ROC) curve.
    average_precision_score : Area under the precision-recall curve.
Examples
--------
>>> import numpy as np
>>> from sklearn.metrics import roc_auc_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> roc_auc_score(y_true, y_scores)
0.75
    >>> y_true = np.array([0, 1, 2, 2])
    >>> y_scores = np.array([[0.7, 0.2, 0.1],
    ...                      [0.2, 0.6, 0.2],
    ...                      [0.1, 0.2, 0.7],
    ...                      [0.2, 0.3, 0.5]])
    >>> roc_auc_score(y_true, y_scores, multi_class='ovr')
    1.0
"""
    # Determine the problem type and validate the inputs (simplified
    # here; the real implementation performs additional validation).
    y_type = type_of_target(y_true)
    y_true = check_array(y_true, ensure_2d=False, dtype=None)
    y_score = check_array(y_score, ensure_2d=False)

    if y_type == "multiclass" or (y_type == "binary" and
                                  y_score.ndim == 2 and
                                  y_score.shape[1] > 2):
        # Multiclass case: partial AUC is not supported, and the caller
        # must choose a one-vs-rest or one-vs-one strategy explicitly.
        if max_fpr is not None and max_fpr != 1.0:
            raise ValueError("Partial AUC computation is not available for "
                             "the multiclass setting; max_fpr must be None.")
        if multi_class == 'raise':
            raise ValueError("multi_class must be in ('ovo', 'ovr')")
        return _multiclass_roc_auc_score(y_true, y_score, labels,
                                         multi_class, average, sample_weight)
    elif y_type == "binary":
        labels = np.unique(y_true)
        if len(labels) == 1:
            # Only one class present in y_true: ROC AUC is not defined.
            raise ValueError("ROC AUC score is not defined in that case: "
                             "y_true contains only one label "
                             "({0}).".format(labels[0]))
        y_true = label_binarize(y_true, classes=labels)[:, 0]
        return _average_binary_score(partial(_binary_roc_auc_score,
                                             max_fpr=max_fpr),
                                     y_true, y_score, average,
                                     sample_weight=sample_weight)
    else:  # multilabel-indicator
        return _average_binary_score(partial(_binary_roc_auc_score,
                                             max_fpr=max_fpr),
                                     y_true, y_score, average,
                                     sample_weight=sample_weight)
```
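As a usage sketch of the multiclass branch against the real scikit-learn API (the probability estimates below are made up; each row must sum to 1, and a strategy must be chosen explicitly because the default multi_class='raise' errors out):
```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 2, 0, 1, 2])
# Made-up per-class probability estimates; each row sums to 1, as the
# multiclass case requires.
y_score = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.2, 0.7],
                    [0.5, 0.3, 0.2],
                    [0.3, 0.5, 0.2],
                    [0.2, 0.2, 0.6]])

print(roc_auc_score(y_true, y_score, multi_class='ovr'))  # 1.0: every class ranked perfectly
print(roc_auc_score(y_true, y_score, multi_class='ovo'))  # 1.0 for the same reason
```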
In summary, the sketch dispatches on the problem type and supports binary, multiclass, and multilabel classification. The multiclass case offers two strategies that must be selected explicitly: 'ovo' (one-vs-one) and 'ovr' (one-vs-rest).
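For multilabel input, the `average` parameter documented above controls how the per-label AUCs are combined; a small sketch with made-up scores:
```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Multilabel case: y_true is a binary indicator matrix and y_score has one
# (made-up) score column per label.
y_true = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0],
                   [0, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.75],
                    [0.3, 0.7, 0.80],
                    [0.6, 0.8, 0.30],
                    [0.2, 0.1, 0.70]])

print(roc_auc_score(y_true, y_score, average=None))     # per-label AUCs: [1.0, 1.0, 0.75]
print(roc_auc_score(y_true, y_score, average='macro'))  # unweighted mean, ~0.9167
```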