Source code of the sklearn.metrics.roc_auc_score module
Date: 2024-06-09 22:06:26
Below is a simplified version of the source code of `sklearn.metrics.roc_auc_score`:
```python
def roc_auc_score(y_true, y_score, average='macro', sample_weight=None,
                  max_fpr=None, multi_class='raise', labels=None):
"""Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)
from prediction scores.
Note: this implementation can be used with binary, multiclass and
multilabel classification, but some restrictions apply (see Parameters).
Read more in the :ref:`User Guide <roc_metrics>`.
Parameters
----------
y_true : array-like of shape (n_samples,) or (n_samples, n_classes)
True labels or binary label indicators. The binary and multiclass cases
expect labels with shape (n_samples,) while the multilabel case expects
binary label indicators with shape (n_samples, n_classes).
y_score : array-like of shape (n_samples,) or (n_samples, n_classes)
Target scores. In the binary and multilabel cases, these can be either
probability estimates or non-thresholded decision values (as returned
by `decision_function` on some classifiers). In the multiclass case,
these must be probability estimates which sum to 1. The binary case
expects a shape (n_samples,), and the scores must be the scores of
the class with the greater label. The multiclass and multilabel
cases expect a shape (n_samples, n_classes).
average : {'micro', 'macro', 'samples', 'weighted'} or None, \
default='macro'
If ``None``, the scores for each class are returned. Otherwise,
this determines the type of averaging performed on the data:
``'micro'``:
Calculate metrics globally by counting the total true positives,
false negatives and false positives.
``'macro'``:
Calculate metrics for each label, and find their unweighted
mean. This does not take label imbalance into account.
``'weighted'``:
Calculate metrics for each label, and find their average, weighted
by support (the number of true instances for each label). This
alters 'macro' to account for label imbalance; it can result in an
F-score that is not between precision and recall.
``'samples'``:
Calculate metrics for each instance, and find their average
(only meaningful for multilabel classification).
sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
max_fpr : float or None, default=None
If not ``None``, the standardized partial AUC [2]_ over the range
[0, max_fpr] is returned. For the multiclass case, ``max_fpr``
should be either ``None`` or ``1.0`` as partial AUC makes
sense for binary classification only.
multi_class : {'raise', 'ovr', 'ovo'}, default='raise'
Multiclass only. Determines the type of configuration to use.
The default value raises an error, so either ``'ovr'`` or
``'ovo'`` must be passed explicitly.
        ``'ovr'``:
            Stands for One-vs-rest. Computes the ROC AUC of each class
            against the rest: for each class ``i``, the binary problem
            ``y_true == i`` vs. the rest is solved, and the per-class
            AUCs are averaged. This is a commonly used strategy for
            multiclass and multilabel classification problems.
        ``'ovo'``:
            Stands for One-vs-one. Computes the ROC AUC of every
            pairwise combination of classes: for each pair ``(i, j)``,
            the binary problem restricted to samples with
            ``y_true == i`` or ``y_true == j`` is solved, and the
            pairwise AUCs are averaged. With ``average == 'macro'``
            this is insensitive to class imbalance.
labels : array-like of shape (n_classes,), default=None
Multiclass only. List of labels to index ``y_score`` used for
multiclass. If ``None``, the lexical order of ``y_true`` is used to
index ``y_score``.
    Returns
    -------
    auc : float or ndarray of shape (n_classes,)
        Area under the ROC curve. A single float is returned unless
        ``average`` is ``None`` in the multilabel case, in which case
        the per-class scores are returned as an array.
    See Also
    --------
    roc_curve : Compute Receiver operating characteristic (ROC) curve.
    auc : Compute Area Under the Curve (AUC) using the trapezoidal rule.
Examples
--------
>>> import numpy as np
>>> from sklearn.metrics import roc_auc_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> roc_auc_score(y_true, y_scores)
0.75
    >>> y_true = np.array([0, 1, 2])
    >>> y_scores = np.array([[0.9, 0.05, 0.05],
    ...                      [0.05, 0.9, 0.05],
    ...                      [0.05, 0.05, 0.9]])
    >>> roc_auc_score(y_true, y_scores, multi_class='ovo')
    1.0
"""
    # validation of the input
    if y_true.shape[0] != y_score.shape[0]:
        raise ValueError("y_true and y_score have a different number "
                         "of samples.")
    if not is_multilabel(y_true):
        if len(np.unique(y_true)) == 1:
            # Only one class present in y_true. ROC AUC score is not
            # defined in that case.
            raise ValueError(
                "ROC AUC score is not defined in that case: "
                "y_true contains only one label ({0}).".format(y_true[0]))
    if is_multilabel(y_true):
        # multilabel: one binary ROC AUC per label, then averaged
        return _multilabel_roc_auc_score_ovr(y_true, y_score,
                                             average, sample_weight)
    elif len(np.unique(y_true)) == 2:
        # binary: partial AUC via max_fpr is only supported here
        if max_fpr is not None and (max_fpr <= 0 or max_fpr > 1):
            raise ValueError("Expected max_fpr in range (0, 1], got: %f"
                             % max_fpr)
        return _binary_roc_auc_score(y_true, y_score, average,
                                     sample_weight, max_fpr=max_fpr)
    elif multi_class == 'raise':
        # multiclass input requires an explicit strategy
        raise ValueError("multi_class must be in ('ovo', 'ovr')")
    elif multi_class == 'ovo':
        return _multiclass_roc_auc_score_ovo(y_true, y_score,
                                             average, sample_weight)
    elif multi_class == 'ovr':
        return _multiclass_roc_auc_score_ovr(y_true, y_score,
                                             average, sample_weight,
                                             labels=labels)
    else:
        raise ValueError(
            "Invalid multi_class parameter: {0}".format(multi_class))
```
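The binary case above boils down to a rank statistic: the AUC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (with ties counting one half). A minimal plain-Python sketch of this equivalence (the helper name `binary_roc_auc` is illustrative, not part of sklearn):

```python
def binary_roc_auc(y_true, y_score):
    """AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counting 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Matches the docstring example above: 3 of 4 pairs ranked correctly.
print(binary_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This is how the ROC AUC can be computed without ever building the ROC curve explicitly.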
This code computes the ROC AUC and supports binary, multiclass and multilabel classification. For multiclass input, two strategies are available: 'ovo' (one-vs-one) and 'ovr' (one-vs-rest).
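The 'ovr' strategy can be sketched in plain Python: binarize each class against the rest, compute a binary AUC per class, and take the unweighted (macro) mean. The function names below are illustrative and not taken from sklearn:

```python
def _binary_auc(y_true, y_score):
    # Pairwise rank statistic; ties count as 1/2.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ovr_macro_auc(y_true, y_prob):
    """One-vs-rest macro-averaged AUC. y_prob holds one row of
    per-class probabilities per sample."""
    classes = sorted(set(y_true))
    aucs = []
    for idx, c in enumerate(classes):
        binarized = [1 if t == c else 0 for t in y_true]  # class c vs. rest
        scores = [row[idx] for row in y_prob]             # column for class c
        aucs.append(_binary_auc(binarized, scores))
    return sum(aucs) / len(aucs)  # unweighted ('macro') mean
```

For a classifier that always assigns the highest probability to the true class, every per-class AUC is 1.0 and so is the macro average; a 'weighted' average would instead weight each per-class AUC by that class's support.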