auc.pr.curve.factor
R Documentation
Area under the Precision Recall Curve
Description
A generic S3 function to compute the area under the precision recall curve for a classification model. This function dispatches to S3 methods in auc.pr.curve() and performs no input validation. If you supply NA values or vectors of unequal length (e.g. length(x) != length(y)), the underlying C++ code may trigger undefined behavior and crash your R session.
Defensive measures
Because auc.pr.curve() operates on raw pointers, pointer-level faults (e.g. from NA values or mismatched lengths) occur before any R-level error handling. Wrapping calls in try() or tryCatch() will not prevent R session crashes.
To guard against this, wrap auc.pr.curve() in a "safe" validator that checks for NA values and matching lengths, for example:
safe_auc.pr.curve <- function(x, y, ...) {
  stopifnot(
    !anyNA(x), !anyNA(y),
    length(x) == length(y)
  )
  auc.pr.curve(x, y, ...)
}
Apply the same pattern to any custom metric functions to ensure input sanity before calling the underlying C++ code.
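For the factor method documented on this page, the inputs are a factor and a probability matrix rather than two vectors, so the length check becomes a shape check. The following is a minimal sketch in base R; the helper name check_pr_inputs() is hypothetical and not part of the package, and the checks shown are assumptions derived from the argument descriptions on this page.

```r
## Hypothetical helper (not part of SLmetrics): validates the inputs
## described for the factor method before any compiled code is reached.
check_pr_inputs <- function(actual, response) {
  stopifnot(
    is.factor(actual),
    is.matrix(response), is.numeric(response),
    !anyNA(actual), !anyNA(response),
    nrow(response) == length(actual),
    ncol(response) == nlevels(actual)
  )
  invisible(TRUE)
}

## Toy inputs: 4 observations, 2 classes
actual   <- factor(c("a", "b", "a", "b"))
response <- cbind(c(0.9, 0.2, 0.7, 0.4), c(0.1, 0.8, 0.3, 0.6))

## Valid inputs pass silently
check_pr_inputs(actual, response)

## A matrix that is one row short fails with an ordinary R error,
## which tryCatch() can handle because no C++ code has run yet
ok <- tryCatch(
  check_pr_inputs(actual, response[-1, ]),
  error = function(e) FALSE
)
```

Calling such a helper first turns a potential session crash into a regular R condition that can be caught and reported.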
Visualizing area under the precision recall curve
Use pr.curve() to construct the data.frame, then use plot() to visualize the area under the curve.
Efficient multi-metric evaluation
To avoid sorting the same probability matrix multiple times (once per class or curve), you can precompute a single set of sort indices and pass it via the indices argument. This reduces the overall cost from O(K·N log N) to O(N log N + K·N).
## presort response probabilities
indices <- preorder(response, decreasing = TRUE)

## evaluate area under the precision recall curve
auc.pr.curve(actual, response, indices = indices)
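What "a single set of sort indices" means can be illustrated in base R. This is a sketch of the idea only: it assumes, without confirmation from the package source, that the precomputed indices are comparable to a column-wise ordering of the probability matrix. All object names are illustrative.

```r
## Toy 5 x 2 probability matrix (each row sums to 1)
p1 <- c(0.10, 0.80, 0.35, 0.60, 0.95)
response <- cbind(p1, 1 - p1)

## Column-wise ordering, largest probability first: an n x k integer
## matrix holding one permutation of 1..n per column
idx <- apply(response, 2, order, decreasing = TRUE)

## Reusing the stored permutation yields a column in sorted order
## without sorting it again
sorted_first <- response[idx[, 1], 1]
```

Once such a matrix exists, every metric that needs the probabilities in descending order can index into it instead of repeating the sort.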
Usage
## S3 method for class 'factor'
auc.pr.curve(
  actual,
  response,
  estimator = 0L,
  method = 0L,
  indices = NULL,
  ...
)
Arguments
actual
A vector of length n with k levels. Can be of type <integer> or <factor>.
response
An n × k <double>-matrix of predicted probabilities. The i-th row should sum to 1 (i.e., a valid probability distribution over the k classes). The first column corresponds to the first factor level in actual, the second column to the second factor level, and so on.
estimator
An <integer>-value of length 1 (default: 0).
0 - a named <double>-vector of length k (class-wise)
1 - a <double> value (Micro averaged metric)
2 - a <double> value (Macro averaged metric)
method
A <double> value (default: 0). Defines the underlying method of calculating the area under the curve. If 0, it is calculated using the trapezoid method; if 1, it is calculated using the step method.
indices
An optional n × k matrix of <integer> values of sorted response probability indices.
...
Arguments passed into other methods.
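The difference between the two values of the method argument can be sketched on a toy curve in base R. The helper curve_area() is hypothetical, and the step rule shown (height taken at the right end of each interval) is an assumption; the package may define its step method differently.

```r
## Hypothetical helper: area under a curve given as points (x, y),
## with x sorted in increasing order
curve_area <- function(x, y, method = 0L) {
  dx    <- diff(x)
  left  <- y[-length(y)]
  right <- y[-1]
  if (method == 0L) {
    ## trapezoid: average the two heights bounding each interval
    sum(dx * (left + right) / 2)
  } else {
    ## step: use the height at the right end of each interval
    ## (an assumption; a left-endpoint rule is equally possible)
    sum(dx * right)
  }
}

## Toy precision recall points
recall    <- c(0, 0.5, 1)
precision <- c(1, 0.8, 0.6)

trapezoid_area <- curve_area(recall, precision, method = 0L)
step_area      <- curve_area(recall, precision, method = 1L)
```

On a decreasing curve like this one, the right-endpoint step rule gives a smaller area than the trapezoid rule, because it ignores the higher precision at the start of each interval.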
Value
If estimator is given as
0: a named <double>-vector of length k
1: a <double> value (Micro averaged metric)
2: a <double> value (Macro averaged metric)
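How the three return shapes relate can be sketched in base R. The helper binary_pr_auc() is hypothetical and uses a simple step-wise area with no special treatment of tied scores, so its numbers are not guaranteed to match the package. The sketch also assumes that micro averaging pools all one-vs-rest pairs and that macro averaging is the unweighted mean of the class-wise values.

```r
## Hypothetical helper: step-wise area under the precision recall curve
## for one binary problem (y is 0/1, higher score means "positive")
binary_pr_auc <- function(y, score) {
  ord       <- order(score, decreasing = TRUE)
  tp        <- cumsum(y[ord])
  precision <- tp / seq_along(tp)
  recall    <- tp / sum(y)
  sum(diff(c(0, recall)) * precision)
}

## Toy data: 5 observations, 2 classes
actual   <- factor(c("a", "b", "a", "b", "a"))
p_a      <- c(0.9, 0.2, 0.6, 0.7, 0.8)
response <- cbind(a = p_a, b = 1 - p_a)

## One-vs-rest indicator matrix, one column per factor level
onehot <- sapply(levels(actual), function(l) as.integer(actual == l))

## Class-wise: a named vector with one value per class
class_wise <- sapply(
  seq_len(ncol(response)),
  function(j) binary_pr_auc(onehot[, j], response[, j])
)
names(class_wise) <- levels(actual)

## Macro: unweighted mean of the class-wise values
macro <- mean(class_wise)

## Micro: pool all (indicator, probability) pairs, then score once
micro <- binary_pr_auc(as.vector(onehot), as.vector(response))
```

The class-wise result keeps one value per factor level, while the micro and macro results each collapse to a single number.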
Examples
## Classes and seed
set.seed(1903)
classes <- c("Kebab", "Falafel")

## Generate actual classes
## and response probabilities
actual_classes <- factor(
  x = sample(
    x = classes,
    size = 1e2,
    replace = TRUE,
    prob = c(0.7, 0.3)
  )
)

response_probabilities <- ifelse(
  actual_classes == "Kebab",
  rbeta(sum(actual_classes == "Kebab"), 2, 5),
  rbeta(sum(actual_classes == "Falafel"), 5, 2)
)

## Construct response matrix
probability_matrix <- cbind(
  response_probabilities,
  1 - response_probabilities
)

## Evaluate performance
SLmetrics::auc.pr.curve(
  actual = actual_classes,
  response = probability_matrix
)