jaccard
R Documentation
Jaccard Index
Description
A generic S3 function to compute the Jaccard index for a
classification model. This function dispatches to S3 methods in jaccard()
and performs no input validation. If you supply NA values or
vectors of unequal length (e.g. length(x) != length(y)), the
underlying C++ code may trigger undefined behavior and crash your R
session.
Defensive measures
Because jaccard() operates on raw pointers, pointer-level faults (e.g.
from NA or mismatched length) occur before any R-level error handling.
Wrapping calls in try() or tryCatch() will not prevent R-session
crashes.
To guard against this, wrap jaccard() in a "safe" validator that
checks for NA values and matching length before the call reaches the
compiled code.
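A minimal sketch of such a validator is shown here. The names safe_metric() and toy_metric() are hypothetical and not part of the package. The metric is passed in as an argument so that the sketch runs in base R; in practice jaccard() would take the place of the stand-in.

```r
# Hypothetical helper (not part of the package): validate inputs in R
# before handing them to a metric function that performs no checks.
safe_metric <- function(metric, actual, predicted, ...) {
  # Reject mismatched lengths before any compiled code is reached
  if (length(actual) != length(predicted)) {
    stop("`actual` and `predicted` must have the same length.")
  }
  # Reject missing values in either vector
  if (anyNA(actual) || anyNA(predicted)) {
    stop("`actual` and `predicted` must not contain NA values.")
  }
  metric(actual, predicted, ...)
}

# Stand-in metric for illustration only (plain accuracy in base R);
# in practice `metric` would be jaccard().
toy_metric <- function(actual, predicted) mean(actual == predicted)

actual    <- factor(c("a", "b", "a", "b"), levels = c("a", "b"))
predicted <- factor(c("a", "b", "b", "b"), levels = c("a", "b"))

# Valid input is passed straight through to the metric
valid_result <- safe_metric(toy_metric, actual, predicted)

# Invalid input now raises an ordinary R error, which tryCatch() can handle
invalid_result <- tryCatch(
  safe_metric(toy_metric, actual, predicted[-1]),
  error = function(e) conditionMessage(e)
)
```

Because the checks run in R, the failure surfaces as a regular condition instead of a pointer-level fault, so the usual error handling applies again.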
Apply the same pattern to any custom metric functions to ensure input
sanity before calling the underlying C++ code.
Efficient multi-metric evaluation
For multiple performance evaluations of a classification model, first
compute the confusion matrix once via cmatrix(). All other performance
metrics can then be derived from this one object via S3 dispatching.
The jaccard.factor() method calls cmatrix() internally, so
explicitly invoking jaccard.cmatrix() yourself avoids duplicate
computation, yielding significant speed and memory efficiency gains when
you need multiple evaluation metrics.
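The idea of building the confusion matrix once and deriving several metrics from it can be illustrated in base R. This is only a sketch: table() stands in for cmatrix(), the data are made up, and the code does not reproduce the package's internals. Recall is included merely as a second metric that reuses the same counts.

```r
# Made-up labels for illustration; table() plays the role of cmatrix().
actual    <- factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))
predicted <- factor(c("a", "b", "b", "b", "c", "a"), levels = c("a", "b", "c"))

# Build the confusion matrix once (rows: actual, columns: predicted)
cm <- table(actual, predicted)

# Per-class counts derived from that single matrix
tp <- diag(cm)           # true positives
fn <- rowSums(cm) - tp   # false negatives
fp <- colSums(cm) - tp   # false positives

# Several class-wise metrics reuse the counts without revisiting the raw vectors
jaccard_by_class <- tp / (tp + fp + fn)
recall_by_class  <- tp / (tp + fn)
```

The raw label vectors are scanned only when cm is built; every later metric is simple arithmetic on a k x k table.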
Usage
Arguments
...
Arguments passed on to jaccard.factor, weighted.jaccard.factor, jaccard.cmatrix
actual,predicted
A pair of <integer> or <factor> vectors of length n, and k levels.
estimator
An <integer>-value of length 1 (default: 0).
0 - a named <double>-vector of length k (class-wise)
1 - a <double> value (Micro averaged metric)
2 - a <double> value (Macro averaged metric)
na.rm
A <logical> value of length 1 (default: TRUE). If TRUE, NA values are removed from the computation.
This argument is only relevant when micro != NULL.
When na.rm = TRUE, the computation corresponds to sum(c(1, 2, NA), na.rm = TRUE) / length(na.omit(c(1, 2, NA))).
When na.rm = FALSE, the computation corresponds to sum(c(1, 2, NA), na.rm = TRUE) / length(c(1, 2, NA)).
w
A <double> vector of sample weights.
x
A confusion matrix created by cmatrix().
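The two expressions given for the na.rm argument can be evaluated directly in base R to see how they differ. The variable names below are chosen only for illustration.

```r
x <- c(1, 2, NA)

# na.rm = TRUE: the NA is dropped from both the sum and the denominator
with_na_rm <- sum(x, na.rm = TRUE) / length(na.omit(x))   # 3 / 2

# na.rm = FALSE: the NA is dropped from the sum but still counted in the denominator
without_na_rm <- sum(x, na.rm = TRUE) / length(x)         # 3 / 3
```

In other words, the setting only changes the denominator: missing entries either shrink it or remain counted in it.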
Value
If estimator is given as
0 - a named <double> vector of length k
1 - a <double> value (Micro averaged metric)
2 - a <double> value (Macro averaged metric)
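As a rough illustration of the three return shapes, the sketch below applies the common textbook definitions of class-wise, micro and macro averaging to hypothetical per-class counts. It is assumed, not confirmed by this page, that the package follows the same definitions; the counts and variable names are invented for the example.

```r
# Hypothetical per-class counts for k = 3 classes
tp <- c(a = 4, b = 1, c = 1)   # true positives
fp <- c(a = 1, b = 1, c = 0)   # false positives
fn <- c(a = 0, b = 1, c = 1)   # false negatives

# Class-wise: a named vector of length k
class_wise <- tp / (tp + fp + fn)

# Micro average: pool the counts over all classes, then compute one ratio
micro <- sum(tp) / (sum(tp) + sum(fp) + sum(fn))

# Macro average: unweighted mean of the class-wise values
macro <- mean(class_wise)
```

Micro averaging weights every observation equally, while macro averaging weights every class equally, so the two can diverge when the classes are imbalanced.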
Other names
The Jaccard index has other names depending on research field:
Critical Success Index, csi()
Threat Score, tscore()
References
James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. No. 1. New York: Springer, 2013.
Hastie, Trevor. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction." (2009).
Pedregosa, Fabian, et al. "Scikit-learn: Machine Learning in Python." The Journal of Machine Learning Research 12 (2011): 2825-2830.
Examples