jaccard

R Documentation

Jaccard Index

Description

A generic S3 function to compute the Jaccard index for a classification model. This function dispatches to S3 methods in jaccard() and performs no input validation. If you supply NA values or vectors of unequal length (e.g. length(x) != length(y)), the underlying C++ code may trigger undefined behavior and crash your R session.

Defensive measures

Because jaccard() operates on raw pointers, pointer-level faults (e.g. from NA or mismatched length) occur before any R-level error handling. Wrapping calls in try() or tryCatch() will not prevent R-session crashes.

To guard against this, wrap jaccard() in a "safe" validator that checks for NA values and matching length, for example:

safe_jaccard <- function(x, y, ...) {
  stopifnot(
    !anyNA(x), !anyNA(y),
    length(x) == length(y)
  )
  jaccard(x, y, ...)
}

Apply the same pattern to any custom metric functions to ensure input sanity before calling the underlying C++ code.
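As a minimal sketch of why the wrapper helps: stopifnot() signals an ordinary R error before jaccard() is ever reached, so the failure can be handled with tryCatch(). The validator is repeated here so the snippet is self-contained, and the toy factors are hypothetical.

```r
## Validator repeated from above so the sketch is self-contained
safe_jaccard <- function(x, y, ...) {
  stopifnot(
    !anyNA(x), !anyNA(y),
    length(x) == length(y)
  )
  jaccard(x, y, ...)
}

## Hypothetical inputs with mismatched lengths (3 vs. 2)
actual    <- factor(c("a", "b", "a"), levels = c("a", "b"))
predicted <- factor(c("a", "b"), levels = c("a", "b"))

## The length check fails at the R level, so the
## C++ code is never called and the error is catchable
result <- tryCatch(
  safe_jaccard(actual, predicted),
  error = function(e) conditionMessage(e)
)

print(result)
```

Because the check fails first, this snippet runs without SLmetrics being loaded; with valid inputs the wrapper simply forwards to jaccard().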

Efficient multi-metric evaluation

For multiple performance evaluations of a classification model, first compute the confusion matrix once via cmatrix(). All other performance metrics can then be derived from this one object via S3 dispatching:

## compute confusion matrix
confusion_matrix <- cmatrix(actual, predicted)

## evaluate jaccard index
## via S3 dispatching
jaccard(confusion_matrix)

## additional performance metrics
## below

The jaccard.factor() method calls cmatrix() internally, so passing a precomputed confusion matrix to jaccard.cmatrix() avoids duplicate computation. This saves time and memory when you need multiple evaluation metrics.

Usage

## Generic S3 method
## for Jaccard Index
jaccard(...)

## Generic S3 method
## for weighted Jaccard Index
weighted.jaccard(...)

Arguments

...

Arguments passed on to jaccard.factor, weighted.jaccard.factor, jaccard.cmatrix

actual,predicted

A pair of <integer> or <factor> vectors of length n with k levels.

estimator

An <integer>-value of length 1 (default: 0).

0 - a named <double>-vector of length k (class-wise)
1 - a <double> value (Micro averaged metric)
2 - a <double> value (Macro averaged metric)

na.rm

A <logical> value of length 1 (default: TRUE). If TRUE, NA values are removed from the computation. This argument is only relevant when micro != NULL. When na.rm = TRUE, the computation corresponds to sum(c(1, 2, NA), na.rm = TRUE) / length(na.omit(c(1, 2, NA))). When na.rm = FALSE, the computation corresponds to sum(c(1, 2, NA), na.rm = TRUE) / length(c(1, 2, NA)).

w

A <double> vector of sample weights.

x

A confusion matrix created by cmatrix().
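The two expressions in the na.rm description can be evaluated directly in base R. This is only a worked illustration of that arithmetic, using the same toy vector; the variable names are hypothetical.

```r
## Toy scores with one undefined (NA) entry,
## taken from the na.rm description
scores <- c(1, 2, NA)

## na.rm = TRUE: average over the defined entries only
avg_removed <- sum(scores, na.rm = TRUE) / length(na.omit(scores))

## na.rm = FALSE: the NA adds nothing to the sum
## but still counts in the denominator
avg_kept <- sum(scores, na.rm = TRUE) / length(scores)

print(c(na.rm_TRUE = avg_removed, na.rm_FALSE = avg_kept))
```

The first average is 3 / 2 = 1.5 and the second is 3 / 3 = 1, so keeping the NA in the denominator pulls the average down.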

Value

If estimator is given as

0 - a named <double> vector of length k
1 - a <double> value (Micro averaged metric)
2 - a <double> value (Macro averaged metric)
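To make the three return shapes concrete, the following base-R sketch computes them by hand from a confusion table. It assumes the common definition TP / (TP + FP + FN) and the usual micro and macro averaging conventions; it is not the package's C++ implementation, and the toy data and variable names are hypothetical.

```r
## Hypothetical toy data with k = 2 classes
actual    <- factor(c("a", "a", "a", "a", "b", "b"), levels = c("a", "b"))
predicted <- factor(c("a", "a", "a", "b", "b", "a"), levels = c("a", "b"))

## Confusion table: rows = actual, columns = predicted
cm <- table(actual, predicted)

tp <- diag(cm)           # true positives per class
fn <- rowSums(cm) - tp   # false negatives per class
fp <- colSums(cm) - tp   # false positives per class

## Class-wise: one score per class (vector of length k)
class_wise <- tp / (tp + fp + fn)

## Micro: pool the counts across classes, then compute
micro <- sum(tp) / sum(tp + fp + fn)

## Macro: unweighted mean of the class-wise scores
macro <- mean(class_wise)

print(class_wise)
print(c(micro = micro, macro = macro))
```

Under these assumptions the class-wise scores are 3/5 and 1/3, the micro average is 4/8, and the macro average is the mean of the two class-wise scores.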

Other names

The Jaccard index has other names depending on the research field:

Critical Success Index, csi()
Threat Score, tscore()

References

James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. No. 1. New York: Springer, 2013.

Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2009.

Pedregosa, Fabian, et al. "Scikit-learn: Machine Learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.

Examples

## Classes and
## seed
set.seed(1903)
classes <- c("Kebab", "Falafel")

## Generate actual
## and predicted classes
actual_classes <- factor(
  x = sample(x = classes, size = 1e3, replace = TRUE),
  levels = c("Kebab", "Falafel")
)

predicted_classes <- factor(
  x = sample(x = classes, size = 1e3, replace = TRUE),
  levels = c("Kebab", "Falafel")
)

## Evaluate performance
SLmetrics::jaccard(
  actual    = actual_classes,
  predicted = predicted_classes
)

