Changelog

Version 0.3-4 is considered a pre-release of {SLmetrics}. We do not expect any breaking changes, unless a major bug/issue is reported and its nature forces breaking changes.

🔖 Version 0.3-4

This update has been focused on three things:

  1. Optimization of the back-end by using Armadillo instead of Eigen.

  2. Streamlining and extending the documentation

  3. Making functions more flexible

An example of the increased flexibility is the introduction of the estimator-argument in classification metrics: the new approach allows new aggregation methods to be added as the field evolves. The “old” approach was limited to three values: NULL, TRUE and FALSE. Furthermore, the function signatures of the generics have been made more flexible, which lets wrapping packages freely choose their own argument names when building on the generics.

Improvements

  • Armadillo backend: All functions have been ported to the C++ Armadillo library, and are heavily templated and object-oriented. The functions are 5-20x faster than before.

  • Streamlined documentation: All documentation has been reworked and now uses generic {roxygen2} templates. The new structure is focused on shared documentation, so equivalent metrics like recall and sensitivity are aliased and referenced differently. As a result, there should be less noise in the documentation. The section on creating a factor has been removed, and all examples are simplified.

  • Efficient multi-metric evaluation: The Precision-Recall and Receiver Operator Characteristics functions now accept an indices argument. The argument takes an integer matrix of indices corresponding to the column-wise sorted probabilities. See below:

## Classes and
## seed
set.seed(1903)
classes <- c("Kebab", "Falafel")

## Generate actual classes
## and response probabilities
actual_classes <- factor(
    x = sample(
      x = classes, 
      size = 1e2, 
      replace = TRUE, 
      prob = c(0.7, 0.3)
    )
)

response_probabilities <- ifelse(
    actual_classes == "Kebab", 
    rbeta(sum(actual_classes == "Kebab"), 2, 5), 
    rbeta(sum(actual_classes == "Falafel"), 5, 2)
)

## Construct response
## matrix
probability_matrix <- cbind(
    response_probabilities,
    1 - response_probabilities
)

## Calculate Precision-Recall
stopifnot(
    all.equal(
        target  = SLmetrics::pr.curve(actual_classes, probability_matrix),
        current = SLmetrics::pr.curve(actual_classes, probability_matrix, indices = SLmetrics::preorder(probability_matrix, TRUE))
    )
)

Depending on the system and data, there is a 3x gain in speed. This approach is highly efficient when multiple AUCs or curves are to be computed, as it avoids sorting the same probability matrix more than once.
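
The gain comes from sorting once and reusing the result. The idea can be sketched in base R, without the package: here order() stands in for SLmetrics::preorder(), and curve_points() is a hypothetical helper, so this is an illustration of the principle and not the package's implementation.

```r
## Hypothetical helper: cumulative true/false positive counts for one
## column of scores, given a precomputed descending sort order.
curve_points <- function(is_positive, idx) {
  sorted <- is_positive[idx]
  list(tp = cumsum(sorted), fp = cumsum(!sorted))
}

set.seed(1903)
scores      <- cbind(runif(100), runif(100))
is_positive <- runif(100) > 0.5

## Sort every column once (descending) and keep the integer indices
indices <- apply(scores, 2, order, decreasing = TRUE)

## Reuse the same indices for as many curves as needed
first  <- curve_points(is_positive, indices[, 1])
second <- curve_points(is_positive, indices[, 2])
```

Any number of curve or AUC computations can then share `indices` instead of calling a sort routine each time.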

🐛 Bug-fixes

  • Relative Root Mean Squared Error: When normalizing the RMSE by the range, the range is always calculated as max(actual) - min(actual) instead of the weighted distance.
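
As a rough base-R illustration of range normalization (rrmse_range() is a hypothetical helper, not the package function), the RMSE is divided by the unweighted distance between the largest and smallest observed value:

```r
## Hypothetical helper, for illustration only
rrmse_range <- function(actual, predicted) {
  rmse <- sqrt(mean((actual - predicted)^2))
  ## The range is the plain distance max(actual) - min(actual)
  rmse / (max(actual) - min(actual))
}

actual    <- c(1, 2, 3, 5)
predicted <- c(1, 2, 3, 3)

result <- rrmse_range(actual, predicted)
```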

🚀 New features

  • Hamming Loss: The fraction of wrong labels to the total number of labels, i.e. $\frac{1}{|N| \cdot |L|} \sum_{i=1}^{|N|} \sum_{j=1}^{|L|} \operatorname{xor}(y_{i,j}, z_{i,j})$, where $y_{i,j}$ is the target, $z_{i,j}$ is the prediction, and $\operatorname{xor}$ is the “Exclusive, or” operator that returns zero when the target and prediction are identical and one otherwise. The interface to hammingloss() is given below:

  • Tweedie Deviance: The interface to tweedie.deviance() is given below:

  • Gamma Deviance: The interface to gamma.deviance() is given below:

  • Poisson Deviance: The interface to poisson.deviance() is given below:

  • Mean Arctangent Absolute Error: The metric can be calculated as follows:

  • Geometric Mean Squared Error: The function has been implemented with logs and antilogs and is robust to zero-valued vectors. The metric can be calculated as follows:
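
Two of the quantities above are simple enough to sketch in base R. The helpers below are hypothetical and only illustrate the definitions (Hamming loss in the single-label case, and the geometric mean of squared errors computed through logs and antilogs); they are not the package interfaces.

```r
## Hypothetical helpers, for illustration only

## Share of observations where target and prediction differ
hamming_sketch <- function(actual, predicted) {
  mean(as.character(actual) != as.character(predicted))
}

## Geometric mean of squared errors: mean of logs, then antilog
gmse_sketch <- function(actual, predicted) {
  exp(mean(log((actual - predicted)^2)))
}

hl <- hamming_sketch(
  factor(c("a", "b", "a", "b")),
  factor(c("a", "a", "a", "b"))
)

## Squared errors are 1, 4 and 16; their geometric mean is 4
g <- gmse_sketch(c(1, 2, 3), c(2, 4, 7))

## A zero error gives log(0) = -Inf, and exp(-Inf) = 0 rather than NaN
g_zero <- gmse_sketch(c(1, 2, 3), c(1, 4, 7))
```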


💥 Breaking changes

  • Area under the curve: The new interface is given below:

  • Receiver Operating Characteristics: The new interface is given below:

  • Precision-Recall Curve: The new interface is given below:

  • Entropy: entropy() has been renamed to shannon.entropy(). The new interface to shannon.entropy() is given below:

The entropy functions have had the base-argument removed, and a new argument has been introduced: normalize. The normalize-parameter averages the calculated entropy across the desired dimensions.

  • Aggregation in classification metrics: The aggregation flag micro in the classification functions has been replaced with the integer-argument estimator, which falls back to class-wise evaluation if misspecified. The new interface is given below and is applicable to all functions that have this argument:

  • Poisson Logloss: The logloss() for count data, logloss.integer(), was taking a matrix of probabilities. This has been changed to a vector of probabilities.
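
To make the aggregation change above more concrete, here is a base-R sketch of an integer switch with a class-wise fallback. recall_sketch() is a hypothetical helper, and the codes used (1 for micro, 2 for macro, anything else class-wise) are assumptions made for this illustration; check the package documentation for the actual values.

```r
## Hypothetical helper; the integer codes are assumptions for illustration
recall_sketch <- function(cm, estimator = 0L) {
  tp      <- diag(cm)      # true positives per class
  support <- rowSums(cm)   # actual observations per class (rows = actual)
  if (identical(estimator, 1L)) {
    sum(tp) / sum(support)   # micro average
  } else if (identical(estimator, 2L)) {
    mean(tp / support)       # macro average
  } else {
    tp / support             # class-wise, also used as the fallback
  }
}

cm <- matrix(
  c(9, 1,
    2, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("a", "b"), predicted = c("a", "b"))
)

class_wise <- recall_sketch(cm)
micro      <- recall_sketch(cm, 1L)
macro      <- recall_sketch(cm, 2L)
fallback   <- recall_sketch(cm, 99L)  # misspecified value
```

An integer switch of this kind can grow new aggregation methods without changing the signature, which a three-valued NULL/TRUE/FALSE flag cannot.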

🔖 Version 0.3-3

Improvements

  • Initial CRAN release: The R-package has (finally) been submitted to CRAN and was released on 2025-03-18 with the classic “Thanks, on its way to CRAN” message.

  • S3 signatures: All S3-methods now have a generic signature, making it easier to navigate the functions argument-wise.

  • Exported Data: Three new datasets have been introduced to the package: the Wine Quality, Obesity and Banknote Authentication datasets. Each dataset comes as a named list where features and targets are stored separately. Below is an example from the Obesity dataset:

🚀 New features

New metrics

  • Poisson LogLoss: The logloss for count data has been implemented. This metric shares the method of logloss and can be used as follows:

  • Area under the Curve: A new set of functions has been introduced, which calculate the weighted and unweighted area under the Precision-Recall and Receiver Operator Characteristics curves. See below:

Metric tools

A new family of Tools-functions is introduced with this update. This addition introduces unexported functions for constructing fast and memory-efficient proprietary metrics. These functions are rewritten versions of built-in functions from {stats} and family.

  • Covariance Matrix: A re-written stats::cov.wt(), using Rcpp. Example usage:

  • Area under the curve (AUC): The function calculates the area under the plot for bivariate curves for ordered and unordered x and y pairs. The function assumes that values are ordered and calculates the AUC directly - to control this behaviour use the ordered-argument in the function. Below is an example:

  • Sorting algorithms: A set of sorting and ordering algorithms applicable to matrices have been implemented. The use-case is currently limited to auc.foo, ROC and prROC functions. The algorithms can be used as follows:
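
The role of the ordered-argument in the AUC tool can be illustrated with a trapezoidal rule in base R. auc_sketch() is a hypothetical helper written for this note, not the package function: it integrates directly when the pairs are assumed ordered, and sorts by x first otherwise.

```r
## Hypothetical helper mirroring the idea of an `ordered` switch
auc_sketch <- function(x, y, ordered = TRUE) {
  if (!ordered) {
    idx <- order(x)
    x   <- x[idx]
    y   <- y[idx]
  }
  ## Trapezoidal rule: interval width times average height
  sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}

x <- c(0, 0.5, 1)
y <- c(0, 0.5, 1)

## Ordered pairs are integrated directly
direct <- auc_sketch(x, y)

## Shuffled pairs need ordered = FALSE to give the same area
shuffled <- c(3, 1, 2)
sorted   <- auc_sketch(x[shuffled], y[shuffled], ordered = FALSE)
```

Skipping the sort when the input is already ordered is what makes the direct path cheaper.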

💥 Breaking changes

  • Logloss: The argument pk has been replaced by response.

🔖 Version 0.3-2

Improvements

  • Regression metrics (See PR https://github.com/serkor1/SLmetrics/pull/64): All regression metrics have had their back-end optimized and are now 2-10 times faster than prior versions.

  • LAPACK/BLAS Support (https://github.com/serkor1/SLmetrics/pull/65): Added LAPACK/BLAS support for efficient matrix-operations.

  • OpenMP: Enabling/disabling OpenMP is now handled on the R-side and obeys suppressMessages(). See below:

🚀 New features

  • Available threads: The available number of threads can be retrieved using openmp.threads(). See below:

🐛 Bug-fixes

  • Diagnostic Odds Ratio: The dor() is now returning a single <[numeric]>-value instead of k number of identical <[numeric]>-values.
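
For reference, the diagnostic odds ratio of a binary confusion matrix is the single number (TP x TN) / (FP x FN). A base-R sketch follows; dor_sketch() is a hypothetical helper, and the layout with actual classes in rows and the positive class first is an assumption of this example.

```r
## Hypothetical helper; rows = actual, columns = predicted,
## positive class in the first row and column (an assumption)
dor_sketch <- function(cm) {
  tp <- cm[1, 1]; fn <- cm[1, 2]
  fp <- cm[2, 1]; tn <- cm[2, 2]
  (tp * tn) / (fp * fn)
}

cm <- matrix(
  c(40, 10,
     5, 45),
  nrow = 2, byrow = TRUE
)

ratio <- dor_sketch(cm)
```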

💥 Breaking Changes

  • OpenMP Interface: The interface to enabling/disabling OpenMP support has been reworked and has a more natural flow. The new interface is described below:

To set the number of threads use openmp.threads() as follows:

🔖 Version 0.3-1

Improvements

  • OpenMP Support (PR https://github.com/serkor1/SLmetrics/pull/40): {SLmetrics} now supports parallelization through OpenMP. OpenMP can be used as follows:

  • Entropy with soft labels (https://github.com/serkor1/SLmetrics/issues/37): entropy(), cross.entropy() and relative.entropy() have been introduced. These functions are heavily inspired by {scipy}. The functions can be used as follows:
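
The three entropy measures are related, which a few lines of base R can show. The helpers below are hypothetical one-liners using natural logarithms; they are not the package functions and they do not handle zero probabilities.

```r
## Hypothetical helpers, for illustration only (natural logarithm)
shannon_sketch  <- function(pk) -sum(pk * log(pk))
cross_sketch    <- function(pk, qk) -sum(pk * log(qk))
relative_sketch <- function(pk, qk) sum(pk * log(pk / qk))

pk <- c(0.1, 0.2, 0.7)  # observed distribution
qk <- c(0.3, 0.3, 0.4)  # predicted distribution

h  <- shannon_sketch(pk)
ce <- cross_sketch(pk, qk)
kl <- relative_sketch(pk, qk)
```

Cross entropy decomposes into the entropy of pk plus the relative entropy between pk and qk, so `ce` equals `h + kl` up to floating-point error.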

🐛 Bug-fixes

  • Plot-method in ROC and prROC (https://github.com/serkor1/SLmetrics/issues/36): Fixed a bug in plot.ROC() and plot.prROC() where additional lines would be added to the plot if panels = FALSE.

💥 Breaking changes

  • logloss: The argument response has been renamed to qk, as in the entropy()-family, to maintain some degree of consistency.

  • entropy.factor(): The function has been deleted and is no more. This was mainly done to keep the documentation from becoming too large. The logloss()-function replaces it.

🔖 Version 0.3-0

Improvements

New features

  • Relative Root Mean Squared Error: The function normalizes the Root Mean Squared Error by a factor. There is no official way of normalizing it - and in {SLmetrics} the RMSE can be normalized using three options; mean-, range- and IQR-normalization. It can be used as follows,

  • Log Loss: Weighted and unweighted Log Loss, with and without normalization. The function can be used as follows,

  • Weighted Receiver Operator Characteristics: weighted.ROC(), the function calculates the weighted True Positive and False Positive Rates for each threshold.

  • Weighted Precision-Recall Curve: weighted.prROC(), the function calculates the weighted Recall and Precision for each threshold.
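
What "weighted rates for each threshold" means can be sketched in base R. weighted_roc_sketch() is a hypothetical helper that ignores tied scores; it is not the package's weighted.ROC().

```r
## Hypothetical helper; ties in `score` are not treated specially
weighted_roc_sketch <- function(is_positive, score, w) {
  idx <- order(score, decreasing = TRUE)
  pos <- is_positive[idx]
  w   <- w[idx]
  data.frame(
    threshold = score[idx],
    ## Weight of positives at or above the threshold over total positive weight
    tpr = cumsum(w * pos) / sum(w * pos),
    ## Weight of negatives at or above the threshold over total negative weight
    fpr = cumsum(w * (!pos)) / sum(w * (!pos))
  )
}

roc <- weighted_roc_sketch(
  is_positive = c(TRUE, FALSE, TRUE, FALSE),
  score       = c(0.9, 0.8, 0.4, 0.2),
  w           = c(1, 2, 3, 2)
)
```

With all weights equal to one, the same code gives the unweighted rates.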

🐛 Bug-fixes

  • Return named vectors: The classification metrics when micro == NULL were not returning named vectors. This has been fixed.

💥 Breaking Changes

  • Weighted Confusion Matrix: The w-argument in cmatrix() has been removed in favor of the more verbose weighted.cmatrix()-function. See below,

Prior to version 0.3-0 the weighted confusion matrix was a part of the cmatrix()-function and was called as follows,

This solution, although simple, was inconsistent with the remaining implementation of weighted metrics in {SLmetrics}. To regain consistency and simplicity the weighted confusion matrix is now retrieved as follows,
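
Independently of the package interface, the difference between the two matrices can be seen in base R: table() counts observations, while xtabs() with weights on the left-hand side sums the sample weights per cell. The data below are made up for illustration.

```r
## Made-up data for illustration
actual    <- factor(c("a", "a", "b", "b"), levels = c("a", "b"))
predicted <- factor(c("a", "b", "b", "b"), levels = c("a", "b"))
w         <- c(0.5, 1.5, 2, 1)

## Unweighted: every observation counts as one
unweighted <- table(actual, predicted)

## Weighted: every observation contributes its sample weight
weighted <- xtabs(w ~ actual + predicted)
```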

🔖 Version 0.2-0

🛠️ General

  • documentation: The documentation has gotten some extra love, and now all functions have their formulas embedded. The details section has been freed from a general description of [factor] creation. This will make room for future expansions on the various functions where more details are required.

  • Unit-testing: All functions are now being tested for edge-cases in balanced and imbalanced classification problems, and regression problems, individually. This will enable a more robust development process and prevent avoidable bugs.

Improvements

  • weighted classification metrics: The cmatrix()-function now accepts the argument w which is the sample weights; if passed the respective method will return the weighted metric. Below is an example using sample weights for the confusion matrix,

Calculating weighted metrics using the <factor>- or <cmatrix>-method,

Please note, however, that it is not possible to pass cmatrix() into weighted.accuracy(). See below:

🐛 Bug-fixes

  • Floating precision: Metrics would give different results based on the method used. This means that foo.cmatrix() and foo.factor() would produce different results (See Issue https://github.com/serkor1/SLmetrics/issues/16). This has been fixed by using higher precision Rcpp::NumericMatrix instead of Rcpp::IntegerMatrix.

  • Miscalculation of Confusion Matrix elements: An error in how FN, TN, FP and TP were calculated has been fixed. No issue has been raised for this bug. This was not something that was caught by the unit-tests, as the total samples were too high to spot this error. It has, however, been fixed now. This means that all metrics that use these explicitly are now stable, and produce the desired output.

  • Calculation Error in Fowlkes-Mallows Index: A bug in the calculation of the fmi()-function has been fixed. The fmi()-function now correctly calculates the measure.

  • Calculation Error in Pinball Deviance and Concordance Correlation Coefficient: See issue https://github.com/serkor1/SLmetrics/issues/19. Switched to unbiased variance calculation in ccc()-function. The pinball()-function was missing a weighted quantile function. The issue is now fixed.

  • Calculation Error in Balanced Accuracy: See issue https://github.com/serkor1/SLmetrics/issues/24. The function now correctly adjusts for random chance, and the result matches that of {scikit-learn}.

  • Calculation Error in F-beta Score: See issue https://github.com/serkor1/SLmetrics/issues/23. The function was not respecting na.rm and micro; this has been fixed accordingly.

  • Calculation Error in Relative Absolute Error: The function was incorrectly calculating means, instead of sums. This has been fixed.
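
The chance adjustment mentioned for Balanced Accuracy can be sketched in base R. balanced_accuracy_sketch() is a hypothetical helper that follows the rescaling used by {scikit-learn}'s adjusted balanced accuracy, where 1 / k is the expected score of a random classifier over k classes; it is not the package function.

```r
## Hypothetical helper; rows of `cm` are the actual classes
balanced_accuracy_sketch <- function(cm, adjust = FALSE) {
  recall <- diag(cm) / rowSums(cm)  # recall per class
  score  <- mean(recall)
  if (adjust) {
    chance <- 1 / nrow(cm)          # expected score of random guessing
    score  <- (score - chance) / (1 - chance)
  }
  score
}

cm <- matrix(
  c(9, 1,
    2, 3),
  nrow = 2, byrow = TRUE
)

raw      <- balanced_accuracy_sketch(cm)
adjusted <- balanced_accuracy_sketch(cm, adjust = TRUE)
```

After adjustment, random guessing scores 0 and a perfect classifier still scores 1.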

💥 Breaking changes

  • All regression metrics have had na.rm- and w-arguments removed. All weighted regression metrics have a separate function of the form weighted.foo() to increase consistency across all metrics. The new function call is given below:

  • The rrmse()-function has been removed in favor of the rrse()-function. This function was incorrectly specified and described in the package.
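
The split between foo() and weighted.foo() for regression metrics can be pictured with a pair of base-R functions. Both helpers are hypothetical stand-ins written for this note, not the package functions.

```r
## Hypothetical stand-ins for an unweighted and a weighted metric
rmse_sketch <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

weighted_rmse_sketch <- function(actual, predicted, w) {
  ## Weighted mean of the squared errors, then the square root
  sqrt(sum(w * (actual - predicted)^2) / sum(w))
}

actual    <- c(1, 2, 3)
predicted <- c(1, 2, 5)
w         <- c(1, 1, 2)

plain    <- rmse_sketch(actual, predicted)
weighted <- weighted_rmse_sketch(actual, predicted, w)
```

With equal weights the weighted version reduces to the plain one, so each function keeps a single, unambiguous signature.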

🔖 Version 0.1-1

🛠️ General

  • Backend changes: All pair-wise metrics are moved from {Rcpp} to C++, which has reduced execution time by half. All pair-wise metrics are now faster.

Improvements

  • NA-controls: All pair-wise metrics that don’t have a micro-argument were handling missing values according to C++ and {Rcpp} internals. See Issue. Thank you @EmilHvitfeldt for pointing this out. This has now been fixed so functions use an na.rm-argument to explicitly control for this. See below,
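
Explicit control over missing values in a pair-wise metric can look like the base-R sketch below. rmse_na_sketch() is a hypothetical helper, not the package function: with na.rm = FALSE a missing value propagates to the result, and with na.rm = TRUE incomplete pairs are dropped first.

```r
## Hypothetical helper, for illustration only
rmse_na_sketch <- function(actual, predicted, na.rm = FALSE) {
  if (na.rm) {
    ## Keep only pairs where both values are observed
    keep      <- complete.cases(actual, predicted)
    actual    <- actual[keep]
    predicted <- predicted[keep]
  }
  sqrt(mean((actual - predicted)^2))
}

actual    <- c(1, 2, NA, 4)
predicted <- c(1, 2, 3, 2)

with_na    <- rmse_na_sketch(actual, predicted)
without_na <- rmse_na_sketch(actual, predicted, na.rm = TRUE)
```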

🐛 Bug-fixes

  • The plot.prROC()- and plot.ROC()-functions now add a line to the plot when panels = FALSE. See Issue https://github.com/serkor1/SLmetrics/issues/9.

📦 {SLmetrics} Version 0.1-0

{SLmetrics} is a collection of Machine Learning performance evaluation functions for supervised learning written in C++ with {Rcpp}. Visit the online documentation on GitHub Pages.

ℹ️ Basic usage

Classification metrics

Regression metrics
