Anomaly Detection in R

Abstract

Anomaly detection problems have many facets, and the choice of detection technique is strongly influenced by how anomalies are defined, the type of input data, and the expected output. This leads to wide variation in problem formulations, which need to be addressed through different analytical approaches. At present, there is a fairly rich variety of R packages supporting anomaly detection tasks within various disciplinary contexts using different analytical techniques. Some of these use an approach to anomaly detection based on a forecast distribution. We locate over 75 R packages with anomaly detection capabilities via a comprehensive online search. We first present a structured and comprehensive discussion of the functionality and capability of these publicly available R packages for anomaly detection. Despite the large number of packages available, some anomaly detection challenges are not supported by existing packages. We reduce this gap by introducing three new R packages for anomaly detection, oddstream, oddwater and stray, with special reference to their capabilities, competitive features and target applications. Package oddstream introduces a framework that provides early detection of anomalous behaviours within a large collection of streaming time series. This includes a novel approach that adapts to non-stationarity in the time series. Package oddwater provides a framework for early detection of outliers caused by technical issues in water-quality data from in situ sensors. Package stray provides a framework to detect anomalies in high-dimensional data. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our proposed frameworks.

Date
Jul 9, 2019 – Jul 12, 2019
Location
The congress centre - Pierre Baudis, Toulouse, France

Priyanga Dilini Talagala
PhD in Statistics

My research interests include Computational Statistics, Anomaly Detection, Time Series Analysis and Machine Learning.