class: center, middle, inverse, title-slide

# Anomaly Detection in R

### Priyanga Dilini Talagala <br/> with <br/> Rob J. Hyndman <br/> Kate Smith-Miles

### 10.07.2019

---
# Anomaly Detection

.pull-left[
### Temporal data
]

.pull-right[
### High-dimensional data
]

<img src="figure/outtype-1.png" width="80%" style="display: block; margin: auto;" /><img src="figure/outtype-2.png" width="80%" style="display: block; margin: auto;" /><img src="figure/outtype-3.png" width="80%" style="display: block; margin: auto;" />

<img src="figure/outtype.png" width="100%" style="display: block; margin: auto;" />

---
class: center, clear

<p><font size=12><span style="color:blue">stray (STR</span>eam <span style="color:blue">A</span>nomal<span style="color:blue">Y</span>)</font></p>

<img src="fig/stray_logo.png" width="30%" style="display: block; margin: auto;" />

`devtools::install_github("pridiltal/stray")`

---
# stray

<img src="fig/stray_plot1.png" width="50%" style="display: block; margin: auto;" />

- Normalize the columns of the data.
- This prevents variables with large variances from having a disproportionate influence on Euclidean distances.

---
# stray

<img src="fig/stray_plot2b.png" width="50%" style="display: block; margin: auto;" />

- Calculate the nearest-neighbour distance for each observation.

---
# Why not "nearest neighbour" distances?
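This toy illustration of the problem is our own sketch, not code from stray. When anomalies occur as a tight cluster, each anomaly's nearest neighbour is another anomaly. Its nearest-neighbour distance therefore stays small, and the anomalies mask one another.

The following are all hypothetical, chosen only for illustration:
- the data: a 5 x 5 grid of typical points plus two anomalies close to each other;
- the helpers `unitize()` and `knn_dist()`;
- min-max scaling to [0, 1], used as one possible normalization.

```r
# Hypothetical helper: min-max scale every column to [0, 1]
unitize <- function(x) {
  apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))
}

# Hypothetical helper: n x k matrix of each row's k nearest-neighbour
# distances (brute-force Euclidean distances, fine for small examples)
knn_dist <- function(x, k = 1) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                        # a point is not its own neighbour
  t(matrix(apply(d, 1, function(row) sort(row)[seq_len(k)]), nrow = k))
}

# Toy data: a 5 x 5 grid of typical points (spacing 1) plus two anomalies
# that are far from the grid but very close to each other
typical <- as.matrix(expand.grid(x = 1:5, y = 1:5))
anomalies <- rbind(c(20, 20), c(20.1, 20))
pts <- unitize(rbind(typical, anomalies))

d1 <- knn_dist(pts, k = 1)[, 1]         # nearest-neighbour distance per point
round(d1[26:27], 4)                     # the two anomalies: 0.0052 0.0052
round(min(d1[1:25]), 4)                 # typical points (minimum): 0.0524
```

The two anomalies end up with a nearest-neighbour distance ten times smaller than that of any typical point. A score based on the first neighbour alone would therefore rank them as the least anomalous observations.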
<img src="fig/stray_plot2.png" width="50%" style="display: block; margin: auto;" />

---
# stray

<img src="fig/stray_plot5.png" width="50%" style="display: block; margin: auto;" />

- Select the <span style="color:red">k-nearest-neighbour distance</span> at the <span style="color:red">maximum gap</span> between successive k-nearest-neighbour distances

--

- Use extreme value theory (EVT) to calculate an anomalous threshold

---
# stray

<img src="fig/stray_plot6.png" width="50%" style="display: block; margin: auto;" />

`devtools::install_github("pridiltal/stray")` <br/>
`outliers <- stray::find_HDoutliers(data, method = "knn_maxdiff", knnsearchtype = "FNN_auto")` <br/>
`stray::display_HDoutliers(data, outliers)`

---
background-image: url('fig/sydney.jpeg')
background-position: 70% 110%
background-size: 100%
class: right, top, clear

### Anomalous series within a collection of series

---
# Feature-based representation of time series

.pull-left[
- Mean
- Variance
- Changing variance in remainder
- Level shift using a rolling window
- Variance change
- Strength of linearity
- Strength of curvature
]

.pull-right[
- Strength of spikiness
- Burstiness of time series (Fano factor)
- Minimum
- Maximum
- Ratio between the 50% trimmed mean and the arithmetic mean
- Moment
- Ratio of the means of the data below and above the global mean
]

---
# Approach 1: Using stray

.pull-left[
<img src="fig/P2_plot22.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="fig/stray.gif" width="50%" style="display: block; margin: auto;" />
]

--

- Apply the stray algorithm to identify anomalous series

`tsfeatures <- oddstream::extract_tsfeatures(ts_data)` <br/>
`outliers <- stray::find_HDoutliers(tsfeatures, method = "knn_maxdiff", knnsearchtype = "FNN_auto")` <br/>
`stray::display_HDoutliers(tsfeatures, outliers)`

--

- Use a moving window to deal with streaming data.
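---
# Approach 1: a base-R sketch

This is a rough base-R sketch of the pipeline above, written for illustration rather than taken from stray or oddstream. It rests on three assumptions:
- Each series in the current window is summarised by four of the features listed earlier: mean, variance, minimum and maximum. These use plain formulas that need not match `oddstream::extract_tsfeatures()`.
- The feature columns are min-max scaled.
- Each series is scored by the k-nearest-neighbour distance at the maximum gap. This follows our reading of the `knn_maxdiff` idea, not stray's source.

The simulated data, `window_features()`, `unitize()`, `maxgap_score()`, `score_window()`, `width = 50` and `k = 5` are all hypothetical. The EVT threshold is omitted, so the sketch only ranks series.

```r
set.seed(1)

# Hypothetical data: 30 series (one per column), 200 time points each;
# series 29 and 30 get a level shift of +8 over their last 50 points
n_obs <- 200
n_series <- 30
ts_data <- matrix(rnorm(n_obs * n_series, mean = 10, sd = 1), nrow = n_obs)
ts_data[151:200, 29:30] <- ts_data[151:200, 29:30] + 8

# Hypothetical feature extractor: one row per series, one column per feature
window_features <- function(x) {
  t(apply(x, 2, function(s) {
    c(mean = mean(s), variance = var(s), minimum = min(s), maximum = max(s))
  }))
}

# Min-max scale every feature column to [0, 1]
unitize <- function(x) {
  apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))
}

# k-NN maximum-gap score: sort each row's k nearest-neighbour distances
# d_1 <= ... <= d_k, take d_0 = 0, find the largest jump d_j - d_(j-1)
# and return d_j, the distance just after that jump
maxgap_score <- function(x, k = 5) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                                   # ignore self-distances
  knn <- t(matrix(apply(d, 1, function(row) sort(row)[seq_len(k)]), nrow = k))
  gaps <- knn - cbind(0, knn[, -k, drop = FALSE])  # d_j - d_(j-1)
  j <- max.col(gaps, ties.method = "first")        # position of largest jump
  knn[cbind(seq_len(nrow(knn)), j)]
}

# Moving window: score only the `width` observations ending at time `end`
score_window <- function(x, end, width = 50, k = 5) {
  window <- x[(end - width + 1):end, , drop = FALSE]
  maxgap_score(unitize(window_features(window)), k = k)
}

scores <- score_window(ts_data, end = n_obs)
top2 <- order(scores, decreasing = TRUE)[1:2]
top2   # indices of the two highest-scoring series
```

In a stream, `score_window()` would be called again with a later `end` as new observations arrive. The window then moves forward and old observations drop out of the feature calculation.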
---
class: center, clear

.pull-left[
<img src="fig/P2_plot21a.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="fig/P2_plot21b.png" width="100%" style="display: block; margin: auto;" />
]

---
class: center, clear

<p><font size=12><span style="color:blue">oddstream <br/> (O</span>utlier <span style="color:blue">D</span>etection in <span style="color:blue">D</span>ata <span style="color:blue">STREAM</span>s)</font></p>

<img src="fig/oddstream_logo.png" width="30%" style="display: block; margin: auto;" />

`devtools::install_github("pridiltal/oddstream")`

Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, Sevvandi Kandanaarachchi and Mario A Muñoz (2019) [Anomaly detection in streaming nonstationary temporal data](https://www.researchgate.net/publication/323694683_Anomaly_Detection_in_Streaming_Nonstationary_Temporal_Data). <span style="color:blue">**Journal of Computational and Graphical Statistics, to appear.**</span>

---
# Dimension reduction for time series

.pull-left[
`load(train_data)`

<img src="fig/4_typical.png" width="70%" style="display: block; margin: auto;" />
]

--

.pull-right[
`tsfeatures <- oddstream::extract_tsfeatures(train_data)`

<img src="fig/5_high_typical.gif" width="40%" style="display: block; margin: auto;" />
]

--

<br/>
`pc <- oddstream::get_pc_space(tsfeatures)` <br/>
`oddstream::plotpc(pc$pcnorm)`

<img src="fig/6_typicalfeature.png" width="25%" style="display: block; margin: auto;" />

---
# Anomalous threshold calculation

- **Anomalous threshold calculation `\(\longrightarrow\)` extreme value theory**
- Estimate the probability density function of the 2D PC space `\(\longrightarrow\)` kernel density estimation
- Draw a large number `\(N\)` of extremes `\(\left(\arg\min_{x \in X} f_{2}(x)\right)\)` from the estimated probability density function
- Define a `\(\Psi\)`-transform space, using the `\(\Psi\)`-transformation defined by Clifton et al. (2011)

<img src="fig/10_psitrans.png" width="50%" style="display: block; margin: auto;" />

- The `\(\Psi\)`-transform maps the density values back into a space in which a Gumbel distribution can be fitted.

---
class: center, top, clear

`oddstream::find_odd_streams(train_data, test_stream)`

<img src="fig/18_oddstream_mvtsplot.gif" width="50%" style="display: block; margin: auto;" />

.pull-left[
<img src="fig/16_oddstream_out_loc.gif" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="fig/17_oddstream_pcplot.gif" width="90%" style="display: block; margin: auto;" />
]

---
class: clear, center, middle

<img src="fig/JCGS.png" width="20%" style="display: block; margin: auto;" />

Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, Sevvandi Kandanaarachchi and Mario A Muñoz (2019) [Anomaly detection in streaming nonstationary temporal data](https://www.researchgate.net/publication/323694683_Anomaly_Detection_in_Streaming_Nonstationary_Temporal_Data). <span style="color:blue">**Journal of Computational and Graphical Statistics**</span>

---
# Anomaly detection in water-quality data

- Technical issues in the sensor equipment (low battery power, biofouling of the probes, errors in calibration, rust, sensor maintenance activities, etc.)
<br/><br/><br/><br/>

<img src="fig/sensor_issues.png" width="100%" style="display: block; margin: auto;" />

---
# What is an anomaly?

- Water-quality observations that were affected by <span style="color:red">technical errors</span> in the sensor equipment

<img src="fig/water_original.png" width="100%" style="display: block; margin: auto;" />

---
# What is an anomaly?

- Water-quality observations that were affected by <span style="color:red">technical errors</span> in the sensor equipment

<img src="fig/water_out.png" width="100%" style="display: block; margin: auto;" />

---
class: center, clear

<p><font size=12><span style="color:blue">oddwater</span></font></p>
<p><font size=6>(<span style="color:blue">O</span>utlier <span style="color:blue">D</span>etection in <span style="color:blue">D</span>ata from <span style="color:blue">WATER</span>-quality sensors)</font></p>

<img src="fig/oddwater_logo.png" width="30%" style="display: block; margin: auto;" />

`devtools::install_github("pridiltal/oddwater")` <br/>
`oddwater::explore_data()`

---
class: clear

**Identify the data features that differentiate outlying instances from typical behaviours**

<img src="fig/water_out.png" width="100%" />

---
class: clear

**Identify the data features that differentiate outlying instances from typical behaviours**

<img src="fig/water_hd1.png" width="100%" />

---
class: clear

**Apply statistical transformations to make the outlying instances stand out in the transformed data space**

<img src="fig/trans.png" width="100%" />

---
class: clear

**Apply statistical transformations to make the outlying instances stand out in the transformed data space**

<img src="fig/water_hd2.png" width="100%" />

---
class: clear

**Calculate unsupervised outlier scores for the observations in the transformed data space**

`trans_data <- oddwater::transform_data(data)` <br/>
`outliers <- stray::find_HDoutliers(trans_data, method = "knn_sum", knnsearchtype = "FNN_brute")`

<img src="fig/oddwater_plot.png" width="60%" style="display: block; margin: auto;" />

---
# What next?

<br/>

<img src="fig/future.png" width="100%" style="display: block; margin: auto;" />

---
class: center, middle

# Thank You

<img src="fig/packages.png" width="90%" style="display: block; margin: auto;" />
dilini.talagala@monash.edu <br/>
pridiltal <br/>
https://prital.netlify.com <br/> (Slides and papers available)

Slides created via xaringan: https://github.com/pridiltal/MonashEBS_xaringan