TimeSeries / OpenAnomalyDetection / 1.0.0

Open Anomaly Detection is an open source, multivariate, portable, and customizable prediction-based anomaly detection algorithm, powered by our OpenForecast model. You can see it in action here.

Introduction

That's a lot to take in, so let's break down each of those terms:

  • Multivariate - Univariate problems (situations with only one variable) are not that common. More often we need to support multiple independent variables, and this algorithm can do that.
  • Portable - This algorithm is designed to run best on the Algorithmia platform, but it can run on any Linux-based system.
  • Open Source - As you can see, this algorithm is fully open source and available for public consumption! If you have any ideas on how to make it even better, please feel free to file a PR.
  • Customizable - This model is designed to be trained and specialized to your data. All variables are exposed and available for tinkering.
  • Prediction based - This algorithm uses a prediction-based replication failure metric to measure anomalies. This means we go deeper than just looking at periods or peaks: the algorithm learns what normal data looks like and what constitutes an anomaly within your data.

To do all of this, we rely heavily on the PyTorch machine learning framework, along with the OpenForecast algorithm.

More Introduction

This algorithm requires an OpenForecast model. If you already have one built for your data, excellent! Otherwise, we'll create one on the fly. Once an OpenForecast model has been loaded, the algorithm tests every part of the sequence for anomalies. It does this by using the OpenForecast model to forecast the next n steps at each point along the sequence. We then measure the deviation between the predicted sequence and what was actually observed. If that deviation is far from the average deviation (by at least sigma_threshold sigma, see the IO Schema), we describe it as an anomaly.
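The scoring idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the algorithm's actual code: `naive_forecast` is a hypothetical stand-in for a trained OpenForecast model (it simply repeats the last observed value), and `forecast_deviations` and `sigma_scores` are names made up for this example.

```python
from statistics import mean, pstdev


def naive_forecast(history, n):
    """Hypothetical stand-in for an OpenForecast model: repeat the last value n times."""
    return [history[-1]] * n


def forecast_deviations(series, n, forecast=naive_forecast):
    """Mean absolute error between each n-step forecast and the observed values."""
    deviations = []
    for t in range(1, len(series) - n + 1):
        predicted = forecast(series[:t], n)
        observed = series[t:t + n]
        deviations.append(mean(abs(p - o) for p, o in zip(predicted, observed)))
    return deviations


def sigma_scores(deviations):
    """Number of standard deviations each deviation sits from the average deviation."""
    mu, sd = mean(deviations), pstdev(deviations)
    if sd == 0:
        return [0.0] * len(deviations)
    return [(d - mu) / sd for d in deviations]


# A flat signal with a single spike at index 20.
series = [10.0] * 20 + [50.0] + [10.0] * 20
scores = sigma_scores(forecast_deviations(series, n=1))

# Deviation index i belongs to the forecast made from series[:i + 1].
flagged = [i for i, s in enumerate(scores) if s > 3.0]
print(flagged)
```

With a real model, the forecast would come from OpenForecast rather than a last-value repeat, and the deviation metric may differ from the mean absolute error used here.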

Getting Started guide

This algorithm has two modes (using a model you provide, or training one for you) and a few optional parameters. As previously mentioned, we require an OpenForecast model; if you don't provide one, we train one for you. We also require that your data is formatted in our standard data format, to ensure compatibility with the OpenForecast model. If you need help here, please take a look at the two examples in the OpenForecast tools directory. We can also output a graphical representation (a graph like the one shown above, as a visual aid), but for performance reasons it is optional. Let's take a look at an example.

Examples

Example 1

Input:


{  
   "data_path":"data://TimeSeries/GenerativeForecasting/m4_daily.json",
   "model_input_path":"data://TimeSeries/GenerativeForecasting/m4_daily_0.1.0.zip",
   "graph_save_path":"data://.algo/TimeSeries/OpenAnomalyDetection/temp/graph_file.png",
   "sigma_threshold":3,
   "variable_index":4,
   "calibration_percentage":0.1
}

Output:

{  
   "anomalous_regions":[  
      {  
         "avg_sigma":3.660663907275452,
         "lower":101,
         "max_sigma":4.103103630663716,
         "upper":127
      },
      {  
         "avg_sigma":3.0623424236141226,
         "lower":213,
         "max_sigma":3.0882601499796563,
         "upper":234
      },
      {  
         "avg_sigma":3.733702727997744,
         "lower":295,
         "max_sigma":4.329303116744169,
         "upper":321
      },
      {  
         "avg_sigma":4.24959275689059,
         "lower":386,
         "max_sigma":5.114906226119579,
         "upper":412
      },
      {  
         "avg_sigma":3.3073228279258013,
         "lower":437,
         "max_sigma":3.509339547886696,
         "upper":460
      },
      {  
         "avg_sigma":3.9148479154691462,
         "lower":471,
         "max_sigma":4.258586517731104,
         "upper":498
      }
   ],
   "graph_save_path":"data://.algo/TimeSeries/OpenAnomalyDetection/temp/graph_file.png"
}

Example 2

Input:

{  
   "data_path":"data://TimeSeries/GenerativeForecasting/formatted_data_rossman_10.json",
   "graph_save_path":"data://.algo/TimeSeries/OpenAnomalyDetection/temp/rossman_graph.png",
   "calibration_percentage":0.1
}

Output:

{  
   "anomalous_regions":[  
      {  
         "avg_sigma":2.411330730458791,
         "lower":5,
         "max_sigma":2.842559054162925,
         "upper":26
      },
      {  
         "avg_sigma":2.9487772489340407,
         "lower":40,
         "max_sigma":4.369209703720929,
         "upper":64
      }
   ],
   "graph_save_path":"data://.algo/TimeSeries/OpenAnomalyDetection/temp/rossman_graph.png"
}
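As a small illustration of working with this output, the sketch below loads the Example 2 response and ranks the regions by their peak sigma. The helper name `rank_regions` and the derived `width` field are assumptions made for this example; they are not part of the algorithm's output.

```python
import json

# Output of Example 2, copied from above.
response = json.loads("""
{
   "anomalous_regions":[
      {"avg_sigma":2.411330730458791, "lower":5,
       "max_sigma":2.842559054162925, "upper":26},
      {"avg_sigma":2.9487772489340407, "lower":40,
       "max_sigma":4.369209703720929, "upper":64}
   ],
   "graph_save_path":"data://.algo/TimeSeries/OpenAnomalyDetection/temp/rossman_graph.png"
}
""")


def rank_regions(regions, min_max_sigma=0.0):
    """Keep regions whose peak sigma is at least min_max_sigma, strongest first.

    Each returned dict gets an extra 'width' key (upper - lower), which is a
    convenience added by this sketch only.
    """
    kept = [
        dict(region, width=region["upper"] - region["lower"])
        for region in regions
        if region["max_sigma"] >= min_max_sigma
    ]
    return sorted(kept, key=lambda region: region["max_sigma"], reverse=True)


ranked = rank_regions(response["anomalous_regions"])
for region in ranked:
    print(region["lower"], region["upper"], region["width"], round(region["max_sigma"], 2))
```

Passing a stricter `min_max_sigma` (for example 3.0) is a cheap way to post-filter results without rerunning the algorithm with a higher sigma_threshold.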

IO Schema

Input

| Parameter | Type | Description | Default (if applicable) |
| --- | --- | --- | --- |
| data_path | String | The data path | N/A |
| model_input_path | String | The data API path to the trained model you've previously built. | N/A |
| graph_save_path | String | The output path for a visual graph describing the found anomalies. | N/A |
| sigma_threshold | Float | The minimum sigma deviation from the mean to consider an event anomalous. | 2.0 |
| variable_index | Int | The specific dimension to detect anomalies on for a given dataset. | 1 |
| calibration_percentage | Float | The percentage of the start of the dataset used to calibrate the model. This data won't be scored for anomalies. | 0.1 |
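To make the optional parameters and their defaults concrete, here is a minimal sketch of a helper that assembles an input object. `build_input` is a hypothetical convenience function, not part of the algorithm. It assumes the defaults listed in the table above and leaves out the optional paths when they are not given, so that omitting model_input_path corresponds to the mode where a model is trained for you.

```python
import json

# Defaults as listed in the Input table above.
DEFAULTS = {
    "sigma_threshold": 2.0,
    "variable_index": 1,
    "calibration_percentage": 0.1,
}


def build_input(data_path, model_input_path=None, graph_save_path=None, **overrides):
    """Assemble an input dict; reject parameter names that are not in the schema."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"data_path": data_path, **DEFAULTS, **overrides}
    # Optional paths are only included when the caller supplies them.
    if model_input_path is not None:
        payload["model_input_path"] = model_input_path
    if graph_save_path is not None:
        payload["graph_save_path"] = graph_save_path
    return payload


# Mirrors Example 2 (no pre-trained model), with a stricter threshold.
payload = build_input(
    "data://TimeSeries/GenerativeForecasting/formatted_data_rossman_10.json",
    graph_save_path="data://.algo/TimeSeries/OpenAnomalyDetection/temp/rossman_graph.png",
    sigma_threshold=3,
)
print(json.dumps(payload, indent=3))
```

The resulting dict can be serialized with `json.dumps` and used as the algorithm input.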

Output

| Parameter | Type | Description |
| --- | --- | --- |
| graph_save_path | String | If you set a graph_save_path, the graph was successfully saved at this data API path. |
| anomalous_regions | List[Anomaly] | A JSON list containing all detected anomalies. |

Anomaly

| Parameter | Type | Description |
| --- | --- | --- |
| avg_sigma | Float | The anomaly's average sigma deviation from the norm. |
| max_sigma | Float | The anomaly's maximum measured sigma deviation from the norm. |
| upper | Int | The anomaly's upper limit (along the x axis), as an index value. |
| lower | Int | The anomaly's lower limit (along the x axis), as an index value. |
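One way to picture how these four fields relate is to group per-index sigma scores into contiguous regions. The sketch below is an assumption about how such regions could be derived, not the algorithm's actual implementation. `group_regions` is a hypothetical name, and treating `upper` as an inclusive index is a choice made for this example.

```python
from statistics import mean


def group_regions(scores, sigma_threshold=2.0):
    """Merge consecutive indices whose score exceeds the threshold into regions."""
    regions = []
    start = None
    # A trailing sentinel below any threshold closes a region that runs to the end.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score > sigma_threshold and start is None:
            start = i
        elif score <= sigma_threshold and start is not None:
            run = scores[start:i]
            regions.append({
                "lower": start,
                "upper": i - 1,  # inclusive, by assumption
                "avg_sigma": mean(run),
                "max_sigma": max(run),
            })
            start = None
    return regions


scores = [0.5, 2.5, 3.5, 0.1, 0.2, 4.0]
regions = group_regions(scores)
for region in regions:
    print(region)
```

Under this reading, avg_sigma and max_sigma summarize the scores inside a region, while lower and upper mark where it starts and ends along the x axis.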