TimeSeries / ForecastEvaluation / 0.3.3

Overview

This algorithm evaluates time series / sequential forecasting algorithms for performance, accuracy, and precision. It can also serve as an important component in an unsupervised anomaly detection pipeline!

CSV Format

This algorithm takes evaluation data in CSV form; it expects each variable to be delimited with a comma and each timestep to be on a separate line. For more information, take a look at our univariate and bivariate examples.
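To make the layout concrete, here is a short sketch of parsing both shapes of file. The column names and values are hypothetical (they are not taken from the bundled example files), and we assume a header row is present:

```python
import csv
import io

# Hypothetical univariate data: one variable, one timestep per line.
univariate = "temperature\n20.1\n20.4\n19.8\n"

# Hypothetical bivariate data: two comma-delimited variables per timestep.
bivariate = "temperature,humidity\n20.1,0.61\n20.4,0.63\n19.8,0.60\n"

def parse_timesteps(text):
    """Split a CSV string into its header and data rows of floats."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    return header, [[float(v) for v in row] for row in data]

header, steps = parse_timesteps(bivariate)
print(header)         # ['temperature', 'humidity']
print(len(steps[0]))  # 2 variables per timestep
```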

Algorithms FAQ

As each algorithm is different, this section defines any caveats or interesting attributes of our supported algorithms.

TimeSeries/Forecast

This algorithm uses a univariate polynomial approximation function to replicate any trend and periodicity in the training data to assist in forecasting.

What kind of data does this algorithm expect?

This algorithm expects a univariate (single-variable) dataset, such as the univariate example.

Does this algorithm use model files?

No. It is not a machine learning based algorithm, which means there are no RNN model files. Ensure that your eval_percentage variable reflects how much of the data you wish to use to create your polynomial approximation function.

If I want to know more about this algorithm, where should I look?

For more information on this algorithm, check out its algorithm page here: TimeSeries/Forecast.

Any special caveats of this algorithm?

This algorithm is well suited to business use cases with regular seasonality, trends, etc. It does struggle with other types of sequential datasets.

TimeSeries/GenerativeForecast

This algorithm uses an LSTM neural network architecture that is capable of forecasting nonlinear and complex trends. When using custom data, you must build a model first. Read the algorithm documentation for more info.

Does this algorithm use model files?

Yes it does; this algorithm uses built-in memory to remember and preserve data that it has already seen!

What kind of data does this algorithm expect?

This algorithm was designed to be robust and flexible in its data inputs; it can take any sort of CSV file with continuous variables. However, because it uses a checkpoint model, your evaluation data should follow directly after the data used to train or update your checkpoint model.

Any special caveats of this algorithm?

Yes: the algorithm produces really great graphs of its forecasts! After performing an evaluation in advanced mode, note the UUID associated with any interesting forecast and check your TimeSeries/GenerativeForecasting temp collection to access your graph images.

Examples

Example 1

Sine wave, timeseries/generativeforecast, simple mode

Input

{  
   "num_of_evals":5,
   "data_path":"data://timeseries/generativeforecasting/sinewave_v1.0_t1.csv",
   "forecast_length":35,
   "algorithm":"timeseries/generativeforecast",
   "model_path":"data://timeseries/generativeforecasting/sinewave_v1.0_t0.t7"
}

Output

{
  "error": {
    "max": 0.00857300213419307,
    "mean": 0.00299875514566992,
    "min": 0.0004365765109465927
  }
}
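A request like Example 1 would typically be sent through the Algorithmia client. The snippet below is a sketch: the API key is a placeholder, the client call is shown commented out, and the payload is exactly the one from the example above (timeseries/generativeforecast uses a checkpoint model, so model_path is included):

```python
# Payload from Example 1, built as a plain dict.
payload = {
    "num_of_evals": 5,
    "data_path": "data://timeseries/generativeforecasting/sinewave_v1.0_t1.csv",
    "forecast_length": 35,
    "algorithm": "timeseries/generativeforecast",
    "model_path": "data://timeseries/generativeforecasting/sinewave_v1.0_t0.t7",
}

# Fields every request needs; model_path is required here because the
# chosen algorithm utilizes checkpoint models.
required = {"data_path", "algorithm", "model_path"}
assert required <= payload.keys()

# Sketch of the actual call (placeholder API key):
# import Algorithmia
# client = Algorithmia.client("YOUR_API_KEY")
# result = client.algo("TimeSeries/ForecastEvaluation/0.3.3").pipe(payload).result
# print(result["error"]["mean"])
```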

Example 2

API requests, timeseries/generativeforecast, advanced mode

Input

{  
   "algorithm":"timeseries/generativeforecast",
   "model_path":"data://timeseries/generativeforecasting/apidata_v0.2.5_t0.t7",
   "advanced_mode":"true",
   "data_path":"data://timeseries/generativeforecasting/apidata_v0.2.5_t1.csv",
   "num_of_evals":10,
   "forecast_length":35
}

Output

{  
   "complete_data_path":"data://.algo/temp/57001458-c046-4de5-a031-a3fd6e3f6338.json",
   "summary":{  
      "error":{  
         "max":{  
            "id":"29e3db37-cf39-4388-996d-271bcd15781d",
            "value":0.02067989932028304
         },
         "mean":0.009846650607575207,
         "min":{  
            "id":"8043e3db-2043-4800-9f5c-94b080508e14",
            "value":0.002214220459913783
         },
         "std":0.0057925733397795574
      },
      "exec_time":{  
         "max":31.954874515533447,
         "mean":23.33813188076019,
         "min":14.700079441070557,
         "std":5.356676354101113
      }
   }
}

Example 3

API requests, timeseries/forecast, advanced mode with custom output

Input

{  
   "algorithm":"timeseries/forecast",
   "model_path":"data://timeseries/generativeforecasting/apidata_v0.2.5_t0.t7",
   "advanced_mode":"true",
   "data_path":"data://timeseries/generativeforecasting/apidata_v0.2.5_t1.csv",
   "num_of_evals":10,
   "forecast_length":35,
   "data_output_path": "data://.my/example_collection/apidata_v0.2.5_eval.json"
}

Output

{
  "complete_data_path": "data://.my/example_collection/apidata_v0.2.5_eval.json",
  "summary": {
    "error": {
      "max": {
        "id": "34d5eadf-b112-48aa-955f-078b5796e64b",
        "value": 0.019366085544574863
      },
      "mean": 0.008697436763929537,
      "min": {
        "id": "32bd3a8e-c588-4d53-8a92-fb28be16192b",
        "value": 0.0007278606148913183
      },
      "std": 0.0058660767961214375
    },
    "exec_time": {
      "max": 1.7069735527038574,
      "mean": 0.9620259523391724,
      "min": 0.23518633842468264,
      "std": 0.5176264802321938
    }
  }
}

Error Formula

We calculate error by first normalizing the evaluation data; this enables comparisons across datasets with potentially wildly diverging ranges. Once the data is normalized, we calculate the Mean Absolute Error (MAE) for each forecast, which, combined with the normalization step, is called the Normalized Mean Absolute Error. In most cases the resulting error value will be less than 1, although in some cases the error term may be greater.
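The description above leaves the exact normalization scheme unstated; the sketch below assumes min-max normalization over the evaluation data, which is one common choice:

```python
def normalized_mae(actual, forecast, data_min, data_max):
    """Mean Absolute Error after min-max normalizing both series.

    Assumes data_max > data_min. Because data_min/data_max come from the
    evaluation data, a forecast far outside the observed range can push
    the error term above 1.
    """
    span = data_max - data_min
    norm = lambda xs: [(x - data_min) / span for x in xs]
    a, f = norm(actual), norm(forecast)
    return sum(abs(x - y) for x, y in zip(a, f)) / len(a)

# A forecast close to its target yields a small error.
print(round(normalized_mae([10, 20, 30], [11, 19, 31], 0, 100), 6))  # 0.01
```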

Complete evaluation data

You might be asking: what are those complete_data_path files, and what do they look like? During an evaluation we make many forecast algorithm requests; we do this to flatten out the variance from any single good or bad forecast. But what do we do with the rest of the algorithm data that is returned? We save all of it! When using advanced mode we return all forecast data in a single JSON object, which allows you to dive even further into the evaluation. Let's take a quick look at an example we already looked at:

{  
   "algorithm":"timeseries/generativeforecast",
   "model_path":"data://timeseries/generativeforecasting/apidata_v0.2.5_t0.t7",
   "advanced_mode":"true",
   "data_path":"data://timeseries/generativeforecasting/apidata_v0.2.5_t1.csv",
   "num_of_evals":2,
   "forecast_length":5
} 

As you might have noticed, we never defined the complete_data_path variable, which means we need to look in our algorithm temp collection. Let's open up that file and see what it looks like:

[  
   {  
      "error":0.0018789254473265983,
      "exec_time":26.187873363494873,
      "id":"f5daf027-1c95-4d8b-bd54-040bc7b16dee",
      "algo_response":{  
         "envelope":[  
            {  
               "second_deviation":{  
                  "lower_bound":[  
                     -760.3936167986036,
                     -1095.5086156265447,
                     -939.5766748371902,
                     -1167.3063141199827,
                     -702.9676420146652
                  ],
                  "upper_bound":[  
                     308.5193675310255,
                     454.2168492691229,
                     402.373616975862,
                     742.9122223231078,
                     955.3420385295578
                  ]
               },
               "mean":[  
                  -225.93712463378907,
                  -320.64588317871096,
                  -268.6015289306641,
                  -212.1970458984375,
                  126.18719825744628
               ],
               "standard_deviation":[  
                  267.2282460824073,
                  387.43136622391694,
                  335.48757295326305,
                  477.55463411077267,
                  414.57742013605576
               ],
               "first_deviation":{  
                  "lower_bound":[  
                     -493.1653707161963,
                     -708.0772494026279,
                     -604.0891018839271,
                     -689.7516800092102,
                     -288.3902218786095
                  ],
                  "upper_bound":[  
                     41.29112144861821,
                     66.78548304520598,
                     66.88604402259898,
                     265.35758821233514,
                     540.764618393502
                  ]
               },
               "variable":"Requests made"
            }
         ],
         "saved_graph_path":"data://.algo/temp/f5daf027-1c95-4d8b-bd54-040bc7b16dee.png"
      },
      "index":1277
   },
   {  
      "error":0.007211677972411254,
      "exec_time":6.704785346984863,
      "id":"bc1b3da1-f29c-4e0b-b33e-4279a2c21cd4",
      "algo_response":{  
         "envelope":[  
            {  
               "second_deviation":{  
                  "lower_bound":[  
                     502.85361303671993,
                     912.1484019093724,
                     1704.9310144197402,
                     2221.680316049337,
                     2480.419836612773
                  ],
                  "upper_bound":[  
                     1049.152685791405,
                     1847.28668109844,
                     2413.6306066740094,
                     3213.890973013163,
                     3823.720007137227
                  ]
               },
               "mean":[  
                  776.0031494140625,
                  1379.7175415039062,
                  2059.280810546875,
                  2717.78564453125,
                  3152.069921875
               ],
               "standard_deviation":[  
                  136.57476818867127,
                  233.78456979726695,
                  177.17489806356724,
                  248.05266424095652,
                  335.8250426311135
               ],
               "first_deviation":{  
                  "lower_bound":[  
                     639.4283812253911,
                     1145.9329717066394,
                     1882.1059124833075,
                     2469.7329802902937,
                     2816.2448792438868
                  ],
                  "upper_bound":[  
                     912.5779176027338,
                     1613.502111301173,
                     2236.455708610442,
                     2965.8383087722063,
                     3487.8949645061134
                  ]
               },
               "variable":"Requests made"
            }
         ],
         "saved_graph_path":"data://.algo/temp/bc1b3da1-f29c-4e0b-b33e-4279a2c21cd4.png"
      },
      "index":1162
   }
]

That's a lot of valuable information; if you compare the above with the generativeForecast algorithm output schema, there is something extra here. The index variable defines the location in our evaluation data array that the forecast operation uses as a break point: we pass all data up to this point to the algorithm and tell it to predict the next N steps. This means that if you want to replicate any particular forecast, you just need the index and the algorithm.
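The replication step can be sketched as a simple slice. Note the slicing convention here (whether index itself is part of the history) is an assumption; verify it against a known forecast before relying on it:

```python
def replication_input(series, index, forecast_length):
    """Rebuild the input for a recorded forecast: all data up to the
    break point, plus the target steps the forecast was scored against."""
    history = series[:index]                         # data passed to the algorithm
    target = series[index:index + forecast_length]   # expected next N steps
    return history, target

# Toy series standing in for one column of evaluation data.
series = list(range(100))
history, target = replication_input(series, index=90, forecast_length=5)
print(len(history), target)  # 90 [90, 91, 92, 93, 94]
```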

IO

Need more info? Below are our API docs.

Input

| Parameter | Description | Type |
| --- | --- | --- |
| data_path | A data collection URI pointing to a properly formatted sequential CSV file. | String |
| algorithm | The name of the algorithm you wish to evaluate; check below for a list of currently supported algorithms. | String |
| model_path | If the chosen algorithm utilizes checkpoint models, this must be provided in your input. | String |
| num_of_evals | Total number of independent evaluations to perform. Each evaluation will have a different forecast point, taken at random from the dataset provided with data_path. Higher numbers yield more reliable results. Defaults to 10. | Int |
| forecast_length | The number of steps into the future to evaluate each forecast; choose a number that makes sense for your algorithm. Defaults to 25. | Int |
| advanced_mode | If you want all available information about your evaluation, set this to "true". Otherwise, you will receive a much simpler output. Defaults to "false". | String |
| eval_percentage | The percentage of the provided data to use for evaluation. Lower values mean the forecasting algorithm is exposed to more data before evaluating, which is important for evaluating algorithms without checkpoint models. Defaults to 0.85. | Float |
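The documented defaults can be captured in a small helper. This is a sketch using only the parameters and defaults listed above, and it assumes data_path and algorithm are the required fields (they are the only ones with no default):

```python
# Defaults as documented in the input table above.
DEFAULTS = {
    "num_of_evals": 10,
    "forecast_length": 25,
    "advanced_mode": "false",
    "eval_percentage": 0.85,
}

def with_defaults(request):
    """Fill in the documented default for any optional parameter left unset."""
    if "data_path" not in request or "algorithm" not in request:
        raise ValueError("data_path and algorithm are required")
    return {**DEFAULTS, **request}

req = with_defaults({
    "data_path": "data://.my/collection/example.csv",  # hypothetical path
    "algorithm": "timeseries/forecast",
})
print(req["num_of_evals"], req["eval_percentage"])  # 10 0.85
```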

Output

This algorithm returns two different types of output, depending on mode - simple and advanced.

Simple Output

{
  "error": {
    "max": 0.00857300213419307,
    "mean": 0.00299875514566992,
    "min": 0.0004365765109465927
  }
}
| Parameter | Description | Type |
| --- | --- | --- |
| error | The error object wrapper; contains basic error info. | Object |
| max | The maximum detected error across all forecasts. | Float |
| mean | The mean (average) error across all forecasts. | Float |
| min | The minimum detected error across all forecasts. | Float |

Advanced Output

{
  "complete_data_path": "data://.my/example_collection/apidata_v0.2.5_eval.json",
  "summary": {
    "error": {
      "max": {
        "id": "34d5eadf-b112-48aa-955f-078b5796e64b",
        "value": 0.019366085544574863
      },
      "mean": 0.008697436763929537,
      "min": {
        "id": "32bd3a8e-c588-4d53-8a92-fb28be16192b",
        "value": 0.0007278606148913183
      },
      "std": 0.0058660767961214375
    },
    "exec_time": {
      "max": 1.7069735527038574,
      "mean": 0.9620259523391724,
      "min": 0.23518633842468264,
      "std": 0.5176264802321938
    }
  }
}
| Parameter | Description | Type |
| --- | --- | --- |
| complete_data_path | The data collection URI pointing to where the Complete Evaluation Data file is located. | String |
| summary | The forecast summary object. | Object |
| summary/error | This algorithm measures the normalized mean absolute error between forecasts and their expected targets. | Object |
| summary/error/mean | The mean (average) error across all forecasts. | Float |
| summary/error/std | The standard deviation of the error across all forecasts. A larger than normal std might hint at an anomaly in the data. | Float |
| summary/error/(max or min)/id | The UUID of the forecast where the (max or min) error was detected. | String |
| summary/error/(max or min)/value | The (max or min) error value for the evaluation; refer to the UUID to find the related forecast in the Complete Evaluation Data file. | Float |
| summary/exec_time | This algorithm measures algorithm execution time, which can be incredibly useful as a metric to compare performance between algorithms. | Object |
| summary/exec_time/max | The maximum execution time measured over all forecast algorithm requests. | Float |
| summary/exec_time/mean | The mean (average) execution time measured across all forecast algorithm requests. | Float |
| summary/exec_time/min | The minimum execution time measured over all forecast algorithm requests. | Float |