sine wave splash

GenerativeForecast is a multivariate, incrementally trainable auto-regressive forecasting algorithm.


What does all that mean? Let's break it down.

  • Multivariate - The algorithm can be trained to forecast multiple independent variables at once. This is useful for forecasting real world events like earthquakes, along with more economically rewarding activities like asset price prediction.
  • Incrementally trainable - The algorithm can be trained on new data without starting from scratch. You can automatically update your model or models on a daily, hourly, or per-minute basis with new data, then use them to quickly forecast the next N steps. You also don't have to update your model before making a forecast: you can pass new data into the 'forecast' method and it will update the model state without running a backpropagation operation.
  • Auto-regressive - Forecasting the future can be tricky, particularly when you aren't sure how far into the future you wish to look. This algorithm uses its previously predicted data points to help understand what the future looks like. For more information check out this post.
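To make the auto-regressive idea concrete, here is a minimal sketch of the feedback loop. It is not the algorithm's actual implementation: `predict_next` stands in for the trained network, and `last_difference` is a hypothetical toy model used only so the example runs.

```python
from typing import Callable, List


def autoregressive_forecast(
    history: List[float],
    predict_next: Callable[[List[float]], float],
    steps: int,
) -> List[float]:
    """Forecast `steps` points by feeding each prediction back in as input."""
    window = list(history)  # copy so the caller's data is left untouched
    forecast = []
    for _ in range(steps):
        next_value = predict_next(window)
        forecast.append(next_value)
        window.append(next_value)  # the prediction becomes part of the input
    return forecast


def last_difference(window: List[float]) -> float:
    """Toy stand-in model (hypothetical): continue the most recent linear trend."""
    return window[-1] + (window[-1] - window[-2])


print(autoregressive_forecast([1.0, 2.0, 3.0], last_difference, steps=3))
```

Because every step after the first depends on earlier predictions, errors can compound, which is one reason a forecast envelope is more informative than a single line.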

Let's get started figuring out how this all works.

Getting Started Guide

This algorithm has two main modes, forecast and train. To create a forecast you need to create a checkpoint model, which you can make by running a train operation over your input data. The pipelines you create can be somewhat complex here so we're going to go over everything as much as we can.

If at any time you are unsure as to what a particular variable does, be sure to take a look at the IO Schema at the bottom of this description.


First, let's look at the train mode and how to get set up.

First time training

When training a model on your data for the first time, there are some important things to consider.

  • First and foremost, the data must be a file in csv format.
  • In this csv file each column denotes an independent variable, and each row denotes a data point.
  • Your data should be continuous; stepwise operators make training more difficult.
  • If you'd like your variables to be described in forecasts, be sure to start your training data csv with headers that define your input.
  • Each point in your dataset must be in temporal order.

Let's take a quick look with an example of a sine curve:

initial training data for sine curve dataset
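A file in this shape could be produced with a short script like the one below. This is only a sketch: the column name `sine`, the step size, and the helper names are illustrative choices, not requirements of the algorithm.

```python
import csv
import io
import math


def sine_rows(n_points: int, step: float = 0.1):
    """Yield one-column rows of a sine curve, already in temporal order."""
    for i in range(n_points):
        yield [math.sin(i * step)]


def write_training_csv(handle, header, rows):
    """Write a header row followed by one data point per row."""
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(rows)


# An in-memory buffer keeps the example self-contained; swap it for
# open("sine.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
write_training_csv(buffer, ["sine"], sine_rows(5))
print(buffer.getvalue())
```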

Simple, right? Let's also explore another dataset with two independent variables (this one is based on bitcoin price and transaction volume):

initial training data for bitcoin dataset

Notice the headers? You only have to define headers when training a brand new model; the network file itself stores your headers to keep things simple. Don't worry if all your csv data has headers, our algorithm is smart enough to figure that out! What if you don't have headers? No problem, the algorithm falls back to default variable names when they're missing.
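How the algorithm detects headers internally is not documented here. One simple heuristic, shown purely as an assumption, is to treat the first row as a header whenever any of its cells fails to parse as a number. The `variable_N` default names below are hypothetical.

```python
import csv
import io


def split_header(csv_text: str):
    """Return (variable_names, numeric_rows) for a csv string."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return [], []
    first = rows[0]
    try:
        # If every cell parses as a float, assume the row is data.
        for cell in first:
            float(cell)
        has_header = False
    except ValueError:
        has_header = True
    if has_header:
        names, data_rows = first, rows[1:]
    else:
        # Hypothetical default names; the real defaults are not documented.
        names = [f"variable_{i}" for i in range(len(first))]
        data_rows = rows
    data = [[float(cell) for cell in row] for row in data_rows]
    return names, data


print(split_header("price,volume\n1.0,2.0\n3.0,4.0\n"))
```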

Some important initial training parameters to consider:

  • layer_width - Defines how much knowledge your model is able to grasp. It's tough to overfit with our data augmentation strategies, so using a larger number here for challenging datasets can certainly help.
  • attention_width - Large attention vectors can help improve your model's recall of events, but potentially at the expense of learning how to manage that internally in the LSTM layers. In the initial release this is a hard attention vector pointed at the last N steps, where N is the width of the vector. This may change in the future.
  • future_beam_width - This provides us a tool to force the model to predict future events besides just the first step. We use a custom loss function to take large beams into consideration.

That is all we need to define our model. Here are the remaining training settings that need to be defined. They mostly pertain to the training process itself and can be adjusted in future training steps:

  • input_dropout - How can we make sure that our training process is kept on task? We do this by stochastically forcing the model to take its own predictions as input! This keeps training focused on auto-regressive forecasting and, along with io_noise, prevents overfitting.
  • io_noise - We add Gaussian noise to the network's input and output during training, and the noise can be kept during forecasting as well. We do this to prevent overfitting, and to force the model to learn large scale trends rather than micro-fluctuations. For most tasks 0.04, or 4% noise, is sufficient.
  • checkpoint_output_path - The all important output path! We recommend a checkpoint name that contains version information and dataset information so you don't accidentally overwrite or misplace an important checkpoint in the future.
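The interaction of input_dropout and io_noise can be sketched as follows. This is an interpretation of the description above rather than the actual training code: it assumes that "dropping" an input means replacing the true value with the model's own prediction, and that io_noise is the standard deviation of zero-mean Gaussian noise applied to normalized values.

```python
import random
from typing import List


def corrupt_inputs(
    true_inputs: List[float],
    predictions: List[float],
    input_dropout: float,
    io_noise: float,
    rng: random.Random,
) -> List[float]:
    """Mix ground truth with model predictions, then add Gaussian noise."""
    mixed = []
    for truth, predicted in zip(true_inputs, predictions):
        # With probability input_dropout, feed the model its own prediction.
        value = predicted if rng.random() < input_dropout else truth
        # Assumed noise model: zero mean, standard deviation io_noise.
        mixed.append(value + rng.gauss(0.0, io_noise))
    return mixed


rng = random.Random(0)  # seeded so repeated runs give the same result
print(corrupt_inputs([0.1, 0.2, 0.3], [0.15, 0.25, 0.35], 0.45, 0.04, rng))
```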
Example IO





For our initial training we specify all network definition parameters, along with a checkpoint output path and a data path. Take note of that saved file path; we're going to need it later.
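For illustration only, the sketch below assembles such an input in Python. The `mode` key, the helper name, and the example data:// URIs are assumptions rather than confirmed parts of the API; the parameter names and default values are taken from the Training Table.

```python
import json


def build_initial_training_input(data_path, checkpoint_output_path, **overrides):
    """Assemble a first-time training input, starting from documented defaults."""
    payload = {
        "mode": "train",  # assumed key name; check the IO Schema
        "data_path": data_path,
        "checkpoint_output_path": checkpoint_output_path,
        # Defaults as listed in the Training Table.
        "iterations": 10,
        "layer_width": 51,
        "attention_width": 25,
        "future_beam_width": 10,
        "input_dropout": 0.45,
        "io_noise": 0.04,
    }
    unknown = set(overrides) - set(payload)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload.update(overrides)
    return payload


training_input = build_initial_training_input(
    "data://example_user/forecast_demo/sine.csv",          # hypothetical URI
    "data://example_user/forecast_demo/sine_v0.1.0.ckpt",  # hypothetical URI
    layer_width=64,
)
print(json.dumps(training_input, indent=2))
```

Encoding the version and dataset in the checkpoint name, as recommended above, makes it harder to overwrite a model by accident.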

Incremental Training

So you have a model that you've already trained, and it's been giving you great forecasts. But you've noticed new trends evolving in your time series that your model isn't able to predict. Wouldn't it be great if there was a way to incrementally update your model? There is! :smile:

When you already have a trained model, you can incrementally retrain it by simply providing that model's URI under the checkpoint_input_path key in your input. That's it! All network definition parameters are preserved, so there's no need to write them all out again. Here is a simple list of parameters you can adjust during incremental training:

  • input_dropout - How can we make sure that our training process is kept on task? We do this by stochastically forcing the model to take its own predictions as input. This keeps training focused on auto-regressive forecasting and, along with io_noise, prevents overfitting.
  • io_noise - We add Gaussian noise to the network's input and output during training, and the noise can be kept during forecasting as well. We do this to prevent overfitting, and to force the model to learn large scale trends rather than micro-fluctuations. For most tasks 0.04, or 4% noise, is sufficient.
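An incremental training input might be assembled along these lines. As before, this is a sketch under assumptions: the `mode` key, the helper name, and the URIs are hypothetical, and the inclusion of a checkpoint_output_path for the updated model is assumed. The helper rejects network definition parameters, since those are carried over from the existing checkpoint.

```python
NETWORK_DEFINITION_KEYS = {"layer_width", "attention_width", "future_beam_width"}


def build_incremental_training_input(
    checkpoint_input_path, checkpoint_output_path, data_path, **adjustments
):
    """Assemble an incremental training input that reuses an existing model."""
    fixed = NETWORK_DEFINITION_KEYS & set(adjustments)
    if fixed:
        # The network definition is stored in the checkpoint and preserved.
        raise ValueError(f"cannot change network definition: {sorted(fixed)}")
    payload = {
        "mode": "train",  # assumed key name; check the IO Schema
        "checkpoint_input_path": checkpoint_input_path,
        "checkpoint_output_path": checkpoint_output_path,
        "data_path": data_path,
    }
    payload.update(adjustments)
    return payload


update_input = build_incremental_training_input(
    "data://example_user/forecast_demo/sine_v0.1.0.ckpt",   # hypothetical URI
    "data://example_user/forecast_demo/sine_v0.1.1.ckpt",   # hypothetical URI
    "data://example_user/forecast_demo/sine_new_rows.csv",  # hypothetical URI
    io_noise=0.05,
)
print(update_input)
```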

Example IO





And just like that you've updated your model to detect new trends.


So you've trained a model and now you want to start exploring your data. Let's take a look at how to make forecasts.

There are two ways to create a forecast: by using an up-to-date model, or by incrementally updating an existing model (no gradient updates) with new data. We call these two methods tip-of-checkpoint forecasting and incremental update forecasting. What's the difference? Simply by providing a data_path URL you're automatically telling the algorithm you'd like to create an incremental update forecast! Let's take a look at some key parameters for forecasting:

  • forecast_size - Defines the number of steps into the future to forecast.
  • iterations - Defines the number of independently calculated forecast operations to perform. Each forecast is initialized by perturbing the memory state of the checkpoint with io_noise, which generates a Monte Carlo forecast envelope.
  • io_noise - Defines how much noise is added to the initial memory state and the attention vector. Larger values force the network to deviate faster but may result in a more accurate forecast.
  • graph_save_path - If you'd like to have a pretty graph output like above, provide a data API URI here. Graphical output can be very useful for diagnosing and visualizing training issues.
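The rule that data_path alone selects the forecasting method can be sketched like this. The `mode` key and both helper names are assumptions for illustration; the remaining keys are the parameters described above and in the IO Schema.

```python
def build_forecast_input(
    checkpoint_input_path,
    forecast_size,
    io_noise,
    iterations=10,  # default from the Forecasting Table
    data_path=None,
    graph_save_path=None,
):
    """Assemble a forecast input; optional keys are omitted when unset."""
    payload = {
        "mode": "forecast",  # assumed key name; check the IO Schema
        "checkpoint_input_path": checkpoint_input_path,
        "forecast_size": forecast_size,
        "iterations": iterations,
        "io_noise": io_noise,
    }
    if data_path is not None:
        payload["data_path"] = data_path
    if graph_save_path is not None:
        payload["graph_save_path"] = graph_save_path
    return payload


def forecast_kind(payload):
    """Providing data_path is what requests an incremental update forecast."""
    return "incremental update" if "data_path" in payload else "tip-of-checkpoint"


checkpoint = "data://example_user/forecast_demo/sine_v0.1.0.ckpt"  # hypothetical
tip = build_forecast_input(checkpoint, forecast_size=10, io_noise=0.05, iterations=25)
incremental = build_forecast_input(
    checkpoint,
    forecast_size=10,
    io_noise=0.05,
    iterations=25,
    data_path="data://example_user/forecast_demo/sine_new_rows.csv",  # hypothetical
)
print(forecast_kind(tip), "/", forecast_kind(incremental))
```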

For more information, take a look at the Forecasting Table in the IO Schema.

Let's take a quick look at a tip-of-checkpoint example:

tip-of-checkpoint IO



    "forecast_size": 10,
    "iterations": 25,
    "io_noise": 0.05



In this example, the envelope coordinates are returned as multiple lists, each forecast_size in length.

Now let's take a look at an example with incremental update forecasting:

incremental update IO



    "forecast_size": 10,
    "iterations": 25,
    "io_noise": 0.05



The graphs are different! This is because when you pass a data_path as input, the algorithm automatically updates the model state to the end of that data. When incrementally updating, always ensure that your next dataset follows on sequentially from the previous dataset.

What happens if we reuse that saved model and run a tip-of-checkpoint forecast? Well let's find out!

inferred graph


It works! No backpropagation is required for simple updates like this. If your data's signals or trends do change or drift over time, it is highly recommended to run an incremental training operation periodically to ensure forecast accuracy.

IO Schema

Common Table

Parameter | Type | Description | Default if applicable
checkpoint_output_path | String | Defines the output path for your trained model file. | N/A
checkpoint_input_path | String | Defines the input path for your existing model file. | N/A
data_path | String | The data connector URI (data://, s3://, dropbox://, etc.) pointing to training or evaluation data. | N/A

Forecasting Table


Parameter | Type | Description | Default if applicable
iterations | Integer | The number of independent forecast operations used to create your Monte Carlo envelope. | 10
graph_save_path | String | The output path for your Monte Carlo forecast graph. | N/A


Parameter | Type | Description
checkpoint_output_path | String | This is the path you provided as checkpoint_output_path, useful as a reminder.
envelope | List[Envelope] | A list of Envelope objects, one for each dimension of your data; see below for more info.

Envelope Object

Each independent variable has its own Envelope object, with the variable name defined by variable.

Parameter | Type | Description
variable | String | The name of the variable for this dimension, defined during initial training from your csv header.
mean | List[Float] | The mean for each point in your forecast, for this variable.
standard_deviation | List[Float] | The standard deviation for each point in your forecast, for this variable.
first_deviation | Deviation | The upper and lower bounds for the first standard deviation from the mean, for this variable.
second_deviation | Deviation | The upper and lower bounds for the second standard deviation from the mean, for this variable.

Deviation Object

Parameter | Type | Description
upper_bound | List[Float] | The upper bound values for this deviation.
lower_bound | List[Float] | The lower bound values for this deviation.
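To show how these fields relate to one another, the sketch below builds an Envelope-shaped dictionary from a set of Monte Carlo forecasts for one variable. It is not the algorithm's own code: the function name is hypothetical, and the use of the population standard deviation is an assumption, since the exact estimator is not documented.

```python
from statistics import mean, pstdev
from typing import Dict, List


def build_envelope(variable: str, samples: List[List[float]]) -> Dict:
    """Summarize forecasts (one list per iteration) into an envelope."""
    per_step = list(zip(*samples))  # regroup values by forecast step
    means = [mean(step) for step in per_step]
    # Assumption: population standard deviation across iterations.
    deviations = [pstdev(step) for step in per_step]

    def band(k: int) -> Dict[str, List[float]]:
        """Bounds k standard deviations above and below the mean."""
        return {
            "upper_bound": [m + k * s for m, s in zip(means, deviations)],
            "lower_bound": [m - k * s for m, s in zip(means, deviations)],
        }

    return {
        "variable": variable,
        "mean": means,
        "standard_deviation": deviations,
        "first_deviation": band(1),
        "second_deviation": band(2),
    }


# Two iterations with forecast_size 2; the values are made up for illustration.
envelope = build_envelope("sine", [[1.0, 2.0], [3.0, 2.0]])
print(envelope["mean"], envelope["second_deviation"])
```

Every list in the result has forecast_size entries, one per forecast step.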

Training Table


Parameter | Type | Description | Default if applicable
iterations | Integer | Defines the number of iterations per epoch for training your model. Larger values make training take longer, but can yield better results. | 10
layer_width | Integer | Defines your model's layer width. Layer depth is automatically determined by the number of independent variables in your dataset. | 51
attention_width | Integer | Defines your network's hard attention beam width. Larger beams are useful for complex data models. | 25
future_beam_width | Integer | Similar to attention_width, but this defines how many steps we predict in one training step. | 10
input_dropout | Float | Defines the percentage of input that we "drop out" during training. | 0.45
io_noise | Float | Defines the percentage of Gaussian noise added to the training data to perturb the results. Both noise and input_dropout help the model generalize to future trends. | 0.04


Parameter | Type | Description
checkpoint_output_path | String | This is the path you provided as checkpoint_output_path, useful as a reminder.
final error | Float | The best generated model's error; the lower the better.



Frequently Asked Questions

Why are my forecast images always scaled between 0 and 1?

Great question! We do this so that multivariate graphs are on the same scale. If you have 2 or more independent variables it can be quite difficult to represent them in their original domain. Rest assured that the envelope returns denormalized data.
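The exact normalization used for the graphs is not stated here. A common choice that yields a 0 to 1 range is per-variable min-max scaling, sketched below as an assumption, together with the inverse step that recovers values in the original domain.

```python
from typing import List, Tuple


def fit_min_max(values: List[float]) -> Tuple[float, float]:
    """Record the range of one variable so scaling can be undone later."""
    return min(values), max(values)


def normalize(values: List[float], low: float, high: float) -> List[float]:
    """Scale values into [0, 1]; a constant series maps to all zeros."""
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


def denormalize(scaled: List[float], low: float, high: float) -> List[float]:
    """Map scaled values back to the variable's original domain."""
    return [low + s * (high - low) for s in scaled]


prices = [100.0, 150.0, 200.0]  # made-up values for illustration
low, high = fit_min_max(prices)
scaled = normalize(prices, low, high)
print(scaled, denormalize(scaled, low, high))
```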

How do I know what parameters to use for attention_width, future_beam_width, etc?

Unfortunately there is no one-size-fits-all solution here; it's highly dependent on your data! The default network parameter values work pretty well for us, but we recommend exploring your data by creating multiple initial models with different network parameters and seeing what works best for you.
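One way to organize that exploration is a small grid of candidate network definitions, each of which would be used for a separate initial training run. The helper name and the candidate values below are illustrative assumptions, not recommendations.

```python
from itertools import product
from typing import Dict, Iterable, Iterator


def candidate_network_settings(
    layer_widths: Iterable[int],
    attention_widths: Iterable[int],
    future_beam_widths: Iterable[int],
) -> Iterator[Dict[str, int]]:
    """Yield every combination of the three network definition parameters."""
    for layer, attention, beam in product(
        layer_widths, attention_widths, future_beam_widths
    ):
        yield {
            "layer_width": layer,
            "attention_width": attention,
            "future_beam_width": beam,
        }


# Illustrative candidate values only.
grid = list(candidate_network_settings([32, 64], [10, 25], [5, 10]))
for settings in grid:
    print(settings)
```

Comparing the final error reported by each training run gives a first indication of which combination suits the data.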

I know how well my model performs during training, but how can I calculate my model's generative forecast accuracy?

With input_dropout we can get pretty close to a real accuracy measurement, but for a more explicit calculation be on the lookout for a sister algorithm named ForecastAccuracy.
