chrisrobbins157 / MeanReversionTest / 0.2.0

Uses
Mean reversion analysis can be applied to any time series. The question this algorithm attempts to answer is: "Given a value in a dataset, to what extent does its distance above or below its average influence the value of the data in the future?" The data's percentage above / below its average is treated as the independent variable and the future outcome as the dependent variable. This type of analysis can be used in statistical arbitrage strategies, where traders look for investments, or combinations of investments, that exhibit mean-reverting characteristics and are trading at a meaningful distance from their average.

What it Does
This algorithm tests a time series to determine whether it is mean reverting and reports how far the most recent value is from the mean. The user supplies three inputs: the time series, a lag period, and a forward period. The lag period sets the window of the moving average, i.e. the time frame over which the average is examined for mean reversion. The forward period is the length of time over which the series is assumed to be influenced by its distance from the mean; forward performance is measured over this horizon. Input is entered in the format [[time series data], lag period, forward period].
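As an illustration of the input format only, the sketch below unpacks a request of the form [[time series data], lag period, forward period]. The function name `unpack_input` and its validation rules (positive integer periods, at least two overlapping observations) are hypothetical assumptions, not documented behavior of the algorithm.

```python
from typing import List, Sequence, Tuple


def unpack_input(payload: Sequence) -> Tuple[List[float], int, int]:
    """Split [[time series data], lag period, forward period] into its parts.

    Hypothetical helper: the checks below are illustrative assumptions.
    """
    if len(payload) != 3:
        raise ValueError("expected [[time series data], lag period, forward period]")
    raw_series, lag, forward = payload
    series = [float(x) for x in raw_series]
    lag, forward = int(lag), int(forward)
    if lag < 1 or forward < 1:
        raise ValueError("lag and forward periods must be positive integers")
    # Number of time steps that have both a trailing average and a forward value;
    # a correlation needs at least two of them.
    if len(series) - lag - forward + 1 < 2:
        raise ValueError("time series is too short for these periods")
    return series, lag, forward
```

For the worked example in this document, `unpack_input([[1, 2, 3, 4, 5, 6, 5, 4, 3, 4, 5], 3, 2])` returns the series as floats together with `3` and `2`.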

How it Works
A moving average of the time series is constructed over the lag period, and the percentage by which each value sits above or below that average is recorded. The forward percent change over the forward period is also recorded for each value in the time series.
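A minimal sketch of these two steps, assuming a simple (equal-weight) trailing moving average and strictly positive data. The function names are hypothetical and not part of the algorithm's interface. Percentages are expressed as fractions (0.25 means 25%), matching the worked example in this document.

```python
from typing import List, Sequence


def trailing_average(series: Sequence[float], lag: int) -> List[float]:
    """Equal-weight average of the `lag` values ending at each index from lag - 1 on."""
    return [sum(series[i - lag + 1 : i + 1]) / lag for i in range(lag - 1, len(series))]


def pct_from_average(series: Sequence[float], lag: int) -> List[float]:
    """Fraction above (+) or below (-) the trailing average (assumes nonzero averages)."""
    averages = trailing_average(series, lag)
    # averages[k] belongs to series[k + lag - 1]
    return [series[k + lag - 1] / avg - 1 for k, avg in enumerate(averages)]


def forward_change(series: Sequence[float], forward: int) -> List[float]:
    """Percent change from each value to the value `forward` steps later (assumes nonzero values)."""
    return [series[i + forward] / series[i] - 1 for i in range(len(series) - forward)]
```

Note that the two output lists start at different points in time: the first deviation belongs to index `lag - 1`, while the first forward change belongs to index 0.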

A traditional regression analysis is then run on these two datasets, the percentage above / below the average and the forward percent change, using only the periods where both have an observation. A strong negative correlation between the two indicates a mean-reverting time series: values far above the average tend to be followed by declines, and values far below it by gains.
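Assuming the regression is an ordinary least-squares fit with a single predictor, its r is the Pearson correlation of the overlapping pairs and r-squared is simply r squared. The notation here is introduced for illustration and is not the algorithm's own: $d_t$ is the percentage above / below the average and $f_t$ the forward percent change at time $t$, $L$ is the lag period, $F$ the forward period, and the $n$ observations are indexed from 0, so $T$ is the set of times that have both values.

```latex
T = \{\, t : L - 1 \le t \le n - F - 1 \,\}, \qquad
r = \frac{\sum_{t \in T} (d_t - \bar{d})(f_t - \bar{f})}
         {\sqrt{\sum_{t \in T} (d_t - \bar{d})^2}\,\sqrt{\sum_{t \in T} (f_t - \bar{f})^2}},
\qquad R^2 = r^2
```

Here $\bar{d}$ and $\bar{f}$ are the means over $T$. With $n = 11$, $L = 3$ and $F = 2$, $T$ covers $t = 2, \dots, 8$, i.e. seven pairs.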

The result contains four values in the following format: [r, r-squared, % above / below average, z-score].

r: The correlation coefficient of the regression. If a mean-reverting relationship exists, this number will be negative.

r-squared: The coefficient of determination, which measures how much of the variation in forward performance over the forward period is explained by the percentage distance from the lag-period average. It lies between 0 and 1: a value of 0 indicates no explanatory power, and values closer to 1 indicate a stronger relationship.

% above / below average: The percentage distance from the average for the last observation in the time series. If a mean-reverting relationship is present, the opposite of this value (same magnitude, sign flipped) approximates the expected forward percent change during the next forward period.

z-score: The number of standard deviations the last observation in the time series lies above or below the average.
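The last two output values describe only the final observation. The worked example's z-score (1.2247 for a last value of 5 against the trailing window 3, 4, 5) is consistent with a population standard deviation taken over the same lag window, so the sketch below assumes that; the helper name `last_observation_stats` is hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def last_observation_stats(series: Sequence[float], lag: int) -> Tuple[float, float]:
    """Return (% above/below the trailing average, z-score) for the last value."""
    window = series[-lag:]                 # the `lag` most recent values
    mean = statistics.fmean(window)
    # Population standard deviation (divides by lag, not lag - 1); this choice
    # is inferred from the worked example, not documented.
    std = statistics.pstdev(window)
    last = series[-1]
    pct = last / mean - 1
    # Hypothetical convention: a flat window has no defined z-score.
    z = (last - mean) / std if std > 0 else float("nan")
    return pct, z
```

A sample standard deviation (`statistics.stdev`) would give a z-score of 1.0 for the example instead of 1.2247.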

Example
Suppose we have the daily observations [1, 2, 3, 4, 5, 6, 5, 4, 3, 4, 5] and want to test whether the data tend to revert to their three day average using a two day forward period. The algorithm's first step is to compute the trailing three day average, which here is [2, 3, 4, 5, 5.33, 5, 4, 3.67, 4]. The percentage above or below the moving average is then calculated for each point that has an average. Because the trailing period is three days, the first such point is the third observation, 3, whose average is 2, so the first item recorded is (3 / 2) - 1 = 0.50. Repeating this for each item in the moving average list yields [0.5, 0.33, 0.25, 0.20, -0.0625, -0.2, -0.25, 0.09, 0.25].
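Two of the less obvious entries are easier to check as fractions: the fifth moving average (window 5, 6, 5, ending at a value of 5) and the eighth (window 4, 3, 4, ending at a value of 4), each with its percentage above / below the average.

```latex
\begin{aligned}
(5, 6, 5):&\quad \tfrac{5 + 6 + 5}{3} = \tfrac{16}{3} \approx 5.33, \qquad \tfrac{5}{16/3} - 1 = \tfrac{15}{16} - 1 = -0.0625 \\
(4, 3, 4):&\quad \tfrac{4 + 3 + 4}{3} = \tfrac{11}{3} \approx 3.67, \qquad \tfrac{4}{11/3} - 1 = \tfrac{12}{11} - 1 = \tfrac{1}{11} \approx 0.09
\end{aligned}
```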

The forward two day percentage change is also computed for each observation in the original dataset that has a value two days later. For example, the first number is (3 / 1) - 1 = 2. The resulting list of percentage changes is [2, 1, 0.67, 0.5, 0, -0.33, -0.4, 0, 0.67].

This yields two important datasets: the percentage above / below the average, [0.5, 0.33, 0.25, 0.20, -0.0625, -0.2, -0.25, 0.09, 0.25], and the forward percentage changes, [2, 1, 0.67, 0.5, 0, -0.33, -0.4, 0, 0.67]. Before the regression, these lists must be trimmed to the moments in time where both pieces of data exist. There is no percentage above / below the average for the first two observations in the original dataset, because no 3 period moving average exists yet, so the first two forward changes have no partner. Similarly, there is no forward percent change for the last two observations, because forward performance is measured two periods ahead, so the last two percentages above / below the average have no partner. The two lists are therefore adjusted to [0.5, 0.33, 0.25, 0.20, -0.0625, -0.2, -0.25] and [0.67, 0.5, 0, -0.33, -0.4, 0, 0.67]: the last two items are trimmed from the percentage above / below list and the first two items from the forward percentage change list.
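In the example, the lag period minus one and the forward period are both 2, which hides which trim depends on which parameter. In general, the first `lag - 1` forward changes and the last `forward` deviations have no partner. A minimal sketch of the alignment (the `align` helper is hypothetical):

```python
from typing import List, Sequence, Tuple


def align(deviations: Sequence[float], forward_changes: Sequence[float],
          lag: int, forward: int) -> Tuple[List[float], List[float]]:
    """Keep only the time steps that have both a deviation and a forward change.

    deviations[k] belongs to time k + lag - 1 (no average exists before that);
    forward_changes[t] belongs to time t (none exist for the last `forward` steps).
    """
    devs = list(deviations[: len(deviations) - forward])  # drop the last `forward` items
    fwds = list(forward_changes[lag - 1 :])                # drop the first `lag - 1` items
    if len(devs) != len(fwds):
        raise ValueError("input lengths do not match the lag and forward periods")
    return devs, fwds
```

With `lag=2` and `forward=1`, for instance, one item is dropped from the start of the forward changes and one from the end of the deviations, so the two trims are no longer the same size in general.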

A regression on these two datasets yields an r of 0.22 and an r-squared of 0.048. The percentage above the average for the last observation (0.25) and its number of standard deviations from the mean (1.22) are also computed. The output for this dataset is [0.2201, 0.0484, 0.25, 1.2247]. The positive r and the low r-squared indicate that this time series is not mean reverting.
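The whole pipeline can be sketched end to end as below. This is a hypothetical re-implementation, not the algorithm's source; it assumes an equal-weight trailing average, a one-predictor least-squares fit (so r is the Pearson correlation), and a population standard deviation over the last lag window for the z-score.

```python
import math
import statistics
from typing import List, Sequence


def mean_reversion_test(series: Sequence[float], lag: int, forward: int) -> List[float]:
    """Hypothetical sketch returning [r, r-squared, % above/below average, z-score]."""
    def trailing_avg(t: int) -> float:
        # Equal-weight average of the `lag` values ending at index t.
        return sum(series[t - lag + 1 : t + 1]) / lag

    # Indices that have both a trailing average and a value `forward` steps ahead.
    overlap = range(lag - 1, len(series) - forward)
    d = [series[t] / trailing_avg(t) - 1 for t in overlap]      # % above/below average
    f = [series[t + forward] / series[t] - 1 for t in overlap]  # forward % change

    # Pearson correlation; raises ZeroDivisionError if either list is constant.
    d_bar, f_bar = statistics.fmean(d), statistics.fmean(f)
    sxy = sum((a - d_bar) * (b - f_bar) for a, b in zip(d, f))
    sxx = sum((a - d_bar) ** 2 for a in d)
    syy = sum((b - f_bar) ** 2 for b in f)
    r = sxy / math.sqrt(sxx * syy)

    # Last observation versus its trailing window (population std, inferred from the example).
    window = series[-lag:]
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)
    last = series[-1]
    return [r, r * r, last / mu - 1, (last - mu) / sigma]
```

On the example data, `mean_reversion_test([1, 2, 3, 4, 5, 6, 5, 4, 3, 4, 5], 3, 2)` returns r ≈ 0.2201, r-squared ≈ 0.048, 0.25 and a z-score ≈ 1.2247, in line with the output above.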