Effect of window size in moving average

TL;DR: In this post I illustrate how the window size of a moving average affects the trend-cycle we extract from a time series.

When dealing with time series data, a very common task is to decompose the series into several components. Usually the series is split into three: a trend-cycle component (often just called the trend), a seasonal component, and a remainder component (sometimes called the random component, or noise). The idea is that by splitting the series into components we can more easily recognise (and name) specific patterns in our data, and knowing these patterns may allow us to make more accurate forecasts (e.g. we could use different forecasting methods for different components).

In this post we will focus on one particular way to estimate the trend-cycle component.

The purpose of the trend-cycle is to capture the main movement of the data without getting distracted by minor fluctuations, so it is smoother than the original series. A very common and simple technique for this is the moving average (see the book Forecasting: Principles and Practice, referenced at the end of this post, for the formal definition). The order of the moving average (in other words, its window size) determines how smooth the resulting curve is. This technique is most commonly used to estimate the trend-cycle of seasonal data.
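For reference, Forecasting: Principles and Practice (FPP2) writes a moving average of order m = 2k + 1 as the mean of the observation at time t and the k observations on either side of it, where y_t is the series and \hat{T}_t the trend-cycle estimate:

```latex
\hat{T}_t = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j}, \qquad m = 2k + 1
```

A larger m averages over more observations, which is why it produces a smoother curve.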

Choosing the order, or window size, of the moving average (MA) therefore determines how well we can tease out the trend-cycle component. A poorly chosen order will usually leave the trend-cycle estimate contaminated by the seasonality in the data.

Let’s see an example below.

Load example dataset

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

sns.set()

The code snippet below loads the Fremont cycling counts from Seattle used by Jake VanderPlas and resamples them to daily and weekly bike counts (rather than hourly).

# Hourly counts from the Fremont bike counter, indexed by timestamp
hourly = pd.read_csv('https://raw.githubusercontent.com/jakevdp/SeattleBike/master/FremontHourly.csv', index_col='Date', parse_dates=True)
hourly.columns = ['northbound', 'southbound']
hourly['total'] = hourly['northbound'] + hourly['southbound']
# Aggregate the hourly counts into daily and weekly totals
daily = hourly.resample('D').sum()
weekly = hourly.resample('W').sum()
daily['total'].plot(figsize=(20,10))
plt.title('Number of bikes in Seattle per day')
plt.show()

[Figure: Number of bikes in Seattle per day]
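If you want to check what the resampling does without downloading the dataset, here is a minimal sketch on a made-up hourly series (one bike per hour for two weeks starting on Monday 1 January 2024; the data is purely hypothetical). Summing per day gives 24 per day, and pandas' 'W' alias (short for 'W-SUN') produces weekly bins that end on Sunday and are labelled with that Sunday's date.

```python
import pandas as pd

# Hypothetical hourly counts: one bike per hour for 14 days, starting Monday 2024-01-01
index = pd.date_range('2024-01-01', periods=14 * 24, freq=pd.offsets.Hour())
toy_hourly = pd.Series(1, index=index)

# Same aggregation as above: sum the hourly counts per day and per week
toy_daily = toy_hourly.resample('D').sum()
toy_weekly = toy_hourly.resample('W').sum()

print(toy_daily.head(3))
print(toy_weekly)
```

If a dataset does not start on a Monday, the first weekly bin only covers part of a week, so its total will be lower than that of a full week.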

A plot of the data already reveals a weekly pattern. Hence, a moving average with a window size of 7 (one full week of daily values) seems suitable for capturing the trend-cycle.
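Instead of relying only on the eye, one way to check the period is to compare the autocorrelation of the daily series at different lags; the lag with the highest autocorrelation is a natural candidate for the window size. The sketch below uses a synthetic series as a hypothetical stand-in for daily['total'] (the weekday and weekend levels, the mild trend and the noise are all made up), so that it runs without the download.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical daily counts: busy weekdays, quieter weekends, mild upward trend, noise
days = pd.date_range('2024-01-01', periods=20 * 7, freq='D')
weekday_factor = np.where(days.dayofweek < 5, 1.0, 0.5)
trend = 1000 * (1 + 0.001 * np.arange(len(days)))
series = pd.Series(trend * weekday_factor + rng.normal(0, 20, len(days)), index=days)

# Autocorrelation for lags of 1 to 10 days
autocorrs = {lag: series.autocorr(lag=lag) for lag in range(1, 11)}
best_lag = max(autocorrs, key=autocorrs.get)
print(best_lag, round(autocorrs[best_lag], 3))
```

On the real data you would call daily['total'].autocorr(lag=...) in the same way.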

Effect of wrong window size

What if we choose a window size that is not aligned with the weekly pattern? The plot below shows both variants: window size 7 on top, window size 5 at the bottom.

fig, axes = plt.subplots(2, 1, figsize=(20,10))
daily['total'].plot(ax=axes[0], title='Moving average window = 7')
daily['total'].rolling(7, center=True).mean().plot(ax=axes[0])
daily['total'].plot(ax=axes[1], title='Moving average window = 5')
daily['total'].rolling(5, center=True).mean().plot(ax=axes[1])
axes[0].set_xlabel('')
axes[1].set_xlabel('')
plt.show()

[Figure: Moving average window = 7 (top) and Moving average window = 5 (bottom)]

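The bottom graph clearly shows more seasonal fluctuation, while the top graph captures the trend smoothly. With a window of 7, every window contains exactly one of each weekday, so the weekly pattern averages out; a 5-day window covers a different mix of weekdays and weekend days depending on where it is centered, so part of the weekly pattern leaks into the trend estimate.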
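The same effect can be reproduced on a purely periodic signal. In this minimal sketch (a made-up weekly pattern with no trend and no noise), the ideal trend-cycle is a flat line: the 7-day moving average recovers exactly that, while the 5-day moving average still oscillates with the weekly period.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly pattern repeated for 10 weeks: five busy days, two quiet days
pattern = np.array([3000, 3000, 3000, 3000, 3000, 1500, 1500], dtype=float)
seasonal_only = pd.Series(np.tile(pattern, 10))

for window in (7, 5):
    ma = seasonal_only.rolling(window, center=True).mean().dropna()
    # A perfect trend estimate would be constant, so any spread is seasonal leakage
    print(window, round(ma.min(), 1), round(ma.max(), 1))
```

With the 5-day window the average swings between 2400 (windows containing both quiet days) and 3000 (windows covering the five busy days), which is the kind of weekly ripple seen in the bottom panel.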

Symmetric window sizes

With our window size of 7 we are creating a symmetric window: for a given point, the window extends 3 points to the left and 3 points to the right. FPP2 (the second edition of Forecasting: Principles and Practice) makes the point that symmetric windows are preferred. If the seasonal period is even (e.g. quarterly data, where the seasonal period is 4), the book suggests taking a moving average of a moving average. For quarterly data we could use a 2x4-MA: we first compute an MA with window 4 and then apply another MA with window 2 to the result. A 4-MA on its own cannot be centered on an observation, but the combined 2x4-MA is symmetric again. From the book:

In general, a 2×m-MA is equivalent to a weighted moving average of order m+1 where all observations take the weight 1/m, except for the first and last terms which take weights 1/(2m). So, if the seasonal period is even and of order m, we use a 2×m-MA to estimate the trend-cycle. If the seasonal period is odd and of order m, we use a m-MA to estimate the trend-cycle. For example, a 2×12-MA can be used to estimate the trend-cycle of monthly data and a 7-MA can be used to estimate the trend-cycle of daily data with a weekly seasonality.
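The equivalence in the quote can be checked numerically. This sketch convolves the two uniform kernels to obtain the 2x4-MA weights, then applies the MA in two steps with pandas and compares the result with a single weighted 5-term MA. The quarterly series is hypothetical: a linear trend plus a period-4 pattern that sums to zero.

```python
import numpy as np
import pandas as pd

m = 4  # even seasonal period, e.g. quarterly data

# Weights of a 4-MA followed by a 2-MA: convolving the two uniform kernels
# gives 1/8, 1/4, 1/4, 1/4, 1/8 (1/(2m) at both ends, 1/m elsewhere)
weights = np.convolve(np.ones(m) / m, np.ones(2) / 2)

# Hypothetical quarterly series: linear trend plus a zero-sum period-4 pattern
t = np.arange(20)
trend = 10 + 0.5 * t
series = pd.Series(trend + np.tile([3.0, -1.0, -4.0, 2.0], 5))

# Two-step 2x4-MA with trailing windows, shifted back by m/2 to center it
two_step = series.rolling(m).mean().rolling(2).mean().shift(-(m // 2))

# Single symmetric weighted (m+1)-MA centered on each observation
weighted = series.rolling(m + 1, center=True).apply(lambda w: np.dot(w, weights), raw=True)

print(pd.DataFrame({'series': series, '2x4-MA': two_step, 'weighted 5-MA': weighted}).head(8))
```

Because the seasonal pattern sums to zero over one period and the weights are symmetric, the 2x4-MA removes the pattern completely and returns the linear trend exactly wherever it is defined.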

Detrend with the trend-cycle component

What do we actually do once we have a good moving average estimate of the trend-cycle? We could, for example, detrend the original time series with it. How to do this properly will be the subject of another post, but for now let's assume a multiplicative model, meaning the components are combined by multiplication (y_t = T_t × S_t × R_t). To remove one of the components from the series, we therefore divide the series by it. The plot below shows the detrended data:

(daily['total'] / daily['total'].rolling(7, center=True).mean()).plot(title='Detrended data', figsize=(20, 10))
plt.show()

[Figure: Detrended data]

Note that the detrended data is centered around 1 and no longer shows the trend of the original data (since we have removed it). The weekly pattern is still present because we haven't deseasonalised the data (which we won't do here, to keep this post focused).
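To see why the detrended values hover around 1, it helps to try the same division on a series whose components we know. In this sketch the series is built multiplicatively as y_t = T_t × S_t × R_t from a made-up linear trend, weekly factors that average to 1, and small noise around 1. Dividing by the centered 7-MA leaves roughly S_t × R_t, so the ratio averages about 1 and its day-of-week means come back close to the weekly factors. Under an additive model you would subtract the moving average instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical components: 20 weeks of daily data starting on a Monday
days = pd.date_range('2024-01-01', periods=20 * 7, freq='D')
trend = 1000 + 2.0 * np.arange(len(days))                          # T_t
weekly_factors = np.array([1.1, 1.1, 1.1, 1.1, 1.1, 0.75, 0.75])  # mean exactly 1
seasonal = weekly_factors[days.dayofweek.to_numpy()]               # S_t
remainder = rng.normal(1.0, 0.02, len(days))                       # R_t
y = pd.Series(trend * seasonal * remainder, index=days)

# Detrend multiplicatively: divide by the centered 7-day moving average
detrended = y / y.rolling(7, center=True).mean()

# Average detrended value per weekday (0 = Monday); NaNs at the edges are skipped
profile = detrended.groupby(detrended.index.dayofweek).mean()
print(round(detrended.mean(), 3))
print(profile.round(3))
```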

Moving average for non-seasonal data

The weekly data we created in the first step sums all days of each week, so the day-of-week pattern disappears, and we don't see any other obvious periodic (seasonal) pattern. This makes the choice of window size less critical.

weekly['total'].plot(figsize=(20,10), title='Number of bikes in Seattle per week')
plt.show()

[Figure: Number of bikes in Seattle per week]

Let’s see what happens with a few arbitrarily chosen window sizes.

window_sizes = [13, 7, 3]
fig, axes = plt.subplots(len(window_sizes), 1, figsize=(20,14))

for ax_idx, window_size in enumerate(window_sizes):
    weekly['total'].plot(ax=axes[ax_idx])
    weekly['total'].rolling(window_size, center=True).mean().plot(ax=axes[ax_idx])

    axes[ax_idx].set_title('Moving average window = {}'.format(window_size))
    axes[ax_idx].set_xlabel('')

plt.show()

[Figure: Moving average window = 13, 7 and 3 (top to bottom)]

Since there is no periodic pattern to tease out here, the choice of window size is less crucial: there is no obvious seasonal artefact we need to avoid. The desired smoothness of the curve is still a consideration, though. In this case, a window size of 3 is probably not smooth enough.
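One way to put a number on "smooth enough" is the standard deviation of the period-to-period changes of the moving average: the smaller it is, the smoother the curve. The sketch below computes it for the window sizes used above on a hypothetical random-walk series standing in for weekly['total']. The helper name roughness is made up for illustration, and the metric is just one reasonable choice, not something taken from FPP2.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical weekly series: a random walk plus measurement noise
steps = rng.normal(0, 100, 2000)
series = pd.Series(20000 + np.cumsum(steps) + rng.normal(0, 300, 2000))

def roughness(values, window):
    """Hypothetical helper: std of the first differences of a centered MA (lower = smoother)."""
    ma = values.rolling(window, center=True).mean()
    return ma.diff().std()

for window in (13, 7, 3):
    print(window, round(roughness(series, window), 1))
```

On the real data, roughness(weekly['total'], w) can be compared across candidate windows alongside the plots.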

What happens if we use these estimated trend-cycles to detrend the original data?

fig, axes = plt.subplots(len(window_sizes), 1, figsize=(20,14))

for ax_idx, window_size in enumerate(window_sizes):
    (weekly['total'] / weekly['total'].rolling(window_size, center=True).mean()).plot(
        ax=axes[ax_idx],
        title='Detrended with MA window = {}'.format(window_size)
    )
    axes[ax_idx].set_xlabel('')

plt.show()

[Figure: Detrended with MA window = 13, 7 and 3 (top to bottom)]

The difference between window sizes 7 and 13 doesn’t seem very large. Just keep in mind that the larger the window, the bigger the gaps at the edges of your dataset, because near the edges there are not enough data points for a full window (with a window size of 13 you get no estimates for the first and last 6 data points).
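The size of these gaps follows directly from the window: a centered window of odd size k needs (k - 1) / 2 neighbours on each side, so that many values are missing at each end. This sketch counts them on a hypothetical series of 50 points. If you do want values at the edges, pandas' rolling also accepts a min_periods argument that lets the average use a partial window there, at the cost of edge estimates based on fewer, asymmetric points.

```python
import numpy as np
import pandas as pd

# Hypothetical series of 50 points (the values themselves do not matter here)
series = pd.Series(np.arange(50, dtype=float))

for window in (13, 7, 3):
    ma = series.rolling(window, center=True).mean()
    leading = ma.first_valid_index()                # NaNs before the first estimate
    trailing = len(ma) - 1 - ma.last_valid_index()  # NaNs after the last estimate
    # Partial windows at the edges: min_periods=1 fills every position
    ma_partial = series.rolling(window, center=True, min_periods=1).mean()
    print(window, leading, trailing, ma_partial.isna().sum())
```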

This notebook draws heavily on the book 'Forecasting: Principles and Practice' (2nd edition), Section 6.2. You can view and download this post as a Jupyter notebook on GitHub Gist.