Facebook Prophet: Diagnostics

example_wp_log_peyton_manning.csv

from fbprophet import Prophet
import pandas as pd
from matplotlib import pyplot as plt
import logging
logging.getLogger('fbprophet').setLevel(logging.ERROR)
import warnings
warnings.filterwarnings("ignore")
df = pd.read_csv('example_wp_log_peyton_manning.csv')
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=366)


library(ggplot2) # Required for Nextjournal plotting
library(prophet)
df <- read.csv('example_wp_log_peyton_manning.csv')
m <- prophet(df)
future <- make_future_dataframe(m, periods=366)


Prophet includes functionality for time series cross validation to measure forecast error using historical data. This is done by selecting cutoff points in the history and, for each of them, fitting the model using only the data up to that cutoff. We can then compare the forecasted values to the actual values. This figure illustrates a simulated historical forecast on the Peyton Manning dataset, where the model was fit to an initial history of 5 years and a forecast was made on a one-year horizon.

from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(
    m, '365 days', initial='1825 days', period='365 days')
cutoff = df_cv['cutoff'].unique()[0]
df_cv = df_cv[df_cv['cutoff'].values == cutoff]

fig = plt.figure(facecolor='w', figsize=(10, 6))
ax = fig.add_subplot(111)
ax.plot(m.history['ds'].values, m.history['y'], 'k.')
ax.plot(df_cv['ds'].values, df_cv['yhat'], ls='-', c='#0072B2')
ax.fill_between(df_cv['ds'].values, df_cv['yhat_lower'],
                df_cv['yhat_upper'], color='#0072B2',
                alpha=0.2)
ax.axvline(x=pd.to_datetime(cutoff), c='gray', lw=4, alpha=0.5)
ax.set_ylabel('y')
ax.set_xlabel('ds')
ax.text(x=pd.to_datetime('2010-01-01'),y=12, s='Initial', color='black',
       fontsize=16, fontweight='bold', alpha=0.8)
ax.text(x=pd.to_datetime('2012-08-01'),y=12, s='Cutoff', color='black',
       fontsize=16, fontweight='bold', alpha=0.8)
ax.axvline(x=pd.to_datetime(cutoff) + pd.Timedelta('365 days'), c='gray', lw=4,
           alpha=0.5, ls='--')
ax.text(x=pd.to_datetime('2013-01-01'),y=6, s='Horizon', color='black',
       fontsize=16, fontweight='bold', alpha=0.8);
fig


The Prophet paper describes simulated historical forecasts in more detail.

This cross validation procedure can be done automatically for a range of historical cutoffs using the cross_validation function. We specify the forecast horizon (horizon), and then optionally the size of the initial training period (initial) and the spacing between cutoff dates (period). By default, the initial training period is set to three times the horizon, and cutoffs are made every half a horizon.

The output of cross_validation is a dataframe with the true values y and the out-of-sample forecast values yhat, at each simulated forecast date and for each cutoff date. In particular, a forecast is made for every observed point between cutoff and cutoff + horizon. This dataframe can then be used to compute error measures of yhat vs. y.
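
As a minimal sketch of that last step, the following standard-library Python groups rows by cutoff and computes MAE and RMSE for each group. The rows are made-up illustrative values, not model output, and `errors_by_cutoff` is a hypothetical helper name rather than part of Prophet.

```python
import math
from collections import defaultdict

# Made-up rows shaped like the cross_validation output: (cutoff, y, yhat).
rows = [
    ("2010-02-15", 8.0, 9.0),
    ("2010-02-15", 8.0, 8.5),
    ("2010-08-14", 7.0, 7.5),
    ("2010-08-14", 9.0, 7.5),
]

def errors_by_cutoff(rows):
    """Return {cutoff: {"mae": ..., "rmse": ...}} for (cutoff, y, yhat) rows."""
    residuals = defaultdict(list)
    for cutoff, y, yhat in rows:
        residuals[cutoff].append(yhat - y)
    summary = {}
    for cutoff, res in residuals.items():
        mae = sum(abs(r) for r in res) / len(res)
        rmse = math.sqrt(sum(r * r for r in res) / len(res))
        summary[cutoff] = {"mae": mae, "rmse": rmse}
    return summary

for cutoff, stats in errors_by_cutoff(rows).items():
    print(cutoff, stats)
```

With a real cross-validation dataframe, the rows could be built with `zip(df_cv['cutoff'], df_cv['y'], df_cv['yhat'])`.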

Here we do cross-validation to assess prediction performance on a horizon of 365 days, starting with 730 days of training data in the first cutoff and then making predictions every 180 days. On this 8 year time series, this corresponds to 11 total forecasts.

df.cv <- cross_validation(m, initial = 730, period = 180, horizon = 365, units = 'days')
head(df.cv)


from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='730 days', period='180 days', horizon = '365 days')
df_cv.head()


	ds	yhat	yhat_lower	yhat_upper	y	cutoff
0	2010-02-16	8.957184179282013	8.430233017685936	9.482680698348524	8.24249315318763	2010-02-15
1	2010-02-17	8.723619014738189	8.224718688191926	9.274467092840466	8.00803284696931	2010-02-15
2	2010-02-18	8.607377994278602	8.127235591440513	9.136375796224975	8.0452677166078	2010-02-15
3	2010-02-19	8.52924959775511	8.018565052673274	9.017300803808713	7.9287663216267	2010-02-15
4	2010-02-20	8.271228206689234	7.753761029592097	8.754457835227155	7.745002803515839	2010-02-15


In R, the argument units must be a type accepted by as.difftime, which supports units of weeks or shorter. In Python, the strings for initial, period, and horizon should be in the format used by pandas Timedelta, which accepts units of days or shorter.
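
To see where the 11 forecasts come from, here is a simplified sketch of how cutoff dates could be laid out from the three durations. It assumes the series runs from 2007-12-10 to 2016-01-20 (an assumption about the dataset, not stated above). `simulated_cutoffs` is a hypothetical helper, not Prophet's own function, and it ignores details such as gaps in the data. It works backwards from the last date that still leaves a full horizon, stepping by period until less than initial of history would remain.

```python
from datetime import datetime, timedelta

def simulated_cutoffs(first_day, last_day, horizon, initial=None, period=None):
    """Return cutoff datetimes, oldest first, for a simulated historical forecast."""
    if initial is None:
        initial = 3 * horizon       # default described in the text
    if period is None:
        period = horizon / 2        # default described in the text
    cutoffs = []
    cutoff = last_day - horizon     # latest cutoff that leaves a full horizon
    while cutoff - initial >= first_day:
        cutoffs.append(cutoff)
        cutoff -= period
    return cutoffs[::-1]

# Assumed date range of the series (not taken from the text above).
cutoffs = simulated_cutoffs(
    first_day=datetime(2007, 12, 10),
    last_day=datetime(2016, 1, 20),
    horizon=timedelta(days=365),
    initial=timedelta(days=730),
    period=timedelta(days=180),
)
print(len(cutoffs))       # 11
print(cutoffs[0].date())  # 2010-02-15
```

Under these assumptions the earliest cutoff agrees with the cutoff column in the output above.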

The performance_metrics utility can be used to compute some useful statistics of the prediction performance (yhat, yhat_lower, and yhat_upper compared to y), as a function of the distance from the cutoff (how far into the future the prediction was). The statistics computed are mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), and coverage of the yhat_lower and yhat_upper estimates. These are computed on a rolling window of the predictions in df_cv after sorting by horizon (ds minus cutoff). By default 10% of the predictions will be included in each window, but this can be changed with the rolling_window argument.

df.p <- performance_metrics(df.cv)
head(df.p)


from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p.head()


	horizon	mse	rmse	mae	mape	coverage
0	37 days 00:00:00.000000000	0.4949883145519459	0.7035540594381827	0.5060049785293128	0.05864370722732544	0.6740520785746917
1	38 days 00:00:00.000000000	0.5007953027629376	0.7076689217161777	0.5109994369161246	0.05922248252782794	0.6740520785746917
2	39 days 00:00:00.000000000	0.5230079019314471	0.7231928525168422	0.5171091106760463	0.05982426227547436	0.6726815897670171
3	40 days 00:00:00.000000000	0.5302683094436705	0.7281952412943046	0.5200028729604774	0.06013396794645505	0.6763362265874827
4	41 days 00:00:00.000000000	0.5377061010034533	0.7332844611768705	0.5210076205416105	0.06021418358986944	0.6813613522156236
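
The rolling-window idea can be illustrated with a simplified sketch in standard-library Python. This is an approximation for intuition only, not Prophet's implementation: `rolling_mape` is a hypothetical helper, the rows are synthetic, and each window is a trailing block of predictions after sorting by horizon.

```python
def rolling_mape(rows, rolling_window=0.1):
    """rows: (horizon_days, y, yhat) tuples. Returns [(horizon_days, mape), ...]."""
    ordered = sorted(rows, key=lambda r: r[0])           # sort by horizon
    ape = [abs((y - yhat) / y) for _, y, yhat in ordered]
    w = max(1, int(rolling_window * len(ordered)))       # rows per window
    out = []
    for end in range(w, len(ordered) + 1):
        window = ape[end - w:end]
        # Report the mean at the largest horizon contained in the window.
        out.append((ordered[end - 1][0], sum(window) / w))
    return out

# Synthetic predictions whose error grows with the horizon.
rows = [(h, 10.0, 10.0 + 0.1 * h) for h in range(1, 11)]

for horizon, mape in rolling_mape(rows, rolling_window=0.2):
    print(horizon, round(mape, 4))
```

Because a window must fill before a value is reported, the first reported horizon is later than the first forecast day. This is consistent with the table above starting at 37 days, roughly 10% of a 365-day horizon.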


Cross validation performance metrics can be visualized with plot_cross_validation_metric, here shown for MAPE. Dots show the absolute percent error for each prediction in df_cv. The blue line shows the MAPE, where the mean is taken over a rolling window of the dots. We see for this forecast that errors around 5% are typical for predictions one month into the future, and that errors increase up to around 11% for predictions that are a year out.

plot_cross_validation_metric(df.cv, metric = 'mape')


from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='mape')
fig


The size of the rolling window in the figure can be changed with the optional argument rolling_window, which specifies the proportion of forecasts to use in each rolling window. The default is 0.1, corresponding to 10% of rows from df_cv included in each window; increasing this will lead to a smoother average curve in the figure.

The initial period should be long enough to capture all of the components of the model, in particular seasonalities and extra regressors: at least a year for yearly seasonality, at least a week for weekly seasonality, etc.