Forecasting Time Series Data With Prophet III

Originally published as Forecasting Time-Series data with Prophet - Part 3 at pythondata.com.

Introduction

This is the third in a series of posts about using Prophet to forecast time series data. Follow this link for parts 1 & 2 of Forecasting Time-Series Data With Prophet.

In those previous posts, I looked at forecasting monthly sales data 24 months into the future. In this post, I want to look at using the ‘holiday’ construct found within the Prophet library to better forecast around specific events. If we look at our sales data (you can find it here), there’s an obvious pattern each December. That pattern could exist for a variety of reasons, but let’s assume that it’s due to a promotion that is run every December.

Import necessary libraries

import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt
 
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')

Pandas' matplotlib date converters must be registered manually due to a conflict between Prophet and Pandas.

pd.plotting.register_matplotlib_converters()

Read in the data

Read the data in from the retail sales CSV file in the examples folder then set the index to the 'date' column. We are also parsing dates in the data file.

retail_sales.csv

sales_df = pd.read_csv('retail_sales.csv', index_col='date', parse_dates=True)

sales_df.head()

date	sales
2009-10-01	338630
2009-11-01	339386
2009-12-01	400264
2010-01-01	314640
2010-02-01	311022

Prepare for Prophet

As explained in previous Prophet posts, for Prophet to work, we need to change the names of these columns to ds and y. First, we move the date index back into a regular column.

df = sales_df.reset_index()

df.head()

	date	sales
0	2009-10-01	338630
1	2009-11-01	339386
2	2009-12-01	400264
3	2010-01-01	314640
4	2010-02-01	311022

Let's rename the columns as required by fbprophet. Additionally, fbprophet doesn't like the index to be a datetime. It wants to see ds as a non-index column, so we'll keep the default integer index.

df=df.rename(columns={'date':'ds', 'sales':'y'})

df.head()

	ds	y
0	2009-10-01	338630
1	2009-11-01	339386
2	2009-12-01	400264
3	2010-01-01	314640
4	2010-02-01	311022
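The reset and rename steps above can be folded into one small function. This is only a sketch: the helper name to_prophet_frame is hypothetical (it isn't part of Prophet or Pandas), and it assumes the dates live in the index, as they do in sales_df.

```python
import pandas as pd


def to_prophet_frame(frame, value_col):
    """Hypothetical helper: return a two-column (ds, y) frame with an integer index."""
    out = frame.reset_index()
    date_col = out.columns[0]  # reset_index() places the former index first
    out = out.rename(columns={date_col: "ds", value_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])  # make sure ds holds real datetimes
    return out[["ds", "y"]]


# The first three rows of the sales data shown earlier in this post.
sample = pd.DataFrame(
    {"sales": [338630, 339386, 400264]},
    index=pd.to_datetime(["2009-10-01", "2009-11-01", "2009-12-01"]).rename("date"),
)
prophet_ready = to_prophet_frame(sample, "sales")
print(prophet_ready)
```

The result keeps ds as a regular column and leaves the default integer index in place, which is the layout described above.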

Now's a good time to take a look at your data. Plot the data using Pandas' plot function.

df.set_index('ds').y.plot().figure

Reviewing the Data

We can see from this data that there is a spike in the same month each year. While this spike could be due to many different reasons, let's assume it's because there's a major promotion that this company runs every year at that time, which is in December for this dataset.

Because we know this promotion occurs every December, we want to use this knowledge to help Prophet better forecast those months, so we'll use Prophet's holiday construct (explained here).

The holiday construct is a Pandas dataframe with the holiday and date of the holiday. For this example, the construct would look like this:

promotions = pd.DataFrame({
  'holiday': 'december_promotion',
  'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                        '2013-12-01', '2014-12-01', '2015-12-01']),
  'lower_window': 0,
  'upper_window': 0,
})

This promotions dataframe consists of promotion dates for December in 2009 through 2015. The lower_window and upper_window values, which are measured in days, are set to zero so that Prophet applies the promotion effect only to the dates listed and not to the days around them.

promotions

	holiday	ds
0	december_promotion	2009-12-01
1	december_promotion	2010-12-01
2	december_promotion	2011-12-01
3	december_promotion	2012-12-01
4	december_promotion	2013-12-01
5	december_promotion	2014-12-01
6	december_promotion	2015-12-01
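Typing the dates by hand gets tedious, so here is a rough sketch of building the same kind of dataframe from a year range. The function name build_promotions is hypothetical, and extending the range to 2017 is my assumption for illustration. Prophet's documentation says the holidays dataframe should include future occurrences as well as past ones if the effect is meant to show up in the forecast.

```python
import pandas as pd


def build_promotions(start_year, end_year, month=12, name="december_promotion"):
    """Hypothetical helper: one holiday row per year, on the first day of `month`."""
    dates = pd.to_datetime(
        [f"{year}-{month:02d}-01" for year in range(start_year, end_year + 1)]
    )
    return pd.DataFrame({
        "holiday": name,
        "ds": dates,
        "lower_window": 0,  # days before the date that share the effect
        "upper_window": 0,  # days after the date that share the effect
    })


# 2009-2015 matches the dataframe above; 2016 and 2017 are assumed future promotions.
promotions_sketch = build_promotions(2009, 2017)
print(promotions_sketch.tail())
```

With both windows at zero, a holiday row only matches a forecast row whose ds is exactly the listed date, so the holiday dates need to line up with the dates in the future dataframe.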

To continue, we need to log-transform our data:

df['y'] = np.log(df['y'])

df.tail()

	ds	y
67	2015-05-01	13.044650453675313
68	2015-06-01	13.013059541513272
69	2015-07-01	13.033991074775358
70	2015-08-01	13.030993424699561
71	2015-09-01	12.973670775134828
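As a quick sanity check on the transform, here is a small sketch using the first three sales values from earlier in this post. It shows that np.exp undoes np.log, and that a difference between two logged values is the log of the ratio of the original values.

```python
import numpy as np
import pandas as pd

# First three monthly sales values from the data shown earlier.
sales = pd.Series([338630, 339386, 400264], name="y")

logged = np.log(sales)     # the scale the model is fit on
restored = np.exp(logged)  # back to the original sales scale

# A gap in log space equals the log of the ratio in the original scale.
log_gap = logged[2] - logged[1]
ratio = sales[2] / sales[1]

print(np.allclose(restored, sales))
print(np.isclose(log_gap, np.log(ratio)))
```

This is why the forecasts later in the post are passed through np.exp before being compared.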

Running Prophet

Now, let's set Prophet up to begin modeling our data using our promotions dataframe as part of the forecast.

Note: Since we are using monthly data, Prophet would normally print a message saying "Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this." This is OK since we are working with monthly data. The message goes away when you set weekly_seasonality explicitly in the instantiation of Prophet, as is done below (the same applies to daily_seasonality).

model = Prophet(holidays=promotions, weekly_seasonality=True, daily_seasonality=True)
model.fit(df)

<fbprophet.fo...x7f0c5691fc50>

We've instantiated and fit the model, so now we need to build some future dates to forecast into.

# 24 future periods at month-end frequency
future = model.make_future_dataframe(periods=24, freq='M')
future.tail()

	ds
91	2017-04-30
92	2017-05-31
93	2017-06-30
94	2017-07-31
95	2017-08-31
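Notice that these future dates fall on the last day of each month, while the historical data is stamped on the first. Here is a sketch of how the two frequencies differ, using plain Pandas. The future_dates function is a hypothetical stand-in that only approximates what make_future_dataframe does (it is not Prophet's code), and it uses offset objects rather than frequency strings since the string aliases have changed across Pandas versions.

```python
import pandas as pd

last_date = pd.Timestamp("2015-09-01")  # last ds value in the sales data
periods = 24


def future_dates(last_date, periods, offset):
    """Hypothetical stand-in: the next `periods` dates strictly after last_date."""
    candidates = pd.date_range(start=last_date, periods=periods + 1, freq=offset)
    return candidates[candidates > last_date][:periods]


month_end = future_dates(last_date, periods, pd.offsets.MonthEnd())
month_start = future_dates(last_date, periods, pd.offsets.MonthBegin())

print(month_end[0], month_end[-1])
print(month_start[0], month_start[-1])
```

The month-end dates run through 2017-08-31, matching the tail shown above. Month-start dates would line up with the first-of-month stamps in the history and in the promotions dataframe, which is worth considering if you want holiday dates to match forecast dates exactly.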

To forecast this future data, we need to run it through Prophet's model.

forecast = model.predict(future)

The resulting forecast dataframe contains quite a bit of data, but we really only care about a few columns. First, let's look at the full dataframe:

forecast.tail()

We really only want to look at yhat, yhat_lower and yhat_upper, so we can do that with:

forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

	ds	yhat	yhat_lower	yhat_upper
91	2017-04-30	13.069092385393887	12.666347160467316	13.461268900286738
92	2017-05-31	13.071883163433725	12.637346529844663	13.491574989616737
93	2017-06-30	13.05773483351489	12.590041629879847	13.506115228104957
94	2017-07-31	13.06466187201633	12.565657922306391	13.540365994068482
95	2017-08-31	13.012393257087005	12.479345577381606	13.525504713353449

Plotting Prophet results

Prophet has a plotting mechanism called plot. This plot functionality draws the original data (black dots), the model (blue line) and the uncertainty interval of the forecast (shaded blue area).

model.plot(forecast);

Personally, I'm not a fan of this visualization, but I'm not going to build my own in this post. You can see how I do that here.

Additionally, Prophet lets us take a look at the components of our model, including the holidays. This component plot is an important plot as it lets you see the components of your model, including the trend and seasonality (identified in the yearly pane).

model.plot_components(forecast);

Comparing holidays vs no-holidays forecasts

Let's re-run our Prophet model without our promotions/holidays for comparison.

model_no_holiday = Prophet()
model_no_holiday.fit(df);

<fbprophet.fo...x7f0c563c4898>

# Same 24 month-end periods as the holiday model
future_no_holiday = model_no_holiday.make_future_dataframe(periods=24, freq='M')
future_no_holiday.tail()

	ds
91	2017-04-30
92	2017-05-31
93	2017-06-30
94	2017-07-31
95	2017-08-31

forecast_no_holiday = model_no_holiday.predict(future_no_holiday)

Let's compare the two forecasts now. Note: I doubt there will be much difference in these models due to the small amount of data, but it's a good example to see the process. We'll set the indexes and then join the forecast dataframes into a new dataframe called compared_df.

forecast.set_index('ds', inplace=True)
forecast_no_holiday.set_index('ds', inplace=True)
compared_df = forecast.join(forecast_no_holiday, rsuffix="_no_holiday")
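To see what rsuffix does in that join, here is a toy sketch with two tiny frames that share column names. The numbers are made up for illustration and are not Prophet output.

```python
import pandas as pd

# Shared date index, as both forecasts have after set_index('ds').
idx = pd.to_datetime(["2017-07-31", "2017-08-31"]).rename("ds")

# Made-up values standing in for the two forecast dataframes.
with_holiday = pd.DataFrame({"yhat": [13.06, 13.01], "trend": [13.0, 13.0]}, index=idx)
without_holiday = pd.DataFrame({"yhat": [13.05, 13.02], "trend": [13.0, 13.0]}, index=idx)

# Overlapping columns from the right-hand frame get the suffix;
# the left-hand frame keeps its original names.
joined = with_holiday.join(without_holiday, rsuffix="_no_holiday")
print(list(joined.columns))
```

So after the join, yhat is the forecast with holidays and yhat_no_holiday is the forecast without them.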

We are only really interested in the yhat values, so let's remove all the rest and convert the logged values back to their original scale.

compared_df= np.exp(compared_df[['yhat', 'yhat_no_holiday']])

Now, let's take the percentage difference, and then the average of that difference, for the model with holidays vs. the one without.

compared_df['diff_per'] = 100*(compared_df['yhat'] - compared_df['yhat_no_holiday']) / compared_df['yhat_no_holiday']
compared_df.tail()

ds	yhat	yhat_no_holiday	diff_per
2017-04-30	474061.52194792114	469583.26560153335	0.9536660853216669
2017-05-31	475386.3702518066	467836.5237404679	1.6137787727593058
2017-06-30	468707.8037344029	477502.74244912295	-1.8418614036875616
2017-07-31	471965.8319525281	467920.13808058767	0.8646120443834493
2017-08-31	447930.451468062	454689.61942474794	-1.4865454736436081

compared_df['diff_per'].mean()

31627.529378734773
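To make the percentage calculation concrete, here is a small sketch that recomputes diff_per using only the five rows printed in the table above. It also takes the mean of the absolute differences, which is my addition rather than part of the original analysis. Positive and negative differences cancel in a plain mean, so the absolute version gives a feel for the typical size of the gap.

```python
import pandas as pd

# The last five rows of compared_df, copied from the table above.
rows = pd.DataFrame(
    {
        "yhat": [474061.52194792114, 475386.3702518066, 468707.8037344029,
                 471965.8319525281, 447930.451468062],
        "yhat_no_holiday": [469583.26560153335, 467836.5237404679, 477502.74244912295,
                            467920.13808058767, 454689.61942474794],
    },
    index=pd.to_datetime(["2017-04-30", "2017-05-31", "2017-06-30",
                          "2017-07-31", "2017-08-31"]).rename("ds"),
)

# Percentage difference of the holiday forecast relative to the no-holiday one.
rows["diff_per"] = 100 * (rows["yhat"] - rows["yhat_no_holiday"]) / rows["yhat_no_holiday"]

signed_mean = rows["diff_per"].mean()          # positives and negatives cancel
absolute_mean = rows["diff_per"].abs().mean()  # typical size of the gap

print(rows["diff_per"])
print(signed_mean, absolute_mean)
```

These figures cover only the five rows shown, not the full compared_df.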

This isn't an enormous difference (<1%), but there is some difference between using holidays and not using holidays.

If you know there are holidays or events happening that might help or hurt your forecasting efforts, Prophet allows you to easily incorporate them into your modeling.