Eric D. Brown / Oct 08 2019

# Forecasting Time Series Data With Prophet I & II

## Introduction

A lot of what I do in my data analytics work is understanding time series data, modeling that data and trying to forecast what might come next in that data. Over the years I’ve used many different approaches, libraries, and modeling techniques for modeling and forecasting with some success… and a lot of failure.

Recently, I’ve been looking for a simpler approach for my initial modeling and I think I’ve found a very nice library in Facebook’s Prophet (available for both Python and R). While this library isn’t terribly robust, it is quick and gives some very good results for that initial pass at modeling / forecasting time series data. An added bonus for those who like to understand the theory behind things is this white paper, which gives a very good description of the math / statistical approach behind Prophet.


## Part I

### Getting Started

Using Prophet is extremely straightforward. You import it, load some data into a Pandas dataframe, set the data up into the proper format and then start modeling / forecasting.

First, import the module (plus some other modules that we’ll need):

```python
import pandas as pd
import numpy as np
from fbprophet import Prophet  # in Prophet 1.0 and later the package is named `prophet`: from prophet import Prophet
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (20, 10)
plt.style.use('ggplot')
```

Pandas' datetime converters for Matplotlib must be registered manually because of a conflict between Prophet and Pandas:

`pd.plotting.register_matplotlib_converters()`


Read the data in from the `retail_sales.csv` file and set the index to the `date` column. We are also parsing dates in the data file.

`sales_df = pd.read_csv('retail_sales.csv', index_col='date', parse_dates=True)`

Now, we have a Pandas dataframe with our data that looks something like this:

`sales_df.head()`
| date | sales |
|---|---|
| 2009-10-01 | 338630 |
| 2009-11-01 | 339386 |
| 2009-12-01 | 400264 |
| 2010-01-01 | 314640 |
| 2010-02-01 | 311022 |
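If you don't have `retail_sales.csv` handy, the same `read_csv` call can be tried against an in-memory copy of the five rows shown above. This is a minimal sketch: the inline CSV text (fed in through `io.StringIO`) is an assumption standing in for the real file, which is assumed to have `date` and `sales` header columns, as the output suggests.

```python
import io

import pandas as pd

# Stand-in for retail_sales.csv: the first five rows shown above, as CSV text.
csv_text = """date,sales
2009-10-01,338630
2009-11-01,339386
2009-12-01,400264
2010-01-01,314640
2010-02-01,311022
"""

# Same call as above: use `date` as the index and parse it into datetimes.
sales_df = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

print(type(sales_df.index).__name__)  # DatetimeIndex
print(sales_df.head())
```

Because of `parse_dates=True`, the index is a real `DatetimeIndex` rather than a column of strings, which is what makes the date-based plotting and joining later on work.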

### Prepare for Prophet

For Prophet to work, we need to rename these columns to `ds` and `y`, so let's create a new dataframe and keep our old one handy (you'll see why later). Resetting the index moves `date` out of the index and into a regular column, and gives the new dataframe a default integer index, so we can rename the columns:

`df = sales_df.reset_index()`

Your dataframe should now look like the following:

`df.head()`
| | date | sales |
|---|---|---|
| 0 | 2009-10-01 | 338630 |
| 1 | 2009-11-01 | 339386 |
| 2 | 2009-12-01 | 400264 |
| 3 | 2010-01-01 | 314640 |
| 4 | 2010-02-01 | 311022 |

Let's rename the columns as required by `fbprophet`. Prophet expects `ds` to be a regular (non-index) column rather than a datetime index, so we'll leave the default integer index in place.

`df=df.rename(columns={'date':'ds', 'sales':'y'})`
`df.head()`
| | ds | y |
|---|---|---|
| 0 | 2009-10-01 | 338630 |
| 1 | 2009-11-01 | 339386 |
| 2 | 2009-12-01 | 400264 |
| 3 | 2010-01-01 | 314640 |
| 4 | 2010-02-01 | 311022 |
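The two reshaping steps above (move the date index into a column, then rename to `ds` and `y`) can be bundled into one small function. This is a sketch: `to_prophet_frame` is a hypothetical helper name, and it assumes the `date` and `sales` column names used in this post.

```python
import pandas as pd


def to_prophet_frame(frame, date_col='date', value_col='sales'):
    """Return a new dataframe with the two columns Prophet expects: `ds` and `y`.

    Hypothetical helper: assumes `frame` is indexed by `date_col`, as sales_df is.
    """
    out = frame.reset_index()  # date index -> regular column, default integer index
    out = out.rename(columns={date_col: 'ds', value_col: 'y'})
    return out[['ds', 'y']]


# Two rows from the table above, indexed by date like sales_df.
sales_df = pd.DataFrame(
    {'sales': [338630, 339386]},
    index=pd.DatetimeIndex(['2009-10-01', '2009-11-01'], name='date'),
)
df = to_prophet_frame(sales_df)
print(df)
```

Since `reset_index` and `rename` both return new objects, the original `sales_df` keeps its datetime index, which is why it is worth holding on to.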

Now's a good time to take a look at your data. Plot the data using Pandas' `plot` function.

`df.set_index('ds').y.plot().get_figure()`

When working with time series data, it's good to take a look at the data to determine whether trends exist, whether it is stationary, and whether it has any outliers or other anomalies. Facebook Prophet's example uses the log transform as a way to remove some of these anomalies. It isn't the absolute 'best' way to do this, but given that it's the example and this is a simple data series, I'll follow their lead for now. The log transform is easily reversed with `np.exp`, so you can always get back to your original data.

To log-transform your data, you can use NumPy's `log()` function:

`df['y'] = np.log(df['y'])`
`df.tail()`
| | ds | y |
|---|---|---|
| 67 | 2015-05-01 | 13.044650453675313 |
| 68 | 2015-06-01 | 13.013059541513272 |
| 69 | 2015-07-01 | 13.033991074775358 |
| 70 | 2015-08-01 | 13.030993424699561 |
| 71 | 2015-09-01 | 12.973670775134828 |
`df.set_index('ds').y.plot().get_figure()`

As you can see in the above chart, the plot has the same overall shape as the first one, just on a different (log) scale.
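Both claims, that the log transform is reversible and that the log plot keeps the shape of the original, are quick to check. A minimal sketch using the first five sales values from the tables above:

```python
import numpy as np
import pandas as pd

# First five monthly sales values from the tables above.
sales = pd.Series([338630, 339386, 400264, 314640, 311022], dtype=float)

log_sales = np.log(sales)     # what we hand to Prophet as `y`
restored = np.exp(log_sales)  # back on the original dollar scale

# exp() undoes log() up to floating-point rounding.
print(np.allclose(restored, sales))  # True

# log() is strictly increasing, so every month-to-month rise or fall is preserved,
# which is why the log plot rises and falls in the same places as the original.
same_direction = (np.sign(log_sales.diff()) == np.sign(sales.diff())).iloc[1:].all()
print(same_direction)  # True
```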

## Part II

### Running Prophet

Now, let's set Prophet up to begin modeling our data.

Note: Since we are using monthly data, you'll see a message from Prophet saying `Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.` This is OK, since weekly seasonality means nothing for monthly data. If you want to silence the message, pass `weekly_seasonality=False` explicitly when you instantiate Prophet; passing `True`, as the message suggests, would turn weekly seasonality on instead.
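Roughly speaking, Prophet's default `'auto'` setting switches weekly seasonality off when the history covers less than two weeks or when observations are a week or more apart. That description paraphrases the library's heuristic and may not match every version, so the helper below (`weekly_auto_disabled`, a hypothetical name) is only an approximation written with pandas, not Prophet's own code.

```python
import pandas as pd


def weekly_auto_disabled(ds):
    """Approximate Prophet's 'auto' check for weekly seasonality (hypothetical helper).

    Returns True when the history spans less than two weeks, or when the
    smallest gap between consecutive dates is at least one week.
    """
    dates = pd.Series(pd.to_datetime(ds)).sort_values()
    span = dates.iloc[-1] - dates.iloc[0]
    min_gap = dates.diff().dropna().min()
    return bool(span < pd.Timedelta(weeks=2) or min_gap >= pd.Timedelta(weeks=1))


monthly = pd.date_range('2009-10-01', periods=72, freq='MS')  # shaped like the sales data
daily = pd.date_range('2009-10-01', periods=60, freq='D')

print(weekly_auto_disabled(monthly))  # True: months are more than a week apart
print(weekly_auto_disabled(daily))    # False
```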

```python
model = Prophet()
model.fit(df);
```

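Forecasting is fairly useless unless you can look into the future, so we need to add some future dates to our dataframe. For this example, I want to forecast 2 years into the future, so I'll build a future dataframe with 24 periods since we are working with monthly data. Note the `freq='m'` argument, which ensures we are adding 24 months of data. `'m'` is the month-end frequency, which is why the new dates fall on the last day of each month. By default, `make_future_dataframe` also keeps the historical dates, so the result holds the 72 original months plus the 24 new ones.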

This can be done with the following code:

```python
future = model.make_future_dataframe(periods=24, freq='m')
future.tail()
```
| | ds |
|---|---|
| 91 | 2017-04-30 |
| 92 | 2017-05-31 |
| 93 | 2017-06-30 |
| 94 | 2017-07-31 |
| 95 | 2017-08-31 |
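To see where those month-end dates come from, here is a pandas-only sketch of what `make_future_dataframe` does. It is a paraphrase of Prophet's behavior, not the library code: start a date range at the last historical date, keep only dates strictly after it, take `periods` of them, and put the history in front. `future_frame` is a hypothetical name, and `pd.offsets.MonthEnd()` stands in for the month-end frequency that `freq='m'` produced above.

```python
import pandas as pd


def future_frame(history_ds, periods, freq):
    """Sketch of Prophet's make_future_dataframe (hypothetical helper, not the library code)."""
    history = pd.DatetimeIndex(history_ds)
    last_date = history.max()
    # Ask for one extra date in case the start date itself lands on the frequency.
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    dates = dates[dates > last_date][:periods]  # strictly-future dates only
    return pd.DataFrame({'ds': history.append(dates)})  # history first, then future


# 72 monthly observations starting 2009-10-01, as in the sales data.
history_ds = pd.date_range('2009-10-01', periods=72, freq='MS')
future = future_frame(history_ds, periods=24, freq=pd.offsets.MonthEnd())

print(len(future))            # 96
print(future['ds'].iloc[72])  # 2015-09-30 00:00:00
print(future['ds'].iloc[-1])  # 2017-08-31 00:00:00
```

One consequence worth noting: the history falls on month starts while the future dates fall on month ends, so the new rows never share an index value with the original sales rows.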

To forecast this future data, we need to run it through Prophet's model.

`forecast = model.predict(future)`

The resulting forecast dataframe contains quite a bit of data, but we really only care about a few columns. First, let's look at the full dataframe:

`forecast.tail()`

We really only want to look at `yhat`, `yhat_lower`, and `yhat_upper`, so we can do that with:

`forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()`
| | ds | yhat | yhat_lower | yhat_upper |
|---|---|---|---|---|
| 91 | 2017-04-30 | 13.05960091142992 | 12.869830771361567 | 13.255463589763021 |
| 92 | 2017-05-31 | 13.055874205630584 | 12.850437269055105 | 13.270689789903539 |
| 93 | 2017-06-30 | 13.076325182235378 | 12.852533665503888 | 13.31003696452107 |
| 94 | 2017-07-31 | 13.056052915211298 | 12.821363190627153 | 13.308538566200477 |
| 95 | 2017-08-31 | 13.027370310048981 | 12.773316565048013 | 13.292583771318434 |
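Because `y` is the log of sales, `yhat`, `yhat_lower` and `yhat_upper` are all on the log scale too. A step of d in log space is a multiplicative factor of exp(d) in dollars, so once the band is converted back it is no longer centered on the point forecast. A quick arithmetic check on the last row of the table above:

```python
import math

# Last row of the forecast table above (2017-08-31), all on the log scale.
yhat, yhat_lower, yhat_upper = 13.027370310048981, 12.773316565048013, 13.292583771318434

point = math.exp(yhat)
low, high = math.exp(yhat_lower), math.exp(yhat_upper)

# exp() is increasing, so lower < yhat < upper still holds in dollars...
print(low < point < high)  # True
# ...but log-space distances turn into unequal dollar amounts: since
# exp(d) - 1 > 1 - exp(-d) for d > 0, the band stretches further upward.
print(round(point), round(point - low), round(high - point))
```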

### Plotting Prophet results

Prophet has a plotting mechanism called `plot`. This plot functionality draws the original data (black dots), the model (blue line), and the uncertainty interval of the forecast (shaded blue area).

`model.plot(forecast);`

Personally, I'm not a fan of this visualization so I like to break the data up and build a chart myself. The next section describes how I build my own visualization for Prophet modeling.

### Visualizing Prophet models

In order to build a useful dataframe to visualize our model versus our original data, we need to combine the output of the Prophet model with our original data set, then we'll build a new chart manually using Pandas and Matplotlib.

First, let's set our dataframes to have the same index of `ds`.

```python
df.set_index('ds', inplace=True)
forecast.set_index('ds', inplace=True)
```

Now, we'll combine the original data and our forecast model data.

`viz_df = sales_df.join(forecast[['yhat', 'yhat_lower','yhat_upper']], how = 'outer')`
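The `how='outer'` join keeps every date that appears on either side and fills the missing cells with `NaN`. A toy sketch using a few values from the tables in this post (a deliberately tiny subset, not the full `forecast`) shows the pattern:

```python
import pandas as pd

# Actual sales on month-start dates (values from the tables above).
sales_df = pd.DataFrame(
    {'sales': [338630, 339386]},
    index=pd.DatetimeIndex(['2009-10-01', '2009-11-01']),
)
# Toy forecast: one date shared with sales_df, one month-end future date.
forecast = pd.DataFrame(
    {'yhat': [12.749435087873692, 13.027370310048981]},
    index=pd.DatetimeIndex(['2009-11-01', '2017-08-31'], name='ds'),
)

viz_df = sales_df.join(forecast, how='outer')
print(viz_df)
# 2009-10-01 has sales but no yhat, 2009-11-01 has both,
# and 2017-08-31 has yhat but no sales.
```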

If we look at the `head()`, we see the data has been joined correctly, but the scales of our original data (`sales`) and our model (`yhat`) are different because the model was fit to the log of sales. We need to rescale the `yhat` column(s) back to dollars, so we'll use NumPy's `exp` function to do that.

`viz_df.head()`
| | sales | yhat | yhat_lower | yhat_upper |
|---|---|---|---|---|
| 2009-10-01 | 338630.0 | 12.72891629967664 | 12.719364524183003 | 12.73949007973038 |
| 2009-11-01 | 339386.0 | 12.749435087873692 | 12.738762885389015 | 12.75984775702255 |
| 2009-12-01 | 400264.0 | 12.887443656406615 | 12.87722767484526 | 12.899177627363834 |
| 2010-01-01 | 314640.0 | 12.662469402438917 | 12.65194081078347 | 12.67338564231148 |
| 2010-02-01 | 311022.0 | 12.655825281055929 | 12.64563618955632 | 12.665771587151028 |
`viz_df['yhat_rescaled'] = np.exp(viz_df['yhat'])`
`viz_df.head()`
| | sales | yhat | yhat_lower | yhat_upper | yhat_rescaled |
|---|---|---|---|---|---|
| 2009-10-01 | 338630.0 | 12.72891629967664 | 12.719364524183003 | 12.73949007973038 | 337363.51235967537 |
| 2009-11-01 | 339386.0 | 12.749435087873692 | 12.738762885389015 | 12.75984775702255 | 344357.3095608864 |
| 2009-12-01 | 400264.0 | 12.887443656406615 | 12.87722767484526 | 12.899177627363834 | 395317.159207676 |
| 2010-01-01 | 314640.0 | 12.662469402438917 | 12.65194081078347 | 12.67338564231148 | 315675.2904628142 |
| 2010-02-01 | 311022.0 | 12.655825281055929 | 12.64563618955632 | 12.665771587151028 | 313584.8577497736 |
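Only `yhat` is rescaled above, but `np.exp` works on several columns at once, which is handy if you also want the interval bounds in dollars. A minimal sketch on the first two rows of the table above; the `_lower_rescaled` / `_upper_rescaled` column names are my own, extending the post's `yhat_rescaled`.

```python
import numpy as np
import pandas as pd

cols = ['yhat', 'yhat_lower', 'yhat_upper']
# First two rows of viz_df from the table above (model columns on the log scale).
viz_df = pd.DataFrame(
    [[338630.0, 12.72891629967664, 12.719364524183003, 12.73949007973038],
     [339386.0, 12.749435087873692, 12.738762885389015, 12.75984775702255]],
    columns=['sales'] + cols,
    index=pd.DatetimeIndex(['2009-10-01', '2009-11-01']),
)

# np.exp is applied element-wise and keeps the dataframe's index and columns.
rescaled = np.exp(viz_df[cols]).add_suffix('_rescaled')
viz_df = viz_df.join(rescaled)

print(viz_df[['sales', 'yhat_rescaled', 'yhat_lower_rescaled', 'yhat_upper_rescaled']])
```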

Let's take a look at the `sales` and `yhat_rescaled` data together in a chart.

`viz_df[['sales', 'yhat_rescaled']].plot().get_figure()`

You can see from the chart that the model (blue) is pretty good when plotted against the actual signal (orange) but I like to make my visualizations a little easier to understand. To build my 'better' visualization, we'll need to go back to our original `sales_df` and `forecast` dataframes.

First things first: we need to find the second-to-last date of the original sales data in `sales_df` so that the original sales line and the forecast line are connected in the chart.

```python
sales_df.index = pd.to_datetime(sales_df.index)  # make sure our index is a datetime object
connect_date = sales_df.index[-2]  # select the 2nd-to-last date
```

Using the `connect_date`, we can now grab only the model data that falls after that date (you'll see why in a minute). To do this, we'll mask the forecast data.

```python
mask = (forecast.index > connect_date)
predict_df = forecast.loc[mask]  # keep only the forecast rows after connect_date
```

`predict_df.head()`
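Why the second-to-last date? Masking with `> connect_date` keeps the forecast row for the *last* actual date, so the forecast line starts exactly where the sales line ends. A small sketch with hypothetical stand-in dates (the `yhat` values are placeholders) shows the difference:

```python
import pandas as pd

# Hypothetical stand-ins: the last three month-start sales dates, plus a forecast
# index that repeats them and adds two month-end future dates.
sales_index = pd.DatetimeIndex(['2015-07-01', '2015-08-01', '2015-09-01'])
forecast = pd.DataFrame(
    {'yhat': [0.0, 1.0, 2.0, 3.0, 4.0]},  # placeholder values
    index=pd.DatetimeIndex(['2015-07-01', '2015-08-01', '2015-09-01',
                            '2015-09-30', '2015-10-31'], name='ds'),
)

connect_date = sales_index[-2]  # 2nd-to-last actual date
predict_df = forecast.loc[forecast.index > connect_date]
print(predict_df.index[0])  # 2015-09-01 00:00:00, shared with the last actual point

# Masking on the *last* actual date instead drops that shared point.
gap_df = forecast.loc[forecast.index > sales_index[-1]]
print(gap_df.index[0])  # 2015-09-30 00:00:00, so the two lines would not meet
```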

Now, let's build a dataframe to use in our new visualization. We'll follow the same steps we did before.

```python
viz_df = sales_df.join(predict_df[['yhat', 'yhat_lower', 'yhat_upper']], how='outer')
viz_df['yhat_scaled'] = np.exp(viz_df['yhat'])
```

Now, if we take a look at the `head()` of `viz_df`, we'll see `NaN`s in every forecast column; only the original `sales` data is present in these rows.

`viz_df.head()`
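| | sales | yhat | yhat_lower | yhat_upper | yhat_scaled |
|---|---|---|---|---|---|
| 2009-10-01 | 338630.0 | NaN | NaN | NaN | NaN |
| 2009-11-01 | 339386.0 | NaN | NaN | NaN | NaN |
| 2009-12-01 | 400264.0 | NaN | NaN | NaN | NaN |
| 2010-01-01 | 314640.0 | NaN | NaN | NaN | NaN |
| 2010-02-01 | 311022.0 | NaN | NaN | NaN | NaN |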

If we take a look at the `tail()` of `viz_df`, you'll see we have values for the forecast columns and `NaN`s for the original sales series.

`viz_df.tail()`
| | sales | yhat | yhat_lower | yhat_upper | yhat_scaled |
|---|---|---|---|---|---|
| 2017-04-30 | NaN | 13.05960091142992 | 12.869830771361567 | 13.255463589763021 | 469583.26560153335 |
| 2017-05-31 | NaN | 13.055874205630584 | 12.850437269055105 | 13.270689789903539 | 467836.5237404679 |
| 2017-06-30 | NaN | 13.076325182235378 | 12.852533665503888 | 13.31003696452107 | 477502.74244912295 |
| 2017-07-31 | NaN | 13.056052915211298 | 12.821363190627153 | 13.308538566200477 | 467920.13808058767 |
| 2017-08-31 | NaN | 13.027370310048981 | 12.773316565048013 | 13.292583771318434 | 454689.61942474794 |

### Time to plot

Now, let's plot everything to get the 'final' visualization of our sales data and forecast with errors.

```python
fig, ax1 = plt.subplots()
# Actual sales (first color in the ggplot style cycle, orange)
ax1.plot(viz_df.sales, label='Actual Sales')
# Forecast, rescaled back to dollars
ax1.plot(viz_df.yhat_scaled, color='black', linestyle=':', label='Forecasted Sales')
# Shade the uncertainty interval, also converted back from the log scale
ax1.fill_between(viz_df.index, np.exp(viz_df['yhat_upper']), np.exp(viz_df['yhat_lower']), alpha=0.5, color='darkgray')
ax1.set_title('Sales (Orange) vs Sales Forecast (Black)')
ax1.set_ylabel('Dollar Sales')
ax1.set_xlabel('Date')

ax1.legend()  # legend text comes from the label= arguments passed to plot()
fig
```

This visualization is much better (in my opinion) than the default `fbprophet` plot. It is much easier to quickly understand and describe what's happening. The orange line is the actual sales data and the black dotted line is the forecast. The gray shaded area is the uncertainty interval of the forecast.