Parens for Python - Predicting Sportsball & UFOs
Forecasting time series
We are going to explore some more Python libraries through the use of libpython-clj.
This time, we are going to look at Facebook Prophet.
{:deps
 {org.clojure/clojure {:mvn/version "1.10.1"}
  clj-python/libpython-clj {:mvn/version "1.36"}}}
Install the Python dependencies
pip3 install fbprophet
pip3 install holidays==0.9.12
The main tutorial is on https://facebook.github.io/prophet/docs/quick_start.html#python-api. We'll be following along until we break off to look at UFOs.
Quick Start with some Sports Ball Stuff
The tutorial forecasts some sports stuff. Honestly, I don't really follow any sports, so it makes no sense to me, but the important bit is that Prophet takes in a dataframe (here loaded from a csv file) with two columns, ds and y. The ds column is the date in the format YYYY-MM-DD and the y column is the numeric value to forecast. That's it really.
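If your own data isn't in that shape yet, here is a minimal sketch of building the ds,y csv text in plain Clojure. The observations and the ->prophet-csv helper are made up for illustration; they are not part of Prophet or the tutorial.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample data: [date-string value] pairs.
(def observations
  [["2020-01-01" 12.0]
   ["2020-01-02" 15.5]
   ["2020-01-03" 9.25]])

(defn ->prophet-csv
  "Builds csv text with a ds,y header followed by one line per row."
  [rows]
  (str/join "\n"
            (cons "ds,y"
                  (map (fn [[ds y]] (str ds "," y)) rows))))

(println (->prophet-csv observations))
```

The resulting string could be written out with spit and read back with pandas, the same way the tutorial data is loaded further down.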
We are going to load in the namespaces to start it all off as well as a macro to help with plotting.
(ns gigasquid.facebook-prophet
  (:require [libpython-clj.require :refer [require-python]]
            [libpython-clj.python :as py :refer [py. py.. py.-]]))
;;;; have to set the headless mode before requiring pyplot
(def mplt (py/import-module "matplotlib"))
(py. mplt "use" "Agg")
(require-python matplotlib.pyplot)
(require-python matplotlib.backends.backend_agg)
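(defmacro with-show
  "Takes forms that draw with matplotlib.pyplot and saves the result as a png under results/"
  [& body]
  ;; syntax-quote the template so the let is emitted as code, not run at expansion time
  `(let [_# (matplotlib.pyplot/clf)
         fig# (matplotlib.pyplot/figure)
         agg-canvas# (matplotlib.backends.backend_agg/FigureCanvasAgg fig#)]
     ~(cons 'do body)
     (py. agg-canvas# "draw")
     ;; (gensym) gives each image a unique name; the results/ directory must already exist
     (matplotlib.pyplot/savefig (str "results/" (gensym) ".png"))))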
(require-python [pandas :as pd])
(require-python [fbprophet :as fbprophet])
(require-python [matplotlib.pyplot :as pyplot])
Let's download the sportsball data for some guy named Manning.
(def csv-file (slurp "https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_log_peyton_manning.csv"))
(spit "manning.csv" csv-file)
(def df (pd/read_csv "manning.csv"))
(py. df head) ;;; head is a method, so call it to see the first rows
Now the predicting bit. We create a Prophet model and fit it to the dataframe.
(def m (fbprophet/Prophet))
(py. m fit df)
Predictions are then made on a dataframe that extends a number of periods (here 365 days) into the future.
(def future (py. m make_future_dataframe :periods 365))
(def forecast (py. m predict future))
The important columns are yhat, yhat_upper, and yhat_lower. We can easily extract them into a Clojure format and do what we like with them.
(def predicted-vals (mapv (fn [x y y1 y2] {:ds x :y y :yhat-upper y1 :yhat-lower y2})
                          (py/get-item forecast "ds")
                          (py/get-item forecast "yhat")
                          (py/get-item forecast "yhat_upper")
                          (py/get-item forecast "yhat_lower")))
(println (last predicted-vals))
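As a small taste of doing what we like with them, here is a sketch that finds the day with the widest uncertainty interval. To keep it runnable on its own, it uses made-up rows with the same keys as predicted-vals. The real values come from Python, so they may need converting to Clojure numbers (for example with double) first; that part is an assumption on my side.

```clojure
;; Hypothetical rows shaped like the maps in predicted-vals.
;; The numbers are invented and :ds is a plain string here.
(def sample-vals
  [{:ds "2017-01-17" :y 7.5 :yhat-upper 8.5 :yhat-lower 6.5}
   {:ds "2017-01-18" :y 7.0 :yhat-upper 8.5 :yhat-lower 5.5}
   {:ds "2017-01-19" :y 8.0 :yhat-upper 8.75 :yhat-lower 7.25}])

(defn interval-width
  "Distance between the upper and lower bound of one forecast row."
  [{:keys [yhat-upper yhat-lower]}]
  (- yhat-upper yhat-lower))

;; max-key returns the row for which interval-width is largest.
(def widest (apply max-key interval-width sample-vals))

(println (:ds widest) (interval-width widest))
```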
We can plot the forecast.
(with-show
  (py. m plot forecast))
And we can plot the components of the forecast.
(with-show
  (py. m plot_components forecast))
Bring on the UFOs!
Ok. Enough with the Sportsball, let's look at some more interesting data, like UFO sightings from http://www.nuforc.org/webreports/ndxevent.html.
The data is different from the first example in that the stats are monthly. We are going to take a look at the sightings from 2010 to today.
(def csv-file (slurp "https://raw.githubusercontent.com/gigasquid/libpython-clj-examples/master/resources/ufosightings-since-2010.csv"))
(spit "ufosightings-since-2010.csv" csv-file)
(def df (pd/read_csv "ufosightings-since-2010.csv"))
We are going to do two things differently with this. First, we are going to use multiplicative seasonality, so the seasonal swings scale with the trend instead of being a fixed amount added to it. The second thing is that we are going to make our predictions monthly since the data itself is monthly.
(def m (fbprophet/Prophet :seasonality_mode "multiplicative")) ;;; seasonal effects scale with the trend
(py. m fit df)
(def future (py. m make_future_dataframe :periods 48 :freq "M")) ;;; note Monthly prediction
(def forecast (py. m predict future))
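To get a feel for what the multiplicative seasonality mode changes, here is a tiny arithmetic sketch in plain Clojure. It is a simplification with made-up numbers and hypothetical helper names, not Prophet's actual code: in additive mode the seasonal effect is a fixed amount on top of the trend, while in multiplicative mode it is a fraction of the trend.

```clojure
;; Made-up trend levels: the second is double the first.
(def trends [100.0 200.0])

(defn additive
  "Seasonal effect as a fixed amount added to the trend."
  [trend seasonal-amount]
  (+ trend seasonal-amount))

(defn multiplicative
  "Seasonal effect as a fraction of the trend."
  [trend seasonal-fraction]
  (* trend (+ 1 seasonal-fraction)))

;; The additive bump is 25 at both trend levels.
(println (map #(additive % 25.0) trends))
;; The multiplicative bump is 25% of the trend, so it grows with it.
(println (map #(multiplicative % 0.25) trends))
```

If the seasonal swings in your data get bigger as the overall level grows, the multiplicative flavor is usually the one to try.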
Now, we can see what the UFO sightings forecast looks like.
(with-show
  (py. m plot forecast))
We can see there is definitely a yearly pattern to it, but luckily the forecast shows UFO sightings decreasing in the future.
(with-show
  (py. m plot_components forecast))
It seems like July is the peak time for UFO sightings. It confirms the suspicion that the Independence Day movie was onto something. It also looks like January and February are low times for sightings. This makes sense with the classical view of cold-blooded reptilian aliens, but we must keep our minds open to other life forms that are affected by temperature. For example, there could be butter-based aliens out there whose movement would also be hampered at colder temperatures.
Conclusion
Facebook Prophet is a powerful tool for forecasting time series data. It would work really well on everyday problems like sales data and page views.
It can also be applied to reassure ourselves that whatever we are doing in the fight against outer-space aliens seems to be working. Keep it up!