Gaussian Processes

Practical introduction to Gaussian Processes (GP) in Clojure. Part 1/2.

Short introduction

First of all, I don't want to describe any internals of GP. There are plenty of great articles and videos listed below. Read and watch them now. What is important:

  • GP generates random and continuous functions

  • When you take all of possible functions, function values for one particular point are from normal distribution.

  • You can select subset of functions by using known values or setting noise.

  • Smoothness of the function is defined by covariance which is calculated from kernels

  • Kernel defines similarity between points, closer points give higher covariance

  • There are plenty of kernels

  • Regression line (or interpolation) is just a mean from a subset of functions

  • For every point you can calculate standard deviation, this way you may know uncertainty of prediction

In this article I mainly focus on regression and interpolation. Addtionally I'll explore kernels and parametrization.

Second article will focus on Bayessian Optimization where GP is used.

Fastmath implements two GPs in fastmath.regression namespace. One is backed by SMILE library and is limited to regression. Second one is implemented directly in Clojure and brings more information like standard deviation and prior and posterior samplers.

Links to external resources:

Dependencies

Let's load dependencies. Three libraries are used:

  • cljplot - for charts

  • fastmath - for kernels and GP itself

  • clojure2d - for colors, gradients

{:deps
 {org.clojure/clojure {:mvn/version "1.10.1"}
  cljplot {:mvn/version "0.0.2-SNAPSHOT"}
  generateme/fastmath {:mvn/version "1.4.0-SNAPSHOT"}
  clojure2d {:mvn/version "1.2.0-SNAPSHOT"}}}
deps.edn
Extensible Data Notation
(require '[cljplot.build :as b]
         '[cljplot.core :refer :all]
         '[fastmath.core :as m]
         '[fastmath.kernel :as k]
         '[fastmath.random :as r]
         '[fastmath.regression :as reg]
         '[clojure2d.color :as c])
25.5s
Gaussian Processes (Clojure)

GP

Let's start with SMILE version first and let's explore a little bit regression and interpolation between points.

This version accepts two parameters:

  • kernel - kernel object (default: gaussian)

  • lambda - noise factor (default: 0.5)

To predict just call predict / predict-all functions or regression object itself (all regression objects implement IFn).

(def domain [-3.0 3.0]) ;; our domain, 1d
(def xs [0 1 -2 -2.001]) ;; 4 points
(def ys [-2 3 0.5 -0.6]) ;; 4 values
(def scatter-data (map vector xs ys)) ;; reorganize data for scatter plot
(def gp1 (reg/gaussian-process xs ys))
(def gp2 (reg/gaussian-process {:kernel (k/kernel :gaussian 0.8)
                                :lambda 0.000001} xs ys))
(reg/predict gp1 1.5) ;; prediction!
0.5s
Gaussian Processes (Clojure)
1.8472037067061506
(gp1 1.5)
0.1s
Gaussian Processes (Clojure)
1.8472037067061506
(gp2 1.5)
0.1s
Gaussian Processes (Clojure)
13.289038788347785
(reg/predict-all gp2 [0 1 -2 -2.001])
0.1s
Gaussian Processes (Clojure)
List(4) (-1.9999423944882437, 2.9999719139365197, 0.18904211011249572, -0.28904456237796694)
(defn draw-gp
  "Draw multi function chart with legend.
  
  * xs - domain points
  * ys - values
  * title - chart title
  * legend-name - legend title
  * labels - legend labels
  * gps - map of gaussian processes to visualize
  * h, w - optional height and width of the chart"
  [xs ys title legend-name labels gps & [h w]]
  (let [pal (c/palette-presets :category10) ;; use category10 palette
        leg (map #(vector :line %1 {:color %2}) labels pal) ;; construct legend
        scatter-data (map vector xs ys)] ;; prepare scatter data
    (-> (xy-chart {:width (or w 700) :height (or h 300)}
                  (-> (b/series [:grid]
                                [:scatter scatter-data {:size 8}])
                      (b/add-multi :function gps {:samples 400
                                                  :domain domain} {:color pal}))
                  (b/add-axes :bottom)
                  (b/add-axes :left)
                  (b/add-legend legend-name leg)
                  (b/add-label :top title)))))
0.1s
Draw GPGaussian Processes (Clojure)
user/draw-gp
(let [gp (reg/gaussian-process {:lambda 0.5} xs ys)] ;; create gaussian process
  (save (draw-gp xs ys "Default GP regression with Smile"
         "Lambda" [0.5] {:r1 gp})
      "/results/default-smile.jpg"))
1.9s
Gaussian Processes (Clojure)

Lambda / noise

You may observe that the line doesn't go through the points. There is a result of using lambda parameter with quite high value.

Lambda parameter measures level of noise in our data and regulates fitness. Let's see how does it work.

(let [lambdas [0.00005 0.1 2.0]]
  (save (draw-gp xs ys "GP regression with different lambda"
                 "Lambda" lambdas 
                 (map vector lambdas 
                         (map #(reg/gaussian-process {:lambda %} xs ys) lambdas)))
      "/results/smile-different-lambda.jpg"))
1.2s
Gaussian Processes (Clojure)

As you can see, lower lambda, better function interpolation but also leads to overfitting of the model.

Let's try with more points from some noisy toy function.

(defn toyfn [x] (+ (r/grand) (m/exp x) (* -5 (m/cos (* 3 x)))))
(save (xy-chart {:width 700 :height 300}
                (b/series [:grid] [:function toyfn {:domain domain}])
                (b/add-axes :bottom)
                (b/add-axes :left)) "/results/toyfn.jpg")
1.1s
Gaussian Processes (Clojure)
(let [lambdas [0.00005 0.1 2.0]
      xs (repeatedly 30 #(apply r/drand domain))
      ys (map toyfn xs)]
  (save (draw-gp xs ys "GP regression with different lambda"
                 "Lambda" lambdas 
                 (map vector lambdas 
                         (map #(reg/gaussian-process {:lambda %} xs ys) lambdas)))
      "/results/toy-different-lambda.jpg"))
1.6s
Gaussian Processes (Clojure)

To summarize: lower lambda, better fit.

Kernels

The other important parameter is kernel. Kernel is a measure of correlation between points. Usually kernels are constructed to give closer points higher correlation.

fastmath.kernel namespace defines plenty of kernels. Most of the can be used in GP, some of them don't. To work with GP kernel should be symmetric and positive semi-definite.

(sort (keys (methods k/kernel)))
0.0s
Gaussian Processes (Clojure)
List(34) (:anova, :bessel, :cauchy, :chi-square-cpd, :chi-square-pd, :circular, :dirichlet, :exponential, :gaussian, :generalized-histogram, :generalized-t-student, :hellinger, :histogram, :hyperbolic-secant, :hyperbolic-tangent, :inverse-multiquadratic, :laplacian, :linear, :log, :mattern-12, 14 more...)

Kernel multimethod creates actual kernel function which operates on vectors. Multimethod accepts kernel name as keyword and additional parameters (mostly scaling factor).

(def k1 (k/kernel :gaussian)) ;; default scaling = 1.0
(def k2 (k/kernel :mattern-32)) ;; mattern 3/2 kernel
(def k3 (k/kernel :mattern-32 0.5)) ;; scaling parameter
{:k1 (k1 [0] [1])
 :k2 (k2 [0] [1])
 :k3 (k3 [0] [1])}
0.1s
Gaussian Processes (Clojure)
Map {:k1: 0.6065306597126334, :k2: 0.4833577245965077, :k3: 0.1397313501923147}

Let's see how they look like. Blue with scaling 1.0, red with scaling 0.5:

(def kernels [:gaussian :cauchy :anova :linear :inverse-multiquadratic
              :mattern-12 :mattern-52 :periodic :exponential])
(let [fk (fn [f] #(f [0.1] [%]))
      make-data #(map (fn [k] [(str k)
                        (fk (k/kernel k %))]) kernels)
      cfg {:domain [-3 3] :samples 200 :stroke {:size 2}}]
  (-> (xy-chart {:width 700 :height 600}
                (-> (b/lattice :function (make-data 1) cfg {:grid true})
                    (b/add-series (b/lattice :function (make-data 0.5) 
                                             (assoc cfg :color (c/color 245 80 60)) {:label name})))
                (b/update-scales :x :ticks 5)
                (b/add-label :top "Various kernels around 0.1")
                (b/add-axes :bottom)
                (b/add-axes :left)
                (b/add-axes :right))
      (save "/results/kernels.jpg")))
1.6s
Gaussian Processes (Clojure)

Let's look at covariance matrices.

(let [fk (fn [f] #(f [0] [%]))
      r (range -1 1 0.025)
      data (map (fn [k] [(str k)
                        (let [kernel (k/kernel k)]
                          (for [x r y r] [[x y] (kernel [x] [y])]))]) kernels)]
  (-> (xy-chart {:width 700 :height 600}
                (b/lattice :heatmap data {} {:label name :grid true})
                (b/add-label :top "Covariance matrices"))
      (save "/results/covariances.jpg")))
2.7s
Gaussian Processes (Clojure)

Now I use some of the kernels to play with regression.

(save (draw-gp xs ys "GP regression with different kernel"
                 "Kernel" kernels 
                 (map vector kernels 
                         (map #(reg/gaussian-process {:lambda 0.01
                                                      :kernel (k/kernel %)} xs ys) kernels))) 
      "/results/smile-different-kernels.jpg")
1.4s
Gaussian Processes (Clojure)

Exploration

Now a little bit more kernel. See how parameters

(defn draw-gp-lattice
  [title gps]
  (xy-chart {:width 700 :height 500}
                (-> (b/lattice :function gps {:domain domain :samples 400} 
                               {:shape [-1 1] :grid true})
                    (b/add-series (b/lattice :scatter (zipmap (map first gps) (repeat scatter-data)) {:size 8} {:label name :shape [-1 1]})))
                (b/add-axes :bottom)
                (b/add-axes :left)
                (b/add-label :top title)))
(defn gen-gps
  "Generate gaussian processes for given parametrization."
  [lambdas kernels kernel-params]
  (map (fn [l k p]
         (let [n (str "Kernel: " k "(" p "), lambda=" l)
               kernel (k/kernel k p)]
           [n (reg/gaussian-process {:kernel kernel :lambda l} xs ys)])) (cycle lambdas) (cycle kernels) kernel-params))
0.1s
Gaussian Processes (Clojure)
user/gen-gps
(save (draw-gp-lattice "Various gaussian kernels"
                       (gen-gps [0.001] [:gaussian] [0.1 1 4])) "/results/gk.jpg")
1.3s
Gaussian Processes (Clojure)
(save (draw-gp-lattice "Various mattern kernels"
                       (gen-gps [0.0001] [:mattern-12] [0.1 1 4])) "/results/mk.jpg")
1.1s
Gaussian Processes (Clojure)
(save (draw-gp-lattice "Various periodic kernels"
                       (gen-gps [0.0001] [:periodic] [0.1 1 4])) "/results/pk.jpg")
1.4s
Gaussian Processes (Clojure)

GP+

gaussian-process+ is the enhanced version of GP which exposes some additional information, like:

prior distribution

  • posterior distribution

  • standard deviation

Let's see how all above depend on kernels and noise.

This version accepts following parameters as a map:

  • kernel - the same as above (default: gaussian)

  • kscale - variance scaler, result of the kernel is multiplied by this value (default: 1.0)

  • noise - the same as lambda, just different name for the same parameter (default 1.0e-6)

  • normalize? - should ys be normalized to have zero mean or not? (default: false)

(def gp1+ (reg/gaussian-process+ xs ys))
(def gp2+ (reg/gaussian-process+ {:normalize? true} xs ys))
{:gp1+ (gp1+ 1.5)
 :gp2+ (gp2+ 1.5)}
0.1s
Gaussian Processes (Clojure)
Map {:gp1+: 28.0917548685336, :gp2+: 28.140993916867593}

Prior

Prior samples are functions drawn from GP without any knowledge. Every point has N(0,1) distrtibution.

We can't get explicit function so we have to sample points and interpolate between values.

(let [gp (reg/gaussian-process+ xs ys)]
  (reg/prior-samples gp (range 0 1 0.1)))
0.1s
Gaussian Processes (Clojure)
List(11) (1.6958987845760611, 1.6495013724423162, 1.5842914749900234, 1.4942209175866497, 1.3849741800275543, 1.2512255880562821, 1.0973007160335424, 0.923244929171621, 0.7409109706633382, 0.5479505257111852, 0.35682043189116697)
(defn draw-prior
  ([gp] (draw-prior gp 20))
  ([gp cnt]
   (let [xs (range -3 3.1 0.1)
         priors (map #(vector % (map vector xs (reg/prior-samples gp xs))) (range cnt))]
     (xy-chart {:width 700 :height 300}
               (-> (b/series [:grid])
                   (b/add-multi :line priors {} {:color (cycle (c/palette-presets :category20c))})
                   (b/add-serie [:hline 0 {:color :black}]))
               (b/add-axes :bottom)
               (b/add-axes :left)
               (b/add-label :top "Priors")))))
0.1s
Gaussian Processes (Clojure)
user/draw-prior
(save (draw-prior (reg/gaussian-process+ xs ys)) "/results/prior.jpg")
1.7s
Gaussian Processes (Clojure)

When we make noise level higher, all priors is more random

(save (draw-prior (reg/gaussian-process+ {:noise 0.5} xs ys)) "/results/priorn.jpg")
1.2s
Gaussian Processes (Clojure)

When we use different kernel, function structure will reflect kernel shape

(save (draw-prior (reg/gaussian-process+ {:kernel (k/kernel :periodic 2)
                                          :noise 0.01} xs ys) 5)
      "/results/priorp.jpg")
1.0s
Gaussian Processes (Clojure)
(save (draw-prior (reg/gaussian-process+ {:kernel (k/kernel :linear)} xs ys))
      "/results/priorl.jpg")
1.2s
Gaussian Processes (Clojure)

Posteriors

When you have some points known, you can draw posterior functions using posterior-samples. Posteriors represent a selection of possible functions for given kernels and noise value.

(def xs [-4 -1 2])
(def ys [-5 -1 2])
(defn draw-posterior
  ([gp] (draw-posterior gp 20))
  ([gp cnt]
   (let [xxs (range -5 5.1 0.2)
         posteriors (map #(vector % (map vector xxs (reg/posterior-samples gp xxs))) (range cnt))]
     (xy-chart {:width 700 :height 300}
               (-> (b/series [:grid])
                   (b/add-multi :line posteriors {} {:color (cycle (c/palette-presets :category20c))})
                   (b/add-serie [:function gp {:domain [-5 5] :color :black :stroke {:size 2 :dash [5 2]}}])
                   (b/add-serie [:scatter (map vector xs ys) {:size 8}]))
               (b/add-axes :bottom)
               (b/add-axes :left)
               (b/add-label :top "Posteriors")))))
0.1s
Gaussian Processes (Clojure)
user/draw-posterior
(save (draw-posterior (reg/gaussian-process+ {:kernel (k/kernel :gaussian 0.5)}
                                             xs ys))
      "/results/posterior.jpg")
1.4s
Gaussian Processes (Clojure)
(save (draw-posterior (reg/gaussian-process+ {:kernel (k/kernel :periodic 0.2 6.5)}
                                             xs ys))
      "/results/posteriorp.jpg")
1.1s
Gaussian Processes (Clojure)

Standard deviation

Instead of drawing posterior samples, we can draw confidence intervals using standard deviation. This can give information about areas of uncertainty. This information will be important in the next part about Bayessian Optimization.

Below charts show 50% and 95% of confidence and predicted function (mean).

(def xs [-5 1 2])
(def ys [17 10 12])
(defn draw-stddev
  [gp]
  (let [xxs (range -5 5.1 0.2)
        pairs (reg/predict-all gp xxs true)
        mu (map first pairs)
        stddev (map second pairs)
        s95 (map (partial * 1.96) stddev)
        s50 (map (partial * 0.67) stddev)]
    (xy-chart {:width 700 :height 300}
              (b/series [:grid]
                        [:ci [(map vector xxs (map - mu s95)) (map vector xxs (map + mu s95))] {:color (c/color :lightblue 180)}]
                        [:ci [(map vector xxs (map - mu s50)) (map vector xxs (map + mu s50))] {:color (c/color :lightblue)}]
                        [:line (map vector xxs mu) {:color :black :stroke {:size 2 :dash [5 2]}}]
                        [:scatter (map vector xs ys) {:size 8}])
              (b/add-axes :bottom)
              (b/add-axes :left)
              (b/add-label :top "Confidence intervals"))))
0.2s
Gaussian Processes (Clojure)
user/draw-stddev
(save (draw-stddev(reg/gaussian-process+ xs ys)) "/results/predict.jpg")
1.1s
Gaussian Processes (Clojure)

You can see that function goes down towards zero value in areas without known points. This is cause by lack of normalization.

(save (draw-stddev(reg/gaussian-process+ {:normalize? true} xs ys)) "/results/predictn.jpg")
1.0s
Gaussian Processes (Clojure)

With data normalization line is more flat, but this creates higher uncertainty.

Let's try different kernels and options

(save (draw-stddev(reg/gaussian-process+ {:kernel (k/kernel :mattern-32)
                                          :noise 0.2
                                          :normalize? true}
                   xs ys)) "/results/predictnm.jpg")
1.0s
Gaussian Processes (Clojure)
(save (draw-stddev(reg/gaussian-process+ {:kernel (k/kernel :gaussian 0.2)
                                          :normalize? false}
                   xs ys)) "/results/predictb.jpg")
1.0s
Gaussian Processes (Clojure)
(save (draw-stddev(reg/gaussian-process+ {:kscale 0.1}
                   xs ys)) "/results/predict-low-scale.jpg")
1.0s
Gaussian Processes (Clojure)

Play yourself.

PART 2 - Bayesian Optimization (soon)

Runtimes (1)