Oz & Vega Examples

Simple Example

(require '[oz.core :as oz])
(defn play-data [& names]
  (for [n names
        i (range 20)]
    {:time i :item n :quantity (+ (Math/pow (* i (count n)) 0.8) (rand-int (count n)))}))
    
(def stacked-bar
  {:data {:values (play-data "munchkin" "witch" "dog" "lion" "tiger" "bear")}
   :mark "bar"
   :encoding {:x {:field "time"}
              :y {:aggregate "sum"
                  :field "quantity"
                  :type "quantitative"}
              :color {:field "item"}}})
(oz/export! stacked-bar "/results/temp.html")
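For orientation, the rows that play-data feeds into the chart are plain maps with :time, :item, and :quantity keys. The sketch below uses a hypothetical play-data-preview (not part of the notebook) that drops the rand-int term and shortens the range so the output is reproducible:

```clojure
;; Hypothetical, deterministic variant of play-data for inspection only.
;; It omits the random component and emits 3 time steps instead of 20.
(defn play-data-preview [& names]
  (for [n names
        i (range 3)]
    {:time i :item n :quantity (Math/pow (* i (count n)) 0.8)}))

;; One row per (name, time step) pair, so two names give six rows.
(count (play-data-preview "dog" "lion"))
;; => 6

(first (play-data-preview "dog"))
;; => {:time 0, :item "dog", :quantity 0.0}
```

Each row becomes one bar segment: "time" picks the x position, "item" the color, and the summed "quantity" the bar height.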

Advanced Example

The groundwork for this notebook was laid out in COVID19 Visualizations in Clojure with Vega by Applied Science Studio. Special thanks to Dave Liepmann and Jack Rusher for their help along the way.

Load Oz, a CSV reader, and a JSON parser.

(require '[jsonista.core :as json]
         '[meta-csv.core :as mcsv]
         '[oz.core :as oz])

Helper Functions

These functions normalize names across the three main datasets:

  1. COVID-19 cases in Italy

  2. geo.json data for regions and provinces in Italy

  3. Populations in each region and province in Italy

Most of the problems come from naming irregularities across the three datasets.
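As a rough illustration of the normalization idea, a lookup table with a fallback is enough: known aliases are rewritten and every other name passes through unchanged. The names province-aliases and canonical-name are hypothetical and only used in this sketch:

```clojure
;; Hypothetical alias table; keys are names as they appear in one dataset,
;; values are the spelling used by the others.
(def province-aliases
  {"Aosta"         "Valle d'Aosta/Vallée d'Aoste"
   "Massa Carrara" "Massa-Carrara"
   "Bolzano"       "Bolzano/Bozen"})

(defn canonical-name
  "Return the alias for territory-name if one exists, else the name itself."
  [aliases territory-name]
  (get aliases territory-name territory-name))

(canonical-name province-aliases "Bolzano") ;; => "Bolzano/Bozen"
(canonical-name province-aliases "Milano")  ;; => "Milano"
```

The three-argument form of get supplies the fallback, which has the same effect as an if-let around a two-argument lookup.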

;;;;;;;;;;;;;;;;;;;;
;; Conform Functions
(def fields-it->en
  "Mapping of CSV header names from Italian to their English translation."
  {;; For provinces (and some for regions too)
   "data"                    :date
   "stato"                   :state
   "codice_regione"          :region-code
   "denominazione_regione"   :region-name
   "codice_provincia"        :province-code
   "denominazione_provincia" :province-name
   "sigla_provincia"         :province-abbreviation
   "lat"                     :lat
   "long"                    :lon
   "totale_casi"             :cases
   ;; For regions
   "ricoverati_con_sintomi"      :hospitalized
   "terapia_intensiva"           :icu
   "totale_ospedalizzati"        :tot-hospitalized
   "isolamento_domiciliare"      :quarantined
   "totale_attualmente_positivi" :tot-positives
   "nuovi_attualmente_positivi"  :new-positives
   "dimessi_guariti"             :recovered
   "deceduti"                    :dead
   "tamponi"                     :tests})
(defn normalize-province-names
  "These province names differ across the population data files, geo.json files, and the COVID-19 files."
  [province]
  (get
   {"Aosta" "Valle d'Aosta/Vallée d'Aoste"
    "Massa Carrara" "Massa-Carrara"
    "Bolzano" "Bolzano/Bozen"} province))
(defn normalize-region-names
  "These region names differ across the population data files, geo.json files, and the COVID-19 files."
  [region]
  (get
   {"Friuli Venezia Giulia" "Friuli-Venezia Giulia"
    "Emilia Romagna" "Emilia-Romagna"
    "Valle d'Aosta" "Valle d'Aosta/Vallée d'Aoste"} region))
;; TODO: This is a nasty, unexpected edge case that would require a refactor to handle correctly
(defn normalize-trentino
  "REGION: The COVID numbers from Italy split one region into two. 'P.A. Bolzano' and 'P.A. Trento' should be combined into 'Trentino-Alto Adige/Südtirol.'"
  [region-covid-data]
  (let [keys-to-sum [:hospitalized :icu :tot-hospitalized :quarantined :tot-positives :new-positives :recovered :dead :cases :tests]
        regions-to-combine (filter #(or (= "P.A. Bolzano" (:region-name %))
                                        (= "P.A. Trento" (:region-name %)))
                                   region-covid-data)]
    (->> (map #(select-keys % keys-to-sum) regions-to-combine) ; grab the important keys from the regions we want to combine
         (reduce #(merge-with + %1 %2))                        ; add the info from each key together
         (conj (first regions-to-combine))                     ; add in the rest of the information (name, region number, etc...)
         ((fn name-combined-region [combined-regions] (assoc combined-regions :region-name "Trentino-Alto Adige/Südtirol")))
         ((fn remove-inexact-data [combined-regions] (dissoc combined-regions :lat :lon))) ; lat/lon is no longer correct data
         (conj region-covid-data))))                           ; add this back into the main collection
(defn conform-to-territory-name
  "Index each map of territory information by territory name."
  [territories territory-key]
  (into {} (map #(vector (territory-key %) %) territories)))
(defn compute-cases-per-100k
  "If the data provided includes a valid population number, calculate the number for :cases-per-100k. Else set :cases-per-100k to nil.
   Returning nil rather than a number makes any later calculation or plotting fail loudly instead of working with bad data."
  [province-data-with-pop]
  (map #(let [cases (% :cases)
              population (% :population)
              calc-cases (fn [x] (double (/ cases x)))
              per-100k (fn [x] (/ x 100000))]
          (->> (if population
                 ((comp calc-cases per-100k) population)
                 nil)
               (assoc % :cases-per-100k))) province-data-with-pop))
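The arithmetic inside compute-cases-per-100k is cases divided by (population / 100000). A minimal standalone sketch, using a hypothetical two-argument cases-per-100k helper and made-up numbers:

```clojure
(defn cases-per-100k
  "Hypothetical helper: cases per 100,000 inhabitants, or nil when the
   population is unknown."
  [cases population]
  (when population
    (double (/ cases (/ population 100000)))))

;; 1,000,000 inhabitants are ten blocks of 100k, so 50 cases is 5 per block.
(cases-per-100k 50 1000000) ;; => 5.0

;; No population means no rate.
(cases-per-100k 50 nil)     ;; => nil
```

Clojure keeps the intermediate division exact as a ratio, so the only rounding happens in the final call to double.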

COVID Data

Coronavirus cases across Italy, distributed by provinces and regions.

(def province-covid-data-march-8
  (->> (mcsv/read-csv "/COVID-19/dati-province/dpc-covid19-ita-province-20200309.csv"
                      {:field-names-fn fields-it->en})
       (map #(update % :province-name (fn [territory-name]
                                        (if-let [update-territory-name (normalize-province-names territory-name)]
                                          update-territory-name
                                          territory-name))))))
(def province-covid-data
  (->> (mcsv/read-csv "/COVID-19/dati-province/dpc-covid19-ita-province-latest.csv"
                      {:field-names-fn fields-it->en})
       (map #(update % :province-name (fn [territory-name]
                                        (if-let [update-territory-name (normalize-province-names territory-name)]
                                          update-territory-name
                                          territory-name))))))
(def region-covid-data
  (->> (mcsv/read-csv "/COVID-19/dati-regioni/dpc-covid19-ita-regioni-latest.csv"
                      {:field-names-fn fields-it->en})
       (map #(update % :region-name (fn [region-name]
                                      (if-let [update-region-name (normalize-region-names region-name)]
                                        update-region-name
                                        region-name))))
       (normalize-trentino)))
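The region pipeline ends with normalize-trentino, which folds the two autonomous provinces into one region. The core of that step is merge-with +. The sketch below shows it on toy rows; the numbers and the combine-rows helper are made up for illustration and are not the notebook's data or code:

```clojure
;; Toy rows with invented values.
(def rows
  [{:region-name "P.A. Bolzano" :cases 10 :dead 1 :lat 46.5 :lon 11.4}
   {:region-name "P.A. Trento"  :cases 20 :dead 2 :lat 46.1 :lon 11.1}
   {:region-name "Lazio"        :cases 30 :dead 3 :lat 41.9 :lon 12.5}])

(defn combine-rows
  "Hypothetical helper: sum sum-keys across rows, keep the first row's
   other fields, rename the result, and drop the now-inexact coordinates."
  [rows sum-keys new-name]
  (-> (first rows)
      (merge (apply merge-with + (map #(select-keys % sum-keys) rows)))
      (assoc :region-name new-name)
      (dissoc :lat :lon)))

(combine-rows (filter #(#{"P.A. Bolzano" "P.A. Trento"} (:region-name %)) rows)
              [:cases :dead]
              "Trentino-Alto Adige/Südtirol")
;; => {:region-name "Trentino-Alto Adige/Südtirol", :cases 30, :dead 3}
```

As in the notebook, the two source rows still have to be removed from the collection afterwards, since the combined row is added alongside them.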

Populations Data

italy.region-population.csv
italy.province-population.csv
(def region-population-data
  "From http://www.comuni-italiani.it/province.html with updates to Trentino-Alto Adige/Südtirol and Valle d'Aosta/Vallée d'Aoste."
  (-> (mcsv/read-csv "italy.region-population.csv" ; assumes the uploaded file is in the working directory
                     {:fields [:region-name :population :number-of-provinces]})
      (conform-to-territory-name :region-name)))
(def province-population-data
  "From http://www.comuni-italiani.it/province.html. Italy changed how provinces are structured in Sardinia in 2016.
   Some are manually updated using the data here: https://en.wikipedia.org/wiki/Provinces_of_Italy"
  (-> (mcsv/read-csv "italy.province-population.csv" ; assumes the uploaded file is in the working directory
                     {:fields [:province-name :population :abbreviation]})
      (conform-to-territory-name :province-name)))
(defn add-population-to-territories [all-territory-data all-territory-population territory-key]
  (map #(let [territory-to-update (% territory-key)]
          (->> (all-territory-population territory-to-update)
               (:population)
               (assoc % :population)))
       all-territory-data))
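The join performed by add-population-to-territories depends entirely on the names matching. A small sketch with invented values shows what happens when they do not; add-population is a hypothetical stand-in written with get-in:

```clojure
;; Toy population index keyed by territory name (invented number).
(def populations
  {"Emilia-Romagna" {:region-name "Emilia-Romagna" :population 1000}})

(defn add-population
  "Hypothetical stand-in: attach :population by looking each row's name up
   in the index. Unmatched names get :population nil."
  [rows populations territory-key]
  (map #(assoc % :population
               (get-in populations [(get % territory-key) :population]))
       rows))

(add-population [{:region-name "Emilia-Romagna" :cases 5}
                 {:region-name "Emilia Romagna" :cases 7}]
                populations
                :region-name)
;; => ({:region-name "Emilia-Romagna", :cases 5, :population 1000}
;;     {:region-name "Emilia Romagna", :cases 7, :population nil})
```

The second row differs only by a hyphen, which is exactly the kind of mismatch the normalize functions are there to remove before the join.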

Compute Final Data to Map

(def region-data "For use with resources/public/public/data/limits_IT_regions-original.geo.json"
  (-> (add-population-to-territories region-covid-data region-population-data :region-name)
      (compute-cases-per-100k)
      (conform-to-territory-name :region-name)
      (dissoc "P.A. Bolzano" "P.A. Trento")))
(def province-data-march-8
  "For use with resources/public/public/data/limits_IT_provinces-original.geo.json"
  (-> (remove (comp #{"In fase di definizione/aggiornamento"} :province-name) province-covid-data-march-8)
      (add-population-to-territories province-population-data :province-name)
      (compute-cases-per-100k)
      (conform-to-territory-name :province-name)))
(def province-data
  "For use with resources/public/public/data/limits_IT_provinces-original.geo.json"
  (-> (remove (comp #{"In fase di definizione/aggiornamento"} :province-name) province-covid-data)
      (add-population-to-territories province-population-data :province-name)
      (compute-cases-per-100k)
      (conform-to-territory-name :province-name)))
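Because a missing population leaves :cases-per-100k as nil, it can be worth listing such territories before plotting. The following is a possible check, not something the notebook defines; missing-population is a hypothetical name and the input is toy data in the same name-indexed shape:

```clojure
(defn missing-population
  "Hypothetical check: sorted names of territories whose :cases-per-100k
   is nil in a name-indexed map."
  [territory-index]
  (->> territory-index
       (filter (fn [[_ row]] (nil? (:cases-per-100k row))))
       (map first)
       sort))

(missing-population {"Lazio"          {:cases-per-100k 5.0}
                     "Emilia Romagna" {:cases-per-100k nil}})
;; => ("Emilia Romagna")
```

An empty result suggests every territory found a population; any name that shows up probably needs an entry in one of the normalize maps.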

The Famous Applied Science Look and Feel

Again, from Applied Science Studio. Originally published here.

(def applied-science-palette
  {:pink   "#D46BC8"
   :green  "#38D996"
   :blue   "#4FADFF"
   :purple "#9085DA"
   ;; This gray is not normally part of our palette, but is useful for
   ;; map visualizations for places without data:
   :gray "#F3F3F3"})
(def applied-science-font
  {:mono "IBM Plex Mono"
   :sans "IBM Plex Sans"})
(def oz-config
  "Default settings for Oz visualizations"
  {:config {:style {:cell {:stroke "transparent"}}
            :legend {:labelFont (:mono applied-science-font)
                     :labelFontSize 12
                     :titleFont (:mono applied-science-font)
                     :gradientThickness 40}
            :axis {:labelFont (:mono applied-science-font)
                   :titleFont (:mono applied-science-font)
                   :titleFontSize 20}}
   :title {:font (:sans applied-science-font)
           :fontSize 14
           :anchor "middle"}})

Final Plot

All the code exists to plot the province data, but for this demo, I'll just plot the region data.

limits_IT_provinces-original.geo.json
limits_IT_regions-original.geo.json
(def italia-region-geojson-with-data
  (update (json/read-value (java.io.File. "limits_IT_regions-original.geo.json") ; assumes the uploaded file is in the working directory
                           (json/object-mapper {:decode-key-fn true}))
          :features
          (fn [features]
            (mapv (fn [feature]
                    (assoc feature
                           :reg_name     (:reg_name (:properties feature))
                           :Cases          (get-in region-data [(:reg_name (:properties feature)) :cases] 0)
                           :Cases-per-100k (get-in region-data [(:reg_name (:properties feature)) :cases-per-100k] 0)))
                  features))))
(def italy-dimensions
  {:width 500 :height 700})
(def italy-region-map
 (merge-with merge oz-config italy-dimensions
             {:title {:text "COVID19 cases in Italy, by region, per 100k inhabitants"}
              :data {:name "italy"
                     :values italia-region-geojson-with-data
                     :format {:property "features"}}
              :mark {:type "geoshape" :stroke "white" :strokeWidth 1}
              :encoding {:color {:field "Cases-per-100k"
                                 :type "quantitative"
                                 :scale {:domain [0 (apply max (map :cases-per-100k (vals region-data)))]}}
                         :tooltip [{:field "reg_name" :type "nominal"}
                                   {:field "Cases" :type "quantitative"}]}
              :selection {:highlight {:on "mouseover" :type "single"}}}))
(oz/export! italy-region-map "/results/temp.html")
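The spec above is assembled with merge-with merge, which combines maps key by key and only calls merge where the same key appears more than once. A reduced sketch with placeholder values (base, sizing, and spec are hypothetical names) shows the effect on :title:

```clojure
(def base   {:config {:axis {:titleFontSize 20}}
             :title  {:fontSize 14 :anchor "middle"}})
(def sizing {:width 500 :height 700})
(def spec   {:title {:text "Example"} :mark "geoshape"})

(merge-with merge base sizing spec)
;; => {:config {:axis {:titleFontSize 20}}
;;     :title  {:fontSize 14, :anchor "middle", :text "Example"}
;;     :width  500
;;     :height 700
;;     :mark   "geoshape"}
```

Two caveats follow from how merge-with works: the merge is one level deep, so a colliding nested map such as :axis inside :config would be replaced rather than combined, and a collision between non-map values (for example two :width entries) would throw because merge cannot combine numbers.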

Appendix

{:deps
 {org.clojure/clojure {:mvn/version "1.10.0"}
  org.clojure/tools.deps.alpha {:mvn/version "0.8.677"}
  meta-csv {:mvn/version "0.1.0"}
  metasoarous/oz {:mvn/version "1.6.0-alpha6"} ;; for output to Vega
  metosin/jsonista {:mvn/version "0.2.5"}
  compliment {:mvn/version "0.3.9"}}}
deps.edn

Bring in the data

This repo is mounted by: Clojure
ls /COVID-19