Applying Clojure Spec
I will explore and code in a realistic domain. Clojure Spec will help in understanding and documenting the domain while coding the "business logic".
The following dependencies are used throughout this article.
It is highly recommended to experiment with the examples in a REPL session.
{:deps
{org.clojure/clojure {:mvn/version "1.10.0"}
org.clojure/data.json {:mvn/version "1.0.0"}
org.clojure/test.check {:mvn/version "0.10.0"}
clj-http {:mvn/version "3.10.1"}
tick {:mvn/version "0.4.23-alpha"}
camel-snake-kebab {:mvn/version "0.4.1"}
expound {:mvn/version "0.8.4"}
compliment {:mvn/version "0.3.9"}}}
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen]
         '[clojure.spec.test.alpha :as stest]
         '[clojure.string :as str]
         '[clojure.pprint :refer [pprint]]
         '[clojure.data.json :as json]
         '[clj-http.client :as client]
         '[tick.alpha.api :as t]
         '[camel-snake-kebab.core :as csk]
         '[expound.alpha :as expound])
Clojure Spec
I am not a Clojure practitioner, i.e. I don't use it at work. However, Clojure has been an inspiration ever since I became more involved in UI development (about the time React came out).
Many Clojure practices are applicable in other languages, for example:
functional programming
persistent data structures
They have found their way into my everyday work.
One feature that I haven't seen used outside of the Clojure realm is Clojure Spec. This article is an exploration of using it in an almost realistic scenario.
We won't introduce what spec is here. The official rationale and guide are very clear and easy to follow.
Spec-ing a Domain
Our goal is to explore a simple domain. One that is not hard to grasp, but has enough substance to show possible usage contexts and how they can be coded. Along the way, we will see how Clojure Spec (and also Clojure itself) will help us in writing code.
Let's familiarize ourselves with Pulse Eco. It's a great platform for gathering environmental data from cities around the world. It helps raise awareness and foster improvement and development.
We start exploring by "just talking" with the REST services provided.
Sensor
The first available resource from the API documentation is about Sensor data. It's rather simple for consumption. There are two variants: "give me all the sensors" and "give me a specific one by id". In terms of input, there is not much to discuss, so the documentation focuses on the shape of the output.
Clojure is dynamically typed. When we code our data operations, the shape of the data might not be evident. This happens mostly when we read a Clojure codebase that we haven't written, or come back to one after a while. So, let's try to specify what we know.
The documentation gives info about two specific sensor properties, type and status. They are sets of possible values, so let's spec them:
(s/def ::type #{"0" "1" "2" "3"})
(s/def ::status #{"ACTIVE" "INACTIVE" "NOT_CLAIMED" "BANNED"})
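A quick way to see how a set behaves as a spec is to try it on a few values. This is a minimal sketch; the :example/... spec names are made up for illustration so that they don't clash with the specs of this article.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical spec names, used only for this illustration.
(s/def :example/sensor-type #{"0" "1" "2" "3"})
(s/def :example/sensor-status #{"ACTIVE" "INACTIVE" "NOT_CLAIMED" "BANNED"})

;; A set is a function of its members, so spec can use it as a predicate.
(s/valid? :example/sensor-type "2")          ; => true, member of the set
(s/valid? :example/sensor-type 2)            ; => false, a number is not the string "2"
(s/valid? :example/sensor-status "active")   ; => false, the check is case-sensitive
```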
All the other properties, the previous two as well, are strings. We don't have more info. For now, we spec them as:
(s/def ::sensor-id string?)
(s/def ::position string?)
(s/def ::comments string?)
(s/def ::description string?)
At a point when information is sparse, we can do some exploration to dig deeper. The Clojure REPL is perfect for this.
(def base-uri "https://skopje.pulse.eco/rest")
(def basic-auth {:basic-auth ["andon" (System/getenv "PULSE-ECO-PASSWORD")]})
(defn fetch [uri]
(-> (client/get uri basic-auth)
(:body)
(json/read-str :key-fn csk/->kebab-case-keyword)))
(defn fetch-sensors []
(fetch (str base-uri "/sensor")))
(def sensors (fetch-sensors))
(pprint sensors)
In the above example we are:
Using the HTTP client (clj-http) to get the results from the sensor endpoint.
Taking the body from the response.
Transforming the camelCase keys of the maps into kebab-case Clojure keywords.
Fetching the sensors and exploring them.
Some quick "data slicing" might give a clearer picture about what we are seeing.
Number of sensors in Skopje:
(count sensors)
In what status are the sensors:
(frequencies (map :status sensors))
How many are WiFi-based:
(->> (map :type sensors)
(filter #{"2" "3"})
(count))
What are the locations of the sensors in human-readable form:
(pprint (map :description sensors))
Show me the sensor data of my sensor:
(def my-sensor (first (filter #(= "Novo Lisice" (:description %)) sensors)))
(pprint my-sensor)
Here we can see a single sensor and its properties. We decide that we want to be stricter than just string?-validation for some of them. sensor-id and position are good candidates. It is clear that sensor-id is a UUID and position a latitude-longitude tuple. Let's improve ::sensor-id first.
Clojure has a predicate uuid? that checks whether a value is a UUID. Unfortunately, when reading the JSON response, we didn't transform the id to a UUID, but left it as a string. So we create a new predicate that checks for a UUID represented as a string:
(defn uuid-string? [s]
(try
(uuid? (java.util.UUID/fromString s))
(catch IllegalArgumentException _ false)))
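A few REPL calls show how such a predicate behaves. The sketch below uses a hypothetical variant, uuid-string-safe?, that first checks for a string. This is an assumption about what one might want: nil or non-string input makes UUID/fromString fail in ways that the IllegalArgumentException handler does not cover.

```clojure
;; Hypothetical variant of the predicate above: reject non-strings up front,
;; so only malformed text (IllegalArgumentException) is left to catch.
(defn uuid-string-safe? [s]
  (and (string? s)
       (try
         (uuid? (java.util.UUID/fromString s))
         (catch IllegalArgumentException _ false))))

(uuid-string-safe? "cfb0a034-6e29-4803-be02-9215dcac17a8") ; => true
(uuid-string-safe? "1001")                                  ; => false
(uuid-string-safe? nil)                                     ; => false
```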
Now we can redefine ::sensor-id and also create a spec for sensor(s):
(s/def ::sensor-id uuid-string?)
(s/def ::sensor
(s/keys :req-un [::sensor-id
::position
::comments
::type
::description
::status]))
(s/def ::sensors (s/coll-of ::sensor))
With these specs in place, we can try to validate the data.
First just one sensor:
(s/valid? ::sensor my-sensor)
And then all of them:
(s/valid? ::sensors sensors)
Hmm... The result being false is unexpected. Let's see what spec says about it:
(s/explain ::sensors sensors)
Spec provides a nice facility here that shows the errors and the paths where the problems were found.
In case the above official explanation from spec isn't clear enough, we can use the expound library for a more human-readable output:
(set! s/*explain-out* expound/printer)
(s/explain ::sensors sensors)
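When the collection is large, the printed explanation can get long. A possible way to summarise it is s/explain-data, which returns the same problems as plain data. The sketch below uses a tiny hand-made collection and hypothetical :example/... specs instead of the live API response.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs; the length check is a crude stand-in for a UUID check.
(s/def :example/sensor-id (s/and string? #(= 36 (count %))))
(s/def :example/sensor (s/keys :req-un [:example/sensor-id]))
(s/def :example/sensors (s/coll-of :example/sensor))

(def sample-sensors
  [{:sensor-id "cfb0a034-6e29-4803-be02-9215dcac17a8"}
   {:sensor-id "1001"}])

;; The problems are a sequence of maps under :clojure.spec.alpha/problems.
(def problems
  (:clojure.spec.alpha/problems
   (s/explain-data :example/sensors sample-sensors)))

;; Each problem carries where it was found (:in) and the offending value (:val).
(map (juxt :in :val) problems)
```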
For some reason, certain sensor-ids are not UUIDs. They are just four-digit numeric strings. Let's say this is a legacy leftover. The API producers are promising that these are the only such cases. We extend the spec for it and validate again:
(s/def ::sensor-id
  (s/or :uuid-string uuid-string?
        :legacy-id #{"1000" "1001" "1002" "1003" "1004" "1005"}))
(s/explain ::sensors sensors)
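A side effect of s/or is that conforming a value tells us which branch matched. A minimal sketch, with a hypothetical and simplified :example/id spec (a regular expression stands in for the UUID check):

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical, simplified version of the sensor-id spec.
(s/def :example/id
  (s/or :uuid-string (s/and string? #(re-matches #"[0-9a-f-]{36}" %))
        :legacy-id   #{"1000" "1001"}))

(s/conform :example/id "1001")   ; => [:legacy-id "1001"], tagged with the matching branch
(s/conform :example/id "9999")   ; => :clojure.spec.alpha/invalid, no branch matches
```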
The position property we leave for later.
Raw Data
Off to the next service to find a bit more complexity.
We can see that the raw-data API has more input parameters:
the already introduced sensor-id
type for the different types of measurements
from/to date-time to specify the requested period of measurements
Since we will use date-times, and we will need to perform validation, let's use a library and some helpers:
(defn date-on [date]
(-> (t/time "00:00")
(t/on date)
(t/in "Europe/Skopje")))
(defn format-date [date]
(str/replace (t/format :iso-offset-date-time date) #"\+" "%2b"))
; showcasing usage and conforming to the formatting (required by api doc)
(= (format-date (date-on "2017-03-15")) "2017-03-15T00:00:00%2b01:00")
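For readers without tick on the classpath, roughly the same formatting can be sketched with plain java.time interop. This swaps in a different technique from the helpers above; date-on-interop and format-date-interop are hypothetical names introduced only for this illustration.

```clojure
(require '[clojure.string :as str])
(import '(java.time LocalDate LocalTime ZoneId ZonedDateTime)
        '(java.time.format DateTimeFormatter))

;; Hypothetical interop-only counterpart of date-on: midnight in Skopje.
(defn date-on-interop [date-str]
  (ZonedDateTime/of (LocalDate/parse date-str)
                    LocalTime/MIDNIGHT
                    (ZoneId/of "Europe/Skopje")))

;; Hypothetical counterpart of format-date: ISO offset format with "+" escaped,
;; because a raw "+" in a query string is commonly decoded as a space.
(defn format-date-interop [zdt]
  (-> (.format DateTimeFormatter/ISO_OFFSET_DATE_TIME zdt)
      (str/replace "+" "%2b")))

(format-date-interop (date-on-interop "2017-03-15"))
;; => "2017-03-15T00:00:00%2b01:00"
```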
If we are building a Clojure client for fetching raw data, we might create a function to do that. The API documentation has example requests. They tell us that sensor-id and type are semi-optional inputs, i.e. at least one has to be provided. It's an interesting function, so let's give it a go:
(defn querify
  "Creates the URL query part from the props in opts"
  [opts]
  (->> (filter #(some? (second %)) opts)
       (map #(str (name (first %)) "=" (second %)))
       (interpose "&")
       (apply str)))
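To see what such a function produces, here is a sketch of an equivalent formulation together with a sample call. querify-join is a hypothetical name, and, like the function above, it does no URL-encoding of the values. The order of the parameters follows the order of the map entries, which is only predictable for small literal maps.

```clojure
(require '[clojure.string :as str])

;; Hypothetical alternative formulation: drop nil values, then join "k=v" pairs.
(defn querify-join [opts]
  (->> opts
       (remove (comp nil? val))
       (map (fn [[k v]] (str (name k) "=" v)))
       (str/join "&")))

(querify-join {:from "a" :to "b" :sensorId nil :type "pm10"})
;; => "from=a&to=b&type=pm10"
```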
(defn fetch-raw-data
[from to {:keys [sensor-id type]}]
(fetch (str base-uri "/dataRaw?"
(querify {:from (format-date from)
:to (format-date to)
:sensorId sensor-id
:type type}))))
; measurements input from rest-api documentation
(def measurements-for-sensor-1001-type-pm10
(fetch-raw-data (date-on "2017-03-15")
(date-on "2017-03-19")
{:sensor-id "1001"
:type "pm10"}))
(def measurements-for-sensor-1001
(fetch-raw-data (date-on "2017-03-15")
(date-on "2017-03-19")
{:sensor-id "1001"}))
(def measurements-for-type-pm10
(fetch-raw-data (date-on "2017-03-15")
(date-on "2017-03-19")
{:type "pm10"}))
While this is working now, for someone who might try to use or maintain this code, it would certainly help to:
know about the relation of the inputs (from before to, one of sensor-id or type mandatory)
know the shape of the data
This leads us to the creation of a function spec. But before we tackle that, let's see if we are on the right track with the shape of the data:
; input
(s/def ::date-time #(instance? java.time.ZonedDateTime %))
; input & output
(s/def ::type #{"pm10" "pm25" "temperature" "humidity" "noise"})
; INFO: name collision with the ::type defined previously for sensor
; output
(s/def ::stamp #(try (t/parse %) true
                     (catch Exception _ false)))
(s/def ::year (s/and int? pos?)) ; quick and dirty
(s/def ::value string?)
(s/def ::measurement
(s/keys :req-un [::sensor-id
::position
::stamp
::type
::value]
:opt-un [::year]))
(s/def ::measurements (s/coll-of ::measurement))
(s/valid? ::measurements measurements-for-sensor-1001)
Again, something is invalid, when we expected it to be valid. What's the explanation?
(s/explain ::measurements measurements-for-sensor-1001)
We see a lot of values for ::type that are not in the expected set. To be precise:
(frequencies (map :type measurements-for-sensor-1001))
These are air-pollution metrics. Maybe the sensor only reported them in 2017. Let's check what the sensor reported in 2020:
(def measurements-for-sensor-1001-in-2020
(fetch-raw-data (date-on "2020-03-15")
(date-on "2020-03-19")
{:sensor-id "1001"}))
(frequencies (map :type measurements-for-sensor-1001-in-2020))
Still the same. OK, let's see my sensor, since it's probably a newer version:
(def measurements-for-my-sensor
(fetch-raw-data (date-on "2020-03-15")
(date-on "2020-03-19")
{:sensor-id (:sensor-id my-sensor)}))
(pprint (set (map :type measurements-for-my-sensor)))
This confirms that the newer sensors are measuring values of the documented types. Our subjective opinion is that only the measurements documented and present in my sensor are the important ones. We filter the others and verify the output:
(defn fetch-raw-data-filtered
[from to {:keys [sensor-id type]}]
(->> (fetch-raw-data from to {:sensor-id sensor-id :type type})
(filter #(s/valid? ::type (:type %)))))
(def measurements-for-my-sensor
(fetch-raw-data-filtered (date-on "2020-03-15")
(date-on "2020-03-19")
{:sensor-id (:sensor-id my-sensor)}))
(s/valid? ::measurements measurements-for-my-sensor)
Now that we know a bit more about the data, let's get back to the main challenge, providing a function spec:
(s/def ::raw-data-opts
(s/or :sensor-id-opt (s/keys :req-un [::sensor-id])
:type-opt (s/keys :req-un [::type])))
(s/fdef fetch-raw-data-filtered
  :args (s/and (s/cat :from ::date-time
                      :to ::date-time
                      :opts ::raw-data-opts)
               #(t/< (:from %) (:to %)))
  :ret ::measurements)
(stest/instrument `fetch-raw-data-filtered)
In the function spec definition above we can notice the following:
The output is validated using ::measurements with :ret
The arguments of the function are validated using :args
The shape of the arguments is validated by using the ::date-time spec for the from and to arguments. We are using the more complex ::raw-data-opts spec for checking that at least one of (valid) sensor-id or type is present in opts
In the end, we also make sure that the to date-time "is later" than from
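The "at least one of" requirement can also be written with `or` inside the key vector of s/keys. The following is a sketch of that alternative, not the spec used in this article; the :example/... names are hypothetical stand-ins for ::sensor-id and ::type.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical stand-ins for the article's ::sensor-id and ::type specs.
(s/def :example/sensor-id string?)
(s/def :example/type #{"pm10" "pm25"})

;; `or` inside :req-un means: at least one of these keys must be present.
(s/def :example/raw-data-opts
  (s/keys :req-un [(or :example/sensor-id :example/type)]))

(s/valid? :example/raw-data-opts {:sensor-id "1001"})   ; => true
(s/valid? :example/raw-data-opts {:type "pm10"})        ; => true
(s/valid? :example/raw-data-opts {:wrong "thing"})      ; => false
```

Compared with the s/or version, conforming a map against this spec returns the map itself instead of a tagged pair.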
Function specs can validate not only the arguments and the return value of a function, but their relation as well (through the :fn key of s/fdef). In our case, such an example might be to check that the :stamp time in each measurement is in the range [from, to].
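A relation between arguments and return value goes under the :fn key of s/fdef. Here is a minimal sketch under simplifying assumptions: stamps-between is a hypothetical stand-in for a fetch function, and plain integers play the role of date-times, so the example runs without the network or a date library.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical stand-in for a fetch function: keeps the stamps inside [from, to].
(defn stamps-between [from to stamps]
  (filter #(<= from % to) stamps))

(s/fdef stamps-between
  :args (s/and (s/cat :from int? :to int? :stamps (s/coll-of int?))
               #(< (:from %) (:to %)))
  :ret (s/coll-of int?)
  ;; :fn receives a map with the conformed :args and :ret values
  :fn (fn [{:keys [args ret]}]
        (every? #(<= (:from args) % (:to args)) ret)))

;; Generates arguments from :args, calls the function, verifies :ret and :fn.
(def check-summary
  (stest/summarize-results (stest/check `stamps-between)))
```

stest/check needs test.check on the classpath, which the dependencies listed at the top already include.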
The instrument function sets up fetch-raw-data-filtered to be validated against its own spec. This is very useful during development, where "wrong usage" of the function can be caught early. Once instrumented, a call whose arguments do not conform to the function spec throws an error. We can test this behavior with some of the following:
; "happy path"
(fetch-raw-data-filtered (date-on "2020-03-15")
(date-on "2020-03-19")
{:sensor-id (:sensor-id my-sensor)})
; "unhappy path" - from > to
(try
(fetch-raw-data-filtered (date-on "2020-03-19")
(date-on "2020-03-15")
{:sensor-id (:sensor-id my-sensor)})
(throw (Exception. "Should have failed"))
(catch clojure.lang.ExceptionInfo e
(s/explain-out (:data (Throwable->map e)))))
; "unhappy path" - no sensor-id or type provided
(try
  (fetch-raw-data-filtered (date-on "2020-03-15")
                           (date-on "2020-03-19")
                           {:wrong "thing"})
  (throw (Exception. "Should have failed"))
  (catch clojure.lang.ExceptionInfo e
    (s/explain-out (:data (Throwable->map e)))))
The function spec is part of the function documentation. Potential users who look it up will get very useful insights:
(doc fetch-raw-data-filtered)
Average Data
Reading about the average data service reveals a few differences from raw-data:
The sensor-id can also have the additional value -1, indicating the average from all city sensors
It is not explicitly written whether sensor-id and type are mandatory or not
The service has 3 variants: daily, weekly or monthly average
A few quick fetch calls give us the answers. This is what we found:
type is mandatory
sensor-id is optional; omitting it gives separate averages for all sensors (unlike -1, which returns the aggregated value as one measurement)
With this in mind, here is an implementation (and specification) of the function:
(defn fetch-average-data
([period from to type]
(fetch-average-data period from to type nil))
([period from to type sensor-id]
(fetch (str base-uri "/avgData/" (name period) "?"
(querify {:from (format-date from)
:to (format-date to)
:sensorId sensor-id
:type type})))))
(s/def ::period #{:day :week :month})
(s/def ::average-sensor-id (s/or :sensor ::sensor-id
                                 :all #{"-1"}))
(s/fdef fetch-average-data
  :args (s/and (s/cat :period ::period
                      :from ::date-time
                      :to ::date-time
                      :type ::type
                      :sensor-id (s/? (s/nilable ::average-sensor-id)))
               #(t/< (:from %) (:to %)))
  :ret ::measurements)
The function and its spec are similar to the ones for raw-data. The argument specification is simple, except for the new specification for the sensor-id. It has a few caveats:
It includes "-1" as a valid value
It can be optional (by spec-ing it with s/?)
It can be nil-able (because of the implementation of the 4-arity function)
If we want to be more precise, we should also update the return type specification. The currently defined ::measurements needs a different spec for sensor-id. The spec for position will work, but to be precise it has to be noted that when "-1" is used (to get data for aggregated sensors) the position is returned as an empty string. All of this can be specified! It might even be cool to specify the relation "when the sensor-id arg is -1, position in the return value will be an empty string".
As we don't see the immediate need to do this, we leave it like it is.
Instrumentation will not fail when the return result does not conform to the spec. This is by design. It is expected that :ret specs are validated using tests.
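The difference between instrumentation and generative testing can be seen on a toy function. This is a sketch with a hypothetical broken-identity function whose return value deliberately violates its :ret spec.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical function with a deliberate bug: it returns a string.
(defn broken-identity [n] (str n))

(s/fdef broken-identity
  :args (s/cat :n int?)
  :ret int?)

(stest/instrument `broken-identity)

;; Instrumentation checks :args only, so the wrong return value goes unnoticed.
(broken-identity 1)   ; => "1"

;; A bad argument, on the other hand, is rejected by the instrumented function.
(def arg-failure
  (try (broken-identity "1")
       (catch clojure.lang.ExceptionInfo e
         (:clojure.spec.alpha/failure (ex-data e)))))

;; Generative testing does look at :ret and reports the failure.
(def ret-summary
  (stest/summarize-results (stest/check `broken-identity)))
```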
Latest Data
This is the last service from the Pulse Eco API. Its methods are well suited for showing a first overview of the current state in a city.
They are easy to implement and specify with the infrastructure we already have.
(s/def ::type (s/or :main #{"pm10" "pm25" "temperature" "humidity" "noise"}
                    :additional #{"o3" "pressure" "no2" "noise_dba"}))
(defn fetch-24h []
(fetch (str base-uri "/data24h")))
(defn fetch-current []
(fetch (str base-uri "/current")))
(defn fetch-overall []
(fetch (str base-uri "/overall")))
(s/fdef fetch-24h
:args (s/cat)
:ret ::measurements)
(s/fdef fetch-current
:args (s/cat)
:ret ::measurements)
(s/fdef fetch-overall
:args (s/cat))
(fetch-overall)
We use the opportunity to update the spec for the measurement ::type. I find that filtering was not adequate. Finding out the possible values is easy, so it is better to be correct.
The overall function has a (new) different response, but we will not need it, so we don't bother with spec-ing :ret.
Business Requirements
Our goal is to draw charts with the data from the Pulse Eco API.
The charts that are available on the Skopje Pulse Eco site are interactive, with the map being the center of attention. There are not many types of charts available. There is the basic daily view (with the possibility of selecting sensors).
We want to use the Vega-Lite library to create interactive charts that will showcase additional views of the data. Hopefully this will be useful for visually exploring and getting insights from the data.
Daily Plot
We start with the most general case, the daily line plot. We want to see the data for a specific pollutant (or rather, measurement type) during the day.
Here is a very basic implementation:
(require '[clojure.set :as set])
(defn plot [spec] (with-meta spec {:nextjournal/viewer "vega-lite"}))
(defn line-plot [measurements sensors]
(let [values (set/join measurements
(map #(dissoc % :type) sensors))
type (:type (first measurements))]
{:data {:values values}
:encoding {:x {:field "stamp" :type "temporal" :title "Time"}
:y {:field "value" :type "quantitative" :title type}
:color {:field "description" :type "nominal" :title "Location"}}
:mark {:type "line"}
:width 650
:height 400}))
(defn daily-plot [date type]
(let [from (date-on date)
to (t/+ from (t/new-period 1 :days))]
(line-plot (fetch-raw-data from to {:type type})
(fetch-sensors))))
(plot (daily-plot "2020-04-14" "pm25"))
Short summary of the code and reasoning:
The plot function is a helper from Nextjournal for drawing vega-lite charts
line-plot is our main (somewhat) generic function. It is meant to plot "lines" from a set of measurements.
Even though the measurements contain the sensor, we wanted to display a sensor-name in a human-readable form, so we are also adding the list of sensors here. We have a collision for the :type, so we remove it before we perform the join. This looks like a good example of why we should prefer namespaced keywords.
The expectation is that all the measurements are for the same measurement type. We might try to spec this at some point. The implementation takes this for granted and uses the type from the first measurement as the title of the y-axis.
daily-plot takes a date and type, fetches the data and sensors and uses line-plot.
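The expectation that all measurements share one type could be expressed as a spec. A minimal sketch, with a hypothetical spec name and hand-made data:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical spec: a collection of maps that all share one :type.
;; An empty collection is accepted as well.
(s/def :example/same-type-measurements
  (s/and (s/coll-of map?)
         #(<= (count (distinct (map :type %))) 1)))

(s/valid? :example/same-type-measurements
          [{:type "pm25" :value "12"} {:type "pm25" :value "15"}])   ; => true
(s/valid? :example/same-type-measurements
          [{:type "pm25" :value "12"} {:type "pm10" :value "30"}])   ; => false
```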
Up until now, we have implemented logic that communicates with a 3rd-party API. We leveraged spec for:
data validation
checking the correct usage of functions
documentation
With the above plot, we now have some basic logic. It would be nice to try out some property-based testing. We want to test the logic in the line-plot function. Its arguments are already spec-ed. The beauty is that we can relatively easily generate example data and verify correct behavior.
Let's try to generate some data from the specs ::measurements and ::sensors.
We start off by looking at what can be generated. For example, UUIDs can be generated from the corresponding predicate:
(pprint (gen/sample (s/gen uuid?) 3))
Of course, not every predicate will "magically work" and generate data that conforms to it. For details about this, check the documentation about using and creating custom generators.
A typical example that doesn't work is the pos? predicate on its own:
(try (gen/sample (s/gen pos?) 3)
(catch clojure.lang.ExceptionInfo e (pprint (:cause (Throwable->map e)))))
On the other hand:
(gen/sample (s/gen (s/and int? pos?)) 3)
If you start with integers, positive ones can be generated (within the 100 tries that are going to be attempted). Still, it's not that simple. If you start with string?, uuid-strings probably cannot be generated.
(defn uuid-string? [s]
(try (java.util.UUID/fromString s) true
(catch IllegalArgumentException _ false)))
(s/def ::uuid-string (s/and string? uuid-string?))
(gen/sample (s/gen ::uuid-string) 3)
We can, however, start with generators and map them to a string. So, the following is a way of generating a sample of a few uuid-strings (note that s/with-gen expects a no-arg function that returns the generator):
(s/def ::uuid-string
  (s/with-gen uuid-string? #(gen/fmap str (s/gen uuid?))))
(gen/sample (s/gen ::uuid-string) 3)
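A useful sanity check for any custom generator is that everything it produces satisfies the spec it belongs to. The sketch below repeats the predicate and spec under hypothetical example names so that it runs on its own.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Hypothetical copy of the predicate, with an extra string? guard.
(defn example-uuid-string? [s]
  (and (string? s)
       (try (java.util.UUID/fromString s) true
            (catch IllegalArgumentException _ false))))

;; The second argument of s/with-gen is a no-arg function returning a generator.
(s/def :example/uuid-string
  (s/with-gen example-uuid-string?
    #(gen/fmap str (s/gen uuid?))))

;; Every generated value should satisfy the spec it was generated for.
(every? #(s/valid? :example/uuid-string %)
        (gen/sample (s/gen :example/uuid-string) 20))
;; => true
```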
Combined with the "legacy sensors", we can create a fairly reliable generator for sensor-ids for Pulse Eco, that can be used for any testing needs:
(s/def ::sensor-id
  (s/or :uuid ::uuid-string
        :legacy-id #{"1000" "1001" "1002" "1003" "1004" "1005"}))
(gen/sample (s/gen ::sensor-id))
Position
We want a generator for sensor (measurement) position. Let's see what the real data looks like:
(map :position sensors)
We will improve the spec for position, and create a generator like this:
(s/def ::latitude (s/double-in
:min -90.0 :max 90.0
:NaN? false :infinite? false))
(s/def ::longitude (s/double-in
:min -180.0 :max 180.0
:NaN? false :infinite? false))
(s/def ::position
  (s/with-gen
    ; predicate: a "lat,lon" string with both numbers in range
    #(try
       (let [pair (str/split % #",")
             lat (Double/parseDouble (first pair))
             lon (Double/parseDouble (second pair))]
         (and (<= -90.0 lat 90.0)
              (<= -180.0 lon 180.0)))
       (catch Exception _ false))
    ; generator: build the string from generated latitude and longitude
    #(gen/fmap
       (fn [[lat lon]] (str lat "," lon))
       (gen/tuple (s/gen ::latitude) (s/gen ::longitude)))))
(gen/sample (s/gen ::position))
A possible (naive) generator for Skopje might be:
(s/def ::skopje-position
  (s/with-gen
    ::position
    #(gen/fmap
       (fn [[lat lon]] (str lat "," lon))
       (gen/tuple
         (s/gen (s/double-in :min 41.900833 :max 42.076585
                             :NaN? false :infinite? false))
         (s/gen (s/double-in :min 21.477083 :max 21.667961
                             :NaN? false :infinite? false))))))
Interactive Plot
The following is an example of an interactive plot. It is shown with a certain sensor preselected, but you can explore the others by hovering over them.
(def pm25
  (map #(assoc % :stamp2 (->> (:stamp %)
                              (java.time.ZonedDateTime/parse)
                              (t/time)
                              (str)))
       (fetch-raw-data (date-on "2020-03-01")
                       (date-on "2020-03-02")
                       {:type "pm25"})))
pm25
(require '[clojure.set :as set])
(defn plot [spec] (with-meta spec {:nextjournal/viewer "vega-lite"}))
(def line-plot
{:data {:values (set/join
pm25
(map
#(dissoc % :type :position :comments :status)
sensors))}
:encoding {:color {:condition {:field "description"
:type "nominal"
:selection "hover"
:legend nil}
:value "grey"}
:opacity {:condition {:selection "hover"
:value 1}
:value 0.2}}
:layer [{:encoding {:x {:field "stamp"
:type "temporal"
:title "hour"}
:y {:field "value"
:type "quantitative"
:title "pm25"}}
:layer [{:selection {:hover {:type "single"
:on "mouseover"
:empty "all"
:fields ["description"]
:init {:description "Novo Lisice"}}
}
:mark {:type "line"
:strokeWidth 4
:stroke "transparent"}}
{:mark "line"}]
}
{:encoding {:x {:aggregate "max"
:field "stamp"
:type "temporal"}
:y {:aggregate {:argmax "stamp"}
:field "value"
:type "quantitative"}}
:layer [{:mark {:type "circle"}}
{:mark {:type "text" :align "left" :dx 4}
:encoding {:text {:field "description"
:type "nominal"}}}]}]
:config {:view {:stroke nil}}
:width 650
:height 400})
(plot line-plot)
Final Words
This exploration into Clojure Spec was very insightful. I find spec-ing to be very useful and I didn't hit too many hurdles in doing what I wanted. The power of what you can spec is enormous. You get to define much more than you would with type systems, and I really like how spec is orthogonal to the main business code (which stays clean and uncluttered).
If I had more time, I would like to explore more property-based tests. It would have been nice to compile an example of finding a minimal set of data that "breaks" the business logic, based on some of the generators here.
In the near future, I hope I can create plots for use-cases that are not yet in pulse.eco.