One feature that I haven't seen used outside of the Clojure realm is Clojure Spec. This article explores using it in an almost realistic scenario.
We won't introduce spec itself here; the official rationale and guide are clear and easy to follow.
Spec-ing a Domain
Our goal is to explore a simple domain: one that is not hard to grasp, but has enough substance to show possible usage contexts and how they can be coded. Along the way, we will see how Clojure Spec (and Clojure itself) helps us write code.
Let's familiarize ourselves with Pulse Eco, a great platform for gathering environmental data from cities around the world. It helps raise awareness and fosters improvement and development.
We start exploring by "just talking" with the REST services provided.
The first resource in the API documentation is Sensor data. It's simple to consume, with two variants: "give me all the sensors" and "give me a specific one by id". There is not much to discuss about the input, so the documentation focuses on the shape of the output.
Clojure doesn't have static types. When we code our data operations, the shape of the data might not be evident. This happens mostly when we read a Clojure codebase that we haven't written, or come back to one after a while. So, let's specify what we know.
The documentation describes two specific sensor properties, type and status. Each is a set of possible values, so let's spec them:
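A minimal sketch of such set specs. Sets are predicates in Clojure, so a set of allowed values can be used directly as a spec. The set members below are placeholders rather than the documented lists, and the :sensor/ namespace is our own choice, made to keep the sensor type apart from a measurement type spec.

```clojure
(require '[clojure.spec.alpha :as s])

;; A set is a predicate, so it works as a spec of allowed values.
;; NOTE: placeholder members, not the lists from the API documentation.
(s/def :sensor/type #{"0" "1" "2"})
(s/def :sensor/status #{"ACTIVE" "INACTIVE"})

(s/valid? :sensor/status "ACTIVE") ;; => true
(s/valid? :sensor/status "active") ;; => false (values must match exactly)
```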
Here we can see a single sensor and its properties. We want to be stricter than plain string? validation for some of them, and sensor-id and position are good candidates: sensor-id is clearly a UUID, and position a latitude-longitude tuple. Let's improve ::sensor-id first.
Clojure has a predicate, uuid?, for checking whether a value is a UUID. Unfortunately, when reading the JSON response we didn't convert the id to a UUID, but left it as a string. So we create a new predicate that checks for a UUID represented as a string:
For some reason, certain sensor-ids are not UUIDs, but four-digit numeric strings. Let's say this is a legacy leftover; the API producers promise that these are the only such cases. We extend the spec accordingly and validate again:
If we are building a Clojure client for fetching raw data, we might create a function for it. The API documentation has example requests, which tell us that sensor-id and type are semi-optional inputs, i.e. at least one of them has to be provided. It's an interesting function, so let's give it a go:
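A sketch of a first, permissive sensor spec, before tightening anything. It assumes the JSON keys were converted to kebab-case keywords (:sensor-id, :description and so on) when reading the response; treating :comments as optional is also an assumption. The type and status sets are placeholders.

```clojure
(require '[clojure.spec.alpha :as s])

;; Placeholder value sets, not the documented ones.
(s/def :sensor/type #{"0" "1" "2"})
(s/def :sensor/status #{"ACTIVE" "INACTIVE"})

;; First pass: everything else is just a string.
(s/def ::sensor-id string?)
(s/def ::position string?)
(s/def ::description string?)
(s/def ::comments string?)

;; :req-un matches unqualified keys by the spec's name:
;; ::sensor-id checks :sensor-id, :sensor/type checks :type, etc.
(s/def ::sensor (s/keys :req-un [::sensor-id ::position :sensor/type
                                 ::description :sensor/status]
                        :opt-un [::comments]))

(s/def ::sensors (s/coll-of ::sensor))
```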
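One way to write such a predicate, as a sketch: the name uuid-string? and the regex-based approach are our own choices. re-matches only succeeds when the whole string matches, so strings that merely contain a UUID are rejected.

```clojure
(require '[clojure.spec.alpha :as s])

;; Canonical 8-4-4-4-12 hexadecimal UUID layout.
(def uuid-re
  #"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")

(defn uuid-string?
  "True when x is a string holding a UUID in canonical form."
  [x]
  (boolean (and (string? x) (re-matches uuid-re x))))

(s/def ::sensor-id uuid-string?)

(s/valid? ::sensor-id "1001") ;; => false
```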
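A sketch of the extended spec using s/or. The legacy-id? name is hypothetical. A useful side effect of s/or is that s/conform tags the branch that matched, so the legacy ids stay easy to spot.

```clojure
(require '[clojure.spec.alpha :as s])

(defn uuid-string? [x]
  (boolean (and (string? x)
                (re-matches #"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}" x))))

;; Legacy ids: exactly four digits, kept as a string (e.g. "1001").
(defn legacy-id? [x]
  (boolean (and (string? x) (re-matches #"\d{4}" x))))

;; s/or accepts either shape and records which branch matched.
(s/def ::sensor-id (s/or :uuid uuid-string?
                         :legacy legacy-id?))

(s/conform ::sensor-id "1001") ;; => [:legacy "1001"]
```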
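A sketch of two pieces such a client needs. s/keys accepts or (and and) groups inside :req and :req-un, which expresses "at least one of" directly. The opts->query helper and its query parameter names (sensorId, type, from, to) are hypothetical, modeled on the documented example requests rather than verified against the API; sensor-id and type are reduced to string? to keep the block self-contained.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])
(import '(java.net URLEncoder))

;; Simplified stand-ins for the richer sensor-id and type specs.
(s/def ::sensor-id string?)
(s/def ::type string?)

;; (or ...) inside :req-un: opts must contain :sensor-id, :type, or both.
(s/def ::raw-data-opts (s/keys :req-un [(or ::sensor-id ::type)]))

;; Hypothetical mapping from opts keys to query parameter names.
(def param-names
  {:sensor-id "sensorId" :type "type" :from "from" :to "to"})

(defn opts->query
  "Creates the URL query part from the props in opts"
  [opts]
  (->> param-names
       (keep (fn [[k param]]
               (when-some [v (get opts k)]
                 ;; URL-encode values: offsets such as +01:00 contain '+' and ':'.
                 (str param "=" (URLEncoder/encode (str v) "UTF-8")))))
       (str/join "&")))

(s/valid? ::raw-data-opts {:type "pm10"})      ;; => true
(s/valid? ::raw-data-opts {})                  ;; => false
(opts->query {:type "pm10" :sensor-id "1001"}) ;; => "sensorId=1001&type=pm10"
```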
A helper function creates the URL query part from the props in opts.
This confirms that the newer sensors measure values of the documented types. We decide, subjectively, that only the measurement types that are both documented and present on my sensor are important. We filter out the others and verify the output:
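A sketch of the filtering step together with output validation. The set of "relevant" types and the sample measurements are illustrative, and representing :value as a string is an assumption about the JSON response.

```clojure
(require '[clojure.spec.alpha :as s])

;; Illustrative choice of relevant measurement types.
(def relevant-types #{"pm10" "pm25" "temperature" "humidity"})

(s/def ::type relevant-types)
;; Assumption: values arrive as strings in the JSON response.
(s/def ::value string?)
(s/def ::measurement (s/keys :req-un [::type ::value]))
(s/def ::measurements (s/coll-of ::measurement))

(defn keep-relevant
  "Drops measurements whose :type is not in relevant-types."
  [measurements]
  (filterv #(contains? relevant-types (:type %)) measurements))

;; Made-up sample data.
(def sample
  [{:type "pm10" :value "37"}
   {:type "some-other-type" :value "120"}])

(s/valid? ::measurements sample)                 ;; => false
(s/valid? ::measurements (keep-relevant sample)) ;; => true
```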
In the function spec definition above we can notice the following:
The output is validated using ::measurements with :ret
The arguments of the function are validated using :args
The from and to arguments are validated with the ::date-time spec, and the more complex ::raw-data-opts spec checks that opts contains at least one (valid) sensor-id or type
In the end, we also make sure that to is later than from
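A self-contained sketch of a function spec with those parts. It uses simplified stand-ins; modeling date-times as java.time.OffsetDateTime values is an assumption, and so is the (opts from to) argument order. Because s/and receives the conformed s/cat map, the last predicate can refer to :from and :to by name.

```clojure
(require '[clojure.spec.alpha :as s])
(import '(java.time OffsetDateTime))

;; Simplified stand-ins (assumptions), not the article's exact specs.
(s/def ::date-time #(instance? OffsetDateTime %))
(s/def ::sensor-id string?)
(s/def ::type string?)
(s/def ::raw-data-opts (s/keys :req-un [(or ::sensor-id ::type)]))
(s/def ::measurements (s/coll-of map?))

;; Hypothetical signature: (fetch-raw-data-filtered opts from to)
(s/fdef fetch-raw-data-filtered
  :args (s/and (s/cat :opts ::raw-data-opts
                      :from ::date-time
                      :to   ::date-time)
               ;; relation between arguments: `to` must be later than `from`
               #(.isAfter (:to %) (:from %)))
  :ret ::measurements)
```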
Function specs can validate not only the arguments and the return value of a function, but also the relation between them. In our case, one such check could be that the :stamp time of each measurement falls within the range [from, to].
The instrument function sets up fetch-raw-data-filtered to be validated against its own spec. This is very useful during development, where "wrong usage" of the function can be caught early. After instrumentation, calling the function with arguments that do not conform to the spec throws an error. We can test this behavior with calls like the following:
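Such a relation goes into :fn, which receives a map holding the conformed :args and :ret. A sketch, assuming date-times are java.time.OffsetDateTime values and a hypothetical (opts from to) argument order, with the range treated as inclusive:

```clojure
(require '[clojure.spec.alpha :as s])
(import '(java.time OffsetDateTime))

(s/def ::date-time #(instance? OffsetDateTime %))
(s/def ::stamp ::date-time)
(s/def ::measurement (s/keys :req-un [::stamp]))
(s/def ::measurements (s/coll-of ::measurement))

(defn- within?
  "Inclusive range check: from <= t <= to."
  [from to t]
  (and (not (.isBefore t from)) (not (.isAfter t to))))

(s/fdef fetch-raw-data-filtered
  :args (s/cat :opts map? :from ::date-time :to ::date-time)
  :ret ::measurements
  ;; :fn sees the conformed :args and :ret together
  :fn (fn [{:keys [args ret]}]
        (every? #(within? (:from args) (:to args) (:stamp %)) ret)))
```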
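A minimal, self-contained sketch of instrumentation. The stub body and the string-typed from/to arguments are hypothetical stand-ins for the real HTTP call and date-time specs.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(s/def ::sensor-id string?)
(s/def ::type string?)
(s/def ::raw-data-opts (s/keys :req-un [(or ::sensor-id ::type)]))

;; Hypothetical stub: returns no data instead of calling the API.
(defn fetch-raw-data-filtered [opts from to] [])

(s/fdef fetch-raw-data-filtered
  :args (s/cat :opts ::raw-data-opts :from string? :to string?))

(stest/instrument `fetch-raw-data-filtered)

(fetch-raw-data-filtered {:type "pm10"} "a" "b") ;; => []

;; Throws an ExceptionInfo explaining that :opts did not conform:
;; (fetch-raw-data-filtered {} "a" "b")
```

Calling (stest/unstrument `fetch-raw-data-filtered) removes the checks again, for example before running benchmarks.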
The function and spec are similar to the ones for raw data. The argument specification is simple, except for the new sensor-id specification, which has a few caveats:
It includes "-1" as a valid value
It can be optional (by spec-ing it with s/?)
It can be nil-able (because of the implementation of the 4-arity function)
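A sketch of how those three caveats combine. The argument order (type, from, to, optional sensor-id) is hypothetical, and the sensor-id predicates re-create the UUID/legacy spec described earlier so the block stays self-contained.

```clojure
(require '[clojure.spec.alpha :as s])

(defn uuid-string? [x]
  (boolean (and (string? x)
                (re-matches #"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}" x))))

(defn legacy-id? [x]
  (boolean (and (string? x) (re-matches #"\d{4}" x))))

(s/def ::sensor-id (s/or :uuid uuid-string? :legacy legacy-id?))

;; "-1" selects the aggregate of all sensors; nil is allowed because a
;; shorter arity may delegate to the longer one with an explicit nil.
(s/def ::avg-sensor-id (s/nilable (s/or :all #{"-1"} :sensor ::sensor-id)))

;; s/? makes the trailing sensor-id optional.
(s/def ::avg-args (s/cat :type string?
                         :from string?
                         :to string?
                         :sensor-id (s/? ::avg-sensor-id)))
```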
If we want to be more precise, we should also update the return specification. The currently defined ::measurements needs a different spec for sensor-id. The spec for position will work, but strictly speaking, when "-1" is used (to get data for aggregated sensors) the returned position is an empty string. All of this can be specified! It might even be cool to specify the relation "when the sensor-id argument is -1, the position in the return value is an empty string".
As we don't see an immediate need for this, we leave it as it is.
Instrumentation does not fail when the return value does not conform to its spec. This is by design: :ret (and :fn) specs are expected to be validated by tests, for example through clojure.spec.test.alpha/check.
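A small sketch that shows this behavior, using a hypothetical stub whose return value deliberately violates its :ret spec:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical stub that breaks its own :ret spec on purpose.
(defn fetch-stub [] :not-a-vector)

(s/fdef fetch-stub
  :args (s/cat)
  :ret vector?)

(stest/instrument `fetch-stub)

;; No exception: instrument only checks :args.
(fetch-stub) ;; => :not-a-vector

;; A test has to check :ret explicitly (or use stest/check).
(s/valid? (:ret (s/get-spec `fetch-stub)) (fetch-stub)) ;; => false
```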
This is the last service from the Pulse Eco API. Its methods are suited to providing the first values shown for the current state of a city.
They are easy to implement and spec with the infrastructure we already have.
We use the opportunity to update the spec for the measurement ::type. I find that filtering was not adequate; the possible values are easy to find out, so it's better to be correct.
The overall function has a new, different response, but since we won't need it, we don't bother spec-ing its :ret.
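A sketch of one way to find the possible values: collect the distinct :type values from fetched data and turn them into a set spec. The sample data here is made up; in practice it would be real responses, cross-checked with the documentation.

```clojure
(require '[clojure.spec.alpha :as s])

;; Made-up sample; in practice, the measurements fetched from the API.
(def fetched [{:type "pm10"} {:type "temperature"} {:type "pm10"}])

;; Distinct values actually returned...
(def observed-types (into #{} (map :type) fetched))

;; ...used (after checking them against the docs) as the spec.
(s/def ::type observed-types)
```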
Our goal is to draw charts with the data from Pulse-Eco API.
The charts available on the Skopje Pulse-Eco site are interactive, with the map at the center of attention. There are not many chart types available; there is the basic daily view (with the possibility of selecting sensors).
We want to use the Vega-Lite library to create interactive charts that showcase additional views of the data. Hopefully this will be useful for visually exploring the data and getting insights from it.
We start with the most general case, the daily line plot. We want to see the data for a specific pollutant (more precisely, a measurement type) during the day.
The plot function is a helper from Nextjournal for drawing vega-lite charts
line-plot is our main (somewhat) generic function. It is meant to plot "lines" from a set of measurements.
Even though the measurements contain the sensor, we want to display a human-readable sensor name, so we also pass in the list of sensors. Both sides have a :type key, so we remove it before performing the join. This is a good example of why we should prefer namespaced keywords.
The expectation is that all the measurements are of the same measurement type. We might try to spec this at some point; the implementation takes it for granted and uses the type of the first measurement as the y-axis description.
daily-plot takes a date and a type, fetches the data and the sensors, and uses line-plot.
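A hypothetical sketch of the data side of line-plot. It only builds the Vega-Lite spec map that a helper such as Nextjournal's plot would render; the field names (stamp, value, description) are assumptions about the joined records. clojure.set/join merges matched maps, so a :type key present on both sides would end up holding only one side's value; dropping it from the sensors and joining on an explicit {:sensor-id :sensor-id} key map avoids that.

```clojure
(require '[clojure.set :as set])

(defn line-plot-spec
  "Vega-Lite spec for one line per sensor; assumes all measurements
  share the same :type, which becomes the y-axis title."
  [measurements sensors]
  (let [;; Remove the colliding :type from sensors before joining.
        sensors (map #(dissoc % :type) sensors)
        ;; Join on :sensor-id only, attaching :description to each point.
        rows    (set/join measurements sensors {:sensor-id :sensor-id})
        y-title (:type (first measurements))]
    {:data     {:values (vec rows)}
     :mark     "line"
     :encoding {:x     {:field "stamp" :type "temporal"}
                :y     {:field "value" :type "quantitative" :title y-title}
                :color {:field "description" :type "nominal"}}}))
```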
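For daily-plot, the date has to become a from/to range. A hypothetical helper, assuming the API accepts ISO-8601 date-times with an offset and that the day should be taken in the city's time zone:

```clojure
(import '(java.time LocalDate ZoneId)
        '(java.time.format DateTimeFormatter))

(defn day-range
  "Hypothetical: {:from .. :to ..} covering one calendar day in zone."
  [^String date ^String zone]
  (let [start (.atStartOfDay (LocalDate/parse date) (ZoneId/of zone))
        fmt   DateTimeFormatter/ISO_OFFSET_DATE_TIME]
    {:from (.format start fmt)
     ;; plusDays on a zoned value lands on the next local midnight,
     ;; even on daylight-saving days that are not 24 hours long.
     :to   (.format (.plusDays start 1) fmt)}))

(day-range "2020-05-01" "Europe/Skopje")
;; => {:from "2020-05-01T00:00:00+02:00", :to "2020-05-02T00:00:00+02:00"}
```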
Up until now, we implemented logic that communicates with a 3rd-party API. We leverage spec to:
perform data validation
validate how functions are called
With the above plot, we now have some basic logic, so it would be nice to try out property-based testing. We want to test the logic in the line-plot function. Its arguments are already spec-ed, and the beauty is that we can relatively easily generate example data and verify correct behavior.
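A sketch of such a property-based test with test.check (org.clojure/test.check must be on the classpath). The specs are simplified so that spec can derive generators on its own, plot-rows is a hypothetical stand-in for the join inside line-plot, and the property is just one example invariant: every plotted row refers to a sensor that was passed in.

```clojure
(require '[clojure.set :as set]
         '[clojure.spec.alpha :as s]
         '[clojure.test.check :as tc]
         '[clojure.test.check.properties :as prop])

;; Simplified specs (assumption) whose generators come for free.
(s/def ::sensor-id #{"1001" "1002" "1003"})
(s/def ::description string?)
(s/def ::value string?)
(s/def ::sensor (s/keys :req-un [::sensor-id ::description]))
(s/def ::measurement (s/keys :req-un [::sensor-id ::value]))

;; Hypothetical function under test: the join step of line-plot.
(defn plot-rows [measurements sensors]
  (set/join measurements sensors {:sensor-id :sensor-id}))

;; Property: every plotted row refers to a sensor we were given.
(def rows-reference-known-sensors
  (prop/for-all [ms (s/gen (s/coll-of ::measurement))
                 ss (s/gen (s/coll-of ::sensor))]
    (let [known (set (map :sensor-id ss))]
      (every? #(contains? known (:sensor-id %)) (plot-rows ms ss)))))

(:result (tc/quick-check 50 rows-reference-known-sensors)) ;; => true
```

If a property ever fails, quick-check shrinks the generated input towards a minimal counterexample.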
Let's try to generate some data from the specs ::measurements and ::sensors.
We start off by looking at what can be generated. For example, UUIDs can be generated from the corresponding predicate:
If you start with integers, positive ones can be generated (within the 100 tries that are attempted), because s/and generates candidates from its first spec and filters them through the rest. Still, it's not that simple: if you start with strings, a random string is almost never a UUID, so UUID strings will most likely not be generated.
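A sketch of exploring generators at the REPL (this also needs test.check on the classpath). Built-in predicates such as uuid? come with generators, and s/and generates from its first spec and then filters with the rest:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Built-in predicates like uuid? have generators attached.
(gen/sample (s/gen uuid?) 3)

;; s/and generates ints, then keeps only those that satisfy pos?.
;; s/exercise returns [generated-value conformed-value] pairs.
(s/exercise (s/and int? pos?) 3)
```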
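A common remedy, sketched here, is to attach a generator with s/with-gen that builds valid values directly instead of filtering random strings. The ::uuid-string name is our own, and uuid-string? re-creates the string-UUID predicate described earlier.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(defn uuid-string? [x]
  (boolean (and (string? x)
                (re-matches #"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}" x))))

;; (s/gen (s/and string? uuid-string?)) fails: random strings are almost
;; never UUIDs, so the filter gives up after 100 tries.
;; with-gen instead generates real UUIDs and turns them into strings.
(s/def ::uuid-string
  (s/with-gen uuid-string?
    #(gen/fmap str (gen/uuid))))

(gen/sample (s/gen ::uuid-string) 2)
```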
This exploration of Clojure Spec was very insightful. I find spec-ing very useful, and I didn't hit many hurdles doing what I wanted. The power of what you can spec is enormous: you get to define much more than you would with a typical static type system, and I really like how spec is orthogonal to the main business code (which stays clean and uncluttered).
If I had more time, I would explore property-based tests further. It would have been nice to compile an example of finding a minimal set of data that "breaks" the business logic, based on some of the generators here.
In the near future, I hope to create plots for use cases that are not yet on pulse.eco.