# Clojure Spec for Data Science

At the SciCloj meetup in Berlin, the idea came up to validate input data with `clojure.spec` and then use generators derived from the same specs to replace the invalid values. Here we explore a proof of concept.

`deps.edn`:

`{:deps`
` {org.clojure/clojure {:mvn/version "1.10.1"}`
`  org.clojure/test.check {:mvn/version "0.10.0-alpha3"}}}`

`(require '[clojure.spec.alpha :as s])`
`(require '[clojure.spec.gen.alpha :as gen])`

Let's take a small example data set. Most values are small integers, but it also contains an outlier (`999`) and an empty string:

`(def data [1 2 1 4 5 1 999 ""])`
user/data

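Next, define specs for what we consider valid data: each value must be an integer from 1 to 5 (`s/int-in` excludes the upper bound), and the input is a collection of such values.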

`(s/def ::n (s/int-in 1 6))`
`(s/def ::input (s/coll-of ::n))`
:user/input
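
As a quick sanity check on the bounds, the following sketch restates the two specs so it runs on its own. `s/int-in` treats the end of the range as exclusive, so `::n` accepts 1 through 5 and rejects 6.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::n (s/int-in 1 6))
(s/def ::input (s/coll-of ::n))

;; The start of the range is inclusive, the end is exclusive.
(s/valid? ::n 1)   ;; => true
(s/valid? ::n 5)   ;; => true
(s/valid? ::n 6)   ;; => false
(s/valid? ::n "")  ;; => false, not an integer

;; A collection is valid only if every element conforms.
(s/valid? ::input [1 2 3])  ;; => true
```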

Now check whether our input data is considered valid:

`(s/valid? ::input data)`
false
`(s/explain-data ::input data)`
Map {:clojure.spec.alpha/problems: List(2), :clojure.spec.alpha/spec: :user/input, :clojure.spec.alpha/value: Vector(8)}

The data is not valid, and `clojure.spec` reports exactly which problems it found. Each problem carries an `:in` path that points at the offending value in the input:

`(def explained (s/explain-data ::input data))`
`(get-in data (-> explained ::s/problems first :in))`
999
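
Each entry under `::s/problems` is a plain map, so the usual sequence functions apply. As an illustration (restating the specs and data from above so the snippet stands alone; `problems` is a name introduced only for this example), this pulls out the position and the value of every failing element:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::n (s/int-in 1 6))
(s/def ::input (s/coll-of ::n))
(def data [1 2 1 4 5 1 999 ""])

(def problems (::s/problems (s/explain-data ::input data)))

;; :in is the path into the input, :val is the offending value.
(mapv (juxt :in :val) problems)
;; => [[[6] 999] [[7] ""]]
```

Each problem map also contains `:via`, the chain of spec names that led to the failure, which is what lets us look up a matching generator later.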

To generate a value, we pass the name of the spec to `s/gen` and draw from the resulting generator with `gen/generate`:

`(gen/generate (s/gen ::n))`
3
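
To get a feel for what a generator produces, it can help to draw several values at once. A minimal sketch using `gen/sample`, with the spec restated so it runs on its own (`samples` is a name chosen for this example). The values are random, so no specific output is shown:

```clojure
(require '[clojure.spec.alpha :as s])
(require '[clojure.spec.gen.alpha :as gen])

(s/def ::n (s/int-in 1 6))

;; Draw ten sample values from the generator derived from ::n.
(def samples (gen/sample (s/gen ::n) 10))

;; Every generated value conforms to the spec it came from.
(every? #(s/valid? ::n %) samples)  ;; => true
```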

Putting this together, we can validate the input data and automatically replace every value that failed validation with a generated one:

`(reduce (fn [d p]`
`          ;; :in is the path to the bad value; the last entry of :via names the spec it failed`
`          (assoc-in d`
`                    (:in p)`
`                    (gen/generate (s/gen (-> p :via last)))))`
`        data`
`        (::s/problems explained))`
Vector(8) [1, 2, 1, 4, 5, 1, 5, 4]
`(def valid-data *1)`
`(s/valid? ::input valid-data)`
true

And our data is now valid!
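
The steps above could be wrapped into a single function. The sketch below is one possible shape, not a finished API: `repair` and `repaired` are hypothetical names, and the helper assumes the data consists of vectors or maps (so that the `:in` paths work with `assoc-in`) and that every failing value was checked against a registered spec (so that `:via` is not empty).

```clojure
(require '[clojure.spec.alpha :as s])
(require '[clojure.spec.gen.alpha :as gen])

(s/def ::n (s/int-in 1 6))
(s/def ::input (s/coll-of ::n))

(defn repair
  "Hypothetical helper: returns data with every value that fails spec
  replaced by a value generated from the spec it failed."
  [spec data]
  (if-let [explained (s/explain-data spec data)]
    (reduce (fn [d {:keys [in via]}]
              (assoc-in d in (gen/generate (s/gen (last via)))))
            data
            (::s/problems explained))
    ;; explain-data returns nil when the data is already valid
    data))

(def repaired (repair ::input [1 2 1 4 5 1 999 ""]))

(s/valid? ::input repaired)  ;; => true
(subvec repaired 0 6)        ;; => [1 2 1 4 5 1], valid values are untouched
```

Note that the replacements are random draws from the spec's generator, so they say nothing about the distribution of the real data.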

### TODO:

• write a custom generator which analyzes the valid input data and generates the most common value
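
One possible starting point for this TODO, offered as a rough sketch rather than a finished design: `most-common-gen` is a hypothetical helper that keeps only the values satisfying the spec, picks the most frequent one, and wraps it in a constant generator with `gen/return`. If no value is valid, it falls back to the spec's own generator.

```clojure
(require '[clojure.spec.alpha :as s])
(require '[clojure.spec.gen.alpha :as gen])

(s/def ::n (s/int-in 1 6))

(defn most-common-gen
  "Hypothetical helper: a generator that always yields the most frequent
  value in data that is valid according to spec."
  [spec data]
  (let [valid (filter #(s/valid? spec %) data)]
    (if (seq valid)
      ;; frequencies gives {value count}; pick the entry with the highest count
      (gen/return (key (apply max-key val (frequencies valid))))
      ;; nothing to learn from, fall back to the default generator
      (s/gen spec))))

(gen/generate (most-common-gen ::n [1 2 1 4 5 1 999 ""]))  ;; => 1
```

When several values share the highest count, this sketch makes no promise about which one is chosen; a fuller version might sample in proportion to the observed frequencies instead.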