Sorting maps by key with rewrite-clj
Eric Dallo asked me on Slack how he might sort a map by its keys using rewrite-clj.
Here's an initial naive stab at it. I typically reach for the zip API when solving a problem. Here I've opted to instead work with rewrite-clj nodes more directly to give me some rewrite-clj node exercise.
This initial work does not look at namespaced maps, but I assume that it will not be terribly complicated to support these as well.
{:deps {org.clojure/clojure {:mvn/version "1.10.3"}
rewrite-clj/rewrite-clj {:mvn/version "1.0.699-alpha"}
;; complient is used for autocompletion
;; add your libs here (and restart the runtime to pick up changes)
compliment/compliment {:mvn/version "0.3.9"}}}
Requires
(require
[rewrite-clj.zip :as z]
[rewrite-clj.node :as n])
Rules for sortability
What would make the current node in a zipper map-sortable? Eric and I thought the following rules are reasonable for a first crack:
must be a balanced map (rewrite-clj allows for unbalanced maps, ex {:a 1 :b})
cannot include discards, i.e. #_
Let's set up a function:
(defn sortable-map?
"A node is a candidate for sorting if it is a map that contains no discards, and has more than 0 key-value pairs"
[zloc]
(boolean
(and (= :map (z/tag zloc))
(not (z/find-tag (z/down* zloc) z/right* :uneval))
(-> zloc z/down)
(even? (->> zloc
z/down
(iterate z/right)
(take-while identity)
count)))))
A helper to analyze map child nodes
A helper function to analyze and partition map child nodes. We could maybe simplify/clean up after we finalize an approach.
(defn partition-map-children
"Partition and distinguish <k><other nodes>+<v>(<whitespace><comment>)* from <nodes outside key-value pair>+
[{:k <key-node> :s [<key-node> <other nodes> <value-node> <optionally trailing comment>]}
{:s [<space/comment nodes outside key-value pair>]}
{:k <key-node> :s [...]}]"
[map-children]
(loop [children map-children
k nil
aseq []
res []]
(let [n (first children)
t (when n (n/tag n))]
(cond
(nil? n)
(if (seq aseq)
(conj res {:s aseq})
res)
;; spaces and such
(some {t} [:comment :whitespace :comma :newline])
(if (and (= :comment t)
(not k)
(:k (last res))
(not (seq (filter (= :newline (n/tag %)) aseq))))
;; tack on found trailing comment nodes to last kv seq
(recur (rest children)
k
;; comments include newlines, imperfect to leave a compensating newline
;; but better than not leaving one..
[(n/newlines 1)]
(conj (vec (butlast res))
(merge-with concat (last res) {:s (conj aseq n)})))
(recur (rest children)
k
(conj aseq n)
res))
;; if we found a key, then we must be on a value
k
(recur (rest children)
nil
[]
(conj res {:k k :s (conj aseq n)}))
;; not key, so must be a value
:else
(recur (rest children)
n
[n]
(if (seq aseq)
(conj res {:s aseq})
res))))))
And our map sorter
(defn sort-map-by-key
"Return zloc with current map node sorted.
If current node is not a sortable map, returns zloc.
- map is sorted by string version of its key
- no nested sorting
- leaves spacing surrounding key-value pairs as is.
- no special handling for multi-line nodes, some formatting cleanup may be required after sort."
[zloc]
(if (not (sortable-map? zloc))
zloc
(let [children (->> zloc z/node n/children)
node-seqs (partition-map-children children)
kv-seqs (filter :k node-seqs)
kv-seqs-sorted (sort-by (-> % :k n/string) kv-seqs)
;; assemble sorted kv seqs back preserving any surrounding whitespace
children-sorted (loop [node-seqs node-seqs
kv-seqs-sorted kv-seqs-sorted
res []]
(let [orig-seq (first node-seqs)]
(cond
(nil? orig-seq)
res
(:k orig-seq)
(recur
(rest node-seqs)
(rest kv-seqs-sorted)
(concat res (-> kv-seqs-sorted first :s)))
:else
(recur
(rest node-seqs)
kv-seqs-sorted
(concat res (-> node-seqs first :s))))))]
(z/replace* zloc (-> zloc
z/node
(n/replace-children children-sorted))))))
Let's try it
Easy cases
We'll start real simple:
(-> "{:z 4 :a 1 :x 3 :b 2}"
z/of-string
sort-map-by-key
z/print-root)
Ok good, how about if we add some newlines?
(-> "{:z 4
:a 1
:x 3
:b 2}"
z/of-string
sort-map-by-key
z/print-root)
Ok, good. What if indentation is irregular? Our initial strategy is to try to preserve surrounding whitespace and only move key-value pairs. Let's see how that works:
(-> "{ :d 4
:c 3
:b 2
:a 1 }"
z/of-string
sort-map-by-key
z/print-root)
And commas, they handled?
(-> "{,,,,,:e 5,,,,:d 4,
,,,:c 3,
,,:b 2,
,:a 1,}"
z/of-string
sort-map-by-key
z/print-root)
Any spacing between key and value should be preserved
(-> "{:d ,,,, 4
:c ,,, 3
:b ,, 2
:a , 1 }"
z/of-string
sort-map-by-key
z/print-root)
Imperfect handling
There are some cases that my initial stab does not cover perfectly.
Well, some that I know about, I'm sure there are also some that I don't know about yet!
Eric was thinking he might post-format to handle these imperfections.
Nodes that wrap
If a node contains newlines, alignment will not always be preserved.
Notice here that :c is aligned under :x on input but not on output.
(def in "{:z 1 :b {:x 4
:c 7}
:a [1 2
3 7]}")
(println "-in->")
(println in)
(println "-out->")
(-> in
z/of-string
sort-map-by-key
z/print-root)
Maps that contain comments
If a map contains a comment that trails a key-value pair, we'll assume it belongs to that key-value pair. Otherwise, comments will be left in their same original positions for the user to manually adjust after the sort.
In rewrite-clj a comment includes its terminating newline.
Let's try an example:
(def in "{;; some map comment
:x 3 ;; x comment
:z 5 ;; z comment
;; some comment after kv pair 2
:a 1,,, ;; a comment
:y 4 ;; y comment
:b 2}")
(println "-in->")
(println in)
(println "-out->")
(-> in
z/of-string
sort-map-by-key
z/print-root)
You are probably noticing those extra newlines.
These were added to compensate for the newlines moving with trailing comments. We might be able to refine this, but perhaps it is good enough.