Sorting maps by key with rewrite-clj

Eric Dallo asked me on Slack how he might sort a map by its keys using rewrite-clj.

Here's an initial naive stab at it. I typically reach for the zip API when solving a problem. Here I've opted to instead work with rewrite-clj nodes more directly to give me some rewrite-clj node exercise.

This initial work does not look at namespaced maps, but I assume that it will not be terribly complicated to support these as well.

{:deps {org.clojure/clojure {:mvn/version "1.10.3"}
        rewrite-clj/rewrite-clj {:mvn/version "1.0.699-alpha"}
        ;; complient is used for autocompletion
        ;; add your libs here (and restart the runtime to pick up changes)
        compliment/compliment {:mvn/version "0.3.9"}}}
Extensible Data Notation

Requires

(require
  '[rewrite-clj.zip :as z]
  '[rewrite-clj.node :as n])
0.0s

Rules for sortability

What would make the current node in a zipper map-sortable? Eric and I thought the following rules are reasonable for a first crack:

  • must be a balanced map (rewrite-clj allows for unbalanced maps, ex {:a 1 :b})

  • cannot include discards, i.e. #_

Let's set up a function:

(defn sortable-map?
  "A node is a candidate for sorting if it is a map that contains no discards, and has more than 0 key-value pairs"
  [zloc]
  (boolean
   (and (= :map (z/tag zloc))
        (not (z/find-tag (z/down* zloc) z/right* :uneval))
        (-> zloc z/down)
        (even? (->> zloc
                    z/down
                    (iterate z/right)
                    (take-while identity)
                    count)))))
0.0s

A helper to analyze map child nodes

A helper function to analyze and partition map child nodes. We could maybe simplify/clean up after we finalize an approach.

(defn partition-map-children
  "Partition and distinguish <k><other nodes>+<v>(<whitespace><comment>)* from <nodes outside key-value pair>+
    [{:k <key-node> :s [<key-node> <other nodes> <value-node> <optionally trailing comment>]}
     {:s [<space/comment nodes outside key-value pair>]}
     {:k <key-node> :s [...]}]"
  [map-children]
  (loop [children map-children
         k nil
         aseq []
         res []]
    (let [n (first children)
          t (when n (n/tag n))]
      (cond
        (nil? n)
        (if (seq aseq)
          (conj res {:s aseq})
          res)
        ;; spaces and such
        (some #{t} [:comment :whitespace :comma :newline])
        (if (and (= :comment t)
                 (not k)
                 (:k (last res))
                 (not (seq (filter #(= :newline (n/tag %)) aseq))))
          ;; tack on found trailing comment nodes to last kv seq
          (recur (rest children)
                 k
                 ;; comments include newlines, imperfect to leave a compensating newline
                 ;; but better than not leaving one..
                 [(n/newlines 1)] 
                 (conj (vec (butlast res))
                       (merge-with concat (last res) {:s (conj aseq n)})))
          (recur (rest children)
                 k
                 (conj aseq n)
                 res))
        ;; if we found a key, then we must be on a value
        k
        (recur (rest children)
               nil
               []
               (conj res {:k k :s (conj aseq n)}))
        ;; not key, so must be a value
        :else
        (recur (rest children)
               n
               [n]
               (if (seq aseq)
                 (conj res {:s aseq})
                 res))))))
0.0s

And our map sorter

(defn sort-map-by-key
  "Return zloc with current map node sorted.
  If current node is not a sortable map, returns zloc.
  - map is sorted by string version of its key
  - no nested sorting
  - leaves spacing surrounding key-value pairs as is.
  - no special handling for multi-line nodes, some formatting cleanup may be required after sort."
  [zloc]
  (if (not (sortable-map? zloc))
    zloc
    (let [children (->> zloc z/node n/children)
          node-seqs (partition-map-children children)
          kv-seqs (filter :k node-seqs)
          kv-seqs-sorted (sort-by #(-> % :k n/string) kv-seqs)
          ;; assemble sorted kv seqs back preserving any surrounding whitespace
          children-sorted (loop [node-seqs node-seqs
                                 kv-seqs-sorted kv-seqs-sorted
                                 res []]
                            (let [orig-seq (first node-seqs)]
                              (cond
                                (nil? orig-seq)
                                res
                                (:k orig-seq)
                                (recur
                                 (rest node-seqs)
                                 (rest kv-seqs-sorted)
                                 (concat res (-> kv-seqs-sorted first :s)))
                                :else
                                (recur
                                 (rest node-seqs)
                                 kv-seqs-sorted
                                 (concat res (-> node-seqs first :s))))))]
      (z/replace* zloc (-> zloc
                           z/node
                           (n/replace-children children-sorted))))))
0.1s

Let's try it

Easy cases

We'll start real simple:

(-> "{:z 4 :a 1 :x 3 :b 2}"
  z/of-string
  sort-map-by-key
  z/print-root)
0.4s

Ok good, how about if we add some newlines?

(-> "{:z 4
 :a 1
 :x 3
 :b 2}"
  z/of-string
  sort-map-by-key
  z/print-root)
0.4s

Ok, good. What if indentation is irregular? Our initial strategy is to try to preserve surrounding whitespace and only move key-value pairs. Let's see how that works:

(-> "{    :d 4
    :c 3
   :b 2
  :a 1 }"
  z/of-string
  sort-map-by-key
  z/print-root)
9.0s

And commas, they handled?


(-> "{,,,,,:e 5,,,,:d 4,
,,,:c 3,
,,:b 2,
,:a 1,}"
  z/of-string
  sort-map-by-key
  z/print-root)
0.3s

Any spacing between key and value should be preserved

(-> "{:d ,,,, 4
 :c ,,, 3
 :b ,, 2
 :a , 1 }"
  z/of-string
  sort-map-by-key
  z/print-root)
0.4s

Imperfect handling

There are some cases that my initial stab does not cover perfectly.

Well, some that I know about, I'm sure there are also some that I don't know about yet!

Eric was thinking he might post-format to handle these imperfections.

Nodes that wrap

If a node contains newlines, alignment will not always be preserved.

Notice here that :c is aligned under :x on input but not on output.

(def in "{:z 1 :b {:x 4
          :c 7}
 :a [1 2
     3 7]}")
(println "-in->")
(println in)
(println "-out->")
(-> in
  z/of-string
  sort-map-by-key
  z/print-root)
0.4s

Maps that contain comments

If a map contains a comment that trails a key-value pair, we'll assume it belongs to that key-value pair. Otherwise, comments will be left in their same original positions for the user to manually adjust after the sort.

In rewrite-clj a comment includes its terminating newline.

Let's try an example:

(def in "{;; some map comment
 :x 3 ;; x comment
 :z 5 ;; z comment
 ;; some comment after kv pair 2
 :a 1,,, ;; a comment
 :y 4 ;; y comment
 :b 2}")
(println "-in->")
(println in)
(println "-out->")
(-> in
  z/of-string
  sort-map-by-key
  z/print-root)
0.4s

You are probably noticing those extra newlines.

These were added to compensate for the newlines moving with trailing comments. We might be able to refine this, but perhaps it is good enough.

Runtimes (1)