Clojure object size estimation

Parts of this post were adapted from a discussion on the Clojure mailing list.

In a language such as Clojure that handles memory sharing of data structures transparently, it is a bit difficult to find a good measure of how much memory an object occupies. How does one count some substructure that is referenced by multiple objects?
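To make the question concrete, here is a small sketch showing that a substructure can be reachable from two values at once. The names `a`, `c`, and `shared?` are only illustrative; `identical?` checks object identity rather than equality.

```clojure
(def a [1 2 [1 2]])
(def c (next a))      ; a seq over the rest of a: (2 [1 2])

;; The inner vector is a single object, reachable from both a and c.
;; Any per-object size measure has to decide whether it is counted for
;; a, for c, for both, or only once.
(def shared? (identical? (nth a 2) (second c)))
(println shared?)
```

If the inner vector is counted in full for both `a` and `c`, the sum of the two sizes overstates the memory that is actually in use.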

A first option is to serialize the object with java.io.ObjectOutputStream and count the resulting bytes:

(defn total-memory [obj]
  (let [baos (java.io.ByteArrayOutputStream.)]
    (with-open [oos (java.io.ObjectOutputStream. baos)]
      (.writeObject oos obj))
    (count (.toByteArray baos))))
'user/total-memory
(let [a [1 2 [1 2]]
      b [2 [1 2]]
      c (next a)]
  [a b c])
Vector(3) [Vector(3), Vector(2), List(2)]
(total-memory *1)
891
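A caveat about this number: Java serialization writes each object only once per stream and uses back-references for repeats, so the byte count reflects sharing inside the serialized value but says nothing about sharing with other live objects. The sketch below illustrates this under the assumption that Clojure vectors go through default Java serialization; `serialized-size` is a hypothetical helper that mirrors the function above, and `v1`, `v2` are made-up test data.

```clojure
(import '(java.io ByteArrayOutputStream ObjectOutputStream))

(defn serialized-size
  "Number of bytes produced by Java-serializing obj."
  [obj]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [oos (ObjectOutputStream. baos)]
      (.writeObject oos obj))
    (count (.toByteArray baos))))

(def v1 (vec (range 1000)))
(def v2 (vec (range 1000)))   ; equal to v1, but a separate object

;; Same vector twice: the second occurrence is written as a back-reference.
(def shared-size (serialized-size [v1 v1]))
;; Two distinct vectors: both are written out in full.
(def copied-size (serialized-size [v1 v2]))

(println {:shared shared-size :copied copied-size})
```

The exact byte counts depend on the Clojure and JVM versions, so only the relative difference between the two is meaningful here.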
(spit "results/ds.edn" (pr-str *2))
nil

A second option is to use the JVM's built-in memory tools to get a size estimate. As the results below suggest, I think this gives a better estimate of how much an object actually adds to the memory footprint.

(defn gc []
  (dotimes [_ 4] (System/gc)))
(defn used-memory []
  (let [runtime (Runtime/getRuntime)]
    (gc)
    (- (.totalMemory runtime) (.freeMemory runtime))))
'user/used-memory
(defn measure [f]
  (let [before (used-memory)
        _ (def foo (binding [*in* (java.io.PushbackReader.
                                   (clojure.java.io/reader f))]
                     (read)))
        after (used-memory)]
    (- after before)))
'user/measure

One can clearly see the structural sharing if one runs the following cell twice.

(measure "results/ds.edn")
792
(measure "results/ds.edn")
144
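The same before/after idea also works for values built in memory rather than read from a file. The following is a minimal sketch, not a precise tool: `used-memory-after-gc` and `heap-delta` are hypothetical helpers, `System/gc` is only a hint to the JVM, and the reported number will vary between runs.

```clojure
(defn used-memory-after-gc
  "Approximate number of heap bytes in use after requesting a few GC cycles."
  []
  (let [rt (Runtime/getRuntime)]
    (dotimes [_ 4] (System/gc))
    (- (.totalMemory rt) (.freeMemory rt))))

(defn heap-delta
  "Calls thunk and returns the approximate heap growth caused by holding
  on to its result. The result is returned too, so it stays reachable
  while the second measurement is taken."
  [thunk]
  (let [before (used-memory-after-gc)
        value  (thunk)
        after  (used-memory-after-gc)]
    {:bytes (- after before) :value value}))

(def m (heap-delta #(vec (range 1000000))))
(println (:bytes m))
```

Because the value stays reachable through `m`, the difference approximates what that value adds to the heap beyond whatever it shares with objects that were already live.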
{:deps {com.clojure-goes-fast/clj-memory-meter {:mvn/version "0.1.3"}
        compliment {:mvn/version "0.3.10"}}}
deps.edn