Datafy/nav implementations for java.io.File

{:deps {org.clojure/clojure {:mvn/version "1.11.1"}
        com.hyperfiddle/rcf {:mvn/version "20220902-130636"}
        compliment/compliment {:mvn/version "0.3.9"}}}
(ns user.datafy-fs
  "nav implementation for java file system traversals"
  (:require [clojure.core.protocols :as ccp :refer [nav]]
            [clojure.datafy :refer [datafy]]
            [clojure.java.io] ; used below as clojure.java.io/file
            [clojure.spec.alpha :as s]
            [hyperfiddle.rcf :refer [tests]])
  (:import [java.nio.file Path Paths Files]
           java.io.File
           java.nio.file.LinkOption
           [java.nio.file.attribute BasicFileAttributes FileTime]))
(hyperfiddle.rcf/enable!)

(defn get-extension
  "Returns the trailing alphanumeric extension of path without the dot, or nil."
  [path]
  (some-> (re-find #"\.([a-zA-Z0-9]+)$" path) second))
(tests
  "get-extension"
  (tests
    "empty"
    (get-extension "") := nil
    (get-extension ".") := nil
    (get-extension "..") := nil
    (get-extension "image") := nil
    (get-extension "image.") := nil
    (get-extension "image..") := nil)
  (tests
    "found"
    (get-extension "image.png") := "png"
    (get-extension "image.blah.png") := "png"
    (get-extension "image.blah..png") := "png"))
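
A few more inputs show how the regex behaves at the edges. This is a minimal sketch: get-extension is restated compactly so the snippet runs on its own, and the file names are made up for illustration.

```clojure
;; Restated so this snippet is self-contained; same regex as above.
(defn get-extension [path]
  (some-> (re-find #"\.([a-zA-Z0-9]+)$" path) second))

(get-extension ".bashrc")        ; "bashrc" - a dotfile counts as all extension
(get-extension "archive.tar.gz") ; "gz"     - only the last segment is returned
(get-extension "photo.JPG")      ; "JPG"    - case is preserved
(get-extension "notes.md~")      ; nil      - the name must end in [a-zA-Z0-9]
```

Since the extension later becomes a keyword, case preservation means "photo.JPG" and "photo.jpg" would get different kinds unless the caller lower-cases first.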

Document how java.io.File works

(tests
  "java.io.File interop"
  (def h (clojure.java.io/file "/etc"))
  (.getName h) := "etc"
  (.getPath h) := "/etc"
  (.isDirectory h) := true
  (.isFile h) := false
  (.getParent h) := "/" ; the parent of /etc is the root prefix, not nil
  (.getPath (.getParentFile h)) := "/"
)

What methods are available on File objects?

(-> (datafy java.io.File) :members keys)
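
The member list is long, so it helps to filter it. A small sketch, assuming only clojure.datafy and clojure.string: datafy of a Class yields a map whose :members entry is keyed by member name (as symbols), so ordinary sequence functions apply. The names members and predicates are made up for this example.

```clojure
(require '[clojure.datafy :refer [datafy]]
         '[clojure.string :as string])

;; :members maps a symbol (the member name) to its reflected overloads.
(def members (:members (datafy java.io.File)))

;; Keep only the boolean-style members such as isDirectory and isFile.
(def predicates
  (->> (keys members)
       (filter #(string/starts-with? (name %) "is"))
       sort))

(contains? members 'getName) ; true
```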

java.io.File and java.nio.file.Path are entangled machinery. This helper sorts it out by converting a File to an absolute Path.

(defn file-path [^File f]
  (-> f .getAbsolutePath (java.nio.file.Paths/get (make-array String 0))))
(tests
  (def p (file-path (clojure.java.io/file "/etc")))
  (instance? Path p) := true
  (-> (datafy Path) :members keys)
  (-> p .getRoot str) := "/"
  (-> p .getFileName str) := "etc"
  (-> p .getParent .getFileName str) := ""
  (-> p .getParent .toFile .getName) := ""
  #_(-> p .getParent .toFile datafy))
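
File also has a direct bridge to Path, which avoids the empty varargs array. A hedged sketch: file-path is restated so the snippet runs alone, file-path-alt is a hypothetical name for the alternative, and the relative file name is made up.

```clojure
(import '[java.io File]
        '[java.nio.file Path Paths])

;; Same approach as above: absolute path string, then Paths/get.
(defn file-path [^File f]
  (Paths/get (.getAbsolutePath f) (make-array String 0)))

;; Hypothetical alternative: let File do the conversion itself.
(defn file-path-alt [^File f]
  (.toPath (.getAbsoluteFile f)))

(def f (File. "some/relative/name.txt")) ; need not exist on disk

(instance? Path (file-path-alt f))  ; true
(= (file-path f) (file-path-alt f)) ; true
```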

File metadata is read through Path (via Files/readAttributes), not through File.

(defn path-attrs [^Path p]
  (Files/readAttributes p BasicFileAttributes (make-array java.nio.file.LinkOption 0)))
(tests
  (def attrs (path-attrs (file-path (clojure.java.io/file "/etc"))))
  (instance? BasicFileAttributes attrs) := true
  (.isDirectory attrs) := true
  (.isSymbolicLink attrs) := false
  (.isRegularFile attrs) := false
  (.isOther attrs) := false)
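
One detail worth knowing: with an empty LinkOption array, Files/readAttributes follows symbolic links, so the attributes describe the link target and isSymbolicLink reports false. The sketch below demonstrates this in a temporary directory. It assumes a filesystem that permits creating symbolic links, and attrs-follow, attrs-nofollow, and results are hypothetical names introduced for the illustration.

```clojure
(import '[java.nio.file Files LinkOption Path]
        '[java.nio.file.attribute BasicFileAttributes FileAttribute])

;; Same call shape as path-attrs: no options, so links are followed.
(defn attrs-follow ^BasicFileAttributes [^Path p]
  (Files/readAttributes p BasicFileAttributes (make-array LinkOption 0)))

;; Hypothetical variant: NOFOLLOW_LINKS describes the link itself.
(defn attrs-nofollow ^BasicFileAttributes [^Path p]
  (Files/readAttributes p BasicFileAttributes
                        (into-array LinkOption [LinkOption/NOFOLLOW_LINKS])))

(def results
  (let [no-attrs (make-array FileAttribute 0)
        dir      (Files/createTempDirectory "datafy-fs-demo" no-attrs)
        target   (Files/createTempFile dir "target" ".txt" no-attrs)
        link     (Files/createSymbolicLink (.resolve dir "link") target no-attrs)
        res      {:followed-symlink? (.isSymbolicLink (attrs-follow link))
                  :followed-regular? (.isRegularFile (attrs-follow link))
                  :nofollow-symlink? (.isSymbolicLink (attrs-nofollow link))}]
    ;; clean up: link first, then its target, then the now-empty directory
    (doseq [p [link target dir]] (Files/delete p))
    res))

results
;; {:followed-symlink? false, :followed-regular? true, :nofollow-symlink? true}
```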

Helper to get file metadata from a File. TODO: extend the attributes class with datafy. We don't need it here because it's an internal type.

(defn file-attrs [^File f] (path-attrs (file-path f)))
(tests
  (file-attrs (clojure.java.io/file "/etc")))

The Datafiable implementations for FileTime and File:

(def ... `...) ; define a value for easy test assertions
(extend-protocol ccp/Datafiable
  java.nio.file.attribute.FileTime
  (datafy [o] (-> o .toInstant java.util.Date/from)))
(extend-protocol ccp/Datafiable
  java.io.File
  (datafy [^File f]
    ; represent object's top layer as EDN-ready value records, for display
    ; datafy is partial display view of an object as value records
    ; nav is ability to resolve back to the underlying object pointers
    ; they compose to navigate display views of objects like a link
    (let [attrs (file-attrs f)
          n (.getName f)]
      (as-> {::name n
             ::kind (cond (.isDirectory attrs) ::dir
                          (.isSymbolicLink attrs) ::symlink
                          (.isOther attrs) ::other
                          (.isRegularFile attrs) (if-let [s (get-extension n)]
                                                   (keyword (namespace ::foo) s)
                                                   ::unknown-file-type)
                          :else ::unknown-kind)
             ::absolute-path (-> f .getAbsolutePath)
             ::created (-> attrs .creationTime .toInstant java.util.Date/from)
             ::accessed (-> attrs .lastAccessTime .toInstant java.util.Date/from)
             ::modified (-> attrs .lastModifiedTime .toInstant java.util.Date/from)
             ::size (.size attrs)} %
            (merge % (when (= ::dir (::kind %))
                       {::children (lazy-seq (.listFiles f))
                        ::parent `...}))
            (with-meta % {`ccp/nav
                          (fn [xs k v]
                            (case k
                              ; reverse data back to object, to be datafied again by caller
                              ::modified (.lastModifiedTime attrs)
                              ::created (.creationTime attrs)
                              ::accessed (.lastAccessTime attrs)
                              ::children (some-> v vec)
                              ::parent (-> f file-path .getParent .toFile)
                              v))})))))
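
The same pattern can be seen in miniature without touching the filesystem. This is a sketch under stated assumptions, not the implementation above: Counter is a hypothetical type invented for the illustration, :not-realized plays the role of the parent placeholder, and the nav function is attached under the fully qualified symbol clojure.core.protocols/nav in the metadata map.

```clojure
(require '[clojure.core.protocols :as ccp]
         '[clojure.datafy :refer [datafy nav]])

;; Hypothetical type, invented for this illustration only.
(deftype Counter [n])

(extend-protocol ccp/Datafiable
  Counter
  (datafy [^Counter c]
    (with-meta {:n    (.-n c)
                :next :not-realized} ; placeholder, resolved only by nav
      ;; Navigable is extended via metadata, keyed by the qualified symbol.
      {'clojure.core.protocols/nav
       (fn [_coll k v]
         (case k
           :next (Counter. (inc (.-n c))) ; back to a real object, on demand
           v))})))                        ; other keys: hand back the value

(def d   (datafy (Counter. 1)))  ; data view, one layer deep
(def nxt (nav d :next (:next d))) ; an object again, not data

(:n d)            ; 1
(:n (datafy nxt)) ; 2
```

Datafy produces the view and nav resolves a key of that view back to an object, so alternating the two walks a structure one layer at a time.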

Does it work?

(tests
  ; careful, calling seq loses metas on the underlying
  (def h (clojure.java.io/file "/etc"))
  (type h) := java.io.File
  "(datafy file) returns an EDN-ready data view that is one layer deep"
  (datafy h)
  := #:user.datafy-fs{:name "etc",
                      :absolute-path _,
                      :size _,
                      :modified _,
                      :created _,
                      :accessed _,
                      :kind ::dir,
                      :children _
                      :parent ...})
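
The warning about seq and metadata in the test above matters because nav is attached as metadata on the datafied map. A minimal sketch of what survives and what does not, using a made-up map called view and a stand-in nav function that just tags its arguments:

```clojure
(require '[clojure.datafy :refer [nav]])

;; A plain map carrying a nav implementation in its metadata.
(def view
  (with-meta {:id 1}
    {'clojure.core.protocols/nav (fn [_coll k v] [:navigated k v])}))

(nav view :id 1)               ; [:navigated :id 1] - custom nav is found
(nav (assoc view :x 2) :id 1)  ; [:navigated :id 1] - assoc keeps metadata
(nav (into {} view) :id 1)     ; 1 - rebuilt map has no metadata, default nav
(meta (seq view))              ; nil - the seq does not carry the map's metadata
```

So a datafied value should be passed to nav as returned, or modified only with operations that preserve metadata.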

Nav. Note that the nav contract is to return the underlying object instance, not the data view. Call datafy again on the result if you want data.

(tests
  "datafy of a directory includes a Clojure coll of children, but child elements are native file
  objects"
  (as-> (datafy h) %
        (nav % ::children (::children %))
        (datafy %)
        (take 2 (map type %)))
  := [java.io.File java.io.File]
  "nav to a leaf returns the native object"
  (as-> (datafy h) %
        (nav % ::modified (::modified %)))
  (type *1) := java.nio.file.attribute.FileTime
  "datafy again to get the plain value"
  (type (datafy *2)) := java.util.Date)

Nav into the filesystem directories. You can see in the notebook result that ::children is a list of objects, not data. The ::children list is lazy.

(tests
  (as-> (datafy h) %
        (nav % ::children (::children %))
        (datafy %) ; can skip - simple data
        (nav % 0 (% 0))
        (datafy %)
        #_(s/conform ::file %))
  := #:user.datafy-fs{:name "logrotate.d",
                      :absolute-path _,
                      :size _,
                      :modified _,
                      :created _,
                      :accessed _,
                      :kind ::dir,
                      :children _
                      :parent ...})

Nav back into parent – ::parent is not realized until you actually nav it.

(tests
  "nav into children and back up via parent ref"
  (def m (datafy h))
  (::name m) := "etc"
  (as-> m %
        (nav % ::children (::children %))
        (datafy %) ; dir
        (nav % 0 (get % 0)) ; first file in dir
        (datafy %)
        (nav % ::parent (::parent %)) ; dir (skip level on way up)
        (datafy %)
        (::name %))
  := "etc")