Pitch: First Class Environments

Background

Article nodes such as code cells and environments are currently stored only as a Transit-encoded blob (or as Transit-encoded change events). This makes them opaque from Datomic's point of view: we can't query them or create first-class references to them.

The idea of representing article nodes as first-class Datomic entities has been around since the switch to Datomic, and has lately been explored by Martin and Arne; see First Class Graph Nodes and the graph-nodes git branch.

There are two big motivating reasons behind this push. First, it allows us to query nodes directly, for instance to find all files or environments created by a user. Second, it allows us to leverage Datomic's graph database capabilities, so we can model article graphs as functional data structures. This in turn allows fine-grained versioning, with the logical conclusion being a "fork the world" model or representation.

Use Case: Importing Environments

To break out of analysis paralysis we want to start by implementing a single feature that improves the user experience, while at the same time allowing us to experiment with this new data representation.

When a user publishes an article, any environments created by that article become available and can be imported into another notebook. This used to be called "transcluding" an environment, and is still referred to as such in the code. In the UI we now talk about "Importing (an) environment".

This can be done through the gutter action menu on a code block, or through the Runtime Settings dialog. In both cases you go through several layers of slow autocompleters to find the runtime, which does not provide great UX.

It is also necessary to publish an article before an exported environment becomes visible, even though publishing is not necessary to create the environment's docker image.

Finally, since creating an environment and renaming the runtime are separate steps, the correlation between the two is lost, and with it the user's intention. Is this conceptually a new environment (with a new name), or a newer version of the same environment? Currently only the latest version of an exported runtime is made available, so it's not possible to see historic versions or how they relate.

Baseline

This dialog exports the runtime as an environment that can be transcluded (i.e. it creates and pushes a docker image).

Currently, selecting an environment to import happens in two steps. First, all articles that the current profile has access to are fetched (GET /graph) and presented in a filterable list. When an article is selected, all environments in that article are fetched (GET /graph?article_id=...) and presented in a second filterable list.

From the gutter actions

Or from the runtime settings. Note that "import runtime" is at the bottom here.

Proposal

This is what we currently have. Note that there are two places that list the runtime, each with slightly different affordances. On the other hand exported environments do not have a place of their own, but the docker image for the most recently exported environment is shown as part of the runtime.

This is also reflected in the data representation, where an exported environment merely means a docker-image property on the runtime; the environment is not a node by itself. This means that no explicit versioning of exported environments is possible, and metadata like the name can only be inferred from the name that the runtime has at a certain point in time.

I think it would make sense to make an exported environment a first class node, so that each export adds a new one, and users can see the list of exported environments that are created based on a given runtime.

This also makes it possible to give an environment a name independent of the runtime name.
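To make the idea concrete, here is a minimal in-memory sketch in Python of what "each export adds a new node" could look like. It is not the Datomic schema; the type `ExportedEnvironment`, its field names, and the helper functions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ExportedEnvironment:
    """One export of a runtime (hypothetical shape, not the actual schema).

    The node is immutable: exporting again adds a new node instead of
    overwriting a docker-image property on the runtime.
    """
    name: str              # chosen at export time, independent of the runtime name
    docker_image: str      # image produced by this particular export
    runtime_id: str        # the runtime this export was created from
    exported_at: datetime


def export_environment(
    exports: List[ExportedEnvironment],
    runtime_id: str,
    name: str,
    docker_image: str,
    exported_at: datetime,
) -> List[ExportedEnvironment]:
    """Return a new list with one more export; earlier exports are untouched."""
    node = ExportedEnvironment(name, docker_image, runtime_id, exported_at)
    return exports + [node]


def exports_for_runtime(
    exports: List[ExportedEnvironment], runtime_id: str
) -> List[ExportedEnvironment]:
    """All exports that were created from the given runtime."""
    return [e for e in exports if e.runtime_id == runtime_id]
```

Because the name lives on the export rather than on the runtime, renaming the runtime later would not change what earlier exports are called.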

Once there are exported environments, however, another question arises: is the export now the creation of a new environment, or is it conceptually an update of the previous one? We could model this explicitly, with separate update and export affordances.

But I think that it might make more sense to fold these into one, because from our point of view they both do the same thing: they run the notebook and afterwards export the filesystem as a docker image.

For the user, however, we can collapse versions that have the same name, so that only the most recent export for a given name is visible. The list can then be expanded to see all versions, see their docker images, or navigate to the article/change that created each environment.
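The collapsing rule described above can be sketched as a small grouping function. This is only an illustration in Python under assumed data shapes: the `Export` tuple, its fields, and the integer timestamp are hypothetical stand-ins for whatever the real nodes end up carrying.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Export(NamedTuple):
    """Hypothetical minimal view of an exported environment."""
    name: str
    docker_image: str
    exported_at: int  # assumed: any sortable timestamp


def collapse_by_name(exports: List[Export]) -> Dict[str, List[Export]]:
    """Group exports by name, newest first within each group."""
    groups: Dict[str, List[Export]] = defaultdict(list)
    for export in exports:
        groups[export.name].append(export)
    return {
        name: sorted(versions, key=lambda e: e.exported_at, reverse=True)
        for name, versions in groups.items()
    }


def visible_exports(exports: List[Export]) -> List[Export]:
    """The collapsed list: only the most recent export for each name."""
    return [versions[0] for versions in collapse_by_name(exports).values()]
```

Expanding an entry in the UI would then correspond to showing the full list stored under that name in the result of `collapse_by_name`.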

As a strawman, this is for example what the Runtimes UI could look like (not pictured: docker image url).

Structural sharing

An important goal on the data level is to provide functional (i.e. immutable) semantics for nodes and the trees/graphs they form. This can be achieved by using Copy on Write (CoW) for the node being changed, as well as for every node on its parent path up to the root of the tree. Sibling nodes are not copied but referenced, and so a degree of Structural Sharing is achieved.
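As an illustration of copy on write with structural sharing, the following Python sketch updates one node in an immutable tree by copying only the nodes on the path from the root to that node. The `Node` type and `update_at` function are hypothetical and say nothing about how this would be stored in Datomic.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """An immutable tree node (hypothetical shape)."""
    name: str
    value: str = ""
    children: Tuple["Node", ...] = ()


def update_at(root: Node, path: Tuple[int, ...], new_value: str) -> Node:
    """Return a new tree in which the node at `path` has `new_value`.

    `path` is a sequence of child indices starting at the root. Only the
    nodes along that path are copied; every sibling is reused as-is.
    """
    if not path:
        return replace(root, value=new_value)
    index, rest = path[0], path[1:]
    new_child = update_at(root.children[index], rest, new_value)
    children = root.children[:index] + (new_child,) + root.children[index + 1:]
    return replace(root, children=children)


cell_a = Node("code-cell-a", "1 + 1")
cell_b = Node("code-cell-b", "2 + 2")
article = Node("article", children=(cell_a, cell_b))

# Changing cell a produces a new root and a new cell a, but reuses cell b.
article_v2 = update_at(article, (0,), "1 + 2")
```

Both `article` and `article_v2` remain valid values after the update, which is what makes fine-grained versioning cheap: each version costs only the copied path.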

In the original proposal each node property is represented as a Datomic entity. This has the benefit that properties (e.g. large blobs) can be shared rather than copied. For smaller values like simple integers, however, the overhead is significant, and it makes dealing with graph data significantly more complex. An alternative could be to represent node properties directly as Datomic properties, possibly adding a wrapper entity (e.g. "blob") that allows sharing of a reference.
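The alternative can be pictured roughly as follows. This Python sketch keeps small values inline on the node and puts only the large value behind a shared wrapper; the names `Blob`, `CodeCell`, `execution_count`, and `output` are hypothetical and used purely for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Blob:
    """Wrapper entity whose only job is to make a large value shareable."""
    data: bytes


@dataclass(frozen=True)
class CodeCell:
    """A node with one small inline property and one wrapped large property."""
    execution_count: int  # small value: stored directly on the node
    output: Blob          # large value: stored once, referenced by many versions


def with_execution_count(cell: CodeCell, count: int) -> CodeCell:
    """Return a new version of the cell; the blob is referenced, not copied."""
    return replace(cell, execution_count=count)
```

A new version of the node then costs one small copy, while the large value is stored once no matter how many versions point at it.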

Another question to be explored is at what level to allow for mutation. In other words: which entities are pure, immutable graph nodes, and which are references to nodes whose value can be updated over time?

Affected code

- UI
  com.nextjournal.editor.components.runtime/select-environment-choices
  com.nextjournal.editor.components.transclusion/completer

- Backend
  com.nextjournal.journal.handlers.graph/search (/graph endpoint)
