Einsum as a Domain-Specific Language in julia

Einsum

A domain-specific language (DSL) is, according to Wikipedia, "a computer language specialized to a particular application domain." In this blogpost, we will look at how to implement the einsum-DSL in julia.

For an explanation of what an einsum-specification is, see my earlier blogpost. Just as a refresher:

In einsum, we specify operations on multidimensional arrays with strings that map to sums over products. As an example, the string "ij,jk -> ik" specifies an operation between two matrices with indices labelled "ij" and "jk" respectively. The resulting tensor with indices "ik" is constructed by summing over all values for the index-labels and taking the products between the matrix-values, thus "ij,jk -> ik" translates to - a simple matrix product.

The reference einsum-implementation is numpy.einsum which takes a string as input. We could go that route too, taking a string, parsing it and evaluating the operations it encodes but in julia, that approach has drawbacks.

The Problem with String-Specifications

An important factor for julia's speed is the concept of type stability. Type stability is about the predictability of variable types - if a variable changes type in a function body, it is hard for the compiler to make the various guarantees about types that enable optimisations. If, on the other hand, the type of a variable is stable within a function body, the compiler can avoid dynamic method-lookups and possibly inline code and fix container-types among others. These optimisations can lead to massive speedups.

Why this detour when we're talking about einsum? The problem is that the type of the einsum-specification - a string - carries no information what-so-ever about the operation it encodes. A string might specify an identity, a matrix product, star-contraction, trace or one of many other operations each mapping to possibly different output-types. Thus, functions containing such an einsum can not be type stable because their output-type can not be predicted from the types of the inputs.

The Solution

So we shouldn't use strings, what now? There are alternative ways to specify the operations. Under the hood, I decided to have whatever-input turned into a custom structure, containing the tuples of labelled indices, such that e.g. "ij,jk -> ik" is turned into EinCode{(('i','j'),('j','k')), ('i','k')}().

EinCode{ixs,iy}() is a struct I defined to hold a tuple of the input-indices ixs and the output indices iy. Similar to the MIME type (defined here), EinCode contains information as type parameters: they are thus available at compile time, enabling us to infer some information about the return type (its dimensionality or rank) and dispatch an efficient implementation purely from the specification.

The parser may be implemented with a simple regular expression and packing the index-labels into tuples and then wrap them in EinCode:

struct EinCode{ixs,iy} end
EinCode(ixs,iy) = EinCode{ixs,iy}()

function parse_einstring(s::AbstractString)
  # remove whitespace
	s = replace(s, " " => "")
  # match and group input and output indices
  m = match(r"([a-z,α-ω]+)->([a-zα-ω]*)", s)
  # extract in and out indices
  sixs, siy = m.captures
	return EinCode(Tuple(Tuple(ix) for ix in split(sixs,',')), Tuple(siy))
end

0.9s

Julia

parse_einstring (generic function with 1 method)

parse_einstring("ij,jk -> ik")

1.0s

Julia

EinCode{(('i', 'j'), ('j', 'k')),('i', 'k')}()

This does not buy us too much - parse_einstring is still not-typestable but its output can be fed to a next function that might be type stable (i.e. we use a function barrier). But if we use multiple einsums in a function we might still pay huge costs for the necessary dynamic dispatch.

Writing the EinCode directly would help this but is a hassle and very error-prone (as I learned myself, testing things before I introduced another specification format). There is an inherent tension between

a human providing enough information to express what he wants done (with e.g. a string)
a compiler getting enough information to optimally compile a function

while the former aims for brevity and clarity, the latter wants as much information as possible (with some caveats).

One solution in julia is string-literals. String literals are strings that are prefixed by an identifier, in our case ein, and they behave quite differently than regular strings. Above you might have noticed that in the regular expression in parse_einstring we already used a string-literal: regular expressions can be defined with the identifier r as in r"([a-z,α-ω]+)->([a-zα-ω]*)".

We want a string-literal that turns the einsum-specification directly into the EinCode-representation that we can then use for type stable operations.

We choose the prefix ein, writing ein"ij,jk -> ik" instead of "ij,jk -> ik" as before. A tiny change but a big effect. With the identifier, we can directly pass the string to a macro at compile time, parse the string to the EinCode-form and it will look to the compiler as if we wrote down the struct ourselves.

To achieve that, we simply define

macro ein_str(s::AbstractString)
	parse_einstring(s)
end

0.7s

Julia

@ein_str (macro with 1 method)

and from now on we can specify an einsum with

ein"ij,jk -> ik"

0.2s

Julia

EinCode{(('i', 'j'), ('j', 'k')),('i', 'k')}()

There is no new result

parse_einstring("ij,jk -> ik") == ein"ij,jk -> ik"

0.5s

Julia

true

but instead of at run-time, the string is parsed at compile-time! (Due to the just-in-time compilation in julia, this distinction is a bit subtle but for the purpose of this post, let's take it simple.) This shows itself when timing the code:

using BenchmarkTools
@btime ein"ij,jk -> ik";
@btime parse_einstring("ij,jk -> ik");

8.7s

Julia

The string-literal does not require any time. The "measured" 0.031ns for runtime is way below any measurement precision, basically noise. What this shows is that the evaluation happens at compile time which saves us around 6 μs every time we evaluate an einsum in our code. On top of that, the compiler then sees a EinCode in the code, enabling type-stable operations.

So we can simply specify an input with a string-literal while for the compiler it looks like we explicitly supplied all the necessary information for it to work well. And the required change was minimal.

This is of course only the first step in evaluating the einsum, but we might also use the string-literal to parse the input further and decide which operations it maps to at compile-time.

Limitations

While we gained a lot of expressiveness with the string-literal, there are a number of reasons why string-literals and macros in general are not the be-all end-all of julia.

First, we can not create a string-literal at runtime, because we can not interpolate into string-literals. While libraries that require this could directly use the intermediate representation, this means that there need to be two (or more!) interfaces to the functionality which requires more work for maintainer and user.

Second, a DSL is by definition not just regular julia. It is practically a mini-language within julia, adding cognitive load and removing some of the tools we could use were we using regular functions such as e.g. debuggers or syntax highlighting (both within the string). And if all small domains had their own DSL, julia wouldn't solve the two-language problem but instead introduce the 1001-language problem. DSLs should be used sparingly.

Finally, while in a function we only pay the price of parsing the string-literal once - at compile time - in interactive use we have to pay it each time we call the function. While this might be a small price to pay in our case, for (much) more complex parsing this might be noticeable.

Conclusion

Using string-literals for DSLs is a simple and quick way to resolve some of the tension of what is comfortable for the user to write and what the compiler requires to work well. For einsum, due to the existing implementations, the DSL itself is a feature but in general one should be weary of introducing DSLs since they have their limitations.