# Julia summer of einsum

I was selected for the Summer of Code this year and I'm spending these days working on an `einsum` implementation in julia and on algorithms that use it. This blogpost will explain what `einsum` is, in what form it is available in julia, and finally how, with a few building blocks, we might go about implementing the functionality ourselves. If you want to follow the project directly, check out the github repository.

## einsum

*einsum* is a function popularized by *numpy* with additional implementations in *torch*, *tensorflow* and *julia*. It enables us to write tensor-contractions in a style derived from the Einstein summation convention from which, presumably, it got its name: *Ein*-stein *sum*-ation.

The Einstein summation convention is a rule to simplify the notation for certain tensor operations, namely *contractions*. A contraction usually refers to a contraction of two indices. Let us start with a rank-2 tensor, that is, a tensor with two indices, more commonly referred to as a matrix, just as in julia where we have `Matrix == Array{<:Any,2}`.

```julia
a = randn(2,2)
```

Since `a` has two indices, the only contraction we may do is to contract those two indices with each other. Contracting two indices means to *take the sum over equal indices*, which we can write as

```julia
out = zero(Float64)
for i in 1:2
    global out += a[i,i]
end
out
```

We might recognise this as the *trace* of `a`. In a more mathematical notation, we can write $\mathrm{tr}(a) = \sum_i a_{ii}$ and, following the summation convention, *drop* the summation symbol: $\mathrm{tr}(a) = a_{ii}$.

If we introduce another matrix `b` as

```julia
b = randn(2,2)
```

we can contract indices between the two matrices `a` and `b`, e.g. the second index of `a` with the first index of `b`. In the maths notation we'd specify the operation by saying that the output `c` has elements

$$c_{ij} = \sum_k a_{ik} b_{kj}$$

which according to the summation convention we can write as $c_{ij} = a_{ik} b_{kj}$, but to make it more explicit we can construct `c`

```julia
c = zeros(2,2)
for i in 1:2, j in 1:2, k in 1:2
    c[i,j] += a[i,k] * b[k,j]
end
```

and check that it is the same as `a * b`:

```julia
c ≈ a * b
```

So now we've seen that both traces and matrix products are tensor contractions and we can specify tensor contractions concisely by labelling indices of matrices and having indices that are contracted with each other labelled the same.

A new notation for trace and matrix multiplication alone would not be too exciting, though, but contractions can also be used for tensors with more (or fewer) than two indices.

Multiplying a matrix `a` with a vector `v` can be expressed as

$$u_i = a_{ij} v_j,$$

the dot product between vectors `v` and `w` as

$$s = v_i w_i,$$

and we can contract higher-ranked tensors `a` and `b`, as they might appear in physics, as e.g.

$$c_{ijk} = a_{ijlm} b_{lmk}.$$

`einsum` is a general interface for tensor contractions with some added operations; in its original specification it works by providing a list of (input) tensors with their labelled indices, plus labels for the output tensor.

As an example in numpy, the trace is specified like so

```python
import numpy
a = [[1,2],[3,4]]
numpy.einsum("ii->", a)
```

and a matrix product looks like

```python
a = [[1,2],[3,4]]
b = [[5,6],[7,8]]
numpy.einsum("ik,kj -> ij", a, b)
```

So we specify the index-labels in the Einstein convention and let einsum take care of the rest. On top of the strict *contract twice-occurring labels* rule that's usual in physics, einsum supports more operations, such as

```python
numpy.einsum("ii -> i", a)
```

You might already see that this is the *diagonal* of `a`: it maps a two-index object - a matrix - to a one-index object - a vector - and only takes the values where the two indices of the input are the same!

But we can make it more explicit by writing the loop-version

```julia
a = [1 2; 3 4]
out = zeros(Int,2)
for i in 1:2
    out[i] += a[i,i]
end
out
```

The general `einsum` in numpy, specified as

`O = einsum("l_1,l_2,l_3,... -> l_O", A1, A2, A3, ...)`

translates to the sum of products

$$O[l_O] = \sum_{l \setminus l_O} \prod_i A_i[l_i]$$

where `O` is the output, the `A_i` are the inputs, and the vectors `l_i` are the labels of the input tensors; the sum runs over all labels that do *not* appear in the output.
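This sum-of-products definition can be made concrete with a deliberately naive Julia sketch for the two-tensor case; the function `naive_einsum` and its tuple-of-symbols label convention are our own illustration, not an existing API:

```julia
# Naive two-tensor einsum, directly following the sum-of-products
# definition: loop over all index combinations and accumulate.
# (hypothetical helper; exponential in the number of distinct labels)
function naive_einsum(ia, a, ib, b, iout)
    allinds = union(ia, ib)                      # every distinct label
    sizes = Dict{Symbol,Int}()
    for (l, s) in zip(ia, size(a)); sizes[l] = s; end
    for (l, s) in zip(ib, size(b)); sizes[l] = s; end
    out = zeros(promote_type(eltype(a), eltype(b)),
                (sizes[l] for l in iout)...)
    for vals in Iterators.product((1:sizes[l] for l in allinds)...)
        v = Dict(zip(allinds, vals))             # label => current value
        out[(v[l] for l in iout)...] +=
            a[(v[l] for l in ia)...] * b[(v[l] for l in ib)...]
    end
    return out
end

a = [1 2; 3 4]; b = [5 6; 7 8]
naive_einsum((:i,:k), a, (:k,:j), b, (:i,:j)) ≈ a * b  # matrix product: true
```

Labels shared between inputs but absent from the output are summed over, exactly as in the formula above.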

With this definition we also get the reduction of tensors along a dimension, as specified in

```python
numpy.einsum("ij -> i", a)
```

And this concludes what einsum *is*: A way to specify certain tensor-operations and a backend to *evaluate* those specifications.

## Einsum.jl

```julia
Pkg.add("Einsum");
```

In julia, there is an implementation of `einsum` in the package `Einsum.jl`. `Einsum.jl` has a macro-based notation that enables us to write expressions closer to the maths expressions than the python implementations.

Let us look at some examples:

```julia
using Einsum, LinearAlgebra

a = randn(2,2)
b = randn(2,2)

c = zeros(2,2)
@einsum c[i,j] = a[i,k] * b[k,j]  # matrix multiplication

d = zeros()
@einsum d[] = a[i,i]              # trace

e = zeros(2)
@einsum e[i] = a[i,i]             # diagonal

f = zeros(2)
@einsum f[i] = a[i,j]             # reduction over second index of a

(c ≈ a * b, d[] ≈ tr(a), e ≈ diag(a), f ≈ sum(a,dims=2))

g = randn(2,2,2,2,3)
h = randn(2,2,3)
@einsum gh[i,j,k] := g[i,j,m,n,k] * h[m,n,k]  # contraction and diagonal
```

where the last example - contracting `g` with `h` - does not have an elementary counterpart in the `LinearAlgebra` library. But it looks almost like batched matrix multiplication.
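To see the batched-matrix-multiplication structure of the last example, here is a loop version together with a per-`k` check; the reshape trick below is our own illustration, not something `Einsum.jl` does:

```julia
g = randn(2,2,2,2,3)
h = randn(2,2,3)

# loop version of gh[i,j,k] := g[i,j,m,n,k] * h[m,n,k]
gh = zeros(2,2,3)
for i in 1:2, j in 1:2, k in 1:3, m in 1:2, n in 1:2
    gh[i,j,k] += g[i,j,m,n,k] * h[m,n,k]
end

# for each fixed k this is a matrix-vector product: fuse (i,j) into the
# rows and (m,n) into the columns of a 4x4 matrix
all(vec(gh[:,:,k]) ≈ reshape(g[:,:,:,:,k], 4, 4) * vec(h[:,:,k]) for k in 1:3)
```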

`Einsum.jl` takes the expression placed after the `@einsum` macro and lowers it to loops, much as we wrote them by hand in the examples above. This approach has three major disadvantages:

- A macro-based approach requires fixing the contraction at the time of writing - you can't feed the specification to `Einsum.jl` at runtime.
- Often, memory bandwidth is a limiting factor and thus cache-access patterns become really important. Algorithms designed with that in mind can be superior to the simple loops-approach of `Einsum.jl`.
- The time complexity grows exponentially in the number of indices, since the sum is over all valid combinations of the indices.

So for the Summer of Code, we aim to rectify this situation by offering an `einsum` implementation in julia that is function based and more efficient, dispatching to specialised algorithms depending on the einsum specification.

As a base, we describe in the following a general, function-based implementation, based on an `einsum` fallback implementation in `tensorflow`.

## Your very own einsum!

The tensorflow-inspired implementation we will look at below can only deal with inputs where no array has duplicate labels, such as when we want to take a (generalized) diagonal of an input or a (partial) trace.

So first we will do some preprocessing to get rid of those duplicate labels, and we'll figure out how to do it from a concrete problem, where we want to remove the duplicate label `i` of

```julia
a = randn(2,2,2) # inds = iji
```

If the label `i` appears in the output, this corresponds to a diagonal; if not, to a partial trace of `a`. But the trace can be seen as a simple two-step process where we

- take the diagonal
- sum over the diagonal's index

and the following implementation *can* deal with summing over indices. So we need only concern ourselves with taking diagonals.
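This two-step view of the trace can be checked directly with `diag` and `tr` from the `LinearAlgebra` standard library (using a fresh matrix `m` so as not to clobber `a`):

```julia
using LinearAlgebra
m = randn(2,2)
# step 1: take the diagonal; step 2: sum over its index
sum(diag(m)) ≈ tr(m)  # true
```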

For this, we permute the labels of `a` such that the indices with the same label are adjacent

```julia
a2 = permutedims(a, (1,3,2)) # inds = iij
```

and taking the diagonal is then just slicing with an appropriate vector of `CartesianIndex`s

```julia
a3 = a2[[CartesianIndex(i,i) for i in 1:2],:] # inds = ij
[a3[i,:] ≈ a[i,:,i] for i in 1:2]
```

By going over each input tensor and taking diagonals for duplicate indices, we can get to an input where no tensor has duplicate indices.
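A sketch of this preprocessing for a single tensor, removing one duplicated label at a time; `take_diagonal` is a hypothetical helper written for this post, not part of any package:

```julia
# Remove one duplicated label: permute the two occurrences of the label
# to the front, then slice with a vector of CartesianIndex.
# (hypothetical helper; assumes any label occurs at most twice)
function take_diagonal(a, labels)
    ls = collect(labels)
    dup = findfirst(l -> count(==(l), ls) == 2, ls)
    dup === nothing && return a, labels          # nothing to do
    l = ls[dup]
    pos = findall(==(l), ls)                     # the two positions of l
    rest = setdiff(1:ndims(a), pos)
    a2 = permutedims(a, (pos..., rest...))       # duplicates adjacent & first
    ci = [CartesianIndex(i, i) for i in 1:size(a, pos[1])]
    a3 = a2[ci, ntuple(_ -> Colon(), length(rest))...]
    return a3, (l, ls[rest]...)
end

a = randn(2,2,2)                                 # inds = iji
a3, labels3 = take_diagonal(a, (:i, :j, :i))
all(a3[i,:] ≈ a[i,:,i] for i in 1:2), labels3    # (true, (:i, :j))
```

Calling it repeatedly until no duplicate is found would handle several repeated labels.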

Next we need four operations to work:

- Outer products between tensors
- Taking diagonals between tensors
- Reduction over an index
- Tensor contractions between tensors

Let's start with outer products. For two matrices, `a` and `b`, the outer product `c` is a rank-4 tensor which can be calculated in the `einsum` notation as

```julia
a = randn(2,2)
b = randn(2,2)
@einsum c[i,j,k,l] := a[i,j] * b[k,l];
```

If you're familiar with broadcast, you might recognise a broadcasted multiply here. Consider

```julia
a2 = zeros(2,2,2,2)
b2 = zeros(2,2,2,2)
for i in 1:2, j in 1:2
    a2[:,:,i,j] = a
    b2[i,j,:,:] = b
end
c ≈ a2 .* b2
```

So we get an outer product with broadcast by *repeating values* for all non-overlapping indices. Luckily we don't need to do that repeating explicitly, since broadcast *virtually repeats values of singleton-dimensions* to make two arrays compatible. For us, this means we can simply insert singleton dimensions into `a` and `b` and get

```julia
c ≈ reshape(a,2,2,1,1) .* reshape(b,1,1,2,2)
```

So taking outer products can be achieved by inserting singleton-dimensions into our tensors and doing a broadcasted multiply.
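The same reshape-then-broadcast trick works for inputs of any rank; `outer` below is our own small helper, not a library function:

```julia
# Outer product of two arrays: append singleton dimensions to a,
# prepend them to b, then broadcast-multiply. (hypothetical helper)
function outer(a, b)
    sa = (size(a)..., ntuple(_ -> 1, ndims(b))...)   # e.g. (2,2,1,1)
    sb = (ntuple(_ -> 1, ndims(a))..., size(b)...)   # e.g. (1,1,2,2)
    return reshape(a, sa) .* reshape(b, sb)
end

a = randn(2,2); b = randn(2,2)
outer(a, b) ≈ reshape(a,2,2,1,1) .* reshape(b,1,1,2,2)  # true
```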

Taking diagonals *between* tensors can be expressed in einsum as

```julia
@einsum c[i,j,k] := a[i,j] * b[j,k];
```

Again we might recognize this as a broadcasted multiply, except that this time the indices overlap. Reshaping both `a` and `b` to have three indices, labelled i, j and k respectively, with a singleton inserted at the respective missing label, we get

```julia
c ≈ reshape(a,2,2,1) .* reshape(b,1,2,2)
```

Reshaping and Broadcasting is all we need!

Next we want to reduce over single dimensions, e.g.

```julia
@einsum d[i,j] := c[i,j,k]
```

This corresponds to the built-in `sum` along a specific dimension

```julia
d ≈ sum(c,dims=3)
size(sum(c,dims=3)), size(d)
```

although we might want to drop the singleton-dimension that `sum` leaves at the end, using `dropdims`:

```julia
d ≈ dropdims(sum(c,dims=3),dims=3)
```

No problem either! Lastly we want to implement contractions like

```julia
@einsum c[i,j] := a[i,k] * b[k,j]
```

But noting that that's the same as first taking the diagonal w.r.t. the two `k` indices and then summing over `k`, we recognise this as a sequence of operations we already have, diagonal and index reduction:

```julia
c ≈ dropdims(sum(reshape(a,2,2,1) .* reshape(b,1,2,2), dims=2), dims=2)
```
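Wrapped in a function, the whole matrix product reads as follows; the function name is ours and the shapes are taken from `size` rather than hard-coded:

```julia
# Matrix product via reshape, broadcast and sum - the diagonal-then-reduce
# strategy from above in one (hypothetical) function.
function matmul_via_broadcast(a, b)
    m, k = size(a)
    k2, n = size(b)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    prod3 = reshape(a, m, k, 1) .* reshape(b, 1, k, n)  # diagonal over k
    return dropdims(sum(prod3, dims=2), dims=2)          # reduce over k
end

a = randn(2,3); b = randn(3,2)
matmul_via_broadcast(a, b) ≈ a * b  # true
```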

And thus we have all the ingredients to build ourselves an `einsum`!

It is slow and uses a lot of memory, but it only requires operations that are implemented for many array types - broadcasted multiply, `sum`, `reshape`, `permutedims` - and is conceptually simple enough. It also works with runtime instructions, which is a good start. A working implementation can be found here, whereas the main repository - which might use a different implementation - is here.