# Julia summer of einsum

I was selected for the summer of code this year and I'm spending these days working on an einsum implementation in julia and algorithms using this implementation. This blogpost will explain what einsum is, in what form it is available in julia and finally how, with a few building blocks, we might go about implementing the functionality ourselves. If you want to follow the project directly, check out the github repository.

## einsum

einsum is a function popularized by numpy with additional implementations in torch, tensorflow and julia. It enables us to write tensor-contractions in a style derived from the Einstein summation convention from which, presumably, it got its name: Ein-stein sum-ation.

The Einstein summation convention is a rule to simplify the notation for certain tensor operations, namely contractions. A contraction usually refers to a contraction of two indices. Let us start with a rank-2 tensor, that is, a tensor with two indices, more commonly referred to as a matrix just as in julia where we have Matrix == Array{<:Any,2}.

a = randn(2,2)
2×2 Array{Float64,2}: -1.00966 0.0190109 0.306902 -0.557857

Since a has two indices, the only contraction we may do is to contract those two indices with each other. Contracting two indices means to take the sum over equal indices which we can write as

out = zero(Float64)
for i in 1:2
global out += a[i,i]
end
out
-1.56751

We might recognise this as the trace of a. In a more mathematical notation, we can write . The Einstein convention is to drop the summation symbol -- and instead implicitly sum over all indices that appear more than once, i.e. we write .

If we introduce another matrix b as

b = randn(2,2)
2×2 Array{Float64,2}: -1.43338 -0.298578 -0.934664 0.508858

we can contract indices between the two matrices a and b, e.g. the second of a with the first index of b. In the maths notation we'd specify the operation by saying that the output c has elements obeying

which according to the summation convention we can write as . Readers familiar with matrix multiplication might already recognise this as the matrix product of a and b, but to make it more explicit we can construct c

c = zeros(2,2)
for i in 1:2, j in 1:2, k in 1:2
c[i,j] += a[i,k] * b[k,j]
end

and check that it is the same as a * b

c ≈ a * b
true

So now we've seen that both traces and matrix products are tensor contractions and we can specify tensor contractions concisely by labelling indices of matrices and having indices that are contracted with each other labelled the same.

A new notation for trace and matrix multiplication alone would not be too exciting though but contractions can be used for tensors with other than two indices.

Multiplying a matrix a with a vector v can be expressed as

vector dot products between vectors v and w as

and we can contract higher-ranked tensors a and b as they might appear in physics as

einsum is a general interface for tensor contractions with some added operations, but in its original specification it works by providing a list of (input) tensors and their labelled indices and labels for the output tensor.

As an example in numpy, the trace is specified like so

import numpy
a = [[1,2],[3,4]]
numpy.einsum("ii->",a)
5

and a matrix product looks like

a = [[1,2],[3,4]]
b = [[5,6],[7,8]]
numpy.einsum("ik,kj -> ij", a,b) 
array([[19, 2... [43, 50]])

So we specify the index-labels in the Einstein convention and let einsum take care of the rest. On top of the strict contract twice occuring labels that's usual in physics, einsum supports more operations, such as

numpy.einsum("ii -> i", a)
array([1, 4])

You might already see that this is the diagonal of a: It maps a two-index object - a matrix - to a one-index object - a vector- and only takes the values where the two indices of the input are the same!

But we can make it more explicit by writing the loop-version

out = zeros(Int,2)
a = [1 2; 3 4]
for i in 1:2
out[i] += a[i,i]
end
out
2-element Array{Int64,1}: 1 4

The general einsum in numpy specified as

#O = einsum("l_1,l_2,l_3... -> l_O", A1, A2, A3,...)

translates to the sum of products

where (O) is the output, (A_i)are the inputs and the vectors (l_i)are the labels of tensors or . The summation goes over all labels that are in the label-vectors of the inputs but not the output.

With this definition we also get the reduction of tensors along a dimension as specified in

numpy.einsum("ij -> i", a)
array([3, 7])

And this concludes what einsum is: A way to specify certain tensor-operations and a backend to evaluate those specifications.

## Einsum.jl

Pkg.add("Einsum");

In julia, there is an implementation of einsum in the package Einsum.jl. Einsum.jl has a macro-based notation that enables us to write expressions closer to the maths expressions than the python implementations.

Let us look at some examples:

a = randn(2,2)
b = randn(2,2)
using Einsum, LinearAlgebra
c = zeros(2,2)
@einsum c[i,j] = a[i,k] * b[k,j] # matrix multiplication
d = zeros()
@einsum d[] = a[i,i] # trace
e = zeros(2)
@einsum e[i] = a[i,i] # diagonal
f = zeros(2)
@einsum f[i] = a[i,j] # reduction over second index of a
(c ≈ a * b, d[] ≈ tr(a), e ≈ diag(a), f ≈ sum(a,dims=2))
g = randn(2,2,2,2,3)
h = randn(2,2,3)
@einsum gh[i,j,k] := g[i,j,m,n,k] * h[m,n,k] # contraction and diagonal
2×2×3 Array{Float64,3}: [:, :, 1] = -0.0137469 0.284731 0.364231 0.500878 [:, :, 2] = -0.632146 -1.81552 -1.81898 0.40321 [:, :, 3] = -1.2572 -1.1665 1.0898 1.7389

where the last example - contracting g with h - does not have an elementary counterpart in the LinearAlgebra library. But it looks almost like batched matrix multiplication.

Einsum.jl takes the expression placed after the @einsum macro and lowers it to loops as one might write by hand and we did in the examples above. This approach has three major disadvantages:

1. A macro based approach requires fixing the contraction at the time of writing - you can't feed the specification to Einsum.jl at runtime
2. Often, memory bandwidth is a limiting factor and thus cache-access patterns become really important. Algorithms with that in mind can be superior to the simple loops-approach of Einsum.jl
3. The time complexity grows exponentially in the number of indices since the sum is over all valid combinations of the indices.

So for the Summer of Code, we aim to rectify this situation by offering an einsum implementation in julia that is function based and more efficient by dispatching to specialised algorithms depending on the einsum specification.

As a base, we describe a general, function-based implementation in the following based on an einsum fallback implementation in tensorflow.

The tensorflow inspired implementation we will look look at below can only deal with inputs where no array has duplicate labels such as when we want to take a (generalized-) diagonal of an input or a (partial-) trace.

So first we will do some preprocessing to get rid of those duplicate labels and we'll figure out how to do it from a concrete problem, where we want to go from indices to :

a = randn(2,2,2) # inds = iji
2×2×2 Array{Float64,3}: [:, :, 1] = 0.146034 0.157689 0.635864 0.76844 [:, :, 2] = 0.335954 -0.10908 -0.146267 1.51654

If the label i appears in the output, this corresponds to a diagonal, if not, to a partial trace of a. But the trace can be seen as a simple two-step process where we

1. take the diagonal
2. sum over the diagonals index

and the following implementation can deal with summing over indices. So we need only concern ourselves with taking diagonals.

For this, we permute the labels of a such that the indices with the same label are adjacent

a2 = permutedims(a, (1,3,2)) #inds = iij
2×2×2 Array{Float64,3}: [:, :, 1] = 0.146034 0.335954 0.635864 -0.146267 [:, :, 2] = 0.157689 -0.10908 0.76844 1.51654

and taking the diagonal is then just slicing with an appropriate vector of CartesianIndexs

a3 = a2[[CartesianIndex(i,i) for i in 1:2],:] #inds = ij
[a3[i,:] ≈ a[i,:,i] for i in 1:2]
2-element Array{Bool,1}: true true

By going over each input tensor and taking diagonals for duplicate indices, we can get to an input where no tensor has duplicate indices.

Next we need four operations to work:

1. Outer products between tensors
2. Taking diagonals between tensors
3. Reduction over an index
4. Tensor contractions between tensors

Let's start with outer products. For two matrices, a and b, the outer product c is a rank-4 tensor which can calculated in the einsum notation as

a = randn(2,2)
b = randn(2,2)
@einsum c[i,j,k,l] := a[i,j] * b[k,l];

If you're familiar with broadcast, you might recognise a broadcasted multiply here. Consider

a2 = zeros(2,2,2,2)
b2 = zeros(2,2,2,2)
for i in 1:2, j in 1:2
a2[:,:,i,j] = a
b2[i,j,:,:] = b
end
c ≈ a2 .* b2
true

So we get an outer product with broadcast, by repeating values for all non-overlapping indices. Luckily we don't need to do that repeating explicitly since broadcast virtually repeats values of singleton-dimensions to make two arrays compatible. For us, this means we can simply insert singelton dimensions into a and b and get

c ≈ reshape(a,2,2,1,1) .* reshape(b,1,1,2,2)
true

So taking outer products can be achieved by inserting singleton-dimensions into our tensors and doing a broadcasted multiply.

Taking diagonals between tensors can be expressed in einsum as

@einsum c[i,j,k] := a[i,j] * b[j,k];

Again we might recognize this as a broadcasted multiply except that this time, the indices overlap. Reshaping both a and b to have three indices, labelled i,j and k respectively, and inserting a singleton in the respective missing labels, we get

c ≈ reshape(a,2,2,1) .* reshape(b,1,2,2)
true

Reshaping and Broadcasting is all we need!

Next we want to reduce over single dimensions, e.g.

@einsum d[i,j] := c[i,j,k]
2×2 Array{Float64,2}: -0.139832 -1.49227 0.0338151 0.214557

This corresponds to the built-in sum along a specific dimension

@show d ≈ sum(c,dims=3)
size(sum(c,dims=3)), size(d)
((2, 2, 1), (2, 2))

although we might want to drop the singelton-dimension that sum leaves at the end using dropdims

d ≈ dropdims(sum(c,dims=3),dims=3)
true

No problem either! Lastly we want to implement contractions like

@einsum c[i,j] := a[i,k] * b[k,j]
2×2 Array{Float64,2}: -0.795885 -0.836216 0.202323 0.0460494

But noting that that's the same as first taking the diagonal w.r.t the two k and then summing over it, we recognise this as a sequence of operations we already have: diagonal and index reduction:

c ≈ dropdims(sum(reshape(a,2,2,1) .* reshape(b,1,2,2), dims=2), dims=2)
true

And thus we have all the ingredients to build ourselves an einsum!

It is slow and uses a lot of memory, but it only requires operations that are implemented for many array types - broadcasted multiply, sum, reshape, permutedims - and is conceptually simple enough. It also works with runtime-instructions which is a good start. A working implementation can be found here whereas the main repository - that might use a different implementation, is here.