Reflections on Math, Tensors and Programming

This is not an explanation of Tensors if you are interested in that but rather a reflection upon the process of learning about mathematical concepts such as Tensors as a programmer with limited mathematical background.

Over the last months I have on and off been learning about Deep Learning, which quickly leads one down in the rabbit hole to the strange world of abstract mathematics.

"Alice in Wonderland," written by a mathematician, Lewis Carroll, gives a hint at some of the problems of grasping abstract math. This is about a song, "Haddocks Eyes" sung by the white knight:

You are sad,’ the Knight said in an anxious tone: ‘let me sing you a song to comfort you.’

‘Is it very long?’ Alice asked, for she had heard a good deal of poetry that day.

‘It’s long,’ said the Knight, ‘but very, VERY beautiful. Everybody that hears me sing it--either it brings the TEARS into their eyes, or else--’

‘Or else what?’ said Alice, for the Knight had made a sudden pause.

‘Or else it doesn’t, you know. The name of the song is called “HADDOCKS’ EYES.”’

‘Oh, that’s the name of the song, is it?’ Alice said, trying to feel interested.

‘No, you don’t understand,’ the Knight said, looking a little vexed. ‘That’s what the name is CALLED. The name really IS “THE AGED AGED MAN.”’

‘Then I ought to have said “That’s what the SONG is called”?’ Alice corrected herself.

‘No, you oughtn’t: that’s quite another thing! The SONG is called “WAYS AND MEANS”: but that’s only what it’s CALLED, you know!’

‘Well, what IS the song, then?’ said Alice, who was by this time completely bewildered.

‘I was coming to that,’ the Knight said. ‘The song really IS “A-SITTING ON A GATE”: and the tune’s my own invention.’

We can summarize the confusion about the song in this way:

  • The song's name is called Haddocks' Eyes

  • The song's name is The Aged Aged Man

  • The song is called Ways and Means

  • The song is A-sitting on a Gate

This is the kind of problems you stumble on as a poor programmer wanting to familiarize them with machine learning gets into. They discover say TensorFlow and immediately ask or Google "What is a Tensor?"

This question will easily cause you a lot of pain. It is the equivalent to getting into functional programming and asking what a Monad is.

So a Tensor is a Matrix?

The typical early mistake you will make on your quest to understand Tensors is that you ask somebody to show you a Tensor and you see something that to you looks like a Matrix.

And so you go "ah... so a tensor is just a matrix?" Beep!! Wrong!

The answer you get after this point typically just make you more confuse than ever.

a tensor is an algebraic object that describes a (multilinear) relationship between sets of algebraic objects related to a vector space.

This is a common human habit to focus on what things are, what they look like, we naturally latch on to how something is represented.

So before we get back to discussing Tensors I want to detour and talk more about how to think about objects both in programming and mathematics.

What Something is and Its Representation are not the Same

III, 3 and three are all different representations of the number 3. We should not mistake either representation for actually "being" that number.

Likewise one and the same representation could represent entirely different things. A could mean the letter 'A' or it could mean the hexadecimal digit 0xA corresponding to 10 in the decimal system. Likewise 65 could represent the number 65 or the letter 'A' in ASCII or UTF-8 encoding.

We could look at more complex representations. A list of two numbers (2,3)(2, 3) could in principle refer to a point, a vector, a tuple or a set.

To not mix the two up we often use slightly different notation. A set of two points would often be written {2,3}\{2, 3\}. We usually write a vector as [23]\left [\begin{matrix}2\\ 3 \end{matrix}\right]. However because that is often cumbersome in the middle of a text it is also possible to write it as (2,3)(2, 3). Both are representation of a column-vector. This should not be confused with [23][2 \enskip 3] which usually denotes a 1 ×\times 2 row matrix.

The point is that how something is written does not uniquely define it (2,3)(2, 3) could mean a tuple, a point or a column vector. They are all written the same way.

You cannot tell by merely looking at tuple (2,3)(2, 3) and the vector (2,3)(2, 3) what the difference is.

This can be a source of confusion for a programmer, because as programmers we are not used to this kind of ambiguity. In code the vector may have been written Vector(2, 3) and the tuple Tuple(2, 3). Of course many languages have specific literal syntax for very common objects. E.g. in Julia a column-vector would be written [2, 3] and a tuple would be written (2, 3). The point however is that there is a unique syntax for each type of object.

In mathematics in contrast you get to know the type through context. The author will in the text point out what sort of objects we are working with.

Type Hierarchies and Interfaces

In programming each object is of a particular type. This type may be a subtype of another concrete or abstract super-type. The type may also conform to several different interfaces, protocols or abstract types. The terminology used will vary depending on the language used.

We can consider the Julia programming language. An Array is a concrete subtype of AbstractArray. However AbstractArray types also implement an iterator interface, so that its elements can be iterated over. That something is iteratable is separate from it being an array. Things which are not arrays can also be iterated over.

In Julia every type is a subtype of Any. And types such as Array are also actual objects. The type of the Array type object is DataType. DataType is also a subtype of Any. This kind of arrangement is not unusual. In Java or Smalltalk e.g. the class Object is at the top of the type hierarchy.

We find very similar kinds of arrangement within mathematics. Everything is a mathematical object. Instead of speaking of types we speak of mathematical structures. Mathematical structures exist in a hierarchy similar to a type-hierarchy in programming. E.g. under mathematical structures we have algebraic structures. This is subdivided further into other structures. Further down the hierarchy we have vector spaces.

Relating Mathematical Structures to Programming Types

To make sense of this we can compare to the Vector type in the Julia programming language. Vector is a parameterized type, meaning we must specify the type of its elements (components in math-speak).

v = Vector{Float64}()
1.0s
0-element Array{Float64,1}
push!(v, 3, 4)
0.6s
2-element Array{Float64,1}: 3.0 4.0
u = [3.0, 4.0]
0.2s
2-element Array{Float64,1}: 3.0 4.0

In this case I have create the vectors v and u in different but equivalent ways. v as first created empty and then I added two elements 3 and 4 to it.

Formally we can say that v is an object the type Vector with type parameter Float64.

A mathematician could express something similar by saying v=(3,4)v = (3, 4) is a mathematical object belonging to the vector space of vectors VV over the field FF where F=RF =\mathbb{R}.

Let us try to unpack this. In programming-speak the type of vv is a vector space. However in mathematics you cannot simply say it is a vector space anymore than you can say something is a Vector in Julia. You need to specify the type parameter. In mathematics we do that by saying it is a vector space over the field R\mathbb{R} e.g. In mathematics R\mathbb{R} is the set of all real numbers. Float64 is a concrete subset of Julia's abstract Real type, so there is some similarity in our definitions.

Fields

In Julia the type parameter of a Vector does not need to be a number. Nor does it have to be for a vector space. That is why we are not saying:

vector space over number FF

That is why we are saying field instead. So what is a field? Math is heavy on abstractions. A field is a type of mathematical object that allows addition, subtraction, multiplication and division according to specific rules.

In Julia notation it would be some type T which has the following operations defined:

+(x::T, y::T) where T <: Field
-(x::T, y::T) where T <: Field
*(x::T, y::T) where T <: Field
/(x::T, y::T) where T <: Field
Julia

Of course in mathematics everything is far more pedantic. In programming we make quite a lot of assumptions which must be spelled out in mathematics. Let us pick a couple of examples from wikipedia to show how accurate these descriptions must be:

If aa and bb are in the field then:

  • Associativity: a+(b+c)=(a+b)+ca + (b + c) = (a + b) + c and a(bc)=(ab)ca \cdot (b \cdot c) = (a \cdot b) \cdot c

  • Cummulativity: a+b=b+aa + b = b +a and ab=baa \cdot b = b \cdot a

  • etc

Because everything in the field is of the same type, we don't have to specify it, but if we had. The we could write that addition is defined as:

This basically says that addition takes two values aa and bb and produce a new value cc. It says that ++ takes two arguments which each are in the field FF and produce a value in the field FF.

So F×FF \times F means an operation takes two arguments of the same type FF. F×F×FF \times F \times F would mean three arguments and so on. We can shorten that to F3F^3. In cases where we want to be specific we can say that we are taking say 3 real values and producing a real value with R3R\mathbb{R}^3 \rightarrow \mathbb{R}

Formal Specification of a Vector Space

In programming we may define a Vector interface and list method supported or something similar. We have a similar but more pedantic way of specifying the same in mathematics. A Vector space is defined as the set:

Where VV is a set of vectors, FF a set of scalars and ++ and \cdot the operations which must be defined/supported for the elements of the vector space. If we where to attempt to frame this in programming language syntax it may look something like this in Julia pseudo-code syntax:

abstract type VectorSpace{V, F} where V <: Array{F}, F <: Field
   +(u::V, v::V)
   *(k::F, v::V)
end
Julia

This is not really valid Julia code as it is hard to really express in code what is expressed with mathematical notation. It is just trying to get across that VV and FF are similar to type parameters in programming, with some constraints. VV has to be some kind of container containing components of type FF where FF has to be some kind of field. A real number is a field while an integer e.g. isn't.

We also try to get across that a vector space is something quite abstract. You don't see a definition of member variables stored. It is all about its behavior. What kind of operations are supported. And it is not like object-oriented programming where these operations are attached to one particular type. It is more like perhaps a module or namespace. It is a collection of functionality and relationships which together forms the vector space.

A vector space must as a minimum it must support addition and scalar multiplication. If addition in a vector space looks like w=u+v\mathbf{w} = \mathbf{u} + \mathbf{v} We need to actually say something about types involved with:

With vector spaces it gets a bit more interesting than with fields, when we get to multiplication. This is defined for a scalar and a vector. v=cu\mathbf{v} = c\mathbf{u}, where cc is our scalar. This if formally expressed as:

So F×VF \times V tells us that the multiplication operator \cdot, takes two arguments: one which is our field FF with values such as cc and the other VV which is a set representing our vectors, such as u\mathbf{u}, v\mathbf{v} and w\mathbf{w}.

You can keep adding functionality to define new subspaces. It must be kept in mind that a vector space is not like a concrete type in programming, but far more like an interface. Lots of things you would not think of as vectors can form vector spaces such as functions, because you can add two functions and multiply a function with a scalar.

Matrix and Tensors

You can form vector spaces out of matrices as well. However not every matrix combination forms a vector space. E.g. You cannot add a 2×32 \times 3 matrix with a 2×22 \times 2 matrix. In programming speak you could say these two matrices together don't adhere to the vector space interface, because they don't support addition.

However the set of m×nm \times n matrices form a vector space because it supports addition and multiplication with a scalar. Here is an example of proof of this.

In fact tensors are also vector spaces. However pointing that out is not all that interesting.

Nor is it interesting to say both a vectors and matrices can form vector spaces, since so little is required to be a vector space. Matrices support a bunch of other operations which distinguishes them from mere vectors.

E.g. you can multiply matrices, which you cannot do with vectors.

With tensors comes more abilities or axioms if you will that don't exit for regular matrices. And again let me hammer in yet again. A representation of something is not the object.

You can represent a tensor as a matrix but that does not really make it a matrix.

Linear and Multilinear Transformations

You can think of a matrix as something which happens to be useful in defining a linear transformation. A linear transformation is just a fancy way of saying function which takes a vector as input and spits out a vector as output. It is just a word to distinguish it from a regular function which takes a scalar and spits out a scalar.

So if TT is our transformation it is something like this

It takes a vector v\mathbf{v} and spits out another vector u\mathbf{u}. It so happens that we figured out that you can accomplish such linear transformations with matrix multiplication. Thus there exists some matrix A\mathbf{A} which accomplishes the same as TT. So we can write:

Tensors is an analogy to this for multilinear transformations. Again this is just a fancy way of saying a function which takes multiple vectors as inputs and spits out a vector as output. So something like this would qualify:

A tensor is just a multidimensional array of numbers which allow us to perform this kind of transformation. Frequently this will look like a matrix to you. For tensors I believe w\mathbf{w} will usually be a scalar such as a real number. This is possible because reals can also be vector spaces. A real can be a vector space onto itself. The vectors in VV just all have single components which are fields FF.

Final Remarks

These where just reflections upon how mathematicians define things in relation to how us software developers think about things. I have not actually shown any practical use of a tensor. My motivation was to write a piece for somebody who wants to watch a video of read about tensors but who find it hard to deal with the math heavy language.

I have tried here to relate that language to concepts from programming.

Runtimes (1)