JSoC'19: What is Trace Estimation?

Introduction

Hey there! I'm Akshay Jain, and this is my first blog entry in a series of many created as a part of Julia Seasons of Contributions Summer 2019 organized by The Julia Organization in association with NumFOCUS. My JSoC project concerns with developing TraceEstimation.jl, a package containing trace estimation techniques for large sparse matrices (as well as dense matrices, as large as your memory allows!).

What do you mean by Trace Estimation?

Let be a symmetric matrix, we know that is the sum of the elements on the main diagonal of the matrix. Now consider a matrix function , where is the eigen-decomposition of The trace estimation problem consists of computing the trace of This can be trivially calculated as

tr(f(A)) = \sum_i f(\lambda_i)

where are the eigenvalues of Computation of eigen-decomposition to solve the trace estimation problem, albeit trivial, is a computationally expensive task especially for applications dealing with very large matrices. Therefore, in this project I will explore and implement multiple ways to efficiently (and somewhat accurately) estimate the trace of a matrix function.

What applications, again?

Trace estimation finds its uses in many applications such as computational biology and chemistry, solid state physics, machine learning, network theory, structural topology optimization, etc. Let's take a quick look at some of the most prominent trace estimation problems.

Log-determinant: Used in numerous fields including machine learning. The logarithm of the determinant of a given symmetric positive definite matrix is equal to the trace of the logarithm of the matrix.

log~det(A) = tr(log(A)) = \sum log(\lambda_i)

Log-likelihood: Maximum Likelihood Estimation for parameter estimation in high dimensional Gaussian models. Both the terms relating with hyperparameter vector can be calculated via trace estimation

log~p(z|\xi) = -\frac{1}{2}z^TS(\xi)^{-1}z - \frac{1}{2}log~ det~S(\xi)

Trace of matrix inverse: Calculation of is extensively used in Lattice Chromodynamics, structural topology optimization and other related fields.
Some other applications include calculation of Schatten p-norms, Estrada Index, Differential Privacy, Chemical Graph Theory, eigencount and numerical rank.

Approaches to Trace Estimation

Methods used for calculation of can be broadly classified into two categories:

Stochastic Trace Estimators

These methods approximate the trace of a matrix function by means of repeated matrix-vector multiplications. A sequence of random vectors are generated and some eigenvalue estimation methods are employed to estimatethen a Monte Carlo method is used to approximate the result.

tr(f(A)) = \frac{n}{n_{v}}~\sum^{n_{v}}v_i^Tf(A)v_i

Some of the popular stochastic trace estimators are:

Hutchinson trace estimator
Stochastic Lanczos Quadrature and Stochastic GKB Quadrature
Entropic trace estimators (specifically for log-determinant)

Non-stochastic Trace Estimators

These methods employ different techniques such as Chebyshev polynomial expansions, Taylor expansions, interpolation from diagonal of an approximate inverse, etc. These methods are generally developed for a specific problem, hence are not as general purpose as Stochastic estimators.

Chebyshev Polynomial method
Diagonal approximation method
Hierarchical probing method

Stochastic Lanczos Quadrature (SLQ)

During the first three weeks, also known as Community Bonding period (courtesy to Google Summer of Code), I studied and implemented a version of stochastic trace estimator that uses Lanczos method to generate a symmetric tridiagonal matrix from which Ritz-values for the given matrix are calculated and are used to estimate the trace of the given matrix function.

"Talk is cheap, show me the code!" - Linus Torvalds

Usage example

Let's add some of the important ingredients to our notebook!

import Pkg; Pkg.add(PackageSpec(url="https://github.com/luca-aki/traceEstimation.jl", rev="slq"))
using LinearAlgebra, TraceEstimation, BenchmarkTools

3.3s

Julia

For this example, we will just generate a simple symmetric positive-definite dense matrix.

A = rand(5000,5000)
A = A + A'+ 250I
while !isposdef(A)
  A = A + 100I
end

3.5s

Julia

And finally, lets run some quick tests...

@btime println(slq(A, fn = log, eps = 0.05, ctol = 0.1))
@btime println(tr(log(A)))

248.7s

Julia

By tweaking the eps and ctol values one can achieve the required accuracy and precision, respectively. To read more on this you can visit https://github.com/luca-aki/TraceEstimation.jl.

What next?

An implementation of SLQ is just the beginning! There seems to be a lot of potential for research and error bounds improvement in stochastic estimators.

Next up on my implementation list is Chebyshev Polynomial method, along with error bound improvements for SLQ.

Thank you for reading, any and all suggestions are appreciated!