How Fast Is R with FastR?

benchmarking ray traced shadows for heightfields

I've been following GraalVM since the first publication so I was naturally very interested in the benchmarks posted in faster-r-with-fastr.

It shows impressive speed ups and claims that GraalVM introduces "A world where R code is as fast as (or even faster than) equivalent C, C++, or Fortran code". 

That calls for reproducing the benchmarks and actually comparing it to a baseline C/C++/Fortran implementation. Nextjournal seems to be the perfect tool to freeze code, data and runtimes into one article and enable all readers to execute the code with the same, reproducible setup! You can simply click on remix to edit this article and run the code (for an account: signup).

Nextjournal already built a FastR runtime in another article, which I was able to update to work with the newest version (sdanisch/fastr). The article sets up the docker image we use here and one can use it in a new R code cell by transcluding sdanisch/fastr into the Environment in the runner menu to the right!

print(sessionInfo())

1.1s

1.
FastR

Let's take the ray tracing code from the original article and benchmark it. I took the freedom to simplify the bilinear implementation to get an extra performance boost:

14.2s

# Code taken from article: http://www.tylermw.com/throwing-shade/
# Author: Tyler Morgan-Wall

# simple bilinear interpolation:
faster_bilinear <- function (Z, x0, y0){
  i = floor(x0)
  j = floor(y0)
  XT = (x0 - i)
  YT = (y0 - j)
  result = (1 - YT) * (1 - XT) * Z[i, j]
  nx = nrow(Z)
  ny = ncol(Z)
  if(i + 1 <= nx){
    result = result + (1-YT) * XT * Z[i + 1, j]
  }
  if(j + 1 <= ny){
    result = result + YT * (1-XT) * Z[i, j + 1]
  }
  if(i + 1 <= nx && j + 1 <= ny){
    result = result + YT * XT * Z[i + 1, j + 1]
  }
  result
}
bench_rays <- function() {
  volcanoshadow = matrix(1, ncol = ncol(volcano), nrow = nrow(volcano))

  volc = list(x=1:nrow(volcano), y=1:ncol(volcano), z=volcano)
  sunangle = 45 / 180*pi
  angle = -90 / 180 * pi
  diffangle = 90 / 180 * pi
  numberangles = 25
  anglebreaks = sapply(seq(angle, diffangle, length.out = numberangles), tan)
  maxdistance = floor(sqrt(ncol(volcano)^2 + nrow(volcano)^2))
  sinsun = sin(sunangle)
  cossun = cos(sunangle)
  for (i in 1:nrow(volcano)) {
    for (j in 1:ncol(volcano)) {
      vij = volcano[i, j]
      for (anglei in anglebreaks) {
        for (k in 1:maxdistance) {
          xcoord = (i + sinsun*k)
          ycoord = (j + cossun*k)
          if(xcoord > nrow(volcano) || 
             ycoord > ncol(volcano) || 
             xcoord < 0 || ycoord < 0) {
            break
          } else {
            tanangheight = vij + anglei * k 
            if (all(c(volcano[ceiling(xcoord), ceiling(ycoord)],
                      volcano[floor(xcoord), ceiling(ycoord)],
                      volcano[ceiling(xcoord), floor(ycoord)],
                      volcano[floor(xcoord), floor(ycoord)]) < tanangheight)) next
            if (tanangheight < faster_bilinear(volcano, xcoord, ycoord)) {
              volcanoshadow[i, j] =  volcanoshadow[i, j] - 1 / length(anglebreaks)
              break
            }
          }
        }
      }
    }
  }
}
min_time = 99999.0
for(i in 1:10){
  start.time <- Sys.time()
  bench_rays()
  end.time <- Sys.time()
  time.taken <- end.time - start.time
  print(sprintf("time taken: %f", time.taken))
  min_time = min(min_time, time.taken)
}
min_time

14.2s

2.
GNU R

Now let's use the default GNU R runtime and execute exactly the same code as above using GNU R.

86.9s

min_time

0.3s

3.
Baseline

I promised a C++/Fortran baseline implementation - I'm sorry to say, that instead I will implement it in Julia. It's much easier and Julia can be expected to offer the same speed as C++ or Fortran. That is no surprise, since Julia is compiled via LLVM which is also used in Clang to compile C++ with state of the art performance. As a matter of fact, Julia has reimplemented quite a few former C/Fortran functions in pure Julia, while preserving performance and even yielding speed ups. You can see that e.g. `sin` is implemented in Julia with  `@which`. Click the link and it will take you to the implementation:

using InteractiveUtils
@which sin(1.0)

1.6s

We can upload files directly to the article, which is the easiest way to package data in a reproducible way:

volcano.csv

I tweaked the Julia implementation a bit, since I rather work with tuples instead of single numbers for x and y. To be sure, I first made a direct port, and verified that the performance is the same.

import Pkg; Pkg.add("BenchmarkTools")

43.6s

5.6s

using Plots, BenchmarkTools, DelimitedFiles
function faster_bilinear(Z::AbstractMatrix{T}, x0, y0) where T
  i = floor(Int, x0)
  j = floor(Int, y0)
  XT = (x0 - i)
  YT = (y0 - j)
  result = (1 - YT) * (1 - XT) * Z[i, j]
  nx, ny = size(Z)
  (i + 1 <= nx) && (result = result + (1-YT) * XT * Z[i + 1, j])
  (j + 1 <= ny) && (result = result + YT * (1-XT) * Z[i, j + 1])
  if i + 1 <= nx && j + 1 <= ny
    result = result + YT * XT * Z[i + 1, j + 1]
  end
  result
end
function shadows(
    Z, shadow = fill(one(eltype(Z)), size(Z)); 
    sunangle = 45 / 180*pi, 
    angle = -90 / 180 * pi, diffangle = 90 / 180 * pi, angles = 25,
    anglebreaks = range(angle, stop = diffangle, length = angles),
    maxdistance = floor(Int, sqrt(sum(size(Z).^2)))
  )
  tan_angles = tan.(anglebreaks)
  sun = sincos(sunangle)
  for I in CartesianIndices(size(Z)) # iterate over all I=(i, j)
    aij = Z[I]; ivec = Tuple(I)
    for tana in tan_angles
      for k in 1:maxdistance
        tanheight = aij + tana * k
        coord = ivec .+ sun .* k
        clow = floor.(Int, coord); chigh = ceil.(Int, coord)
        checkbounds(Bool, Z, clow...) && 
          checkbounds(Bool, Z, chigh...) || break
        if (Z[clow[1], clow[2]] < tanheight || 
            Z[chigh[1], chigh[2]] < tanheight || 
            Z[clow[1], chigh[2]] < tanheight || 
            Z[chigh[1], clow[2]] < tanheight)
          continue
        end
        if tanheight < faster_bilinear(Z, coord...)
          shadow[I] = shadow[I] - 1 / angles
          break
        end
      end
    end
  end
  shadow
end
# just press @ and select the file volcano, to insert a file reference
volcano = readdlm(volcano.csv↩, ',', Float32, header = true)[1][:, 2:end]

5.6s

b = @benchmark shadows($volcano)
julia_time = minimum(b).time / 10^9 # time in nanoseconds

14.5s

heatmap(shadows(volcano), aspect_ratio = 1)

31.9s

Now let's make an animation to see what we've created:

gr(format = :png)
volcano = copy(volcano')
anim = @gif for i in range(0, stop = 2π, length = 60)
  heatmap(shadows(volcano, sunangle = i), aspect_ratio = 1)
end
cp(anim.filename, "/results/output.gif")

41.5s

Finally, we can plot the speed up by referencing the result values from the R cells and plotting it in Julia:

pip install pythran

1.1s

shade_pythran.py

cat shade_pythran.py↩ > shade_pythran.py

0.3s

apt-get update && apt-get install g++ clang

3.8s

printf "[compiler]\nCXX=clang\nCC=clang" > ~/.pythranrc

0.1s

pythran -O3 -march=native -DUSE_BOOST_SIMD shade_pythran.py

8.8s

import numpy as np
import sys
sys.path.append(".")
import shade_pythran

io = open(volcano.csv↩, 'rb')
volcano = np.loadtxt(io, dtype=np.float32, delimiter=',', skiprows=1, usecols=range(1, 62))

result = shade_pythran.bench_rays(volcano)

import matplotlib.pyplot as plt
import plotly.tools as tls
# redefine some plotting methods for Nextjournal support
plt.show = plt.gcf

plt.imshow(result.T, origin='lower')
plt.show()

1.0s

import timeit

setup = """
import shade_pythran
import numpy as np
io = open(NJ__REF0c35941c_3771_4d34_8eca_efe866f1f807_volcano_csv, 'rb')
volcano = np.loadtxt(io, dtype=np.float32, delimiter=',', skiprows=1, usecols=range(1, 62))
"""

code = """
result = shade_pythran.bench_rays(volcano)
"""

timings = timeit.repeat(code, setup, repeat=10, number=1)
print(timings)
min(timings)

1.2s

np.savetxt("/results/pythran.csv", result, delimiter=",")

0.0s

pythran.csv

using Plots
plotly() # use the plotly plotting backend
fastr = nil↩
julia = julia_time
r = nil↩
pythran = nil↩
times = [r, fastr, julia, pythran]
speedup = maximum(times) ./ times
labels = ["R", "FastR", "Julia", "Pythran"]
println("speedup relative to R: ", speedup)
println("speedup relative to FastR: ", fastr ./ times);
bar(labels, speedup, legend = false, fillcolor = :grey)

32.8s

using DelimitedFiles
using Plots
gr(format = :png)
pythrandata = readdlm(pythran.csv↩, ',')
heatmap(pythrandata', aspect_ratio = 1)

8.4s

Shift+Enter to run

How Fast Is R with FastR?

1. FastR

2. GNU R

3. Baseline

1.
FastR

2.
GNU R

3.
Baseline