Linear Regression in 3 Forms
Linear regression can be implemented a few different ways that can lead to the trade-off of accuracy versus speed. Rereading this post by Steven G. Johnson on the Julia Discourse site will give you summary
Here is a screenshot if you don't want to link out.
Well and ill conditioned
For reference to what conditioning is and what well and ill condition mean see this wikipedia article. If you have access to the book Numerical Linear Algebra, Trefethen and Bau, you'll find section III useful (also in the screenshot Johnson refers to this book in terms of QR factorization as the basis for how Julia handles A \ b
, which can be found in section II).
Some Code
The three forms of linear regression is a good thing to remember if you are implementing your own function. Julia makes this easy. Here are the the three forms.
using LinearAlgebra, BenchmarkTools
#set up some data to regress on
N = 10000
xn = rand(N);
Xp = [ones(N) xn];
yp = (10 .+ xn .* 0.3);
(sum(Xp), sum(yp))
Fast but not accurate
# normal equations approach
#fastest but least accurate; doubles the number of digits you lose to rounding.
#only use when you know you have well-conditioned matrices
function linreg1(X, y)
#β_hat
return (X' * X) \ (X' * y)
end
Not the fastest and not the least accurate
#QR factorization
#slower than linreg1 but uses pivoted QR factorization.
#more accurate for badly conditioned matrices than "normal equations" approach
#it does not square the condition number
function linreg2(X, y)
#β_hat
return X \ y
end
Slow but accurate
#SVD approach
#uses SVD to apply psuedo-inverse. the slowest of the linreg* approaches
#most accurate of linreg*
function linreg3(X, y)
#β_hat
return pinv(X) * y
end
Benchmarks
println("linreg1: ", linreg1(Xp, yp))
linreg1(Xp, yp)
#show that we are not modifying the inputs
(sum(Xp), sum(yp))
println("linreg2: ", linreg2(Xp, yp))
linreg2(Xp, yp)
#show that we are not modifying the inputs
(sum(Xp), sum(yp))
println("linreg3: ", linreg3(Xp, yp))
linreg3(Xp, yp)
#show that we are not modifying the inputs
(sum(Xp), sum(yp))