Linear Regression
Introduction to Linear Regression
If you are new to ML, this will likely be the first algorithm you implement. It is simple, with few hyperparameters to worry about, and it rests heavily on linear algebra.
Linear regression is one of the most fundamental algorithms in machine learning. It models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to the observed data.
The simplest form of linear regression, with one independent variable, is represented by:
$$y = mx + b$$
Where:
y is the dependent variable (what we're trying to predict).
x is the independent variable (our feature).
m is the slope (or weight).
b is the y-intercept (or bias).
When dealing with more features, the equation becomes:
$$y = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$$
Or in vector notation:
$$y = \mathbf{w}^\top\mathbf{x} + b$$
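As a quick aside, the vector form maps directly onto a dot product in NumPy. This is just an illustrative sketch; the arrays `X`, `w`, and `b` below are made-up placeholders, not the dataset we build next.

```python
import numpy as np

X = np.array([[1.0, 2.0],      # 3 samples, 2 features (hypothetical values)
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])      # one weight per feature
b = 2.0                        # scalar bias

y_hat = X @ w + b              # y = w^T x + b, evaluated for every row of X
print(y_hat)                   # [ 0.5 -0.5 -1.5]
```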
Let's create a dataset
For linear regression we will create a small dataset with the numpy.random module; this will also help us revise some NumPy concepts.
We will take 100 points drawn uniformly between 0 and 10. Then we will create Gaussian noise with a standard deviation of 5 and add it to the dependent variable; without the noise every point would lie exactly on a line, so adding it gives us a realistic scatter.
Since the noise has a standard deviation of 5, the Mean Squared Error at the end of training will also be around 5² = 25.
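To see why, note that even the true line y = 5x + 3 cannot predict the random noise term, so its expected squared error is just the variance of the noise:

$$\mathbb{E}\big[(y - (5x + 3))^2\big] = \mathbb{E}\big[\varepsilon^2\big] = \sigma^2 = 5^2 = 25$$

Any MSE above this floor is error the model is actually responsible for.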
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.uniform(0, 10, 100)        # 100 points spread uniformly between 0 and 10
noise = np.random.normal(0, 5, x.size)   # Gaussian noise with mean 0 and standard deviation 5
y = 5 * x + 3 + noise                    # true relationship y = 5x + 3, plus noise

plt.scatter(x, y)
```
This is our dataset, which we will try to approximate with a single line, so the line should closely mimic this scatter.
Split the dataset
Once we train the model, we need some held-out data points to test it on, so we will take 80% of the original data for x_train and the remaining 20% for x_test.
This split should be random; to make the logic clear, we will do it with NumPy instead of scikit-learn.
First, to randomize the process, we will shuffle the original dataset, but shuffle x and y in the same order, otherwise the pairs will no longer match and we will create new problems altogether.
Here we take the length of x and pass it to np.arange, which creates an array of indices from 0 to 99 in ascending order. We then shuffle those indices and use them to index into x and y, giving us x_shuffled and y_shuffled.
```python
indices = np.arange(x.shape[-1])   # [0, 1, ..., 99]
np.random.shuffle(indices)         # shuffle the indices in place
x_shuffled = x[indices]
y_shuffled = y[indices]            # same permutation applied to both arrays
```
Now we will do an 80:20 split using slicing.
```python
splt = int(0.8 * len(x))                                  # index where the split happens
x_train, x_test = x_shuffled[:splt], x_shuffled[splt:]
y_train, y_test = y_shuffled[:splt], y_shuffled[splt:]

plt.scatter(x_train, y_train)
```
As you can see, x_train and y_train look very much like the plot of the original dataset.
Initializing Parameters & Hyper Parameters
Difference between Parameters and Hyper-Parameters
The main way to distinguish between them is whether they stay the same throughout the training loop or not.
Weights and biases are updated every epoch, thus --> Parameters
But epochs and the learning rate are set before the training loop even begins and remain the same throughout --> Hyper-parameters
Weights and Biases
In linear regression with a single feature, the weight and bias are scalars, but in more complex algorithms we might have to set their shapes according to our network's architecture.
```python
weights = np.random.random()   # scalar weight, initialized randomly
bias = np.random.random()      # scalar bias, initialized randomly
```
The learning rate and the number of epochs are completely up to you; they can only be tuned through some trial and error.
```python
learning_rate = 1e-4
epochs = 150
```
Start the training process
Let's walk through the steps of training.
Calculating the dependent variable from the independent variable.
This is done in a loop that iterates over all `x_train` examples and stores the predictions in a separate array `yo`, which is an empty array of the same shape as `x_train`.
$$\hat{y}_j = w\,x_j + b$$
Calculate the Error: MSE
We generally use the Mean Squared Error to measure how far the model is from convergence.
$$\text{MSE} = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2$$
Now we start updating the parameters
The derivation of the update rule for the weights and bias is simple maths; you can work it out on your own, but if you want, here is a sketch of the proof:
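As a minimal sketch, assume the loss for a single training example is the squared error (this matches the code in the next section). Taking derivatives with respect to w and b and stepping against the gradient, with the constant factor of 2 folded into the learning rate $\eta$, gives exactly the update used below:

$$L_j = (y_j - \hat{y}_j)^2, \qquad \hat{y}_j = w\,x_j + b$$

$$\frac{\partial L_j}{\partial w} = -2\,(y_j - \hat{y}_j)\,x_j, \qquad \frac{\partial L_j}{\partial b} = -2\,(y_j - \hat{y}_j)$$

$$w \leftarrow w + \eta\,(y_j - \hat{y}_j)\,x_j, \qquad b \leftarrow b + \eta\,(y_j - \hat{y}_j)$$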
Think of the weight as the slope of the line: if the model's prediction for a data point lands on the wrong side of the optimum line, the update adds to or subtracts from the slope so that that specific data point is fitted more closely.
Code
```python
yo = np.empty(x_train.shape[-1])                  # prediction array, same shape as x_train (refer 5.1.1)

for i in range(epochs):
    # compute the predictions for every training example (refer 5.1.1)
    for j in range(len(x_train)):
        yo[j] = np.dot(weights, x_train[j]) + bias

    # measure how far we are with the Mean Squared Error (refer 5.1.2)
    mse = np.mean((y_train - yo) ** 2)
    print(mse)

    # update the parameters one example at a time (refer 5.1.3)
    for j in range(len(x_train)):
        error = y_train[j] - yo[j]
        weights += learning_rate * error * x_train[j]
        bias += learning_rate * error
```
Testing Time
We will take x_test, multiply it by the weights, add the bias, store the result in yet another array, and then calculate the MSE against y_test.
Making of yet another array
It has to be of the same size as x_test.
```python
yoo = np.empty(x_test.shape[-1])   # predictions for the test set, same size as x_test
```
Calculate the dependent variable with x_test
We will take the x_test array and compute the yoo array from it.
```python
for i in range(len(x_test)):
    yoo[i] = weights * x_test[i] + bias   # apply the learned line to the test points
```
Find MSE
Calculate MSE for y_test and yoo.
```python
mse = np.mean((y_test - yoo) ** 2)
print(mse)
```
Remember that we added noise with a standard deviation of 5, so an MSE of about 25 is there by default; the error coming from the model itself is only about 0.3.
This concept is related to the bias-variance tradeoff in machine learning, where the total error can be decomposed into the following (written out after the list):
Bias (how far your model predictions are from the true function)
Variance (how much your model fluctuates for different training sets)
Irreducible error (noise in the data that can't be modeled)
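Written out (a standard result, not specific to this tutorial): for a true function $f$, noise variance $\sigma^2$, and a model $\hat{f}$ trained on a random dataset, the expected squared error at a point $x$ decomposes as

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2} \;+\; \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}} \;+\; \underbrace{\sigma^2}_{\text{Irreducible error}}$$

In our case the irreducible error is $\sigma^2 = 25$, which is exactly the floor we see in the test MSE.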
Plot the result
We do this to visualize how well the fitted line matches the dataset.
```python
import matplotlib.pyplot as plt

plt.scatter(x, y, color='blue', label='Actual')
plt.plot(x_test, yoo, color='red', label='Predicted', marker='x')
plt.title('Test Predictions vs Actual')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```
This line almost fits our dataset, and that wraps up Linear Regression. You can try to implement the vectorized version of LR by yourself, where you work with 2D feature arrays.
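If you want a starting point, here is a minimal sketch of what that vectorized version could look like. It assumes a 2D feature matrix of shape (n_samples, n_features) and uses full-batch gradient descent; the dataset, variable names, learning rate, and epoch count below are illustrative choices, not part of the tutorial above.

```python
import numpy as np

# hypothetical 2D dataset: 100 samples, 2 features
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
true_w = np.array([5.0, -2.0])
y = X @ true_w + 3 + rng.normal(0, 5, size=100)   # noisy targets, same idea as before

# parameters: one weight per feature, one scalar bias
w = rng.random(X.shape[1])
b = rng.random()

learning_rate = 1e-2
epochs = 2000

for _ in range(epochs):
    y_hat = X @ w + b                              # predictions for all samples at once
    error = y - y_hat
    # full-batch MSE gradients, with the factor of 2 folded into the learning rate
    w += learning_rate * (X.T @ error) / len(X)
    b += learning_rate * error.mean()

print(np.mean((y - (X @ w + b)) ** 2))             # should settle near the noise variance (~25)
```

Averaging the gradient over the whole batch keeps the update scale independent of the dataset size, which is why a larger learning rate than in the per-example loop version works here.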