Linear Regression
Introduction to Linear Regression
If you are new to ML, this will likely be the first algorithm you implement. It is simple, with few hyperparameters to worry about, and it rests heavily on linear algebra.
Linear regression is one of the most fundamental algorithms in machine learning. It models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to the observed data.
The simplest form of linear regression, with one independent variable, is represented by:
$$y = mx + b$$
Where:
y is the dependent variable (what we're trying to predict).
x is the independent variable (our feature).
m is the slope (or weight).
b is the y-intercept (or bias).
When dealing with more features, the equation becomes:
$$y = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$$
Or in vector notation:
$$y = \mathbf{w}^\top\mathbf{x} + b$$
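As a quick aside, the vector form maps directly onto a dot product in NumPy. This is just an illustrative sketch; the arrays `X`, `w`, and `b` below are made-up placeholders, not the dataset we build next.

```python
import numpy as np

X = np.array([[1.0, 2.0],      # 3 samples, 2 features (hypothetical values)
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])      # one weight per feature
b = 2.0                        # scalar bias

y_hat = X @ w + b              # y = w^T x + b, evaluated for every row of X
print(y_hat)                   # [ 0.5 -0.5 -1.5]
```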
Let's create a dataset
For linear regression we will create a small dataset with the numpy.random module; this will also help us revise some NumPy concepts.
We will take 100 points drawn uniformly between 0 and 10. Then we will create Gaussian noise with a standard deviation of 5 and add it to the dependent variable; without the noise every point would lie exactly on a line, so adding it gives us a realistic scatter.
Since the noise has a standard deviation of 5, the Mean Squared Error at the end of training will also be around 5² = 25.
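To see why, note that even the true line y = 5x + 3 cannot predict the random noise term, so its expected squared error is just the variance of the noise:

$$\mathbb{E}\big[(y - (5x + 3))^2\big] = \mathbb{E}\big[\varepsilon^2\big] = \sigma^2 = 5^2 = 25$$

Any MSE above this floor is error the model is actually responsible for.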
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.uniform(0, 10, 100)        # 100 points spread uniformly between 0 and 10
noise = np.random.normal(0, 5, x.size)   # Gaussian noise with mean 0 and standard deviation 5
y = 5 * x + 3 + noise                    # true relationship y = 5x + 3, plus noise

plt.scatter(x, y)
```
This is our dataset, which we will try to approximate with a single line, so the line should closely mimic this scatter.
Split the dataset
Once we train the model, we need some held-out data points to test it on, so we will take 80% of the original data for x_train and the remaining 20% for x_test.
This split should be random; to make the logic clear, we will do it with NumPy instead of scikit-learn.
First, to randomize the process, we will shuffle the original dataset, but shuffle x and y in the same order, otherwise the pairs will no longer match and we will create new problems altogether.
Here we take the length of x and pass it to np.arange, which creates an array of indices from 0 to 99 in ascending order. We then shuffle those indices and use them to index into x and y, giving us x_shuffled and y_shuffled.
```python
indices = np.arange(x.shape[-1])   # [0, 1, ..., 99]
np.random.shuffle(indices)         # shuffle the indices in place
x_shuffled = x[indices]
y_shuffled = y[indices]            # same permutation applied to both arrays
```
Now we will do an 80:20 split using slicing.
```python
splt = int(0.8 * len(x))                                  # index where the split happens
x_train, x_test = x_shuffled[:splt], x_shuffled[splt:]
y_train, y_test = y_shuffled[:splt], y_shuffled[splt:]

plt.scatter(x_train, y_train)
```
As you can see, x_train and y_train look very much like the plot of the original dataset.
Initializing Parameters & Hyper Parameters
Difference between Parameters and Hyper-Parameters
The main way to distinguish between them is whether they stay the same throughout the training loop or not.
Weights and biases are updated every epoch, thus --> Parameters
But epochs and the learning rate are set before the training loop even begins and remain the same throughout --> Hyper-parameters
Weights and Biases
In linear regression with a single feature, the weight and bias are scalars, but in more complex algorithms we might have to set their shapes according to our network's architecture.
```python
weights = np.random.random()   # scalar weight, initialized randomly
bias = np.random.random()      # scalar bias, initialized randomly
```
The learning rate and the number of epochs are completely up to you; they can only be tuned through some trial and error.
```python
learning_rate = 1e-4
epochs = 150
```
Start the training process
Let's walk through the steps of training.
Calculating the dependent variable from the independent variable.
This is done in a loop that iterates over all `x_train` examples and stores the predictions in a separate array `yo`, which is an empty array of the same shape as `x_train`.
$$\hat{y}_j = w\,x_j + b$$
Calculate the Error: MSE
We generally use the Mean Squared Error to measure how far the model is from convergence.
$$\text{MSE} = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2$$
Now we start updating the parameters
The derivation of the update rule for the weights and bias is simple maths; you can work it out on your own, but if you want, here is a sketch of the proof:
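As a minimal sketch, assume the loss for a single training example is the squared error (this matches the code in the next section). Taking derivatives with respect to w and b and stepping against the gradient, with the constant factor of 2 folded into the learning rate $\eta$, gives exactly the update used below:

$$L_j = (y_j - \hat{y}_j)^2, \qquad \hat{y}_j = w\,x_j + b$$

$$\frac{\partial L_j}{\partial w} = -2\,(y_j - \hat{y}_j)\,x_j, \qquad \frac{\partial L_j}{\partial b} = -2\,(y_j - \hat{y}_j)$$

$$w \leftarrow w + \eta\,(y_j - \hat{y}_j)\,x_j, \qquad b \leftarrow b + \eta\,(y_j - \hat{y}_j)$$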
Think of the weight as the slope of the line: if the model's prediction for a data point lands on the wrong side of the optimum line, the update adds to or subtracts from the slope so that that specific data point is fitted more closely.
Code
```python
yo = np.empty(x_train.shape[-1])                  # prediction array, same shape as x_train (refer 5.1.1)

for i in range(epochs):
    # compute the predictions for every training example (refer 5.1.1)
    for j in range(len(x_train)):
        yo[j] = np.dot(weights, x_train[j]) + bias

    # measure how far we are with the Mean Squared Error (refer 5.1.2)
    mse = np.mean((y_train - yo) ** 2)
    print(mse)

    # update the parameters one example at a time (refer 5.1.3)
    for j in range(len(x_train)):
        error = y_train[j] - yo[j]
        weights += learning_rate * error * x_train[j]
        bias += learning_rate * error
```
Testing Time
We will take x_test, multiply it by the weights, add the bias, store the result in yet another array, and then calculate the MSE against y_test.
Making of yet another array
It has to be of the same size as x_test.
```python
yoo = np.empty(x_test.shape[-1])   # predictions for the test set, same size as x_test
```
Calculate the dependent variable with x_test
We will take the x_test array and compute the yoo array from it.
```python
for i in range(len(x_test)):
    yoo[i] = weights * x_test[i] + bias   # apply the learned line to the test points
```
Find MSE
Calculate MSE for y_test and yoo.
```python
mse = np.mean((y_test - yoo) ** 2)
print(mse)
```
Remember that we added noise with a standard deviation of 5, so an MSE of about 25 is there by default; the error coming from the model itself is only about 0.3.
This concept is related to the bias-variance tradeoff in machine learning, where the total error can be decomposed into the following (written out after the list):
Bias (how far your model predictions are from the true function)
Variance (how much your model fluctuates for different training sets)
Irreducible error (noise in the data that can't be modeled)
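Written out (a standard result, not specific to this tutorial): for a true function $f$, noise variance $\sigma^2$, and a model $\hat{f}$ trained on a random dataset, the expected squared error at a point $x$ decomposes as

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2} \;+\; \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}} \;+\; \underbrace{\sigma^2}_{\text{Irreducible error}}$$

In our case the irreducible error is $\sigma^2 = 25$, which is exactly the floor we see in the test MSE.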
Plot the result
We do this to visualize how well the fitted line matches the dataset.
```python
import matplotlib.pyplot as plt

plt.scatter(x, y, color='blue', label='Actual')
plt.plot(x_test, yoo, color='red', label='Predicted', marker='x')
plt.title('Test Predictions vs Actual')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```
This line almost fits our dataset, and that wraps up Linear Regression. You can try to implement the vectorized version of LR by yourself, where you work with 2D feature arrays.
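If you want a starting point, here is a minimal sketch of what that vectorized version could look like. It assumes a 2D feature matrix of shape (n_samples, n_features) and uses full-batch gradient descent; the dataset, variable names, learning rate, and epoch count below are illustrative choices, not part of the tutorial above.

```python
import numpy as np

# hypothetical 2D dataset: 100 samples, 2 features
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
true_w = np.array([5.0, -2.0])
y = X @ true_w + 3 + rng.normal(0, 5, size=100)   # noisy targets, same idea as before

# parameters: one weight per feature, one scalar bias
w = rng.random(X.shape[1])
b = rng.random()

learning_rate = 1e-2
epochs = 2000

for _ in range(epochs):
    y_hat = X @ w + b                              # predictions for all samples at once
    error = y - y_hat
    # full-batch MSE gradients, with the factor of 2 folded into the learning rate
    w += learning_rate * (X.T @ error) / len(X)
    b += learning_rate * error.mean()

print(np.mean((y - (X @ w + b)) ** 2))             # should settle near the noise variance (~25)
```

Averaging the gradient over the whole batch keeps the update scale independent of the dataset size, which is why a larger learning rate than in the per-example loop version works here.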