Avi Drucker / May 27 2024

Module 7 - Practice - Linear Regression

Exercise 1:

Using the pandas library, load the gradedata.csv file from the datasets folder as a dataframe. Narrow your data (make the dataframe smaller) by keeping only the columns that you think can help predict student grades. Use any method that you've learned so far to help you decide which columns to keep.
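
One possible approach, sketched under assumptions: the column names below (name, age, hours, exercise, grade) and the 0.5 threshold are hypothetical, and a small in-memory CSV stands in for the real file so the snippet runs on its own. It ranks numeric columns by their correlation with the target and keeps the stronger ones.

```python
import io

import pandas as pd

# Stand-in for datasets/gradedata.csv. The column names are hypothetical
# and may not match the real file.
csv_text = """name,age,hours,exercise,grade
Ann,18,10,3,82.5
Bob,19,4,1,61.0
Cy,18,12,4,90.0
Di,20,6,2,70.5
Ed,19,8,3,77.0
"""

# With the real file this would be: df = pd.read_csv("datasets/gradedata.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Correlation of every numeric column with the target column "grade".
numeric = df.select_dtypes("number")
correlations = numeric.corr()["grade"].drop("grade")
print(correlations)

# Keep the columns whose absolute correlation passes an arbitrary threshold.
keep = correlations[correlations.abs() > 0.5].index.tolist()
small_df = df[keep + ["grade"]]
print(small_df.columns.tolist())
```

Correlation is only one way to choose; a scatter plot of each column against the grade, or plain domain judgment, would serve the same purpose.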

Exercise 2:

Using the dataframe from the exercise above, clean and prepare your data. This means eliminating any null (missing) values, either by dropping or by filling them, and converting any non-numerical columns to numerical values that a model can interpret. For example, if a column has string values, convert them to integers that best represent their order.
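
A minimal sketch of both cleaning steps, assuming a toy dataframe: the "effort" column and its low/medium/high ordering are invented for illustration and are not taken from gradedata.csv.

```python
import pandas as pd

# Toy data with missing values; column names and values are hypothetical.
df = pd.DataFrame({
    "hours": [10.0, None, 12.0, 6.0],
    "effort": ["low", "high", None, "medium"],
    "grade": [82.5, 61.0, 90.0, 70.5],
})

# Option 1: fill a missing numeric value with the column mean.
df["hours"] = df["hours"].fillna(df["hours"].mean())

# Option 2: drop any row that still has a missing value.
df = df.dropna().reset_index(drop=True)

# Convert ordered string categories to integers that preserve the order.
effort_order = {"low": 0, "medium": 1, "high": 2}
df["effort"] = df["effort"].map(effort_order)

print(df)
print(df.isna().sum())
```

Whether to fill or to drop depends on how many rows are affected; dropping is simpler but discards data.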

Exercise 3:

Using the cleaned dataframe from the exercise above, use the sklearn library to split the data into training and test datasets. Set the test size to 30%.
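
The split can be sketched with sklearn's train_test_split. The dataframe here is synthetic and its column names are hypothetical; random_state is fixed only so the result is repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataframe (hypothetical columns).
df = pd.DataFrame({
    "hours": list(range(10)),
    "exercise": [i % 3 for i in range(10)],
    "grade": [50 + 4 * i for i in range(10)],
})

X = df[["hours", "exercise"]]  # feature columns
y = df["grade"]                # target column

# test_size=0.3 holds out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))
```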

Exercise 4:

Using the training data from the previous exercise, fit a linear regression model to the data (build the model).
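
As an illustration, the fit might look like the following. The training data is generated from a known line (an assumption made so the learned coefficients can be checked), not taken from the grade dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data generated from y = 5 + 2*x1 + 3*x2 (no noise).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(50, 2))
y_train = 5.0 + 2.0 * X_train[:, 0] + 3.0 * X_train[:, 1]

# Create the model and fit it to the training data only.
model = LinearRegression()
model.fit(X_train, y_train)

# One learned coefficient per feature column.
print(model.coef_)
```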

Exercise 5:

What is the intercept (y-intercept) of the linear regression model?
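
A fitted sklearn LinearRegression exposes the intercept as the intercept_ attribute. A small sketch on made-up data that follows y = 50 + 4x exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 50 + 4*x.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 50.0 + 4.0 * X[:, 0]

model = LinearRegression().fit(X, y)

# intercept_ is the predicted value when every feature equals zero.
print(model.intercept_)
# coef_ holds the slope for each feature.
print(model.coef_)
```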

Exercise 6:

Use the model's predict function on both the training data and the test data.
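
A sketch of calling predict on both splits, using synthetic noisy data (the generating line and noise level are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: a linear signal plus a little random noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(40, 2))
y = 50.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 1.0, size=40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# predict returns one predicted value per input row.
train_pred = model.predict(X_train)
test_pred = model.predict(X_test)

print(train_pred[:3])
print(test_pred[:3])
```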

Exercise 7:

Calculate the score of the training and test predictions. How well did the linear regression model predict the test data compared to the training data?
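
For LinearRegression, sklearn's score method returns the coefficient of determination (R squared), where 1.0 is a perfect fit. The comparison can be sketched on synthetic data as follows; the data itself is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a linear signal plus a little random noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(40, 2))
y = 50.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 1.0, size=40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# score(X, y) predicts on X and compares the result with y using R squared.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# The same number, computed from explicit predictions.
test_r2_manual = r2_score(y_test, model.predict(X_test))

print(train_r2, test_r2)
```

A test score far below the training score would suggest the model fits the training rows better than it generalizes; similar scores suggest it generalizes about as well as it fits.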
