Avi Drucker / May 27 2024

Module 3 - Practice - Data Manipulation

Exercise 1:

From the datasets folder, load the "dupedata.csv" file as a dataframe. Drop the duplicate rows from the dataframe, keeping the first occurrence of each, and save the resulting dataframe to a new variable.
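One possible approach is sketched below. Because the contents of "dupedata.csv" are not shown here, the sketch reads a small in-memory stand-in; the column names `name` and `grade` are assumptions, not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for datasets/dupedata.csv (real columns may differ).
csv_text = """name,grade
Ana,95
Ben,58
Ana,95
Cy,100
Ben,58
"""

# With the real file this would be: pd.read_csv("datasets/dupedata.csv")
df = pd.read_csv(io.StringIO(csv_text))

# keep="first" retains the first occurrence of each duplicated row.
# The result is a new dataframe; df itself is left unchanged.
deduped = df.drop_duplicates(keep="first")
print(deduped)
```

`keep="first"` is already the default for `drop_duplicates`, so it is spelled out here only to make the intent visible.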

Exercise 2:

Using the dataframe in the previous exercise, select all the rows where students received a grade lower than 60 (they need a teacher conference on how to improve for the next test).
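A minimal sketch of the row selection using a boolean mask. The sample data and the column name `grade` are hypothetical and stand in for the de-duplicated dataframe from Exercise 1.

```python
import pandas as pd

# Hypothetical sample; in the exercise this is the de-duplicated dataframe.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "grade": [95, 58, 100, 42],
})

# The comparison produces a True/False Series; indexing with it
# keeps only the rows where the condition is True.
needs_conference = students[students["grade"] < 60]
print(needs_conference)
```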

Exercise 3:

Using the dataframe from Exercise 1, select all the rows where a student received a grade of 100 and change their grade to 103 (extra credit!).
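Selecting and updating in one step can be done with `.loc`, as in the sketch below. The data and the `grade` column name are again assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample; in the exercise this is the de-duplicated dataframe.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli"],
    "grade": [95, 58, 100, 42, 100],
})

# .loc[row_mask, column] selects the matching cells and assigns in place.
students.loc[students["grade"] == 100, "grade"] = 103
print(students)
```

Using `.loc` with a mask and a column label in a single indexing call avoids the chained-assignment pattern (`students[mask]["grade"] = 103`), which may modify a temporary copy instead of the dataframe.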

Exercise 4:

Load the "travel_times.csv" file as a dataframe. Drop the "Comments" column. Then remove the rows that have missing values and assign the resulting dataframe to a new variable.
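The sketch below walks through the same steps on a small in-memory stand-in for "travel_times.csv". Only the column names "Comments", "AvgSpeed", and "FuelEconomy" come from these exercises; the `Date` column and all values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for travel_times.csv (values are made up).
csv_text = """Date,AvgSpeed,FuelEconomy,Comments
1/6/2012,78.3,8.89,
1/6/2012,,9.08,Heavy traffic
1/4/2012,81.8,,
1/3/2012,74.2,8.75,Rain
"""

# With the real file this would be: pd.read_csv("travel_times.csv")
travel = pd.read_csv(io.StringIO(csv_text))

# Drop the column first, then drop rows with any remaining missing value.
no_comments = travel.drop(columns="Comments")
clean = no_comments.dropna()
print(clean)
```

The order matters: if "Comments" were still present when `dropna()` runs, every row with an empty comment would be removed as well.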

Exercise 5:

Using the dataframe from the exercise above (with no missing values), create bins that categorize the AvgSpeed column as "slow" or "fast", and make a new column called "Speed" to hold those values. Values less than 75 are "slow"; values of 75 or more are "fast".
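One way to build the two bins is `pd.cut`, sketched below on made-up speeds. Treating a value of exactly 75 as "fast" is an assumption about the boundary; `right=False` makes each bin include its left edge, so the bins are [-inf, 75) and [75, inf).

```python
import numpy as np
import pandas as pd

# Hypothetical sample; in the exercise this is the cleaned travel dataframe.
trips = pd.DataFrame({"AvgSpeed": [68.4, 74.9, 75.0, 81.8]})

# Two bins split at 75. right=False closes each bin on the left,
# so 74.9 falls in "slow" and 75.0 falls in "fast".
trips["Speed"] = pd.cut(
    trips["AvgSpeed"],
    bins=[-np.inf, 75, np.inf],
    labels=["slow", "fast"],
    right=False,
)
print(trips)
```

The new column has a categorical dtype; `trips["Speed"].astype(str)` converts it to plain strings if that is preferred.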

Exercise 6:

Using the dataframe from the previous exercise, add a new column called "Police" in which every value is "no" (the drivers were never stopped by police for speeding while traveling).
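Assigning a single value to a new column fills every row with it, as this short sketch on hypothetical data shows.

```python
import pandas as pd

# Hypothetical sample; in the exercise this is the dataframe from Exercise 5.
trips = pd.DataFrame({
    "AvgSpeed": [68.4, 81.8, 77.0],
    "Speed": ["slow", "fast", "fast"],
})

# A scalar on the right-hand side is broadcast to every row.
trips["Police"] = "no"
print(trips)
```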

Exercise 7:

Using the dataframe from the previous exercise, pick a method (Standard Deviation or Interquartile Range) and remove the outliers from the "FuelEconomy" column.
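A sketch of the Interquartile Range option on invented fuel-economy values. The 1.5 multiplier is the conventional fence width and is an assumption here, since the exercise does not specify a cutoff.

```python
import pandas as pd

# Hypothetical sample; 20.0 is planted as an obvious outlier.
trips = pd.DataFrame({"FuelEconomy": [8.5, 8.7, 8.9, 9.0, 9.1, 9.3, 20.0]})

# Quartiles and the interquartile range of the column.
q1 = trips["FuelEconomy"].quantile(0.25)
q3 = trips["FuelEconomy"].quantile(0.75)
iqr = q3 - q1

# Values outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# between() is inclusive on both ends by default.
no_outliers = trips[trips["FuelEconomy"].between(lower, upper)]
print(no_outliers)
```

The Standard Deviation option follows the same pattern, with the bounds replaced by the column mean plus or minus a chosen number of standard deviations.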

Appendix

priesterkc/Data-Analytics-Lessons