Module 2 - Practice - Explore Data
Exercise 0:
To complete the Module 2 exercises, first download and import the required datasets by following these steps:
1. Download the zip file from this link: https://github.com/priesterkc/Data-Analytics-Lessons/blob/master/Lv1_Lessons/datasets.zip
2. Extract the dataset files from the zip file
After you've extracted the dataset files you may delete the zip file.
3. Drag and drop each data file (see the file names given in the exercises below) into this coding notebook as you need it, one file at a time, placing it just before the code cell that requires it.
Nextjournal Skills
How to create a new Python code cell
How to create a new Markdown text cell
How to import files into a Nextjournal document
How to reference an imported file from a Python code cell
Note: If you have trouble loading the data files, see this page for more help: https://nextjournal.com/help/uploads-and-results
Exercise 1:
Use the pandas library to read in the file "travel-times.csv" as a dataframe. Set the dataframe's variable name as "travel_df".
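As a warm-up, here is a minimal sketch of how pandas reads CSV data. It does not use "travel-times.csv"; the text in csv_text and its column names are made up for illustration, and io.StringIO only stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file; these columns are invented for illustration.
csv_text = "city,minutes\nA,12\nB,30\n"

# pd.read_csv accepts a file path or a file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

# Show the number of rows and columns that were read.
print(sample_df.shape)
```

In the exercise itself, you would pass a reference to the uploaded file to pd.read_csv instead of the StringIO object.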
Exercise 2:
Use the pandas library to read in the file "income_expenses.xlsx" as a dataframe. Set the dataframe's variable name as "expense_df".
Exercise 3:
Using the lists in the cell below, write code that zips the two lists together into a single list, then turns that list into a dataframe. Next, export the dataframe as a CSV file. Then try exporting the dataframe as an Excel file.
names = ['Nike','Adidas','New Balance','Puma','Reebok']
grades = [176,59,47,38,99]
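One possible approach is sketched below. The column labels 'Name' and 'Grade' are assumptions chosen for illustration, and the CSV is produced as text rather than written to disk so that the sketch runs anywhere.

```python
import pandas as pd

names = ['Nike', 'Adidas', 'New Balance', 'Puma', 'Reebok']
grades = [176, 59, 47, 38, 99]

# zip pairs up items by position; list() turns the pairs into a list of tuples.
rows = list(zip(names, grades))

# The column labels here are assumptions, not names required by the exercise.
shoes_df = pd.DataFrame(rows, columns=['Name', 'Grade'])

# With no path argument, to_csv returns the CSV content as a string.
csv_text = shoes_df.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path such as shoes_df.to_csv('shoes.csv', index=False) (the file name is hypothetical). For Excel, shoes_df.to_excel('shoes.xlsx', index=False) works the same way, provided an Excel writer package such as openpyxl is installed.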
Exercise 4:
What columns are in the travel_df dataframe? What columns are in the expense_df dataframe?
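A small sketch of how to inspect a dataframe's columns is shown here on a toy dataframe. demo_df and its columns 'a' and 'b' are hypothetical; the same attributes apply to any dataframe.

```python
import pandas as pd

# Toy dataframe with invented columns, used only to show the inspection calls.
demo_df = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})

# .columns holds the column labels; tolist() converts them to a plain list.
print(demo_df.columns.tolist())

# .dtypes shows the data type stored in each column.
print(demo_df.dtypes)
```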
Exercise 5:
Using the expense_df dataframe, use the groupby function to sum the expense amount by income range.
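The grouping pattern can be sketched on toy data as follows. The column names 'income_range' and 'expense' are assumptions for illustration; the real column names in expense_df may differ, so check them first.

```python
import pandas as pd

# Toy data with hypothetical column names; not taken from income_expenses.xlsx.
demo_df = pd.DataFrame({
    'income_range': ['low', 'low', 'high', 'high'],
    'expense': [10, 20, 30, 45],
})

# Group rows by the category column, pick the numeric column, then sum per group.
totals = demo_df.groupby('income_range')['expense'].sum()
print(totals)
```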
Exercise 6:
Using the travel_df dataframe and pivot_table function, get the average total time by day of the week and direction traveled (Home/GSK).
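A minimal sketch of the pivot_table pattern on toy data is given below. The column names 'day', 'direction', and 'total_time' are hypothetical placeholders; substitute the actual column names from travel_df.

```python
import pandas as pd

# Toy data with invented column names; not taken from travel-times.csv.
demo_df = pd.DataFrame({
    'day': ['Mon', 'Mon', 'Mon', 'Tue'],
    'direction': ['Home', 'Home', 'GSK', 'GSK'],
    'total_time': [30.0, 40.0, 50.0, 60.0],
})

# Rows come from 'day', columns from 'direction', and each cell is the mean
# of 'total_time' for that combination.
avg_time = demo_df.pivot_table(
    values='total_time',
    index='day',
    columns='direction',
    aggfunc='mean',
)
print(avg_time)
```

Combinations with no rows (here, Tuesday trips going Home) appear as NaN in the result.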
Exercise 7:
Choose either the travel_df or the expense_df dataframe and do some exploratory analysis.
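A few common starting points for exploration are sketched here on a toy dataframe. demo_df and its columns 'group' and 'value' are hypothetical; the same calls can be applied to either exercise dataframe.

```python
import pandas as pd

# Toy dataframe with invented columns, used only to show the exploration calls.
demo_df = pd.DataFrame({
    'group': ['x', 'y', 'x', 'x'],
    'value': [1.0, 2.0, 3.0, 6.0],
})

# First rows of the data.
print(demo_df.head())

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = demo_df.describe()
print(summary)

# How often each category appears.
counts = demo_df['group'].value_counts()
print(counts)
```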