Module 2 - Practice - Explore Data

Back to the course outline

Exercise 0:

To complete the Module 2 exercises, download and import the required datasets by following these instructions:

After you've extracted the dataset files you may delete the zip file.

  • 3. Drag and drop each data file you need (see file names listed below), one file at a time, into this coding notebook just before the code cell that requires it.

Nextjournal Skills

  • How to create a new Python code cell

  • How to create a new Markdown text cell

  • How to import files into a Nextjournal document

  • How to reference an imported file from a Python code cell

Note: If for any reason you have trouble loading in data files, please see this page for more help: https://nextjournal.com/help/uploads-and-results

Exercise 1:

Use the pandas library to read in the file "travel-times.csv" as a dataframe. Set the dataframe's variable name as "travel_df".
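As a rough sketch of the reading step: the real "travel-times.csv" is not available here, so the example below reads a few made-up rows from an in-memory buffer. The column names (`DayOfWeek`, `GoingTo`, `TotalTime`) are assumptions for illustration and may not match the course file. In the notebook, you would pass the imported file's path or reference to `pd.read_csv` instead of the buffer.

```python
import io

import pandas as pd

# Hypothetical stand-in for "travel-times.csv"; the real file's columns may differ.
sample_csv = io.StringIO(
    "DayOfWeek,GoingTo,TotalTime\n"
    "Monday,GSK,39.3\n"
    "Monday,Home,36.8\n"
    "Tuesday,GSK,41.0\n"
)

# pd.read_csv accepts a file path or any file-like object.
travel_df = pd.read_csv(sample_csv)
print(travel_df.head())
```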

Exercise 2:

Use the pandas library to read in the file "income_expenses.xlsx" as a dataframe. Set the dataframe's variable name as "expense_df".
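A possible sketch of the Excel reading step, assuming an Excel engine such as openpyxl is installed. Since the real "income_expenses.xlsx" is not available here, the example first writes a small made-up workbook to a temporary folder and then reads it back. The column names (`income_range`, `expense_amount`) are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data; the real "income_expenses.xlsx" may look different.
sample = pd.DataFrame({
    "income_range": ["low", "mid", "high"],
    "expense_amount": [120.0, 340.5, 910.0],
})

expense_df = None
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "income_expenses.xlsx")
    try:
        sample.to_excel(path, index=False)  # write a small workbook to read back
        expense_df = pd.read_excel(path)    # reads the first sheet by default
    except ImportError:
        # pandas needs an Excel engine such as openpyxl for .xlsx files.
        print("Install openpyxl to read and write .xlsx files.")

print(expense_df)
```

In the notebook, only the `pd.read_excel` call is needed, pointed at the imported file.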

Exercise 3:

Using the lists in the cell below, write code that zips the lists together into a single list of pairs, then turns that list into a dataframe. Next, export the dataframe as a csv file. Then try exporting the dataframe as an Excel file.

names = ['Nike','Adidas','New Balance','Puma','Reebok']
grades = [176,59,47,38,99]
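One way to approach this, as a sketch rather than the expected answer: the two lists are repeated so the block runs on its own. The column labels (`name`, `grade`), the output file names, and the temporary output folder are assumptions. The Excel export is wrapped in a `try` because it requires an engine such as openpyxl.

```python
import os
import tempfile

import pandas as pd

names = ['Nike', 'Adidas', 'New Balance', 'Puma', 'Reebok']
grades = [176, 59, 47, 38, 99]

# zip pairs the lists element by element; list() materializes the pairs.
rows = list(zip(names, grades))
brands_df = pd.DataFrame(rows, columns=["name", "grade"])  # assumed column labels

with tempfile.TemporaryDirectory() as out_dir:
    # Export to CSV without the row index, then read it back as a check.
    csv_path = os.path.join(out_dir, "brands.csv")
    brands_df.to_csv(csv_path, index=False)
    csv_round_trip = pd.read_csv(csv_path)

    try:
        brands_df.to_excel(os.path.join(out_dir, "brands.xlsx"), index=False)
    except ImportError:
        print("Excel export needs an engine such as openpyxl.")

print(brands_df)
```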

Exercise 4:

What columns are in the travel_df dataframe? What columns are in the expense_df dataframe?
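A small sketch of how column names can be listed. The dataframe here is a hypothetical stand-in with assumed column names; in the notebook, apply the same attribute to `travel_df` and `expense_df`.

```python
import pandas as pd

# Hypothetical stand-in; substitute the dataframe you loaded earlier.
demo_df = pd.DataFrame({
    "DayOfWeek": ["Monday"],
    "GoingTo": ["GSK"],
    "TotalTime": [39.3],
})

# .columns is an Index object; tolist() converts it to a plain Python list.
column_names = demo_df.columns.tolist()
print(column_names)
```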

Exercise 5:

Using the expense_df dataframe, sum the expense amount by income range using the groupby function.
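A minimal sketch of a grouped sum. Note that pandas spells the method `groupby`, with no underscore. The data and column names (`income_range`, `expense_amount`) below are made up and may not match the columns in the real expense_df.

```python
import pandas as pd

# Hypothetical stand-in for expense_df.
expense_demo = pd.DataFrame({
    "income_range": ["low", "low", "mid", "high", "mid"],
    "expense_amount": [100.0, 50.0, 200.0, 400.0, 25.0],
})

# Group rows by income range, then sum the expense column within each group.
expense_by_range = expense_demo.groupby("income_range")["expense_amount"].sum()
print(expense_by_range)
```

Selecting the column before calling `sum()` keeps the result to the one measure of interest instead of summing every numeric column.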

Exercise 6:

Using the travel_df dataframe and pivot_table function, get the average total time by day of the week and direction traveled (Home/GSK).
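A sketch of the pivot, under the assumption that the day, direction, and total time live in columns named `DayOfWeek`, `GoingTo`, and `TotalTime`. These names and the sample rows are hypothetical; check the actual column names in travel_df first.

```python
import pandas as pd

# Hypothetical stand-in for travel_df.
travel_demo = pd.DataFrame({
    "DayOfWeek": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday"],
    "GoingTo": ["GSK", "GSK", "Home", "GSK", "Home"],
    "TotalTime": [40.0, 44.0, 36.0, 41.0, 35.0],
})

# Rows are days, columns are directions, cells are the mean total time.
avg_time = travel_demo.pivot_table(
    values="TotalTime",
    index="DayOfWeek",
    columns="GoingTo",
    aggfunc="mean",
)
print(avg_time)
```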

Exercise 7:

Choose either the travel_df or expense_df and do some exploratory analysis.
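A few possible starting points for exploration, shown on a made-up stand-in dataframe with assumed column names. These are only suggestions; the same calls can be applied to whichever dataframe you choose.

```python
import pandas as pd

# Hypothetical stand-in; replace with travel_df or expense_df.
travel_demo = pd.DataFrame({
    "DayOfWeek": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday"],
    "GoingTo": ["GSK", "GSK", "Home", "GSK", "Home"],
    "TotalTime": [40.0, 44.0, 36.0, 41.0, 35.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) for one column.
summary = travel_demo["TotalTime"].describe()

# How many rows fall under each day of the week.
trips_per_day = travel_demo["DayOfWeek"].value_counts()

# Number of missing values in each column.
missing_per_column = travel_demo.isna().sum()

print(summary)
print(trips_per_day)
print(missing_per_column)
```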

