Master [Python]
What's Python?
The programming language Python was created in 1989 by Guido van Rossum in the Netherlands. The name Python comes from a tribute to the TV series Monty Python's Flying Circus, of which G. van Rossum is a fan.
The latest version of Python is version 3. Version 2 of Python is now obsolete and has not been maintained since 1 January 2020. If possible, avoid using it.
This programming language has many interesting features:
It is cross-platform: it runs on many operating systems, including Windows, macOS, Linux and Android.
It is free of charge. You can install it on as many computers as you want (even on your phone!).
It is a high-level language. It requires relatively little knowledge about how a computer works to be used.
It is an interpreted language. A Python script does not need to be compiled to run, unlike languages like C or C++.
It is object-oriented. That is, it is possible to design entities in Python that mimic real-world entities (a cell, a protein, an atom, etc.) with a number of rules of operation and interaction.
It is relatively simple to use.
Finally, it is widely used in bioinformatics and more generally in data analysis.
All these characteristics mean that Python is now taught in many courses, from secondary to higher education.
1. Data types, data structures and indexing
1.1 Basics
Object assignment, functions, how to comment and get help
A variable is an area of computer memory in which a value is stored. For the programmer, this variable is defined by a name, whereas for the computer it is actually an address, i.e. a particular area of memory.
In Python, the declaration of a variable and its initialization (i.e. the first value that will be stored in it) are done at the same time. To convince yourself of this, test the following instructions:
x = 2
x
On the first line, we declared and initialized the variable x with the value 2. Note that, in reality, several things happened:
Python "guessed" that the variable was an integer. It is said that Python is a dynamically typed language.
Python has allocated (reserved) memory space to accommodate an integer. Each type of variable takes up more or less memory space. Python has also made it possible to retrieve the variable under the name x.
Finally, Python has assigned the value 2 to variable x.
Note that the assignment operator = works in one direction only. For example, the statement x = 2 means that the variable on the left (here, x) is assigned the value to the right of the = operator (here, 2).
x + 2
x * 3 # We can use Python as a calculator!
max([1, 2, 3]) # Built-in functions can help execute things; we usually have to provide arguments
help(max) # Don't know how a function works? Ask for help!
1.2 Data types
The type of a variable corresponds to the nature of the variable. The three main types that we will need first are integers (int), decimal numbers that we will call floats and strings (str). Of course, there are many other types (for example, booleans, complex numbers, etc.).
Numeric: integer and float
The four basic arithmetic operations are done in a simple way on numeric types (integers and floats):
my_int = 1
my_float = 2.34
my_int + my_float
Note, however, that if you mix integer and float types, the result is returned as a float (because this type is more general).
The operator / performs a division. Unlike the operators +, - and *, it always returns a float, even when both operands are integers.
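As a quick check of the two rules above, here is a minimal sketch. The variable names are ours, and the floor-division operator // is not discussed in the text; it is included only for comparison.

```python
# Mixing an int and a float promotes the result to float.
mixed_sum = 1 + 2.34
print(type(mixed_sum))      # <class 'float'>

# +, - and * keep integers as integers...
int_product = 4 * 2
print(type(int_product))    # <class 'int'>

# ...but / always returns a float, even when the division is exact.
true_division = 4 / 2
print(true_division)        # 2.0

# Floor division // (added for comparison) keeps two ints as an int.
floor_division = 7 // 2
print(floor_division)       # 3
```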
String
For strings, two operations are possible, addition and multiplication:
my_str = "Hello World!"
my_str
my_str + " Python"
my_str * 3
The addition operator + concatenates (assembles) two strings.
The multiplication operator * between an integer and a character string duplicates (repeats) a character string several times.
Boolean
In programming you often need to know if an expression is True or False.
You can evaluate any expression in Python, and get one of two answers, True or False.
When you compare two values, the expression is evaluated and Python returns the Boolean answer:
is_french = False
is_english = True
is_french == is_english
Not sure?
If you can't remember the type of a variable, use the type() function, which returns it:
print (type(my_int))
print (type(my_float))
print (type(is_english))
1.3 Collections
Lists
Lists are ordered collections of changeable elements
A list is a data structure that contains a series of values. Python allows the construction of lists containing values of different types (for example, integer and string), which gives them a great deal of flexibility. A list is declared by a series of values (don't forget quotation marks, single or double, if they are strings) separated by commas, and enclosed in square brackets. Here is an example:
my_list = ['Python', 3.7, 5]
Can we add an element to a list?
my_list = my_list + [True]
my_list
Like strings, lists support the + operator for concatenation, as well as the * operator for duplication.
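The two operators can be tried on a small list. This is only a sketch with made-up names; the append() method is not mentioned above and is shown here as the usual way to add a single element in place.

```python
languages = ['Python', 'R']

# + builds a new list from two lists (concatenation).
more_languages = languages + ['Julia']
print(more_languages)        # ['Python', 'R', 'Julia']

# * repeats the content of a list (duplication).
repeated = languages * 2
print(repeated)              # ['Python', 'R', 'Python', 'R']

# append() adds a single element to the existing list.
languages.append('Julia')
print(languages)             # ['Python', 'R', 'Julia']
```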
List inception: list of lists
Finally, you should know that it is quite possible to build lists of lists. This feature can sometimes be very practical. For example:
my_mega_list = [['Python', 'R', 'Julia'], [3.7, 3.6, 1.3]]
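Lists of lists also support the concatenation operator +. Compare the two statements below: the first adds three separate elements, the second adds a single sub-list.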
my_mega_list + [True, False, False]
my_mega_list + [[True, False, False, False]]
Dictionary
Dictionaries are collections of changeable elements indexed by keys
Dictionaries come in very handy when you have to describe complex structures and lists reach their limits. Unlike lists, dictionaries have no positional index: values are accessed by keys. (Since Python 3.7, dictionaries do preserve the order in which keys were inserted, but you still cannot access an item by its position.)
my_dict = {'Python' : ['list', 'dict', 'df'], 'R' : ['array', 'matrix', 'df']}
A dictionary is defined with braces {} (just as a list is defined with []). Here, we fill the dictionary with two keys ("Python", "R"), to which we assign values (['list', 'dict', 'df'], ['array', 'matrix', 'df']). You can put as many keys as you want in a dictionary (just as you can add as many elements as you want to a list).
my_dict.keys() # Dictionaries are made of keys...
my_dict.values() # ... and values
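A few other common dictionary operations, as a minimal sketch. The dictionary name `structures` is ours, and get(), items() and the `in` operator are not covered in the text above.

```python
structures = {'Python': ['list', 'dict', 'df']}

# Assigning to a new key adds an entry to the dictionary.
structures['R'] = ['array', 'matrix', 'df']

# The `in` operator tests whether a key is present.
print('R' in structures)        # True
print('Julia' in structures)    # False

# items() yields (key, value) pairs.
for language, types in structures.items():
    print(language, '->', types)

# get() returns a default value instead of raising KeyError for a missing key.
missing = structures.get('Julia', [])
print(missing)                  # []
```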
DataFrame
DataFrames are a data structure containing labeled axes
Pandas is a popular Python package for data science, and with good reason: it offers powerful, expressive and flexible data structures that make data manipulation and analysis easy, among many other things. The DataFrame is one of these structures.
import pandas as pd # DataFrames are not a built-in Python type; we must import a library to use them
Obviously, making your DataFrames is your first step in almost anything that you want to do when it comes to data munging in Python. Sometimes, you will want to start from scratch, but you can also convert other data structures, such as lists of lists:
# A DataFrame can be built from a list of lists
pd.DataFrame(my_mega_list, index=['Language', 'Version'], columns=range(1, 4))
| | 1 | 2 | 3 |
|---|---|---|---|
| Language | Python | R | Julia |
| Version | 3.7 | 3.6 | 1.3 |
# A DataFrame can be built from a dictionary
my_df = pd.DataFrame(my_dict)
my_df
| | Python | R |
|---|---|---|
| 0 | list | array |
| 1 | dict | matrix |
| 2 | df | df |
1.4 Dimensions
The len() function lets you know the length of a list, i.e. the number of items in the list.
len(my_list)
For dictionaries, this function will give you the number of keys:
len(my_dict)
After you have created your DataFrame, you might want to know a little more about it. The shape property gives you the dimensions of your DataFrame, i.e. its height (number of rows) and its width (number of columns):
my_df.shape # (rows, columns)
1.5 Indexing
Accessing elements of a list
One of the big advantages of a list is that you can access its items by their position. This number is called the index of the item.
Pay close attention to the fact that the indices of a list of n items start at 0 and end at n-1.
my_list[0]
The list can also be indexed with negative numbers according to the following pattern:
my_list = ['Python', 3.7, 5]
# positive index:  0   1   2
# negative index: -3  -2  -1
Negative indices are counted from the end. Their main advantage is that you can access the last item in a list using index -1 without knowing the length of the list.
my_list[-1]
my_mega_list[0] # To access an item in the list, use the usual index
my_mega_list[0][2] # To access an item in a sub-list, use two indices
Looking up values for a specific key
To retrieve the value associated with a given key, use the syntax dictionary[key]:
my_dict['Python']
Getting an entire column of a DF
my_df['Python']
| | Python |
|---|---|
| 0 | list |
| 1 | dict |
| 2 | df |
Int and Label indexing of DF
.iloc[] works on positions in your index. This means that if you write iloc[2], you look for the values of your DataFrame that are at position 2 (the third row).
# integer indexing [row, col]
my_df.iloc[0, 1]
.loc[] works on the labels of your index. This means that if you write loc[2], you look for the values of your DataFrame whose index is labeled 2.
# label indexing [row, col]
my_df.loc[1, 'R']
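With the default index (0, 1, 2, ...), positions and labels coincide, which hides the difference between the two accessors. The sketch below uses a hypothetical DataFrame, `scores`, whose row labels are deliberately out of order to make the difference visible.

```python
import pandas as pd

# Hypothetical DataFrame whose row labels are 2, 1, 0 (not 0, 1, 2).
scores = pd.DataFrame({'score': [10, 20, 30]}, index=[2, 1, 0])

# .iloc uses positions: position 0 is the first row.
by_position = scores.iloc[0, 0]
print(by_position)    # 10

# .loc uses labels: the row labeled 0 is the last row here.
by_label = scores.loc[0, 'score']
print(by_label)       # 30
```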
1.6 Slicing
Another advantage of lists is the possibility of selecting part of a list with an index of the form [m:n+1], which retrieves all the elements from element m (included) to element n+1 (excluded). We then say that we retrieve a slice of the list.
my_nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]
my_nums[2:6]
Note that when no index is given to the left or to the right of the colon, Python takes all elements from the beginning or all elements to the end, respectively.
my_nums[:6]
my_nums[6:]
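Slices also accept negative indices and an optional step, two variations not covered above. A minimal sketch, with a list name of our own:

```python
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# A slice returns a new list: items at index 2, 3, 4 and 5.
middle = digits[2:6]
print(middle)           # [3, 4, 5, 6]

# Negative indices work in slices too: the last three items.
last_three = digits[-3:]
print(last_three)       # [7, 8, 9]

# An optional third value is the step: every second item.
every_second = digits[::2]
print(every_second)     # [1, 3, 5, 7, 9]
```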
Tuples correspond to lists with the difference that they are not modifiable. They use parentheses instead of square brackets.
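A short sketch of what "not modifiable" means in practice. The names are ours; the last lines show that zip() produces tuples, one per pair of items.

```python
point = (3.7, 5)

# Tuples are indexed like lists.
print(point[0])          # 3.7

# Trying to modify an element raises a TypeError.
try:
    point[0] = 1.0
except TypeError as error:
    modification_failed = True
    print('Cannot modify a tuple:', error)
else:
    modification_failed = False

# zip() pairs the items of two lists; each pair is a tuple.
pairs = list(zip(['a', 'B'], [False, True]))
print(pairs)             # [('a', False), ('B', True)]
```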
# zip() pairs the two lists item by item; my_tuple is a list of (letter, is_capital) tuples
my_tuple = list(zip(['a', 'B', 'C', 'd', 'E', 'f'], [False, True, True, False, True, False]))
my_alpha = pd.DataFrame(my_tuple, columns=['Letter', 'isCap'], index=range(1, 7))
my_alpha
| | Letter | isCap |
|---|---|---|
| 1 | a | False |
| 2 | B | True |
| 3 | C | True |
| 4 | d | False |
| 5 | E | True |
| 6 | f | False |
### PRO TIP
# Select rows with capital letters only
my_alpha.loc[my_alpha.isCap == True, ]
| | Letter | isCap |
|---|---|---|
| 2 | B | True |
| 3 | C | True |
| 5 | E | True |
2. Files
Absolute path
C:/Users/RonBumblefootThal/Documents/pythonFolder/MyFirstProject/Draft/IDon'tKnowWhatI'mDoing/etc.py
Relative path (interpreted from the current working directory)
./I_love_my_project/CoolCode.py
2.1 Working with directories
The os module provides functions for interacting with the operating system. It is part of Python's standard library and offers a portable way of using operating-system-dependent functionality.
import os
os.listdir() returns the names of the files and directories in the current directory (or in the directory given as argument).
os.listdir()
os.chdir() changes the current working directory to the given path.
os.chdir("data-trek-2020")
os.listdir()
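When you move around with chdir(), it helps to know where you are. The sketch below uses os.getcwd(), which is not introduced above, and moves to the parent directory '..' rather than to a project folder so that it runs anywhere.

```python
import os

# Remember where we start so that we can come back.
start_dir = os.getcwd()
print('Current directory:', start_dir)

# List the content of the starting directory.
entries = os.listdir(start_dir)
print(entries)

# Move to the parent directory ('..') and check where we are.
os.chdir('..')
print('Now in:', os.getcwd())

# Return to the starting directory.
os.chdir(start_dir)
```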
2.2 Save/write files
DF example
soa_tour = pd.DataFrame()
soa_tour['country'] = ['USA', 'UK', 'FRA', 'GER', 'BRA']
soa_tour['frequency'] = [39, 6, 6, 5, 3]
soa_tour['continents'] = ['north_america', 'europe', 'europe', 'europe', 'south_america']
soa_tour
| | country | frequency | continents |
|---|---|---|---|
| 0 | USA | 39 | north_america |
| 1 | UK | 6 | europe |
| 2 | FRA | 6 | europe |
| 3 | GER | 5 | europe |
| 4 | BRA | 3 | south_america |
Write object to a comma-separated values (csv) file.
header: Write out the column names.
# df_name.to_csv(path, header)
soa_tour.to_csv('data/clean/soa_tour_python.csv', header=True)
os.listdir('data/clean')
2.3 Load/read files
From your PC
The addition operator + concatenates (assembles) two strings.
path = 'data/clean/'
filename = 'soa_tour_python.csv'
path + filename
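Concatenation with + works only if the folder name ends with a separator. As an alternative (not used in the original), os.path.join adds the separator where needed. A minimal sketch, where the variable `folder` is ours:

```python
import os

folder = 'data/clean'              # note: no trailing '/'
filename = 'soa_tour_python.csv'

# os.path.join inserts the separator only where it is needed.
full_path = os.path.join(folder, filename)
print(full_path)

# Plain concatenation silently produces a wrong path here.
glued = folder + filename
print(glued)                       # data/cleansoa_tour_python.csv

# basename() extracts the file name from a path.
print(os.path.basename(full_path)) # soa_tour_python.csv
```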
read_csv: reads a comma-separated values (csv) file into a DataFrame.
soa_tour_fl = pd.read_csv(path+filename)
soa_tour_fl
| | Unnamed: 0 | country | frequency | continents |
|---|---|---|---|---|
| 0 | 0 | USA | 39 | north_america |
| 1 | 1 | UK | 6 | europe |
| 2 | 2 | FRA | 6 | europe |
| 3 | 3 | GER | 5 | europe |
| 4 | 4 | BRA | 3 | south_america |
index_col: column(s) to use as the row labels of the DataFrame, given either as a string name or as a column index.
soa_tour_fl = pd.read_csv(path+filename, index_col=0)
soa_tour_fl
| | country | frequency | continents |
|---|---|---|---|
| 0 | USA | 39 | north_america |
| 1 | UK | 6 | europe |
| 2 | FRA | 6 | europe |
| 3 | GER | 5 | europe |
| 4 | BRA | 3 | south_america |
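The extra "Unnamed: 0" column appears because to_csv also writes the row labels by default. Another option, sketched below, is to leave them out when writing with index=False. The small DataFrame `tour` and the in-memory buffer (standing in for a file on disk) are assumptions made to keep the example self-contained.

```python
import io
import pandas as pd

tour = pd.DataFrame({'country': ['USA', 'UK'], 'frequency': [39, 6]})

# index=False leaves the row labels out of the csv output.
buffer = io.StringIO()          # in-memory stand-in for a file on disk
tour.to_csv(buffer, index=False)
print(buffer.getvalue())

# Reading it back gives no 'Unnamed: 0' column.
buffer.seek(0)
tour_reloaded = pd.read_csv(buffer)
print(list(tour_reloaded.columns))   # ['country', 'frequency']
```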
From URL
url = "http://sciencecomputing.io/data/metabolicrates.csv"
metabolic_rates = pd.read_csv(url)
metabolic_rates.head() #This function returns the first n rows for the object based on position, default n=5
| | Class | Order | Family | Genus | Species | Study | M (kg) | FMR (kJ / day) |
|---|---|---|---|---|---|---|---|---|
| 0 | Mammalia | Carnivora | Odobenidae | Odobenus | rosmarus | Acquarone et al 2006 | 1370.0 | 345000.0 |
| 1 | Mammalia | Carnivora | Odobenidae | Odobenus | rosmarus | Acquarone et al 2006 | 1250.0 | 417400.0 |
| 2 | Aves | Procellariiformes | Diomedeidae | Diomedea | exulans | Adams et al 1986 | 7.4 | 3100.0 |
| 3 | Aves | Procellariiformes | Diomedeidae | Diomedea | exulans | Adams et al 1986 | 6.95 | 2898.0 |
| 4 | Aves | Procellariiformes | Diomedeidae | Diomedea | exulans | Adams et al 1986 | 8.9 | 3528.0 |
Play around a little bit with this big DF... what can you do?
3. Control Flow
You already apply control flow when you decide how to go to work during winter. Simply put, control flow determines the order in which the code is executed or evaluated. It allows defining different actions for different conditions.
For example:
You take the metro if it's snowing
You take the metro if it's cold
You walk in all other cases
Let's code it now ;)
3.1 Conditional evaluation
Simple if statement
Structure:
if condition :
    do something
Example:
weather = 'snowing'
if weather == 'snowing' :
    print ('Take the metro')
weather = 'blue sky'
if weather == 'snowing' :
    print ('Take the metro')
If/elif/else statement
Situations are not always limited to two conditions (or two possible results). You may encounter situations where multiple conditions and their associated results need to be considered.
Structure:
if condition1 :
    do something
elif condition2 :
    do another thing
else :
    do something else
Example:
if weather == 'snowing' :
    print ('Take the metro')
elif weather == 'cold' :
    print ('Take the metro')
else :
    print ("Let's walk!")
Nested statements
You can even test conditions after a first condition was tested. Don't forget the indentation!
Structure:
if condition1 :
    do something
else :
    if conditionA :
        do another thing
    else :
        do something else
Example:
temp = 1
if weather == 'snowing' :
    print ('Take the metro')
elif weather == 'cold' :
    if temp > -5 :
        print ("Let's walk!")
    else :
        print ('Take the metro')
else :
    print ("Let's walk!")
Multiple conditions: AND & OR
Using logical operators (and, or) is possible in the if statement. You can easily test the conditions specific to the outcome that you want to produce or the problem you are looking at. It avoids having too many nested statements which would not be too easy to read.
if weather == 'snowing' and temp < -5 :
    print ('Take the metro')
elif weather == 'cold' or temp < -5 :
    print ('Take the metro')
else :
    print ("Let's walk!")
3.2 For loops
Using a for loop, you could plan your outings for a few days based on the forecast. A for loop is used to iterate over a sequence (such as a list of values) and to repeat code for each item.
What we are now familiar with:
weather = 'snowing'
temp = -15
What if we want to take into account the forecast for the next five days?
forecast = ['snowing', 'cloudy', 'snowing', 'clear', 'rainy']
temps = [-15, 3, -2, -23, 11]
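Can we still use our if statement? Not directly: forecast and temps are now lists, so the comparisons below no longer test one day at a time (comparing the list temps with a number even raises a TypeError).
if forecast == 'snowing' and temps < -5 :
    print ('Take the metro')
elif forecast == 'cold' or temps < -5 :
    print ('Take the metro')
else :
    print ("Let's walk!")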
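To see what goes wrong when a whole list is compared with a single value, the sketch below isolates the two comparisons. Catching the error with try/except is our addition, so that the example runs to the end.

```python
forecast = ['snowing', 'cloudy', 'snowing', 'clear', 'rainy']
temps = [-15, 3, -2, -23, 11]

# A list is never equal to a string: this is simply False.
whole_list_equal = (forecast == 'snowing')
print(whole_list_equal)     # False

# Comparing a list with a number using < is not defined in Python 3.
try:
    temps < -5
except TypeError as error:
    comparison_failed = True
    print('TypeError:', error)
else:
    comparison_failed = False

# Comparing one element at a time works as expected.
first_day_is_cold = temps[0] < -5
print(first_day_is_cold)    # True
```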
Iterations
Structure:
for i in [] :
    do something for each i
Examples:
for t in temps :
    print ('It is', t, 'C outside')
# remember the len function? and how to access elements of a list?
for t in range(0, len(temps)) :
    print ('It is', temps[t], 'C outside')
If statement inside for loop
Structure:
for i in [] :
    if condition on i :
        do something
    else :
        do something else
Example:
# don't forget to correctly INDENT
for i in range(0, len(temps)) :
    t = temps[i]
    w = forecast[i]
    if w == 'snowing' and t < -5 :
        print ('Take the metro')
    elif w == 'cold' or t < -5 :
        print ('Take the metro')
    else :
        print ("Let's walk!")
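An alternative to looping over indices, not used in the original, is to walk through both lists at once with zip(). The sketch below keeps the same conditions as the example above but collects the decisions in a list (`decisions`, our name) instead of printing them directly.

```python
forecast = ['snowing', 'cloudy', 'snowing', 'clear', 'rainy']
temps = [-15, 3, -2, -23, 11]

decisions = []
# zip() yields one (weather, temperature) pair per day.
for w, t in zip(forecast, temps):
    if w == 'snowing' and t < -5:
        decisions.append('Take the metro')
    elif w == 'cold' or t < -5:
        decisions.append('Take the metro')
    else:
        decisions.append("Let's walk!")

# enumerate() adds a counter when the position is also needed.
for day, decision in enumerate(decisions, start=1):
    print('Day', day, ':', decision)
```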
3.3 Extras
Some comparison and logical operators
print (2 < 3)
print (2 > 3)
print (2 <= 2, 1 <= 2)
print (4 >= 4, 4 >= 3)
print (1 == 1)
print (1 != 1, 1 != 2)
print (not True, not False)
print (True or False, True | False)
print (True and False, True and True, False & True, False & False)
4. Functions
A function is a block of code that can be reused and that runs only when it is called. You can pass arguments to it (or not), and it can return results (or not).
4.1 Syntax and arguments
Basic syntax
def function_name(args) :
    result = do something with args
    return result
def temp_difference(temp1, temp2) : # name and arguments
    result = temp2 - temp1 # what the function does
    return result # what the function returns
# apply function on values
temp_difference(-15, -5)
# apply function on variables
temperature = [-15, -23]
temp_difference(temperature[0], temperature[1])
Using built-in functions
def abs_temp_difference(temp1, temp2) :
    result = temp1 - temp2
    abs_result = abs(result)
    return abs_result
abs_temp_difference(temperature[0], temperature[1])
4.2 Scope
Variables can exist in a global or a local scope.
Remember, the abs_temp_difference function returns the value of the abs_result variable.
# what will 'abs_result' return?
abs_result
Here is a second example:
# global variables
trees = 4
squirrels = 10
def count_living_things() :
    birds = 5 # local variable
    squirrels = 20
    return [birds, squirrels, trees]
# global and local variables returned: what is what?
count_living_things()
# which variable does not exist in the global scope?
birds
# what is the value of 'squirrels'?
squirrels
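The behaviour can be checked with a reduced, self-contained version of the example. The function `count_squirrels` is ours, and the try/except is added only so that looking up the undefined name does not stop the script.

```python
squirrels = 10                  # global variable

def count_squirrels():
    squirrels = 20              # local variable, hides the global one
    birds = 5                   # local variable, exists only inside the function
    return squirrels

inside = count_squirrels()
print(inside)                   # 20
print(squirrels)                # 10, the global value is untouched

# 'birds' was never defined outside the function.
try:
    birds
except NameError:
    birds_is_visible = False
else:
    birds_is_visible = True
print(birds_is_visible)         # False
```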
4.3 Integration
Combining functions and control flow
Let's go back to our forecast example.
Here's the forecast for an entire week:
# week forecast
weather_week = ['snowing', 'cloudy', 'snowing', 'clear', 'rainy']
temp_week = [-15, 3, -2, -23, 11]
# weekend forecast
weather_weekend = ['snowing', 'rainy']
temp_weekend = [-3, 2]
Let's build a function that will analyze either the week or the weekend forecast
# Pseudocode
def choose_transportation(weather, temperature) :
    for each day :
        if snowing or cold :
            'Take the metro'
        else :
            'Walk'
# Actual code
def choose_transportation(weather, temperature) :
    for i in range(0, len(weather)) :
        w = weather[i]
        t = temperature[i]
        if w == 'snowing' or t < -10 :
            print ('Take the metro')
        else :
            print ('Just walk')
# week transportation
choose_transportation(weather_week, temp_week)
# weekend transportation
choose_transportation(weather_weekend, temp_weekend)
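The function above prints its decisions, so they cannot be reused afterwards. A possible variant, sketched under the hypothetical name `plan_transportation`, returns them as a list instead, with the same rule (snowing, or colder than -10).

```python
def plan_transportation(weather, temperature):
    """Return one decision per day instead of printing it."""
    decisions = []
    for w, t in zip(weather, temperature):
        if w == 'snowing' or t < -10:
            decisions.append('Take the metro')
        else:
            decisions.append('Just walk')
    return decisions

# The returned list can be stored and processed further.
weekend_plan = plan_transportation(['snowing', 'rainy'], [-3, 2])
print(weekend_plan)                            # ['Take the metro', 'Just walk']
print(weekend_plan.count('Take the metro'))    # 1
```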
4.4 Exercise: Animal Metabolism
This exercise will help you integrate the following:
Files handling
Control flow
Function
Write a function that reads a file if it exists and downloads it if it does not.
# Pseudocode
function get_data(fn, url)
    if fn exists
        open it
    else
        download from url
# remember the os library...
path = 'data/clean/'
fn1 = 'soa_tour_python.csv'
fn2 = 'this_file_does_not_exist.csv'
print (os.path.isfile(path+fn1))
print (os.path.isfile(path+fn2))
def get_data(fn, url) :
    # read the local copy if it exists in 'path' (defined above), otherwise read from the url
    if os.path.isfile(path+fn) :
        df = pd.read_csv(path+fn)
    else :
        df = pd.read_csv(url)
    return df
filename = 'metabolic_rates.csv'
url = "http://sciencecomputing.io/data/metabolicrates.csv"
get_data(filename, url)
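Note that get_data reads from the URL but does not keep a local copy, so the file is fetched again on every call. One possible extension, sketched here with the standard library only, saves the file the first time. The function name `fetch_if_missing` and the placeholder URL are hypothetical, and the demonstration uses a temporary file that already exists, so no download takes place.

```python
import os
import tempfile
import urllib.request

def fetch_if_missing(local_path, url):
    """Return local_path, downloading url to it first if the file is absent."""
    if not os.path.isfile(local_path):
        folder = os.path.dirname(local_path)
        if folder:
            os.makedirs(folder, exist_ok=True)   # create missing folders
        urllib.request.urlretrieve(url, local_path)
    return local_path

# Demonstration with a file that already exists: the URL is never contacted.
with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, 'metabolic_rates.csv')
    with open(existing, 'w') as handle:
        handle.write('Class,Order\n')
    result = fetch_if_missing(existing, 'http://example.invalid/unused.csv')
    print(result == existing)    # True
```

The returned path can then be passed to pd.read_csv as in the rest of this section.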