Master [Python]


What's Python?

The programming language Python was created in 1989 by Guido van Rossum in the Netherlands. The name Python comes from a tribute to the TV series Monty Python's Flying Circus, of which G. van Rossum is a fan.

The latest version of Python is version 3. Version 2 of Python is now obsolete and has not been maintained since 1 January 2020. If possible, avoid using it.

This programming language has many interesting features:

  • It is cross-platform: it runs on many operating systems, including Windows, macOS, Linux and Android.

  • It is free of charge. You can install it on as many computers as you want (even on your phone!).

  • It is a high-level language. It requires relatively little knowledge about how a computer works to be used.

  • It is an interpreted language. A Python script does not need to be compiled to run, unlike languages like C or C++.

  • It is object-oriented. That is, it is possible to design entities in Python that mimic real-world entities (a cell, a protein, an atom, etc.) with a number of rules of operation and interaction.

  • It is relatively simple to use.

  • Finally, it is widely used in bioinformatics and more generally in data analysis.

All these characteristics mean that Python is now taught in many courses, from secondary to higher education.

1. Data types, data structures and indexing

1.1 Basics

Object assignment, functions, how to comment and get help


A variable is an area of computer memory in which a value is stored. For the programmer, this variable is defined by a name, whereas for the computer it is actually an address, i.e. a particular area of memory.

In Python, the declaration of a variable and its initialization (i.e. the first value that will be stored in it) are done at the same time. To convince yourself of this, test the following instructions:

x = 2
x
2

In the first line of this example, we declared and initialized the variable x with the value 2. Several things actually happened:

  • Python "guessed" that the variable was an integer. It is said that Python is a dynamically typed language.

  • Python has allocated (reserved) memory space to accommodate an integer. Each type of variable takes up more or less memory space. Python has also made it possible to retrieve the variable under the name x.

  • Finally, Python has assigned the value 2 to variable x.

Note that the assignment operator = works in one direction only: the variable on its left (here, x) receives the value on its right (here, 2).
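As a small illustration of dynamic typing, the sketch below reassigns one name to values of different types. The variable name value is hypothetical, and type() is the built-in function introduced later in this section.

```python
value = 2             # Python infers an integer
first_type = type(value)

value = 2.5           # the same name can later refer to a float ...
second_type = type(value)

value = "two"         # ... or to a string
third_type = type(value)

print(first_type, second_type, third_type)
```

The name is only a label: each assignment makes it point to a new object, and the type follows the object, not the name.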

x + 2
4
x * 3 # We can use Python as a calculator!
6
max([1, 2, 3]) # Built-in functions can help execute things; we usually have to provide arguments
3
help(max) # Don't know how a function works? Ask for help!

1.2 Data types

The type of a variable corresponds to the nature of the variable. The three main types that we will need first are integers (int), decimal numbers that we will call floats and strings (str). Of course, there are many other types (for example, booleans, complex numbers, etc.).

Numeric: integer and float

The four basic arithmetic operations are done in a simple way on numeric types (integers and floats):

my_int = 1
my_float = 2.34
my_int + my_float
3.34

Note, however, that if you mix integer and float types, the result is returned as a float (because this type is more general). 

The operator / performs a division. Unlike the operators +, - and *, it always returns a float, even when both operands are integers.
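The behaviour of / can be checked with a short sketch. The values are arbitrary, and the // operator (floor division) is an addition that the text above does not cover.

```python
# Arbitrary example values, chosen only for illustration
a = 6
b = 3

product = a * b           # int * int stays an int
quotient = a / b          # / returns a float, even when the result is whole
floor_quotient = a // b   # // (floor division) keeps integers as integers

print(product, type(product))
print(quotient, type(quotient))
print(floor_quotient, type(floor_quotient))
```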

String

For strings, two operations are possible: addition and multiplication.

my_str = "Hello World!"
my_str
my_str + " Python"
my_str * 3

The addition operator + concatenates (assembles) two strings.

The multiplication operator * between an integer and a character string duplicates (repeats) a character string several times.
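Here is a minimal sketch of both string operators, with hypothetical variable names. It also shows one point the text does not mention: a number must be converted with str() before it can be concatenated to a string.

```python
greeting = "Hello"
name = "Python"

sentence = greeting + " " + name + "!"   # + joins strings end to end
echo = "ab" * 3                          # * repeats a string

# Mixing a string and a number with + is an error, so convert the number first
version = 3
label = name + " " + str(version)

print(sentence)
print(echo)
print(label)
```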

Boolean

In programming you often need to know if an expression is True or False.

You can evaluate any expression in Python, and get one of two answers, True or False.

When you compare two values, the expression is evaluated and Python returns the Boolean answer:

is_french = False
is_english = True
is_french == is_english

Not sure?

If you can't remember the type of a variable, use the type() function, which will tell you.

print (type(my_int))
print (type(my_float))
print (type(is_english))

1.3 Collections

Lists

Lists are ordered collections of changeable elements

A list is a data structure that contains a series of values. Python allows the construction of lists containing values of different types (for example, integer and string), which gives them a great deal of flexibility. A list is declared by a series of values (don't forget quotation marks, single or double, if they are strings) separated by commas, and enclosed in square brackets. Here is an example:

my_list = ['Python', 3.7, 5]

Can we add an element to a list?

my_list = my_list + [True]
my_list
['Python', 3.7, 5, True]

Like strings, lists support the + operator for concatenation, as well as the * operator for duplication.
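The sketch below puts the two list operators side by side. It also uses append(), a standard list method that the text above does not introduce; the list names are hypothetical.

```python
languages = ['Python', 'R']

# + builds a new list; the original is unchanged until we reassign it
longer = languages + ['Julia']

# * repeats the elements
repeated = [0, 1] * 3

# append() adds one element to the existing list in place
languages.append('Julia')

print(longer)
print(repeated)
print(languages)
```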

List inception: list of lists

Finally, you should know that it is quite possible to build lists of lists. This feature can sometimes be very practical. For example:

my_mega_list = [['Python', 'R', 'Julia'], [3.7, 3.6, 1.3]]

Lists of lists also support the concatenation operator +. Compare adding three separate elements with adding a single sub-list:

my_mega_list + [True, False, False]
[['Python', 'R', 'Julia'], [3.7, 3.6, 1.3], True, False, False]
my_mega_list + [[True, False, False, False]]
[['Python', 'R', 'Julia'], [3.7, 3.6, 1.3], [True, False, False, False]]

Dictionary

Dictionaries are collections of changeable elements indexed by keys

Dictionaries come in very handy when you have to describe complex structures for which lists reach their limits. In a dictionary, values are not accessed by position (there is no numeric index) but by keys. Since Python 3.7, dictionaries keep their keys in insertion order, but you still cannot ask for "the element at position 0".

my_dict = {'Python' : ['list', 'dict', 'df'], 'R' : ['array', 'matrix', 'df']}

Here we define the dictionary in a single statement, between braces {} (a bare {} creates an empty dictionary, just as [] creates an empty list). Each key ("Python", "R") is assigned a value (['list', 'dict', 'df'], ['array', 'matrix', 'df']). You can put as many keys as you want in a dictionary (just as you can add as many elements as you want to a list).

my_dict.keys() # Dictionaries are made of keys...
dict_keys(['Python', 'R'])
my_dict.values() # ... and values
dict_values([['list', 'dict', 'df'], ['array', 'matrix', 'df']])
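A dictionary can also be built step by step. This is a minimal sketch, assuming a hypothetical dictionary named structures: it starts empty and is filled one key at a time.

```python
structures = {}  # start with an empty dictionary

# Assigning to a new key adds a key/value pair
structures['Python'] = ['list', 'dict', 'df']
structures['R'] = ['array', 'matrix', 'df']

# Assigning to an existing key replaces its value
structures['R'] = ['vector', 'matrix', 'df']

has_julia = 'Julia' in structures   # membership tests look at the keys

print(len(structures))
print(has_julia)
print(structures)
```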

DataFrame

A DataFrame is a two-dimensional data structure with labeled axes (rows and columns)

Pandas is a popular Python package for data science, and with good reason: it offers powerful, expressive and flexible data structures that make data manipulation and analysis easy, among many other things. The DataFrame is one of these structures.

import pandas as pd # DataFrames are not a built-in Python type; we must import the pandas library to use them

Creating a DataFrame is the first step in almost any data-munging task in Python. Sometimes you will start from scratch, but you can also convert other data structures, such as lists of lists:

# A DataFrame can be built from a list of lists
pd.DataFrame(my_mega_list, index=['Language', 'Version'], columns=range(1, 4))
               1    2      3
Language  Python    R  Julia
Version      3.7  3.6    1.3

# A DataFrame can be built from a dictionary
my_df = pd.DataFrame(my_dict)
my_df
  Python       R
0   list   array
1   dict  matrix
2     df      df

1.4 Dimensions

The len() function returns the length of a list, i.e. the number of items it contains.

len(my_list)
4

For dictionaries, this function will give you the number of keys:

len(my_dict)
2

Once you have created your DataFrame, you might want to know a little more about it. The shape attribute gives you its dimensions as a tuple: the number of rows (height) followed by the number of columns (width).

my_df.shape  #(row,col)
(3, 2)

1.5 Indexing

Accessing elements of a list

One of the big advantages of a list is that you can access its items by their position. This position is called the index.

Pay close attention to the fact that the indices of a list of n items start at 0 and end at n-1.

my_list[0] 
'Python'

The list can also be indexed with negative numbers according to the following pattern:

my_list =          ['Python', 3.7, 5, True]
# positive index:      0       1   2   3
# negative index:     -4      -3  -2  -1

Negative indices are counted from the end. Their main advantage is that you can access the last item in a list using index -1 without knowing the length of the list.

my_list[-1]
True
my_mega_list[0] # To access an item in the list, use the usual index
my_mega_list[0][2] # To access an item in a sub-list, use two indices
'Julia'

Looking up values for a specific key

To retrieve the value associated with a given key, use the syntax dictionary[key]:

my_dict['Python']
['list', 'dict', 'df']

Getting an entire column of a DF

my_df['Python']
0    list
1    dict
2      df

Int and Label indexing of DF

.iloc[] works on the positions in your index. This means that if you write iloc[2], you look for the row of your DataFrame at position 2, i.e. the third row, whatever its label.

# integer indexing [row, col]
my_df.iloc[0, 1]
'array'

.loc[] works on the labels of your index. This means that if you write loc[2], you look for the rows of your DataFrame whose index label is 2.

# label indexing [row, col]
my_df.loc[1, 'R']
'matrix'
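The difference between positions and labels is easier to see when the row labels do not start at 0. The sketch below assumes a hypothetical DataFrame named scores whose index runs from 10 to 12.

```python
import pandas as pd

# Hypothetical DataFrame whose row labels start at 10 instead of 0
scores = pd.DataFrame({'name': ['Ana', 'Ben', 'Cleo'], 'score': [12, 15, 9]},
                      index=[10, 11, 12])

first_by_position = scores.iloc[0, 1]       # first row, second column
first_by_label = scores.loc[10, 'score']    # row labelled 10, column 'score'

print(first_by_position, first_by_label)

# scores.loc[0, 'score'] would raise a KeyError: no row is labelled 0
```

Both expressions reach the same cell, one by counting rows and columns and the other by naming them.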

1.6 Slicing

Another advantage of lists is that you can select part of a list with an index of the form [m:n+1], which retrieves all the elements from element m (included) to element n+1 (excluded). We then say that we retrieve a slice of the list.

my_nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]
my_nums[2:6]
[3, 4, 5, 6]

Note that when no index is given to the left or right of the colon, Python takes all elements from the beginning or up to the end, respectively.

my_nums[:6]
[1, 2, 3, 4, 5, 6]
my_nums[6:]
[7, 8, 9]
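Slices can also use negative indices and an optional third value, the step. The step is an extension that the text above does not cover; the sketch below uses a hypothetical list named nums.

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

last_three = nums[-3:]       # negative indices also work in slices
every_other = nums[::2]      # a third value sets the step
middle_step = nums[1:8:3]    # from index 1 up to (not including) 8, step 3
reversed_nums = nums[::-1]   # a step of -1 walks the list backwards

print(last_three)
print(every_other)
print(middle_step)
print(reversed_nums)
```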

Tuples are like lists, except that they cannot be modified once created. They are written with parentheses instead of square brackets.
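A short sketch of what "not modifiable" means in practice. The tuple point is hypothetical, and the try/except construct, which this tutorial does not cover, is used only to catch the error so that the snippet runs to the end.

```python
point = (3, 4)          # a tuple of two values
first = point[0]        # indexing works exactly as for lists

try:
    point[0] = 10       # tuples cannot be modified after creation
    modified = True
except TypeError:
    modified = False

# To "change" a tuple, build a new one
moved = (10, point[1])

print(first, modified, moved)
```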

# zip() pairs up the elements of two lists into tuples: ('a', False), ('B', True), ...
my_tuple = list(zip(['a', 'B', 'C', 'd', 'E', 'f'], [False, True, True, False, True, False]))
my_alpha = pd.DataFrame(my_tuple, columns=['Letter', 'isCap'], index=range(1, 7))
my_alpha
  Letter  isCap
1      a  False
2      B   True
3      C   True
4      d  False
5      E   True
6      f  False
### PRO TIP
# Select rows with capital letters only (isCap is already Boolean, so no "== True" is needed)
my_alpha.loc[my_alpha.isCap]
  Letter  isCap
2      B   True
3      C   True
5      E   True

2. Files

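Absolute path (the full location, starting from the root of the file system)

C:/Users/RonBumblefootThal/Documents/pythonFolder/MyFirstProject/Draft/IDon'tKnowWhatI'mDoing/etc.py

Relative path (the location starting from the current working directory)

I_love_my_project/CoolCode.py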
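The two kinds of path can be told apart in code. This is a minimal sketch using the standard os.path functions join(), abspath() and isabs(); the file name example.csv is hypothetical and does not need to exist.

```python
import os

# Build a relative path in a portable way (hypothetical file name)
relative = os.path.join('data', 'clean', 'example.csv')

# Resolve it against the current working directory
absolute = os.path.abspath(relative)

relative_is_absolute = os.path.isabs(relative)
absolute_is_absolute = os.path.isabs(absolute)

print(relative, relative_is_absolute)
print(absolute, absolute_is_absolute)
```

The same relative path resolves to a different absolute path depending on the directory the code is run from, which is why the working directory matters in the next section.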

2.1 Working with directories

The os module in Python provides functions for interacting with the operating system. It is part of Python's standard library and offers a portable way of using operating-system-dependent functionality.

import os

listdir() returns the names of the files and directories in a directory (by default, the current working directory).

os.listdir()
['lib64', 'media', 'root', 'sys', 'var', 'sbin', 'mnt', 'opt', 'dev', 'boot', 'proc', 'etc', 'lib', 'usr', 'home', 'run', 'srv', 'tmp', 'bin', 'data-trek-2020', 'runtimes', '.dockerenv', ...]

The chdir() function changes the current working directory to the given path.

os.chdir("data-trek-2020")
os.listdir()
['LICENSE', 'README.md', '.git', 'data', 'code', 'output']
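To try listdir() and chdir() without touching your own folders, you can work in a temporary directory. The sketch below relies on three standard-library additions that the text above does not cover: os.getcwd() to get the current directory, os.mkdir() to create one, and the tempfile module. The sub-directory name data is hypothetical.

```python
import os
import tempfile

start_dir = os.getcwd()                    # remember where we started

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'data'))    # hypothetical sub-directory
    os.chdir(tmp)                          # move into the temporary directory
    contents = os.listdir()                # no argument: list the current directory
    os.chdir(start_dir)                    # move back before the directory is deleted

print(contents)
```

Saving the starting directory and returning to it is a useful habit, because every relative path in later code depends on the current working directory.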

2.2 Save/write files

DF example

soa_tour = pd.DataFrame()
soa_tour['country'] = ['USA', 'UK', 'FRA', 'GER', 'BRA']
soa_tour['frequency'] = [39, 6, 6, 5, 3]
soa_tour['continents'] = ['north_america', 'europe', 'europe', 'europe', 'south_america']
soa_tour
  country  frequency     continents
0     USA         39  north_america
1      UK          6         europe
2     FRA          6         europe
3     GER          5         europe
4     BRA          3  south_america

Write object to a comma-separated values (csv) file.

header: Write out the column names. 

# df_name.to_csv(path, header)
soa_tour.to_csv('data/clean/soa_tour_python.csv', header=True)
os.listdir('data/clean')
['soa_tour_python.csv', 'README.md']

2.3 Load/read files

From your PC

Remember that the addition operator + concatenates (assembles) two strings. We can use it to build the full path to a file:

path = 'data/clean/'
filename = 'soa_tour_python.csv'
path + filename 
'data/clean/soa_tour_python.csv'

read_csv : Read a comma-separated values (csv) file into DataFrame.

soa_tour_fl = pd.read_csv(path+filename)
soa_tour_fl
   Unnamed: 0 country  frequency     continents
0           0     USA         39  north_america
1           1      UK          6         europe
2           2     FRA          6         europe
3           3     GER          5         europe
4           4     BRA          3  south_america

Notice the extra Unnamed: 0 column: it is the row index that to_csv wrote to the file. index_col: Column(s) to use as the row labels of the DataFrame, either given as string name or column index.

soa_tour_fl = pd.read_csv(path+filename, index_col=0)
soa_tour_fl
  country  frequency     continents
0     USA         39  north_america
1      UK          6         europe
2     FRA          6         europe
3     GER          5         europe
4     BRA          3  south_america
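There are two common ways to stop the saved index from coming back as an extra column. The sketch below shows both on a hypothetical two-row DataFrame. To stay self-contained it writes to an in-memory string rather than to a file: to_csv() returns the CSV text when no path is given, and io.StringIO lets read_csv() read that text. The index=False option of to_csv is not used in the text above.

```python
import io
import pandas as pd

tour = pd.DataFrame({'country': ['USA', 'UK'], 'frequency': [39, 6]})

# Option 1: keep the index in the file and read it back with index_col=0
with_index = tour.to_csv()
back_1 = pd.read_csv(io.StringIO(with_index), index_col=0)

# Option 2: do not write the index at all
without_index = tour.to_csv(index=False)
back_2 = pd.read_csv(io.StringIO(without_index))

print(list(back_1.columns))
print(list(back_2.columns))
```

Option 2 gives a cleaner file when the index is just the default row numbers. Option 1 is the one to use when the index carries real information.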

From URL

url = "http://sciencecomputing.io/data/metabolicrates.csv"
metabolic_rates = pd.read_csv(url)
metabolic_rates.head() #This function returns the first n rows for the object based on position, default n=5
      Class              Order       Family     Genus   Species                 Study   M (kg)  FMR (kJ / day)
0  Mammalia          Carnivora   Odobenidae  Odobenus  rosmarus  Acquarone et al 2006  1370.0        345000.0
1  Mammalia          Carnivora   Odobenidae  Odobenus  rosmarus  Acquarone et al 2006  1250.0        417400.0
2      Aves  Procellariiformes  Diomedeidae  Diomedea   exulans      Adams et al 1986     7.4          3100.0
3      Aves  Procellariiformes  Diomedeidae  Diomedea   exulans      Adams et al 1986    6.95          2898.0
4      Aves  Procellariiformes  Diomedeidae  Diomedea   exulans      Adams et al 1986     8.9          3528.0

Play around a little bit with this big DF... what can you do?

3. Control Flow

You already apply control flow when you decide how to go to work during winter. Simply put, control flow determines the order in which the code is executed or evaluated. It allows defining different actions for different conditions.

For example:

  • You take the metro if it's snowing

  • You take the metro if it's cold

  • You walk in all other cases

Let's code it now ;)

3.1 Conditional evaluation

Simple if statement

Structure:

if condition :
  do something

Example:

weather = 'snowing'
if weather == 'snowing' :
  print ('Take the metro')
weather = 'blue sky'
if weather == 'snowing' :
  print ('Take the metro')

If/elif/else statement

Situations are not always related to two conditions (or two possible results). You may encounter situations where multiple conditions and associated results need to be considered.

Structure:

if condition1 :
  do something
  
elif condition2 :
  do another thing
else :
  do something else

Example:

if weather == 'snowing' :
  print ('Take the metro')
elif weather == 'cold' :
  print ('Take the metro')
else :
  print ("Let's walk!")

Nested statements

You can even test conditions after a first condition was tested. Don't forget the indentation!

Structure:

if condition1 :
  do something
else :
  if conditionA :
    do another thing
  else :
    do something else

Example:

temp = 1
if weather == 'snowing' :
  print ('Take the metro')
elif weather == 'cold' :
  if temp > -5 :
    print ("Let's walk!")
  else :
    print ('Take the metro')
else :
  print ("Let's walk!")

Multiple conditions: AND & OR

Logical operators (and, or) can be used in an if statement. They let you test exactly the combination of conditions that matters for your problem, and they avoid deeply nested statements, which are hard to read.

if weather == 'snowing' and temp < -5 :
  print ('Take the metro')
elif weather == 'cold' or temp < -5 :
  print ('Take the metro')
else :
  print ("Let's walk!")

3.2 For loops

Using a for loop you could plan your outings for a few days based on the forecast. A for loop is used to iterate over a sequence (a list of values) and to repeat code.

What we are now familiar with:

weather = 'snowing'
temp = -15

What if we want to take into account the forecast for the next five days?

forecast = ['snowing', 'cloudy', 'snowing', 'clear', 'rainy']
temps = [-15, 3, -2, -23, 11]

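Can we still use our if statement?

# This no longer works: forecast and temps are now whole lists, and a list
# cannot be compared with a single number (temps < -5 raises a TypeError)
if forecast == 'snowing' and temps < -5 :
  print ('Take the metro')
elif forecast == 'cold' or temps < -5 :
  print ('Take the metro')
else :
  print ("Let's walk!")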

Iterations

Structure:

for i in [] :
  do something for each i

Examples:

for t in temps :
  print ('It is', t, 'C outside')
# remember the len function? and how to access elements of a list?
for t in range(0, len(temps)) :
  print ('It is', temps[t], 'C outside')

If statement inside for loop

Structure:

for i in [] :
  if condition on i :
    do something
  else :
    do something else

Example:

# don't forget to INDENT correctly
for i in range(0, len(temps)) :
  t = temps[i]
  w = forecast[i]

  if w == 'snowing' and t < -5 :
    print ('Take the metro')

  elif w == 'cold' or t < -5 :
    print ('Take the metro')

  else :
    print ("Let's walk!")
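Looping over range(0, len(...)) works, but Python offers two built-in helpers that often read more naturally. Neither is used in the text above, so take this as an optional sketch: zip() walks through two lists in parallel, and enumerate() gives the position together with the value.

```python
forecast = ['snowing', 'cloudy', 'snowing', 'clear', 'rainy']
temps = [-15, 3, -2, -23, 11]

# zip() pairs the two lists element by element
for w, t in zip(forecast, temps):
    print(w, 'and', t, 'C')

# enumerate() yields the position together with the value
cold_days = []
for day, t in enumerate(temps):
    if t < -5:
        cold_days.append(day)

print(cold_days)
```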

3.3 Extras

Some logical operators

print (2 < 3)
print (2 > 3)
print (2 <= 2, 1 <= 2)
print (4 >= 4, 4 >= 3)
print (1 == 1)
print (1 != 1, 1 != 2)
print (not True, not False)
print (True or False, True | False)
print (True and False, True and True, False & True, False & False)
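On plain True/False values, | and & give the same answers as or and and, but they are not interchangeable. The sketch below, with arbitrary values, shows two differences: & and | bind more tightly than comparisons, and only and/or stop evaluating as soon as the answer is known.

```python
# and / or combine whole conditions
logical = 1 < 2 and 3 > 2                    # True

# & needs parentheses around each comparison to mean the same thing
bitwise = (1 < 2) & (3 > 2)                  # True

# Without parentheses, 1 < 2 & 3 > 2 is read as 1 < (2 & 3) > 2,
# and since 2 & 3 == 2 this becomes 1 < 2 > 2, which is False
bitwise_unparenthesised = 1 < 2 & 3 > 2      # False

# and stops at the first False, so the division below is never attempted
divisor = 0
safe = divisor != 0 and 10 / divisor > 1     # False, and no ZeroDivisionError

print(logical, bitwise, bitwise_unparenthesised, safe)
```

In if statements on ordinary values, and and or are usually the safer choice.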

4. Functions

A function is a block of code that can be reused and run only when called. You can pass (or not) arguments and return (or not) results.

4.1 Syntax and arguments

Basic syntax

Structure:

def function_name(args) :
  result = do something with args

  return result

Example:

def temp_difference(temp1, temp2) : # name and arguments
  result = temp2 - temp1 # what the function does

  return result # what the function returns
# apply function on values
temp_difference(-15, -5)
10
# apply function on variables
temperature = [-15, -23]
temp_difference(temperature[0], temperature[1])
-8

Using built-in functions

def abs_temp_difference(temp1, temp2) : 
  result = temp1 - temp2
  abs_result = abs(result)
  
  return abs_result 
abs_temp_difference(temperature[0], temperature[1])
8

4.2 Scope

Variables can exist in a global or a local scope.

Remember, the abs_temp_difference function is returning the value of the abs_result variable

# what will 'abs_result' return?
abs_result

Here is a second example:

# global variables
trees = 4
squirrels = 10
def count_living_things() :
  birds = 5 # local variable
  squirrels = 20
  
  return [birds, squirrels, trees]
# global and local variables returned: what is what?
count_living_things()
[5, 20, 4]
# which variable does not exist in the global scope?
birds
# what is the value of 'squirrels'?
squirrels
10
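The scope rules can be checked in one self-contained sketch. The function count_squirrels and the variable acorns are hypothetical, and try/except, which this tutorial does not cover, is used only to catch the error raised when a local variable is requested from the global scope.

```python
squirrels = 10  # global variable

def count_squirrels():
    squirrels = 20   # local variable: it hides the global one inside the function
    acorns = 3       # local variable: it exists only while the function runs
    return squirrels

inside = count_squirrels()

try:
    acorns           # not defined in the global scope
    acorns_exists = True
except NameError:
    acorns_exists = False

print(inside, squirrels, acorns_exists)
```

Calling the function did not change the global squirrels, and acorns never became visible outside the function.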

4.3 Integration

Combining functions and control flow

Let's go back to our forecast example.

Here's the forecast for an entire week:

# week forecast
weather_week = ['snowing', 'cloudy', 'snowing', 'clear', 'rainy']
temp_week = [-15, 3, -2, -23, 11]
# weekend forecast
weather_weekend = ['snowing', 'rainy']
temp_weekend = [-3, 2]

Let's build a function that will analyze either the week or the weekend forecast

# Pseudocode
def choose_transportation(weather, temperature) :
  for each day :
    if snowing or cold :
      'Take the metro'
    else :
      'Walk'

def choose_transportation(weather, temperature) :
  for i in range(0, len(weather)) :
    w = weather[i]
    t = temperature[i]

    if w == 'snowing' or t < -10 :
      print ('Take the metro')

    else :
      print ('Just walk')
# week transportation
choose_transportation(weather_week, temp_week)
# weekend transportation
choose_transportation(weather_weekend, temp_weekend)
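A function that prints its answers cannot pass them on to other code. As one possible variant (a sketch, not the approach used above), the hypothetical plan_transportation function below applies the same rule but returns the decisions as a list, so they can be counted or stored. It uses the built-in zip() to read the two lists in parallel.

```python
def plan_transportation(weather, temperature):
    """Return one decision per day instead of printing it (hypothetical variant)."""
    decisions = []
    for w, t in zip(weather, temperature):
        if w == 'snowing' or t < -10:
            decisions.append('Take the metro')
        else:
            decisions.append('Just walk')
    return decisions

week_plan = plan_transportation(['snowing', 'cloudy', 'snowing', 'clear', 'rainy'],
                                [-15, 3, -2, -23, 11])
metro_days = week_plan.count('Take the metro')

print(week_plan)
print(metro_days)
```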

4.4 Exercise: Animal Metabolism

This exercise will help you integrate the following:

  • File handling

  • Control flow

  • Functions

Write a function that reads a file if it exists and downloads it if it does not.

# Pseudocode
function get_data(fn, url)
  if fn exists
    open it

  else
    download from url

# remember the os library...
path = 'data/clean/'
fn1 = 'soa_tour_python.csv'
fn2 = 'this_file_does_not_exist.csv'
print (os.path.isfile(path+fn1))
print (os.path.isfile(path+fn2))
def get_data(fn, url) :
  if os.path.isfile(path+fn) :
    df = pd.read_csv(path+fn)
  
  else :
    df = pd.read_csv(url)
    
  return df
filename = 'metabolic_rates.csv'
url = "http://sciencecomputing.io/data/metabolicrates.csv"
get_data(filename, url)