Code snippets

Packages

Install packages

Update packages

### From https://www.neonscience.org/packages-in-r
## List all packages where an update is available
old.packages()

## Update all available packages
update.packages()

## Update, without prompts for permission
update.packages(ask = FALSE)

## To update only a specific package, use install.packages()
install.packages("plotly")

Data objects

Binary feather format

## Import libraries (Python)
import pandas as pd
import feather

## Save dataframe to feather format
df.to_feather('tmp/df_feather')

## Read dataframe from feather file
df = pd.read_feather('tmp/df_feather')  ## May throw an error

df = feather.read_dataframe('tmp/df_feather')  ## Alternative method

## Import library (R)
library(feather)

## Save dataframe to feather format
write_feather(df, "file.name")

## Read dataframe from feather file
df <- read_feather("file.name")

Serialize objects for storage

## Import library
import pickle

## Serialize objects to a file
with open('train.pickle', 'wb') as f:
    pickle.dump([X_train, y_train], f)

## Load the objects back from the file
with open('train.pickle', 'rb') as f:
    X_train, y_train = pickle.load(f)
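
A minimal, self-contained round trip may help confirm that what is loaded matches what was saved. The sample lists and the temporary directory below are stand-ins chosen for illustration, not part of the original snippet.

```python
import os
import pickle
import tempfile

## Hypothetical stand-ins for the training data
X_train = [[1.0, 2.0], [3.0, 4.0]]
y_train = [0, 1]

## Write to a temporary directory so the example cleans up after itself
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'train.pickle')

    with open(path, 'wb') as f:
        pickle.dump([X_train, y_train], f)

    with open(path, 'rb') as f:
        X_loaded, y_loaded = pickle.load(f)

## The loaded objects compare equal to the originals
print(X_loaded == X_train, y_loaded == y_train)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.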

Environment set-up

Set environment variables

## Create/edit the .bashrc file
nano ~/.bashrc

## Create an environment variable "ENVAR" in the file
export ENVAR="username"

## Source the .bashrc file
source ~/.bashrc

## Import library (Python)
import os

## Set environment variables
os.environ['API_TOKEN'] = "API key"
os.environ['PROJECT'] = "username/project_name"
os.environ['NOTEBOOK_ID'] = "notebook ID"
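
Reading a variable back is the other half of the workflow. The sketch below assumes the placeholder value from the snippet above; `HYPOTHETICAL_UNSET_VAR` is an invented name used only to show the fallback behaviour.

```python
import os

## Placeholder value, as in the snippet above
os.environ['PROJECT'] = "username/project_name"

## Make sure the illustrative variable really is unset
os.environ.pop('HYPOTHETICAL_UNSET_VAR', None)

## Indexing raises KeyError if the variable is missing
project = os.environ['PROJECT']

## .get() returns a default instead of raising
missing = os.environ.get('HYPOTHETICAL_UNSET_VAR', 'fallback')

print(project, missing)
```

Values set through `os.environ` apply to the current process and its children only; they do not persist in the shell after the process exits.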

Exploring data

Data wrangling

Dataframe operations

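Concatenate multiple dataframes (row-wise or side-by-side)

## Import library (Python)
import pandas as pd

## Stack dataframes row-wise (default, axis=0)
new_df = pd.concat([df1, df2])

## Place dataframes side-by-side
new_df = pd.concat([df1, df2], axis=1)

## R: rbind() stacks rows; cbind() places dataframes side-by-side
new_df <- rbind(df1, df2)
new_df <- cbind(df1, df2)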
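
To see how the axis changes the result, the sketch below concatenates two small dataframes both ways and prints the shapes. The column names and values are made up for illustration.

```python
import pandas as pd

## Hypothetical sample dataframes with matching columns
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

## Stack rows; ignore_index=True renumbers the index 0..3
stacked = pd.concat([df1, df2], ignore_index=True)

## Place side-by-side; rows are aligned on the index
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape)       ## (4, 2)
print(side_by_side.shape)  ## (2, 4)
```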

Dealing with missing values

Get rows with no missing values in any column

df[~df.isnull().any(axis=1)]
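
As a quick check, the boolean-mask approach can be compared with `dropna()`, which by default also drops every row containing a missing value. The sample data is invented for illustration.

```python
import numpy as np
import pandas as pd

## Hypothetical dataframe: only the first row is complete
df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, np.nan]})

## Keep rows where no column is null
complete = df[~df.isnull().any(axis=1)]

## dropna() with default arguments gives the same rows
same = df.dropna()

print(complete.equals(same))
```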

Replace all missing/NaN values

## Replace NaN with column mean
df.fillna(df.mean(axis=0))

## Replace NaN with 0
df.fillna(0)

## Replace NaN in column B with the value from column A in the same row
df['B'].combine_first(df['A'])
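
A small worked example, with made-up values, shows what each replacement produces. Note that `fillna` and `combine_first` return new objects; the original dataframe is unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

## Hypothetical data: column B has two missing values
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [np.nan, 5.0, np.nan]})

## Column means are A=2.0, B=5.0, so B becomes [5.0, 5.0, 5.0]
mean_filled = df.fillna(df.mean(axis=0))

## Missing values become 0
zero_filled = df.fillna(0)

## B keeps its own values and borrows from A where B is NaN
b_from_a = df['B'].combine_first(df['A'])

print(mean_filled['B'].tolist())  ## [5.0, 5.0, 5.0]
print(zero_filled['B'].tolist())  ## [0.0, 5.0, 0.0]
print(b_from_a.tolist())          ## [1.0, 5.0, 3.0]
```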

Replace values

Conditional replacement of cell value

df.loc[df.ID == 103, 'FirstName'] = "Matt"
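
The same pattern on a small, invented dataframe: the boolean condition selects the rows and the column label selects the cell to overwrite.

```python
import pandas as pd

## Hypothetical records; the row with ID 103 has a misspelled name
df = pd.DataFrame({'ID': [101, 102, 103],
                   'FirstName': ['Ann', 'Bob', 'Mat']})

## Overwrite FirstName only where ID equals 103
df.loc[df.ID == 103, 'FirstName'] = "Matt"

print(df['FirstName'].tolist())  ## ['Ann', 'Bob', 'Matt']
```

Using a single `.loc` call with both row condition and column avoids chained indexing, which can silently modify a copy instead of the dataframe itself.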