Analyzing Stock Risk

Are there informed ways choose which stocks to invest in? While we can't predict with certainty how stocks will perform, analytical tools can estimate how much risk we take by investing in one stock or another.

Here we'll analyze four tech giant stocks, to answer:

  • What was the change in price of the stock over time?
  • What was the daily return of the stock on average?
  • What was the moving average of the various stocks?
  • What was the correlation between different stocks' closing prices?
  • What was the correlation between different stocks' daily returns?
  • How much value do we put at risk by investing in a particular stock?
  • How can we predict, to some degree, future stock behavior?

Exploring stock prices, returns and averages

First, let's import our libraries and setup our environment, then save it.

# cross-compatible pandas and pdr versions
conda install -qy -c conda-forge \
  pandas=0.20.3 pandas-datareader=0.5.0 \
  ipython scipy statsmodels seaborn cufflinks-py
pip install fix_yahoo_finance

analysis (Python)install
#Let's go ahead and start with some imports
import pandas as pd
import numpy as np
import statsmodels

# For Visualization
import matplotlib

import matplotlib.pyplot as plt

import seaborn as sns

# plotly imports
# plotly.plotly contains all the machinery to communicate with Plotly
import plotly.plotly as py
import plotly.figure_factory as ff

# plotly.graph_objs contains all the helper classes to make/style plots
from plotly.graph_objs import *

# import cufflings to easily plot pandas data frames
import cufflinks as cf

# for making some beautiful custom color scales
# import colorlover as cl

# For reading stock data from yahoo
from pandas_datareader import data as pdr

import pandas_datareader

# For time stamps
from datetime import datetime

# import sys 
import sys

import warnings

import fix_yahoo_finance as yf

# for reproducibility purposes, here are the versions of modules used 
print("statsmodels: {}".format(statsmodels.__version__))
print("numpy: {}".format(np.__version__))
print("pandas: {}".format(pd.__version__))
print("matplotlib: {}".format(matplotlib.__version__))
print("seaborn: {}".format(sns.__version__))
print("cufflinks: {}".format(cf.__version__))
print("python: {}".format(sys.version))
print("datareader: {}".format(pandas_datareader.version))

Using the pandas library, let's fetch and analyze data for these companies stocks: Microsoft, Amazon, Apple and Google.

# Set up End and Start times for data grab
end =
# we want the start date to be a year a go
start = datetime(end.year - 1,end.month,

# let's grab stock data for Apple, Google, Microsoft and Amazon
AAPL = pdr.get_data_yahoo('AAPL', start, end)
GOOG = pdr.get_data_yahoo('GOOG', start, end)
AMZN = pdr.get_data_yahoo('AMZN', start, end)
MSFT = pdr.get_data_yahoo('MSFT', start, end)

# change the Timestamp index in the four data frames to Date
tech_list = [AAPL, GOOG, AMZN, MSFT]

for stock in tech_list:   
    stock["Date"] =
for stock in tech_list:
    stock.set_index("Date", drop=True, inplace=True)
[                       0%                       ][*********************100%***********************]  1 of 1 downloaded[                       0%                       ][*********************100%***********************]  1 of 1 downloaded[                       0%                       ][*********************100%***********************]  1 of 1 downloaded[                       0%                       ][*********************100%***********************]  1 of 1 downloaded/opt/conda/lib/python3.6/site-packages/matplotlib/ UserWarning:

 This call to matplotlib.use() has no effect
because the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

Let's see what's in the the AAPL dataframe.

ff.create_table(AAPL.describe().round(2), index = True)
# General Info
<class 'pandas.core.frame.DataFrame'>
Index: 252 entries, 2016-08-25 to 2017-08-24
Data columns (total 6 columns):
Open         252 non-null float64
High         252 non-null float64
Low          252 non-null float64
Close        252 non-null float64
Adj Close    252 non-null float64
Volume       252 non-null int64
dtypes: float64(5), int64(1)
memory usage: 13.8+ KB

Let's plot APPL's price change over time.

AAPL["Adj Close"].iplot(asFigure=True, 
                        color = "blue",
                        layout=Layout(title = 'AAPL Closing Price'))

We see an upward trend. But it's ragged due to daily fluctuations. To get a better overview of price variations, we need to compute and plot the Moving Average.

Moving Average is a constantly updated average price for a stock over a specified period. For example, a 10-day Moving Average presents its first data point as the average of prices from Day 1 - Day 10. The next data point is the average of prices from Day 2 - Day 11. And so on.

Investors and traders use moving averages to track trends by glossing over day-to-day price fluctuations.

Let's use pandas again to compute the 10, 20, and 50 day moving averages for Apple's stock. We can add these as new columns to the AAPL data frame.

ma_day = [10, 20, 50]

AAPL['10days MA'] = AAPL['Adj Close'].rolling(10).mean()
AAPL['20days MA'] = AAPL['Adj Close'].rolling(20).mean()
AAPL['50days MA'] = AAPL['Adj Close'].rolling(50).mean()
table = ff.create_table(AAPL.round(2).head(20), index = True, 
                             index_title = "Date")
table['layout'].update(width = 800)

Finally, we plot the moving averages.

# first, store the columns plottable columns in a list
cols = ['Adj Close', '10days MA', '20days MA', '50days MA']

fig = AAPL[cols].iplot(asFigure=True,
  layout=Layout(title = "AAPL Moving Averages and Daily Price"))


Notice how the curves smooth out as we average over more days.

Analyzing Daily Returns

Closing prices show trends, but we can dig deeper. Let's review the daily return - the percentage change of a stock price from one day to the next.

# We'll use pct_change to find the percent change for each day
AAPL['Daily Return'] = AAPL['Adj Close'].pct_change()

AAPL['Daily Return'].iplot(asFigure=True, 
                           layout=Layout(title = "AAPL Daily Return"))

Unlike the daily stock price, daily returns seem to float around zero. Let's see why. We'll plot the distribution of the average daily return:

import plotly

# setting up the layout of matplotlib figure
fig, ax = plt.subplots()

# use the distplot function from seaborn to plot the distribution of AAPL
sns.distplot(AAPL['Daily Return'].dropna(), color = 'indianred', 
             hist_kws=dict(alpha = 0.7))

# remove the plot borders

#get the plotly json object
fig =

# polish the histogram plot
fig['data'][1]['marker']['line']['color'] = 'white'
fig['data'][1]['marker']['line']['width'] = 1

# controlling the size of the figure and adding title
fig['layout'].update(title = "AAPL Daily Return Distribution",
                     width = 850, height = 450,
                     xaxis = dict(showgrid=False))

This looks like a normal distribution, right? Why is this important?

As we'll see later in the article, to assess a stock's risk, we want to estimate the probability of more extreme price changes. The distribution will detect this.

Let's analyze the returns of all the stocks. We'll build a dataframe with the Close columns from each stock dataframe.

# Grab all the closing prices for the tech stock list into one DataFrame
stocks = ['AAPL','GOOG','MSFT','AMZN']
closing_df = pdr.get_data_yahoo(stocks,start,end)['Adj Close']

# change the timestamp to date
closing_df["Date"] =
closing_df.set_index("Date", drop=True, inplace=True)
[                       0%                       ][**********************50%                       ]  2 of 4 downloaded[**********************75%***********            ]  3 of 4 downloaded[*********************100%***********************]  4 of 4 downloaded[*********************100%***********************]  4 of 4 downloaded/opt/conda/lib/python3.6/site-packages/matplotlib/ UserWarning:

 This call to matplotlib.use() has no effect
because the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

# let's have a sneak peak at our data frame
                index = True, index_title = "Date")

Let's compare the closing prices.

closing_df.iplot(asFigure = True, subplots = True, 
                 layout=Layout(title="Closing Price Comparison"))

We get a nice overview here, but we can't do cross sectional comparison: comparing stock prices at a specific point of time. Let's stack the stocks on a single plot. We can hover to see the price of all stocks on a specific day.

closing_df.iplot(fill=True, asFigure=True, 
                 layout=Layout(title="Daily Closing Price"))

And now, the daily returns for all four stocks.

# Make a new tech returns DataFrame
tech_rets = closing_df.pct_change()

# have a look at our returns
ff.create_table(tech_rets.round(6).head(), index = True)
                asFigure = True, 
                layout=Layout(title = "Distribution of Stocks Returns"))

Notice how all stocks show very similar returns distribution.

Correlation Analysis

In the world of finance, correlation is a statistical measure of how two stocks move in relation to one another.

Correlation is represented by the correlation coefficient, also known as the Pearson correlation coefficient (pearson's r), which ranges between -1 and +1.

When the prices of two stocks usually move in a similar direction, the stocks are considered positively correlated. The amount of correlation ranges from 0, which means no correlation, to 1, which means perfect correlation. Perfect correlation means the relationship that appears to exist between two stocks is positive 100% of the time.

For example, if the prices of oil-related and gas-related stocks increase, they are considered to have a positive correlation. The more predictably they do this, the closer the correlation coefficient is to +1.

When the prices of two securities consistently move in opposite directions, they are negatively correlated. This is measured from 0 to -1, where 0 is no correlation, and -1 is perfect negative correlation. A perfect negative correlation means the relationship between two securities is opposite 100% of the time.

No-correlation stocks move independently of each other, with no apparent relationship.

The picture below summarizes what we've just said:

The importance of correlation analysis appears clearly if you're managing a portfolio of stocks. You want the stocks in your portfolio to have low (or no) correlation between them. When some stocks are losing money, other non-correlated stocks will likely still be gaining or not moving at all, reducing the investor's losses.

This article is not dedicated to portfolio management. But correlation is an essential component of the financial analyst's toolbox.

So let's use pandas, seaborn and plotly to see how these four stocks correlate with each other.

Let's look at the correlation between Google's stock and itself. What might we expect?

import plotly

sns.jointplot('GOOG','GOOG', data = tech_rets, kind='scatter',
              color='purple', edgecolor="w", s = 40, alpha = 0.8,
              space = 0.3, xlim = (-0.06, 0.06), ylim = (-0.06, 0.06))

# get the current figure
fig = plt.gcf()

# convert it to plotly figure
fig =

# update figure layout
fig['layout'].update(# control x and y axes of the marginal histograms
                     yaxis2 = dict(nticks = 4, showgrid = False),
                     xaxis2 = dict(ticks = "", showticklabels = False, 
                                   showgrid = True, showline = False),
                     # control x and y axes of the right marginal histogram
                     yaxis3 = dict(ticks = "", showticklabels = False),
                     xaxis3 = dict(nticks = 5, showgrid = False),
                     width = 850,
                     height = 450)


Comparing Google to itself, logically, shows a perfectly linear relationship.

This help us see that two perfectly (and positively) correlated stocks have a linear relationship bewteen their daily return values.

Let's correlate Google and Microsoft.

analysis (Python)install
import plotly

sns.jointplot('GOOG','MSFT', data = tech_rets, kind='scatter',
              color='seagreen', edgecolor="w", s = 50, space = 0.3,
              marginal_kws=dict(bins=20), xlim = (-0.06, 0.06))

# get the current matplotlib figure
fig = plt.gcf()

# convert matplotlib figure to a plotly one
plotly_fig =

# change the marker line color and width
for trace in plotly_fig['data']:
    #trace['opacity'] = 0.7
    trace['marker']['line']['color'] = 'white'
    trace['marker']['line']['width'] = 0.65

# polish the layout by controlling the ticks of the marginal histograms
plotly_fig['layout'].update(# control x and y axes of
                            # the top marginal histogram
                            yaxis2 = dict(nticks = 4, showgrid = False),
                            xaxis2 = dict(ticks = "", showline = False,
                                          showticklabels = False, 
                                          showgrid = True),
                            # control x and y axes of
                            # the right marginal histogram
                            yaxis3 = dict(ticks = "", showticklabels = False),
                            xaxis3 = dict(nticks = 5, showgrid = False),
                            width = 850,
                            height = 450)

QFontDatabase: Cannot find font directory /opt/conda/lib/fonts - is Qt installed correctly?
QFontDatabase: Cannot find font directory /opt/conda/lib/fonts - is Qt installed correctly?
QFontDatabase: Cannot find font directory /opt/conda/lib/fonts - is Qt installed correctly?
QFontDatabase: Cannot find font directory /opt/conda/lib/fonts - is Qt installed correctly?
/opt/conda/lib/python3.6/site-packages/plotly/matplotlylib/ UserWarning:

Looks like the annotation(s) you are trying 
to draw lies/lay outside the given figure size.

Therefore, the resulting Plotly figure may not be 
large enough to view the full text. To adjust 
the size of the figure, use the 'width' and 
'height' keys in the Layout object. Alternatively,
use the Margin object to adjust the figure's margins.

/opt/conda/lib/python3.6/site-packages/plotly/matplotlylib/ UserWarning:

I found a path object that I don't think is part of a bar chart. Ignoring.

As we said before, the pearsonr value gives you a sense of how correlated the daily percentage returns are. The correlation is relatively high at 0.67.

Let's repeat this comparison for every combination of our four stocks.

# the same scatter plot using plotly instead of cufflinks
fig = ff.create_scatterplotmatrix(tech_rets, diag='histogram', size = 5,
                                  height=740, width=880)

for trace in fig['data']:
    trace['opacity'] = 0.7
    trace['marker'] = dict(color = "blue", line = dict(color = 'white', 
                                                       width = 0.7))


A quick glance shows all four stocks are highly correlated with each other. This which means you should not put all of them in one portfolio. Rather, you should pick the least correlated ones and combine them with stocks from other industries that have low correlation with the chosen stocks. This is a wise path towards a more diverse portfolio, and reducing losses.

But we can't tell, just by looking at this chart, which pair are the most and least correlated. For that, we employ the heatmap.

tech_rets.dropna().corr().iplot(kind = 'heatmap',
                                layout=Layout(title = "Stocks' Correlations"),
                                colorscale = 'YlGnBu',
                                asFigure = True)

Better! Now we're ready to determine the risk of investing in any of these stocks.

Risk Analysis: Should we invest these stocks?

Let's look at one simple way to evaluate risk — comparing the expected return with the standard deviation of the daily returns.

analysis (Python)install
rets = tech_rets.dropna()

text = list(rets.mean().index.values)

fig = Figure()

fig['data'].append(Scatter(x = rets.mean(),
                           y = rets.std(),
                           mode = 'markers+text',
                           marker = dict(color = 'orange', size = 10, 
                                         opacity = 0.7,
                                         line = dict(color = 'darkorange',
                                                     width = 1))))

fig['layout'].update(title = "Standard Deviation (Risk) vs. Expected Return",
                     xaxis = dict(title = "Expected Return",
                                  range = (-0.002, 0.002)),
                     yaxis = dict(title = "Standard Deviation (Risk)",
                                  range = (0, 0.02)),
                     annotations = [ 
                                     dict(showarrow=True, arrowhead=2,
                                          arrowsize = 2, arrowwidth = 1,
                                          x = x, y = y, xref='x', yref='y',
                                          text = stock, ax=20,
                                          bordercolor='#c7c7c7', borderwidth=2,
                                          borderpad=4, bgcolor='#ff7f0e',
                                          opacity=0.8, ay=-40) 
                                     for x,y,stock in zip(rets.mean(), 


We can see that risk is usually positively correlated with expected return.

If you're risk averse, you could opt for a low-risk, low-return stock. On the other hand, if you're a risk taker you might pick the highest paying stock, knowing you face a higher probability of loss.

While standard deviation is a simple and quick measurement, it can mislead you. It measures how much the return on investment varies, but it ignores the direction of that variation. A stock can be volatile because it suddenly rises. Of course, investors are not distressed by gains!

So we need to look beyond the volatility of returns, at Value at Risk.

Value at Risk

Value at risk (VaR) is the amount of money we expect to lose for a given period of time and a given confidence interval.

The definition exposes three values: a time period, a confidence level and a loss amount (or loss percentage). For example:

  • With a 95% or 99% level of confidence, how much can I expect to lose over the next month?
  • With 95% or 99% confidence, what is the maximum percentage I can expect to lose over the next year?

Let's example two approaches that consider these values: the Historical method and the Monte Carlo simulation.

Computing Value at Risk Using the Historical Method

The Historical method, also known as the Bootstrap method, simply reorders historical returns from worst to best. From a risk perspective, it assumes that history will repeat.

Do you remember AAPL's daily returns histogram? Let's see it again:

import plotly
# Note the use of dropna() here, otherwise the NaN values can't be read by seaborn
fig, ax = plt.subplots()

sns.distplot(AAPL['Daily Return'].dropna(), color = "darkcyan", 
             hist_kws = dict(alpha=0.7), ax=ax)

plotly_fig =

plotly_fig['data'][1]['marker']['line']['color'] = 'white'
plotly_fig['data'][1]['marker']['line']['width'] = 1

plotly_fig['layout'].update(title = "AAPL Daily Return Distribution",
                            paper_bgcolor = 'white',
                            xaxis = dict(showgrid=False),
                            width = 850,
                            height = 450)

To compute Value at Risk for Apple's stock over the next year with 95% confidence, we need to compute the 0.05 quantile of Apple's last year daily returns distribution. For more information on quantiles, check out this link.

# The 0.05 empirical quantile of daily returns
var_5_quant = round(tech_rets['AAPL'].quantile(0.05), 3)
# the percent value of the 5th quantile
var_5_perc = "{:.1f}%".format(-var_5_quant*100)
# VaR for 1,000,000 investment
var_one_mil = "${}".format(int(-var_5_quant * 1000000))

So the 0.05 empirical quantile of daily returns is at nilquantile. That means with 95% confidence, our worst daily loss will not exceed nilquantile

So if we have 1 million dollar investment, our one-day 5% VaR is nilquantile % x 1,000,000 = nilVaR @ 1M.

Estimating Value at Risk Using Monte Carlo Simulation

A Monte Carlo simulation refers to any method that randomly generates trials, but does not tell us anything about the underlying methodology. Here, we use it to develop a model for predicting future stock price returns. Then, we run multiple hypothetical trials through that model.

We'll use geometric Brownian motion (GBM), which is technically known as a Markov process. This means the stock price follows a random walk and is consistent with, at the very least, the weak form of the efficient market hypothesis (EMH): past price information is already incorporated, and the next price movement is conditionally independent of past price movements.

This means that the past stock prices are independent of where the stock price will be in the future. In other words, you can't predict the future solely based on the previous price of a stock.

The equation for geometric Browninan motion is given by the following:

ΔSS=μΔt+σϵΔt\frac{\Delta S}{S} = \mu\Delta t + \sigma \epsilon \sqrt{\Delta t}

Where SS is the stock price, μ\mu is the expected return (which we calculated earlier), σ\bold{\sigma} is the standard deviation of the returns, tt is time, and ϵ\bold{\epsilon} is the random variable.

We can mulitply both sides by the stock price SS to rearrange the formula and solve for the stock price.

ΔS=S(μΔt+σϵΔt)\Delta S = S(\mu\Delta t + \sigma \epsilon \sqrt{\Delta t})

Now we see that the change in the stock price is the current stock price multiplied by two terms. The first term is known as drift, which is the average daily return multiplied by the change of time. The second term is known as shock.

For each time period the stock will drift and then experience a shock which will randomly push the stock price up or down. By simulating this series of steps of drifts and shocks thousands of times, we can simulate where we might expect the stock price to be.

Let's start with just a few simulations. First we'll define the variables we'll be using for the Google dataframe.

analysis (Python)install
# Set up our time horizon
days = 365

# Now our delta
dt = 1.0/days

# Now let's grab our mu (drift) from the expected return data we got for GOOG
mu = rets.mean()['GOOG']

# Now let's grab the volatility of the stock from the std() of the average return
sigma = rets.std()['GOOG']

# Next, we will create a function that takes in the starting price and 
# number of days, and uses the σ and μ we already calculated from our 
# daily returns

def stock_monte_carlo(start_price, days, mu, sigma, dt):
  import numpy as np

  price = np.zeros(days)
  price[0] = start_price
  # shock and drift
  shock = np.zeros(days)
  drift = np.zeros(days)
  # run price array for number of days
  for x in range(1, days):
    drift_delta = mu * dt
    # calculate shock
    shock[x] = np.random.normal(loc=drift_delta, scale=sigma * np.sqrt(dt))
    # calculate drift
    drift[x] = drift_delta
    # calculate price
    price[x] = price[x-1] + (price[x-1] * (drift[x] + shock[x]))
  return price

Let's put our function to work.

# Get start price 
start_price = GOOG["Adj Close"][0]

prices = stock_monte_carlo(start_price, days, mu, sigma, dt)
pd.Series(prices).iplot(color = 'blue', asFigure = True, 
  layout=Layout(title = "Simulating GOOG Daily Price over 1 year"))

Let's run a full Monte Carlo simulation 100 times.

fig = Figure()

for run in range(100):
    fig['data'].append(Scatter(y= stock_monte_carlo(start_price,days,mu,sigma,dt)))
fig['layout'].update(title = "Monte Carlo Simulation for Google Stock over 1 year",
                     xaxis = dict(title = "Days"),
                     yaxis = dict(title = "Price"),
                     showlegend = False)


Finally, let's get the end results for a much larger run.

# Set a large numebr of runs
runs = 10000

# Create an empty matrix to hold the end price data
simulations = np.zeros(runs)

# Set the print options of numpy to only display 0-5 points from an array 
# to suppress output

for run in range(runs):    
    # Set the simulation data point as the last stock price for that run
    simulations[run] = stock_monte_carlo(start_price,days,mu,sigma,dt)[days-1]

Now that we have our array of simulations, we can plot the distribution of the final results. We'll use the 1% quantile to define our risk for this stock.

1% quantile risk
analysis (Python)install
# Now we'lll define q as the 1% empirical qunatile, 
# this basically means that 99% of the values should fall between here
q = np.percentile(simulations, 1)

# Now let's plot the distribution of the end prices
fig = Figure()

fig['data'].append(Histogram(x = simulations, nbinsx = 100,
                             marker = dict(line = dict(color = 'white', 
                                                       width = 1.3))))

fig['layout'].update(title = 'Distribution of Final Results',
annotations = [# Starting Price
               dict(x = 0.9, y = 0.9,
                    showarrow = False,
                    xref = 'paper', yref = 'paper',
                    text = "Start Price: ${:.2f}".format(start_price)),
               # Mean ending price
               dict(x = 0.9, y = 0.8,
                    showarrow = False,
                    xref = 'paper', yref = 'paper',
                    text = "Mean Final Price: ${:.2f}".format(simulations.mean())),
               # Variance of the price (within 99% confidence interval)
               dict(x = 0.9, y = 0.7,
                    showarrow = False,
                    xref = 'paper', yref = 'paper',
                    text = "VaR(0.99): ${:.2f}".format(start_price - q)),
               dict(x = 0.07, y = 0.8,
                    showarrow = False,
                    xref = 'paper', yref = 'paper',
                    text = "q(0.99): ${:.2f}".format(q))],
shapes = [# Plot a line at the 1% quantile result
          dict(type = 'line',
               x0 = q,
               x1 = q,
               y0 = 0,
               y1 = 500,
               line = dict(color = 'red', width = 2))])

"${}".format(round(start_price - q,2))
"${}".format(round(start_price, 2))

So we have looked at the 1% empirical quantile of the final price distribution to estimate Value at Risk for Google stock, which looks to be nil for every investment of nil (the starting price of google stock).

For every initial stock you purchase, you're putting about nil at risk 99% of the time.