Hisham Hussein May 04, 2017

# Analyzing Stock Risk

Are there informed ways choose which stocks to invest in? While we can't predict with certainty how stocks will perform, analytical tools can estimate how much risk we take by investing in one stock or another.

Here we'll analyze four tech giant stocks, to answer:

• What was the change in price of the stock over time?
• What was the daily return of the stock on average?
• What was the moving average of the various stocks?
• What was the correlation between different stocks' closing prices?
• What was the correlation between different stocks' daily returns?
• How much value do we put at risk by investing in a particular stock?
• How can we predict, to some degree, future stock behavior?

## 1. Exploring stock prices, returns and averages

First, let's import our libraries and setup our environment.

libraries
Python
#Let's go ahead and start with some imports
import pandas as pd
import numpy as np
import statsmodels

# For Visualization
import matplotlib

import matplotlib.pyplot as plt

import seaborn as sns
sns.set_style('whitegrid')

# plotly imports
# plotly.plotly contains all the machinery to communicate with Plotly
import plotly.plotly as py
import plotly.figure_factory as ff

# plotly.graph_objs contains all the helper classes to make/style plots
from plotly.graph_objs import *

# import cufflings to easily plot pandas data frames
cf.set_config_file(theme='white')

# for making some beautiful custom color scales
import colorlover as cl

# For reading stock data from yahoo

# For time stamps
from datetime import datetime

# import sys
import sys

import warnings
warnings.filterwarnings('ignore')

# for reproducibility purposes, here are the versions of modules used
print("statsmodels: {}".format(statsmodels.__version__))
print("numpy: {}".format(np.__version__))
print("pandas: {}".format(pd.__version__))
print("matplotlib: {}".format(matplotlib.__version__))
print("seaborn: {}".format(sns.__version__))
print("plotly: {}".format(plotly.__version__))
print("python: {}".format(sys.version))
statsmodels: 0.6.1numpy: 1.11.3pandas: 0.19.2matplotlib: 1.5.3seaborn: 0.7.1plotly: 2.0.7cufflinks: 0.8.2python: 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]

Using the pandas library, let's fetch and analyze data for these companies stocks: Microsoft, Amazon, Apple and Google.

stocks
Python
# Set up End and Start times for data grab
end = datetime.now()
# we want the start date to be a year a go
start = datetime(end.year - 1,end.month,end.day)

# let's grab stock data for Apple, Google, Microsoft and Amazon
AAPL = DataReader('AAPL', 'yahoo', start, end)
GOOG = DataReader('GOOG', 'yahoo', start, end)
AMZN = DataReader('AMZN', 'yahoo', start, end)
MSFT = DataReader('MSFT', 'yahoo', start, end)

# change the Timestamp index in the four data frames to Date
tech_list = [AAPL, GOOG, AMZN, MSFT]

for stock in tech_list:
stock["Date"] = stock.index.date
for stock in tech_list:
stock.set_index("Date", drop=True, inplace=True)

Let's see what's in the the AAPL dataframe.

purple
Python
plot(ff.create_table(AAPL.describe().round(2), index = True))
bluegray
Python
# General Info
print(AAPL.info())

Index: 252 entries, 2016-05-04 to 2017-05-03
Data columns (total 6 columns):
Open         252 non-null float64
High         252 non-null float64
Low          252 non-null float64
Close        252 non-null float64
Volume       252 non-null int64
dtypes: float64(5), int64(1)
memory usage: 13.8+ KB
None

Let's plot APPL's price change over time.

orange
Python
plot(AAPL["Adj Close"].iplot(asFigure=True,
color = "blue",
title = 'AAPL Closing Price'))

We see an upward trend. But it's ragged due to daily fluctuations. To get a better overview of price variations, we need to compute and plot the Moving Average.

Moving Average is a constantly updated average price for a stock over a specified period. For example, a 10-day Moving Average presents its first data point as the average of prices from Day 1 - Day 10. The next data point is the average of prices from Day 2 - Day 11. And so on.

Investors and traders use moving averages to track trends by glossing over day-to-day price fluctuations.

Let's use pandas again to compute the 10, 20, and 50 day moving averages for Apple's stock. We can add these as new columns to the AAPL data frame.

lightgreen
Python
ma_day = [10, 20, 50]

AAPL['50days MA'] = AAPL['Adj Close'].rolling(50).mean()
deeporange
Python
table = plot(ff.create_table(AAPL.round(2).head(20), index = True,
index_title = "Date"))
table['layout'].update(width = 950)
plot(table)

Finally, we plot the moving averages.

lime
Python
# first, store the columns plottable columns in a list
cols = ['Adj Close', '10days MA', '20days MA', '50days MA']

fig = AAPL[cols].iplot(asFigure=True,
title = "AAPL Moving Averages and Daily Price")
plot(fig)

Notice how the curves smooth out as we average over more days.

## 2. Analyzing Daily Returns

Closing prices show trends, but we can dig deeper. Let's review the daily return - the percentage change of a stock price from one day to the next.

deeppurple
Python
# We'll use pct_change to find the percent change for each day

plot(AAPL['Daily Return'].iplot(asFigure=True,
title = "AAPL Daily Return"))

Unlike the daily stock price, daily returns seem to float around zero. Let's see why. We'll plot the distribution of the average daily return:

pink
Python
# setting up the layout of matplotlib figure
fig, ax = plt.subplots()

# use the distplot function from seaborn to plot the distribution of AAPL
sns.distplot(AAPL['Daily Return'].dropna(), color = 'indianred',
hist_kws=dict(alpha = 0.7))

# remove the plot borders
sns.despine()

#get the plotly json object
fig = plotly.tools.mpl_to_plotly(fig)

# polish the histogram plot
fig['data'][1]['marker']['line']['color'] = 'white'
fig['data'][1]['marker']['line']['width'] = 1

# controlling the size of the figure and adding title
fig['layout'].update(title = "AAPL Daily Return Distribution",
width = 850, height = 450,
xaxis = dict(showgrid=False))
plot(fig)

This looks like a normal distribution, right? Why is this important?

As we'll see later in the article, to assess a stock's risk, we want to estimate the probability of more extreme price changes. The distribution will detect this.

Let's analyze the returns of all the stocks. We'll build a dataframe with the Close columns from each stock dataframe.

teal
Python
# Grab all the closing prices for the tech stock list into one DataFrame
stocks = ['AAPL','GOOG','MSFT','AMZN']

# change the timestamp to date
closing_df["Date"] = closing_df.index.date
closing_df.set_index("Date", drop=True, inplace=True)
green
Python
# let's have a sneak peak at our data frame
index = True, index_title = "Date"))

Let's compare the closing prices.

brown
Python
fig = closing_df.iplot(asFigure = True, subplots = True)
plot(fig)

We get a nice overview here, but we can't do cross sectional comparison: comparing stock prices at a specific point of time. Let's stack the stocks on a single plot. We can hover to see the price of all stocks on a specific day.

yellow
Python
plot(closing_df.iplot(fill=True, asFigure=True, title="Daily Closing Price"))

And now, the daily returns for all four stocks.

red
Python
# Make a new tech returns DataFrame
tech_rets = closing_df.pct_change()

# have a look at our returns
plot(ff.create_table(tech_rets.round(6).head(), index = True))
lightblue
Python
plot(tech_rets.iplot(kind='box',
asFigure = True,
title = "Distribution of Stocks Returns"))

Notice how all stocks show very similar returns distribution.

## 3. Correlation Analysis

In the world of finance, correlation is a statistical measure of how two stocks move in relation to one another.

Correlation is represented by the correlation coefficient, also known as the Pearson correlation coefficient (pearson's r), which ranges between -1 and +1.

When the prices of two stocks usually move in a similar direction, the stocks are considered positively correlated. The amount of correlation ranges from 0, which means no correlation, to 1, which means perfect correlation. Perfect correlation means the relationship that appears to exist between two stocks is positive 100% of the time.

For example, if the prices of oil-related and gas-related stocks increase, they are considered to have a positive correlation. The more predictably they do this, the closer the correlation coefficient is to +1.

When the prices of two securities consistently move in opposite directions, they are negatively correlated. This is measured from 0 to -1, where 0 is no correlation, and -1 is perfect negative correlation. A perfect negative correlation means the relationship between two securities is opposite 100% of the time.

No-correlation stocks move independently of each other, with no apparent relationship.

The picture below summarizes what we've just said:

The importance of correlation analysis appears clearly if you're managing a portfolio of stocks. You want the stocks in your portfolio to have low (or no) correlation between them. When some stocks are losing money, other non-correlated stocks will likely still be gaining or not moving at all, reducing the investor's losses.

So let's use pandas, seaborn and plotly to see how these four stocks correlate with each other.

Let's look at the correlation between Google's stock and itself. What might we expect?

indigo
Python
sns.jointplot('GOOG','GOOG', data = tech_rets, kind='scatter',
color='purple', edgecolor="w", s = 40, alpha = 0.8,
marginal_kws=dict(bins=20),
space = 0.3, xlim = (-0.06, 0.06), ylim = (-0.06, 0.06))

# get the current figure
fig = plt.gcf()

# convert it to plotly figure
fig = plotly.tools.mpl_to_plotly(fig)

# update figure layout
fig['layout'].update(# control x and y axes of the marginal histograms
yaxis2 = dict(nticks = 4, showgrid = False),
xaxis2 = dict(ticks = "", showticklabels = False,
showgrid = True, showline = False),
# control x and y axes of the right marginal histogram
yaxis3 = dict(ticks = "", showticklabels = False),
xaxis3 = dict(nticks = 5, showgrid = False),
width = 850,
height = 450)

plot(fig)

Comparing Google to itself, logically, shows a perfectly linear relationship.

This help us see that two perfectly (and positively) correlated stocks have a linear relationship bewteen their daily return values.

bluegray2
Python
sns.jointplot('GOOG','MSFT', data = tech_rets, kind='scatter',
color='seagreen', edgecolor="w", s = 50, space = 0.3,
marginal_kws=dict(bins=20), xlim = (-0.06, 0.06))

# get the current matplotlib figure
fig = plt.gcf()

# convert matplotlib figure to a plotly one
plotly_fig = plotly.tools.mpl_to_plotly(fig)

# change the marker line color and width
for trace in plotly_fig['data']:
#trace['opacity'] = 0.7
trace['marker']['line']['color'] = 'white'
trace['marker']['line']['width'] = 0.65

# polish the layout by controlling the ticks of the marginal histograms
plotly_fig['layout'].update(# control x and y axes of the top marginal histogram
yaxis2 = dict(nticks = 4, showgrid = False),
xaxis2 = dict(ticks = "", showline = False,
showticklabels = False,
showgrid = True),
# control x and y axes of the right marginal histogram
yaxis3 = dict(ticks = "", showticklabels = False),
xaxis3 = dict(nticks = 5, showgrid = False),
width = 850,
height = 450)

# now plotting
plot(plotly_fig)

As we said before, the pearsonr value gives you a sense of how correlated the daily percentage returns are. The correlation is relatively high at 0.67.

Let's repeat this comparison for every combination of our four stocks.

green2
Python
# the same scatter plot using plotly instead of cufflinks
fig = ff.create_scatterplotmatrix(tech_rets, diag='histogram', size = 5,
height=740, width=880)

for trace in fig['data']:
trace['opacity'] = 0.7
trace['marker'] = dict(color = "blue", line = dict(color = 'white',
width = 0.7))

plot(fig)

A quick glance shows all four stocks are highly correlated with each other. This which means you should not put all of them in one portfolio. Rather, you should pick the least correlated ones and combine them with stocks from other industries that have low correlation with the chosen stocks. This is a wise path towards a more diverse portfolio, and reducing losses.

But we can't tell, just by looking at this chart, which pair are the most and least correlated. For that, we employ the heatmap.

cyan2
Python
plot(tech_rets.dropna().corr().iplot(kind = 'heatmap',
title = "Stocks' Correlations",
colorscale = 'YlGnBu',
asFigure = True))

Better! Now we're ready to determine the risk of investing in any of these stocks.

## 4. Risk Analysis: Should we invest these stocks?

Let's look at one simple way to evaluate risk — comparing the expected return with the standard deviation of the daily returns.

purple2
Python
rets = tech_rets.dropna()

text = list(rets.mean().index.values)

fig = Figure()

fig['data'].append(Scatter(x = rets.mean(),
y = rets.std(),
mode = 'markers+text',
marker = dict(color = 'orange', size = 10,
opacity = 0.7,
line = dict(color = 'darkorange',
width = 1))))

fig['layout'].update(title = "Standard Deviation (Risk) vs. Expected Return",
xaxis = dict(title = "Expected Return",
range = (0, 0.002)),
yaxis = dict(title = "Standard Deviation (Risk)",
range = (0.005, 0.02)),
annotations=[dict(showarrow=True,
arrowsize = 2,
arrowwidth = 1,
x = x,
y = y,
text = stock,
xref='x', yref='y',
ax=20,
bordercolor='#c7c7c7',
borderwidth=2,
bgcolor='#ff7f0e',
opacity=0.8,
ay=-40) for x,y,stock in zip(rets.mean(),
rets.std(),
text)])

plot(fig)

We can see that risk is usually positively correlated with expected return.

If you're risk averse, you could opt for a low-risk, low-return stock. On the other hand, if you're a risk taker you might pick the highest paying stock, knowing you face a higher probability of loss.

While standard deviation is a simple and quick measurement, it can mislead you. It measures how much the return on investment varies, but it ignores the direction of that variation. A stock can be volatile because it suddenly rises. Of course, investors are not distressed by gains!

So we need to look beyond the volatility of returns, at Value at Risk.

### 4.1. Value at Risk

Value at risk (VaR) is the amount of money we expect to lose for a given period of time and a given confidence interval.

The definition exposes three values: a time period, a confidence level and a loss amount (or loss percentage). For example:

• With a 95% or 99% level of confidence, how much can I expect to lose over the next month?
• With 95% or 99% confidence, what is the maximum percentage I can expect to lose over the next year?

Let's example two approaches that consider these values: the Historical method and the Monte Carlo simulation.

#### 4.1.1. Computing Value at Risk Using the Historical Method

The Historical method, also known as the Bootstrap method, simply reorders historical returns from worst to best. From a risk perspective, it assumes that history will repeat.

Do you remember AAPL's daily returns histogram? Let's see it again:

To compute Value at Risk for Apple's stock over the next year with 95% confidence, we need to compute the 0.05 quantile of Apple's last year daily returns distribution. For more information on quantiles, check out this link.

lime2
Python
# The 0.05 empirical quantile of daily returns
var_5_quant = round(tech_rets['AAPL'].quantile(0.05), 3)

# the percent value of the 5th quantile
var_5_perc = "{:.1f}%".format(-var_5_quant*100)

# VaR for 1,000,000 investment
var_one_mil = "${}".format(int(-var_5_quant * 1000000)) print(var_5_quant) -0.015 So the 0.05 empirical quantile of daily returns is at . That means with 95% confidence, our worst daily loss will not exceed So if we have 1 million dollar investment, our one-day 5% VaR is  x 1,000,000 = . #### 4.1.2. Estimating Value at Risk Using Monte Carlo Simulation A Monte Carlo simulation refers to any method that randomly generates trials, but does not tell us anything about the underlying methodology. Here, we use it to develop a model for predicting future stock price returns. Then, we run multiple hypothetical trials through that model. We'll use geometric Brownian motion (GBM), which is technically known as a Markov process. This means the stock price follows a random walk and is consistent with, at the very least, the weak form of the efficient market hypothesis (EMH): past price information is already incorporated, and the next price movement is conditionally independent of past price movements. This means that the past stock prices are independent of where the stock price will be in the future. In other words, you can't predict the future solely based on the previous price of a stock. The equation for geometric Browninan motion is given by the following: Where $S$ is the stock price, $\mu$ is the expected return (which we calculated earlier), $\bold{\sigma}$ is the standard deviation of the returns, $t$ is time, and $\bold{\epsilon}$ is the random variable. We can mulitply both sides by the stock price $S$ to rearrange the formula and solve for the stock price. Now we see that the change in the stock price is the current stock price multiplied by two terms. The first term is known as drift, which is the average daily return multiplied by the change of time. The second term is known as shock. For each time period the stock will drift and then experience a shock which will randomly push the stock price up or down. By simulating this series of steps of drifts and shocks thousands of times, we can simulate where we might expect the stock price to be. Let's start with just a few simulations. First we'll define the variables we'll be using for the Google dataframe. deeporange2 Python # Set up our time horizon days = 365 # Now our delta dt = 1/days # Now let's grab our mu (drift) from the expected return data we got for GOOG mu = rets.mean()['GOOG'] # Now let's grab the volatility of the stock from the std() of the average return sigma = rets.std()['GOOG'] # Next, we will create a function that takes in the starting price and # number of days, and uses the σ and μ we already calculated from our # daily returns def stock_monte_carlo(start_price, days, mu, sigma, dt): price = np.zeros(days) price[0] = start_price # shock and drift shock = np.zeros(days) drift = np.zeros(days) # run price array for number of days for x in range(1, days): drift_delta = mu * dt # calculate shock shock[x] = np.random.normal(loc=drift_delta, scale=sigma * np.sqrt(dt)) # calculate drift drift[x] = drift_delta # calculate price price[x] = price[x-1] + (price[x-1] * (drift[x] + shock[x])) return price Let's put our function to work. deeppurple2 Python # Get start price start_price = GOOG["Adj Close"][0] prices = stock_monte_carlo(start_price, days, mu, sigma, dt) plot(pd.Series(prices).iplot(color = 'blue', asFigure = True, title = "Simulating GOOG Daily Price over 1 year")) Let's run a full Monte Carlo simulation 100 times. yellow2 Python fig = Figure() for run in range(100): fig['data'].append(Scatter(y= stock_monte_carlo(start_price,days,mu,sigma,dt))) fig['layout'].update(title = "Monte Carlo Simulation for Google Stock over 1 year", xaxis = dict(title = "Days"), yaxis = dict(title = "Price"), showlegend = False) plot(fig) Finally, let's get the end results for a much larger run. pink2 Python # Set a large numebr of runs runs = 10000 # Create an empty matrix to hold the end price data simulations = np.zeros(runs) # Set the print options of numpy to only display 0-5 points from an array # to suppress output np.set_printoptions(threshold=5) for run in range(runs): # Set the simulation data point as the last stock price for that run simulations[run] = stock_monte_carlo(start_price,days,mu,sigma,dt)[days-1] Now that we have our array of simulations, we can plot the distribution of the final results. We'll use the 1% quantile to define our risk for this stock. lightgreen2 Python # Now we'lll define q as the 1% empirical qunatile, # this basically means that 99% of the values should fall between here q = np.percentile(simulations, 1) # Now let's plot the distribution of the end prices fig = Figure() fig['data'].append(Histogram(x = simulations, nbinsx = 100, marker = dict(line = dict(color = 'white', width = 1.3)))) fig['layout'].update(title = 'Distribution of Final Results', annotations = [# Starting Price dict(x = 0.9, y = 0.9, showarrow = False, xref = 'paper', yref = 'paper', text = "Start Price:${:.2f}".format(start_price)),
# Mean ending price
dict(x = 0.9, y = 0.8,
showarrow = False,
xref = 'paper', yref = 'paper',
text = "Mean Final Price: ${:.2f}".format(simulations.mean())), # Variance of the price (within 99% confidence interval) dict(x = 0.9, y = 0.7, showarrow = False, xref = 'paper', yref = 'paper', text = "VaR(0.99):${:.2f}".format(start_price - q)),
dict(x = 0.07, y = 0.8,
showarrow = False,
xref = 'paper', yref = 'paper',
text = "q(0.99): ${:.2f}".format(q))], shapes = [# Plot a line at the 1% quantile result dict(type = 'line', x0 = q, x1 = q, y0 = 0, y1 = 500, line = dict(color = 'red', width = 2))]) plot(fig) teal2 Python var = "${}".format(round(start_price - q,2))
start_price = "${}".format(round(start_price, 2)) So we have looked at the 1% empirical quantile of the final price distribution to estimate Value at Risk for Google stock, which looks to be$13.92 for every investment of \$695.7 (the starting price of google stock).

For every initial stock you purchase, you're putting about at risk 99% of the time.