Analyzing Stock Risk
Are there informed ways choose which stocks to invest in? While we can't predict with certainty how stocks will perform, analytical tools can estimate how much risk we take by investing in one stock or another.
Here we'll analyze four tech giant stocks, to answer:
- What was the change in price of the stock over time?
- What was the daily return of the stock on average?
- What was the moving average of the various stocks?
- What was the correlation between different stocks' closing prices?
- What was the correlation between different stocks' daily returns?
- How much value do we put at risk by investing in a particular stock?
- How can we predict, to some degree, future stock behavior?
1. Exploring stock prices, returns and averages
First, let's import our libraries and setup our environment, then save it.
# cross-compatible pandas and pdr versions conda install -qy -c conda-forge \ pandas=0.20.3 pandas-datareader=0.5.0 \ ipython scipy statsmodels seaborn cufflinks-py pip install fix_yahoo_finance
#Let's go ahead and start with some imports import pandas as pd import numpy as np import statsmodels # For Visualization import matplotlib import matplotlib.pyplot as plt import seaborn as sns sns.set_style('whitegrid') # plotly imports # plotly.plotly contains all the machinery to communicate with Plotly import plotly.plotly as py import plotly.figure_factory as ff # plotly.graph_objs contains all the helper classes to make/style plots from plotly.graph_objs import * # import cufflings to easily plot pandas data frames import cufflinks as cf #cf.set_config_file(theme='white') # for making some beautiful custom color scales # import colorlover as cl # For reading stock data from yahoo from pandas_datareader import data as pdr import pandas_datareader # For time stamps from datetime import datetime # import sys import sys import warnings warnings.filterwarnings('ignore') import fix_yahoo_finance as yf yf.pdr_override() # for reproducibility purposes, here are the versions of modules used print("statsmodels: {}".format(statsmodels.__version__)) print("numpy: {}".format(np.__version__)) print("pandas: {}".format(pd.__version__)) print("matplotlib: {}".format(matplotlib.__version__)) print("seaborn: {}".format(sns.__version__)) print("cufflinks: {}".format(cf.__version__)) print("python: {}".format(sys.version)) print("datareader: {}".format(pandas_datareader.version))
Using the pandas
library, let's fetch and analyze data for these companies stocks: Microsoft, Amazon, Apple and Google.
# Set up End and Start times for data grab end = datetime.now() # we want the start date to be a year a go start = datetime(end.year - 1,end.month,end.day) # let's grab stock data for Apple, Google, Microsoft and Amazon AAPL = pdr.get_data_yahoo('AAPL', start, end) GOOG = pdr.get_data_yahoo('GOOG', start, end) AMZN = pdr.get_data_yahoo('AMZN', start, end) MSFT = pdr.get_data_yahoo('MSFT', start, end) # change the Timestamp index in the four data frames to Date tech_list = [AAPL, GOOG, AMZN, MSFT] for stock in tech_list: stock["Date"] = stock.index.date for stock in tech_list: stock.set_index("Date", drop=True, inplace=True)
[ 0% ][*********************100%***********************] 1 of 1 downloaded[ 0% ][*********************100%***********************] 1 of 1 downloaded[ 0% ][*********************100%***********************] 1 of 1 downloaded[ 0% ][*********************100%***********************] 1 of 1 downloaded/opt/conda/lib/python3.6/site-packages/matplotlib/__init__.py:1357: UserWarning: This call to matplotlib.use() has no effect because the backend has already been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot, or matplotlib.backends is imported for the first time.
Let's see what's in the the AAPL
dataframe.
ff.create_table(AAPL.describe().round(2), index = True)
# General Info print(AAPL.info())
<class 'pandas.core.frame.DataFrame'> Index: 252 entries, 2016-08-25 to 2017-08-24 Data columns (total 6 columns): Open 252 non-null float64 High 252 non-null float64 Low 252 non-null float64 Close 252 non-null float64 Adj Close 252 non-null float64 Volume 252 non-null int64 dtypes: float64(5), int64(1) memory usage: 13.8+ KB None
Let's plot APPL's price change over time.
AAPL["Adj Close"].iplot(asFigure=True, color = "blue", layout=Layout(title = 'AAPL Closing Price'))
We see an upward trend. But it's ragged due to daily fluctuations. To get a better overview of price variations, we need to compute and plot the Moving Average.
Moving Average is a constantly updated average price for a stock over a specified period. For example, a 10-day Moving Average presents its first data point as the average of prices from Day 1 - Day 10. The next data point is the average of prices from Day 2 - Day 11. And so on.
Investors and traders use moving averages to track trends by glossing over day-to-day price fluctuations.
Let's use pandas
again to compute the 10, 20, and 50 day moving averages for Apple's stock. We can add these as new columns to the AAPL
data frame.
ma_day = [10, 20, 50] AAPL['10days MA'] = AAPL['Adj Close'].rolling(10).mean() AAPL['20days MA'] = AAPL['Adj Close'].rolling(20).mean() AAPL['50days MA'] = AAPL['Adj Close'].rolling(50).mean()
table = ff.create_table(AAPL.round(2).head(20), index = True, index_title = "Date") table['layout'].update(width = 800) table
Finally, we plot the moving averages.
# first, store the columns plottable columns in a list cols = ['Adj Close', '10days MA', '20days MA', '50days MA'] fig = AAPL[cols].iplot(asFigure=True, layout=Layout(title = "AAPL Moving Averages and Daily Price")) fig
Notice how the curves smooth out as we average over more days.
2. Analyzing Daily Returns
Closing prices show trends, but we can dig deeper. Let's review the daily return - the percentage change of a stock price from one day to the next.
# We'll use pct_change to find the percent change for each day AAPL['Daily Return'] = AAPL['Adj Close'].pct_change() AAPL['Daily Return'].iplot(asFigure=True, layout=Layout(title = "AAPL Daily Return"))
Unlike the daily stock price, daily returns seem to float around zero. Let's see why. We'll plot the distribution of the average daily return:
import plotly # setting up the layout of matplotlib figure fig, ax = plt.subplots() # use the distplot function from seaborn to plot the distribution of AAPL sns.distplot(AAPL['Daily Return'].dropna(), color = 'indianred', hist_kws=dict(alpha = 0.7)) # remove the plot borders sns.despine() #get the plotly json object fig = plotly.tools.mpl_to_plotly(fig) # polish the histogram plot fig['data'][1]['marker']['line']['color'] = 'white' fig['data'][1]['marker']['line']['width'] = 1 # controlling the size of the figure and adding title fig['layout'].update(title = "AAPL Daily Return Distribution", width = 850, height = 450, xaxis = dict(showgrid=False)) fig
This looks like a normal distribution, right? Why is this important?
As we'll see later in the article, to assess a stock's risk, we want to estimate the probability of more extreme price changes. The distribution will detect this.
Let's analyze the returns of all the stocks. We'll build a dataframe with the Close
columns from each stock dataframe.
# Grab all the closing prices for the tech stock list into one DataFrame stocks = ['AAPL','GOOG','MSFT','AMZN'] closing_df = pdr.get_data_yahoo(stocks,start,end)['Adj Close'] # change the timestamp to date closing_df["Date"] = closing_df.index.date closing_df.set_index("Date", drop=True, inplace=True)
[ 0% ][**********************50% ] 2 of 4 downloaded[**********************75%*********** ] 3 of 4 downloaded[*********************100%***********************] 4 of 4 downloaded[*********************100%***********************] 4 of 4 downloaded/opt/conda/lib/python3.6/site-packages/matplotlib/__init__.py:1357: UserWarning: This call to matplotlib.use() has no effect because the backend has already been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot, or matplotlib.backends is imported for the first time.
# let's have a sneak peak at our data frame ff.create_table(closing_df.round(2).head(), index = True, index_title = "Date")
Let's compare the closing prices.
closing_df.iplot(asFigure = True, subplots = True, layout=Layout(title="Closing Price Comparison"))
We get a nice overview here, but we can't do cross sectional comparison: comparing stock prices at a specific point of time. Let's stack the stocks on a single plot. We can hover to see the price of all stocks on a specific day.
closing_df.iplot(fill=True, asFigure=True, layout=Layout(title="Daily Closing Price"))
And now, the daily returns for all four stocks.
# Make a new tech returns DataFrame tech_rets = closing_df.pct_change() # have a look at our returns ff.create_table(tech_rets.round(6).head(), index = True)
tech_rets.iplot(kind='box', asFigure = True, layout=Layout(title = "Distribution of Stocks Returns"))
Notice how all stocks show very similar returns distribution.
3. Correlation Analysis
In the world of finance, correlation is a statistical measure of how two stocks move in relation to one another.
Correlation is represented by the correlation coefficient, also known as the Pearson correlation coefficient (pearson's r), which ranges between -1 and +1.
When the prices of two stocks usually move in a similar direction, the stocks are considered positively correlated. The amount of correlation ranges from 0, which means no correlation, to 1, which means perfect correlation. Perfect correlation means the relationship that appears to exist between two stocks is positive 100% of the time.
For example, if the prices of oil-related and gas-related stocks increase, they are considered to have a positive correlation. The more predictably they do this, the closer the correlation coefficient is to +1.
When the prices of two securities consistently move in opposite directions, they are negatively correlated. This is measured from 0 to -1, where 0 is no correlation, and -1 is perfect negative correlation. A perfect negative correlation means the relationship between two securities is opposite 100% of the time.
No-correlation stocks move independently of each other, with no apparent relationship.
The picture below summarizes what we've just said:
The importance of correlation analysis appears clearly if you're managing a portfolio of stocks. You want the stocks in your portfolio to have low (or no) correlation between them. When some stocks are losing money, other non-correlated stocks will likely still be gaining or not moving at all, reducing the investor's losses.
This article is not dedicated to portfolio management. But correlation is an essential component of the financial analyst's toolbox.
So let's use pandas
, seaborn
and plotly
to see how these four stocks correlate with each other.
Let's look at the correlation between Google's stock and itself. What might we expect?
import plotly sns.jointplot('GOOG','GOOG', data = tech_rets, kind='scatter', color='purple', edgecolor="w", s = 40, alpha = 0.8, marginal_kws=dict(bins=20), space = 0.3, xlim = (-0.06, 0.06), ylim = (-0.06, 0.06)) # get the current figure fig = plt.gcf() # convert it to plotly figure fig = plotly.tools.mpl_to_plotly(fig) # update figure layout fig['layout'].update(# control x and y axes of the marginal histograms yaxis2 = dict(nticks = 4, showgrid = False), xaxis2 = dict(ticks = "", showticklabels = False, showgrid = True, showline = False), # control x and y axes of the right marginal histogram yaxis3 = dict(ticks = "", showticklabels = False), xaxis3 = dict(nticks = 5, showgrid = False), width = 850, height = 450) fig
Comparing Google to itself, logically, shows a perfectly linear relationship.
This help us see that two perfectly (and positively) correlated stocks have a linear relationship bewteen their daily return values.
Let's correlate Google and Microsoft.
import plotly sns.jointplot('GOOG','MSFT', data = tech_rets, kind='scatter', color='seagreen', edgecolor="w", s = 50, space = 0.3, marginal_kws=dict(bins=20), xlim = (-0.06, 0.06)) # get the current matplotlib figure fig = plt.gcf() # convert matplotlib figure to a plotly one plotly_fig = plotly.tools.mpl_to_plotly(fig) # change the marker line color and width for trace in plotly_fig['data']: #trace['opacity'] = 0.7 trace['marker']['line']['color'] = 'white' trace['marker']['line']['width'] = 0.65 # polish the layout by controlling the ticks of the marginal histograms plotly_fig['layout'].update(# control x and y axes of # the top marginal histogram yaxis2 = dict(nticks = 4, showgrid = False), xaxis2 = dict(ticks = "", showline = False, showticklabels = False, showgrid = True), # control x and y axes of # the right marginal histogram yaxis3 = dict(ticks = "", showticklabels = False), xaxis3 = dict(nticks = 5, showgrid = False), width = 850, height = 450) plotly_fig
QFontDatabase: Cannot find font directory /opt/conda/lib/fonts - is Qt installed correctly? QFontDatabase: Cannot find font directory /opt/conda/lib/fonts - is Qt installed correctly? QFontDatabase: Cannot find font directory /opt/conda/lib/fonts - is Qt installed correctly? QFontDatabase: Cannot find font directory /opt/conda/lib/fonts - is Qt installed correctly? /opt/conda/lib/python3.6/site-packages/plotly/matplotlylib/renderer.py:516: UserWarning: Looks like the annotation(s) you are trying to draw lies/lay outside the given figure size. Therefore, the resulting Plotly figure may not be large enough to view the full text. To adjust the size of the figure, use the 'width' and 'height' keys in the Layout object. Alternatively, use the Margin object to adjust the figure's margins. /opt/conda/lib/python3.6/site-packages/plotly/matplotlylib/renderer.py:481: UserWarning: I found a path object that I don't think is part of a bar chart. Ignoring.
As we said before, the pearsonr
value gives you a sense of how correlated the daily percentage returns are. The correlation is relatively high at 0.67.
Let's repeat this comparison for every combination of our four stocks.
# the same scatter plot using plotly instead of cufflinks fig = ff.create_scatterplotmatrix(tech_rets, diag='histogram', size = 5, height=740, width=880) for trace in fig['data']: trace['opacity'] = 0.7 trace['marker'] = dict(color = "blue", line = dict(color = 'white', width = 0.7)) fig
A quick glance shows all four stocks are highly correlated with each other. This which means you should not put all of them in one portfolio. Rather, you should pick the least correlated ones and combine them with stocks from other industries that have low correlation with the chosen stocks. This is a wise path towards a more diverse portfolio, and reducing losses.
But we can't tell, just by looking at this chart, which pair are the most and least correlated. For that, we employ the heatmap.
tech_rets.dropna().corr().iplot(kind = 'heatmap', layout=Layout(title = "Stocks' Correlations"), colorscale = 'YlGnBu', asFigure = True)
Better! Now we're ready to determine the risk of investing in any of these stocks.
4. Risk Analysis: Should we invest these stocks?
Let's look at one simple way to evaluate risk — comparing the expected return with the standard deviation of the daily returns.
rets = tech_rets.dropna() text = list(rets.mean().index.values) fig = Figure() fig['data'].append(Scatter(x = rets.mean(), y = rets.std(), mode = 'markers+text', marker = dict(color = 'orange', size = 10, opacity = 0.7, line = dict(color = 'darkorange', width = 1)))) fig['layout'].update(title = "Standard Deviation (Risk) vs. Expected Return", xaxis = dict(title = "Expected Return", range = (-0.002, 0.002)), yaxis = dict(title = "Standard Deviation (Risk)", range = (0, 0.02)), annotations = [ dict(showarrow=True, arrowhead=2, arrowsize = 2, arrowwidth = 1, x = x, y = y, xref='x', yref='y', text = stock, ax=20, bordercolor='#c7c7c7', borderwidth=2, borderpad=4, bgcolor='#ff7f0e', opacity=0.8, ay=-40) for x,y,stock in zip(rets.mean(), rets.std(), text) ] ) fig
We can see that risk is usually positively correlated with expected return.
If you're risk averse, you could opt for a low-risk, low-return stock. On the other hand, if you're a risk taker you might pick the highest paying stock, knowing you face a higher probability of loss.
While standard deviation is a simple and quick measurement, it can mislead you. It measures how much the return on investment varies, but it ignores the direction of that variation. A stock can be volatile because it suddenly rises. Of course, investors are not distressed by gains!
So we need to look beyond the volatility of returns, at Value at Risk.
4.1. Value at Risk
Value at risk (VaR) is the amount of money we expect to lose for a given period of time and a given confidence interval.
The definition exposes three values: a time period, a confidence level and a loss amount (or loss percentage). For example:
- With a 95% or 99% level of confidence, how much can I expect to lose over the next month?
- With 95% or 99% confidence, what is the maximum percentage I can expect to lose over the next year?
Let's example two approaches that consider these values: the Historical method and the Monte Carlo simulation.
4.1.1. Computing Value at Risk Using the Historical Method
The Historical method, also known as the Bootstrap method, simply reorders historical returns from worst to best. From a risk perspective, it assumes that history will repeat.
Do you remember AAPL's daily returns histogram? Let's see it again:
import plotly # Note the use of dropna() here, otherwise the NaN values can't be read by seaborn fig, ax = plt.subplots() sns.despine() sns.distplot(AAPL['Daily Return'].dropna(), color = "darkcyan", hist_kws = dict(alpha=0.7), ax=ax) plotly_fig = plotly.tools.mpl_to_plotly(fig) plotly_fig['data'][1]['marker']['line']['color'] = 'white' plotly_fig['data'][1]['marker']['line']['width'] = 1 plotly_fig['layout'].update(title = "AAPL Daily Return Distribution", paper_bgcolor = 'white', xaxis = dict(showgrid=False), width = 850, height = 450) plotly_fig
To compute Value at Risk for Apple's stock over the next year with 95% confidence, we need to compute the 0.05 quantile of Apple's last year daily returns distribution. For more information on quantiles, check out this link.
# The 0.05 empirical quantile of daily returns var_5_quant = round(tech_rets['AAPL'].quantile(0.05), 3) var_5_quant
-0.016
# the percent value of the 5th quantile var_5_perc = "{:.1f}%".format(-var_5_quant*100) var_5_perc
# VaR for 1,000,000 investment var_one_mil = "${}".format(int(-var_5_quant * 1000000)) var_one_mil
So the 0.05 empirical quantile of daily returns is at nilquantile↩. That means with 95% confidence, our worst daily loss will not exceed nilquantile↩
So if we have 1 million dollar investment, our one-day 5% VaR is nilquantile %↩ x 1,000,000 = nilVaR @ 1M↩.
4.1.2. Estimating Value at Risk Using Monte Carlo Simulation
A Monte Carlo simulation refers to any method that randomly generates trials, but does not tell us anything about the underlying methodology. Here, we use it to develop a model for predicting future stock price returns. Then, we run multiple hypothetical trials through that model.
We'll use geometric Brownian motion (GBM), which is technically known as a Markov process. This means the stock price follows a random walk and is consistent with, at the very least, the weak form of the efficient market hypothesis (EMH): past price information is already incorporated, and the next price movement is conditionally independent of past price movements.
This means that the past stock prices are independent of where the stock price will be in the future. In other words, you can't predict the future solely based on the previous price of a stock.
The equation for geometric Browninan motion is given by the following:
Where is the stock price, is the expected return (which we calculated earlier), is the standard deviation of the returns, is time, and is the random variable.
We can mulitply both sides by the stock price to rearrange the formula and solve for the stock price.
Now we see that the change in the stock price is the current stock price multiplied by two terms. The first term is known as drift, which is the average daily return multiplied by the change of time. The second term is known as shock.
For each time period the stock will drift and then experience a shock which will randomly push the stock price up or down. By simulating this series of steps of drifts and shocks thousands of times, we can simulate where we might expect the stock price to be.
Let's start with just a few simulations. First we'll define the variables we'll be using for the Google dataframe.
# Set up our time horizon days = 365 # Now our delta dt = 1.0/days # Now let's grab our mu (drift) from the expected return data we got for GOOG mu = rets.mean()['GOOG'] # Now let's grab the volatility of the stock from the std() of the average return sigma = rets.std()['GOOG'] # Next, we will create a function that takes in the starting price and # number of days, and uses the σ and μ we already calculated from our # daily returns def stock_monte_carlo(start_price, days, mu, sigma, dt): import numpy as np price = np.zeros(days) price[0] = start_price # shock and drift shock = np.zeros(days) drift = np.zeros(days) # run price array for number of days for x in range(1, days): drift_delta = mu * dt # calculate shock shock[x] = np.random.normal(loc=drift_delta, scale=sigma * np.sqrt(dt)) # calculate drift drift[x] = drift_delta # calculate price price[x] = price[x-1] + (price[x-1] * (drift[x] + shock[x])) return price
Let's put our function to work.
# Get start price start_price = GOOG["Adj Close"][0] prices = stock_monte_carlo(start_price, days, mu, sigma, dt) pd.Series(prices).iplot(color = 'blue', asFigure = True, layout=Layout(title = "Simulating GOOG Daily Price over 1 year"))
Let's run a full Monte Carlo simulation 100 times.
fig = Figure() for run in range(100): fig['data'].append(Scatter(y= stock_monte_carlo(start_price,days,mu,sigma,dt))) fig['layout'].update(title = "Monte Carlo Simulation for Google Stock over 1 year", xaxis = dict(title = "Days"), yaxis = dict(title = "Price"), showlegend = False) fig
Finally, let's get the end results for a much larger run.
# Set a large numebr of runs runs = 10000 # Create an empty matrix to hold the end price data simulations = np.zeros(runs) # Set the print options of numpy to only display 0-5 points from an array # to suppress output np.set_printoptions(threshold=5) for run in range(runs): # Set the simulation data point as the last stock price for that run simulations[run] = stock_monte_carlo(start_price,days,mu,sigma,dt)[days-1]
Now that we have our array of simulations, we can plot the distribution of the final results. We'll use the 1% quantile to define our risk for this stock.
# Now we'lll define q as the 1% empirical qunatile, # this basically means that 99% of the values should fall between here q = np.percentile(simulations, 1) # Now let's plot the distribution of the end prices fig = Figure() fig['data'].append(Histogram(x = simulations, nbinsx = 100, marker = dict(line = dict(color = 'white', width = 1.3)))) fig['layout'].update(title = 'Distribution of Final Results', annotations = [# Starting Price dict(x = 0.9, y = 0.9, showarrow = False, xref = 'paper', yref = 'paper', text = "Start Price: ${:.2f}".format(start_price)), # Mean ending price dict(x = 0.9, y = 0.8, showarrow = False, xref = 'paper', yref = 'paper', text = "Mean Final Price: ${:.2f}".format(simulations.mean())), # Variance of the price (within 99% confidence interval) dict(x = 0.9, y = 0.7, showarrow = False, xref = 'paper', yref = 'paper', text = "VaR(0.99): ${:.2f}".format(start_price - q)), dict(x = 0.07, y = 0.8, showarrow = False, xref = 'paper', yref = 'paper', text = "q(0.99): ${:.2f}".format(q))], shapes = [# Plot a line at the 1% quantile result dict(type = 'line', x0 = q, x1 = q, y0 = 0, y1 = 500, line = dict(color = 'red', width = 2))]) fig
"${}".format(round(start_price - q,2))
"${}".format(round(start_price, 2))
So we have looked at the 1% empirical quantile of the final price distribution to estimate Value at Risk for Google stock, which looks to be nil↩ for every investment of nil↩ (the starting price of google stock).
For every initial stock you purchase, you're putting about nil↩ at risk 99% of the time.