IB Webinar

2018-09-27

First, install our packages:

install.packages("tidyverse")
install.packages("tidyquant")
install.packages("timetk")
install.packages("tibbletime")
install.packages("scales")
install.packages("broom")

devtools::install_github("jbkunst/highcharter")

Introducing R and RStudio

+ Statistical programming language -> by data scientists, for data scientists
+ Base R + 17,000 packages
+ RStudio
+ Shiny

Packages for today

library(tidyverse)
library(tidyquant)
library(timetk)
library(tibbletime)
library(scales)
library(highcharter)
library(broom)
library(PerformanceAnalytics)

List of packages for finance here: https://cran.r-project.org/web/views/Finance.html

Today’s project

+ Buy when the SP500 50-day SMA is above the 200-day SMA (a 'golden cross')

+ Sell when the 50-day SMA moves below the 200-day SMA (a 'death cross'). 

+ Code up that strategy

+ visualiize its results and descriptive statistics

+ Compare it to buy-and-hold, add secondary logic, visualize that strategy

+ Conclude by building a Shiny dashboard for further exploration

Import data

We will be working with SP500 and treasury bills data so that when we exit the SP500 we can invest in treasuries and get some return.

We can use the tidyquant package and it’s tq_get() function to grab the data from yahoo! finance.

symbols <- c("^GSPC", "^IRX")


prices <- 
  tq_get(symbols, 
         get = "stock.prices",
         from = "1990-01-01")

head(prices)

# A tibble: 6 x 8
  symbol date        open  high   low close    volume adjusted
  <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
1 ^GSPC  1990-01-02  353.  360.  352.  360. 162070000     360.
2 ^GSPC  1990-01-03  360.  361.  358.  359. 192330000     359.
3 ^GSPC  1990-01-04  359.  359.  353.  356. 177000000     356.
4 ^GSPC  1990-01-05  356.  356.  351.  352. 158530000     352.
5 ^GSPC  1990-01-08  352.  354.  351.  354. 140110000     354.
6 ^GSPC  1990-01-09  354.  354.  350.  350. 155210000     350.

Let’s see how to import this data from an Excel file or a csv file if that were desired.

prices_excel <-  
 read_excel("prices.xlsx") %>% 
  mutate(date = ymd(date))

prices_csv <- 
 read_csv("prices.csv")  %>% 
  mutate(date = ymd(date))

Explore the raw data

Start with the simple line chart.

prices %>% 
  filter(symbol == "^GSPC") %>% 
  hchart(., 
         hcaes(x = date, y = adjusted),
         type = "line") %>% 
  hc_title(text = "GSPC prices")

Now we have our raw data and it looks like there are no issues with it.

Calculate Returns

We will use select(), rename, and mutate() from the dplyr package to calculate returns, then table.Stats() from PerformanceAnayltics to get some quick stats.

  prices %>% 
  select(symbol, date, adjusted) %>% 
  spread(symbol, adjusted) %>%
  rename(sp500 = `^GSPC`, treas = `^IRX`) %>% 
  mutate(sp500_returns = log(sp500) - log(lag(sp500)),
         daily_treas = (1 + (treas/100)) ^ (1/252) - 1) %>% 
  select(date, sp500_returns, daily_treas) %>% 
  tq_performance(Ra = sp500_returns,
                 performance_fun = table.Stats) %>% 
  t() %>% 
  knitr::kable()

ArithmeticMean	0.0003
GeometricMean	0.0002
Kurtosis	8.9812
LCLMean(0.95)	0.0000
Maximum	0.1096
Median	0.0005
Minimum	-0.0947
NAs	1.0000
Observations	7243.0000
Quartile1	-0.0044
Quartile3	0.0055
SEMean	0.0001
Skewness	-0.2644
Stdev	0.0110
UCLMean(0.95)	0.0005
Variance	0.0001

We calculated returns and demnostrated 4 of the most common data wrangling functions. We also transformed our data, it’s not raw anymore. Let’s visualize our transformation with a scatter plot by calling geom_point().

prices %>% 
  select(symbol, date, adjusted) %>% 
  spread(symbol, adjusted) %>%
  rename(sp500 = `^GSPC`, treas = `^IRX`) %>% 
  mutate(sp500_returns = log(sp500) - log(lag(sp500))) %>% 
  ggplot(aes(x = date, y = sp500_returns)) +
  geom_point(color = "cornflowerblue") +
  scale_x_date(breaks = pretty_breaks(n = 30)) +
  labs(title = "SP500 daily returns",
      y = "daily percent") +
  theme(axis.text.x = element_text(angle = 90))

Look at 2000-2002 and late 2008 to 2009.

Moving Averages: our strategy

To start exploring the SMA 50 v SMA 200 strategy, we need to calculate the rolling 50 and 200-day moving averages. Up until now, we have been using built-in functions from various R packages. Now we will create our own with rollify(), to run the mean() function into a moving average calculator.

roll_mean_50 <- 
  rollify(mean, window = 50)

roll_mean_200 <- 
  rollify(mean, window = 200)

Now we use mutate() to add the moving averages to our original data. mutate() adds columns to our data.

prices %>% 
  select(symbol, date, adjusted) %>% 
  spread(symbol, adjusted) %>%
  rename(sp500 = `^GSPC`, treas = `^IRX`) %>% 
  mutate(sma_200 = roll_mean_200(sp500),
         sma_50 = roll_mean_50(sp500)) %>% 
  na.omit() %>% 
  tail(5)

# A tibble: 5 x 5
  date       sp500 treas sma_200 sma_50
  <date>     <dbl> <dbl>   <dbl>  <dbl>
1 2018-09-24 2919.  2.12   2752.  2860.
2 2018-09-25 2916.  2.17   2753.  2862.
3 2018-09-26 2906.  2.16   2754.  2864.
4 2018-09-27 2914   2.14   2756.  2866.
5 2018-09-28 2914.  2.15   2757.  2868.

Let’s get systematic

What’s our logic: If the 50-day MA is above the 200-day MA, buy the market, else go to the risk free return, or can put to zero if we prefer. For our purposes, we assume no taxes or trading costs. Buy == get the daily return.

What we need:

1) rolling 50-day SMA
2) rolling 200-day SMA
3) if_else logic to create a buy or sell signal
4) sp500 daily returns/treasury daily returns
5) apply the signal to our returns with

signal = if_else(sma_50 > sma_200, 1, 0)

trend_returns = if_else(lag(signal) == 1, (signal * sp500_returns), daily_treas))

prices %>% 
  select(symbol, date, adjusted) %>% 
  spread(symbol, adjusted) %>%
  rename(sp500 = `^GSPC`, treas = `^IRX`) %>% 
  mutate(sp500_returns = log(sp500) - log(lag(sp500)), 
         daily_treas = (1 + (treas/100)) ^ (1/252) - 1,
         sma_200 = roll_mean_200(sp500),
         sma_50 = roll_mean_50(sp500)) %>% 
  na.omit() %>% 
  mutate(signal = if_else(sma_50 > sma_200, 1, 0),
         trend_returns = if_else(lag(signal) == 1, (signal * sp500_returns), daily_treas)) %>% 
  select(-treas, -sp500)

# A tibble: 7,018 x 7
   date       sp500_returns daily_treas sma_200 sma_50 signal
   <date>             <dbl>       <dbl>   <dbl>  <dbl>  <dbl>
 1 1990-10-15      0.0106      0.000274    339.   319.      0
 2 1990-10-16     -0.0143      0.000274    339.   318.      0
 3 1990-10-17     -0.000535    0.000276    338.   317.      0
 4 1990-10-18      0.0231      0.000278    338.   317.      0
 5 1990-10-19      0.0218      0.000278    338.   316.      0
 6 1990-10-22      0.00727     0.000277    338.   316.      0
 7 1990-10-23     -0.00765     0.000276    337.   315.      0
 8 1990-10-24      0.000768    0.000277    337.   315.      0
 9 1990-10-25     -0.00780     0.000275    337.   314.      0
10 1990-10-26     -0.0178      0.000273    337.   313.      0
# ... with 7,008 more rows, and 1 more variable: trend_returns <dbl>

We now have a column for our trend strategy returns. Let’s add a buy and hold strategy where we invest 90% in the SP500 and 10% in treasuries and hold it for the duration.

buy_hold_returns = (.9 * sp500_returns) + (.1 * daily_treas)

prices %>% 
  select(symbol, date, adjusted) %>% 
  spread(symbol, adjusted) %>%
  rename(sp500 = `^GSPC`, treas = `^IRX`) %>% 
  mutate(
         sp500_returns = log(sp500) - log(lag(sp500)),
         daily_treas = (1 + (treas/100)) ^ (1/252) - 1,
         sma_200 = roll_mean_200(sp500),
         sma_50 = roll_mean_50(sp500)) %>% 
  na.omit() %>% 
  mutate(signal = if_else(sma_50 > sma_200, 1, 0),
         buy_hold_returns = (.9 * sp500_returns) + (.1 * daily_treas),
         trend_returns = if_else(lag(signal) == 1, (signal * sp500_returns), daily_treas)) %>% 
  select(date, trend_returns, buy_hold_returns)

# A tibble: 7,018 x 3
   date       trend_returns buy_hold_returns
   <date>             <dbl>            <dbl>
 1 1990-10-15     NA                0.00958 
 2 1990-10-16      0.000274        -0.0129  
 3 1990-10-17      0.000276        -0.000454
 4 1990-10-18      0.000278         0.0208  
 5 1990-10-19      0.000278         0.0197  
 6 1990-10-22      0.000277         0.00657 
 7 1990-10-23      0.000276        -0.00686 
 8 1990-10-24      0.000277         0.000719
 9 1990-10-25      0.000275        -0.00700 
10 1990-10-26      0.000273        -0.0160  
# ... with 7,008 more rows

We can add columns to see dollar growth for our two asset mixes as well. To calculate dollar growth, we add 1 to the daily returns, and then use accumulate().

sma_trend_results <- 
prices %>% 
  select(symbol, date, adjusted) %>% 
  spread(symbol, adjusted) %>%
  rename(sp500 = `^GSPC`, treas = `^IRX`) %>% 
  mutate(sma_200 = roll_mean_200(sp500),
         sma_50 = roll_mean_50(sp500),
         signal = if_else(sma_50 > sma_200, 1, 0),
         sp500_returns = log(sp500) - log(lag(sp500)), 
         daily_treas = (1 + (treas/100)) ^ (1/252) - 1,
         buy_hold_returns = (.9 * sp500_returns) + (.1 * daily_treas),
         trend_returns = if_else(lag(signal) == 1, (signal * sp500_returns), daily_treas)
         ) %>%
  na.omit() %>% 
  mutate(trend_growth = accumulate(1 + trend_returns, `*`),
         buy_hold_growth = accumulate(1 + buy_hold_returns, `*`))

sma_trend_results %>% tail()

# A tibble: 6 x 12
  date       sp500 treas sma_200 sma_50 signal sp500_returns daily_treas
  <date>     <dbl> <dbl>   <dbl>  <dbl>  <dbl>         <dbl>       <dbl>
1 2018-09-21 2930.  2.12   2750.  2857.      1   -0.000369     0.0000834
2 2018-09-24 2919.  2.12   2752.  2860.      1   -0.00352      0.0000834
3 2018-09-25 2916.  2.17   2753.  2862.      1   -0.00131      0.0000851
4 2018-09-26 2906.  2.16   2754.  2864.      1   -0.00329      0.0000848
5 2018-09-27 2914   2.14   2756.  2866.      1    0.00276      0.0000840
6 2018-09-28 2914.  2.15   2757.  2868.      1   -0.00000687   0.0000844
# ... with 4 more variables: buy_hold_returns <dbl>, trend_returns <dbl>,
#   trend_growth <dbl>, buy_hold_growth <dbl>

Visualize our trend strategy results

How does the dollar growth compare? use select() again to isolate our dollar growth columns. then gather() to a tidy format. What’s tidy? Each variable has its own column. That’s different from wide, which is easier for humans to read.

sma_trend_results %>%
  select(date, trend_growth, buy_hold_growth) %>% 
  gather(strategy, growth, -date) %>%
  hchart(., hcaes(x = date, y = growth, group = strategy),
         type = "line") %>% 
  hc_tooltip(pointFormat = "{point.strategy}: ${point.growth: .2f}")

Table Stats of our strategies

sma_trend_results %>%
  select(date, trend_returns, buy_hold_returns) %>%
  gather(strategy, returns, -date) %>%
  group_by(strategy) %>% 
  tq_performance(Ra = returns, 
                 performance_fun = table.Stats) %>% 
  t() %>% 
  knitr::kable()

strategy	trend_returns	buy_hold_returns
ArithmeticMean	3e-04	3e-04
GeometricMean	3e-04	2e-04
Kurtosis	8.0652	9.1202
LCLMean(0.95)	2e-04	1e-04
Maximum	0.0499	0.0986
Median	1e-04	5e-04
Minimum	-0.0711	-0.0852
NAs	0	0
Observations	7017	7017
Quartile1	-0.0020	-0.0039
Quartile3	0.0033	0.0050
SEMean	1e-04	1e-04
Skewness	-0.5141	-0.2665
Stdev	0.0076	0.0100
UCLMean(0.95)	5e-04	5e-04
Variance	1e-04	1e-04

Part 2: Adding another layer of logic

Let’s get more complicated and add secondary logic.

We can also add a signal, for example a zscore for when market is a number of standard deviations above or below a rolling average.

How would we implement that?

Calculate a spread
turn it into a z-score, number of standard deviations scale
create a signal (back to ifelse)

trend_z_results <- 
prices %>% 
  select(symbol, date, adjusted) %>% 
  spread(symbol, adjusted) %>%
  rename(sp500 = `^GSPC`, treas = `^IRX`) %>% 
  mutate(sma_200 = roll_mean_200(sp500),
         sma_50 = roll_mean_50(sp500),
         sp500_returns = log(sp500) - log(lag(sp500)),
         daily_treas = (1 + (treas/100)) ^ (1/252) - 1) %>% 
  na.omit() %>% 
  mutate(trend_signal = if_else(sma_50 > sma_200, 1, 0),
         z_spread = (sp500 - sma_200),
         z_score = (z_spread - mean(z_spread))/sd(z_spread),
         z_signal = if_else(
                            lag(z_score, 1) < -.05 & 
                            lag(z_score, 2) < -.05 &
                            lag(z_score, 3) < -.05, 
                            0, 1),
         trend_z_returns = if_else(lag(trend_signal) == 1 &
                                 z_signal == 1, 
                                 (trend_signal * sp500_returns), daily_treas),
         trend_returns =  if_else(lag(trend_signal) == 1,
                                 (trend_signal * sp500_returns), daily_treas),
         buy_hold_returns = (.9 * sp500_returns) + (.1 * daily_treas)) %>% 
  select(date, trend_signal, z_signal, buy_hold_returns, trend_returns, trend_z_returns, daily_treas) %>%
  na.omit() %>% 
  mutate(
         trend_growth = accumulate(1 + trend_returns, `*`),
         trend_z_growth = accumulate(1 + trend_z_returns, `*`),
         buy_hold_growth = accumulate(1 + buy_hold_returns, `*`))

trend_z_results %>% tail()

# A tibble: 6 x 10
  date       trend_signal z_signal buy_hold_returns trend_returns
  <date>            <dbl>    <dbl>            <dbl>         <dbl>
1 2018-09-21            1        1      -0.000323     -0.000369  
2 2018-09-24            1        1      -0.00316      -0.00352   
3 2018-09-25            1        1      -0.00117      -0.00131   
4 2018-09-26            1        1      -0.00296      -0.00329   
5 2018-09-27            1        1       0.00249       0.00276   
6 2018-09-28            1        1       0.00000226   -0.00000687
# ... with 5 more variables: trend_z_returns <dbl>, daily_treas <dbl>,
#   trend_growth <dbl>, trend_z_growth <dbl>, buy_hold_growth <dbl>

Visualize our trend + z strategy results

trend_z_results %>%
  select(date, trend_growth, trend_z_growth, buy_hold_growth, daily_treas) %>% 
  gather(strategy, growth, -date, -daily_treas) %>% 
  hchart(., hcaes(x = date, y = growth, group = strategy), type = "line") %>% 
  hc_tooltip(pointFormat = "{point.strategy}: ${point.growth: .2f}")

Our original trend has grown higher, but the z-score logic seems more less prone to large drawdowns becaus it has the extra layer of downside exit logic.

Table Stats of our new strategies

trend_z_results %>%
  select(date, trend_returns, trend_z_growth, buy_hold_returns) %>%
  gather(strategy, returns, -date) %>%
  group_by(strategy) %>% 
  tq_performance(Ra = returns, 
                 performance_fun = table.Stats) %>% 
  t() %>% 
  knitr::kable()

strategy	trend_returns	trend_z_growth	buy_hold_returns
ArithmeticMean	0.0003	2.4200	0.0003
GeometricMean	0.0003	2.2935	0.0002
Kurtosis	8.0669	-0.2279	9.1226
LCLMean(0.95)	0.0002	2.3984	0.0001
Maximum	0.0499	4.9485	0.0986
Median	0.0001	2.4349	0.0005
Minimum	-0.0711	0.9910	-0.0852
NAs	0	0	0
Observations	7015	7015	7015
Quartile1	-0.0020	2.0014	-0.0039
Quartile3	0.0033	2.8294	0.0050
SEMean	0.0001	0.0110	0.0001
Skewness	-0.5140	0.3492	-0.2668
Stdev	0.0076	0.9244	0.0100
UCLMean(0.95)	0.0005	2.4416	0.0005
Variance	0.0001	0.8546	0.0001