BattleFin presentation

by Jonathan Regenstein

Introducing R and RStudio

+ Statistical programming language -> by data scientists, for data scientists
+ Base R + 17,000 packages
+ RStudio
+ Shiny
+ sparklyr -> big data 
+ tensorflow -> AI
+ Rmarkdown -> reproducible reports
+ database connectors
+ htmlwidgets

Packages for today

library(tidyverse)
library(tidyquant)
library(timetk)
library(tibbletime)
library(highcharter)
library(PerformanceAnalytics)
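
If any of these packages are not yet installed, `library()` stops with an error. Below is a minimal base-R sketch for listing which ones still need installing. `missing_packages()` is a hypothetical helper name and is not part of any package above. The install call is left commented out because it needs an internet connection.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of erroring) when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("tidyverse", "tidyquant", "timetk", "tibbletime",
          "highcharter", "PerformanceAnalytics")

to_install <- missing_packages(pkgs)
to_install  # character(0) means everything is already installed
# install.packages(to_install)  # run once if anything is listed (needs internet)
```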

More packages for finance here: https://cran.r-project.org/web/views/Finance.html

Today’s project

+ Import and wrangle data on the financial services SPDR ETF

+ Import and wrangle data from Freddie Mac on housing prices

+ Try to find a signal and code up a toy strategy

+ Visualize its results and descriptive statistics

+ Compare it to buy-and-hold

+ Conclude by building a Shiny dashboard for further exploration

+ Data science workflow

Import data

We will use the tidyquant package and its tq_get() function to grab the data from public sources.

In real life, this would be a pointer to your proprietary source of market data: most likely a database or a data lake, possibly an Excel spreadsheet or a CSV file.
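
If the prices lived in a local CSV file, a base-R import might look like the sketch below. The file name and the three-column layout (`date`, `close`, `adjusted`) are assumptions. To keep the example self-contained, it first writes the five XLF rows printed by `slice(1:5)` later in this section to a temporary file.

```r
# Assumed layout: one row per trading day with date, close and adjusted columns.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "date,close,adjusted",
  "1998-12-22,18.9,9.71",
  "1998-12-23,19.2,9.85",
  "1998-12-24,19.3,9.92",
  "1998-12-28,19.1,9.79",
  "1998-12-29,19.3,9.89"
), csv_path)

prices_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
# read.csv() reads dates as character strings, so convert them explicitly
prices_csv$date <- as.Date(prices_csv$date)
# Sort by date so later lag-based return calculations line up correctly
prices_csv <- prices_csv[order(prices_csv$date), ]
str(prices_csv)
```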

symbols <- "XLF"


prices <- 
  tq_get(symbols, 
         get = "stock.prices",
         from = "1998-01-01")


prices %>% 
  slice(1:5)
# A tibble: 5 x 7
  date        open  high   low close volume adjusted
  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1 1998-12-22  19.1  19.1  18.8  18.9  55800     9.71
2 1998-12-23  18.9  19.2  18.9  19.2  78700     9.85
3 1998-12-24  19.2  19.3  19.2  19.3  43800     9.92
4 1998-12-28  19.3  19.3  19.0  19.1  51900     9.79
5 1998-12-29  19.1  19.3  18.9  19.3 100800     9.89

Start with a line chart.

We will use highcharter to create a quick interactive chart.

hc_prices_daily <- 
  prices %>% 
  # interactive line chart of the adjusted price
  hchart(., 
         hcaes(x = date, y = adjusted),
         type = "line") %>% 
  hc_title(text = "Explore prices") %>% 
  hc_tooltip(pointFormat = "XLF: ${point.y:.2f}")

hc_prices_daily

Why start with a simple line chart? It's a quick way to make sure our data isn't corrupted or missing values.
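
The chart makes problems visible, and the same checks can also be scripted. Below is a minimal base-R sketch. `check_prices()` is a hypothetical helper, and with the tibble above you would call `check_prices(prices$date, prices$adjusted)`. To stay self-contained, it uses the five adjusted prices printed earlier.

```r
# Hypothetical helper: basic sanity checks on a price series before computing returns.
check_prices <- function(dates, prices) {
  list(
    n_missing     = sum(is.na(prices)),                   # missing prices
    n_nonpositive = sum(prices <= 0, na.rm = TRUE),       # log() of these is -Inf or NaN
    dates_sorted  = !is.unsorted(dates, strictly = TRUE), # strictly increasing: no duplicates
    largest_gap   = max(as.numeric(diff(dates)))          # longest gap in calendar days
  )
}

sample_dates    <- as.Date(c("1998-12-22", "1998-12-23", "1998-12-24",
                             "1998-12-28", "1998-12-29"))
sample_adjusted <- c(9.71, 9.85, 9.92, 9.79, 9.89)
check_prices(sample_dates, sample_adjusted)
```

A gap of a few calendar days is normal around weekends and holidays (here, Christmas 1998). Much longer gaps are worth a closer look.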

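returns <- 
  prices %>% 
  select(date, adjusted) %>% 
  # daily log return: ln(P_t) - ln(P_t-1)
  mutate(returns = log(adjusted) - log(lag(adjusted))) %>% 
  # the first row has no previous price, so its return is NA; drop it
  na.omit()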

returns %>% 
  slice(1:5)
# A tibble: 5 x 3
  date       adjusted  returns
  <date>        <dbl>    <dbl>
1 1998-12-23     9.85  0.0146 
2 1998-12-24     9.92  0.00658
3 1998-12-28     9.79 -0.0132 
4 1998-12-29     9.89  0.0106 
5 1998-12-30     9.85 -0.00396
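
The `mutate()` step computes continuously compounded (log) returns, r_t = ln(P_t) - ln(P_t-1) = ln(P_t / P_t-1). Below is a base-R sketch of the same calculation. It also shows two properties that make log returns convenient: they add up over time, and exp(r) - 1 converts them back to simple returns. The sketch uses the rounded adjusted prices from the printout, so its values differ slightly from the unrounded results above.

```r
adjusted <- c(9.71, 9.85, 9.92, 9.79, 9.89)

# Same as log(adjusted) - log(lag(adjusted)) after dropping the leading NA
log_returns <- diff(log(adjusted))

# Simple (arithmetic) returns: P_t / P_t-1 - 1
simple_returns <- exp(log_returns) - 1

# Log returns are additive: their sum is the log return over the whole window
total_log_return <- sum(log_returns)
all.equal(total_log_return, log(adjusted[5] / adjusted[1]))  # TRUE
```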

Get some quick summary statistics on the history of daily returns using the table.Stats() function from PerformanceAnalytics.

table.Stats(returns$returns)
                         
Observations    5066.0000
NAs                0.0000
Minimum           -0.1823
Quartile 1        -0.0071
Median             0.0004
Arithmetic Mean    0.0002
Geometric Mean     0.0000
Quartile 3         0.0078
Maximum            0.2698
SE Mean            0.0003
LCL Mean (0.95)   -0.0003
UCL Mean (0.95)    0.0007
Variance           0.0004
Stdev              0.0191
Skewness           0.5033
Kurtosis          20.6198
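
Several of these rows come from short formulas. Below is a base-R sketch of the mean, its 95% confidence interval, and moment-based skewness and excess kurtosis. `summarise_returns()` is a hypothetical helper, and we assume `table.Stats()` uses these standard definitions. PerformanceAnalytics may use slightly different denominators, so small differences are expected. A normal distribution has excess kurtosis 0 (raw kurtosis 3), and under either convention the 20.6 above points to much fatter tails than a normal distribution would produce.

```r
# Hypothetical helper: a few of the table.Stats() rows computed by hand.
summarise_returns <- function(r, conf = 0.95) {
  n  <- length(r)
  m  <- mean(r)
  s  <- sd(r)                                   # sample sd (n - 1 denominator)
  se <- s / sqrt(n)                             # standard error of the mean
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)  # two-sided t critical value
  z  <- (r - m) / s                             # standardised returns
  c(
    mean            = m,
    se_mean         = se,
    lcl_mean        = m - t_crit * se,
    ucl_mean        = m + t_crit * se,
    stdev           = s,
    skewness        = mean(z^3),                # moment-based skewness
    excess_kurtosis = mean(z^4) - 3             # 0 for a normal distribution
  )
}

# The five daily log returns printed earlier; use returns$returns for the full history
sample_returns <- c(0.0146, 0.00658, -0.0132, 0.0106, -0.00396)
summarise_returns(sample_returns)
```
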
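
Next, a scatter chart of the daily returns.

hc_returns_daily <- 
  returns %>%
  hchart(., 
         hcaes(x = date, y = returns),
         type = "scatter") %>% 
  # returns are decimals (0.0146 = 1.46%), so no "%" suffix in the tooltip
  hc_tooltip(pointFormat = "{point.x:%Y-%m-%d}<br/>{point.y:.4f}")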

hc_returns_daily
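
The returns are stored as decimals (0.0146 is a 1.46% log return), so any label ending in `%` should show the value multiplied by 100. Below is a minimal base-R sketch of that conversion for labels or tables. `format_pct()` is a hypothetical helper name.

```r
# Hypothetical helper: format decimal returns as percentage strings.
format_pct <- function(r, digits = 2) {
  sprintf(paste0("%.", digits, "f%%"), 100 * r)
}

format_pct(c(0.0146, 0.00658, -0.0132))
```

The same idea applies to the chart. Plotting `returns * 100`, for example from a new column created with `mutate()`, would make a `%` suffix in the tooltip accurate.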