Looking back on 2019: part 1

2020-01-03

by Jonathan Regenstein

Welcome to Reproducible Finance 2020! It’s a new year, a new beginning, the Earth has completed one more trip around the sun and that means it’s time to look back on the previous January to December cycle.

Today and next time, we’ll explore the returns and volatilities of various market sectors in 2019.

From an R code perspective, we will get familiar with a new source for market data (tiingo, which has come up in several conversations recently and seems to be gaining nice traction in the R world), build some ggplots and dive into highcharter a bit. In that sense, it’s also somewhat of a look back to our previous work because we’ll be stepping through some good’ol data import, wrangling and visualization. Some of the code flows might look familiar to long time readers but if you’ve joined us recently, and haven’t gone back to read the oh-so-invigorating previous posts, this should give a good sense of how we think about working with financial data.

Let’s get to it.

We want to import data on 10 sector ETFs and also on SPY, the market ETF.

We’ll first need the tickers of each sector ETF:

ticker = ("XLY", "XLP", "XLE",  
          "XLF", "XLV", "XLI", "XLB", 
          "XLK", "XLU", "XLRE", 
          "SPY")

And our sector labels are:

sector = ("Consumer Discretionary", "Consumer Staples", "Energy", 
          "Financials", "Health Care", "Industrials", "Materials", 
          "Information Technology", "Utilities", "Real Estate",
          "Market")

We can use the tibble() function to save those as columns of new tibble.

First, let’s load up our packages for the day because we’ll need the tibble package via tidyverse.

library(tidyverse)
library(tidyquant)
library(riingo)
library(timetk)
library(tibbletime)
library(highcharter)
library(plotly)
library(htmltools)

And on to creating a tibble:

etf_ticker_sector <- tibble(
  ticker = c("XLY", "XLP", "XLE",   
          "XLF", "XLV", "XLI", "XLB", 
          "XLK", "XLU", "XLRE", 
          "SPY"),   
  sector = c("Consumer Discretionary", "Consumer Staples", "Energy", 
          "Financials", "Health Care", "Industrials", "Materials", 
          "Information Technology", "Utilities", "Real Estate",
          "Market")
)

etf_ticker_sector

# A tibble: 11 x 2
   ticker sector                
   <chr>  <chr>                 
 1 XLY    Consumer Discretionary
 2 XLP    Consumer Staples      
 3 XLE    Energy                
 4 XLF    Financials            
 5 XLV    Health Care           
 6 XLI    Industrials           
 7 XLB    Materials             
 8 XLK    Information Technology
 9 XLU    Utilities             
10 XLRE   Real Estate           
11 SPY    Market

Now we want to import the daily prices for 2019 for these tickers. For that, we’ll use the excellent tiingo which we access via the riingo package. The workhorse function to grab price data is riingo_prices, to which we need to supply our tickers and a start_date/end_date pair.

Let’s start with the tickers, which we have already saved in the ticker column of etf_ticker_sector. That wasn’t really necssary. We could have just created a vector called tickers_vector by calling tickers_vector = c("ticker1", "ticker2", ...) and then passed that vector straight to riingo_prices. But I didn’t want to do that because I prefer to get my data to a tibble first and, as we’ll see, it will make it easier to add back in our sector labels, since they are aligned with our tickers in one object.

To pass our ticker column to riingo_prices(), we start with our tibble etf_ticker_sector and then pipe it to pull(ticker). That will create a vector from the ticker column. The pull() function is very useful in these situations where we want to pipe or extract a column as a vector.

Here’s the result of pulling the tickers:

  etf_ticker_sector %>%
  pull(ticker)

 [1] "XLY"  "XLP"  "XLE"  "XLF"  "XLV"  "XLI"  "XLB"  "XLK"  "XLU"  "XLRE"
[11] "SPY"

Now we want to pass those tickers to riingo_prices(), but first we need to create an API key. riingo makes that quite convenient:

riingo_browse_signup()
# This requires that you are signed in on the site once you sign up
riingo_browse_token()

Then we set our key for use this session with:

# Need an API key for tiingo

riingo_set_token("your API key here")

Now we can pipe straight to riingo_prices(). We will set start_date = "2018-12-29" and end_date = "2019-12-31" to get prices for just 2019.

  etf_ticker_sector %>%
  pull(ticker) %>% 
  riingo_prices(., 
                start_date = "2018-12-29",
                end_date = "2019-12-31") %>%
  mutate(date = ymd(date)) %>% 
  left_join(etf_ticker_sector, by = "ticker") %>%
  select(sector, everything()) %>%
  group_by(ticker) %>% 
  slice(1)

# A tibble: 11 x 15
# Groups:   ticker [11]
   sector ticker date       close  high   low  open volume adjClose adjHigh
   <chr>  <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <int>    <dbl>   <dbl>
 1 Market SPY    2018-12-31 250.  250.  247.  250.  1.44e8    245.    246. 
 2 Mater… XLB    2018-12-31  50.5  50.6  50.0  50.3 7.80e6     49.5    49.6
 3 Energy XLE    2018-12-31  57.4  57.8  56.7  57.6 1.87e7     53.7    54.1
 4 Finan… XLF    2018-12-31  23.8  23.9  23.5  23.7 5.64e7     23.3    23.4
 5 Indus… XLI    2018-12-31  64.4  64.5  63.7  64.2 1.01e7     63.1    63.2
 6 Infor… XLK    2018-12-31  62.0  62.2  61.5  61.9 1.69e7     61.2    61.4
 7 Consu… XLP    2018-12-31  50.8  50.8  50.3  50.7 1.50e7     49.4    49.5
 8 Real … XLRE   2018-12-31  31    31.0  30.5  31.0 7.81e6     30.0    30.1
 9 Utili… XLU    2018-12-31  52.9  53.1  52.4  52.8 1.63e7     51.3    51.5
10 Healt… XLV    2018-12-31  86.5  86.6  85.7  85.7 1.02e7     84.6    84.6
11 Consu… XLY    2018-12-31  99.0  99.4  98.0  98.8 6.22e6     97.7    98.1
# … with 5 more variables: adjLow <dbl>, adjOpen <dbl>, adjVolume <int>,
#   divCash <dbl>, splitFactor <dbl>

Okay, we have daily data for our ETFs and sector labels. Now, let’s calculate the daily returns of each sector. We’ll start by slimming our data down to just the sector, date and adjClose columns. Then we’ll group_by(sector) and calculate daily returns with mutate(daily_return = log(adjClose) - log(lag(adjClose))).

sector_returns_2019 <- 
  etf_ticker_sector %>%
  pull(ticker) %>% 
  riingo_prices(., 
                start_date = "2018-12-29",
                end_date = "2019-12-31") %>%
  mutate(date = ymd(date)) %>%
  left_join(etf_ticker_sector, by = "ticker") %>%
  select(sector, date, adjClose) %>%
  group_by(sector) %>% 
  mutate(daily_return = log(adjClose) - log(lag(adjClose))) %>% 
  na.omit()

We have our data and now the fun part - let’s do some exploration and visualization and get a feel for 2018. We start with ggplot() and create a chart showing the daily return of each ETF, colored. We want date on the x axis, daily returns on the y-axis and different colors by sector. That means a call to ggplot().

(
sector_returns_2019 %>% 
  ggplot(aes(x = date, y = daily_return, color = sector)) + 
  geom_col(show.legend = FALSE) + 
  facet_wrap(~sector, ncol = 3) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  # clean up the x and y axis titles
  labs(x = "", y = "daily returns") 
) %>% 
  ggplotly()

Next time, we will get more rigorous about such assessments by using the rolling standard deviation (and we’ll add a new tool by using exponential weighting) but the basic pipeline of raw data to tibble for wrangling and transformation then to chart for exploration will remain the same.

Let’s stick with just daily returns for today and plot the same data with a different color schema. Instead of coloring by sector, let’s color by whether the daily return was positive or negative. This is going to be similar to what we did in a previous post on highcharting jobs Friday.

First let’s create two new columns called col_pos and col_neg. col_pos will hold the daily returns that are positive and an NA for returns are negative. We code that with:

sector_returns_2019 %>% 
mutate(col_pos =if_else(daily_return > 0, 
                  daily_return, as.numeric(NA)))

And col_neg will hold negative returns:

sector_returns_2019 %>% 
mutate(col_neg =if_else(daily_return < 0, 
                  daily_return, as.numeric(NA)))

Then we’ll tell ggplot() to chart those two columns in their own geoms and can choose a custom color. The geoms won’t overlap because they have no common data. Here is the full code flow. We start with sector_returns_2019, create our new color columns, then pipe to ggplot().

(
sector_returns_2019 %>% 
   mutate(col_pos = 
           if_else(daily_return > 0, 
                  daily_return, as.numeric(NA)),
         col_neg = 
           if_else(daily_return < 0, 
                  daily_return, as.numeric(NA))) %>%
  ggplot(aes(x = date)) +
  geom_col(aes(y = col_neg),
               alpha = .85, 
               fill = "pink", 
               color = "pink") +
  geom_col(aes(y = col_pos),
               alpha = .85, 
               fill = "cornflowerblue", 
               color = "cornflowerblue") +
  facet_wrap(~sector, shrink = FALSE, ncol = 2) +
  labs(title = "2019 daily returns", y = "daily returns") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        strip.background = element_blank(),
        strip.placement = "inside",
        strip.text = element_text(size=15),
        panel.spacing = unit(0.2, "lines") ,
        panel.background=element_rect(fill="white"))
) %>% 
  ggplotly()

Before we close, let’s take some of this work and translate it to highcharter. If we simpy want to chart one sector’s daily returns, it would be a relatively straightforward mapping from ggplot() to highcharter. We start with our tibble sector_returns_2019, add a column to hold different color hex codes, and then pass the data to hchart() using a ., and set aesthetics with hcaes(). The key is to first filter() down to our sector of choice, in this caes filter(sector == "Information Technology").

sector_returns_2019 %>%
  mutate(color_col = if_else(daily_return > 0, "#6495ed", "#ff9999"),
          date = ymd(date)) %>% 
  filter(sector == "Information Technology") %>% 
  hchart(., hcaes(x = date, y = daily_return, color = color_col),
         type = "column",
         pointWidth = 4)

That chart looks good and we can hover on the bars to see the exact daily return and date for each observation.

I love highcharter but, I must say, it doesn’t have a good way to facet_wrap() and build separate charts for each sector. We can dream about some sort of hc_facet function but for now we’ll need to build a custom function for that job, similar to what we did for the jobs report.

We start by spreading our data to wide format, because we’re going to build a chart using each column.

sector_returns_2019_wide <- 
  sector_returns_2019 %>% 
  select(-adjClose) %>% 
  spread(sector, daily_return)

sector_returns_2019_wide

# A tibble: 252 x 12
   date       `Consumer Discr… `Consumer Stapl…   Energy Financials
   <date>                <dbl>            <dbl>    <dbl>      <dbl>
 1 2019-01-02         0.00755          -0.00593  0.0195    0.00836 
 2 2019-01-03        -0.0219           -0.00576 -0.00997  -0.0227  
 3 2019-01-04         0.0326            0.0211   0.0335    0.0327  
 4 2019-01-07         0.0224           -0.00137  0.0148    0.00124 
 5 2019-01-08         0.0110            0.00914  0.00771   0.000823
 6 2019-01-09         0.00507          -0.00933  0.0157    0.00492 
 7 2019-01-10        -0.00287           0.00584  0.00289   0.000409
 8 2019-01-11         0.000957          0.00291 -0.00595   0.00245 
 9 2019-01-14        -0.00614          -0.00485 -0.00242   0.00691 
10 2019-01-15         0.00900           0.00988  0.00355   0.00888 
# … with 242 more rows, and 7 more variables: `Health Care` <dbl>,
#   Industrials <dbl>, `Information Technology` <dbl>, Market <dbl>,
#   Materials <dbl>, `Real Estate` <dbl>, Utilities <dbl>

Now for our function that will create a separate highchart for each sector, we start with map and pass in the columns names from that wide tibble we just created. That’s how we will iterate over each sector. After mapping across the names, we use function(x) to pass the column name into our code flow.

map(names(sector_returns_2019_wide[2:11]), function(x){

sector_returns_2019_hc <- 
  sector_returns_2019 %>% 
  filter(sector == x) %>% 
   mutate(coloract = if_else(daily_return > 0, "#6495ed", "#ff9999"))

highchart() %>%
  hc_title(text = paste(x, "2018 daily returns", sep = " ")) %>%
  hc_add_series(sector_returns_2019_hc,  
                type = "column", 
                pointWidth = 4,
                hcaes(x = date,
                      y = daily_return,
                      color = coloract),
                name = "daily return") %>% 
  hc_xAxis(type = "datetime") %>%
  hc_tooltip(pointFormat = "{point.date}: {point.daily_return: .4f}%") %>% 
  hc_legend(enabled = FALSE) %>% 
  hc_exporting(enabled = TRUE)
})

If you run the code chunk above, it will create 11 separate highcharts of our data as stand alone charts, meaning they won’t be layed out with any structure. That’s fine, but I want to be able to lay these out in a grid and control the height of each chart. For that we use hw_grid(rowheight = 300, ncol = 3) %>% htmltools::browsable(). That will create a grid that displays each of the charts - one for each sector’s daily returns.

map(names(sector_returns_2019_wide[2:11]), function(x){

sector_returns_2019_hc <- 
  sector_returns_2019 %>% 
  filter(sector == x) %>% 
   mutate(coloract = if_else(daily_return > 0, "#6495ed", "#ff9999"),
          date = ymd(date))

highchart() %>%
  hc_title(text = paste(x, "2018 daily returns", sep = " ")) %>%
  hc_add_series(sector_returns_2019_hc,  
                type = "column", 
                pointWidth = 4,
                hcaes(x = date,
                      y = daily_return,
                      color = coloract),
                name = "daily return") %>% 
  hc_xAxis(type = "datetime") %>%
  hc_tooltip(pointFormat = "{point.date}: {point.daily_return: .4f}%") %>% 
  hc_legend(enabled = FALSE) %>% 
  hc_exporting(enabled = TRUE)
}) %>% 
  hw_grid(rowheight = 300, ncol = 3) %>% 
  htmltools::browsable()

That’s all for today. In general, the flow here was to create a tibble of tickers and labels, grab price data, visualize daily returns organized by the original labels. We applied it to sectors, but it could just as easily be applied to other labels, like risk levels, geography, beta, etc.

Shameless book plug for those who read to the end: if you like this sort of thing, check out my book Reproducible Finance with R!

Thanks for reading and see you next time.