In this post, we will review how to import ETF prices and transform them to monthly returns.
We will use a few packages and for completeness, we will cover several paths to getting our desired end result. Obviously it’s not necessary to utilize all of our paths here. The intention is to show several methods so that the best one can be chosen for a given challenge or project.
Let’s load up our packages.
library(tidyverse)
library(tidyquant)
library(timetk)
library(tibbletime)
We are building toward analyzing the returns of a 5-asset portfolio consisting of the following.
+ SPY (S&P500 fund) weighted 25%
+ EFA (a non-US equities fund) weighted 25%
+ IJS (a small-cap value fund) weighted 20%
+ EEM (an emerging-mkts fund) weighted 20%
+ AGG (a bond fund) weighted 10%
I chose those 5 assets because they seem to offer a balanced blend of large, small, international, emerging and bond exposure. We will eventually build a Shiny app to let users choose different assets and see different results. Today, we are just going to work with the individual prices/returns of those 5 assets.
Importing the data
On to step 1, wherein we import adjusted prices for the 5 ETFs to be used in our portfolio and save them to an xts
object called prices
.
We need a vector of ticker symbols that we will then pass to Yahoo! Finance via the getSymbols
function from the quantmod
package. This will return an object with the opening price, closing price, adjusted price, daily high, daily low and daily volume. We don’t want to work with all of those, though. The adjusted price is what we need.
To isolate the adjusted price, we use the map
function from the purrr
package and apply Ad(get(.))
to the imported prices. This will ‘get’ the adjusted price from each of our individual price objects. We could stop here and have the right substance, but the format wouldn’t be great as we would have a list
of 5 adjusted prices. The map
function returns a list by default but I find them to be unwieldy.
The reduce(merge)
function will allow us to merge the lists into one object and coerce back to an xts
structure. Finally, we want intuitive column names and use colnames<-
to rename the columns. The rename
function from dplyr
will not work well here because the object structure is still xts
.
# The symbols vector holds our tickers.
symbols <- c("SPY","EFA", "IJS", "EEM","AGG")
prices <- getSymbols(symbols,
src = 'yahoo',
from = "2012-12-31",
to = "2017-12-31",
auto.assign = TRUE,
warnings = FALSE) %>%
map(~Ad(get(.))) %>%
reduce(merge) %>%
`colnames<-`(symbols)
The XTS World
We now have an xts
object of the adjusted prices for our 5 assets. Have a quick peek.
head(prices)
SPY EFA IJS EEM AGG
2012-12-31 127.7356 48.20629 74.81863 39.63340 97.98471
2013-01-02 131.0094 48.95237 76.87150 40.41088 97.87003
2013-01-03 130.7135 48.47759 76.76980 40.12492 97.62305
2013-01-04 131.2875 48.72346 77.37086 40.20534 97.72893
2013-01-07 130.9287 48.51151 76.91773 39.90150 97.67600
2013-01-08 130.5520 48.24020 76.62183 39.54404 97.76419
Our first reading is from January 3, 2005 (the first trading day of that year) and we have daily prices. Let’s stay in thexts
world and convert to monthly prices using a call to to.monthly(prices, indexAt = "last", OHLC = FALSE)
from quantmod
. The argument index = "last"
tells the function whether we want to index to the first day of the month or the last day.
prices_monthly <- to.monthly(prices, indexAt = "last", OHLC = FALSE)
head(prices_monthly)
SPY EFA IJS EEM AGG
2012-12-31 127.7356 48.20629 74.81863 39.63340 97.98471
2013-01-31 134.2744 50.00364 78.82265 39.51723 97.37608
2013-02-28 135.9876 49.35931 80.10801 38.61464 97.95142
2013-03-28 141.1512 50.00364 83.39879 38.22143 98.04794
2013-04-30 143.8630 52.51315 83.50081 38.68614 98.99760
2013-05-31 147.2596 50.92775 87.08048 36.81840 97.01658
We now have an xts
object, and we have moved from daily prices to monthly prices.
Now we’ll call Return.calculate(prices_monthly, method = "log")
to convert to returns and save as an object called assed_returns_xts
. Note this will give us log returns by the method = "log"
argument. We could have used method = "discrete"
to get simple returns.
asset_returns_xts <- na.omit(Return.calculate(prices_monthly, method = "log"))
head(asset_returns_xts)
SPY EFA IJS EEM AGG
2013-01-31 0.04992311 0.03660641 0.052133484 -0.002935494 -0.0062309021
2013-02-28 0.01267821 -0.01296938 0.016175381 -0.023105250 0.0058910464
2013-03-28 0.03726766 0.01296938 0.040257940 -0.010235048 0.0009849727
2013-04-30 0.01903006 0.04896773 0.001222544 0.012085043 0.0096390038
2013-05-31 0.02333571 -0.03065563 0.041976371 -0.049483592 -0.0202136957
2013-06-28 -0.01343432 -0.02715331 -0.001402974 -0.054739116 -0.0157787232
From a substantive perspective, we could stop right now. Our task has been accomplished: we have imported daily prices, trimmed to adjusted prices, moved to monthly prices and transformed to monthly log returns.
If we wished to visualize these log returns, we would go straight to highcharter
and pass in the various xts
series.
For now, though, let’s take a look at a few more methods.
The Tidyverse and Tidyquant World
We now take the same raw data, which is the prices
object we created upon data import and convert it to monthly returns using 3 alternative methods. We will make use of the dplyr
, tidyquant
, timetk
and tibbletime
packages.
For our first method, we use dplyr
and timetk
to convert our object from an xts
object of prices to a tibble
of monthly returns. Once we convert to a tibble
the tidyverse is available for all sort of manipulations.
Let’s step through the logic before getting to the code chunk.
In the piped workflow below, our first step is to use the tk_tbl(preserve_index = TRUE, rename_index = "date")
function to convert from xts
to tibble
. The two arguments will convert the xts
date index to a date column, and rename it “date”. If we stopped here, we would have a new prices object in tibble
format.
Next we turn to dplyr
to gather
our new data frame into long format and then group_by
asset. We have not done any calculations yet, we have just shifted from wide format, to long, tidy format. Notice that when we gathered our data, we renamed one of the columns to returns
even though the data are still prices. The next step will explain why we did that.
Next, we want to calculate log returns and add those returns to the data frame. We will use mutate
and our own calculation to get log returns: mutate(returns = (log(returns) - log(lag(returns))))
. Notice that I am putting our new log returns into the returns
column by calling returns = ...
. This is going to remove the price data and replace it with log returns data. This is the explanation for why, when we called gather
in the previous step, we renamed the column to returns
. That allows us to simply replace that column with log return data instead of having to create a new column and then delete the price data column.
Our last two steps are to spread
the data back to wide format, which makes it easier to compare to the xts
object and easier to read, but is not a best practice in the tidyverse. We are going to look at this new object and compare to the xts
object above, so we will stick with wide format for now.
Finally, we want to reorder the columns to align with the symbols
vector. That’s important because when we build a portfolio, we will use that vector to coordinate our different weights.
asset_returns_dplyr_byhand <- prices %>%
to.monthly(indexAt = "last", OHLC = FALSE) %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
gather(asset, returns, -date) %>%
group_by(asset) %>%
mutate(returns = (log(returns) - log(lag(returns)))) %>%
spread(asset, returns) %>%
select(date, symbols)
For our second method in the tidy world, we’ll use the tq_transmute
function from tidyquant
. Instead of using to.monthly
and mutate
, and then supplying our own calculation, we use tq_transmute(mutate_fun = periodReturn, period = "monthly", type = "log")
and go straight from daily prices to monthly log returns. Note that we select the period as ‘monthly’ in that function call, which means we can pass in the raw daily prices
xts
object..
asset_returns_tq_builtin <- prices %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
gather(asset, prices, -date) %>%
group_by(asset) %>%
tq_transmute(mutate_fun = periodReturn, period = "monthly", type = "log") %>%
spread(asset, monthly.returns) %>%
select(date, symbols)
Our third method in the tidy world will produce the same output as the previous two - a tibble
of monthly log returns - but we will also introduce the tibbletime
package and it’s function as_period
. As the name implies, this function allows us to cast the prices time series from daily to monthly (or weekly or quarterly etc.) in our tibble
instead of having to apply the to.monthly
function to the xts
object as we did previously.
Furthermore, unlike the previous code chunk above where we went from daily prices straight to monthly returns, here we go from daily prices to monthly prices to monthly returns. That is, we will first create a tibble
of monthly prices, then pipe to create monthly returns.
We don’t have a substantive reason for doing that here, but it could prove useful if there’s a time when we need to get monthly prices in isolation during a tidyverse-based piped workflow.
asset_returns_tbltime <-
prices %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
tbl_time(index = date) %>%
as_period("monthly", side = "end") %>%
gather(asset, returns, -date) %>%
group_by(asset) %>%
tq_transmute(mutate_fun = periodReturn, type = "log") %>%
spread(asset, monthly.returns) %>%
select(date, symbols)
Let’s take a peek at our 4 monthly log return objects.
head(asset_returns_xts)
SPY EFA IJS EEM AGG
2013-01-31 0.04992311 0.03660641 0.052133484 -0.002935494 -0.0062309021
2013-02-28 0.01267821 -0.01296938 0.016175381 -0.023105250 0.0058910464
2013-03-28 0.03726766 0.01296938 0.040257940 -0.010235048 0.0009849727
2013-04-30 0.01903006 0.04896773 0.001222544 0.012085043 0.0096390038
2013-05-31 0.02333571 -0.03065563 0.041976371 -0.049483592 -0.0202136957
2013-06-28 -0.01343432 -0.02715331 -0.001402974 -0.054739116 -0.0157787232
head(asset_returns_dplyr_byhand)
# A tibble: 6 x 6
date SPY EFA IJS EEM AGG
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2012-12-31 NA NA NA NA NA
2 2013-01-31 0.0499 0.0366 0.0521 -0.00294 -0.00623
3 2013-02-28 0.0127 -0.0130 0.0162 -0.0231 0.00589
4 2013-03-28 0.0373 0.0130 0.0403 -0.0102 0.000985
5 2013-04-30 0.0190 0.0490 0.00122 0.0121 0.00964
6 2013-05-31 0.0233 -0.0307 0.0420 -0.0495 -0.0202
head(asset_returns_tq_builtin)
# A tibble: 6 x 6
date SPY EFA IJS EEM AGG
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2012-12-31 0 0 0 0 0
2 2013-01-31 0.0499 0.0366 0.0521 -0.00294 -0.00623
3 2013-02-28 0.0127 -0.0130 0.0162 -0.0231 0.00589
4 2013-03-28 0.0373 0.0130 0.0403 -0.0102 0.000985
5 2013-04-30 0.0190 0.0490 0.00122 0.0121 0.00964
6 2013-05-31 0.0233 -0.0307 0.0420 -0.0495 -0.0202
head(asset_returns_tbltime)
# A time tibble: 6 x 6
# Index: date
date SPY EFA IJS EEM AGG
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2012-12-31 0 0 0 0 0
2 2013-01-31 0.0499 0.0366 0.0521 -0.00294 -0.00623
3 2013-02-28 0.0127 -0.0130 0.0162 -0.0231 0.00589
4 2013-03-28 0.0373 0.0130 0.0403 -0.0102 0.000985
5 2013-04-30 0.0190 0.0490 0.00122 0.0121 0.00964
6 2013-05-31 0.0233 -0.0307 0.0420 -0.0495 -0.0202
First, have a look at the left most column in each object, where the date is stored. The asset_returns_xts
has a date index, not a column. It is accessed via index(asset_returns_xts)
. The data frame objects have a column called “date”, accessed via the $date
convention, e.g. asset_returns_dplyr_byhand$date
.
Second, notice the first observation in each of the objects. asset_returns_xts
elided December 31, 2012. asset_returns_dplyr_byhand
has an NA
for December 31, 2012, and asset_returns_tq_builtin
and asset_returns_tbltime
both have a 0
.
Do we care? Does it matter to our final calculations? Perhaps and probably. Either way, we need to be aware of the consequences of taking our raw data and reshaping it via different methods.
Third, each of these objects is in “wide” format, which in this case means there is a column for each of our assets. When we called spread
at the end of the piped code flow, we put the data frames back to wide format.
This is the format that xts
likes and it’s the format that is easier to read as a human. However, the tidyverse wants this data to be in long or tidy format so that each variable has its own column.
For our asset_returns objects, that would mean a column called “date”, a column called “asset” and a column called “returns”. To see that in action, here is how it looks.
asset_returns_long_format <-
asset_returns_dplyr_byhand %>%
gather(asset, returns, -date)
head(asset_returns_long_format)
# A tibble: 6 x 3
date asset returns
<date> <chr> <dbl>
1 2012-12-31 SPY NA
2 2013-01-31 SPY 0.0499
3 2013-02-28 SPY 0.0127
4 2013-03-28 SPY 0.0373
5 2013-04-30 SPY 0.0190
6 2013-05-31 SPY 0.0233
We now have 3 columns and if we want to run an operation like mutate
on each asset, we just need to call goup_by(asset)
in the piped flow.
That’s all for today. Next time we will visualize these individual asset returns (and will notice that our log returns have a normal distribution, which will be important when we run simulations) before combining them into portfolio returns based on the different weightings. Thanks and see you next time.