# Transform Daily Prices to Monthly Log Returns

by Jonathan Regenstein

In this post, we will review how to import ETF prices and transform them to monthly returns.

We will use a few packages and for completeness, we will cover several paths to getting our desired end result. Obviously it’s not necessary to utilize all of our paths here. The intention is to show several methods so that the best one can be chosen for a given challenge or project.

library(tidyverse)
library(tidyquant)
library(timetk)
library(tibbletime)

We are building toward analyzing the returns of a 5-asset portfolio consisting of the following.

+ SPY (S&P500 fund) weighted 25%
+ EFA (a non-US equities fund) weighted 25%
+ IJS (a small-cap value fund) weighted 20%
+ EEM (an emerging-mkts fund) weighted 20%
+ AGG (a bond fund) weighted 10%

I chose those 5 assets because they seem to offer a balanced blend of large, small, international, emerging and bond exposure. We will eventually build a Shiny app to let users choose different assets and see different results. Today, we are just going to work with the individual prices/returns of those 5 assets.

### Importing the data

On to step 1, wherein we import adjusted prices for the 5 ETFs to be used in our portfolio and save them to an xts object called prices.

We need a vector of ticker symbols that we will then pass to Yahoo! Finance via the getSymbols function from the quantmod package. This will return an object with the opening price, closing price, adjusted price, daily high, daily low and daily volume. We don’t want to work with all of those, though. The adjusted price is what we need.

To isolate the adjusted price, we use the map function from the purrr package and apply Ad(get(.)) to the imported prices. This will ‘get’ the adjusted price from each of our individual price objects. We could stop here and have the right substance, but the format wouldn’t be great as we would have a list of 5 adjusted prices. The map function returns a list by default but I find them to be unwieldy.

The reduce(merge) function will allow us to merge the lists into one object and coerce back to an xts structure. Finally, we want intuitive column names and use colnames<- to rename the columns. The rename function from dplyr will not work well here because the object structure is still xts.

# The symbols vector holds our tickers.
symbols <- c("SPY","EFA", "IJS", "EEM","AGG")

prices <- getSymbols(symbols,
src = 'yahoo',
from = "2012-12-31",
to = "2017-12-31",
auto.assign = TRUE,
warnings = FALSE) %>%
reduce(merge) %>%
colnames<-(symbols)

### The XTS World

We now have an xts object of the adjusted prices for our 5 assets. Have a quick peek.

head(prices)
                SPY      EFA      IJS      EEM      AGG
2012-12-31 127.7356 48.20629 74.81863 39.63340 97.98471
2013-01-02 131.0094 48.95237 76.87150 40.41088 97.87003
2013-01-03 130.7135 48.47759 76.76980 40.12492 97.62305
2013-01-04 131.2875 48.72346 77.37086 40.20534 97.72893
2013-01-07 130.9287 48.51151 76.91773 39.90150 97.67600
2013-01-08 130.5520 48.24020 76.62183 39.54404 97.76419

Our first reading is from January 3, 2005 (the first trading day of that year) and we have daily prices. Let’s stay in thextsworld and convert to monthly prices using a call to to.monthly(prices, indexAt = "last", OHLC = FALSE) from quantmod. The argument index = "last" tells the function whether we want to index to the first day of the month or the last day.

prices_monthly <- to.monthly(prices, indexAt = "last", OHLC = FALSE)

head(prices_monthly)
                SPY      EFA      IJS      EEM      AGG
2012-12-31 127.7356 48.20629 74.81863 39.63340 97.98471
2013-01-31 134.2744 50.00364 78.82265 39.51723 97.37608
2013-02-28 135.9876 49.35931 80.10801 38.61464 97.95142
2013-03-28 141.1512 50.00364 83.39879 38.22143 98.04794
2013-04-30 143.8630 52.51315 83.50081 38.68614 98.99760
2013-05-31 147.2596 50.92775 87.08048 36.81840 97.01658

We now have an xts object, and we have moved from daily prices to monthly prices.

Now we’ll call Return.calculate(prices_monthly, method = "log") to convert to returns and save as an object called assed_returns_xts. Note this will give us log returns by the method = "log" argument. We could have used method = "discrete" to get simple returns.

asset_returns_xts <- na.omit(Return.calculate(prices_monthly, method = "log"))

head(asset_returns_xts)
                   SPY         EFA          IJS          EEM           AGG
2013-01-31  0.04992311  0.03660641  0.052133484 -0.002935494 -0.0062309021
2013-02-28  0.01267821 -0.01296938  0.016175381 -0.023105250  0.0058910464
2013-03-28  0.03726766  0.01296938  0.040257940 -0.010235048  0.0009849727
2013-04-30  0.01903006  0.04896773  0.001222544  0.012085043  0.0096390038
2013-05-31  0.02333571 -0.03065563  0.041976371 -0.049483592 -0.0202136957
2013-06-28 -0.01343432 -0.02715331 -0.001402974 -0.054739116 -0.0157787232

From a substantive perspective, we could stop right now. Our task has been accomplished: we have imported daily prices, trimmed to adjusted prices, moved to monthly prices and transformed to monthly log returns.

If we wished to visualize these log returns, we would go straight to highcharter and pass in the various xts series.

For now, though, let’s take a look at a few more methods.

### The Tidyverse and Tidyquant World

We now take the same raw data, which is the prices object we created upon data import and convert it to monthly returns using 3 alternative methods. We will make use of the dplyr, tidyquant, timetk and tibbletime packages.

For our first method, we use dplyr and timetk to convert our object from an xts object of prices to a tibble of monthly returns. Once we convert to a tibble the tidyverse is available for all sort of manipulations.

Let’s step through the logic before getting to the code chunk.

In the piped workflow below, our first step is to use the tk_tbl(preserve_index = TRUE, rename_index = "date") function to convert from xts to tibble. The two arguments will convert the xts date index to a date column, and rename it “date”. If we stopped here, we would have a new prices object in tibble format.

Next we turn to dplyr to gather our new data frame into long format and then group_by asset. We have not done any calculations yet, we have just shifted from wide format, to long, tidy format. Notice that when we gathered our data, we renamed one of the columns to returns even though the data are still prices. The next step will explain why we did that.

Next, we want to calculate log returns and add those returns to the data frame. We will use mutate and our own calculation to get log returns: mutate(returns = (log(returns) - log(lag(returns)))). Notice that I am putting our new log returns into the returns column by calling returns = .... This is going to remove the price data and replace it with log returns data. This is the explanation for why, when we called gather in the previous step, we renamed the column to returns. That allows us to simply replace that column with log return data instead of having to create a new column and then delete the price data column.

Our last two steps are to spread the data back to wide format, which makes it easier to compare to the xts object and easier to read, but is not a best practice in the tidyverse. We are going to look at this new object and compare to the xts object above, so we will stick with wide format for now.

Finally, we want to reorder the columns to align with the symbols vector. That’s important because when we build a portfolio, we will use that vector to coordinate our different weights.

asset_returns_dplyr_byhand <- prices %>%
to.monthly(indexAt = "last", OHLC = FALSE) %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
gather(asset, returns, -date) %>%
group_by(asset) %>%
mutate(returns = (log(returns) - log(lag(returns)))) %>%
select(date, symbols)

For our second method in the tidy world, we’ll use the tq_transmute function from tidyquant. Instead of using to.monthly and mutate, and then supplying our own calculation, we use tq_transmute(mutate_fun = periodReturn, period = "monthly", type = "log") and go straight from daily prices to monthly log returns. Note that we select the period as ‘monthly’ in that function call, which means we can pass in the raw daily prices xts object..

asset_returns_tq_builtin <- prices %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
gather(asset, prices, -date) %>%
group_by(asset) %>%
tq_transmute(mutate_fun = periodReturn, period = "monthly", type = "log") %>%
select(date, symbols)

Our third method in the tidy world will produce the same output as the previous two - a tibble of monthly log returns - but we will also introduce the tibbletime package and it’s function as_period. As the name implies, this function allows us to cast the prices time series from daily to monthly (or weekly or quarterly etc.) in our tibble instead of having to apply the to.monthly function to the xts object as we did previously.

Furthermore, unlike the previous code chunk above where we went from daily prices straight to monthly returns, here we go from daily prices to monthly prices to monthly returns. That is, we will first create a tibble of monthly prices, then pipe to create monthly returns.

We don’t have a substantive reason for doing that here, but it could prove useful if there’s a time when we need to get monthly prices in isolation during a tidyverse-based piped workflow.

asset_returns_tbltime <-
prices %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
tbl_time(index = date) %>%
as_period("monthly", side = "end") %>%
gather(asset, returns, -date) %>%
group_by(asset) %>%
tq_transmute(mutate_fun = periodReturn, type = "log") %>%
select(date, symbols)

Let’s take a peek at our 4 monthly log return objects.

head(asset_returns_xts)
                   SPY         EFA          IJS          EEM           AGG
2013-01-31  0.04992311  0.03660641  0.052133484 -0.002935494 -0.0062309021
2013-02-28  0.01267821 -0.01296938  0.016175381 -0.023105250  0.0058910464
2013-03-28  0.03726766  0.01296938  0.040257940 -0.010235048  0.0009849727
2013-04-30  0.01903006  0.04896773  0.001222544  0.012085043  0.0096390038
2013-05-31  0.02333571 -0.03065563  0.041976371 -0.049483592 -0.0202136957
2013-06-28 -0.01343432 -0.02715331 -0.001402974 -0.054739116 -0.0157787232
head(asset_returns_dplyr_byhand)
# A tibble: 6 x 6
date           SPY      EFA      IJS       EEM        AGG
<date>       <dbl>    <dbl>    <dbl>     <dbl>      <dbl>
1 2012-12-31 NA       NA      NA        NA        NA
2 2013-01-31  0.0499   0.0366  0.0521   -0.00294  -0.00623
3 2013-02-28  0.0127  -0.0130  0.0162   -0.0231    0.00589
4 2013-03-28  0.0373   0.0130  0.0403   -0.0102    0.000985
5 2013-04-30  0.0190   0.0490  0.00122   0.0121    0.00964
6 2013-05-31  0.0233  -0.0307  0.0420   -0.0495   -0.0202  
head(asset_returns_tq_builtin)
# A tibble: 6 x 6
date          SPY     EFA     IJS      EEM       AGG
<date>      <dbl>   <dbl>   <dbl>    <dbl>     <dbl>
1 2012-12-31 0       0      0        0        0
2 2013-01-31 0.0499  0.0366 0.0521  -0.00294 -0.00623
3 2013-02-28 0.0127 -0.0130 0.0162  -0.0231   0.00589
4 2013-03-28 0.0373  0.0130 0.0403  -0.0102   0.000985
5 2013-04-30 0.0190  0.0490 0.00122  0.0121   0.00964
6 2013-05-31 0.0233 -0.0307 0.0420  -0.0495  -0.0202  
head(asset_returns_tbltime)
# A time tibble: 6 x 6
# Index: date
date          SPY     EFA     IJS      EEM       AGG
<date>      <dbl>   <dbl>   <dbl>    <dbl>     <dbl>
1 2012-12-31 0       0      0        0        0
2 2013-01-31 0.0499  0.0366 0.0521  -0.00294 -0.00623
3 2013-02-28 0.0127 -0.0130 0.0162  -0.0231   0.00589
4 2013-03-28 0.0373  0.0130 0.0403  -0.0102   0.000985
5 2013-04-30 0.0190  0.0490 0.00122  0.0121   0.00964
6 2013-05-31 0.0233 -0.0307 0.0420  -0.0495  -0.0202  

First, have a look at the left most column in each object, where the date is stored. The asset_returns_xts has a date index, not a column. It is accessed via index(asset_returns_xts). The data frame objects have a column called “date”, accessed via the $date convention, e.g. asset_returns_dplyr_byhand$date.

Second, notice the first observation in each of the objects. asset_returns_xts elided December 31, 2012. asset_returns_dplyr_byhand has an NA for December 31, 2012, and asset_returns_tq_builtin and asset_returns_tbltime both have a 0. Do we care? Does it matter to our final calculations? Perhaps and probably. Either way, we need to be aware of the consequences of taking our raw data and reshaping it via different methods.

Third, each of these objects is in “wide” format, which in this case means there is a column for each of our assets. When we called spread at the end of the piped code flow, we put the data frames back to wide format.

This is the format that xts likes and it’s the format that is easier to read as a human. However, the tidyverse wants this data to be in long or tidy format so that each variable has its own column. For our asset_returns objects, that would mean a column called “date”, a column called “asset” and a column called “returns”. To see that in action, here is how it looks.

asset_returns_long_format <-
asset_returns_dplyr_byhand %>%
gather(asset, returns, -date)

head(asset_returns_long_format)
# A tibble: 6 x 3
date       asset returns
<date>     <chr>   <dbl>
1 2012-12-31 SPY   NA
2 2013-01-31 SPY    0.0499
3 2013-02-28 SPY    0.0127
4 2013-03-28 SPY    0.0373
5 2013-04-30 SPY    0.0190
6 2013-05-31 SPY    0.0233

We now have 3 columns and if we want to run an operation like mutate on each asset, we just need to call goup_by(asset) in the piped flow.

That’s all for today. Next time we will visualize these individual asset returns (and will notice that our log returns have a normal distribution, which will be important when we run simulations) before combining them into portfolio returns based on the different weightings. Thanks and see you next time.