IPO Exploration: Part II

2019-11-15

by Jonathan Regenstein

In a previous post we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how porfolios formed on those IPOs have performed. We will need to grab the price histories of the tickers, then form portfolios, then calculate their performance, and then rank those performances in some way.

Since there’s several hundred IPOs for which we need to pull returns data, today’s post will be a bit data intensive. If you don’t want to tax your machine too much, you can always subset down the data before running the script. With that, let’s do this!

I won’t go through the code again in detail since we covered it last time but our first step is to get the tickers and IPO years for a large universe of companies. Here is the code to retrieve the tickers for all companies that IPO’d between 2004 and 2014 - and have survived (meaning today’s analysis is going to have serious survivorship bias).

nasdaq <-
  tq_exchange("NASDAQ")

amex  <- 
  tq_exchange("AMEX")

nyse <- 
  tq_exchange("NYSE")

company_ipo_sector <-
  nasdaq %>% 
  bind_rows(amex) %>% 
  bind_rows(nyse) %>% 
  select(symbol, company, ipo.year, sector) %>% 
  filter(!is.na(ipo.year) & ipo.year < 2015)

And here is what we get:

company_ipo_sector %>% 
  head()

# A tibble: 6 x 4
  symbol company                                ipo.year sector           
  <chr>  <chr>                                     <dbl> <chr>            
1 PIH    1347 Property Insurance Holdings, Inc.     2014 Finance          
2 FLWS   1-800 FLOWERS.COM, Inc.                    1999 Consumer Services
3 VNET   21Vianet Group, Inc.                       2011 Technology       
4 TWOU   2U, Inc.                                   2014 Technology       
5 JOBS   51job, Inc.                                2004 Technology       
6 ABIL   Ability Inc.                               2014 Capital Goods

There are about 1,830 tickers in this data frame and it conveniently includes a column called ipo.year, which we’ll make us of later in this post. Our next step is to download the price histories of these tickers from tiingo. We’ll need to divide our tickers into smaller subsets and pull in the data in pieces so we don’t hit our tiingo API limits.

tickers_2004_2006 <-
  company_ipo_sector %>% 
  distinct(company, .keep_all = TRUE) %>% 
  filter(!is.na(sector) &  between(ipo.year, 2004, 2006)) %>% 
  group_by(ipo.year) %>%
  pull(symbol)

tickers_2007_2009 <-
  company_ipo_sector %>% 
  distinct(company, .keep_all = TRUE) %>%
  filter(!is.na(sector) &  between(ipo.year, 2007, 2009)) %>% 
  group_by(ipo.year) %>%
  pull(symbol)

tickers_2010_2012 <-
  company_ipo_sector %>% 
  distinct(company, .keep_all = TRUE) %>% 
  filter(!is.na(sector) &  between(ipo.year, 2010, 2012)) %>% 
  group_by(ipo.year) %>%
  pull(symbol)

tickers_2013_2014 <-
  company_ipo_sector %>% 
  distinct(company, .keep_all = TRUE) %>% 
  filter(!is.na(sector) &  between(ipo.year, 2013, 2014)) %>% 
  group_by(ipo.year) %>%
  pull(symbol)

We now have 4 sets of tickers. Let’s pass them to the riingo function one at a time and set the resample_frequency to monthly, so we’re pulling in 12 prices per year per company instead of ~252.

prices_riingo_2004_2006 <-
  tickers_2004_2006 %>%
  riingo_prices(start_date = "2004-01-01", resample_frequency = "monthly") %>% 
  group_by(ticker)


prices_riingo_2007_2009 <-
  tickers_2007_2009 %>%
  riingo_prices(start_date = "2007-01-01", resample_frequency = "monthly") %>% 
  group_by(ticker)

prices_riingo_2010_2012 <-
  tickers_2010_2012 %>%
  riingo_prices(start_date = "2010-01-01", resample_frequency = "monthly") %>% 
  group_by(ticker)


prices_riingo_2013_2014 <-
  tickers_2013_2014 %>%
  riingo_prices(start_date = "2013-01-01", resample_frequency = "monthly") %>% 
  group_by(ticker)


prices_riingo_full <-
  prices_riingo_2004_2006 %>% 
  bind_rows(prices_riingo_2007_2009) %>% 
  bind_rows(prices_riingo_2010_2012) %>% 
  bind_rows(prices_riingo_2013_2014)

We have our prices, now let’s add a column of monthly returns with a call to mutate(monthly_returns = close/lag(close) - 1).

prices_riingo_full <-
  prices_riingo_full %>% 
  group_by(ticker) %>% 
  mutate(monthly_returns = close/lag(close) - 1)

One more step, I won’t bore you with the details but after a lot slogging, I discovered a few errors in my data, where the ipo.year came after the first year in which the ticker was traded. That’s probably because of some equities being relisted on other exchanges or changing ticker names, but suffice it to say, it jacked up my portfolio analysis! I’m going to remove those problematic tickers with the code below. First, we filter(date == min(date)) to get the first day for which we have returns. Then we isolate the year of that first trade with mutate(first_trade = year(date)). If that year doesn’t match ipo.year, we pull() the ticker.

tickers_remove_mismatch_dates <- 
prices_riingo_full %>% 
  left_join(company_ipo_sector, by = c("ticker" = "symbol")) %>% 
  filter(date == min(date)) %>% 
  select(ticker, date, ipo.year) %>% 
  mutate(first_trade = year(date)) %>% 
  filter(ipo.year != first_trade) %>% 
  pull(ticker)

We now have a vector of the tickers to remove but we haven’t removed them yet. I separated these two steps so I could take a peak at those tickers first and maybe come back to later to address this issue in a better way.

Let’s go ahead and remove them (I’m going to remove PGTI also, it was showing up duplicate downstream in my code).

prices_riingo_full <- 
 prices_riingo_full %>% 
 filter(!(ticker %in% tickers_remove_mismatch_dates) & ticker != 'PGTI')

A final step before saving: we’ll left_join() the company_ipo_sector tibble so that we have company name and ipo.year as part of this object.

prices_riingo_full <- 
prices_riingo_full %>% 
  left_join(company_ipo_sector, by = c("ticker" = "symbol"))

To avoid going through that process each time we revisit this project, we can save this price/returns data as a .RDS file.

write_rds(prices_riingo_full, "prices_riingo_full.RDS")

But another exciting way to save and share data is the new pins package (see intro to pins here). Pins is a package that makes it easy to save and share data. I’m going to use RStudio Connect for that but you could use github or kaggle as well.

First, we register our Connect server.

library(pins)

pins::board_register(server = "path to your RStudio Connect server here", key = "your API key here", board = "rsconnect")

Now we’re ready to pin this data up to Connect where we or any of our colleagues could pull down this data in the future and take advantage of all the gymnastics we just performed to get this in shape.

pins::pin(
  x = prices_riingo_full, 
  name = "ipo_riingo_prices_pins", 
  description = "Monthly price history of IPOs from 2004 - 2014.", 
  board = "rsconnect"
)

For the curious, here’s how to retreive that data from RStudio Connect.

ipo_riingo_prices_pins <- 
  pins::pin_get(name = "ipo_riingo_prices_pins", 
  board = "rsconnect")

Now, back to our ultimate goal: we want to see how portfolios that allocated dollars to IPOs each year would have performed. That is, we’ll build a porftolio for 2004, 2005…through to 2014, 11 portfolios consisting solely of the companies that IPO’d in a specific year. Obviously we are dealing with selection bias, meaning if we had implemented this strategy, we’d have invested in lots of companies that went to zero and delisted. Those dead companies didn’t make it to today’s project because we pulled the tickers of companies currently listed on the AMEX, NYSE and NASDAQ. So don’t try this at home and nothing in this post is investment advice.

Let’s start by building a portfolio of stocks that IPO’d in 2006, assuming we invest equally in each company. We’ll select() a few columns needed for the analysis and then filter(ipo.year == 2006). Next we use the built-in tq_portfolio() function from tidyquant wherein we specify the assets_col as our tickers, the returns_col as monthly_returns and paste together a name that incorporates 2006. We also set it to rebalance_on = "months" so that each month we make sure to maintain an equal weighting to each company. We do not need to specify a weights argument because equal weighting is the default.

prices_riingo_full %>%
  select(ticker, date, monthly_returns, ipo.year) %>% 
  filter(ipo.year == 2006) %>% 
  group_by(ticker) %>% 
  filter(!is.na(monthly_returns)) %>%
  tq_portfolio(assets_col  = ticker,
               returns_col = monthly_returns,
               col_rename  = paste(2006, "_port_returns", sep = ""),
               wealth.index = F,
               rebalance_on = "months") %>% 
  tail()

# A tibble: 6 x 2
  date                `2006_port_returns`
  <dttm>                            <dbl>
1 2019-05-31 00:00:00             0.261  
2 2019-06-28 00:00:00             0.0533 
3 2019-07-31 00:00:00             0.0346 
4 2019-08-30 00:00:00            -0.00789
5 2019-09-30 00:00:00            -0.00539
6 2019-10-31 00:00:00             0.0233

  # slice(1:3, (n()-3):n())

That code works for 2006, but now we want to scale it for more general use. We can turn the code flow into a function that takes year as an argument and then apply the function to all of our IPO returns on a year-by-year basis. I’m also going to add a second argument, show_growth, that can be set to TRUE or FALSE if we wish output the growth of a dollar instead of monthly returns.

ipo_by_year_portfolios <- function(year, show_growth = F){
  
  prices_riingo_full %>%
  select(ticker, date, monthly_returns, ipo.year) %>% 
  filter(ipo.year == year) %>% 
  tq_portfolio(assets_col  = ticker,
               returns_col = monthly_returns,
               col_rename  = paste(year, "_port_returns", sep = ""),
               wealth.index = show_growth,
               rebalance_on = "months")
  
}

Let’s test that function on 2006 and confirm we get the same result.

ipo_by_year_portfolios(2006)  %>% 
  tail()

# A tibble: 6 x 2
  date                `2006_port_returns`
  <dttm>                            <dbl>
1 2019-05-31 00:00:00             0.261  
2 2019-06-28 00:00:00             0.0533 
3 2019-07-31 00:00:00             0.0346 
4 2019-08-30 00:00:00            -0.00789
5 2019-09-30 00:00:00            -0.00539
6 2019-10-31 00:00:00             0.0233

And now we want to apply that function to every year in our data set and mash the results together into one data frame.

We’ll start by creating a vector of years from 2004 to 2014.

years_numeric <- seq(2004, 2014, by = 1)

And now we map() our function across that vector of years. If we just ran map(years_numeric, ipo_by_year_portfolios), it would result in 11 lists of portfolio returns, one for each year of our IPOs. I want a data frame with all the results stored together. Let’s use reduce(left_join) for that task.

returns_each_year_ipo_portfolios <-
map(years_numeric, ipo_by_year_portfolios) %>% 
  reduce(left_join) 

returns_each_year_ipo_portfolios %>% 
  head()

# A tibble: 6 x 12
  date                `2004_port_retu… `2005_port_retu… `2006_port_retu…
  <dttm>                         <dbl>            <dbl>            <dbl>
1 2004-02-27 00:00:00          0                     NA               NA
2 2004-03-31 00:00:00         -0.00296               NA               NA
3 2004-04-30 00:00:00         -0.00141               NA               NA
4 2004-05-31 00:00:00          0.00703               NA               NA
5 2004-06-30 00:00:00         -0.0194                NA               NA
6 2004-07-30 00:00:00         -0.0472                NA               NA
# … with 8 more variables: `2007_port_returns` <dbl>,
#   `2008_port_returns` <dbl>, `2009_port_returns` <dbl>,
#   `2010_port_returns` <dbl>, `2011_port_returns` <dbl>,
#   `2012_port_returns` <dbl>, `2013_port_returns` <dbl>,
#   `2014_port_returns` <dbl>

We now have one data frame with a date column that runs from the beginning of 2004, the first year for which we pulled in IPO tickers, to the today’s date, the last date for which we have returns for these companies. Then we have one column with the returns for each yearly portfolio constructed by equal weighting each company that IPO’d from 2004 to 2014. Since each column represents a year, we start out with values for the 2004 column but the rest have NA for all of 2004. Then in 2005, the 2005 portfolio springs into existence, but the rest of the columns have NA. And then each year another column gets values as new companies IPO in those years.

We constructed this so that each year we allocated fresh money to the new IPOs, equal weighted, with no regard to the past or any other market conditions. We didn’t have to do any of that, of course. We could use a very similar construction to tweak our existing portfolio based on what happened last year, or look at the past performance of basket of funds in a sector and then allocate to certain tickers going forward. In that case, we would map our function to first create a signal year-by-year, then run our allocation algorithm based on the signal from the previous year. Also note that we’re doing something a bit unusual by constructing each portfolio by year and then doing nothing beyond a monthly rebalance. Not quite set it and forget it, but pretty close.

Let’s dig in to our new portfolio returns object by calculating the Sharpe Ratio for each of these IPO portfolios. We currently have our data in wide, decidely un-tidy format so let’s pivot_longer() this data to make it tidy (long time readers or dplyers might note that we would have used gather() for this a few months ago - pivot_longer() has replaced gather() in the latest release of dplyr). We are treating each yearly portfolio as a discrete return stream so can goup_by() the portfolio_by_year column. From there we call tq_performance(), same as if these return streams were from ETFs or any other asset. Our result should be one sharpe ratio for each year, or 11 Sharpe Ratios.

port_sharpes <-
  returns_each_year_ipo_portfolios %>%
  pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
  group_by(portfolio_by_year) %>%
  arrange(portfolio_by_year, date) %>%
  filter(!is.na(monthly_return)) %>%
  tq_performance(Ra = monthly_return,
                 performance_fun = SharpeRatio,
                 Rf = 0,
                 FUN= "StdDev") %>%
  `colnames<-`(c("portfolio_by_year", "port_sharpe"))%>%
  add_column(year = years_numeric)

port_sharpes

# A tibble: 11 x 3
# Groups:   portfolio_by_year [11]
   portfolio_by_year port_sharpe  year
   <chr>                   <dbl> <dbl>
 1 2004_port_returns       0.234  2004
 2 2005_port_returns       0.192  2005
 3 2006_port_returns       0.249  2006
 4 2007_port_returns       0.190  2007
 5 2008_port_returns       0.142  2008
 6 2009_port_returns       0.220  2009
 7 2010_port_returns       0.279  2010
 8 2011_port_returns       0.152  2011
 9 2012_port_returns       0.309  2012
10 2013_port_returns       0.182  2013
11 2014_port_returns       0.218  2014

Let’s port this data to plotly and create a bar chart, where the height of each bar is the portfolio’s Sharpe Ratio.

(
port_sharpes %>% 
  ggplot(aes(x = year, y = port_sharpe, fill = portfolio_by_year)) + 
  geom_col(width = .4) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(x = "", y = "sharpe by year", title = "IPO Portfolio Sharpe Ratios") +
  theme_minimal()
) %>% 
  ggplotly()

Here is the exact same data, except displayed as a point chart. Notice how defaults to a slightly different scale, with the y-axis not extending all the way to 0. That makes the poor performance of 2008 seem even more dramatic. Were the companies that IPO’d in 2008 all dogs, or is it just a bad idea to invest in IPOs when the market is in a sharp correction because of a financial crisis?

(
port_sharpes %>% 
  ggplot(aes(x = year, y = port_sharpe, fill = portfolio_by_year)) + 
  geom_point(size = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(x = "", y = "sharpe by year", title = "IPO Portfolio Sharpe Ratios") +
  theme_minimal()
) %>% 
  ggplotly()

If you’re an avid reader of these posts, it won’t surprise you to learn that we’re going to calculate the rolling Sharpe Ratio as well. We can use the same code as last time and the blazing fast roll package.

library(roll)
returns_each_year_ipo_portfolios %>%
  pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
  group_by(portfolio_by_year) %>%
  arrange(portfolio_by_year, date) %>% 
  mutate( rolling_mean = roll_mean(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sd = roll_sd(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sharpe = rolling_mean/rolling_sd) %>% 
  na.omit() %>% 
  head()

# A tibble: 6 x 6
# Groups:   portfolio_by_year [1]
  date                portfolio_by_ye… monthly_return rolling_mean
  <dttm>              <chr>                     <dbl>        <dbl>
1 2004-07-30 00:00:00 2004_port_retur…       -0.0472     -0.0107  
2 2004-08-31 00:00:00 2004_port_retur…        0.00956    -0.00906 
3 2004-09-30 00:00:00 2004_port_retur…        0.0493     -0.000354
4 2004-10-29 00:00:00 2004_port_retur…        0.00395     0.000539
5 2004-11-30 00:00:00 2004_port_retur…        0.131       0.0211  
6 2004-12-31 00:00:00 2004_port_retur…        0.0737      0.0367  
# … with 2 more variables: rolling_sd <dbl>, rolling_sharpe <dbl>

And let’s port these rolling calculations straight to plotly.

(
 returns_each_year_ipo_portfolios %>%
  pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
  group_by(portfolio_by_year) %>%
  arrange(portfolio_by_year, date) %>% 
  mutate( rolling_mean = roll_mean(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sd = roll_sd(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sharpe = rolling_mean/rolling_sd,
         date = ymd(date)) %>% 
  na.omit() %>% 
  ggplot(aes(x = date, y = rolling_sharpe, color = portfolio_by_year)) +
   scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
  geom_line() +
   theme_minimal() 
) %>% 
  ggplotly()

Try double clicking on 2004_port_returns and then clicking on 2008_port_returns to isolate just those two. Here’s what you should see:

(
 returns_each_year_ipo_portfolios %>%
  pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
  group_by(portfolio_by_year) %>%
  arrange(portfolio_by_year, date) %>% 
  mutate( rolling_mean = roll_mean(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sd = roll_sd(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sharpe = rolling_mean/rolling_sd,
         date = ymd(date)) %>% 
  na.omit() %>% 
   filter(portfolio_by_year %in% c("2004_port_returns", "2008_port_returns")) %>% 
  ggplot(aes(x = date, y = rolling_sharpe, color = portfolio_by_year)) +
  geom_line() +
   scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
  geom_line() +
   theme_minimal()
) %>% 
  ggplotly()

Recall that the 2004 portfolio has a Sharpe Ratio that is ~2x the Sharpe Ratio of the 2008 portfolio. But look at how dramatically the 2004 portfolio plummeted through March of 2008. Would you have been able to ride out that storm? I dunno if I would have.

And with that…we’re done!

Next week we’ll add a short post on how to add a benchmark to today’s work, since the performance of these or any portfolio isn’t of much use without a benchmark. Thanks for reading and see you next time!

If you like this sort of code through check out my book, Reproducible Finance with R.

Not specific to finance but I’ve been using tons of code learned at Business Science University course.

I’m also going to be posting weekly code snippets on linkedin, connect with me there if you’re keen for some R finance stuff.

Thanks for reading and see you next time!