State Unemployment

Jonathan Regenstein 2020-04-30

In today’s Reproducible Finance post, we will explore state-level unemployment claims, which are released every Thursday. The last few weeks have shown huge spikes in those claims, of course, due to the coronavirus and statewide lockdown orders, and it got me wondering how these times will look to data scientists in the future. Let’s start by importing unemployment insurance claims data for Georgia. This is a data series that’s reported by all 50 states.

Outlier Days with R and Python

Jonathan Regenstein 2020-03-17

Welcome to another installment of Reproducible Finance. Today’s post will be topical as we look at the historical behavior of the stock market after days of extreme returns and it will also explore one of my favorite coding themes of 2020 - the power of RMarkdown as an R/Python collaboration tool. This post originated when Rishi Singh, the founder of tiingo and one of the nicest people I have encountered in this crazy world, sent over a note about recent market volatility along with some Python code for analyzing that volatility.

Portfolio Attribution

Jonathan Regenstein 2020-01-19

In a series of previous posts, we explored IPOs and IPO returns by sector and year since 2004, then examined the returns of portfolios constructed by investing in IPOs each year, and, thirdly, added a benchmark so that we can compare our IPO portfolios to something besides themselves.

Market Structure Part 1: Order Volume Density

Jonathan Regenstein 2020-01-11

Welcome to another installment of Reproducible Finance! Inspired by a great visualization in Hands on Time Series with R by Rami Krispin, today we’ll investigate some market structure data and get to know the Midas data source provided by the SEC. Let’s start by importing data from the SEC website for the 2nd quarter of 2019. If you navigate to the SEC website here https://www.sec.gov/opa/data/market-structure/market-structure-data-security-and-exchange.html and right click on the link labeled ‘2019 Q2’, you can copy the link address as https://www.

Looking Back on 2019: Part 2

Jonathan Regenstein 2020-01-04

Welcome to the second installment of Reproducible Finance 2020! In the previous post, we looked back on the daily returns for several market sectors in 2019. Today, we’ll continue that theme and look at some summary statistics for 2019.

Looking back on 2019: part 1

Jonathan Regenstein 2020-01-03

Welcome to Reproducible Finance 2020! It’s a new year, a new beginning, the Earth has completed one more trip around the sun and that means it’s time to look back on the previous January to December cycle.

ETF Survival Scraper

Jonathan Regenstein 2019-11-29

Scrape etf closures
etf_closures_url <- "https://www.etf.com/etf-watch-tables/etf-closures"
etf_closures_html <- read_html(etf_closures_url)
etf_closures_tibble <-
  etf_closures_html %>%
  html_nodes(xpath = '//*[@id="article-body-content"]/table') %>%
  html_table(fill = TRUE) %>%
  .[[1]] %>%
  rename(close_date = X1, fund = X2, ticker = X3) %>%
  slice(-1:-2) %>%
  filter(!(ticker == "Source: FactSet") | !(nchar(close_date) < 2)) %>%
  filter(!(ticker %in% c("2019", "2018", "2017", "2016"))) %>%
  mutate(close_date = str_replace(close_date, "20177", "2017"),
         ticker = case_when(nchar(fund) < 8 ~ fund,
                            TRUE ~ ticker),
         fund = case_when(nchar(close_date) > 10 ~ close_date,
                          TRUE ~ fund),
         close_date = case_when(between(nchar(close_date), 3, 10) ~ close_date)) %>%
  fill(close_date) %>%
  filter(nchar(fund) > 4) %>%
  mutate(close_date = case_when(
    nchar(close_date) > 4 ~ lubridate::ymd(lubridate::parse_date_time(close_date, "%m/%d/%Y")),
    nchar(close_date) == 4 ~ lubridate::ymd(close_date, truncated = 2L) + months(6)))
## Warning: 564 failed to parse.

IPO Benchmark Addendum

Jonathan Regenstein 2019-11-18

In a previous post, we examined the returns of portfolios constructed by investing in IPOs each year. This is a brief addendum on how to compare those portfolios to a benchmark. Recall that we saved the following object as both an RDS file and as a pin on RStudio Connect. That object contains the time series of monthly closing prices, monthly returns, tickers, IPO year and sector. Here’s a peek.

IPO Exploration: Part I

Jonathan Regenstein 2019-11-15

Inspired by recent headlines like Fear Overtakes Greed in IPO Market after WeWork Debacle and This Year’s IPO Class is Least Profitable since the Tech Bubble, today we’ll explore historical IPO data and next time we’ll look at the performance of IPO-driven portfolios constructed during the ten-year period from 2004 - 2014. I’ll admit I’ve often wondered how a portfolio that allocated money to new IPOs each year might perform since this has to be an ultimate example of a few headline-gobbling whales dominating the collective consciousness.

IPO Exploration: Part II

Jonathan Regenstein 2019-11-15

In a previous post we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how portfolios formed on those IPOs have performed. We will need to grab the price histories of the tickers, then form portfolios, then calculate their performance, and then rank those performances in some way. Since there are several hundred IPOs for which we need to pull returns data, today’s post will be a bit data intensive.

Tech Dividends

Jonathan Regenstein 2019-08-25

In a previous post, we explored the dividend history of stocks included in the S&P 500. Today we’ll extend that analysis to cover the Nasdaq because, well, because in the previous post I said I would do that. We’ll also explore a different source for dividend data, do some string cleaning and check out ways to customize a tooltip in plotly. Bonus feature: we’ll get into some animation too.

Summer Vix

Jonathan Regenstein 2019-08-06

In a previous post, from way back in August of 2017, we explored the relationship between the VIX and the past, realized volatility of the S&P 500 and reproduced some interesting work from AQR on the meaning of the VIX. With the recent market and VIX rollercoaster, this seemed a good time to revisit the old post, update some code and see if we can tweak the data visualizations to shed some light on the recent market activity.

Dividend Discovery

Jonathan Regenstein 2019-07-10

Welcome to a mid-summer edition of Reproducible Finance with R. Today we’ll explore the dividend histories of some stocks in the S&P 500. By way of history, for all you young tech IPO and crypto investors out there, way back, a long time ago in the dark ages, companies used to take pains to generate free cash flow and then return some of that free cash to investors in the form of dividends.

Momentum Investing with R

Jonathan Regenstein 2019-05-22

After an extended hiatus, Reproducible Finance is back! We’ll celebrate by changing focus a bit and coding up an investment strategy called Momentum. Before we even tiptoe in that direction, please note that this is not intended as investment advice and it’s not intended to be a script that can be implemented for trading.

Rolling Origin Fama French

Jonathan Regenstein 2019-03-14

Today we continue our work on sampling so that we can run models on subsets of our data and then test the accuracy of the models on data not included in those subsets. In the machine learning prediction world, that’s often called our training data and our testing data, but we’re not going to do any machine learning prediction today. We’ll stay with our good ol’ Fama French regression models for the reasons explained last time: the goal is to explore a new way of sampling our data, and I prefer to do that in the context of a familiar model and data set.

rsampling fama french

Jonathan Regenstein 2019-03-13

Today we will continue our work on Fama French factor models, but more as a vehicle to explore some of the awesome stuff happening in the world of tidy models. For new readers who want to get familiar with Fama French before diving into this post, see here where we covered importing and wrangling the data, here where we covered rolling models and visualization, my most recent previous post here where we covered managing many models, and if you’re into Shiny, this flexdashboard.

2018 Sector Analysis Part 2

Jonathan Regenstein 2019-02-18

Welcome to the second installment of Reproducible Finance 2019! In the previous post, we looked back on the daily returns for several market sectors in 2018. Today, we’ll continue that theme and look at some summary statistics for 2018, and then extend out to previous years and different ways of visualizing our data.

battlefin presentation

Jonathan Regenstein 2019-01-29

Introducing R and RStudio
+ Statistical programming language -> by data scientists, for data scientists
+ Base R + 17,000 packages
+ RStudio
+ Shiny
+ sparklyr -> big data
+ tensorflow -> AI
+ Rmarkdown -> reproducible reports
+ database connectors
+ htmlwidgets
Packages for today
library(tidyverse)
library(tidyquant)
library(timetk)
library(tibbletime)
library(highcharter)
library(PerformanceAnalytics)
More packages for finance here: https://cran.

Looking back on 2018: part 1

Jonathan Regenstein 2019-01-14

Welcome to Reproducible Finance 2019! It’s a new year, a new beginning, the Earth has completed one more trip around the sun and that means it’s time to look back on the previous January to December cycle.

Many Factor Models

Jonathan K. Regenstein, Jr. 2018-11-26

Today we will return to the Fama French (FF) model of asset returns and use it as a proxy for fitting and evaluating multiple linear models. In a previous post, we reviewed how to run the FF 3 factor model on the returns of a portfolio. That is, we ran one model on one set of returns. Today we will run multiple models on multiple streams of returns, which will allow us to compare those models and hopefully build a code scaffolding that can be used when we wish to explore other factor models.