State Unemployment

In today’s Reproducible Finance post, we will explore state-level unemployment claims, which are released every Thursday. The last few weeks have shown huge spikes in those claims, of course, due to the coronavirus and statewide lockdown orders, and it got me wondering how these times will look to data scientists in the future. Let’s start by importing unemployment insurance claims data for Georgia. This data series is reported by all 50 states.


Outlier Days with R and Python

Welcome to another installment of Reproducible Finance. Today’s post will be topical, as we look at the historical behavior of the stock market after days of extreme returns, and it will also explore one of my favorite coding themes of 2020 - the power of RMarkdown as an R/Python collaboration tool. This post originated when Rishi Singh, the founder of tiingo and one of the nicest people I have encountered in this crazy world, sent over a note about recent market volatility along with some Python code for analyzing that volatility.


Portfolio Attribution

In a series of previous posts, we explored IPOs and IPO returns by sector and year since 2004, then examined the returns of portfolios constructed by investing in IPOs each year, and, thirdly, added a benchmark so that we can compare our IPO portfolios to something besides themselves.


Market Structure Part 1: Order Volume Density

Welcome to another installment of Reproducible Finance! Inspired by a great visualization in Hands on Time Series with R by Rami Krispin, today we’ll investigate some market structure data and get to know the Midas data source provided by the SEC. Let’s start by importing data from the SEC website for the 2nd quarter of 2019. If you navigate to the SEC website here https://www.sec.gov/opa/data/market-structure/market-structure-data-security-and-exchange.html and right click on the link labeled ‘2019 Q2’, you can copy the link address as https://www.


Looking Back on 2019: Part 2

Welcome to the second installment of Reproducible Finance 2020! In the previous post, we looked back on the daily returns for several market sectors in 2019. Today, we’ll continue that theme and look at some summary statistics for 2019.


Looking back on 2019: part 1

Welcome to Reproducible Finance 2020! It’s a new year, a new beginning, the Earth has completed one more trip around the sun and that means it’s time to look back on the previous January to December cycle.


ETF Survival Scraper

Scrape ETF closures

etf_closures_url <- "https://www.etf.com/etf-watch-tables/etf-closures"

etf_closures_html <- read_html(etf_closures_url)

etf_closures_tibble <-
  etf_closures_html %>%
  html_nodes(xpath = '//*[@id="article-body-content"]/table') %>%
  html_table(fill = TRUE) %>%
  .[[1]] %>%
  rename(close_date = X1, fund = X2, ticker = X3) %>%
  slice(-1:-2) %>%
  filter(!(ticker == "Source: FactSet") | !(nchar(close_date) < 2)) %>%
  filter(!(ticker %in% c("2019", "2018", "2017", "2016"))) %>%
  mutate(close_date = str_replace(close_date, "20177", "2017"),
         ticker = case_when(nchar(fund) < 8 ~ fund,
                            TRUE ~ ticker),
         fund = case_when(nchar(close_date) > 10 ~ close_date,
                          TRUE ~ fund),
         close_date = case_when(between(nchar(close_date), 3, 10) ~ close_date)) %>%
  fill(close_date) %>%
  filter(nchar(fund) > 4) %>%
  mutate(close_date = case_when(nchar(close_date) > 4 ~ lubridate::ymd(lubridate::parse_date_time(close_date, "%m/%d/%Y")),
                                nchar(close_date) == 4 ~ lubridate::ymd(close_date, truncated = 2L) + months(6)))

## Warning: 564 failed to parse.


IPO Benchmark Addendum

In a previous post, we examined the returns of portfolios constructed by investing in IPOs each year. This is a brief addendum on how to compare those portfolios to a benchmark. Recall that we saved the following object as both an RDS file and as a pin on RStudio Connect. That object contains the time series of monthly closing prices, monthly returns, tickers, ipo year and sector. Here’s a peek.


IPO Exploration: Part I

Inspired by recent headlines like Fear Overtakes Greed in IPO Market after WeWork Debacle and This Year’s IPO Class is Least Profitable since the Tech Bubble, today we’ll explore historical IPO data and next time we’ll look at the performance of IPO-driven portfolios constructed during the ten-year period from 2004 - 2014. I’ll admit I’ve often wondered how a portfolio that allocated money to new IPOs each year might perform, since this has to be an ultimate example of a few headline-gobbling whales dominating the collective consciousness.


IPO Exploration: Part II

In a previous post, we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how portfolios formed on those IPOs have performed. We will need to grab the price histories of the tickers, then form portfolios, then calculate their performance, and then rank those performances in some way. Since there are several hundred IPOs for which we need to pull returns data, today’s post will be a bit data intensive.
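The four steps named above (price histories, portfolios, performance, ranking) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code from the full post: the prices are simulated rather than downloaded, the tickers and IPO years are hypothetical, and equal weighting with monthly rebalancing is just one simple way to form the portfolios.

```r
# Simulated monthly prices stand in for downloaded IPO price histories.
# Tickers and IPO years are hypothetical.
set.seed(123)
n_months <- 24
tickers  <- c("AAA", "BBB", "CCC", "DDD")
ipo_year <- c(AAA = 2004, BBB = 2004, CCC = 2005, DDD = 2005)

# One column of prices per ticker (random walk with drift).
prices <- sapply(tickers, function(tk) {
  100 * cumprod(1 + rnorm(n_months, mean = 0.01, sd = 0.05))
})

# Step 1: convert prices to simple monthly returns.
returns <- prices[-1, , drop = FALSE] / prices[-n_months, , drop = FALSE] - 1

# Step 2: form one equal-weight portfolio per IPO year
# (assumption: rebalanced to equal weights every month).
years <- sort(unique(ipo_year))
portfolio_returns <- sapply(years, function(yr) {
  members <- names(ipo_year)[ipo_year == yr]
  rowMeans(returns[, members, drop = FALSE])
})
colnames(portfolio_returns) <- paste0("ipo_", years)

# Step 3: measure performance as cumulative return over the window.
cumulative_return <- apply(1 + portfolio_returns, 2, prod) - 1

# Step 4: rank the portfolios from best to worst.
ranking <- sort(cumulative_return, decreasing = TRUE)
print(ranking)
```

Cumulative return is used here only because it is the simplest ranking metric; a risk-adjusted measure could be swapped in at step 3 without changing the rest of the scaffolding.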


Tech Dividends

In a previous post, we explored the dividend history of stocks included in the SP500. Today we’ll extend that analysis to cover the Nasdaq because, well, because in the previous post I said I would do that. We’ll also explore a different source for dividend data, do some string cleaning and check out ways to customize a tooltip in plotly. Bonus feature: we’ll get into some animation too.


Summer Vix

In a previous post, from way back in August of 2017, we explored the relationship between the VIX and the past, realized volatility of the S&P 500 and reproduced some interesting work from AQR on the meaning of the VIX. With the recent market and VIX rollercoaster, this seemed a good time to revisit the old post, update some code and see if we can tweak the data visualizations to shed some light on the recent market activity.


Dividend Discovery

Welcome to a mid-summer edition of Reproducible Finance with R. Today we’ll explore the dividend histories of some stocks in the S&P 500. By way of history, for all you young tech IPO and crypto investors out there, way back, a long time ago in the dark ages, companies used to take pains to generate free cash flow and then return some of that free cash to investors in the form of dividends.


Momentum Investing with R

After an extended hiatus, Reproducible Finance is back! We’ll celebrate by changing focus a bit and coding up an investment strategy called Momentum. Before we even tiptoe in that direction, please note that this is not intended as investment advice and it’s not intended to be a script that can be implemented for trading.


Rolling Origin Fama French

Today we continue our work on sampling so that we can run models on subsets of our data and then test the accuracy of the models on data not included in those subsets. In the machine learning prediction world, those are often called our training data and our testing data, but we’re not going to do any machine learning prediction today. We’ll stay with our good ol’ Fama French regression models for the reasons explained last time: the goal is to explore a new way of sampling our data and I prefer to do that in the context of a familiar model and data set.
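To make the idea concrete, here is a minimal base R sketch of rolling-origin sampling: fit on one window of the data, test on the window that immediately follows, then slide both forward. The data are simulated, a single market factor stands in for the full Fama French set, and the window sizes are arbitrary assumptions. The rsample package provides `rolling_origin()` for building such splits; the loop below spells out the mechanics by hand instead.

```r
# Simulated monthly data: one portfolio return and one market factor.
set.seed(42)
n   <- 60
dat <- data.frame(mkt = rnorm(n, mean = 0.005, sd = 0.04))
dat$ret <- 0.001 + 1.1 * dat$mkt + rnorm(n, mean = 0, sd = 0.01)

initial <- 36  # observations in each training window (assumption)
assess  <- 6   # observations in each testing window (assumption)

# First row of each training window; the window slides forward
# by one testing window at a time.
starts <- seq(1, n - initial - assess + 1, by = assess)

results <- do.call(rbind, lapply(starts, function(s) {
  train_idx <- s:(s + initial - 1)
  test_idx  <- (s + initial):(s + initial + assess - 1)

  # Fit on the training window only.
  fit  <- lm(ret ~ mkt, data = dat[train_idx, ])

  # Evaluate on data the model has not seen.
  pred <- predict(fit, newdata = dat[test_idx, ])

  data.frame(train_start = s,
             test_start  = min(test_idx),
             rmse        = sqrt(mean((dat$ret[test_idx] - pred)^2)))
}))

print(results)
```

The training window here has a fixed length and slides forward. Letting it grow instead (keeping the start at row 1) is the other common variant, and which one suits a given data set is a judgment call.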


rsampling fama french

Today we will continue our work on Fama French factor models, but more as a vehicle to explore some of the awesome stuff happening in the world of tidy models. For new readers who want to get familiar with Fama French before diving into this post, see here where we covered importing and wrangling the data, here where we covered rolling models and visualization, my most recent previous post here where we covered managing many models, and if you’re into Shiny, this flexdashboard.


2018 Sector Analysis Part 2

Welcome to the second installment of Reproducible Finance 2019! In the previous post, we looked back on the daily returns for several market sectors in 2018. Today, we’ll continue that theme and look at some summary statistics for 2018, and then extend out to previous years and different ways of visualizing our data.


battlefin presentation

Introducing R and RStudio

+ Statistical programming language -> by data scientists, for data scientists
+ Base R + 17,000 packages
+ RStudio
+ Shiny
+ sparklyr -> big data
+ tensorflow -> AI
+ Rmarkdown -> reproducible reports
+ database connectors
+ htmlwidgets

Packages for today

library(tidyverse)
library(tidyquant)
library(timetk)
library(tibbletime)
library(highcharter)
library(PerformanceAnalytics)

More packages for finance here: https://cran.


Looking back on 2018: part 1

Welcome to Reproducible Finance 2019! It’s a new year, a new beginning, the Earth has completed one more trip around the sun and that means it’s time to look back on the previous January to December cycle.


Many Factor Models

Today we will return to the Fama French (FF) model of asset returns and use it as a proxy for fitting and evaluating multiple linear models. In a previous post, we reviewed how to run the FF 3 factor model on the returns of a portfolio. That is, we ran one model on one set of returns. Today we will run multiple models on multiple streams of returns, which will allow us to compare those models and hopefully build a code scaffolding that can be used when we wish to explore other factor models.
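As a rough picture of what "multiple models on multiple streams of returns" can look like, the base R sketch below fits every model in a list to every return stream in another list and collects one fit statistic per pair. Everything here is illustrative: the factor data are simulated, the column names `MKT`, `SMB` and `HML` and the fund names are assumptions, and the full post may well organize this differently (for instance with nested tibbles).

```r
# Simulated monthly factor data; column names are assumed labels
# for the three Fama French factors.
set.seed(7)
n <- 120
factors <- data.frame(MKT = rnorm(n, 0.006, 0.04),
                      SMB = rnorm(n, 0.002, 0.03),
                      HML = rnorm(n, 0.003, 0.03))

# Two hypothetical return streams with known factor exposures.
returns <- list(
  fund_a = 0.9 * factors$MKT + 0.3 * factors$SMB + rnorm(n, 0, 0.01),
  fund_b = 1.2 * factors$MKT - 0.2 * factors$HML + rnorm(n, 0, 0.01)
)

# Candidate models, stored as formulas so new ones are easy to add.
models <- list(
  capm = ret ~ MKT,
  ff3  = ret ~ MKT + SMB + HML
)

# One row per (return stream, model) combination.
grid <- expand.grid(asset = names(returns),
                    model = names(models),
                    stringsAsFactors = FALSE)

# Fit each combination and keep the adjusted R-squared.
grid$adj_r2 <- mapply(function(asset, model) {
  d <- data.frame(ret = returns[[asset]], factors)
  summary(lm(models[[model]], data = d))$adj.r.squared
}, grid$asset, grid$model, USE.NAMES = FALSE)

print(grid)
```

Keeping the models in a named list is the part that acts as scaffolding: exploring another factor model means adding one formula, and the comparison table grows by itself.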
