New Wooldridge Econometrics Package!

2017-10-16

by Jonathan Regenstein

Attention econ students, professors and aficionados - an awesome new R package has arrived for the fall semester. It’s called wooldridge and, as you might expect, it’s a companion R package to the ~~Bible of econometrics~~ popular Wooldridge text used in lots of econometrics classes. Thanks to [@justinmshea](https://www.linkedin.com/in/justinmshea/) for building it and contributing it to CRAN!

The vignette has a nice summary and worked example from every chapter of the book - here’s an excerpt:

This vignette contains examples from every chapter of Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge. Each example illustrates how to load data, build econometric models, and compute estimates with R. Economics students new to both econometrics and R may find the introduction to both a bit challenging. In particular, the process of loading and preparing data prior to building one’s first econometric model can present challenges. The wooldridge data package aims to lighten this task.

Honestly, the best thing to do is head straight to the vignette, but below is a quick worked example from Chapter 10 on time series.

library(tidyverse)
library(broom)
library(wooldridge)

# Load up the data for Chapter 10, example 10.2 and take a look at the summary
data("intdef")
summary(intdef)

##       year            i3              inf              rec       
##  Min.   :1948   Min.   : 0.950   Min.   :-1.200   Min.   :14.40  
##  1st Qu.:1962   1st Qu.: 2.893   1st Qu.: 1.675   1st Qu.:17.48  
##  Median :1976   Median : 4.735   Median : 3.050   Median :17.80  
##  Mean   :1976   Mean   : 4.908   Mean   : 3.884   Mean   :17.92  
##  3rd Qu.:1989   3rd Qu.: 6.515   3rd Qu.: 5.425   3rd Qu.:18.50  
##  Max.   :2003   Max.   :14.030   Max.   :13.500   Max.   :20.90  
##                                                                  
##       out             def              i3_1            inf_1       
##  Min.   :11.60   Min.   :-4.600   Min.   : 0.950   Min.   :-1.200  
##  1st Qu.:18.57   1st Qu.: 0.300   1st Qu.: 2.975   1st Qu.: 1.650  
##  Median :19.45   Median : 1.450   Median : 4.810   Median : 3.100  
##  Mean   :19.52   Mean   : 1.602   Mean   : 4.979   Mean   : 3.913  
##  3rd Qu.:21.23   3rd Qu.: 2.950   3rd Qu.: 6.570   3rd Qu.: 5.450  
##  Max.   :23.50   Max.   : 6.100   Max.   :14.030   Max.   :13.500  
##                                   NA's   :1        NA's   :1       
##      def_1             ci3                 cinf              cdef        
##  Min.   :-4.600   Min.   :-3.340000   Min.   :-9.3000   Min.   :-3.2000  
##  1st Qu.: 0.300   1st Qu.:-0.605000   1st Qu.:-1.1500   1st Qu.:-0.8500  
##  Median : 1.400   Median : 0.120000   Median : 0.2000   Median : 0.1000  
##  Mean   : 1.569   Mean   :-0.000364   Mean   :-0.1055   Mean   : 0.1455  
##  3rd Qu.: 2.900   3rd Qu.: 0.920000   3rd Qu.: 1.1000   3rd Qu.: 1.2000  
##  Max.   : 6.100   Max.   : 2.970000   Max.   : 6.6000   Max.   : 4.4000  
##  NA's   :1        NA's   :1           NA's   :1         NA's   :1        
##       y77        
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.4821  
##  3rd Qu.:1.0000  
##  Max.   :1.0000  
##

Now let’s run example 10.2 and examine the effects of inflation and deficits on interest rates.

The variable i3 is the three-month Treasury-bill rate, inf is the annual inflation rate based on the consumer price index (CPI), and def is the federal budget deficit as a percentage of GDP. The equation to be estimated is:

\[i3_{t} = \beta_0 + \beta_1 inf_{t} + \beta_2 def_{t} + e_{t}\]

We will run the same regression as the book and the vignette. The only wrinkle is that we will use the broom package to clean up the results and visualize predicted values.

# Run regression
tbill_model <- lm(i3 ~ inf + def, data = intdef)

# tidy the results
tidy(tbill_model)

##          term  estimate  std.error statistic      p.value
## 1 (Intercept) 1.7332658 0.43196700  4.012496 1.897506e-04
## 2         inf 0.6058659 0.08213481  7.376481 1.117901e-09
## 3         def 0.5130579 0.11838406  4.333843 6.572384e-05
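To make the coefficients concrete, here is a minimal sketch that plugs the estimates printed above into the fitted equation by hand. The inflation and deficit values (3 and 1) are made up for illustration and do not come from the data set.

```r
# Coefficient estimates copied from the tidy() output above
b0    <- 1.7332658  # intercept
b_inf <- 0.6058659  # slope on inflation
b_def <- 0.5130579  # slope on the deficit

# Hypothetical year: 3% inflation and a deficit of 1% of GDP (illustrative values)
inf_value <- 3
def_value <- 1

# Predicted three-month T-bill rate for that hypothetical year
predicted_i3 <- b0 + b_inf * inf_value + b_def * def_value
print(predicted_i3)
```

With the fitted model object in hand, `predict(tbill_model, newdata = data.frame(inf = 3, def = 1))` should return the same number up to rounding.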

We can glance at our results and use dplyr’s select verb to choose a handful of columns for viewing.

glance(tbill_model) %>%
  select(r.squared, adj.r.squared, sigma)

##   r.squared adj.r.squared    sigma
## 1 0.6020677     0.5870514 1.843163
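As a sanity check on how the two R-squared figures relate, the sketch below recomputes the adjusted value from the plain one using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). It assumes n = 56 annual observations (1948 to 2003, none dropped) and k = 2 predictors; the variable names are only illustrative.

```r
# R-squared copied from the glance() output above
r_squared <- 0.6020677

n <- 56  # assumed number of observations: one per year, 1948 to 2003
k <- 2   # number of predictors: inf and def

# Adjusted R-squared penalizes the fit for the number of predictors used
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print(round(adj_r_squared, 7))
```

The result agrees with the adj.r.squared column reported by glance, which is a quick way to confirm the sample size the regression actually used.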

Let’s round out our use of broom and tinker with the augment function, which will augment our original data set with fitted/predicted values and residuals from the model.

intdef_augmented <- augment(tbill_model, intdef)

head(intdef_augmented)

##   year   i3  inf  rec  out        def i3_1 inf_1      def_1        ci3
## 1 1948 1.04  8.1 16.2 11.6 -4.6000004   NA    NA         NA         NA
## 2 1949 1.10 -1.2 14.5 14.3 -0.1999998 1.04   8.1 -4.6000004 0.06000006
## 3 1950 1.22  1.3 14.4 15.6  1.2000008 1.10  -1.2 -0.1999998 0.12000000
## 4 1951 1.55  7.9 16.1 14.2 -1.9000006 1.22   1.3  1.2000008 0.32999992
## 5 1952 1.77  1.9 19.0 19.4  0.3999996 1.55   7.9 -1.9000006 0.22000003
## 6 1953 1.93  0.8 18.7 20.4  1.6999989 1.77   1.9  0.3999996 0.15999997
##   cinf      cdef y77   .fitted   .se.fit     .resid       .hat   .sigma
## 1   NA        NA   0 4.2807132 0.8770291 -3.2407132 0.22641254 1.789275
## 2 -9.3  4.400001   0 0.9036153 0.5129939  0.1963848 0.07746346 1.860585
## 3  2.5  1.400001   0 3.1365612 0.3255787 -1.9165612 0.03120215 1.841105
## 4  6.6 -3.100001   0 5.5447960 0.6066185 -3.9947960 0.10831876 1.765902
## 5 -6.0  2.300000   0 3.0896339 0.3208423 -1.3196339 0.03030092 1.851498
## 6 -1.1  1.299999   0 3.0901562 0.3543082 -1.1601563 0.03695174 1.853566
##        .cooksd .std.resid
## 1 0.3898648026 -1.9990429
## 2 0.0003444265  0.1109308
## 3 0.0119815965 -1.0564339
## 4 0.2133166776 -2.2952289
## 5 0.0055060468 -0.7270616
## 6 0.0052616616 -0.6413996
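The .resid column is simply the actual value minus the fitted value. A small sketch, using only the six rows printed above (typed in by hand, so they carry the rounding of the printed output), checks that relationship; the data frame and column names are hypothetical.

```r
# First six rows of the augmented data, copied from the printed output above
first_rows <- data.frame(
  year   = 1948:1953,
  i3     = c(1.04, 1.10, 1.22, 1.55, 1.77, 1.93),
  fitted = c(4.2807132, 0.9036153, 3.1365612, 5.5447960, 3.0896339, 3.0901562),
  resid  = c(-3.2407132, 0.1963848, -1.9165612, -3.9947960, -1.3196339, -1.1601563)
)

# Residual = actual minus fitted
first_rows$resid_by_hand <- first_rows$i3 - first_rows$fitted

# Largest disagreement with the printed .resid column; should be only print rounding
max_gap <- max(abs(first_rows$resid_by_hand - first_rows$resid))
print(first_rows)
print(max_gap)
```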

Now we can visualize the predicted (.fitted) values against the actual i3 values.

intdef_augmented %>% 
  ggplot(aes(x = year)) + 
  geom_line(aes(y = i3, color = "i3")) +
  geom_line(aes(y = .fitted, color = "predicted"))

Let’s add in our predictors as well and see if anything jumps out as interesting.

intdef_augmented %>% 
  ggplot(aes(x = year)) + 
  geom_line(aes(y = i3, color = "i3")) +
  geom_line(aes(y = .fitted, color = "predicted")) +
  geom_line(aes(y = inf, color = "inflation")) +
  geom_line(aes(y = def, color = "deficit"))

In the last few years of the sample (2000 to 2003), interest rates were decreasing while the deficit as a percent of GDP was increasing. Remember that our model returned a positive beta for the def variable: according to the model, an increasing deficit should go along with increasing interest rates, and here’s some background on the causal link. That relationship doesn’t hold after 2000, and the predictions suffer for it.
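One way to look at this more closely than the chart allows is to print the actual values, fitted values and residuals for the final years of the sample. The sketch below refits the same regression with base R so that it runs on its own; it assumes the wooldridge package is installed, and the `recent` data frame and its column names are hypothetical.

```r
library(wooldridge)

# Load the data and refit the regression so this block stands alone
data("intdef", package = "wooldridge")
tbill_model <- lm(i3 ~ inf + def, data = intdef)

# Collect actual, fitted and residual values side by side
recent <- data.frame(
  year   = intdef$year,
  i3     = intdef$i3,
  fitted = fitted(tbill_model),
  resid  = residuals(tbill_model)
)

# Keep only the years from 2000 onward
recent <- recent[recent$year >= 2000, ]
print(recent)
```

Residuals that are large relative to the residual standard error reported by glance (the sigma column) would mark the years where the model fits worst.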

That’s all for today. Thanks again to justinmshea for the fantastically useful new wooldridge package. Happy econometricsing!