library(brms)
I’m going to start with a simple full-factorial regression using three predictors of one outcome variable. We’ll first fit it as a standard linear model (regression) with “lm” and then fit a Bayesian version with “brm”, leaving everything at its default settings.
jack=lm(Murder ~ Assault*Rape*UrbanPop, data=USArrests)
jill=brm(Murder ~ Assault*Rape*UrbanPop, data=USArrests, cores=4)
## Compiling the C++ model
## Start sampling
## Warning: There were 1309 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: There were 2396 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
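Stan’s advice here is actionable: divergences and treedepth warnings can often be reduced by tightening the sampler through brm’s control argument. A hedged sketch (the adapt_delta and max_treedepth values below are just common choices, not magic numbers; and in this case the underlying problem turns out to be multicollinearity, so tuning alone won’t fully fix it):

```r
library(brms)
# Refit with a smaller step size (higher adapt_delta) and a deeper tree;
# sampling gets slower but divergences and treedepth warnings may go away
jill_tuned = brm(Murder ~ Assault*Rape*UrbanPop, data=USArrests,
                 control = list(adapt_delta = 0.99, max_treedepth = 15),
                 cores = 4)
```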
summary(jack)
##
## Call:
## lm(formula = Murder ~ Assault * Rape * UrbanPop, data = USArrests)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1508 -1.2188 -0.1551 1.3878 6.7749
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.493e+00 1.095e+01 -0.867 0.3908
## Assault 1.155e-01 5.406e-02 2.137 0.0385 *
## Rape 9.047e-01 8.059e-01 1.123 0.2680
## UrbanPop 8.132e-02 1.798e-01 0.452 0.6535
## Assault:Rape -4.083e-03 3.299e-03 -1.238 0.2228
## Assault:UrbanPop -8.658e-04 9.014e-04 -0.961 0.3423
## Rape:UrbanPop -8.713e-03 1.182e-02 -0.737 0.4653
## Assault:Rape:UrbanPop 4.348e-05 4.888e-05 0.889 0.3788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.517 on 42 degrees of freedom
## Multiple R-squared: 0.7137, Adjusted R-squared: 0.666
## F-statistic: 14.95 on 7 and 42 DF, p-value: 1.313e-09
library(car) # library contains the 'vif' command
## Loading required package: carData
vif(jack) # to test for multicollinearity
## Assault Rape UrbanPop
## 156.92552 440.58625 52.39664
## Assault:Rape Assault:UrbanPop Rape:UrbanPop
## 890.06043 263.27244 703.99981
## Assault:Rape:UrbanPop
## 1178.06502
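For intuition, each one-degree-of-freedom VIF is just 1/(1 − R²) from regressing that column of the design matrix on all the other columns. A quick sketch that should approximately reproduce vif(jack)’s value for Assault:

```r
# Regress the Assault column of the design matrix on every other column
X = model.matrix(jack)[, -1]                      # drop the intercept column
aux = lm(X[, "Assault"] ~ X[, colnames(X) != "Assault"])
1 / (1 - summary(aux)$r.squared)                  # compare with vif(jack)["Assault"]
```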
summary(jill)
## Warning: The model has not converged (some Rhats are > 1.1). Do not analyse the results!
## We recommend running more iterations and/or setting stronger priors.
## Warning: There were 1309 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Murder ~ Assault * Rape * UrbanPop
## Data: USArrests (Number of observations: 50)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept -9.73 13.39 -36.15 8.45 2 2.84
## Assault 0.11 0.07 0.00 0.26 3 1.80
## Rape 0.87 0.91 -0.13 2.80 2 3.63
## UrbanPop 0.08 0.23 -0.25 0.50 3 2.53
## Assault:Rape -0.00 0.00 -0.01 0.00 2 2.69
## Assault:UrbanPop -0.00 0.00 -0.00 0.00 3 1.82
## Rape:UrbanPop -0.01 0.01 -0.04 0.01 2 3.47
## Assault:Rape:UrbanPop 0.00 0.00 -0.00 0.00 2 2.76
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma 3.05 0.59 2.19 4.24 3 2.02
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Lots of warnings here. The Bayesian model shows poor convergence: the Rhat values are far above 1 and the effective sample sizes are tiny. The plot of the MCMC sampling confirms this.
plot(jill)
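Rather than eyeballing the summary, you can pull the Rhat values out programmatically (assuming a brms version that exports an rhat() method):

```r
# Extract all Rhat values and flag any above the conventional 1.1 cutoff
rh = rhat(jill)
rh[rh > 1.1]    # every parameter that has failed to converge
any(rh > 1.1)   # TRUE for this fit, so don't trust the estimates
```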
The issue here is one of multicollinearity. There is non-essential collinearity between the main effects and the interactions (see the results of the “vif” command; vif stands for variance inflation factor). To address this issue, we’ll center the predictors, though I won’t standardize them. I usually create new columns to hold the centered versions.
USArrests$c.Assault = scale(USArrests$Assault, scale=F)
USArrests$c.Rape = scale(USArrests$Rape, scale=F)
USArrests$c.UrbanPop = scale(USArrests$UrbanPop, scale=F)
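A quick sanity check that the centering worked: each new column should have a mean of (numerically) zero. One caveat: scale() returns a one-column matrix rather than a plain vector; if that ever causes trouble downstream, wrap the call in as.numeric().

```r
# Means of the centered columns should be ~0 (up to floating-point noise)
sapply(USArrests[c("c.Assault", "c.Rape", "c.UrbanPop")], mean)
```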
jack=lm(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests)
jill=brm(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests, cores=4)
## Compiling the C++ model
## Start sampling
## Warning: There were 1 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
Now, let’s look at the results again after centering.
summary(jack) # for standard lm model
##
## Call:
## lm(formula = Murder ~ c.Assault * c.Rape * c.UrbanPop, data = USArrests)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1508 -1.2188 -0.1551 1.3878 6.7749
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.484e+00 5.072e-01 16.727 < 2e-16 ***
## c.Assault 3.257e-02 6.968e-03 4.675 3.04e-05 ***
## c.Rape 1.230e-01 6.324e-02 1.945 0.0585 .
## c.UrbanPop -9.388e-02 4.448e-02 -2.111 0.0408 *
## c.Assault:c.Rape -1.234e-03 7.403e-04 -1.666 0.1031
## c.Assault:c.UrbanPop 5.732e-05 5.015e-04 0.114 0.9095
## c.Rape:c.UrbanPop -1.288e-03 4.475e-03 -0.288 0.7748
## c.Assault:c.Rape:c.UrbanPop 4.348e-05 4.888e-05 0.889 0.3788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.517 on 42 degrees of freedom
## Multiple R-squared: 0.7137, Adjusted R-squared: 0.666
## F-statistic: 14.95 on 7 and 42 DF, p-value: 1.313e-09
vif(jack) # Note much less collinearity
## c.Assault c.Rape
## 2.607332 2.712887
## c.UrbanPop c.Assault:c.Rape
## 3.205184 2.016545
## c.Assault:c.UrbanPop c.Rape:c.UrbanPop
## 3.407201 3.379031
## c.Assault:c.Rape:c.UrbanPop
## 4.333142
summary(jill) # for Bayesian model
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Murder ~ c.Assault * c.Rape * c.UrbanPop
## Data: USArrests (Number of observations: 50)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI
## Intercept 8.48 0.52 7.43 9.51
## c.Assault 0.03 0.01 0.02 0.05
## c.Rape 0.12 0.07 -0.01 0.25
## c.UrbanPop -0.09 0.05 -0.19 -0.00
## c.Assault:c.Rape -0.00 0.00 -0.00 0.00
## c.Assault:c.UrbanPop 0.00 0.00 -0.00 0.00
## c.Rape:c.UrbanPop -0.00 0.00 -0.01 0.01
## c.Assault:c.Rape:c.UrbanPop 0.00 0.00 -0.00 0.00
## Eff.Sample Rhat
## Intercept 2569 1.00
## c.Assault 2269 1.00
## c.Rape 2132 1.00
## c.UrbanPop 2049 1.00
## c.Assault:c.Rape 3674 1.00
## c.Assault:c.UrbanPop 2472 1.00
## c.Rape:c.UrbanPop 2080 1.00
## c.Assault:c.Rape:c.UrbanPop 2068 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma 2.60 0.30 2.09 3.25 2177 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
print(summary(jill), digits = 3) # for greater precision, change digits
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Murder ~ c.Assault * c.Rape * c.UrbanPop
## Data: USArrests (Number of observations: 50)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI
## Intercept 8.482 0.523 7.434 9.513
## c.Assault 0.033 0.007 0.018 0.047
## c.Rape 0.123 0.066 -0.008 0.252
## c.UrbanPop -0.094 0.046 -0.185 -0.004
## c.Assault:c.Rape -0.001 0.001 -0.003 0.000
## c.Assault:c.UrbanPop 0.000 0.001 -0.001 0.001
## c.Rape:c.UrbanPop -0.001 0.005 -0.010 0.008
## c.Assault:c.Rape:c.UrbanPop 0.000 0.000 -0.000 0.000
## Eff.Sample Rhat
## Intercept 2569 1.000
## c.Assault 2269 1.001
## c.Rape 2132 1.001
## c.UrbanPop 2049 1.002
## c.Assault:c.Rape 3674 0.999
## c.Assault:c.UrbanPop 2472 1.001
## c.Rape:c.UrbanPop 2080 1.001
## c.Assault:c.Rape:c.UrbanPop 2068 1.001
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma 2.598 0.296 2.091 3.246 2177 1.001
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(jill)
I’d like to compare the complex full-factorial model with a main-effects-only model using a Bayes factor and the WAIC, but I first need to refit both models using “save_all_pars = T”.
jill=brm(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests, save_all_pars = T, iter=5000, cores=4)
jill2=brm(Murder ~ c.Assault+c.Rape+c.UrbanPop, data=USArrests, save_all_pars = T, iter=5000, cores=4)
waic(jill, jill2)
## WAIC SE
## jill 245.85 10.44
## jill2 243.18 10.78
## jill - jill2 2.67 4.98
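As a cross-check on the WAIC, you can also compare the models with approximate leave-one-out cross-validation, which uses the same interface (newer brms versions may ask you to attach the criterion with add_criterion() first):

```r
# PSIS-LOO comparison; as with WAIC, lower values indicate
# better expected out-of-sample prediction
loo(jill, jill2)
```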
bayes_factor(jill, jill2)
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## The estimated Bayes factor in favor of x1 over x2 is equal to: 0
We find that the simpler model, jill2, is likely to predict better out-of-sample (lower WAIC) and is much more strongly supported by the data (the Bayes factor in favor of the full model is essentially zero). But, WAIT… these models used data-informed priors for the intercept and error distribution but flat priors for the regression weights. The WAIC can still be interpreted, but the Bayes factor cannot, so I need to rerun the comparison with proper, narrower priors on the regression weights.
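Before refitting, it’s worth checking what the default priors actually are. The get_prior() function lists them for any formula/data combination:

```r
# Show brms's default priors for this model; the class "b" rows
# (the regression weights) are blank, i.e. flat improper priors
get_prior(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests)
```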
jill3=brm(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests, prior=c(set_prior("normal(0,1)", class= "b")), save_all_pars = T, iter=5000, cores=4)
jill4=brm(Murder ~ c.Assault+c.Rape+c.UrbanPop, data=USArrests, prior=c(set_prior("normal(0,1)", class= "b")), save_all_pars = T, iter=5000, cores=4)
bayes_factor(jill3, jill4) # evidence in favor of jill3 over jill4
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## The estimated Bayes factor in favor of x1 over x2 is equal to: 0
bayes_factor(jill4, jill3) # evidence in favor of jill4 over jill3
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## The estimated Bayes factor in favor of x1 over x2 is equal to: 1.413627e+12
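The two runs are consistent with each other: a Bayes factor is a ratio of marginal likelihoods, so the BF of jill3 over jill4 should be the reciprocal of the BF of jill4 over jill3, up to estimation error:

```r
# Reciprocal of the reported BF in favor of jill4: roughly 7e-13,
# which prints as 0 at the default display precision
1 / 1.413627e12
```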
To obtain a nicer graph of the results, try these variations from the bayesplot library. I’m going to plot the main-effects-only model (jill4).
library(bayesplot)
# The following command shows the structure of our results, from which we'll pick the parameters to plot
attributes(as.array(jill4))
## $dim
## [1] 2500 4 7
##
## $dimnames
## $dimnames$iterations
## NULL
##
## $dimnames$chains
## [1] "chain:1" "chain:2" "chain:3" "chain:4"
##
## $dimnames$parameters
## [1] "b_Intercept" "b_c.Assault" "b_c.Rape" "b_c.UrbanPop"
## [5] "sigma" "temp_Intercept" "lp__"
mcmc_dens(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"))
mcmc_areas(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"), prob=.95)
# following defaults to .5 interval and .95 interval
mcmc_intervals(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"))
# The following breaks out the results by the four chains - everything is nicely consistent.
mcmc_violin(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"))
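Trace plots are another useful bayesplot view for checking mixing: well-behaved chains should overlap and wander randomly around a stable level.

```r
# Per-chain trace plots for the three slope parameters
mcmc_trace(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"))
```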
A basic plotting function is provided in marginal_effects. It shows each variable’s effect with the other predictors held at their means (which are zero here, since they’re centered).
marginal_effects(jill4)
To gain greater control, you can use ggplot. In this example, I’m plotting only the assault effect by using “[[1]]” to subset it out. To see the parts of marginal_effects, you can use summary, which lists each part in order; c.Assault is first, hence the “[[1]]”.
By using ggplot, I can control the y-axis range and the axis labels.
library(ggplot2)
summary(marginal_effects(jill4))
## Length Class Mode
## c.Assault 9 data.frame list
## c.Rape 9 data.frame list
## c.UrbanPop 9 data.frame list
# Note that I'm adding the mean back into my centered c.Assault value for plotting
ggplot(data=marginal_effects(jill4)[[1]],
aes(x=c.Assault+mean(USArrests$Assault), y=estimate__)) + geom_line() + ylim(0,20) +
xlab("Assaults") + ylab("Murders") +
geom_ribbon(aes(ymax=upper__, ymin=lower__), alpha=.3)
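To save the figure to disk, ggsave() writes the most recently displayed ggplot (the filename below is just an example):

```r
library(ggplot2)
# Save the last plot; width and height are in inches
ggsave("murders_by_assaults.png", width = 6, height = 4, dpi = 300)
```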