library(brms)
I’m going to start with a simple full-factorial regression using three predictors of one outcome variable. We’ll first fit it as a standard linear model (regression) with “lm” and then fit a Bayesian version with “brm”, leaving everything at its default settings.
jack=lm(Murder ~ Assault*Rape*UrbanPop, data=USArrests)
jill=brm(Murder ~ Assault*Rape*UrbanPop, data=USArrests, cores=4)
## Compiling the C++ model
## Start sampling
## Warning: There were 1309 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: There were 2396 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
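Stan’s advice here is actionable: divergences and treedepth warnings can often be reduced by tightening the sampler through brm’s control argument. A hedged sketch (the adapt_delta and max_treedepth values below are just common choices, not magic numbers; and in this case the underlying problem turns out to be multicollinearity, so tuning alone won’t fully fix it):

```r
library(brms)
# Refit with a smaller step size (higher adapt_delta) and a deeper tree;
# sampling gets slower but divergences and treedepth warnings may go away
jill_tuned = brm(Murder ~ Assault*Rape*UrbanPop, data=USArrests,
                 control = list(adapt_delta = 0.99, max_treedepth = 15),
                 cores = 4)
```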
summary(jack)
##
## Call:
## lm(formula = Murder ~ Assault * Rape * UrbanPop, data = USArrests)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1508 -1.2188 -0.1551 1.3878 6.7749
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.493e+00 1.095e+01 -0.867 0.3908
## Assault 1.155e-01 5.406e-02 2.137 0.0385 *
## Rape 9.047e-01 8.059e-01 1.123 0.2680
## UrbanPop 8.132e-02 1.798e-01 0.452 0.6535
## Assault:Rape -4.083e-03 3.299e-03 -1.238 0.2228
## Assault:UrbanPop -8.658e-04 9.014e-04 -0.961 0.3423
## Rape:UrbanPop -8.713e-03 1.182e-02 -0.737 0.4653
## Assault:Rape:UrbanPop 4.348e-05 4.888e-05 0.889 0.3788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.517 on 42 degrees of freedom
## Multiple R-squared: 0.7137, Adjusted R-squared: 0.666
## F-statistic: 14.95 on 7 and 42 DF, p-value: 1.313e-09
library(car) # library contains the 'vif' command
## Loading required package: carData
vif(jack) # to test for multicollinearity
## Assault Rape UrbanPop
## 156.92552 440.58625 52.39664
## Assault:Rape Assault:UrbanPop Rape:UrbanPop
## 890.06043 263.27244 703.99981
## Assault:Rape:UrbanPop
## 1178.06502
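For intuition, each one-degree-of-freedom VIF is just 1/(1 − R²) from regressing that column of the design matrix on all the other columns. A quick sketch that should approximately reproduce vif(jack)’s value for Assault:

```r
# Regress the Assault column of the design matrix on every other column
X = model.matrix(jack)[, -1]                      # drop the intercept column
aux = lm(X[, "Assault"] ~ X[, colnames(X) != "Assault"])
1 / (1 - summary(aux)$r.squared)                  # compare with vif(jack)["Assault"]
```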
summary(jill)
## Warning: The model has not converged (some Rhats are > 1.1). Do not analyse the results!
## We recommend running more iterations and/or setting stronger priors.
## Warning: There were 1309 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Murder ~ Assault * Rape * UrbanPop
## Data: USArrests (Number of observations: 50)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept -9.73 13.39 -36.15 8.45 2 2.84
## Assault 0.11 0.07 0.00 0.26 3 1.80
## Rape 0.87 0.91 -0.13 2.80 2 3.63
## UrbanPop 0.08 0.23 -0.25 0.50 3 2.53
## Assault:Rape -0.00 0.00 -0.01 0.00 2 2.69
## Assault:UrbanPop -0.00 0.00 -0.00 0.00 3 1.82
## Rape:UrbanPop -0.01 0.01 -0.04 0.01 2 3.47
## Assault:Rape:UrbanPop 0.00 0.00 -0.00 0.00 2 2.76
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma 3.05 0.59 2.19 4.24 3 2.02
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Lots of warnings here. The Bayesian model shows poor convergence: the Rhat values are far above 1 and the effective sample sizes are tiny. The plot of the MCMC sampling confirms this.
plot(jill)
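Rather than eyeballing the summary, you can pull the Rhat values out programmatically (assuming a brms version that exports an rhat() method):

```r
# Extract all Rhat values and flag any above the conventional 1.1 cutoff
rh = rhat(jill)
rh[rh > 1.1]    # every parameter that has failed to converge
any(rh > 1.1)   # TRUE for this fit, so don't trust the estimates
```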
The issue here is one of multicollinearity. There is non-essential collinearity between the main effects and the interactions (see the results of the “vif” command; vif stands for variance inflation factor). To address this issue, we’ll center the predictors, though I won’t standardize them. I usually create new columns to hold the centered versions.
USArrests$c.Assault = scale(USArrests$Assault, scale=F)
USArrests$c.Rape = scale(USArrests$Rape, scale=F)
USArrests$c.UrbanPop = scale(USArrests$UrbanPop, scale=F)
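A quick sanity check that the centering worked: each new column should have a mean of (numerically) zero. One caveat: scale() returns a one-column matrix rather than a plain vector; if that ever causes trouble downstream, wrap the call in as.numeric().

```r
# Means of the centered columns should be ~0 (up to floating-point noise)
sapply(USArrests[c("c.Assault", "c.Rape", "c.UrbanPop")], mean)
```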
jack=lm(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests)
jill=brm(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests, cores=4)
## Compiling the C++ model
## Start sampling
## Warning: There were 1 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
Now, let’s look at the results again after centering.
summary(jack) # for standard lm model
##
## Call:
## lm(formula = Murder ~ c.Assault * c.Rape * c.UrbanPop, data = USArrests)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1508 -1.2188 -0.1551 1.3878 6.7749
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.484e+00 5.072e-01 16.727 < 2e-16 ***
## c.Assault 3.257e-02 6.968e-03 4.675 3.04e-05 ***
## c.Rape 1.230e-01 6.324e-02 1.945 0.0585 .
## c.UrbanPop -9.388e-02 4.448e-02 -2.111 0.0408 *
## c.Assault:c.Rape -1.234e-03 7.403e-04 -1.666 0.1031
## c.Assault:c.UrbanPop 5.732e-05 5.015e-04 0.114 0.9095
## c.Rape:c.UrbanPop -1.288e-03 4.475e-03 -0.288 0.7748
## c.Assault:c.Rape:c.UrbanPop 4.348e-05 4.888e-05 0.889 0.3788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.517 on 42 degrees of freedom
## Multiple R-squared: 0.7137, Adjusted R-squared: 0.666
## F-statistic: 14.95 on 7 and 42 DF, p-value: 1.313e-09
vif(jack) # Note much less collinearity
## c.Assault c.Rape
## 2.607332 2.712887
## c.UrbanPop c.Assault:c.Rape
## 3.205184 2.016545
## c.Assault:c.UrbanPop c.Rape:c.UrbanPop
## 3.407201 3.379031
## c.Assault:c.Rape:c.UrbanPop
## 4.333142
summary(jill) # for Bayesian model
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Murder ~ c.Assault * c.Rape * c.UrbanPop
## Data: USArrests (Number of observations: 50)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI
## Intercept 8.48 0.52 7.43 9.51
## c.Assault 0.03 0.01 0.02 0.05
## c.Rape 0.12 0.07 -0.01 0.25
## c.UrbanPop -0.09 0.05 -0.19 -0.00
## c.Assault:c.Rape -0.00 0.00 -0.00 0.00
## c.Assault:c.UrbanPop 0.00 0.00 -0.00 0.00
## c.Rape:c.UrbanPop -0.00 0.00 -0.01 0.01
## c.Assault:c.Rape:c.UrbanPop 0.00 0.00 -0.00 0.00
## Eff.Sample Rhat
## Intercept 2569 1.00
## c.Assault 2269 1.00
## c.Rape 2132 1.00
## c.UrbanPop 2049 1.00
## c.Assault:c.Rape 3674 1.00
## c.Assault:c.UrbanPop 2472 1.00
## c.Rape:c.UrbanPop 2080 1.00
## c.Assault:c.Rape:c.UrbanPop 2068 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma 2.60 0.30 2.09 3.25 2177 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
print(summary(jill), digits = 3) # for greater precision, change digits
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Murder ~ c.Assault * c.Rape * c.UrbanPop
## Data: USArrests (Number of observations: 50)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI
## Intercept 8.482 0.523 7.434 9.513
## c.Assault 0.033 0.007 0.018 0.047
## c.Rape 0.123 0.066 -0.008 0.252
## c.UrbanPop -0.094 0.046 -0.185 -0.004
## c.Assault:c.Rape -0.001 0.001 -0.003 0.000
## c.Assault:c.UrbanPop 0.000 0.001 -0.001 0.001
## c.Rape:c.UrbanPop -0.001 0.005 -0.010 0.008
## c.Assault:c.Rape:c.UrbanPop 0.000 0.000 -0.000 0.000
## Eff.Sample Rhat
## Intercept 2569 1.000
## c.Assault 2269 1.001
## c.Rape 2132 1.001
## c.UrbanPop 2049 1.002
## c.Assault:c.Rape 3674 0.999
## c.Assault:c.UrbanPop 2472 1.001
## c.Rape:c.UrbanPop 2080 1.001
## c.Assault:c.Rape:c.UrbanPop 2068 1.001
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma 2.598 0.296 2.091 3.246 2177 1.001
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(jill)
I’d like to compare the complex full-factorial model with a main-effects-only model using a Bayes factor and the WAIC, but I first need to refit both models using “save_all_pars = T”.
jill=brm(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests, save_all_pars = T, iter=5000, cores=4)
jill2=brm(Murder ~ c.Assault+c.Rape+c.UrbanPop, data=USArrests, save_all_pars = T, iter=5000, cores=4)
waic(jill, jill2)
## WAIC SE
## jill 245.85 10.44
## jill2 243.18 10.78
## jill - jill2 2.67 4.98
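As a cross-check on the WAIC, you can also compare the models with approximate leave-one-out cross-validation, which uses the same interface (newer brms versions may ask you to attach the criterion with add_criterion() first):

```r
# PSIS-LOO comparison; as with WAIC, lower values indicate
# better expected out-of-sample prediction
loo(jill, jill2)
```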
bayes_factor(jill, jill2)
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## The estimated Bayes factor in favor of x1 over x2 is equal to: 0
We find that the simpler model, jill2, is likely to predict better out-of-sample (lower WAIC) and is much more strongly supported by the data (the Bayes factor in favor of the full model is essentially zero). But, WAIT… these models used data-informed priors for the intercept and error distribution but flat priors for the regression weights. The WAIC can still be interpreted, but the Bayes factor cannot, so I need to rerun the comparison with proper, narrower priors on the regression weights.
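Before refitting, it’s worth checking what the default priors actually are. The get_prior() function lists them for any formula/data combination:

```r
# Show brms's default priors for this model; the class "b" rows
# (the regression weights) are blank, i.e. flat improper priors
get_prior(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests)
```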
jill3=brm(Murder ~ c.Assault*c.Rape*c.UrbanPop, data=USArrests, prior=c(set_prior("normal(0,1)", class= "b")), save_all_pars = T, iter=5000, cores=4)
jill4=brm(Murder ~ c.Assault+c.Rape+c.UrbanPop, data=USArrests, prior=c(set_prior("normal(0,1)", class= "b")), save_all_pars = T, iter=5000, cores=4)
bayes_factor(jill3, jill4) # evidence in favor of jill3 over jill4
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## The estimated Bayes factor in favor of x1 over x2 is equal to: 0
bayes_factor(jill4, jill3) # evidence in favor of jill4 over jill3
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## The estimated Bayes factor in favor of x1 over x2 is equal to: 1.413627e+12
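The two runs are consistent with each other: a Bayes factor is a ratio of marginal likelihoods, so the BF of jill3 over jill4 should be the reciprocal of the BF of jill4 over jill3, up to estimation error:

```r
# Reciprocal of the reported BF in favor of jill4: roughly 7e-13,
# which prints as 0 at the default display precision
1 / 1.413627e12
```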
To obtain a nicer graph of the results, try these variations from the bayesplot library. I’m going to plot the main-effects-only model (jill4).
library(bayesplot)
# The following command shows the structure of our results, from which we'll pick the parameters to plot
attributes(as.array(jill4))
## $dim
## [1] 2500 4 7
##
## $dimnames
## $dimnames$iterations
## NULL
##
## $dimnames$chains
## [1] "chain:1" "chain:2" "chain:3" "chain:4"
##
## $dimnames$parameters
## [1] "b_Intercept" "b_c.Assault" "b_c.Rape" "b_c.UrbanPop"
## [5] "sigma" "temp_Intercept" "lp__"
mcmc_dens(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"))
mcmc_areas(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"), prob=.95)
# following defaults to .5 interval and .95 interval
mcmc_intervals(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"))
# The following breaks out the results by the four chains - everything is nicely consistent.
mcmc_violin(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"))
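Trace plots are another useful bayesplot view for checking mixing: well-behaved chains should overlap and wander randomly around a stable level.

```r
# Per-chain trace plots for the three slope parameters
mcmc_trace(as.array(jill4), pars=c("b_c.Assault", "b_c.Rape", "b_c.UrbanPop"))
```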
A basic plotting function is provided in marginal_effects. It shows each variable’s effect with the other predictors held at their means (which are zero here, since they’re centered).
marginal_effects(jill4)
To gain greater control, you can use ggplot. In this example, I’m plotting only the assault effect by using “[[1]]” to subset it out. To see the parts of marginal_effects, you can use summary, which lists each part in order; c.Assault is first, hence the “[[1]]”.
By using ggplot, I can control the y-axis range and the axis labels.
library(ggplot2)
summary(marginal_effects(jill4))
## Length Class Mode
## c.Assault 9 data.frame list
## c.Rape 9 data.frame list
## c.UrbanPop 9 data.frame list
# Note that I'm adding the mean back into my centered c.Assault value for plotting
ggplot(data=marginal_effects(jill4)[[1]],
aes(x=c.Assault+mean(USArrests$Assault), y=estimate__)) + geom_line() + ylim(0,20) +
xlab("Assaults") + ylab("Murders") +
geom_ribbon(aes(ymax=upper__, ymin=lower__), alpha=.3)
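To save the figure to disk, ggsave() writes the most recently displayed ggplot (the filename below is just an example):

```r
library(ggplot2)
# Save the last plot; width and height are in inches
ggsave("murders_by_assaults.png", width = 6, height = 4, dpi = 300)
```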