Example with interactions

This file demonstrates how to run a model with multiple predictors taht interact. A good understanding of centering and effect coding from regression will help. My focus will be on the nlme syntax

We begin by loading the necessary R libraries.

The Basics - Loading data, preparing for analysis

library(nlme)     # loads the library used for nonlinear multilevel modeling commands
library(lattice)  # loads the library that includes xyplot

Load data and set up for analysis using effect coding (contr.sum) and centered predictors (by subtracting the mean).

myd<-read.csv("Young E1 DelayDiscounting.dat")
myd$Cluster<-as.factor(myd$Cluster)
contrasts(myd$Sex)=contr.sum(2)
contrasts(myd$Cluster)=contr.sum(3)
myd$c.VGExp<-myd$VGExp-mean(myd$VGExperience, na.rm=T)  # create centered experience variable

Run the analysis

Run a model with the full interaction between Sex and (centered) VG Experience. For starting values we need 4: male-intercept, female-intercept, male-VG slope, female-VG slope. Note that by using effect coding, the interpretation is different, but thinking of it this way allows you to count up the necessary number of starting values.

myr<-nlme(Indifference.point~1000/(1+exp(logk)*Delay), fixed=logk~Sex*c.VGExp, random=logk~1|Subject, data=subset(myd, !is.na(c.VGExp)), start=c(logk=-6, 0, 0, 0))
summary(myr)

## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: Indifference.point ~ 1000/(1 + exp(logk) * Delay) 
##  Data: subset(myd, !is.na(c.VGExp)) 
##        AIC      BIC    logLik
##   2233.021 2251.693 -1110.511
## 
## Random effects:
##  Formula: logk ~ 1 | Subject
##         logk.(Intercept) Residual
## StdDev:         2.505964 143.1833
## 
## Fixed effects: logk ~ Sex * c.VGExp 
##                       Value Std.Error  DF   t-value p-value
## logk.(Intercept)  -6.746734 0.8316339 128 -8.112626  0.0000
## logk.Sex1          1.385526 0.8316339 128  1.666029  0.0982
## logk.c.VGExp       0.274983 0.1754461 128  1.567338  0.1195
## logk.Sex1:c.VGExp -0.068744 0.1754461 128 -0.391822  0.6958
##  Correlation: 
##                   lg.(I) lgk.S1 l..VGE
## logk.Sex1         -0.772              
## logk.c.VGExp      -0.751  0.803       
## logk.Sex1:c.VGExp  0.803 -0.751 -0.844
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -6.869146311 -0.438504679 -0.002750696  0.256652846  2.771505684 
## 
## Number of Observations: 166
## Number of Groups: 35

As in regression based on centered and effect-coded variables, the intercept is the grand mean k. The regression equation is -6.75 + 1.39*Sex + .27*c.VGExp - .069*Sex*c.VGExp. To derive predictions, Sex is -1 for males and +1 for females (see the contrasts).

Each p-value evaluates each term: Intercept > 0, Sex1 dummy > 0 (main effect of sex), VG Exp > 0 (main effect of VG Exp), Sex1:c.VGExp > 0 (interaction).

You can also use the emmeans library to derive for group means and (supposedly) group slopes.

library(emmeans)
emmeans(myr, "Sex", param="logk")

## NOTE: Results may be misleading due to involvement in interactions

##  Sex    emmean        SE  df  lower.CL  upper.CL
##  f   -5.361208 0.5553082 128  -6.45998 -4.262436
##  m   -8.132260 1.5464251 128 -11.19213 -5.072394
## 
## Confidence level used: 0.95

To compare the effects of video game experience separately for each sex, you should be able to use the emtrends command, but I can’t get it to work for this model. So, I’ll use the more laborious multcomp library. Although the non-significant Sex*VGExp interaction indicates these slopes don’t differ, I’ll demonstrate how to estimate the two slopes.

The four elements in the matrix correspond to the four fixed effects in the model summary. Since we are testing slopes, the 1 in the third column is the main effect for slope and the +1 or -1 in the fourth column indicates the sex adjustments.

# The following isn't working
#emtrends(myr, "Sex", var="c.VGExp", param="logk") # Alternatively you can use the following syntax

library(multcomp)
contrast.matrix <- rbind(
   'Male VG Exp slope' = c(0, 0, 1, 1),         
   'Female VG Exp slope' = c(0, 0, 1, -1)
 )
comps <- glht(myr, contrast.matrix)
summary(comps)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: nlme.formula(model = Indifference.point ~ 1000/(1 + exp(logk) * 
##     Delay), data = subset(myd, !is.na(c.VGExp)), fixed = logk ~ 
##     Sex * c.VGExp, random = logk ~ 1 | Subject, start = c(logk = -6, 
##     0, 0, 0))
## 
## Linear Hypotheses:
##                          Estimate Std. Error z value Pr(>|z|)  
## Male VG Exp slope == 0    0.20624    0.09679   2.131   0.0651 .
## Female VG Exp slope == 0  0.34373    0.33285   1.033   0.5125  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Note that the the effect of videogame experience on the discounting rate is a bit weaker for the males, but there’s a lot of overlap.

par(mar=c(4,10,4,2)) # necessary to change margins to avoid truncation of labels
plot(comps)

Interacting categorical variables

Now things get complicated for my 2 x 3 design. Six parameters are estimated. One of the challenges is counting the number of parameters for the starting values. When they’re categorical, it’s easy: 2(Sex) x 3(Cluster) = 6.

myr<-nlme(Indifference.point~1000/(1+exp(logk)*Delay), fixed=logk~Sex*Cluster, random=logk~1|Subject, data=subset(myd, !is.na(c.VGExp)&!is.na(Cluster)), start=c(logk=-6, 0, 0, 0, 0, 0))
summary(myr)

## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: Indifference.point ~ 1000/(1 + exp(logk) * Delay) 
##  Data: subset(myd, !is.na(c.VGExp) & !is.na(Cluster)) 
##        AIC     BIC    logLik
##   2177.139 2201.79 -1080.569
## 
## Random effects:
##  Formula: logk ~ 1 | Subject
##         logk.(Intercept) Residual
## StdDev:         2.517503 145.5232
## 
## Fixed effects: logk ~ Sex * Cluster 
##                        Value Std.Error  DF   t-value p-value
## logk.(Intercept)   -6.100571 0.7346577 122 -8.303964  0.0000
## logk.Sex1           0.217885 0.7346577 122  0.296581  0.7673
## logk.Cluster1       0.969568 1.0648788 122  0.910496  0.3644
## logk.Cluster2      -1.226881 0.9231293 122 -1.329046  0.1863
## logk.Sex1:Cluster1 -0.335137 1.0648788 122 -0.314718  0.7535
## logk.Sex1:Cluster2 -0.674706 0.9231293 122 -0.730890  0.4662
##  Correlation: 
##                    lg.(I) lgk.S1 lgk.C1 lgk.C2 l.S1:C1
## logk.Sex1          -0.004                             
## logk.Cluster1       0.070 -0.678                      
## logk.Cluster2      -0.335  0.169 -0.373               
## logk.Sex1:Cluster1 -0.678  0.070 -0.472  0.425        
## logk.Sex1:Cluster2  0.169 -0.335  0.425  0.129 -0.373 
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -6.752183568 -0.513404742 -0.002461481  0.270338455  2.727017331 
## 
## Number of Observations: 161
## Number of Groups: 34

To derive the group estimates, you can again use emmeans. The following also shows the brainless post hoc approach (Tukey’s) testing for all (15!) possible pairwise comparisons.

emmeans(myr, ~Sex*Cluster, param="logk")

##  Sex Cluster    emmean        SE  df   lower.CL    upper.CL
##  f   1       -5.248255 0.5966088 122  -6.429302 -4.06720833
##  m   1       -5.013752 2.5513227 122 -10.064350  0.03684652
##  f   2       -7.784273 1.5660770 122 -10.884479 -4.68406746
##  m   2       -6.870631 1.0756504 122  -8.999989 -4.74127416
##  f   3       -4.615529 2.5500953 122  -9.663698  0.43263893
##  m   3       -7.070986 1.3145705 122  -9.673310 -4.46866235
## 
## Confidence level used: 0.95

pairs(emmeans(myr, ~Sex*Cluster, param="logk"))

##  contrast    estimate       SE  df t.ratio p.value
##  f,1 - m,1 -0.2345035 2.620151 122  -0.089  1.0000
##  f,1 - f,2  2.5360182 1.675870 122   1.513  0.6565
##  f,1 - m,2  1.6223764 1.230027 122   1.319  0.7740
##  f,1 - f,3 -0.6327258 2.618955 122  -0.242  0.9999
##  f,1 - m,3  1.8227308 1.443620 122   1.263  0.8046
##  m,1 - f,2  2.7705216 2.993634 122   0.925  0.9392
##  m,1 - m,2  1.8568798 2.768803 122   0.671  0.9848
##  m,1 - f,3 -0.3982223 3.607247 122  -0.110  1.0000
##  m,1 - m,3  2.0572343 2.870077 122   0.717  0.9796
##  f,2 - m,2 -0.9136418 1.899900 122  -0.481  0.9967
##  f,2 - f,3 -3.1687440 2.992588 122  -1.059  0.8965
##  f,2 - m,3 -0.7132874 2.044674 122  -0.349  0.9993
##  m,2 - f,3 -2.2551022 2.767672 122  -0.815  0.9643
##  m,2 - m,3  0.2003545 1.698564 122   0.118  1.0000
##  f,3 - m,3  2.4554566 2.868986 122   0.856  0.9561
## 
## P value adjustment: tukey method for comparing a family of 6 estimates

It’s better to do targeted comparisons. In this situation, I might want to know male/female differences for each cluster (3 comparisons instead of 15).

pairs(emmeans(myr, ~Sex|Cluster, param="logk"))

## Cluster = 1:
##  contrast   estimate       SE  df t.ratio p.value
##  f - m    -0.2345035 2.620151 122  -0.089  0.9288
## 
## Cluster = 2:
##  contrast   estimate       SE  df t.ratio p.value
##  f - m    -0.9136418 1.899900 122  -0.481  0.6315
## 
## Cluster = 3:
##  contrast   estimate       SE  df t.ratio p.value
##  f - m     2.4554566 2.868986 122   0.856  0.3938

If you compare these results to the prior 15-comparisons output, you’ll notice that the p-values drop because you aren’t penalized for those 12 other comparisons. They’re still not significant, but this is the proper approach for this data set.

ANOVA?

It would be great to get an ANOVA table for this situation due to the three-level cluster variable, but the anova command uses the wrong sum-of-squares (sequential). Easy solution:

anova(myr, type="marginal")

##                  numDF denDF  F-value p-value
## logk.(Intercept)     1   122 68.95581  <.0001
## logk.Sex             1   122  0.08796  0.7673
## logk.Cluster         2   122  0.98293  0.3772
## logk.Sex:Cluster     2   122  0.46767  0.6276