Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculating by-group estimates and CIs for a random intercept grouping variable in lmer() #894

Open
OmidGhasemi21 opened this issue Aug 23, 2023 · 10 comments
Labels
feature idea 🔥 New feature or request

Comments

@OmidGhasemi21
Copy link

Hi there,

I am trying to extract by-group marginal means and their CIs of an lmer() model with group as a random effect. For example:

library(lme4)
library(tidyverse)
library(parameters)

data <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv") %>%
  drop_na()

model <- lmer(flipper_length_mm ~ bill_depth_mm + (bill_depth_mm | species),
              data = data)

I can access the bill_depth_mm estimate for each species by using coef(model), but I was wondering if it is possible to get the CIs with parameters. Reading the tutorials, I realized that I can get SEs using:

standard_error(model, effects= "random")

I was wondering if these are SEs for the by-group estimates and if calculating CIs manually makes sense here. Is there any function in parameters that I am missing for calculating CIs?

@bwiernik
Copy link
Contributor

Obtaining those standard errors is difficult because it requires knowing the correlation between the fixed effects and random effects estimates.

Because of the algebraic tricks that lme4 uses, getting those correlations isn't really possible.

In glmmTMB, getting those correlations is feasible, but hasn't yet been implemented. See glmmTMB/glmmTMB#691

So, neither of packages currently provides a way to get these standard errors. If uncertainty for coef() is needed, I suggest using the brms package instead.

@mattansb
Copy link
Member

mattansb commented Aug 23, 2023

@bwiernik can lme4::bootMer(use.u=TRUE) be used?

library(lme4)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

get_randint <- function(mod) {
  RE <- coef(mod)[[1]]
  setNames(RE[["(Intercept)"]], nm = rownames(RE))
}

boot_res <- bootMer(fm1, FUN = get_randint, nsim = 599, use.u = TRUE)

confint(boot_res)
#>        2.5 %   97.5 %
#> 308 239.9207 281.6592
#> 309 194.1790 236.5172
#> 310 196.8933 239.6700
#> 330 245.2556 283.7539
#> 331 246.0527 288.3115
#> 332 239.8875 275.8517
#> 333 247.3167 282.7469
#> 334 228.4844 262.1989
#> 335 219.6058 265.8239
#> 337 261.5971 304.9581
#> 349 212.0846 252.6489
#> 350 226.4115 269.0781
#> 351 233.6814 272.5746
#> 352 250.7109 288.1504
#> 369 233.9859 272.8441
#> 370 214.5085 255.1081
#> 371 233.9799 271.1498
#> 372 244.7220 281.5601

@strengejacke
Copy link
Member

You can set group_level = TRUE:

library(lme4)
#> Loading required package: Matrix
library(tidyverse)
library(parameters)

data <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv") %>%
  drop_na()
#> New names:
#> • `` -> `...1`
#> Rows: 344 Columns: 9
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): species, island, sex
#> dbl (6): ...1, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

model <- lmer(flipper_length_mm ~ bill_depth_mm + (bill_depth_mm | species),
              data = data)

model_parameters(model, group_level = TRUE)
#> # Fixed Effects
#> 
#> Parameter     | Coefficient |   SE |           95% CI | t(327) |      p
#> -----------------------------------------------------------------------
#> (Intercept)   |      146.84 | 8.31 | [130.49, 163.20] |  17.66 | < .001
#> bill depth mm |        3.24 | 0.92 | [  1.43,   5.05] |   3.52 | < .001
#> 
#> # Random Effects: species
#> 
#> Parameter                 | Coefficient |   SE |          95% CI
#> ----------------------------------------------------------------
#> (Intercept) [Adelie]      |        9.58 | 6.00 | [ -2.17, 21.33]
#> (Intercept) [Chinstrap]   |       -8.81 | 7.98 | [-24.45,  6.82]
#> (Intercept) [Gentoo]      |       -0.77 | 6.52 | [-13.54, 12.01]
#> bill_depth_mm [Adelie]    |       -1.40 | 0.33 | [ -2.04, -0.76]
#> bill_depth_mm [Chinstrap] |       -0.10 | 0.43 | [ -0.95,  0.75]
#> bill_depth_mm [Gentoo]    |        1.50 | 0.43 | [  0.65,  2.36]
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald t-distribution approximation.

Created on 2023-08-24 with reprex v2.0.2

@bwiernik
Copy link
Contributor

@strengejacke Those are the group-level random effects estimates (as returned by ranef()), not the group-level total effects estimates (fixed + random, as returned by coef()).

@mattansb yes, you could use bootMer() here:

library(lme4)
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy))
coef(fm1)
boot1 <- bootMer(fm1, \(x) coef(x)$Subject[, "Days"], use.u = FALSE, nsim = 100)
cbind(boot1$t0, t(apply(boot1$t, 2, quantile, probs = c(.025, .975))))

@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Aug 24, 2023

@mattansb
Copy link
Member

@bwiernik I thought I'd want use.u = TRUE here - can you explain? 🙏

@OmidGhasemi21
Copy link
Author

OmidGhasemi21 commented Aug 24, 2023

Hi all,

I really appreciate your help.

@bwiernik, my original plan is to use brms, but I thought it would be good to have a frequentist study to make sure that the results are robust and stay the same with lmer.

I did not understand @bwiernik codes, but with @mattansb codes, I calculated CIs for group-level estimates and the results are a bit weird (not sure if I did it correctly).

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

get_randint <- function(mod) {
  RE <- coef(mod)[[1]]
  setNames(RE[["Days"]], nm = rownames(RE)) # I changed (Intercept) to Days here
}

boot_res <- bootMer(fm1, FUN = get_randint, nsim = 599, use.u = TRUE)

boot_res %>%
  confint() %>%
  data.frame() %>%
  rownames_to_column("Subject") %>% 
  rename(conf.low = X2.5.., conf.high = X97.5..) %>%
  left_join(., coef(fm1)$Subject %>%
              data.frame() %>%
              rownames_to_column("Subject") %>%
              select(estimate = Days, Subject), by="Subject") %>% 
  ggplot(aes(x = estimate, y = Subject)) +
  geom_pointrange(aes(x = estimate, y = Subject, xmin= conf.low, xmax= conf.high)) +
  geom_vline(xintercept = 0, linetype = "dashed", size = .2)+
  theme_bw()

Here is the result:

Screenshot 2023-08-25 at 9 12 56 am

Should not the estimates of coef() be the midpoint of CI?

And speaking of comparing Bayesian and Frequentist analyses, is it normal that CIs are narrower than HDIs?

Screenshot 2023-08-25 at 9 31 07 am

@bwiernik
Copy link
Contributor

Bootstrapped CIs aren't necessarily symmetrical (one of the advantages of bootstrap is that it can accurately estiamte sampling distributions that are non-normal), but I would expect these intervals to be roughly symmetrical, so not exactly sure why that's not the case here. It is consistent when I increase the number of nsim and set a different random seed, so it's not just monte carlo error.

Personally, I would trust the HMC-based intervals given by brms over these parametric bootstrap intervals from bootMer() and, in my experience, the random effects estimates from brms are more plausible. (I find that lmer() seems to overly shrink values across groups in my work.) So I would recommend using brms in general for this type of inference—the default priors are essentially non-informative and so the default brms estimates have very good frequentist properties.

(One note, you generally want to specify type = "perc" when you use confint() bootstraps.)

library(lme4)
library(tidyverse)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

get_randslope <- function(mod) {
  RE <- coef(mod)[[1]]
  setNames(RE[["Days"]], nm = rownames(RE)) # I changed (Intercept) to Days here
}

boot_res <- bootMer(fm1, FUN = get_randslope, nsim = 5000, use.u = TRUE, seed = 33939292)

boot_res %>%
  confint(type = "perc") %>%
  as.data.frame() %>%
  rownames_to_column("Subject") %>% 
  rename(CI_low = "2.5 %", CI_high = "97.5 %") %>%
  left_join(enframe(boot_res$t0, name = "Subject", value = "Estimate"), ., by = "Subject") %>% 
  ggplot() +
  aes(x = Estimate, y = Subject, xmin = CI_low, xmax = CI_high) +
  geom_pointrange() +
  geom_vline(xintercept = 0, linetype = "dashed", linewidth = .2)+
  theme_bw()

@bwiernik
Copy link
Contributor

@bwiernik I thought I'd want use.u = TRUE here - can you explain? 🙏

Yes you would want use.u = TRUE. My bad

@strengejacke strengejacke added the feature idea 🔥 New feature or request label Sep 1, 2023
@bbolker
Copy link

bbolker commented May 11, 2024

Updating. These are probably available now due to improvements in lme4 (predict(..., se.fit = TRUE))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature idea 🔥 New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants