model_parameters(): show both raw and standardized coefficients #655

bwiernik · 2022-01-23T19:13:30Z

Ala https://gist.github.com/bwiernik/55571550aa45d57ddaf7cd680c3211c3

library(dplyr, warn.conflicts = FALSE)
library(parameters)

mod <- lm(mpg ~ disp + wt,  data = mtcars)

# Both
left_join(
  parameters(mod),
  select(
    parameters(mod, standardize = "refit"), 
    Parameter, "Std. Coefficient" = Coefficient, 
    "Std._CI_low" = CI_low, "Std._CI_high" = CI_high
  ),
  by = "Parameter"
)
#> Parameter   | Coefficient |       SE |         95% CI | t(29) |      p | Std. Coefficient |    Std. 95% CI
#> ----------------------------------------------------------------------------------------------------------
#> (Intercept) |       34.96 |     2.16 | [30.53, 39.39] | 16.15 | < .001 |         1.85e-17 | [-0.17,  0.17]
#> disp        |       -0.02 | 9.19e-03 | [-0.04,  0.00] | -1.93 | 0.064  |            -0.36 | [-0.75,  0.02]
#> wt          |       -3.35 |     1.16 | [-5.73, -0.97] | -2.88 | 0.007  |            -0.54 | [-0.93, -0.16]
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using a
#>   Wald t-distribution approximation.

I'm thinking an argument like add_standardized. FALSE (default) to replace raw coefficients with standardized (if any), TRUE to add the retain the raw results and add standardized coefficients and CI (if any).

The text was updated successfully, but these errors were encountered:

mattansb · 2022-01-24T07:47:40Z

How would you deal with interactions?

library(dplyr, warn.conflicts = FALSE)
library(parameters)

mod <- lm(mpg ~ disp * wt,  data = mtcars)

# Both
left_join(
  parameters(mod),
  select(
    parameters(mod, standardize = "refit"), 
    Parameter, "Std. Coefficient" = Coefficient, 
    "Std._CI_low" = CI_low, "Std._CI_high" = CI_high
  ),
  by = "Parameter"
)
#> Parameter   | Coefficient |       SE |         95% CI | t(28) |      p | Std. Coefficient |    Std. 95% CI
#> ----------------------------------------------------------------------------------------------------------
#> (Intercept) |       44.08 |     3.12 | [37.68, 50.48] | 14.11 | < .001 |            -0.20 | [-0.39, -0.02]
#> disp        |       -0.06 |     0.01 | [-0.08, -0.03] | -4.26 | < .001 |            -0.38 | [-0.71, -0.06]
#> wt          |       -6.50 |     1.31 | [-9.19, -3.81] | -4.95 | < .001 |            -0.62 | [-0.94, -0.29]
#> disp * wt   |        0.01 | 3.26e-03 | [ 0.01,  0.02] |  3.60 | 0.001  |             0.24 | [ 0.10,  0.37]
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using a
#>   Wald t-distribution approximation.

^{Created on 2022-01-24 by the reprex package (v2.0.1)}

Coefficient wt and Std. Coefficient wt no longer have the same meaning...

(Unless you use one of the post-hoc methods:)

left_join(
  parameters(mod),
  select(
    parameters(mod, standardize = "basic"), 
    Parameter, "Std. Coefficient" = Std_Coefficient, 
    "Std._CI_low" = CI_low, "Std._CI_high" = CI_high
  ),
  by = "Parameter"
)
#> Parameter   | Coefficient |       SE |         95% CI | t(28) |      p | Std. Coefficient |    Std. 95% CI
#> ----------------------------------------------------------------------------------------------------------
#> (Intercept) |       44.08 |     3.12 | [37.68, 50.48] | 14.11 | < .001 |             0.00 | [ 0.00,  0.00]
#> disp        |       -0.06 |     0.01 | [-0.08, -0.03] | -4.26 | < .001 |            -1.16 | [-1.72, -0.60]
#> wt          |       -6.50 |     1.31 | [-9.19, -3.81] | -4.95 | < .001 |            -1.05 | [-1.49, -0.62]
#> disp * wt   |        0.01 | 3.26e-03 | [ 0.01,  0.02] |  3.60 | 0.001  |             1.30 | [ 0.56,  2.04]
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using a
#>   Wald t-distribution approximation.

^{Created on 2022-01-24 by the reprex package (v2.0.1)}

bwiernik · 2022-01-24T11:55:01Z

I think it is fine to not do anything special. If someone is going to have an interaction, the raw variables should at least be centered anyway

strengejacke · 2022-04-02T22:00:03Z

what about p-values, which might change?

library(datawizard)
library(parameters)

mod <- lm(mpg ~ disp * wt + gear,  data = mtcars)

data_merge(
  parameters(mod),
  data_select(
    parameters(mod, standardize = "refit"), 
    c("Parameter", "Coefficient", "CI_low", "CI_high", "p")
  ) |> data_addprefix("Std._", select = 2:5),
  by = "Parameter",
  join = "left"
)
#> Parameter   | Coefficient |       SE |         95% CI | t(27) |      p | Std._Coefficient |    Std. 95% CI |   Std._p
#> ---------------------------------------------------------------------------------------------------------------------
#> (Intercept) |       47.60 |     4.97 | [37.40, 57.81] |  9.57 | < .001 |            -0.21 | [-0.39, -0.02] |     0.03
#> disp        |       -0.06 |     0.01 | [-0.09, -0.03] | -4.32 | < .001 |            -0.40 | [-0.73, -0.07] |     0.02
#> wt          |       -6.77 |     1.35 | [-9.54, -4.00] | -5.01 | < .001 |            -0.65 | [-0.99, -0.31] | 5.22e-04
#> gear        |       -0.68 |     0.74 | [-2.20,  0.85] | -0.91 | 0.370  |            -0.08 | [-0.27,  0.10] |     0.37
#> disp * wt   |        0.01 | 3.27e-03 | [ 0.01,  0.02] |  3.64 | 0.001  |             0.24 | [ 0.10,  0.37] | 1.14e-03
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed using a
#>   Wald t-distribution approximation.

^{Created on 2022-04-02 by the reprex package (v2.0.1)}

bwiernik · 2022-04-03T08:04:12Z

p-value based inference should be done on the raw coefficients. If both raw and standardized coefficients are shown like suggested here, the standardized values are just to aid interpretation of the effect sizes.

Strictly speaking, both the p values and CIs we currently report for standardized effects aren't strictly accurate because they don't include uncertainty in the means or SDs used to standardize. This is usually ignored by most folks. But, ideally we would include options to compute CIs accounting for those (cf. here and here).

strengejacke · 2022-04-03T08:25:45Z

The supplemental (https://static-content.springer.com/esm/art%3A10.1007%2Fs11336-013-9380-y/MediaObjects/11336_2013_9380_MOESM2_ESM.pdf) has code how to do this, if I understand right. Must take a closer look at it.

But given the fact that "usual" CIs from standardized coefficient are not accurate, we probably should only report the std. coefficients along the raw coefficients for now?

bwiernik · 2022-04-03T08:32:36Z

No, report the coefficient and CI in the usual way. The inaccuracy is not too large, and it is what people generally use. It's good enough to enable interpretation of uncertainty in the standardized metric

bwiernik · 2022-04-03T08:33:17Z

(Note that I linked to updated R code in the issue Mattan opened.)

vincentarelbundock · 2022-06-25T10:15:06Z

Unsollicited opinion, but I’m starting to worry a lot about feature and argument sprawl in the easystats packages. Documentation is already hard to read because there are so many options and arguments, and those often depend on model type.

Also, this is already kinda supported, because we can call compare_parameters on the output of parameters:

library(parameters)

mod1 <- lm(mpg ~ factor(cyl) + hp + am, mtcars) |>
    parameters()
mod2 <- lm(mpg ~ factor(cyl) + hp + am, mtcars) |>
    parameters(standardize = "refit")
compare_parameters(mod1, mod2)
#> Parameter                     |                 mod1 |                 mod2
#> ---------------------------------------------------------------------------
#> (Intercept)                   | 27.30 (24.37, 30.22) |  0.40 (-0.08,  0.88)
#> hp                            | -0.04 (-0.07, -0.01) | -0.50 (-0.84, -0.16)
#> am                            |  4.16 ( 1.58,  6.74) |  0.34 ( 0.13,  0.56)
#> cyl (6)                       | -3.92 (-7.08, -0.77) |                     
#> cyl (8)                       | -3.53 (-8.67,  1.60) |                     
#> factor(cyl)-0.104987808575239 |                      | -0.65 (-1.17, -0.13)
#> factor(cyl)1.01488214956065   |                      | -0.59 (-1.44,  0.27)
#> ---------------------------------------------------------------------------
#> Observations                  |                   32 |                   32

bwiernik · 2022-06-25T10:24:13Z

This is the sort of feature that new users of R expect to be easy to do, but is inexplicably difficult

strengejacke added the enhancement 💥 Implemented features can be improved or revised label Jan 28, 2022

mattansb mentioned this issue May 3, 2022

Better CIs (and p-values?) for std-coefs #705

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

model_parameters(): show both raw and standardized coefficients #655

model_parameters(): show both raw and standardized coefficients #655

bwiernik commented Jan 23, 2022

mattansb commented Jan 24, 2022

bwiernik commented Jan 24, 2022

strengejacke commented Apr 2, 2022

bwiernik commented Apr 3, 2022

strengejacke commented Apr 3, 2022

bwiernik commented Apr 3, 2022

bwiernik commented Apr 3, 2022

vincentarelbundock commented Jun 25, 2022

bwiernik commented Jun 25, 2022

model_parameters(): show both raw and standardized coefficients #655

model_parameters(): show both raw and standardized coefficients #655

Comments

bwiernik commented Jan 23, 2022

mattansb commented Jan 24, 2022

bwiernik commented Jan 24, 2022

strengejacke commented Apr 2, 2022

bwiernik commented Apr 3, 2022

strengejacke commented Apr 3, 2022

bwiernik commented Apr 3, 2022

bwiernik commented Apr 3, 2022

vincentarelbundock commented Jun 25, 2022

bwiernik commented Jun 25, 2022