Difference between "significant" and "not significant" is not itself statistically significant #67

Open
vincentarelbundock opened this issue Apr 2, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@vincentarelbundock

I just learned about the Johnson-Neyman plot this morning when I saw a Stack Overflow question about the interactions package.

I wonder if it would be a good idea for the documentation to refer to the classic Gelman & Stern paper, The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant.

This is obviously not a technical issue with your package (which is great!), but highlighting significant and non-significant regions with sharply different colors might encourage bad statistical practice.

vincentarelbundock added the enhancement label Apr 2, 2023
@jacob-long
Owner

I've spent some time thinking about this basic interpretational issue in the past although I confess I've forgotten some of it. I think I've seen some discussion about this very thing somewhere.

It's sort of a weird issue: The interaction term is itself a test of differences in slopes, so it is directly addressing the problem at the heart of that paper I so often cite in my peer review reports. On the other hand, it's not the case that the statistical test of the interaction proves some important distinction between the points along the x-axis immediately before and after the transition from blue to pink (or however it may be signified). But then, at what threshold are we allowed to say there is a significant difference between the slopes at different values of X?

@vincentarelbundock
Author

Yeah, I agree that this stuff can get super confusing. Here’s how I have come to think about this.

There are basically 3 scientific questions. Each requires a different test. None of them requires us to compare the different color regions in a Johnson-Neyman plot.

  1. Does the slope of Y with respect to X depend on the value of Z?
  2. Is the slope of Y with respect to X different from 0 when Z=1?
  3. Is the slope of Y with respect to X different when Z=1 or Z=2?

Consider this model:

library(ggplot2)
library(interactions)
library(marginaleffects)

mod <- lm(mpg ~ hp * wt, data = mtcars)

Question 1: Does the slope of Y with respect to X depend on the value of Z?

In a linear model like this, the answer to this question can be read off immediately from the interaction coefficient:

summary(mod)
# 
# Call:
# lm(formula = mpg ~ hp * wt, data = mtcars)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.0632 -1.6491 -0.7362  1.4211  4.5513 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
# hp          -0.12010    0.02470  -4.863 4.04e-05 ***
# wt          -8.21662    1.26971  -6.471 5.20e-07 ***
# hp:wt        0.02785    0.00742   3.753 0.000811 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 2.153 on 28 degrees of freedom
# Multiple R-squared:  0.8848,  Adjusted R-squared:  0.8724 
# F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13

The hp:wt coefficient is equal to 0.02785 and it is statistically significant. Yes, the slope of mpg with respect to hp depends on the value of wt.
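
To make the link between the coefficient table and the slope explicit, here is a minimal base R sketch. It assumes the same `mod` as above (refit so the snippet runs on its own); `slope_hp()` is a hypothetical helper I am introducing for illustration, not part of any package.

```r
# Refit the model from above so this snippet is self-contained
mod <- lm(mpg ~ hp * wt, data = mtcars)
b <- coef(mod)

# Hypothetical helper: slope of mpg with respect to hp at a given wt.
# In this model, d(mpg)/d(hp) = b_hp + b_hp:wt * wt
slope_hp <- function(wt) {
  unname(b["hp"] + b["hp:wt"] * wt)
}

# The slope changes by exactly the interaction coefficient
# for every one-unit increase in wt
slope_hp(c(2, 3, 4))
diff(slope_hp(c(2, 3, 4)))
```

Because the slope is a linear function of wt, "the slope depends on wt" and "the hp:wt coefficient is nonzero" are the same statement in this model.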

Question 2: Is the slope of Y with respect to X different from 0 when Z=1?

The answer to Question 2 can be read off of the interaction plot:

plot_slopes(mod, variables = "hp", condition = "wt") + 
    geom_hline(yintercept = 0, color = "orange")

For any point on the x-axis, the slope of mpg with respect to hp is significantly different from 0 whenever the confidence interval does not overlap the orange line.

The following command conveys the same information, but I intentionally drew the plot above without colors to emphasize that the interval is all we need to answer Question 2.

johnson_neyman(mod, "hp", "wt")
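
For readers who want to see where the interval comes from, here is a rough base R sketch of the conditional slope and its confidence interval at a given wt. `slope_ci()` is a hypothetical helper, and I am assuming a t-based interval with the residual degrees of freedom; the `slopes()` output in this thread reports z statistics, so the bounds will not match it exactly.

```r
# Refit the model from above so this snippet is self-contained
mod <- lm(mpg ~ hp * wt, data = mtcars)
b <- coef(mod)
V <- vcov(mod)

# Hypothetical helper: slope of mpg w.r.t. hp at `wt`, with a t-based interval.
# Var(b_hp + wt * b_int) = Var(b_hp) + wt^2 * Var(b_int) + 2 * wt * Cov(b_hp, b_int)
slope_ci <- function(wt, level = 0.95) {
  est <- unname(b["hp"] + b["hp:wt"] * wt)
  se <- sqrt(V["hp", "hp"] + wt^2 * V["hp:wt", "hp:wt"] + 2 * wt * V["hp", "hp:wt"])
  crit <- qt(1 - (1 - level) / 2, df = df.residual(mod))
  out <- data.frame(wt = wt, estimate = est, std.error = se,
                    conf.low = est - crit * se, conf.high = est + crit * se)
  # TRUE when the interval does not contain 0
  out$excludes_zero <- out$conf.low > 0 | out$conf.high < 0
  out
}

slope_ci(c(3.5, 4))
```

Each row answers Question 2 for one value of wt. No comparison between rows is involved.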

Question 3: Is the slope of Y with respect to X different when Z=1 or Z=2?

This is just a restatement of Question 1. We already know the answer: Yes, wt moderates the slope. If we want a more specific estimate we can do things like:

slopes(mod, variables = "hp", newdata = datagrid(wt = c(3.5, 4)))
# 
#  Term Estimate Std. Error      z Pr(>|z|)   2.5 %   97.5 %  hp  wt
#    hp -0.02263    0.00788 -2.872  0.00408 -0.0381 -0.00719 147 3.5
#    hp -0.00871    0.00969 -0.899  0.36886 -0.0277  0.01029 147 4.0
# 
# Columns: rowid, term, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, mpg, hp, wt

The above gives us the slopes at two points. We can compare them with the hypothesis argument:

slopes(mod,
    variables = "hp",
    newdata = datagrid(wt = c(3.5, 4)),
    hypothesis = "b1 = b2")
# 
#   Term Estimate Std. Error     z Pr(>|z|)   2.5 %   97.5 %
#  b1=b2  -0.0139    0.00371 -3.75   <0.001 -0.0212 -0.00665
# 
# Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high

Again, we confirm that the two slopes are different from each other. And this is trivially true for any small change on the x-axis, just because the interaction coefficient is significant.

slopes(mod,
    variables = "hp",
    newdata = datagrid(wt = c(3, 3.0001)),
    hypothesis = "b1 = b2")
# 
#   Term  Estimate Std. Error     z Pr(>|z|)     2.5 %    97.5 %
#  b1=b2 -2.78e-06   7.94e-07 -3.51   <0.001 -4.34e-06 -1.23e-06
# 
# Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high
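
Here is a small base R sketch of why this holds for any gap. In this model the difference between two slopes is just the interaction coefficient times the gap in wt, and its standard error scales by the same gap, so in exact arithmetic the test statistic never changes. `compare_slopes()` is a hypothetical helper, not a marginaleffects function.

```r
# Refit the model from above so this snippet is self-contained
mod <- lm(mpg ~ hp * wt, data = mtcars)
b_int <- coef(mod)["hp:wt"]
se_int <- sqrt(vcov(mod)["hp:wt", "hp:wt"])

# Hypothetical helper: difference in hp slopes between wt_a and wt_b.
# slope(wt_a) - slope(wt_b) = b_int * (wt_a - wt_b)
compare_slopes <- function(wt_a, wt_b) {
  gap <- wt_a - wt_b
  est <- unname(b_int * gap)
  se <- abs(gap) * se_int
  c(estimate = est, std.error = se, statistic = est / se)
}

compare_slopes(3.5, 4)
compare_slopes(3, 3.0001)

# For comparison: the t value of the interaction term itself
unname(b_int / se_int)
```

Up to sign, the statistic in both comparisons equals the t value of hp:wt in the summary table. The slightly different statistic printed above for the 0.0001 gap is presumably a numerical artifact of differencing two nearly identical slopes, though I have not verified that.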

My conclusion

It seems to me like the standard approach already gives us all the tools to answer the basic questions of interest precisely and correctly. And I’m not sure what additional question the Johnson-Neyman plot answers.

2 participants