Issue with or_gam #48

Open
rpouillot opened this issue Mar 27, 2021 · 1 comment

@rpouillot

Hi,

I had to estimate some odds ratios from GAM models and found your package. After some testing, I found that the confidence intervals were unexpectedly small.
I checked the code of your or_gam function and found that the standard error (se) of the odds ratio (on the "link" scale) is effectively taken as the difference between the se of the two estimates, since pred_gam2_ci_low <- (pred_gam1[1] - (2 * pred_gam1[2])) - (pred_gam2[1] - (2 * pred_gam2[2])), which is equivalent to pred_gam2_ci_low <- (pred_gam1[1] - pred_gam2[1]) - (2 * (pred_gam1[2] - pred_gam2[2])). I do not think this is statistically sound: the variance of a difference of estimates equals the sum of their variances minus twice their covariance.
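A quick simulated check of that identity (my own illustration, not package code; the covariance matrix below is arbitrary):

set.seed(1)
Sigma <- matrix(c(1, 0.6, 0.6, 2), 2, 2)    # arbitrary covariance between two correlated estimates
x <- MASS::mvrnorm(1e5, mu = c(0, 0), Sigma = Sigma)
var(x[, 1] - x[, 2])                        # close to 1 + 2 - 2 * 0.6 = 1.8
(sd(x[, 1]) - sd(x[, 2]))^2                 # close to (1 - sqrt(2))^2, about 0.17: far too small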

I can illustrate the issue as follows.

library(oddsratio)
library(mgcv)
library(MASS)

#####################################################
# CHECK GLM

fit_glm <- glm(admit ~ gre + gpa + rank,
               data = data_glm,
               family = "binomial"
) # fit model

or_glm(data = data_glm, model = fit_glm, incr = list(gre = 380, gpa = 4))

#  predictor oddsratio ci_low (2.5) ci_high (97.5)          increment
#1       gre     2.364        1.054          5.396                380
#2       gpa    24.932        1.899        349.524                  4
#3     rank2     0.509        0.272          0.945 Indicator variable
#4     rank3     0.262        0.132          0.512 Indicator variable
#5     rank4     0.212        0.091          0.471 Indicator variable

# MASS equivalence
data_glm$greScaled <- data_glm$gre / 380  
data_glm$gpaScaled <- data_glm$gpa  / 4  

fit_glmScaled <- glm(admit ~ greScaled + gpaScaled + rank,
               data = data_glm,
               family = "binomial"
) # fit model Scaled

# Using confint() (MASS profile-likelihood method): we get the same results
exp(cbind(coef(fit_glmScaled), confint(fit_glmScaled)))

#Waiting for profiling to be done...
#                             2.5 %      97.5 %
#(Intercept)  0.0185001 0.001889165   0.1665354
#greScaled    2.3642995 1.053676012   5.3958610
#gpaScaled   24.9319521 1.898727216 349.5235388
#rank2        0.5089310 0.272289674   0.9448343
#rank3        0.2617923 0.131641717   0.5115181
#rank4        0.2119375 0.090715546   0.4706961

# No issue here.

#####################################################
# CHECK GAM, without smoothing: this is "similar" to a classical glm

fit_gamAsGLM <- gam(admit ~ gre + gpa + rank,
               data = data_glm,
               family = "binomial"
) 

or_gam(data = data_glm, model = fit_gamAsGLM, pred = "gre", values = c(0,380))
#  predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gre      0    380    2.3643      1.146592         4.87524

or_gam(data = data_glm, model = fit_gamAsGLM, pred = "gpa", values = c(0,4))
#  predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gpa      0      4  24.93195      4.042141        153.7804

# We should get results similar to those from `or_glm`. The CIs from or_gam are (sometimes severely) underestimated.

Here is a proposal to solve the issue. It is based on the help('predict.gam') example that estimates the variance of a sum of predictions using the lpmatrix: if Xp is the linear predictor matrix and A a contrast matrix, then the predictions A %*% Xp %*% coef(model) have covariance A %*% Xp %*% Vp %*% t(Xp) %*% t(A), where Vp is the covariance matrix of the coefficients. Here we want the variance of a difference rather than a sum, but the adaptation is straightforward.

# Evaluate "other parameters"
meanGre <- mean(data_glm$gre)
meanGpa <- mean(data_glm$gpa)
# Any value of rank works here: it is held constant across rows, so its (additive, unsmoothed) effect cancels in the difference
AnyRank <- data_glm$rank[1]

# Build a test data with values
testdatagre <- data.frame(gre = c(0, 380),
                          gpa = meanGpa, rank = AnyRank)

testdatagpa <- data.frame(gre = meanGre,
                          gpa = c(0, 4), rank = AnyRank)


# This function returns the odds ratio and its 95% CI.
# Note: it could easily be vectorised for your 'slice' option
# (just add rows to `testdata`; each result is relative to the first row;
# see the sketch after the real-GAM example below).

or_gam2 <- function(model, testdata) {
  n <- nrow(testdata)
  # linear predictor matrix: maps coefficients to predictions on the link scale
  Xp <- predict(model, testdata, type = "lpmatrix")
  # contrast matrix: row i compares row i + 1 of testdata to row 1
  a <- cbind(-1, diag(n - 1))
  Xs <- a %*% Xp
  # log odds ratios
  LnOR <- Xs %*% coef(model)
  OR <- exp(LnOR)
  # covariance of the log odds ratios, from the coefficient covariance matrix Vp
  var.LnOR <- Xs %*% model$Vp %*% t(Xs)
  sd.LnOR <- sqrt(diag(var.LnOR))
  c(OR = OR,
    lower = exp(LnOR - 1.96 * sd.LnOR),
    upper = exp(LnOR + 1.96 * sd.LnOR))
}

or_gam2(fit_gamAsGLM, testdatagre)

#      OR    lower    upper 
# 2.364300 1.046731 5.340350 

or_gam2(fit_gamAsGLM, testdatagpa)
#        OR      lower      upper 
#24.931952   1.849078 336.168826 

We get much closer CIs. Note that they are not strictly identical, because the intervals are built differently: or_glm relies on profile-likelihood intervals (as the confint() comparison above shows), whereas this approach builds Wald-type intervals from the GAM's coefficient covariance matrix.
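As a quick cross-check (my addition, not part of the original comparison), a Wald interval computed directly from the scaled GLM should come out very close to the or_gam2() result above:

est <- coef(fit_glmScaled)["greScaled"]   # log odds ratio for a 380-unit increase in gre
se  <- sqrt(vcov(fit_glmScaled)["greScaled", "greScaled"])
exp(est + c(OR = 0, lower = -1.96, upper = 1.96) * se)

Here is an example with a real GAM (with smoothers):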

fit_gam <- gam(admit ~ s(gre) + s(gpa) + rank,
               data = data_glm,
               family = "binomial"
) # fit model

or_gam(data = data_glm, model = fit_gam, pred = "gpa", values=c(3.4,3.5))

# predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gpa    3.4    3.5  1.219184      1.211133        1.227289

# The CIs are extremely tight.

testdatagpa <- data.frame(gre = meanGre,
                          gpa = c(3.4,3.5), rank = AnyRank)

or_gam2(fit_gam, testdatagpa)
#      OR    lower    upper 
#1.219184 1.026426 1.448141 
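
As noted in the comments above the function, the same construction handles a 'slice'-style call with more than two values at once; a minimal sketch (the gpa grid is arbitrary, chosen only for illustration):

testdata_slice <- data.frame(gre = meanGre,
                             gpa = c(2.5, 3, 3.5, 4), rank = AnyRank)
or_gam2(fit_gam, testdata_slice)
# returns OR / lower / upper for gpa = 3, 3.5 and 4, each relative to gpa = 2.5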

This function could easily be adapted to your framework. Hope it helps; let me know what you think.

@pat-s
Owner

pat-s commented Dec 22, 2022

Hi @rpouillot

Thanks for the detailed description and sorry for my late reply (I wasn't watching the repo, unfortunately).
I don't think I ever aimed to support fitting a GAM without smoothing - to me, smoothing is the essential part of a GAM.

That being said, I also cannot fully judge whether the GAM's CI values are unsound when no smoothers are used. Does this only apply to that case, or also when using smoothers?

I'm open to approaches that outline the issue and propose an alternative solution (like yours).
Is this still of interest to you?
