Hi,
I had to estimate some odds ratios from gam models and found your package. After a few tries, I found that the confidence intervals were unexpectedly small.
I checked the code of your or_gam function and found that the standard error (se) of the odds ratio (on the "link" scale) is effectively built from the difference of the two estimates' se's: the code computes
pred_gam2_ci_low <- pred_gam1[1] - (2 * pred_gam1[2]) - (pred_gam2[1] - (2 * pred_gam2[2]))
which is equivalent to
pred_gam2_ci_low <- (pred_gam1[1] - pred_gam2[1]) - 2 * (pred_gam1[2] - pred_gam2[2])
and I think this is not statistically sound. The variance of the difference of two estimates is actually the sum of their variances minus twice their covariance.
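As a side note, here is a small simulation (not part of the original report, just a sketch of the variance rule) showing that the difference of two se's can badly understate the se of a difference when the estimates are correlated:
set.seed(1)
# two strongly correlated estimators, both with unit variance
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
x <- MASS::mvrnorm(1e5, mu = c(0, 0), Sigma = Sigma)
var(x[, 1] - x[, 2])                         # empirical: close to 0.4
Sigma[1, 1] + Sigma[2, 2] - 2 * Sigma[1, 2]  # Var(A) + Var(B) - 2 Cov(A, B) = 0.4
(sqrt(Sigma[1, 1]) - sqrt(Sigma[2, 2]))^2    # (se(A) - se(B))^2 = 0, clearly wrong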
I can illustrate the issue as follows.
library(oddsratio)
library(mgcv)
library(MASS)
#####################################################
# CHECK GLM
fit_glm <- glm(admit ~ gre + gpa + rank,
  data = data_glm,
  family = "binomial"
) # fit model
or_glm(data = data_glm, model = fit_glm, incr = list(gre = 380, gpa = 4))
#  predictor oddsratio ci_low (2.5) ci_high (97.5)          increment
#1       gre     2.364        1.054          5.396                380
#2       gpa    24.932        1.899        349.524                  4
#3     rank2     0.509        0.272          0.945 Indicator variable
#4     rank3     0.262        0.132          0.512 Indicator variable
#5     rank4     0.212        0.091          0.471 Indicator variable
# MASS equivalence
data_glm$greScaled <- data_glm$gre / 380
data_glm$gpaScaled <- data_glm$gpa / 4
fit_glmScaled <- glm(admit ~ greScaled + gpaScaled + rank,
  data = data_glm,
  family = "binomial"
) # fit scaled model
# Using the MASS function: we get the same results
exp(cbind(coef(fit_glmScaled), confint(fit_glmScaled)))
#Waiting for profiling to be done...
#                                2.5 %      97.5 %
#(Intercept)  0.0185001 0.001889165   0.1665354
#greScaled    2.3642995 1.053676012   5.3958610
#gpaScaled   24.9319521 1.898727216 349.5235388
#rank2        0.5089310 0.272289674   0.9448343
#rank3        0.2617923 0.131641717   0.5115181
#rank4        0.2119375 0.090715546   0.4706961
# No issue here.
#####################################################
# CHECK GAM, without smoothing: this is "similar" to a classical glm
fit_gamAsGLM <- gam(admit ~ gre + gpa + rank,
  data = data_glm,
  family = "binomial"
)
or_gam(data = data_glm, model = fit_gamAsGLM, pred = "gre", values = c(0, 380))
#  predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gre      0    380    2.3643      1.146592         4.87524
or_gam(data = data_glm, model = fit_gamAsGLM, pred = "gpa", values = c(0,4))
# predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1 gpa 0 4 24.93195 4.042141 153.7804
# We should get results similar to or_glm's. The CI widths from or_gam are (sometimes severely) underestimated.
Here is a proposal to solve the issue. It is based on the help('predict.gam') example, which estimates the variance of a sum of predictions using the lpmatrix. Here we want the variance of a difference, but the adaptation is straightforward.
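For readers unfamiliar with the lpmatrix, a quick illustration (this check is not part of the original proposal): multiplying the lpmatrix by the coefficient vector reproduces the link-scale predictions, which is what makes variances of linear combinations of predictions easy to compute.
# the lpmatrix maps model coefficients to link-scale predictions
Xp <- predict(fit_gamAsGLM, newdata = head(data_glm), type = "lpmatrix")
drop(Xp %*% coef(fit_gamAsGLM))
# matches predict(fit_gamAsGLM, head(data_glm), type = "link")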
# Evaluate "other parameters"
meanGre <- mean(data_glm$gre)
meanGpa <- mean(data_glm$gpa)
# Any rank is valid, because it is not smoothed (additive effect)
AnyRank <- data_glm$rank[1]
# Build a test data with values
testdatagre <- data.frame(gre = c(0, 380),
                          gpa = meanGpa, rank = AnyRank)
testdatagpa <- data.frame(gre = meanGre,
                          gpa = c(0, 4), rank = AnyRank)
# This function computes the OR and its CI.
# Note: it could easily be vectorised for your 'slice' option
# (just increase the number of rows in `testdata`; results are relative to the first row)
or_gam2 <- function(model, testdata) {
  n <- nrow(testdata)
  # linear predictor matrix: maps coefficients to link-scale predictions
  Xp <- predict(model, testdata, type = "lpmatrix")
  # contrast matrix: row i gives prediction i+1 minus prediction 1
  a <- cbind(-1, diag(n - 1))
  Xs <- a %*% Xp
  # log odds-ratios on the link scale
  LnOR <- Xs %*% coef(model)
  OR <- exp(LnOR)
  # variance of each log odds-ratio, covariance term included,
  # via the covariance matrix of the coefficients (model$Vp)
  var.LnOR <- Xs %*% model$Vp %*% t(Xs)
  sd.LnOR <- sqrt(diag(var.LnOR))
  return(c(
    OR = OR,
    lower = exp(LnOR - 1.96 * sd.LnOR),
    upper = exp(LnOR + 1.96 * sd.LnOR)
  ))
}
or_gam2(fit_gamAsGLM, testdatagre)
#       OR    lower    upper
# 2.364300 1.046731 5.340350
or_gam2(fit_gamAsGLM, testdatagpa)
#        OR     lower      upper
# 24.931952  1.849078 336.168826
We get much closer CIs. Note that they are not strictly identical, because the se's in gam come from the Bayesian posterior covariance matrix of the coefficients (model$Vp), whereas confint() on a glm profiles the likelihood.
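As a quick cross-check (this command is not part of the original comparison), Wald-type intervals for the scaled glm use essentially the same normal approximation as or_gam2() and should therefore line up much more closely with its output:
# Wald CIs: estimate +/- z * se on the link scale, then exponentiated,
# i.e. the same construction as in or_gam2()
exp(confint.default(fit_glmScaled))
Here is an example with a real gam, with smooth terms: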
fit_gam <- gam(admit ~ s(gre) + s(gpa) + rank,
  data = data_glm,
  family = "binomial"
) # fit model
or_gam(data = data_glm, model = fit_gam, pred = "gpa", values = c(3.4, 3.5))
#  predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gpa    3.4    3.5  1.219184      1.211133        1.227289
# The CIs are extremely tight.
testdatagpa <- data.frame(gre = meanGre,
                          gpa = c(3.4, 3.5), rank = AnyRank)
or_gam2(fit_gam, testdatagpa)
#       OR    lower    upper
# 1.219184 1.026426 1.448141
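To illustrate the vectorisation mentioned above (the gpa values below are hypothetical, chosen only for the demo), passing more than two rows returns one odds ratio per extra row, each relative to the first row:
testdatagpaSeq <- data.frame(gre = meanGre,
                             gpa = c(3.0, 3.2, 3.4, 3.6),
                             rank = AnyRank)
# three ORs with CIs, comparing gpa = 3.2/3.4/3.6 against gpa = 3.0
or_gam2(fit_gam, testdatagpaSeq)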
This function could easily be adapted to your framework. Hope it helps; let me know what you think.
Thanks for the detailed description, and sorry for my late answer (I didn't watch the repo, unfortunately).
I think I've never aimed to fit a GLM without smoothing - to me, smoothing is the essential part of a GAM.
That being said, I also cannot fully judge whether the CI values of the GAM are possibly unsound when not using smoothers. Does this only apply to this case, or also when using smoothers?
I'm open to approaches that outline the issue and propose an alternative solution (like yours).
Is this still of interest to you?