diff --git a/docs/LICENSE.html b/docs/LICENSE.html new file mode 100644 index 0000000..800c774 --- /dev/null +++ b/docs/LICENSE.html @@ -0,0 +1,131 @@ + + + +
+ + + + +YEAR: 2016 +COPYRIGHT HOLDER: Patrick Schratz ++ +
Data source: ?mgcv::predict.gam
library(oddsratio)
+
+fit_gam <- mgcv::gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3) + x4,
+ data = data_gam)
To calculate odds ratios for specific increment steps of fit_gam
, we take predictor x2
(randomly chosen) and specify for which values we want to calculate the odds ratio.
+We can see that the odds of response y
happening are about 23 times higher when predictor x2
increases from 0.099 to 0.198 while holding all other predictors constant.
or_gam(data = data_gam, model = fit_gam, pred = "x2",
+ values = c(0.099, 0.198))
+#> predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
+#> 1 x2 0.099 0.198 23.32353 23.30424 23.34283
Usually, this calculation is done by setting all predictors to their mean value, predicting the response, changing the desired predictor to a new value and predicting the response again. These actions result in two log odds values, which are transformed into odds by exponentiating them. Finally, the odds ratio is calculated as the ratio of these two odds values.
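The calculation described above can be sketched by hand in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the data are simulated, a logistic glm() with a quadratic term stands in for a fitted GAM, and the names dat, fit, new_dat, x and z are hypothetical.

```r
# Simulated data (hypothetical): binary response y, predictors x and z
set.seed(1)
n <- 500
dat <- data.frame(x = runif(n), z = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 4 * dat$x - 3 * dat$x^2 + 0.5 * dat$z))

# Non-linear logistic model standing in for a GAM
fit <- glm(y ~ x + I(x^2) + z, data = dat, family = binomial)

# Hold the other predictor at its mean and change only x
new_dat <- data.frame(x = c(0.099, 0.198), z = mean(dat$z))

# Predictions on the link scale are log odds
log_odds <- predict(fit, newdata = new_dat, type = "link")

# Exponentiate to get odds, then take the ratio "from value1 to value2"
odds <- exp(log_odds)
odds_ratio <- unname(odds[2] / odds[1])
odds_ratio
```

Because the model is non-linear in x, the same increment of 0.099 starting from a different value of x would give a different odds ratio.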
+If the predictor is an indicator variable, i.e. consists of fixed levels, you can use the function in the same way by just putting in the respective levels you are interested in:
+or_gam(data = data_gam, model = fit_gam,
+ pred = "x4", values = c("A", "B"))
+#> predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
+#> 1 x4 A B 1.377537 1.334837 1.421604
Here, the change in odds of y
happening if predictor x4
is changing from level A
to B
is rather small. In detail, an increase in odds of 37.8% is reported.
To get an impression of odds ratio behaviour throughout the complete range of the smoothing function of the fitted GAM model, you can calculate odds ratios based on percentage breaks of the predictor's distribution.
+Here we slice predictor x2
into 5 parts by taking the predictor values of every 20% increment step.
or_gam(data = data_gam, model = fit_gam, pred = "x2",
+ percentage = 20, slice = TRUE)
+#> predictor value1 value2 perc1 perc2 oddsratio CI_low (2.5%)
+#> 1 x2 0.001 0.200 0 20 2510.77 1091.68
+#> 2 x2 0.200 0.400 20 40 0.03 0.03
+#> 3 x2 0.400 0.599 40 60 0.58 0.56
+#> 4 x2 0.599 0.799 60 80 0.06 0.06
+#> 5 x2 0.799 0.998 80 100 0.41 0.75
+#> CI_high (97.5%)
+#> 1 5774.53
+#> 2 0.03
+#> 3 0.60
+#> 4 0.06
+#> 5 0.22
We can see that there is a high odds ratio reported when increasing predictor x2
from 0.001 to 0.200, while all further predictor increases decrease the odds of response y
happening substantially.
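How the value pairs for such slices might be derived can be sketched in base R. This sketch assumes the breaks are quantiles of the predictor at every 20% step; the package may compute them differently, and x2, percentage and slices here are hypothetical stand-ins using simulated values.

```r
# Simulated predictor values (hypothetical stand-in for data_gam$x2)
set.seed(1234)
x2 <- runif(200)

# Percentage step and the corresponding probabilities (0, 0.2, ..., 1)
percentage <- 20
probs <- seq(0, 100, by = percentage) / 100

# Assumption: slice boundaries are the quantiles of the predictor
breaks <- unname(quantile(x2, probs = probs))

# One row per slice: the value pair and the percentage pair it spans
slices <- data.frame(
  value1 = head(breaks, -1),
  value2 = tail(breaks, -1),
  perc1  = head(probs, -1) * 100,
  perc2  = tail(probs, -1) * 100
)
slices
```

Each row then corresponds to one odds ratio "from value1 to value2", computed in the same way as for a single pair of values.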
Until now, the only (quick) way to plot the smoothing functions of a GAM(M) was to use the base plot()
function. The fiddly work to do the same using the ggplot2
plotting system is done by plot_gam()
:
plot_gam(fit_gam, pred = "x2", title = "Predictor 'x2'")
You can further customize the look using other colors or linetypes.
+So now, we have the odds ratios and we have a plot of the smoothing function. Why not combine both? We can do so using insert_or()
. Its main arguments are (i) a ggplot
plotting object containing the smooth function and (ii) a data frame returned from or_gam()
containing information about the predictor and the respective values we want to insert.
plot_object <- plot_gam(fit_gam, pred = "x2", title = "Predictor 'x2'")
+or_object <- or_gam(data = data_gam, model = fit_gam,
+ pred = "x2", values = c(0.099, 0.198))
+
+plot <- insert_or(plot_object, or_object, or_yloc = 3,
+ values_xloc = 0.05, arrow_length = 0.02,
+ arrow_col = "red")
+plot
The odds ratio information is always centered between the two vertical lines. Hence it only looks nice if the gap between the two chosen values (here 0.099 and 0.198) is large enough. If the smoothing line crosses your inserted text, you can correct it by adjusting or_yloc
. This parameter sets the y-location of the inserted odds ratio information.
Depending on the digits of your chosen values (here 3), you might also need to adjust the x-axis location of the two values so that they do not interfere with the vertical line.
+Let’s do all this by inserting another odds ratio into this plot! This time we simply take the already produced plot as an input to insert_or()
and use a new odds ratio object:
or_object2 <- or_gam(data = data_gam, model = fit_gam,
+ pred = "x2", values = c(0.4, 0.6))
+
+insert_or(plot, or_object2, or_yloc = 2.1, values_yloc = 2,
+ line_col = "green4", text_col = "black",
+ rect_col = "green4", rect_alpha = 0.2,
+ line_alpha = 1, line_type = "dashed",
+ arrow_xloc_r = 0.01, arrow_xloc_l = -0.01,
+ arrow_length = 0.02, rect = TRUE)
Using rect = TRUE
, you can additionally highlight certain odds ratio intervals. Aesthetics like opacity or color are fully customizable.
Fit model.
+Data source: http://www.ats.ucla.edu/stat/r/dae/logit.htm
fit_glm <- glm(admit ~ gre + gpa + rank, data = data_glm, family = "binomial")
For GLMs, the odds ratio calculation is simpler because odds ratio changes correspond to fixed predictor increases throughout the complete value range of each predictor.
+Hence, function or_glm
takes the increment steps of each predictor directly as an input in its parameter incr
.
To avoid false predictor/value assignments, the combinations need to be given in a list.
+Odds ratios of indicator variables are computed automatically and always refer to the base factor level.
+Indicator predictor rank
has four levels. Consequently, we get three odds ratio outputs, each referring to the base factor level (here: rank1).
The output is interpreted as follows: “Having rank2
instead of rank1
while holding all other values constant results in a decrease in odds of 49.1% (1-0.509)”.
or_glm(data = data_glm, model = fit_glm, incr = list(gre = 380, gpa = 5))
+#> predictor oddsratio CI_low (2.5 %) CI_high (97.5 %) increment
+#> 1 gre 2.364 1.054 5.396 380
+#> 2 gpa 55.712 2.229 1511.282 5
+#> 3 rank2 0.509 0.272 0.945 Indicator variable
+#> 4 rank3 0.262 0.132 0.512 Indicator variable
+#> 5 rank4 0.212 0.091 0.471 Indicator variable
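For a logistic GLM the log odds are linear in each predictor, so the odds ratio for an increment can be written as exp(coefficient * increment), whatever the starting value. The following is a minimal sketch with simulated data, not the source of or_glm(); dat, fit, incr and or_manual are hypothetical names.

```r
# Simulated admission-style data (hypothetical, not data_glm)
set.seed(42)
n <- 400
dat <- data.frame(gre = rnorm(n, 580, 115), gpa = rnorm(n, 3.4, 0.4))
dat$admit <- rbinom(n, 1, plogis(-5 + 0.003 * dat$gre + 0.8 * dat$gpa))

fit <- glm(admit ~ gre + gpa, data = dat, family = binomial)

# Increment steps given as a named list, one entry per predictor
incr <- list(gre = 380, gpa = 5)

# Odds ratio per predictor: exponentiate coefficient times increment
or_manual <- sapply(names(incr), function(p) exp(coef(fit)[[p]] * incr[[p]]))
or_manual
```

Passing the increments as a named list ties each value to its predictor by name, which is what prevents the false predictor/value assignments mentioned above.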
You can also set other confidence interval levels for GLM(M) models. The resulting data frame will automatically adjust its column names to the specified level.
+or_glm(data = data_glm, model = fit_glm,
+ incr = list(gre = 380, gpa = 5), CI = 0.70)
+#> predictor oddsratio CI_low (15 %) CI_high (85 %) increment
+#> 1 gre 2.364 1.540 3.647 380
+#> 2 gpa 55.712 10.084 314.933 5
+#> 3 rank2 0.509 0.366 0.706 Indicator variable
+#> 4 rank3 0.262 0.183 0.374 Indicator variable
+#> 5 rank4 0.212 0.136 0.325 Indicator variable
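How an interval at another level translates to the odds ratio scale can be sketched as follows. This sketch uses Wald intervals from confint.default() on simulated data, which is an assumption: the package may use a different interval method (for example profile likelihood), so its numbers can differ. The names dat, fit, or_point and or_ci are hypothetical.

```r
# Simulated admission-style data (hypothetical, not data_glm)
set.seed(42)
n <- 400
dat <- data.frame(gre = rnorm(n, 580, 115), gpa = rnorm(n, 3.4, 0.4))
dat$admit <- rbinom(n, 1, plogis(-5 + 0.003 * dat$gre + 0.8 * dat$gpa))
fit <- glm(admit ~ gre + gpa, data = dat, family = binomial)

level <- 0.70      # interval level, i.e. bounds at 15% and 85%
increment <- 380   # increment step for predictor gre

# Point estimate of the odds ratio for the increment
or_point <- exp(coef(fit)[["gre"]] * increment)

# Wald interval on the log-odds scale, scaled by the increment and exponentiated
ci <- confint.default(fit, parm = "gre", level = level)
or_ci <- exp(ci * increment)

or_point
or_ci
```

A lower level gives a narrower interval around the same point estimate, which matches the pattern in the two outputs above.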
Resource: | +CRAN | +Travis CI | +Appveyor | +
---|---|---|---|
Platforms: | +Multiple | +Linux & macOS | +Windows | +
R CMD check | ++ | + | + |
Test coverage | ++ | + | + |
Functions for calculation and plotting of odds ratios of Generalized Additive (Mixed) Models and Generalized Linear (Mixed) Models with a binomial response variable (i.e. logistic regression models).
+Install from CRAN:
+install.packages("oddsratio")
Get the development version from GitHub:
+remotes::install_github("pat-s/oddsratio@dev")
Odds ratio calculation of predictors gre
& gpa
of a fitted model fit_glm
with increment steps of 380 and 5, respectively.
+For factor variables (here: rank
with 4 levels), automatically all odds ratios corresponding to the base level (here: rank1
) are returned including their respective confidence intervals. The default level is 95%. However, other levels can be specified with the parameter CI
. Data source: http://www.ats.ucla.edu/stat/r/dae/logit.htm
pacman::p_load(oddsratio, mgcv)
+df <- data_glm
+df$rank <- factor(df$rank)
+fit_glm <- glm(admit ~ gre + gpa + rank, data = df, family = "binomial")
+
+or_glm(data = df, model = fit_glm,
+ incr = list(gre = 380, gpa = 5), CI = 0.95)
For GAMs, the calculation of odds ratios is different. Because the smoothing functions are non-linear, odds ratios only apply to specific value changes and are not constant throughout the whole value range of the predictor, as they are for GLMs. Hence, odds ratios of GAMs can only be computed for one predictor at a time by holding all other predictors at a fixed value while changing the value of the specific predictor. Confidence intervals are currently fixed to the 95% level for GAMs. Data source: ?mgcv::predict.gam()
Here, the usage of or_gam()
is shown by calculating odds ratios of predictor x2
in 20% steps across the whole value range of the predictor.
set.seed(1234)
+n <- 200
+sig <- 2
+df <- gamSim(1, n = n, scale = sig, verbose = FALSE)
+df$x4 <- as.factor(c(rep("A", 50), rep("B", 50), rep("C", 50), rep("D", 50)))
+fit_gam <- mgcv::gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3) + x4, data = df)
+
+or_gam(data = df, model = fit_gam, pred = "x2",
+ percentage = 20, slice = TRUE)
If you want to compute a single odds ratio for specific values, simply set param slice = FALSE
:
or_gam(data = df, model = fit_gam,
+ pred = "x2", values = c(0.099, 0.198))
Plotting of GAM smooths is also supported:
+plot_gam(fit_gam, pred = "x2", title = "Predictor 'x2'")
+
+Insert the calculated odds ratios into the smoothing function:
+plot_object <- plot_gam(fit_gam, pred = "x2", title = "Predictor 'x2'")
+or_object <- or_gam(data = df, model = fit_gam,
+ pred = "x2", values = c(0.099, 0.198))
+
+plot <- insert_or(plot_object, or_object, or_yloc = 3,
+ values_xloc = 0.04, line_size = 0.5,
+ line_type = "dotdash", values_yloc = 0.5,
+ arrow_col = "red")
+plot
+
+Insert multiple odds ratios into one smooth:
+or_object2 <- or_gam(data = df, model = fit_gam, pred = "x2",
+ values = c(0.4, 0.6))
+
+insert_or(plot, or_object2, or_yloc = 2.1, values_yloc = 2,
+ line_col = "green4", text_col = "black",
+ rect_col = "green4", rect_alpha = 0.2,
+ line_alpha = 1, line_type = "dashed",
+ arrow_xloc_r = 0.01, arrow_xloc_l = -0.01,
+ arrow_length = 0.01, rect = TRUE)
+
+data_glm
+ + +data(data_glm)+ +
a data.frame
randomly created numerical and non-numerical variables
Taken from http://www.ats.ucla.edu/stat/r/dae/logit.htm, direct download +link: http://www.ats.ucla.edu/stat/data/binary.csv
+ + +This function converts a fitted GAM model into a tidy data frame
+ + +gam_to_df(model = NULL, pred = NULL)+ +
model | +A fitted GAM(M). |
+
---|---|
pred | +Character. Predictor name for which to calculate +the odds ratio. |
+
To be able to plot the smoothing function of a GAM using ggplot2, +some preprocessing is needed coming from the raw fitted GAM model output.
+Used in plot_gam.
+ ++# load data (Source: ?mgcv::gam) +library(mgcv)#>#>
This function inserts calculated odds ratios of GAM(M)s into +a plot of a GAM(M) smoothing function.
+ + +insert_or(plot_object = NULL, or_object = NULL, line_col = "red", + line_size = 1.2, line_type = "solid", line_alpha = 1, text_alpha = 1, + text_size = 4, text_col = "black", rect_alpha = 0.5, rect_col = NULL, + rect = FALSE, arrow = TRUE, values = TRUE, values_yloc = 0, + values_xloc = NULL, or_yloc = 0, arrow_length = NULL, + arrow_yloc = NULL, arrow_col = NULL, arrow_xloc_r = NULL, + arrow_xloc_l = NULL)+ +
plot_object | +A |
+
---|---|
or_object | +A returned data.frame from or_gam. |
+
line_col, line_alpha, line_type, line_size | +Aesthetics of vertical lines. |
+
text_col, text_alpha, text_size | +Aesthetics of inserted values. |
+
rect_col, rect_alpha | +Aesthetics of shaded rectangle. |
+
rect | +Logical. Whether to print a shaded rectangle between the +vertical lines. |
+
arrow | +Logical. Whether to print arrows above the inserted values.
+Default to |
+
values | +Logical. Whether to print predictor value information nearby
+the inserted vertical lines. Default to |
+
values_xloc | +Numeric. X-axis location/shift of values relative to +their vertical line. +Default to 2% of x-axis range. |
+
or_yloc, values_yloc | +Numeric. Specifies y-location of inserted +odds ratio / values. +Relative to plotted y-axis range. A positive/negative value will place the +text higher/lower. |
+
arrow_xloc_r, arrow_xloc_l, arrow_yloc, arrow_length, arrow_col | +Numeric. +Axis placement options of inserted arrows. +Relative to respective axis ranges. |
+
Returns a ggplot
plotting object
The idea behind this function is to add calculated odds ratios of +fitted GAM models (or_gam) into a plot +showing the smooth function (plot_gam) of the chosen +predictor for which the odds ratio was calculated. Multiple insertions +can be made by iteratively calling the function (see examples).
+Right now the function only accepts results of
+or_gam with slice = FALSE
.
+If you want to insert multiple odds ratios, you have to do it iteratively.
+# load data (Source: ?mgcv::gam) and fit model +library(mgcv) +fit_gam <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + + offset(x3) + x4, data = data_gam) # fit model + +# create input objects (plot + odds ratios) +library(oddsratio) +plot_object <- plot_gam(fit_gam, pred = "x2", title = "Predictor 'x2'") +or_object1 <- or_gam(data = data_gam, model = fit_gam, + pred = "x2", values = c(0.099, 0.198)) + +# insert first odds ratios to plot +plot_object <- insert_or(plot_object, or_object1, or_yloc = 3, + values_xloc = 0.04, line_size = 0.5, + line_type = "dotdash", text_size = 6, + values_yloc = 0.5, arrow_col = "red") + +# calculate second odds ratio +or_object2 <- or_gam(data = data_gam, model = fit_gam, pred = "x2", + values = c(0.4, 0.6)) + +# add or_object2 into plot +insert_or(plot_object, or_object2, or_yloc = 2.1, values_yloc = 2, + line_col = "green4", text_col = "black", + rect_col = "green4", rect_alpha = 0.2, + line_alpha = 1, line_type = "dashed", + arrow_xloc_r = 0.01, arrow_xloc_l = -0.01, + arrow_length = 0.01, rect = TRUE)
This function suppresses plotting output of plot function
+ + +no_plot(model = NULL)+ +
model | +A fitted GAM(M). |
+
---|
Prevents unwanted printing of plot output in a function call +in which only the information returned by +plot is needed. Used in plot_gam.
+ ++# load data (Source: ?mgcv::gam) +library(mgcv) +n <- 200 +sig <- 2 +dat <- gamSim(1, n = n, scale = sig, verbose = FALSE) +dat$x4 <- as.factor(c(rep("A", 50), rep("B", 50), rep("C", 50), + rep("D", 50))) +fit_gam <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + + offset(x3) + x4, data = dat) # fit model + +tmp <- plot(fit_gam, pages = 1) # plot output +tmp <- no_plot(fit_gam) # no plot output
This function calculates odds ratio(s) for specific increment +steps of GAM(M)s.
+Odds ratios can also be calculated for continuous percentage
+increment steps across the whole predictor distribution using slice = TRUE
.
or_gam(data = NULL, model = NULL, pred = NULL, values = NULL, + percentage = NULL, slice = FALSE, CI = NULL)+ +
data | +The data used for model fitting. |
+
---|---|
model | +A fitted GAM(M). |
+
pred | +Character. Predictor name for which to calculate +the odds ratio. |
+
values | +Numeric vector of length two.
+Predictor values to estimate odds ratio from. Function is written to use the
+first provided value as the "lower" one, i.e. calculating the odds ratio
+'from value1 to value2'. Only used if |
+
percentage | +Numeric. Percentage number to split the
+predictor distribution into.
+A value of 10 would split the predictor distribution by 10% intervals.
+Only needed if |
+
slice | +Logical. |
+
CI | +Numeric. Currently fixed to 95% confidence interval level +(2.5% - 97.5%). +It should not be changed in a function call! |
+
A data frame with (up to) eight columns. perc1
and perc2
+are only returned if slice = TRUE
:
Predictor name
First value of odds ratio calculation
Second value of odds ratio calculation
Percentage value of value1
Percentage value of value2
Calculated odds ratio(s)
Lower (2.5%)
confidence interval of odds ratio
Higher (97.5%)
confidence interval of odds ratio
Currently supported functions: mgcv::gam, +mgcv::gamm, gam::gam.
+For mgcv::gamm, the model
input of
+or_gam needs to be the gam
output (e.g. fit_gam$gam
).
+# load data (Source: ?mgcv::gam) and fit model +library(mgcv) +fit_gam <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + + offset(x3) + x4, data = data_gam) # fit model + +# Calculate OR for specific increment step of continuous variable +or_gam(data = data_gam, model = fit_gam, pred = "x2", + values = c(0.099, 0.198))#> predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%) +#> 1 x2 0.099 0.198 23.32353 23.30424 23.34283+## Calculate OR for change of indicator variable +or_gam(data = data_gam, model = fit_gam, pred = "x4", + values = c("B", "D"))#> predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%) +#> 1 x4 B D 0.4744264 0.4976375 0.452298+## Calculate ORs for percentage increments of predictor distribution +## (here: 20%) +or_gam(data = data_gam, model = fit_gam, pred = "x2", + percentage = 20, slice = TRUE)#> predictor value1 value2 perc1 perc2 oddsratio CI_low (2.5%) CI_high (97.5%) +#> 1 x2 0.001 0.200 0 20 2510.77 1091.68 5774.53 +#> 2 x2 0.200 0.400 20 40 0.03 0.03 0.03 +#> 3 x2 0.400 0.599 40 60 0.58 0.56 0.60 +#> 4 x2 0.599 0.799 60 80 0.06 0.06 0.06 +#> 5 x2 0.799 0.998 80 100 0.41 0.75 0.22+
This function calculates odds ratio(s) for specific +increment steps of GLMs.
+ + +or_glm(data, model, incr, CI = 0.95)+ +
data | +The data used for model fitting. |
+
---|---|
model | +A fitted GLM(M). |
+
incr | +List. Increment values of each predictor. |
+
CI | +Numeric. Which confidence interval to calculate. +Must be between 0 and 1. Default to 0.95 |
+
A data frame with five columns:
+Predictor name(s)
Calculated odds ratio(s)
Lower confidence interval of odds ratio
Higher confidence interval of odds ratio
Increment of the predictor(s)
CI_low
and CI_high
are only calculated for GLM models because
+glmmPQL does not return confidence intervals due to its penalizing
+behaviour.
Currently supported functions: glm, +glmmPQL
+ ++## Example with glm() +# load data (source: http://www.ats.ucla.edu/stat/r/dae/logit.htm) and +# fit model +fit_glm <- glm(admit ~ gre + gpa + rank, data = data_glm, + family = "binomial") # fit model + +# Calculate OR for specific increment step of continuous variable +or_glm(data = data_glm, model = fit_glm, incr = list(gre = 380, gpa = 5))#> predictor oddsratio CI_low (2.5 %) CI_high (97.5 %) increment +#> 1 gre 2.364 1.054 5.396 380 +#> 2 gpa 55.712 2.229 1511.282 5 +#> 3 rank2 0.509 0.272 0.945 Indicator variable +#> 4 rank3 0.262 0.132 0.512 Indicator variable +#> 5 rank4 0.212 0.091 0.471 Indicator variable+# Calculate OR and change the confidence interval level +or_glm(data = data_glm, model = fit_glm, + incr = list(gre = 380, gpa = 5), CI = .70)#> predictor oddsratio CI_low (15 %) CI_high (85 %) increment +#> 1 gre 2.364 1.540 3.647 380 +#> 2 gpa 55.712 10.084 314.933 5 +#> 3 rank2 0.509 0.366 0.706 Indicator variable +#> 4 rank3 0.262 0.183 0.374 Indicator variable +#> 5 rank4 0.212 0.136 0.325 Indicator variable+## Example with MASS:glmmPQL() +# load data +library(MASS) +data(bacteria) +fit_glmmPQL <- glmmPQL(y ~ trt + week, random = ~1 | ID, + family = binomial, data = bacteria, + verbose = FALSE) + +# Apply function +or_glm(data = bacteria, model = fit_glmmPQL, incr = list(week = 5))#> Warning: No confident interval calculation possible +#> for 'glmmPQL' models +#>#> predictor oddsratio CI_low CI_high increment +#> 1 trtdrug 0.296 NA NA Indicator variable +#> 2 trtdrug+ 0.454 NA NA Indicator variable +#> 3 week 0.485 NA NA 5+
This function plots the smoothing function of selected GAM(M) models
+using the ggplot2
plotting system.
plot_gam(model = NULL, pred = NULL, col_line = "blue", + ci_line_col = "black", ci_line_type = "dashed", ci_fill = "grey", + ci_alpha = 0.4, ci_line_size = 0.8, sm_fun_size = 1.1, title = NULL, + xlab = NULL, ylab = NULL, limits_y = NULL, breaks_y = NULL)+ +
model | +A fitted model of class |
+
---|---|
pred | +The predictor of the fitted model to plot the smooth function of. |
+
col_line | +Character. Sets color for smoothing function. Default to
+ |
+
ci_line_col | +Character. Sets color for confidence interval line of
+smoothing function. Default to |
+
ci_line_type | +Character. Sets linetype of confidence interval line
+of smoothing function. Default to |
+
ci_fill | +Character. Fill color of area between smoothing function and +its confidence interval lines. |
+
ci_alpha | +Numeric (range: 0-1). Opacity value of confidence interval shading. |
+
ci_line_size, sm_fun_size | +Line sizes. |
+
title | +Character. Plot title. |
+
xlab | +Character. X-axis title. |
+
ylab | +Character. Y-axis title. |
+
limits_y | +Numeric of length two. Sets y-axis limits. |
+
breaks_y | +Numeric of length three. Sets y-axis breaks.
+See seq.
+Values need to be given in a |
+
+# load data (Source: ?mgcv::gam) and fit model +library(mgcv) +fit_gam <- mgcv::gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3) + x4, + data = data_gam) + +library(oddsratio) +plot_gam(fit_gam, pred = "x2", title = "Predictor 'x2'")+
Found ' + result.length + ' result(s)
'); + + for (var item in result) { + ref = result[item].ref; + searchitem = '