-
Notifications
You must be signed in to change notification settings - Fork 23
/
lead-ols.Rmd
180 lines (160 loc) · 5.88 KB
/
lead-ols.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
---
title: "Linear Model Analysis of Rosner `lead` Dataset"
author: "FE Harrell"
date: "`r Sys.Date()`"
output:
html_document:
highlight: pygments
toc: yes
major: regression modeling
minor: ordinary least squares
---
# Data Description
```{r contents,results='asis'}
# results='asis' in the chunk header because html(contents) output html directly
require(rms)
knitrSet(lang='markdown')
options(prType='html') # html output for print, summary, anova
getHdata(lead)
html(contents(lead), maxlevels=10, levelType='table')
```
```{r describe}
html(describe(lead))
```
# ANOVA for Three Exposure Groups
```{r anova,results='asis'}
dd <- datadist(lead); options(datadist='dd')
f <- ols(maxfwt ~ group, data=lead)
f
anova(f)
```
# ANCOVA for Three Groups Adjusted for Age
```{r ancova,results='asis'}
fa <- ols(maxfwt ~ age + group, data=lead)
fa
anova(fa)
# Also get group effect by subtracting SSRs or SSEs using anova(f)
# Continuous Lead Exposure Analyses
```{r contexp,results='asis'}
f <- ols(maxfwt ~ age + ld72 + ld73, data=lead)
f # same as print(f)
coef(f) # retrieve coefficients
g <- Function(f) # create an R function that represents the fitted model
# Note that the default values for g's arguments are medians
g
# Estimate mean maxfwt at age 10, .1 quantiles of ld72, ld73 and .9 quantile of ld73
# keeping ld72 at .1 quantile
g(age=10, ld72=21, ld73=c(21, 47)) # more exposure in 1973 decreased y by 6
# Get the same estimates another way but also get std. errors
predict(f, data.frame(age=10, ld72=21, ld73=c(21, 47)), se.fit=TRUE)
```
# Diagnostics Based on Residuals
```{r resid}
r <- resid(f)
par(mfrow=c(2,2)) # 2x2 matrix of plots
plot(fitted(f), r); abline(h=0) # yhat vs. r
with(lead, plot(age, r)); abline(h=0)
with(lead, plot(ld73, r)); abline(h=0)
qqnorm(r) # linearity indicates normality
```
# Partial Effects of Predictors
```{r peffects}
ggplot(Predict(f))
ggplot(Predict(f, age)) # plot only age effect, using default range,
# 10th smallest to 10th largest age
ggplot(Predict(f, age=3:15)) # plot age=3,4,...,15
ggplot(Predict(f, age=seq(3,16,length=150))) # plot age=3-16, 150 points
# Get confidence limits for individual response
ggplot(Predict(f, age, conf.type='individual'))
# Show both types of 0.99 confidence limits on one plot
p1 <- Predict(f, age, conf.int=0.99, conf.type='individual')
p2 <- Predict(f, age, conf.int=0.99, conf.type='mean')
p <- rbind(Individual=p1, Mean=p2)
ggplot(p)
# Non-plotted variables are set to reference values (median and
# mode by default). To control the settings of non-plotted values use e.g.
ggplot(Predict(f, ld73, age=3))
# To make separate lines for two ages:
ggplot(Predict(f, ld73, age=c(3,9))) # add ,conf.int=FALSE to suppress conf. bands
# To plot a 3-d surface for two continuous predictors against
# yhat; color coding for predicted mean maxfwt
bplot(Predict(f, ld72, ld73))
```
# Point Estimates for Partial Effects
The `summary` function can compute point estimates and confidence
intervals for effects of individual predictors, holding other
predictors to selected constants. The constants you hold other
predictors to will only matter when the other predictors interact with
the predictor whose effects are being displayed.
How predictors are changed depend on the type of predictor:
- Categorical predictors: differences against the reference (most
frequent) cell by default
- Continuous predictors: inter-quartile-range effects by default
The estimated effects depend on the type of model:
- `ols`: differences in means
- logistic models: odds ratios and their logs
- Cox models: hazard ratios and their logs
- quantile regression: differences in quantiles
```{r summary,results='asis'}
summary(f) # inter-quartile-range effects
summary(f, age=5) # adjust age to 5 when examining ld72,ld73
# (no effect since no interactions in model)
summary(f, ld73=c(20, 40)) # effect of changing ld73 from 20 to 40
```
There is a `plot` method for `summary` results. By default it
shows 0.9, 0.95, and 0.99 confidence limits.
```{r plotsummary,results='asis',top=2}
# top=2 in chunk header: leave extra room for labels
plot(summary(f))
```
# Getting Predicted Values
## Using `predict`
```{r predict}
predict(f, data.frame(age=3, ld72=21, ld73=21))
# must specify all variables in the model
predict(f, data.frame(age=c(3, 10), ld72=21, ld73=c(21, 47)))
# predictions for (3,21,21) and (10,21,47)
newdat <- expand.grid(age=c(4, 8), ld72=c(21, 47), ld73=c(21, 47))
newdat
predict(f, newdat) # 8 predictions
predict(f, newdat, conf.int=0.95) # also get CLs for mean
predict(f, newdat, conf.int=0.95, conf.type='individual') # CLs for indiv.
```
## The brute-force way
```{r predmanual}
# Model is b1 + b2*age + b3*ld72 + b4*ld73
b <- coef(f)
# For 3 year old with both lead exposures 21
b[1] + b[2]*3 + b[3]*21 + b[4]*21
```
## Using `Function` function
```{r Function}
g <- Function(f)
g(age=c(3, 8), ld72=21, ld73=21) # 2 predictions
g(age=3) # 3 year old at median ld72, ld73
```
# General ANOVA
- Use `anova(fitobject)` to get all total effects and
individual partial effects
- Use `anova(f,age,sex)` to get combined partial effects of
`age` and `sex`, for example
- Store result of `anova` in an object in you want to print it
various ways, or to plot it:
```{r printanova}
options(prType='plain')# so names,subscripts,dots work
an <- anova(f)
an # same as print(an)
print(an, 'names') # print names of variables being tested
print(an, 'subscripts')# print subscripts in coef(f) (ignoring
# the intercept) being tested
print(an, 'dots') # a dot in each position being tested
anova(f, ld72, ld73) # combine effects into a 2 d.f. test
```
# Computing Environment
```{r rsession,echo=FALSE}
si <- sessionInfo(); si$loadedOnly <- NULL
print(si, locale=FALSE)
```
```{r cite,results='asis',echo=FALSE}
print(citation(), style='text')
```