# Data Visualization in Base R
As the title indicates, none of the functions in this section require external packages. They are all part of base R.
We will begin by creating the data set we will work with: simulated math and reading test scores for a sample of 100 undergraduate students, drawn from normal distributions intended to approximate a 0 to 100 scale. Each student is also assigned to either the Paper or the Electronic test format (`TestFormat`) and to either the Classroom or the Home setting (`TestLocation`), which gives four groups of 25. Using this data, we will explore scatter plots, bar graphs, histograms, and box plots. The following code creates the data set and the data frames we will need:
```{r echo = T}
set.seed(100)
MathGrade<-rnorm(n = 100, mean = 70, sd = 10)
set.seed(1000)
ReadingGrade<-rnorm(n = 100, mean = 65, sd = 13)
TestLocation<-c(rep("Classroom",50),rep("Home",50))
TestFormat<-c(rep("Paper",25),rep("Electronic",25),rep("Paper",25),rep("Electronic",25))
Data<-data.frame(MathGrade, ReadingGrade, TestLocation, TestFormat)
#Marginal Data Conditions
PaperTest<-Data[which(Data$TestFormat=="Paper"),]
ElectronicTest<-Data[which(Data$TestFormat=="Electronic"),]
Classroom<-Data[which(Data$TestLocation=="Classroom"),]
Home<-Data[which(Data$TestLocation=="Home"),]
#Cell Conditions
PaperTestHome<-Data[which(Data$TestFormat=="Paper" & Data$TestLocation=="Home"),]
PaperTestClassroom<-Data[which(Data$TestFormat=="Paper" & Data$TestLocation=="Classroom"),]
ElectronicTestHome<-Data[which(Data$TestFormat=="Electronic" & Data$TestLocation=="Home"),]
ElectronicTestClassroom<-Data[which(Data$TestFormat=="Electronic" & Data$TestLocation=="Classroom"),]
```
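The format and location assignments are fixed rather than random, so a cross-tabulation is a quick way to confirm that each of the four cells holds 25 students. Base R's `split()` can also build every format/location cell in one call, as an alternative to the `which()` subsets. This is a sketch, not the approach used in the rest of the section. It rebuilds the same data so it can run on its own, and `CellCounts` and `Cells` are illustrative names of our own.

```{r echo = T}
# Rebuild the simulated data exactly as above so this chunk runs on its own
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)
set.seed(1000)
ReadingGrade <- rnorm(n = 100, mean = 65, sd = 13)
TestLocation <- c(rep("Classroom", 50), rep("Home", 50))
TestFormat <- c(rep("Paper", 25), rep("Electronic", 25),
                rep("Paper", 25), rep("Electronic", 25))
Data <- data.frame(MathGrade, ReadingGrade, TestLocation, TestFormat)

# Count students in each format/location cell (we expect 25 per cell)
CellCounts <- table(Data$TestFormat, Data$TestLocation)
print(CellCounts)

# Alternative to the which() subsets: a list of four data frames,
# named "Format.Location", e.g. Cells[["Paper.Home"]]
Cells <- split(Data, list(Data$TestFormat, Data$TestLocation))
names(Cells)
```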
## Scatter Plot
The first data visualization we will be working with is the scatter plot.
The following line of code creates a basic scatter plot using the ```plot()``` function.
```{r echo = T}
plot(Data$MathGrade,Data$ReadingGrade)
```
This is a very basic plot, and it lacks many of the details that most visualizations include. To make it a little nicer, let's begin by adding a title with `main=`.
```{r echo = T}
plot(Data$MathGrade,Data$ReadingGrade,main = "Title")
```
We can also add a subtitle with `sub=`.
```{r echo = T}
plot(Data$MathGrade,Data$ReadingGrade,main = "Title",sub = "Subtitle")
```
Another adjustment we may want to make to the graph is changing the axis labels.
```{r echo = T}
plot(Data$MathGrade,Data$ReadingGrade,main = "Title",sub = "Subtitle",
xlab = "x-axis", ylab = "y-axis")
```
Either axis label can also be left blank by setting `xlab=""` or `ylab=""`.
It may also be a good idea to set the axis ranges with `xlim=` and `ylim=`, each given as a vector of the lower and upper limits.
```{r echo = T}
plot(Data$MathGrade,Data$ReadingGrade,main = "Title",sub = "Subtitle",
xlab = "x-axis", ylab = "y-axis",xlim = c(0,100),ylim = c(0,100))
```
Depending on how you intend to use the graph, you may also decide to remove the border around it with `frame.plot = F`.
```{r echo = T}
plot(Data$MathGrade,Data$ReadingGrade,main = "Title",sub = "Subtitle",
xlab = "x-axis", ylab = "y-axis",xlim = c(0,100),ylim = c(0,100),
frame.plot = F)
```
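You can also remove the axes with `axes = F`. This drops the tick marks and their numbers, and it removes the border too, because `frame.plot` defaults to the value of `axes`.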
```{r echo = T}
plot(Data$MathGrade,Data$ReadingGrade,main = "Title",sub = "Subtitle",
xlab = "x-axis", ylab = "y-axis",xlim = c(0,100),ylim = c(0,100),
axes = F)
```
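If you suppress the axes, you can draw your own afterwards with `axis()` and restore the border with `box()`. This gives you control over where the tick marks fall. Below is a minimal sketch, assuming ticks every 20 points suit the 0 to 100 scale. It simulates its own scores with the same settings so it runs on its own.

```{r echo = T}
# Simulated scores (same settings as the data set above)
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)
set.seed(1000)
ReadingGrade <- rnorm(n = 100, mean = 65, sd = 13)

plot(MathGrade, ReadingGrade, xlab = "Math Grade", ylab = "Reading Grade",
     xlim = c(0, 100), ylim = c(0, 100), axes = F)
# Custom axes: side 1 is the bottom (x) axis, side 2 the left (y) axis;
# axis() invisibly returns the tick positions it drew
xTicks <- axis(side = 1, at = seq(0, 100, by = 20))
yTicks <- axis(side = 2, at = seq(0, 100, by = 20), las = 1)
box()  # add the border back around the plotting region
```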
Now, here is everything we have covered so far put into context, with a meaningful title and axis labels.
```{r echo = T}
plot(Data$MathGrade,Data$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "All conditions",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
```
We can also plot our data for each of the groups we created earlier. The following four plots show math vs. reading grades for each of our four marginal groups: Paper Test, Electronic Test, Classroom location, and Home location.
```{r echo = T}
plot(PaperTest$MathGrade,PaperTest$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "Paper Test",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
```
```{r echo = T}
plot(ElectronicTest$MathGrade,ElectronicTest$ReadingGrade,
main = "Math Grade VS Reading Grade",sub = "Electronic Test",xlab = "Math Grade",
ylab = "Reading Grade",xlim = c(0,100),ylim = c(0,100))
```
```{r echo = T}
plot(Classroom$MathGrade,Classroom$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "Classroom",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
```
```{r echo = T}
plot(Home$MathGrade,Home$ReadingGrade,main = "Math Grade VS Reading Grade",sub = "Home",
xlab = "Math Grade", ylab = "Reading Grade",xlim = c(0,100),ylim = c(0,100))
```
We can plot the paired graphs (Paper and Electronic; Classroom and Home) side by side by setting the number of plots on the screen with `par()`:
```{r echo = T}
par(mfrow=c(1,2))
```
The first number sets the number of rows of graphs to be displayed and the second sets the number of columns.
Once you have set the number of graphs you want to appear at a time, you can create the graphs.
```{r echo = T}
par(mfrow=c(1,2))
plot(PaperTest$MathGrade,PaperTest$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "Paper Test",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
plot(ElectronicTest$MathGrade,ElectronicTest$ReadingGrade,
main = "Math Grade VS Reading Grade",sub = "Electronic Test",xlab = "Math Grade",
ylab = "Reading Grade",xlim = c(0,100),ylim = c(0,100))
```
```{r echo = T}
par(mfrow=c(1,2))
plot(Classroom$MathGrade,Classroom$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "In Classroom",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
plot(Home$MathGrade,Home$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "At Home",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
```
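Either of the next two lines resets the display to one plot per screen. `par(mfrow=c(1,1))` restores a single-panel layout, while `dev.off()` closes the current graphics device, so the next plot opens a new device with default settings.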
```{r echo = T}
par(mfrow=c(1,1))
dev.off()
```
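A common base R idiom is to save the value that `par()` returns when you change a setting, because that value holds the previous settings. Passing it back to `par()` afterwards restores the earlier layout without closing the device. Below is a short sketch with placeholder data; `oldPar` is our own name.

```{r echo = T}
# par() invisibly returns the previous values of the settings it changes
oldPar <- par(mfrow = c(1, 2))
plot(1:10, main = "Left panel")
plot(10:1, main = "Right panel")
par(oldPar)  # restore the layout that was in effect before
```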
We can also plot different conditions on the same graph. Draw the first group with `plot()`, then add the second with the `points()` function. The `col=` argument changes the point colors, and `pch=` changes their shapes.
```{r echo = T}
plot(PaperTest$MathGrade,PaperTest$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "Test Types",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
points(ElectronicTest$MathGrade,ElectronicTest$ReadingGrade,col="blue",pch=2)
plot(Classroom$MathGrade,Classroom$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "Test Locations",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
points(Home$MathGrade,Home$ReadingGrade,col="blue",pch=2)
```
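`col=` and `pch=` also accept one value per point. That means a single `plot()` call can color the groups by indexing a color vector with the grouping factor. Below is a minimal sketch that keeps the scheme above (Paper in black circles, Electronic in blue triangles). It relies on factor levels being sorted alphabetically, so Electronic is level 1. `FormatFactor`, `PointColors`, and `PointShapes` are illustrative names of our own.

```{r echo = T}
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)
set.seed(1000)
ReadingGrade <- rnorm(n = 100, mean = 65, sd = 13)
TestFormat <- c(rep("Paper", 25), rep("Electronic", 25),
                rep("Paper", 25), rep("Electronic", 25))

# Factor levels sort alphabetically: 1 = "Electronic", 2 = "Paper";
# indexing a vector by a factor uses these integer codes
FormatFactor <- factor(TestFormat)
PointColors <- c("Blue", "Black")[FormatFactor]
PointShapes <- c(2, 1)[FormatFactor]
plot(MathGrade, ReadingGrade, col = PointColors, pch = PointShapes,
     xlab = "Math Grade", ylab = "Reading Grade",
     xlim = c(0, 100), ylim = c(0, 100))
legend("topleft", legend = levels(FormatFactor),
       col = c("Blue", "Black"), pch = c(2, 1))
```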
Now that we are including multiple conditions on one plot, we might want to add a legend so we can identify which groups the colors and shapes indicate. This can be done with the `legend()` function. The legend's position is set by its first argument, using keywords such as "topleft", "topright", "bottomleft", and "bottomright".
```{r echo = T}
plot(PaperTest$MathGrade,PaperTest$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "Test Types",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
points(ElectronicTest$MathGrade,ElectronicTest$ReadingGrade,col="blue",pch=2)
legend("topleft",legend=c("Paper Test","Electronic Test"),col=c("Black","Blue"),
pch=c(1,2))
plot(Classroom$MathGrade,Classroom$ReadingGrade,main = "Math Grade VS Reading Grade",
sub = "Test Locations",xlab = "Math Grade", ylab = "Reading Grade",
xlim = c(0,100),ylim = c(0,100))
points(Home$MathGrade,Home$ReadingGrade,col="blue",pch=2)
legend("bottomleft",legend=c("Classroom","Home"),col=c("Black","Blue"),pch=c(1,2))
```
Finally, we can plot our four condition combinations as separate graphs.
```{r echo = T}
par(mfrow=c(2,2))
plot(ElectronicTestClassroom$MathGrade,ElectronicTestClassroom$ReadingGrade,
main = "Math Grade VS Reading Grade",sub = "Electronic/Classroom",xlab = "Math Grade",
ylab = "Reading Grade",xlim = c(0,100),ylim = c(0,100))
plot(ElectronicTestHome$MathGrade,ElectronicTestHome$ReadingGrade,
main = "Math Grade VS Reading Grade",sub = "Electronic/Home",xlab = "Math Grade",
ylab = "Reading Grade",xlim = c(0,100),ylim = c(0,100))
plot(PaperTestClassroom$MathGrade,PaperTestClassroom$ReadingGrade,
main = "Math Grade VS Reading Grade",sub = "Paper/Classroom",xlab = "Math Grade",
ylab = "Reading Grade",xlim = c(0,100),ylim = c(0,100))
plot(PaperTestHome$MathGrade,PaperTestHome$ReadingGrade,
main = "Math Grade VS Reading Grade",sub = "Paper/Home",xlab = "Math Grade",
ylab = "Reading Grade",xlim = c(0,100),ylim = c(0,100))
par(mfrow=c(1,1))
```
Or all on one graph with different point colors and shapes.
```{r echo = T}
plot(ElectronicTestClassroom$MathGrade,ElectronicTestClassroom$ReadingGrade,
main = "Math Grade VS Reading Grade",sub = "All Conditions",xlab = "Math Grade",
ylab = "Reading Grade",xlim = c(0,100),ylim = c(0,100))
points(ElectronicTestHome$MathGrade,ElectronicTestHome$ReadingGrade,col="Blue",pch=2)
points(PaperTestClassroom$MathGrade,PaperTestClassroom$ReadingGrade,col="Orange",pch=3)
points(PaperTestHome$MathGrade,PaperTestHome$ReadingGrade,col="Red",pch=4)
legend("bottomleft",legend=c("Electronic/Classroom","Electronic/Home","Paper/Classroom",
"Paper/Home"),col=c("Black","Blue","Orange","Red"),pch=c(1,2,3,4))
```
We can also add lines and text to any plot, which we will explore in the Histogram section.
## Bar Graph
The next data visualization we will work with is the bar graph. Before we can use the `barplot()` function and its arguments, we have to calculate the group means and create vectors of group names to label the bars. We will only use the math grade means in these examples, but the same steps apply to the reading grade means.
```{r echo=T}
mathmeanslocation<-c(mean(Home$MathGrade),mean(Classroom$MathGrade))
mathmeanstype<-c(mean(ElectronicTest$MathGrade),mean(PaperTest$MathGrade))
mathmeanstypelocation<-c(mean(ElectronicTestClassroom$MathGrade),
mean(ElectronicTestHome$MathGrade),
mean(PaperTestClassroom$MathGrade),
mean(PaperTestHome$MathGrade))
typelocationnameslong<-c("Electronic Test/Classroom","Electronic Test/Home",
"Paper Test/Classroom","Paper Test/Home")
typelocationnames<-c("Electronic Test","Electronic Test","Paper Test","Paper Test")
```
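The same cell means can be computed in one step with `tapply()`, which applies a function to a variable split by one or more grouping variables. This is an alternative to the hand-written `mean()` calls above, not a replacement used later in the section. The chunk rebuilds the math grades and group labels so it can run alone. The result, `MeanTable` (our own name), is a 2 x 2 matrix with formats as rows and locations as columns.

```{r echo = T}
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)
TestLocation <- c(rep("Classroom", 50), rep("Home", 50))
TestFormat <- c(rep("Paper", 25), rep("Electronic", 25),
                rep("Paper", 25), rep("Electronic", 25))

# Rows: test format; columns: test location (both sorted alphabetically)
MeanTable <- tapply(MathGrade, list(TestFormat, TestLocation), mean)
MeanTable
# Flatten into the order used above: Electronic/Classroom, Electronic/Home,
# Paper/Classroom, Paper/Home
as.vector(t(MeanTable))
```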
The following line of code creates a basic bar graph using the ```barplot()``` function.
```{r echo = T}
barplot(mathmeanstypelocation,names.arg = typelocationnameslong)
```
By default, the bars are filled with grey. We can change this with the `col=` argument.
```{r echo=T}
barplot(mathmeanstypelocation,names.arg = typelocationnameslong,
col = c("Blue","Red","Blue","Red"))
```
We should also add some labels to make it more clear what our graph is displaying.
```{r echo = T}
barplot(mathmeanstypelocation,names.arg = typelocationnameslong,
col = c("Blue","Red","Blue","Red"),main ="Math Grade Means",xlab ="Condition",
ylab ="Mean Grade")
```
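Sometimes a legend also helps. In the next graph, the legend identifies the test location by color, which lets us shorten the bar labels to just the test format. (The `legend=` argument used here is partially matched to `barplot()`'s full argument name, `legend.text=`.)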
```{r echo = T}
barplot(mathmeanstypelocation,names.arg = typelocationnames,
col = c("Blue","Red","Blue","Red"),main ="Math Grade Means",xlab ="Condition",
ylab ="Mean Grade",legend=c("Classroom","Home"))
```
Unfortunately, our new legend covers part of the bars. This can be fixed by resizing the display window or by raising the upper y-axis limit with `ylim=`.
```{r echo = T}
barplot(mathmeanstypelocation,names.arg = typelocationnames,ylim=c(0,100),
col = c("Blue","Red","Blue","Red"),main ="Math Grade Means",xlab ="Condition",
ylab ="Mean Grade",legend=c("Classroom","Home"))
```
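Another option is to give `barplot()` a matrix instead of a vector. With `beside = T`, each column becomes a group of side-by-side bars, and `legend.text = T` builds the legend from the matrix's row names. The sketch below arranges test location as rows and test format as columns. That layout is our choice, and `MeanMatrix` and `Mids` are illustrative names filled from the simulated math grades.

```{r echo = T}
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)
TestLocation <- c(rep("Classroom", 50), rep("Home", 50))
TestFormat <- c(rep("Paper", 25), rep("Electronic", 25),
                rep("Paper", 25), rep("Electronic", 25))

# Rows = location (one bar each within a group), columns = format (the groups)
MeanMatrix <- tapply(MathGrade, list(TestLocation, TestFormat), mean)
# With beside = T, barplot() returns a matrix of bar midpoints
Mids <- barplot(MeanMatrix, beside = T, col = c("Blue", "Red"),
                ylim = c(0, 100), main = "Math Grade Means",
                xlab = "Test Format", ylab = "Mean Grade", legend.text = T)
```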
In some situations, you may want to change the orientation of the axis labels with the `las=` argument. `las=1` makes all labels horizontal, while `las=2` makes them perpendicular to their axis, which turns the bar labels vertical.
```{r echo = T}
barplot(mathmeanstypelocation,names.arg = typelocationnames,ylim=c(0,100),las=2,
col = c("Blue","Red","Blue","Red"),main ="Math Grade Means",xlab ="Condition",
ylab ="Mean Grade",legend=c("Classroom","Home"))
```
Finally, we can change the spacing of the bars with `space=`. Each number sets the gap left before (to the left of) the corresponding bar, measured as a fraction of the average bar width. With the default x-axis range, the first value has no visible effect because there is no bar to the left of the first bar. So `space=c(0,0,.1,0)` keeps the two bars within each test format touching and leaves a small gap between the Electronic and Paper pairs. A value of 1 already leaves a gap as wide as a full bar (the default is 0.2), so start with values below 1 and adjust them by about 0.1 at a time.
```{r echo = T}
barplot(mathmeanstypelocation,names.arg = typelocationnames,ylim=c(0,100),las=1,
col = c("Blue","Red","Blue","Red"),main ="Math Grade Means",xlab ="Condition",
ylab ="Mean Grade",legend=c("Classroom","Home"),space=c(0,0,.1,0))
```
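`barplot()` also returns the x positions of the bar midpoints. These make the spacing rule concrete and let us place labels on the bars with `text()`. With the default bar width of 1, each bar's right edge is the running total of the gaps and bar widths so far. So `space = c(0, 0, .1, 0)` should put the midpoints at 0.5, 1.5, 2.6, and 3.6. The sketch below recomputes the cell means with `tapply()`; `CellMeans` and `Mids` are our own names.

```{r echo = T}
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)
TestLocation <- c(rep("Classroom", 50), rep("Home", 50))
TestFormat <- c(rep("Paper", 25), rep("Electronic", 25),
                rep("Paper", 25), rep("Electronic", 25))

# Group labels sort alphabetically into the same order used above
CellMeans <- tapply(MathGrade, paste(TestFormat, TestLocation, sep = "/"), mean)
# barplot() invisibly returns the bar midpoints (a one-column matrix)
Mids <- barplot(CellMeans, space = c(0, 0, .1, 0), ylim = c(0, 100),
                col = c("Blue", "Red", "Blue", "Red"), cex.names = 0.7)
# Label each bar with its mean, placed just above the bar top (pos = 3)
text(x = as.vector(Mids), y = as.vector(CellMeans),
     labels = round(as.vector(CellMeans), 1), pos = 3)
```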
## Histogram
Next we will work with the `hist()` function and its arguments to create histograms.
Here is a basic histogram of the math grades.
```{r echo =T}
hist(Data$MathGrade)
```
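`hist()` chooses its bins automatically (Sturges' rule by default), but it also takes a `breaks=` argument. The sketch below sets 10-point bins explicitly. It computes the outer edges from the data because simulated scores are not guaranteed to stay within 0 to 100, and `hist()` stops with an error if any value falls outside the breaks. With `plot = F`, `hist()` returns the bin counts without drawing. `Lower`, `Upper`, and `Bins` are our own names.

```{r echo = T}
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)

# Round the data range out to multiples of 10 so every score falls in a bin
Lower <- 10 * floor(min(MathGrade) / 10)
Upper <- 10 * ceiling(max(MathGrade) / 10)
Bins <- hist(MathGrade, breaks = seq(Lower, Upper, by = 10), plot = F)
Bins$counts  # number of students in each 10-point bin
hist(MathGrade, breaks = seq(Lower, Upper, by = 10), col = "Blue",
     main = "Distribution of Math Grades", xlab = "Grades", ylab = "Count")
```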
Just as we did with the bar graphs, we can fill in the bars with `col=`. We can also change the color of the border around the bars with `border=`.
```{r echo = T}
hist(Data$MathGrade,col = "Blue",border = "orange")
```
As with the other plots, we can also add a title and axis labels.
```{r echo = T}
hist(Data$MathGrade,main = "Distribution of Math Grades",xlab = "Grades",
ylab = "Count")
```
We can also set limits for both axes.
```{r echo =T}
hist(Data$MathGrade,main = "Distribution of Math Grades",xlab = "Grades",
ylab = "Count",xlim = c(0,100),ylim = c(0,25))
```
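For any of the plots we create, including the bar graphs and scatter plots, we can add lines and text with the `abline()` and `text()` functions, respectively. In the example below, `h=` draws a horizontal line, `v=` draws a vertical line, and `a=` and `b=` draw a line with intercept `a` and slope `b`. `text()` then centers a label at the given `x` and `y` coordinates.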
```{r echo = T}
hist(Data$MathGrade,main = "Distribution of Math Grades",xlab = "Grades",
ylab = "Count",xlim = c(0,100),ylim = c(0,25))
abline(h = 22,col="Red")
abline(v = 30,col="Blue")
abline(a = 0,b = 1,col="Black")
text(x=28,y=15,labels = "Sometimes the lines and text are pointless")
```
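A line is more useful when it encodes something about the data. `abline()` also accepts a fitted linear model and draws the least-squares line from its intercept and slope. Below is a minimal sketch on the math vs. reading scatter plot, with a dashed line at the mean math grade and a text label. Because the two sets of scores were simulated independently, any slope here reflects sampling noise rather than a real relationship. `Fit` is our own name.

```{r echo = T}
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)
set.seed(1000)
ReadingGrade <- rnorm(n = 100, mean = 65, sd = 13)

plot(MathGrade, ReadingGrade, xlab = "Math Grade", ylab = "Reading Grade",
     xlim = c(0, 100), ylim = c(0, 100))
Fit <- lm(ReadingGrade ~ MathGrade)  # least-squares fit of reading on math
abline(Fit, col = "Red")             # abline() reads the intercept and slope from Fit
abline(v = mean(MathGrade), col = "Blue", lty = 2)  # dashed line at the mean
# pos = 4 places the label to the right of the given point
text(x = 10, y = 95, labels = paste("slope =", round(coef(Fit)[2], 2)), pos = 4)
```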
We can also change the color and size of the text with the `col=` and `cex=` arguments. `cex=` scales the text relative to its default size.
```{r echo = T}
hist(Data$MathGrade,main = "Distribution of Math Grades",xlab = "Grades",
ylab = "Count",xlim = c(0,100),ylim = c(0,25))
abline(h = 22,col="Red")
abline(v = 30,col="Blue")
abline(a = 0,b = 1,col="Black")
text(x=28,y=15,labels = "Sometimes the lines and text are pointless",
col = "Blue",cex = 2)
```
Once lines and text have been added to a plot, they cannot be removed, because base R graphics draw directly onto the device. Luckily, simply rerunning the code that created the original plot produces a new, clean version.
## Box Plot
The final data visualization we will cover in this section is the box plot. We can make box plots for the math grades and reading grades using the `boxplot()` function.
```{r echo = T}
boxplot(Data$MathGrade)
boxplot(Data$ReadingGrade)
```
By passing both variables to `boxplot()`, we can plot both grade distributions side by side.
```{r echo=T}
boxplot(Data$MathGrade,Data$ReadingGrade)
```
We can also give the plot a title using `main=`.
```{r echo = T}
boxplot(Data$MathGrade,Data$ReadingGrade,main="Box Plots")
```
Each box can also be given a label with `names=`.
```{r echo = T}
boxplot(Data$MathGrade,Data$ReadingGrade,main="Box Plots",
names = c("Math Grade","Reading Grade"))
```
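`boxplot()` also accepts a formula, `y ~ group`. This draws one box per group and labels the boxes automatically. The sketch below compares math grades by test format. It builds its own small data frame, `FormatData` (our own name), so it runs alone without overwriting `Data`.

```{r echo = T}
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)
TestFormat <- c(rep("Paper", 25), rep("Electronic", 25),
                rep("Paper", 25), rep("Electronic", 25))
FormatData <- data.frame(MathGrade, TestFormat)

# One box per level of TestFormat; the return value holds the plotted statistics
FormatBoxes <- boxplot(MathGrade ~ TestFormat, data = FormatData,
                       main = "Math Grade by Test Format", ylab = "Math Grade")
FormatBoxes$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```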
You can also add notches around the medians with `notch = T`. The notches are a rough guide to whether two medians differ. If the notches of two boxes do not overlap, that is strong evidence that the medians are different.
```{r echo = T}
boxplot(Data$MathGrade,Data$ReadingGrade,main="Box Plots",
names = c("Math Grade","Reading Grade"),notch = T)
```
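Each notch extends 1.58 times the hinge spread (roughly the interquartile range), divided by the square root of the sample size, on either side of the median. `boxplot.stats()` reports these limits in its `conf` element, and the sketch below reproduces them by hand from `fivenum()`. `Stats`, `Hinges`, and `NotchByHand` are our own names.

```{r echo = T}
set.seed(100)
MathGrade <- rnorm(n = 100, mean = 70, sd = 10)

Stats <- boxplot.stats(MathGrade)
Hinges <- fivenum(MathGrade)  # min, lower hinge, median, upper hinge, max
NotchByHand <- Hinges[3] + c(-1.58, 1.58) * (Hinges[4] - Hinges[2]) /
  sqrt(length(MathGrade))
Stats$conf   # notch limits used by boxplot(..., notch = T)
NotchByHand
```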
The box plot function can also hide the points it flags as outliers with `outline = F`. This only changes the drawing; the points remain in the data.
```{r echo = T}
boxplot(Data$MathGrade,Data$ReadingGrade,main="Box Plots",
names = c("Math Grade","Reading Grade"),outline = F)
```
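By default, `boxplot()` flags a point as an outlier when it lies more than 1.5 times the hinge spread beyond the box (the `range` argument). To list those points rather than just hide them, use the `out` element of `boxplot.stats()`. The sketch below checks it against the same rule applied by hand. `Hinges`, `Spread`, and `ByHand` are our own names.

```{r echo = T}
set.seed(1000)
ReadingGrade <- rnorm(n = 100, mean = 65, sd = 13)

Hinges <- fivenum(ReadingGrade)
Spread <- Hinges[4] - Hinges[2]
# Flag the same points boxplot() would draw as outliers (coefficient 1.5)
ByHand <- ReadingGrade[ReadingGrade < Hinges[2] - 1.5 * Spread |
                       ReadingGrade > Hinges[4] + 1.5 * Spread]
boxplot.stats(ReadingGrade)$out
ByHand
```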
Just as with the bar graphs and histograms, we can add color to the box plots with `col=`.
```{r echo = T}
boxplot(Data$MathGrade,Data$ReadingGrade,main="Box Plots",
names = c("Math Grade","Reading Grade"),col = c("Red","Blue"))
```
Finally, box plots can be drawn horizontally with `horizontal = T`.
```{r echo = T}
boxplot(Data$MathGrade,Data$ReadingGrade,main="Box Plots",
names = c("Math Grade","Reading Grade"),horizontal = T)
```