# Exploratory data analysis
## Prepare folder and data
### Set the working directory
This can be done in two ways:
1. Using code
2. Using point and click
To use point and click in RStudio, click the down arrow next to *More* in the 'Files' pane, then click 'Set As Working Directory'.
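The code route can be sketched with base R's `getwd()` and `setwd()`. This is a minimal sketch: `tempdir()` is only a stand-in for your own project folder, and the name `old_wd` is made up for illustration.

```{r}
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# tempdir() is only a placeholder; replace it with the path to your project folder
setwd(tempdir())
getwd()   # confirm that the working directory has changed

# Return to the original working directory
setwd(old_wd)
```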
### List the files inside the working directory
All files in the working directory are displayed when you click the 'Files' tab.
Alternatively, you can use this code:
```{r}
list.files()
```
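If the folder holds many files, it may help to list only the SPSS files. The sketch below uses the `pattern` argument of `list.files()`; the object name `sav_files` is an assumption made for illustration.

```{r}
# List only the files whose names end in .sav (SPSS data files)
sav_files <- list.files(pattern = "\\.sav$")
sav_files

# Check whether the file used in this chapter is present
"qol.sav" %in% sav_files
```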
### Reading dataset from SPSS file (.sav)
SPSS datasets have the file extension .sav. To read SPSS data into R, we use the 'foreign' package.
Create an object to hold the SPSS data that we read into R.
```{r}
library(foreign)
dataSPSS<-read.spss('qol.sav', to.data.frame = TRUE)
```
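A slightly more defensive version of the same step is sketched below. It assumes the file sits in the working directory and spells out `use.value.labels = TRUE`, which is already the default in `read.spss()`, to show that labelled SPSS variables are converted to factors.

```{r}
library(foreign)

# Read the file only if it is found in the working directory
if (file.exists("qol.sav")) {
  dataSPSS <- read.spss("qol.sav",
                        to.data.frame = TRUE,     # return a data frame instead of a list
                        use.value.labels = TRUE)  # default: labelled variables become factors
  dim(dataSPSS)  # number of rows and columns
} else {
  message("qol.sav was not found in ", getwd())
}
```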
## Describing data
Let us examine the structure of the data:
```{r}
str(dataSPSS)
```
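Besides `str()`, a few other base R functions give a quick look at a data frame. The sketch below uses a small made-up data frame called `toy`, whose column names mirror those used in this chapter. The values are invented for illustration and are not taken from `qol.sav`.

```{r}
# Made-up data for illustration only
toy <- data.frame(
  sex      = factor(c("male", "female", "female", "male", "female", "male")),
  age      = c(45, 52, 61, 38, 57, 49),
  tahundx  = c(2, 5, 10, 1, 8, 4),
  complica = factor(c("no", "yes", "yes", "no", "yes", "no"))
)

dim(toy)      # number of rows and columns
names(toy)    # variable names
head(toy, 3)  # first three rows
```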
Now, let us summarize our data:
```{r}
summary(dataSPSS)
```
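`summary()` also works on a single variable, and `table()` gives counts for a factor. A minimal sketch follows, using made-up vectors (`age`, `sex`) rather than the real dataset.

```{r}
# Made-up values for illustration only
age <- c(45, 52, 61, 38, 57, 49)
sex <- factor(c("male", "female", "female", "male", "female", "male"))

summary(age)          # minimum, quartiles, mean and maximum
dist.sex <- table(sex)
dist.sex              # counts per category
prop.table(dist.sex)  # proportions per category
```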
## Graphing or Plotting data
Before plotting, ask yourself these questions:
1. Which variable do you want to plot?
2. What is the type of that variable? Factor? Numerical?
3. Are you going to plot it together with another variable?
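The second question can be answered in code. The sketch below checks variable types on a small made-up data frame (`toy`, invented for illustration). The same calls should work on any data frame.

```{r}
# Made-up data for illustration only
toy <- data.frame(
  sex = factor(c("male", "female", "female")),
  age = c(45, 52, 61)
)

class(toy$sex)       # "factor"
is.numeric(toy$age)  # TRUE
sapply(toy, class)   # type of every column at once
```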
### One variable: A categorical or factor variable
We can create a simple bar chart:
```{r}
dist.sex<-table(dataSPSS$sex)
barplot(dist.sex,
main='Sex distribution',
xlab='Sex')
```
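Percentages are sometimes easier to read than raw counts. A minimal sketch follows, using a made-up `sex` vector, that converts the table to percentages with `prop.table()` before plotting.

```{r}
# Made-up values for illustration only
sex <- factor(c("male", "female", "female", "male", "female", "male", "female"))

# Convert counts to proportions, then to percentages
prop.sex <- prop.table(table(sex))

barplot(prop.sex * 100,
        main = 'Sex distribution',
        xlab = 'Sex',
        ylab = 'Percent',
        ylim = c(0, 100))
```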
### One variable: A numerical variable
We can create a histogram:
```{r}
hist(dataSPSS$age, main = 'Age',
xlab='Age in years',
ylab='Count')
```
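The look of a histogram depends on the number of bins. The sketch below uses made-up ages. It suggests a number of bins through `breaks`, plots densities instead of counts with `freq = FALSE`, and overlays a kernel density curve.

```{r}
# Made-up values for illustration only
age <- c(38, 42, 45, 47, 49, 52, 55, 57, 61, 66)

# breaks is only a suggestion; R may pick a nearby "pretty" number of bins
hist(age, breaks = 5, freq = FALSE,
     main = 'Age',
     xlab = 'Age in years')

# Overlay a smoothed density estimate
lines(density(age), col = 'blue')
```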
### Two variables: A numerical with another numerical variable
We will use a *scatterplot* to plot two numerical variables:
```{r}
plot(dataSPSS$tahundx, dataSPSS$age,
main = 'Duration having DM VS age',
xlab = 'Duration of DM', ylab = 'Age',
pch = 19)
```
Let us add a fitted regression line:
```{r}
plot(dataSPSS$tahundx, dataSPSS$age,
main = 'Duration having DM VS age',
xlab = 'Duration of DM', ylab = 'Age',
pch = 19)
abline(lm(age ~ tahundx, data = dataSPSS), col = 'red')
```
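The red line drawn by `abline()` is defined by the intercept and slope of the linear model. The sketch below uses made-up data that lie exactly on the line age = 40 + 2 × duration, so the fitted coefficients are known in advance. The names `duration` and `fit` are assumptions made for illustration.

```{r}
# Made-up data that follow an exact straight line
duration <- c(1, 2, 4, 5, 8, 10)
age <- 40 + 2 * duration

# Fit the linear model and inspect the intercept and slope
fit <- lm(age ~ duration)
coef(fit)  # intercept 40, slope 2

# abline() reads the intercept and slope from the fitted model
plot(duration, age, pch = 19)
abline(fit, col = 'red')
```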
And a lowess smoother:
```{r}
plot(dataSPSS$tahundx, dataSPSS$age,
main = 'Duration having DM VS age',
xlab = 'Duration of DM', ylab = 'Age',
pch = 19)
lines(lowess(dataSPSS$tahundx,dataSPSS$age), col = 'blue')
```
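How closely the lowess curve follows the points is controlled by the smoother span `f`, which defaults to 2/3. A minimal sketch on simulated data (not the real dataset) compares the default with a smaller span.

```{r}
# Simulated data for illustration only
set.seed(1)
duration <- 1:30
age <- 40 + 10 * sqrt(duration) + rnorm(30, sd = 3)

plot(duration, age, pch = 19)

# Default span (f = 2/3) gives a smoother curve
lines(lowess(duration, age), col = 'blue')

# A smaller span follows the data more closely
lines(lowess(duration, age, f = 0.3), col = 'darkgreen')

legend('bottomright',
       legend = c('f = 2/3 (default)', 'f = 0.3'),
       col = c('blue', 'darkgreen'), lty = 1)
```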
### Two variables: A categorical variable with a categorical variable
Now, we will plot two categorical variables simultaneously.
First, we will use a stacked bar chart:
```{r}
compl.sex<-table(dataSPSS$complica,dataSPSS$sex)
compl.sex
barplot(compl.sex,
main='Complications by sex',
xlab='Sex',
col=c('blue','red'),
legend=c('No','Yes'))
```
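Typing the legend labels by hand risks mismatching the order of the factor levels. The sketch below uses made-up `complica` and `sex` vectors and takes the labels from the row names of the table instead. `legend.text` is the full name of the `barplot()` argument.

```{r}
# Made-up values for illustration only
complica <- factor(c("no", "yes", "yes", "no", "yes", "no", "no", "yes"))
sex <- factor(c("male", "female", "female", "male", "female", "male", "female", "male"))

compl.sex <- table(complica, sex)
compl.sex

# Row names of the table are the levels of complica, in plotting order
barplot(compl.sex,
        main = 'Complications by sex',
        xlab = 'Sex',
        col = c('blue', 'red'),
        legend.text = rownames(compl.sex))
```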
Next, we will use a grouped bar chart:
```{r}
compl.sex
barplot(compl.sex,
main = 'Complications according to sex',
xlab = 'Sex',
col = c('blue','red'),
legend = c('No','Yes'),
beside = TRUE)
```
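When the groups differ in size, proportions within each sex can be easier to compare than counts. A minimal sketch follows, again on made-up data. It uses `prop.table()` with `margin = 2` so that each column (sex) sums to 1.

```{r}
# Made-up values for illustration only
complica <- factor(c("no", "yes", "yes", "no", "yes", "no", "no", "yes"))
sex <- factor(c("male", "female", "female", "male", "female", "male", "female", "male"))

compl.sex <- table(complica, sex)

# Proportions within each sex (each column sums to 1)
prop.compl.sex <- prop.table(compl.sex, margin = 2)
prop.compl.sex

barplot(prop.compl.sex,
        main = 'Proportion with complications according to sex',
        xlab = 'Sex',
        ylab = 'Proportion',
        col = c('blue', 'red'),
        legend.text = rownames(prop.compl.sex),
        beside = TRUE)
```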