-
Notifications
You must be signed in to change notification settings - Fork 3
/
anony_implementation.Rmd
133 lines (91 loc) · 2.52 KB
/
anony_implementation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
---
title: "anony_implementation"
author: "Yong Cha"
output:
html_document:
toc: yes
toc_depth: 2
number_sections: true
---
# Package Loading
```{r package, message=FALSE}
require(devtools)
install_github('entroPD', 'yongcha')
require(entroPD)
```
# Implementation
## Data Import
```{r load, echo=FALSE}
load('./data/data_adult.Rdata')
```
```{r data}
str(adult)
```
## Partition Data
### Define Attributes
* 데이터를 파티션하기 위헤 변수의 속성을 정의해줌
* __Attributes (or Identifier)__ : (확실한) 식별자
* SSN, Address, Name 등
* __Target Attributes (TA)__ : 분석할 때 반응변수에 해당되는 목표 변수
* __Quasi-Identifiers (QI)__ : 불분명한 식별자
* zip code, birthdate, gender 등
* __Sensitive Attirbutes (SA)__ : 개인과 관련된 민감한 정보
* 질병질환 정보, 연봉 등
* __Insensitive Attributes (IS)__ : 민감하지 않은 기타 정보
```{r part}
# TA (Target)
TA <- c('income')
# QI (Quasi-Identifiers)
QI <- c('age', 'workclass', 'education', 'marital.status', 'occupation', 'race', 'sex', 'native.country')
#QI <- c('age', 'workclass', 'education', 'marital.status')
# SA (Sensitive Attributes)
SA <- c('capital.gain', 'capital.loss')
# IS (Insensitive)
IS <- c('relationship', 'hours.per.week', 'education.num')
# Remove
RM <- c('fnlwgt')
```
### `parda` 함수 적용
```{r parda}
adult.QI <- parda(adult, QI, TA)
str(adult.QI)
```
## `cpselec` 함수 적용
```{r cpselec}
cp.val <- cpselec(10, 100, 2)
cp.val
```
## `rpaclass` 함수 적용
### `mclapply` 적용
```{r rpamc, eval=FALSE}
cp.res.mc <- rpaclass(adult.QI, cp.val, matrix(c(0,1,2,0), ncol=2), cval = 100, mc = TRUE)
```
### `lapply` 적용
```{r rpal}
cp.res.la <- rpaclass(adult.QI, cp.val, matrix(c(0,1,2,0), ncol=2), cval = 100)
cp.res.la[[1]]
```
## `lanode` 함수 적용
```{r lanode}
node.res <- lanode(cp.res.la)
node.res[[30]]
```
## `qigrp` 함수 적용
```{r qigrp, results='asis'}
qigrp.res <- qigrp(adult.QI, node.res, mcs=8)
knitr::kable(head(qigrp.res[[30]], 20))
```
## `eclass` 함수 적용
```{r eclass, results='asis'}
equiclass <- eclass(qigrp.res, QI, TA)
knitr::kable(head(equiclass[[30]], 20))
```
## `resume` 함수 적용
```{r resume, results='asis'}
measure.res <- resume(equiclass, kanon = 5, maxsup = 0.01)
knitr::kable(head(measure.res, 20))
```
## `plot.resume` 함수 적용
```{r plot, message=FALSE, fig.align='center', fig.height=7, fig.width=11, fig.cap='Figure 1'}
plot(measure.res)
```