This repository has been archived by the owner on Nov 30, 2023. It is now read-only.

Final low and high values of the partitions #5

Open
prajwal1210 opened this issue Nov 19, 2019 · 15 comments

Comments

@prajwal1210

I notice that in the Mondrian code, we only update the parent low and high values along a dimension when it is chosen as an allowed dimension. I have a few concerns about that:

  1. The dimension choice depends on the low and high values, so won't we be using stale, incorrect values to make the choice?
  2. Once a dimension cannot be split anymore, we no longer update its low and high values. However, a split in some other allowable dimension may change the range of this dimension as well.
@qiyuangong
Owner

> I notice that in the Mondrian code, we only update the parent low and high values along a dimension when it is chosen as an allowed dimension. I have a few concerns about that:
>
>   1. The dimension choice depends on the low and high values, so won't we be using stale, incorrect values to make the choice?
>   2. Once a dimension cannot be split anymore, we no longer update its low and high values. However, a split in some other allowable dimension may change the range of this dimension as well.

Hi @prajwal1210

Sorry for the late reply. :)

Answers to your concerns:

  1. The basic guideline of generalization is to replace real values with range values, so that the results stay correct, only less precise. This technique is not perfect. It doesn't work for all cases.
  2. Correct. Splitting on one dimension may change the range of another dimension, but that won't hurt data anonymization.

Have a nice day!
Qiyuan
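
The second point can be illustrated with a minimal sketch. This is not the repository's code: the helper names (`ranges`, `median_split`, `covers`) and the toy records are assumptions made up for illustration. It shows that a range that was not refreshed after a split is wider than necessary, yet still contains every real value in the partition.

```python
def ranges(records):
    """Return the tight (low, high) range of every dimension over records."""
    return [(min(col), max(col)) for col in zip(*records)]


def median_split(records, dim):
    """Split records into two halves around the median of one dimension."""
    ordered = sorted(records, key=lambda r: r[dim])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def covers(outer, inner):
    """True if every outer range contains the matching inner range."""
    return all(ol <= il and ih <= oh
               for (ol, oh), (il, ih) in zip(outer, inner))


# Hypothetical toy records: (age, hours_per_week)
records = [(25, 20), (30, 25), (45, 40), (50, 60)]
parent = ranges(records)                    # [(25, 50), (20, 60)]
left, right = median_split(records, dim=0)  # split on age only

tight = ranges(left)            # every dimension recomputed after the split
stale = [tight[0], parent[1]]   # only the split dimension refreshed

print(tight)                 # [(25, 30), (20, 25)]
print(stale)                 # [(25, 30), (20, 60)]
print(covers(stale, tight))  # True
```

The stale range for hours, (20, 60), still holds every record in the left partition, so the generalized output is valid. The cost is extra information loss compared with the tight range (20, 25).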

@3ndri

3ndri commented Dec 24, 2020

Hello,
I just wanted to ask what data exactly gets anonymized. I am running the code following the instructions, and I can't quite understand what goes into anonymized.data. I am sorry if this sounds like a "stupid" question, but I am new to this.

Thank you!

@qiyuangong
Owner

qiyuangong commented Dec 26, 2020

> Hello,
> I just wanted to ask what data exactly gets anonymized. I am running the code following the instructions, and I can't quite understand what goes into anonymized.data. I am sorry if this sounds like a "stupid" question, but I am new to this.
>
> Thank you!

Hi @3ndri. There are no stupid questions, only stupid answers.

In short, identifiers (such as phone numbers) should be removed, while QIDs (quasi-identifiers, such as age and gender) are anonymized by k-anonymity algorithms (e.g., Mondrian or others). All other attributes, including sensitive values, remain untouched.

Hope this information can help you. :)
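
As a rough sketch of those three roles (not the repository's code), the snippet below handles one group of records that a k-anonymity algorithm has already put together. The function name `anonymize_group`, the column names, the toy rows, and the `low~high` range notation are all assumptions for illustration, and only numeric QIDs are handled.

```python
def anonymize_group(rows, identifiers, qids):
    """Drop identifiers, generalize numeric QIDs to the group's range,
    and copy every other attribute unchanged."""
    # Range of each QID over the whole group
    spans = {q: (min(r[q] for r in rows), max(r[q] for r in rows))
             for q in qids}
    result = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key in identifiers:
                continue  # identifiers are removed entirely
            if key in qids:
                low, high = spans[key]
                # A single value needs no range
                new_row[key] = str(low) if low == high else f"{low}~{high}"
            else:
                new_row[key] = value  # sensitive and other attributes
        result.append(new_row)
    return result


# Hypothetical group of two records
group = [
    {"phone": "555-0100", "age": 25, "disease": "flu"},
    {"phone": "555-0101", "age": 31, "disease": "asthma"},
]
for row in anonymize_group(group, identifiers={"phone"}, qids={"age"}):
    print(row)
# {'age': '25~31', 'disease': 'flu'}
# {'age': '25~31', 'disease': 'asthma'}
```

Both records now share the same QID value, so they cannot be told apart by age, while the sensitive attribute is left as it was.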

@3ndri

3ndri commented Dec 26, 2020

But which column is the phone number in adult.data?

@3ndri

3ndri commented Dec 26, 2020

Also, the output is the same whether I run it with k=10 or k=20.
[Screenshot from 2020-12-26 21-30-07]

@qiyuangong
Owner

> But which column is the phone number in adult.data?

IDs (phone numbers, personal IDs, and so on) were already removed before the dataset was made available.

@qiyuangong
Owner

> Also, the output is the same whether I run it with k=10 or k=20.

No. They differ in NCP, which measures information loss (higher NCP means more loss). Please read README.md and check the output directory.
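
To make the NCP point concrete, here is a minimal sketch for numeric attributes only. It is not the repository's implementation: the function name `ncp`, the toy ages, and the choice to average over attributes and records are assumptions. For each group, the penalty of an attribute is the width of the group's range divided by the width of the attribute's full domain.

```python
def ncp(groups, domain):
    """Average Normalized Certainty Penalty over all records.

    groups: list of groups, each a list of numeric tuples
    domain: list of (low, high) for each attribute over the whole dataset
    """
    total = 0.0
    count = 0
    for group in groups:
        columns = list(zip(*group))
        # Range width of each attribute in the group, relative to its domain
        penalty = sum((max(col) - min(col)) / (high - low)
                      for col, (low, high) in zip(columns, domain))
        # Every record in the group carries the same penalty
        total += penalty / len(domain) * len(group)
        count += len(group)
    return total / count


# Hypothetical one-attribute dataset: ages 20, 30, 40, 50
domain = [(20, 50)]
groups_k2 = [[(20,), (30,)], [(40,), (50,)]]   # two groups of size 2
groups_k4 = [[(20,), (30,), (40,), (50,)]]     # one group of size 4

print(ncp(groups_k2, domain))  # about 0.333
print(ncp(groups_k4, domain))  # 1.0
```

With the larger k, records are merged into wider ranges, so the NCP rises even if the printed summary of the run looks similar.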

@3ndri

3ndri commented Dec 27, 2020

But what does the output over K=10 mean? The one which reads:

[[], ['State-gov', 'Self-emp-not-inc', 'Private', 'Federal-gov', 'Local-gov', 'Self-emp-inc', 'Without-pay'], [], ['Never-married', 'Married-civ-spouse', 'Divorced', 'Married-spouse-absent', 'Separated', 'Married-AF-spouse', 'Widowed'], ['Adm-clerical', 'Exec-managerial', 'Handlers-cleaners', 'Prof-specialty', 'Other-service', 'Sales', 'Transport-moving', 'Farming-fishing', 'Machine-op-inspct', 'Tech-support', 'Craft-repair', 'Protective-serv', 'Armed-Forces', 'Priv-house-serv'], ['White', 'Black', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other'], ['Male', 'Female'], ['United-States', 'Cuba', 'Jamaica', 'India', 'Mexico', 'Puerto-Rico', 'Honduras', 'England', 'Canada', 'Germany', 'Iran', 'Philippines', 'Poland', 'Columbia', 'Cambodia', 'Thailand', 'Ecuador', 'Laos', 'Taiwan', 'Haiti', 'Portugal', 'Dominican-Republic', 'El-Salvador', 'France', 'Guatemala', 'Italy', 'China', 'South', 'Japan', 'Yugoslavia', 'Peru', 'Outlying-US(Guam-USVI-etc)', 'Scotland', 'Trinadad&Tobago', 'Greece', 'Nicaragua', 'Vietnam', 'Hong', 'Ireland', 'Hungary', 'Holand-Netherlands']]

@3ndri

3ndri commented Dec 27, 2020

Oh I get it now, those are the quasi-identifiers

@Arigato97

I have a question about which database this program calls

@Arigato97

Could you help me annotate the program? As a novice, I don't understand it.

@qiyuangong
Owner

> Could you help me annotate the program? As a novice, I don't understand it.

Hi @Arigato97

This program uses the adult dataset (https://github.com/qiyuangong/Mondrian/blob/master/data/adult.data) by default, and can be switched to the INFORMS dataset (https://github.com/qiyuangong/Mondrian/blob/master/data/conditions.csv and https://github.com/qiyuangong/Mondrian/blob/master/data/demographics.csv).
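
For a novice who wants to inspect the input, here is a minimal sketch of reading records in the UCI Adult layout (comma-separated values, `?` for a missing value). It is not the repository's loader: the function name `parse_adult` is hypothetical, skipping records with missing values is an assumption, and the two sample lines are only illustrative rows in that layout.

```python
import csv
import io


def parse_adult(lines):
    """Parse Adult-style rows and skip records with missing values."""
    rows = []
    # skipinitialspace drops the blank that follows each comma
    for fields in csv.reader(lines, skipinitialspace=True):
        if not fields or "?" in fields:
            continue  # blank line or record with a missing value
        rows.append(fields)
    return rows


# Illustrative sample; in practice pass an open file such as data/adult.data
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, "
    "Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K\n"
)
rows = parse_adult(sample)
print(len(rows))        # 1
print(rows[0][0])       # 39
print(len(rows[0]))     # 15
```

Each kept row is a list of 15 strings. Which of those columns are treated as quasi-identifiers is decided by the program, not by the data file.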

@Arigato97

Can you add a few more comments to the program? It is a little difficult for me. Please help.

@Arigato97

I can't follow some parts of the program and am not sure what they do. Could you add more comments? Thanks.

@qiyuangong
Owner

> I can't follow some parts of the program and am not sure what they do. Could you add more comments? Thanks.

Sorry, no more comments or features will be added.
