Tools and service for differentially private processing of tabular and relational data

DataResponsibly/smartnoise-sdk

License: MIT

SmartNoise SDK: Tools for Differential Privacy on Tabular Data

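The SmartNoise SDK includes two packages:

  • smartnoise-sql: differentially private SQL queries
  • smartnoise-synth: differentially private synthetic data generation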

To get started, see the examples below. Each project contains more detailed examples.

SQL

Install

pip install smartnoise-sql

Query

import snsql
from snsql import Privacy
import pandas as pd

csv_path = 'PUMS.csv'
meta_path = 'PUMS.yaml'

data = pd.read_csv(csv_path)
privacy = Privacy(epsilon=1.0, delta=0.01)
reader = snsql.from_connection(data, privacy=privacy, metadata=meta_path)

result = reader.execute('SELECT sex, AVG(age) AS age FROM PUMS.PUMS GROUP BY sex')

print(result)
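For intuition about what a query like the one above involves, here is a minimal, standalone sketch of a noisy grouped average built from the Laplace mechanism. It is not snsql's implementation: the function names, the clamping bounds (0 to 100 for age), the public list of groups, and the even split of epsilon between a noisy sum and a noisy count are all assumptions made for illustration. The toy rows are made up.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace(0, scale) distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_grouped_mean(rows, key, value, groups, lower, upper, epsilon, rng=None):
    """Noisy per-group mean of `value`, illustrating the Laplace mechanism.

    Illustrative assumptions, not snsql behavior: each person contributes
    one row, `groups` (the list of group keys) is public, and epsilon is
    split evenly between a noisy sum and a noisy count in every group.
    Groups are disjoint, so the per-group releases compose in parallel.
    """
    rng = rng or random.Random()
    eps_half = epsilon / 2.0
    # After clamping to [lower, upper], one row changes a sum by at most this.
    sum_sensitivity = max(abs(lower), abs(upper))
    totals = {g: [0.0, 0] for g in groups}
    for row in rows:
        g = row[key]
        if g in totals:
            totals[g][0] += min(max(row[value], lower), upper)
            totals[g][1] += 1
    result = {}
    for g, (s, c) in totals.items():
        noisy_sum = s + laplace_noise(sum_sensitivity / eps_half, rng)
        noisy_count = c + laplace_noise(1.0 / eps_half, rng)
        # Post-processing: avoid dividing by a tiny or negative count,
        # then clamp the ratio back into the valid range.
        mean = noisy_sum / max(noisy_count, 1.0)
        result[g] = min(max(mean, lower), upper)
    return result


# Toy rows standing in for PUMS records (values are made up).
rows = [{"sex": 1, "age": 30}, {"sex": 1, "age": 40},
        {"sex": 2, "age": 50}, {"sex": 2, "age": 70}]
print(noisy_grouped_mean(rows, "sex", "age", groups=[1, 2],
                         lower=0, upper=100, epsilon=1.0, rng=random.Random(0)))
```

Smaller epsilon means larger noise scales, which is the trade-off the Privacy(epsilon=..., delta=...) argument controls in the real library. This sketch ignores delta entirely because it assumes the group keys are public.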

See the SQL project

Synthesizers

Install

pip install smartnoise-synth

MWEM

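import pandas as pd
import snsynth

pums_csv_path = 'datasets/PUMS.csv'  # assumed location of the PUMS sample file
pums = pd.read_csv(pums_csv_path, index_col=None)
pums = pums.drop(['income'], axis=1)
nf = pums.to_numpy().astype(int)  # cast all columns to integers for MWEM

# split_factor sets how many columns share each histogram; here, all of them
synth = snsynth.MWEMSynthesizer(epsilon=1.0, split_factor=nf.shape[1])
synth.fit(nf)

sample = synth.sample(10)  # get 10 synthetic rows
print(sample)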
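MWEM (multiplicative weights combined with the exponential mechanism) keeps a synthetic distribution over the data domain and repeats three steps: pick a query the synthetic data answers poorly using the exponential mechanism, measure that query with Laplace noise, and reweight the synthetic distribution multiplicatively toward the measurement. The toy sketch below runs that loop on a one-dimensional histogram with range-count queries. It is not snsynth's MWEMSynthesizer; the function names, the domain, the query set, the iteration count, and the even budget split across rounds are assumptions for illustration, and the data is made up.

```python
import math
import random


def laplace(scale, rng):
    # Laplace(0, scale) as the difference of two exponential draws with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mwem_histogram(data, domain_size, queries, epsilon, iterations, rng=None):
    """Toy MWEM over the 1-D domain {0, ..., domain_size - 1}.

    Each query is a half-open range (lo, hi) counting records with
    lo <= x < hi, so each has sensitivity 1. The record count n is
    treated as public.
    """
    rng = rng or random.Random()
    n = len(data)
    true_hist = [0] * domain_size
    for x in data:
        true_hist[x] += 1

    def answer(hist, q):
        lo, hi = q
        return sum(hist[lo:hi])

    synth = [n / domain_size] * domain_size  # start from uniform
    eps_t = epsilon / iterations  # even split across rounds
    measurements = []
    for _ in range(iterations):
        # 1. Exponential mechanism (budget eps_t / 2): favor queries the
        #    synthetic histogram currently answers badly.
        errors = [abs(answer(true_hist, q) - answer(synth, q)) for q in queries]
        top = max(errors)  # subtracted only for numerical stability
        weights = [math.exp(eps_t / 2 * (e - top) / 2) for e in errors]
        q = queries[rng.choices(range(len(queries)), weights=weights)[0]]
        # 2. Laplace mechanism (budget eps_t / 2) measures the chosen query.
        measurements.append((q, answer(true_hist, q) + laplace(2 / eps_t, rng)))
        # 3. Multiplicative weights: replay all measurements (post-processing).
        for (lo, hi), m in measurements:
            factor = math.exp((m - answer(synth, (lo, hi))) / (2 * n))
            for x in range(lo, hi):
                synth[x] *= factor
            total = sum(synth)
            synth = [v * n / total for v in synth]
    return synth


# Toy data: 500 integer values in {0, ..., 7} (made up for illustration).
rng = random.Random(0)
data = [rng.choice([1, 2, 2, 3, 7]) for _ in range(500)]
queries = [(lo, hi) for lo in range(8) for hi in range(lo + 1, 9)]
hist = mwem_histogram(data, 8, queries, epsilon=1.0, iterations=10, rng=rng)
sample = rng.choices(range(8), weights=hist, k=10)  # 10 synthetic values
print(sample)
```

Sampling from the final histogram touches only privatized state, so it consumes no additional budget; this is why a fitted synthesizer can be sampled repeatedly.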

PATE-CTGAN

import pandas as pd
from snsynth.pytorch.nn import PATECTGAN
from snsynth.pytorch import PytorchDPSynthesizer

pums_csv_path = 'datasets/PUMS.csv'  # assumed location of the PUMS sample file
pums = pd.read_csv(pums_csv_path, index_col=None)
pums = pums.drop(['income'], axis=1)

# epsilon=1.0, the PATE-CTGAN model, and no preprocessor
synth = PytorchDPSynthesizer(1.0, PATECTGAN(regularization='dragan'), None)
# treat every remaining column as categorical
synth.fit(pums, categorical_columns=pums.columns.values.tolist())

sample = synth.sample(10)  # synthesize 10 rows
print(sample)
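PATE-CTGAN adapts the PATE-GAN approach to the CTGAN architecture. Broadly, several teacher discriminators are each trained on a disjoint partition of the real data, they vote on generated samples, and a student discriminator learns only from a noisy aggregate of those votes. The sketch below shows that aggregation step as a Laplace noisy-max over teacher votes. It illustrates the general PATE idea rather than code from snsynth; the function names, the label encoding, and the teacher count are hypothetical.

```python
import random


def laplace(scale, rng):
    # Laplace(0, scale) as the difference of two exponential draws with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_vote(teacher_votes, num_classes, noise_scale, rng=None):
    """Noisy-max aggregation of teacher votes (PATE-style).

    teacher_votes: one predicted label per teacher, each in range(num_classes).
    noise_scale: Laplace scale added to every vote count; a larger scale
    gives a stronger privacy guarantee per query but a noisier label.
    """
    rng = rng or random.Random()
    counts = [0] * num_classes
    for label in teacher_votes:
        counts[label] += 1
    noisy_counts = [c + laplace(noise_scale, rng) for c in counts]
    return max(range(num_classes), key=lambda j: noisy_counts[j])


# Hypothetical labels for one generated sample: 1 = "looks real", 0 = "looks fake".
votes = [1] * 8 + [0] * 2  # 10 teachers, 8 say "real"
print(noisy_vote(votes, num_classes=2, noise_scale=1.0, rng=random.Random(0)))
```

Because the teachers see disjoint data, one record can change at most one teacher's vote, which is what keeps the sensitivity of the vote counts small.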

See the Synthesizers project

Communication

Releases and Contributing

Please let us know if you encounter a bug by creating an issue.

We appreciate all contributions. Please review the contributors guide. We welcome bug-fix pull requests without prior discussion.

If you plan to contribute new features, utility functions or extensions to this system, please first open an issue and discuss the feature with us.
