phively/DECAF

DECAF

Data Extraction and Cleaning Automated Framework - in Python!

DECAF provides an extensible framework for performing common data cleaning tasks with modular functions.
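
As an illustration of the "modular functions" idea, here is a minimal sketch in which each cleaning task is a small function registered under a name. The names `register`, `CLEANERS`, and `clean` are hypothetical and are not taken from DECAF's actual code.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a task name to a cleaning function (not DECAF's API).
Cleaner = Callable[[str], str]
CLEANERS: Dict[str, Cleaner] = {}


def register(name: str) -> Callable[[Cleaner], Cleaner]:
    """Decorator that adds a cleaning function to the registry under `name`."""
    def decorator(func: Cleaner) -> Cleaner:
        CLEANERS[name] = func
        return func
    return decorator


@register("trim")
def collapse_whitespace(value: str) -> str:
    # Drop leading/trailing whitespace and collapse internal runs to one space.
    return " ".join(value.split())


@register("title")
def title_case(value: str) -> str:
    return value.title()


def clean(values: List[str], task: str) -> List[str]:
    """Apply one registered cleaning task to every value."""
    func = CLEANERS[task]
    return [func(v) for v in values]


if __name__ == "__main__":
    print(clean(["  alice   smith ", "BOB"], "trim"))  # ['alice smith', 'BOB']
    print(clean(["alice smith"], "title"))  # ['Alice Smith']
```

In this design, adding a task means writing and registering one more function, leaving the existing ones untouched.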

Installation

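# Install the 0.1.1 release from GitHub (recommended)
pip install "https://github.com/phively/DECAF/releases/download/0.1.1/decaf-0.1.1.tar.gz"

# Install from a local copy of the release archive (replace FILEPATH with the folder containing it)
python -m pip install FILEPATH\decaf-0.1.1.tar.gz

# Build a source distribution and wheel into dist/ (run from the repository root; requires: pip install build)
python -m build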

Module structure

  • DECAF: main module, used to specify the data files to clean and the INI config files to use
  • DatafileIO: reads INI config files, and reads and writes data files

Other modules should not have dependencies on each other.
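
The division of labour between the two modules might look roughly like the sketch below. It assumes CSV data and an INI file with a hypothetical `[files]` section holding `input` and `output` paths; the function names and config layout are illustrative only and are not taken from DECAF itself. Only the standard library (`configparser`, `csv`) is used.

```python
import configparser
import csv
import os
import tempfile
from typing import Dict, List

Row = Dict[str, str]

# --- Hypothetical DatafileIO-style helpers (illustrative, not DECAF's API) ---

def read_config(path: str) -> Dict[str, str]:
    """Read the assumed [files] section of an INI config into a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(path)
    return dict(parser["files"])


def read_rows(path: str) -> List[Row]:
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def write_rows(path: str, rows: List[Row]) -> None:
    """Write rows to CSV, using the first row's keys as the header."""
    if not rows:
        raise ValueError("no rows to write")
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# --- Hypothetical main-module flow: the config names the files, the helpers move data ---

def run(config_path: str) -> List[Row]:
    files = read_config(config_path)
    rows = read_rows(files["input"])
    # Example cleaning step: trim surrounding whitespace in every field.
    cleaned = [{key: value.strip() for key, value in row.items()} for row in rows]
    write_rows(files["output"], cleaned)
    return cleaned


def demo() -> List[Row]:
    """Create a throwaway CSV and INI in a temp dir, then run the flow on them."""
    with tempfile.TemporaryDirectory() as tmp:
        data_in = os.path.join(tmp, "in.csv")
        data_out = os.path.join(tmp, "out.csv")
        ini_path = os.path.join(tmp, "clean.ini")
        with open(data_in, "w", newline="", encoding="utf-8") as fh:
            fh.write("name,email\n  Ada ,ada@example.com \n")
        with open(ini_path, "w", encoding="utf-8") as fh:
            fh.write(f"[files]\ninput = {data_in}\noutput = {data_out}\n")
        cleaned = run(ini_path)
        assert read_rows(data_out) == cleaned  # round-trip check of the written file
        return cleaned


if __name__ == "__main__":
    print(demo())  # [{'name': 'Ada', 'email': 'ada@example.com'}]
```

Keeping all file and config access in one place means that, in this sketch, the cleaning steps only ever see in-memory values.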

Planned features

  • String parsing libraries covering common data types: phone numbers, email addresses, addresses...
  • Exact and fuzzy match options
  • Regular expression (RegEx) support, possibly with some kind of assistant
  • Easy creation of compound operations using config files
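
These features are only planned, so the following is a speculative sketch rather than a description of DECAF. Using only the standard library (`re`, `difflib`, `configparser`), it shows what regex-based parsing, exact versus fuzzy matching, and a compound operation defined in a config file might look like. The patterns are deliberately simple (US-style 10-digit phone numbers, a loose email check), fuzzy matching here relies on `difflib`'s similarity ratio (DECAF may choose a different algorithm), and every function name, the `STEPS` table, and the `[pipeline]` section are hypothetical.

```python
import configparser
import difflib
import re
from typing import Callable, Dict, List, Optional

# Simplistic, assumed patterns; real phone/email/address parsing needs far more care.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_phone(value: str) -> Optional[str]:
    """Keep digits only; accept 10 digits (or 11 with a leading 1) as a US-style number."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def normalize_email(value: str) -> Optional[str]:
    """Trim and lower-case an address; return None if it fails the loose check."""
    candidate = value.strip().lower()
    return candidate if EMAIL_RE.match(candidate) else None


def match(value: str, choices: List[str], fuzzy: bool = False, cutoff: float = 0.8) -> Optional[str]:
    """Exact match first; if allowed, fall back to the closest difflib match above `cutoff`."""
    if value in choices:
        return value
    if fuzzy:
        close = difflib.get_close_matches(value, choices, n=1, cutoff=cutoff)
        return close[0] if close else None
    return None


# Steps that a config file could refer to by name (hypothetical).
STEPS: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lower": str.lower,
    "collapse_spaces": lambda s: re.sub(r"\s+", " ", s),
}


def compound_from_config(ini_text: str) -> Callable[[str], str]:
    """Chain the comma-separated `steps` of a [pipeline] section into one operation."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    names = [n.strip() for n in parser["pipeline"]["steps"].split(",")]
    funcs = [STEPS[n] for n in names]  # an unknown step name raises KeyError

    def run(value: str) -> str:
        for func in funcs:
            value = func(value)
        return value

    return run


if __name__ == "__main__":
    print(normalize_phone("(555) 123-4567"))  # 555-123-4567
    print(normalize_email("  Jane.Doe@Example.COM "))  # jane.doe@example.com
    print(match("Chicgo", ["Chicago", "Boston"], fuzzy=True))  # Chicago
    op = compound_from_config("[pipeline]\nsteps = strip, collapse_spaces, lower\n")
    print(op("  Hello    World "))  # hello world
```

Defining compound operations as named steps in a config file, as sketched here, would let new combinations be created without writing new Python code.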

To investigate
