dataframe cleaning package
This package contains helper functions that clean pandas DataFrames quickly, simplifying the data cleaning process.
OS X & Linux:
pip3 install dfcleaner
Windows:
pip install dfcleaner
import pandas as pd
from dfcleaner import cleaner
cleaner.ENABLE_LOGGING = True
cleaner.LOG_DIR = './logs'
df = pd.read_csv('some_filename.csv')
df.columns = cleaner.sanitize(df.columns)
conversion_dict = cleaner.suggest_conversion_dict(df)
df = cleaner.preprocess(df,
                        column_dtype_conversion_dictionary=conversion_dict,
                        std_coeff=1.5,
                        fill_na_method='median',
                        label_col=None)
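The README does not say exactly what `cleaner.sanitize` does to column names. The pandas-only sketch below shows one common interpretation, assuming "sanitize" means lowercasing, trimming whitespace, and collapsing runs of non-alphanumeric characters into single underscores. `sanitize_columns` is a hypothetical helper for illustration, not dfcleaner's implementation.

```python
import re

import pandas as pd


def sanitize_columns(columns):
    """Hypothetical helper: normalize column names to snake_case identifiers.

    Assumption: lowercase, strip whitespace, replace each run of
    non-alphanumeric characters with one underscore, trim edge underscores.
    """
    cleaned = []
    for name in columns:
        name = str(name).strip().lower()
        name = re.sub(r"[^0-9a-z]+", "_", name)  # collapse spaces/symbols
        cleaned.append(name.strip("_"))
    return cleaned


df = pd.DataFrame({" First Name ": ["Ada"], "Total $ (USD)": [3.5]})
df.columns = sanitize_columns(df.columns)
print(list(df.columns))  # ['first_name', 'total_usd']
```

Names that collapse to the same string (e.g. `"a b"` and `"a-b"`) would collide under this scheme; a real implementation may need to deduplicate them.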
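The name `column_dtype_conversion_dictionary` suggests that `suggest_conversion_dict` returns a mapping from column name to target dtype, but the exact rules dfcleaner uses are not documented here. The sketch below assumes two simple heuristics: text columns whose values all parse as numbers become `float64`, and low-cardinality text columns become `category`. `suggest_dtypes` and `max_categories` are hypothetical names.

```python
import pandas as pd


def suggest_dtypes(df, max_categories=10):
    """Hypothetical sketch: return {column name: suggested dtype string}."""
    suggestions = {}
    for col in df.columns:
        s = df[col]
        # Only text-like columns are candidates; skip numbers and datetimes.
        if pd.api.types.is_numeric_dtype(s) or pd.api.types.is_datetime64_any_dtype(s):
            continue
        values = s.dropna()
        if values.empty:
            continue
        # Every non-null value parses as a number -> suggest a numeric dtype.
        if pd.to_numeric(values, errors="coerce").notna().all():
            suggestions[col] = "float64"
        # Few distinct values -> 'category' saves memory.
        elif values.nunique() <= max_categories:
            suggestions[col] = "category"
    return suggestions


df = pd.DataFrame({
    "price": ["1.5", "2", "4"],
    "city": ["Oslo", None, "Oslo"],
    "n": [1, 2, 3],
})
conversion = suggest_dtypes(df)
print(conversion)  # {'price': 'float64', 'city': 'category'}
df = df.astype(conversion)
```

Returning a plain dict keeps the suggestion reviewable: it can be printed and edited before being applied, much like `conversion_dict` is passed explicitly to `preprocess` in the usage example.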
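The `std_coeff` and `fill_na_method='median'` arguments imply standard-deviation-based outlier handling followed by median imputation. How `preprocess` actually treats outliers (dropping rows, clipping, or masking) is not stated in this README. The pandas-only sketch below assumes outliers are values farther than `std_coeff` sample standard deviations from the column mean. It masks them to NaN and then fills all gaps with the column median, leaving the label column untouched. `clean_numeric` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def clean_numeric(df, std_coeff=1.5, label_col=None):
    """Hypothetical sketch: mask std-based outliers, then median-fill missing values.

    Assumes an outlier is a value outside mean +/- std_coeff * std (sample std);
    the label column, if given, is not modified.
    """
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    if label_col is not None:
        numeric_cols = numeric_cols.drop(label_col, errors="ignore")
    for col in numeric_cols:
        mean, std = out[col].mean(), out[col].std()
        # Turn values outside the band into NaN (mask keeps the rest as-is).
        outliers = (out[col] - mean).abs() > std_coeff * std
        out[col] = out[col].mask(outliers)
        # Fill pre-existing and newly masked gaps with the column median.
        out[col] = out[col].fillna(out[col].median())
    return out


df = pd.DataFrame({
    "x": [1, 2, 3, 2, 1, 2, np.nan, 100],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})
cleaned = clean_numeric(df, std_coeff=1.5, label_col="label")
print(cleaned["x"].tolist())  # [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 2.0, 2.0]
```

Here the mean of `x` is about 15.9 and its sample standard deviation about 37.1, so 100 falls outside the 1.5-sigma band and is replaced by the median (2.0). The median is used rather than the mean because the mean is itself pulled toward the outlier.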
OS X & Linux:
pip3 install -r requirements.txt
Windows:
pip install -r requirements.txt
M. Zahash
Distributed under the MIT license. See LICENSE for more information.
- Fork it (https://github.com/zahash/dfcleaner/fork)
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request