
Improving the dataset preprocessing #5

Draft: wants to merge 6 commits into base: main

plonerma (Collaborator)

We discussed improving how the dataset is preprocessed. I added a test case to ensure that the original dataset is not modified (a modified copy is returned instead) and started improving some minor points.
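A minimal sketch of what such a test can look like, in pytest style. `clean_dataset` is a hypothetical stand-in for the project's preprocessing entry point (its real name and behaviour are not shown in this PR), and the dataset is assumed to be a list of dicts. The pattern is the point: snapshot the input with `copy.deepcopy`, run the preprocessing, then assert that the input still equals the snapshot while the returned object is a separate, changed copy.

```python
import copy


def clean_dataset(rows):
    """Hypothetical stand-in for the preprocessing step: drops rows with
    missing values and strips whitespace from string fields. It builds a
    new list of new dicts instead of modifying ``rows`` in place."""
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # skip incomplete rows
        cleaned.append(
            {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        )
    return cleaned


def test_original_dataset_is_not_modified():
    original = [
        {"text": "  hello ", "label": "pos"},
        {"text": None, "label": "neg"},
    ]
    snapshot = copy.deepcopy(original)  # reference copy taken before cleaning

    result = clean_dataset(original)

    # The input must still match the snapshot ...
    assert original == snapshot
    # ... while the returned copy carries the changes.
    assert result is not original
    assert result == [{"text": "hello", "label": "pos"}]
```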

There are some open points that should be addressed in the future:

  • The DataCleaner should be EITHER stateless (i.e., it is configured in the init, and preprocessing a dataset does not change any of its attributes) OR stateful (i.e., it should only be invoked once, with a single dataset).
  • We need to change the order of the processing steps, since string labels currently do not work.
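The two options in the first open point could look roughly like this. Both classes are hypothetical sketches (the real DataCleaner's options are not shown here); `drop_missing` and the learned `labels` attribute are illustrative assumptions. The stateless variant only reads its configuration during `clean`, so it can be reused on any number of datasets; the stateful variant records something about the dataset it sees and refuses a second call.

```python
import copy
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class StatelessCleaner:
    """Option 1 (hypothetical): all configuration happens in __init__;
    clean() never writes to self, so repeated calls are independent."""

    def __init__(self, drop_missing: bool = True) -> None:
        self.drop_missing = drop_missing

    def clean(self, rows: List[Row]) -> List[Row]:
        rows = copy.deepcopy(rows)  # never touch the caller's data
        if self.drop_missing:
            rows = [r for r in rows if all(v is not None for v in r.values())]
        return rows


class StatefulCleaner:
    """Option 2 (hypothetical): the cleaner learns state from the dataset
    (here, the sorted label names) and is bound to the first dataset."""

    def __init__(self) -> None:
        self.labels: Optional[List[str]] = None

    def clean(self, rows: List[Row]) -> List[Row]:
        if self.labels is not None:
            raise RuntimeError(
                "StatefulCleaner can only be used once; create a new instance"
            )
        rows = copy.deepcopy(rows)
        self.labels = sorted({r["label"] for r in rows})
        return rows
```

The stateless design is simpler to reuse and test; the stateful one makes the learned state explicit, but needs the guard so that a second dataset cannot silently overwrite it.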
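The PR does not say which step breaks on string labels, so the following is only an assumed illustration of the ordering issue: a hypothetical numeric step (`to_numeric`, casting every field to `float`) fails on a label such as `"cat"`, while running a hypothetical `encode_labels` step first turns labels into integer ids and makes the numeric step safe.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def encode_labels(rows: List[Row]) -> Tuple[List[Row], Dict[str, int]]:
    """Hypothetical step: map string labels to integer ids.
    Returns new rows (inputs are not modified) and the label mapping."""
    names = sorted({r["label"] for r in rows})
    mapping = {name: i for i, name in enumerate(names)}
    return [{**r, "label": mapping[r["label"]]} for r in rows], mapping


def to_numeric(rows: List[Row]) -> List[Row]:
    """Hypothetical step: cast every field to float.
    Raises ValueError on non-numeric strings such as 'cat'."""
    return [{k: float(v) for k, v in r.items()} for r in rows]


def preprocess(rows: List[Row]) -> Tuple[List[Row], Dict[str, int]]:
    rows, mapping = encode_labels(rows)  # step 1: labels become ints
    return to_numeric(rows), mapping     # step 2: now safe to cast
```

With the two steps in the opposite order, `to_numeric` would raise `ValueError` on the first string label.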
