Improving the dataset preprocessing #5

plonerma · 2024-11-13T16:44:34Z

We discussed improving how the dataset is preprocessed. I added a test case to ensure that the original dataset is not modified (a changed copy is returned instead) and started improving some minor points.
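For illustration, a minimal sketch of what such a test could look like. The `preprocess` function here is a hypothetical stand-in, not the project's actual DataCleaner API, and the dataset is assumed to be a plain list of dicts:

```python
import copy


def preprocess(dataset):
    """Hypothetical stand-in for the preprocessing step.

    Strips whitespace and drops rows with empty text.
    Builds new row dicts, so the input is never modified.
    """
    cleaned = []
    for row in dataset:
        text = row["text"].strip()
        if text:
            cleaned.append({**row, "text": text})
    return cleaned


def test_preprocess_does_not_modify_original():
    original = [
        {"text": "  hello ", "label": "pos"},
        {"text": "", "label": "neg"},
    ]
    snapshot = copy.deepcopy(original)

    result = preprocess(original)

    assert original == snapshot    # the input is unchanged
    assert result is not original  # a new object is returned
    assert result == [{"text": "hello", "label": "pos"}]
```

Comparing against a deep copy taken before the call catches in-place changes to nested rows as well as to the outer list.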

There are some open points that should be addressed in the future:

The DataCleaner should be EITHER stateless (i.e. it is configured in the init, and preprocessing a dataset does not change any attributes) OR stateful (i.e. it should only be invoked once with a dataset).
We need to change the order of the processing steps, as string labels currently do not work.
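To make the two points concrete, here is a rough sketch of the stateless variant. Everything in it is an assumption for discussion: `StatelessCleaner`, `min_length`, and `prepare` are hypothetical names, and rows are assumed to be dicts with `text` and `label` keys. Filtering runs first on the raw rows, then string labels are mapped to integer ids before anything that would need numeric labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatelessCleaner:
    """Hypothetical cleaner: all configuration is fixed at construction."""

    min_length: int = 1

    def prepare(self, rows):
        # Step 1: drop rows that are too short. Copy each kept row so
        # the caller's data is left untouched.
        kept = [
            dict(r) for r in rows
            if len(r["text"].strip()) >= self.min_length
        ]

        # Step 2: map labels (strings or ints) to integer ids. The
        # mapping is returned rather than stored on the cleaner.
        unique_labels = sorted({r["label"] for r in kept})
        label_map = {label: i for i, label in enumerate(unique_labels)}
        for r in kept:
            r["label"] = label_map[r["label"]]

        return kept, label_map


cleaner = StatelessCleaner(min_length=2)
rows = [
    {"text": "good movie", "label": "pos"},
    {"text": "x", "label": "neg"},
    {"text": "bad", "label": "neg"},
]
cleaned, label_map = cleaner.prepare(rows)
```

Because the dataclass is frozen, any accidental attribute assignment inside `prepare` raises an error, so the same instance can safely be reused across datasets. The stateful alternative would store `label_map` on the instance and refuse a second call.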

plonerma added 3 commits November 13, 2024 16:42

Fixed two tiny things

6d22002

Added additional test-case for dataset preprocessing

136de72

Made some steps in the datacleaner more explicit

aad7d7f

plonerma requested a review from lukasgarbas November 13, 2024 16:44

lukasgarbas added 3 commits November 22, 2024 15:22

Merge branch 'main' into improving_data_preprocessing

67b5f14

Refactor datacleaner

28938a1

Remove dataset wrapper

9e5651f
