Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_lookup_file #77

Merged
merged 4 commits into from
Apr 2, 2024
Merged

add_lookup_file #77

merged 4 commits into from
Apr 2, 2024

Conversation

RayStick
Copy link
Member

@RayStick RayStick commented Mar 21, 2024

Closes #66
Closes #68

New data has been added to the package which is a lookup table for auto-categorizations

  • see data/look_up.rda and data-raw/look_up.csv
  • run ?look_up for the manual

These auto categorisations were previously described within the domain_mapping function itself, so this chunk of code has now been deleted.

  • see updates to R/domain_mapping.R
  • run ?domain_mapping for the updated manual

The README file has been updated to reflect that there is now a 3rd input to the domain_mapping function

  • domain_mapping(json_file, domain_file, look_up_file)

@Rainiefantasy in your review could you please:

  • do a demo run with all the defaults - domain_mapping() to see if auto categorisations still work well
  • do a demo run, but input your own look up file csv - domain_mapping(,,look_up_file = 'path')
  • check the changes to the source code to see if there is a better way to write any of the changes
  • anything else you think seems needed, based on the changes I have made

Thanks! =D

@RayStick RayStick marked this pull request as ready for review March 21, 2024 12:35
@RayStick RayStick added this to the Before 0.2.0 release milestone Mar 21, 2024
@Rainiefantasy
Copy link
Contributor

This looks great! will do a demo run monday! 🤞

@Rainiefantasy
Copy link
Contributor

Notes:

  1. demo run worked great!
  2. was able to load my own csv look up file, and that worked too :). Only thing to note is, when there's duplicate rows in the look up table, it seems to stop functioning - intention here was to try categorise for more than one domain and wasn't sure how to do that. But maybe that's fine if it'll mostly be 1:1 mappings?

@RayStick
Copy link
Member Author

RayStick commented Apr 2, 2024

  1. demo run worked great!

Yay!

was able to load my own csv look up file, and that worked too :).

Yay!

Only thing to note is, when there's duplicate rows in the look up table, it seems to stop functioning - intention here was to try categorise for more than one domain and wasn't sure how to do that. But maybe that's fine if it'll mostly be 1:1 mappings?

Ah, that's a good point. Just so I understand - you mean if you duplicate the DataElement row and give it a different mapping, it will only read the first row for this Data Element? Most of the Data Elements included in the lookup will be 1:1 mappings as you say (unlike the ones the user may categorise) but it could be good to allow for multiple mappings. Do you think we should try to address that in this PR or write something in the README for now like "The lookup table assumes 1:1 mappings" and make a new GH Issue for it to address later (as an enhancement).

@Rainiefantasy
Copy link
Contributor

Rainiefantasy commented Apr 2, 2024

#Ah, that's a good point. Just so I understand - you mean if you duplicate the DataElement row and give it a different mapping, it will only read the first row for this Data Element? Most of the Data Elements included in the lookup will be 1:1 mappings as you say (unlike the ones the user may categorise) but it could be good to allow for multiple mappings. Do you think we should try to address that in this PR or write something in the README for now like "The lookup table assumes 1:1 mappings" and make a new GH Issue for it to address later (as an enhancement).
Yes correct, duplicating the row so that I have for example:

AVAIL_FROM_DT,Health,8
AVAIL_FROM_DT, Unsure,0

Instead of auto classifying the variable for both domains (8,0) it asks you to classify it manually, and at the end still asks you to check the autocategorisations but looks like a bug:

! Please check the auto categorised data elements are accurate:
[1] DataClass   DataElement Domain_code
<0 rows> (or 0-length row.names)

If it'll be a one to one mapping anyway I think it's fine for now; but yep as you say could be an enhancement for later if you think it'd help?

@RayStick
Copy link
Member Author

RayStick commented Apr 2, 2024

@Rainiefantasy that makes to me, we'll leave it for later then, as the point of the lookup is mostly 1:1 mappings.
If you are happy please approve this PR

@Rainiefantasy
Copy link
Contributor

Source code looks great to me, happy with merge :)

Copy link
Contributor

@Rainiefantasy Rainiefantasy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved :)

@RayStick RayStick merged commit 161f07b into main Apr 2, 2024
1 check passed
@RayStick RayStick deleted the auto-lookup branch April 2, 2024 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow user to give their own list of auto-categorisations Improve auto categorisations
2 participants