
Improve manual checking of categorisations #67

Merged
RayStick merged 17 commits into main from improve-auto on Mar 6, 2024

Conversation

@RayStick (Member) commented Jan 31, 2024

Summary of changes

This PR improves the way the code handles auto-categorisations of variables, and how the user reviews all categorisations (manual or auto).

Closes #70

Summary of changes (user interaction)

  • The auto-categorisations are now displayed to the user, who can choose to manually change them
  • A user can edit the last data element they categorised
  • A user can review all their categorisations for that data class, and change one or many

Summary of code refactoring

  • The code that captures the user's responses is now written in its own function (user_categorisation), which the domain_mapping function can call. Previously, the same chunk of code was repeated in multiple locations throughout domain_mapping; this removes that unnecessary repetition.
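The shape of this refactor can be sketched as below. Only the function names user_categorisation and domain_mapping come from this PR; the arguments, prompt text, and body are illustrative assumptions, not the package's actual code:

```r
# Illustrative sketch of the refactor: the previously repeated
# prompt-and-capture code lives in one helper that domain_mapping
# calls from every site that needs a user categorisation.
user_categorisation <- function(data_element) {
  cat("Categorise data element:", data_element, "\n")
  response <- readline(prompt = "Enter domain code(s), comma separated: ")
  # Return the individual codes with surrounding whitespace removed
  trimws(unlist(strsplit(response, ",")))
}

domain_mapping <- function(data_elements) {
  # One call site per element instead of duplicated prompt blocks
  lapply(data_elements, user_categorisation)
}
```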

Summary of documentation

  • README has been updated to reflect the above changes to the user interaction

@Rainiefantasy could you focus on what the code is doing, and do some user testing to check it does not break?

@BatoolMM could you focus on more of the 'R package stuff' and documentation to ensure the checks are still passing etc.

@RayStick RayStick changed the title Improve auto Improve auto categorisations Jan 31, 2024
@RayStick RayStick self-assigned this Jan 31, 2024
@RayStick RayStick changed the title Improve auto categorisations Improve manual checking of categorisations Feb 5, 2024
@RayStick RayStick marked this pull request as ready for review February 5, 2024 15:10
@RayStick RayStick marked this pull request as draft February 7, 2024 14:03
@RayStick RayStick changed the title Improve manual checking of categorisations [WIP] Improve manual checking of categorisations Feb 7, 2024
@RayStick RayStick marked this pull request as ready for review February 28, 2024 10:27
@RayStick RayStick changed the title [WIP] Improve manual checking of categorisations Improve manual checking of categorisations Feb 28, 2024
@RayStick RayStick added this to the Before 0.2.0 release milestone Feb 28, 2024
adding library(devtools) line to import library, line 74
removed line 74
@Rainiefantasy (Contributor) commented Mar 4, 2024

Some initial feedback:
It looks really great - I've followed the demo and it's worked so that's a yay :)
Some things I noticed when going through the demo (sort of grouped the feedback in bold, enhancements are extras)

- 1. [**potential bug?**] I didn't see the prompt mentioned: _'If you make a mistake, the next prompt allows you to redo. Or press enter if you are happy to continue.'_ Do you see it, or did you remove this functionality and perhaps the README needs updating?
- 2. [potential bug?] When you finish processing the variables, again there's an expected prompt to go back and redo the categorisations: _'Would you like to review your categorisations? (Y/N)'_ which I didn't see
+ 3. [structure] It may be useful to have the context of the domains, output, etc. towards the beginning of the tutorial, OR a footnote to indicate this information can be found at the bottom of the README. For someone who doesn't know much about the repo/SAIL/domain data, it made more sense to have a bit of context before trying to classify the features, but maybe that's just me! :)
- 4. [enhancement] When looking at a new data class/table, it'd be useful to indicate which table you're looking at now; perhaps it'd also help you decide how many variables to process for the table?
- 5. [enhancement] It may help to have a pop-up to indicate when a variable is being auto-classified and skipped?
+ 6. printing the variables for the user to see before choosing the range of variables to categorise
+ 7. [enhancement] To prevent the user from typing in whatever, errors could be raised if the user isn't, for example, entering comma-separated integer input (I tried all sorts and it doesn't seem to throw an error), or when intending to write 'redo' but misspelling it, which is treated as a skip/enter. But this is perhaps a bit extra, especially since the output is for the user's own use and isn't going to break anything downstream.

Hope that helps for now! :)
will tag if I think of anything else, and let me know if there's anything more specific you had in mind in terms of feedback

Edit (5/03/2024 11:30): I've changed the above according to the new changes I pulled using your guide yesterday!

@RayStick (Member, Author) commented Mar 4, 2024

@Rainiefantasy thanks very much for this user testing. From your feedback, I think you may have run the code on the main branch rather than with the latest updates from the improve-auto branch? Can I ask you to double-check that you are running the code in improve-auto, and then see which of your points above remain relevant? Thanks! I will send you some direct tips on getting that running.

@Rainiefantasy (Contributor) commented Mar 5, 2024

Thanks Rachael - I've updated the above comment after pulling the latest version :) I wasn't sure of the best way to indicate which points are still relevant, so I've used git-diff-style markers: green (+) for relevant, red (-) for irrelevant :). I.e. only points 3, 6 and 7 are relevant.

@RayStick (Member, Author) commented Mar 5, 2024

Super helpful, thanks @Rainiefantasy

  3. [structure] It may be useful to have the context of the domains, output, etc towards the beginning of the tutorial, OR, a footnote to indicate this information can be found at the bottom of the readme. From someone who doesn't know much about the repo/SAIL/domain data, it made more sense to have a bit of context before trying to classify the features, but maybe that's just me! :)

I see what you mean. Can you write this up in a GitHub Issue and mark it with the 'enhancement' and 'question' labels? See if it relates to an existing issue and add it on, but I think it might not. Then we can think about how best to approach it, taking into account the type of users that are likely to use this package.

  6. printing the variables for user to see before choosing the range of variables to categorise

I like that feature. Can you write this up in a GitHub Issue and mark it with the 'enhancement' label? See if it relates to an existing issue and add it on, but I think it might not.

  7. [enhancement] to prevent user from typing in whatever, there could be errors raised if user isn't for example, entering a comma separated integer type input (I tried all sorts and it doesn't seem to throw an error), or when intending to write 'redo' but misspells, it's considered as a skip/enter. but this is a bit extra perhaps especially since the output is for the users own use and isn't going to break anything downstream

I'd really like to sort this, but I'm not sure how to code it yet. Is this captured in the existing issue #71? If so, please comment on the issue or edit the issue text with any other thoughts. I would welcome help on solving it!
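One possible approach to point 7, sketched under assumptions (the function name, prompt text, and error message below are illustrative, not from the package): validate that the input parses as comma-separated integers, and re-prompt otherwise.

```r
# Illustrative sketch: keep asking until the user supplies
# comma-separated integers, instead of silently accepting anything.
ask_for_integers <- function(prompt_text = "Enter variable numbers (comma separated): ") {
  repeat {
    response <- readline(prompt = prompt_text)
    parts <- trimws(unlist(strsplit(response, ",")))
    # Each comma-separated part must be a run of digits
    if (length(parts) > 0 && all(grepl("^[0-9]+$", parts))) {
      return(as.integer(parts))
    }
    cat("Invalid input - please enter integers separated by commas, e.g. 1,3,5\n")
  }
}
```

A similar exact-match check (e.g. `response %in% c("redo", "")`) would catch misspellings of 'redo' rather than treating them as a skip.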

If you are happy with those responses, please approve the PR :)
Happy to sort out some of them beforehand if you think we should do that before merging these changes into main.

@RayStick (Member, Author) commented Mar 5, 2024

@BatoolMM if you have capacity, could you review this PR, focusing more on the 'R package stuff' and documentation, and ensuring that all the checks are still passing etc.?

@Rainiefantasy reviewed the PR, focusing on what the code was doing, and did some user testing to check it did not break. I am open to more user testing, but I think that can also happen after this PR is merged, if people agree. We have captured the improvements that are needed in GitHub issues; see this milestone: https://github.com/aim-rsf/browseMetadata/milestone/2

@BatoolMM (Member) commented Mar 5, 2024

This is fantastic; the package has become more user-friendly and intuitive. The ability to edit the last data element and review all the categorisations at the end is wonderful.

I did some checks on the documentation, functionality, and dependencies, and everything appears to be working well. However, I encountered a minor warning concerning the newly introduced "Note" column. The way this variable is defined seems somewhat different from the others, resulting in this message:

domain_mapping: no visible binding for global variable ‘Note’
  Undefined global functions or variables:
    Note

I am not sure whether it hasn't been explicitly defined in the global environment or within the function's scope - I need to spend more time on the code.

Also, it seems the "Note" column doesn’t come up in the final review table, although it does appear in the CSV file. Or this could have been only me.

When I added at the top:

utils::globalVariables(c("Note"))

the warning stopped - but I believe no immediate changes are necessary, and the update is ready to be merged if you are comfortable with that.
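For context, this kind of R CMD check NOTE typically arises when a column name is referenced as a bare symbol inside non-standard evaluation (e.g. a dplyr or subset call), so the checker cannot see where the symbol is defined. A minimal sketch of the two common remedies - only the column name `Note` and the `utils::globalVariables()` call are from this thread; the surrounding dplyr code is illustrative, not the package's actual implementation:

```r
# Remedy 1 (what was tried above): declare the symbol at package level,
# conventionally in a file such as R/globals.R:
utils::globalVariables(c("Note"))

# Remedy 2: use the rlang .data pronoun inside tidyverse verbs, so the
# column is looked up in the data frame rather than the environment
# (requires importing .data from rlang, e.g. @importFrom rlang .data):
df <- dplyr::mutate(df, Note = trimws(.data$Note))
```

Remedy 2 is often preferred because it silences the NOTE per use and fails loudly if the column is genuinely missing, whereas globalVariables suppresses the check globally.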

@RayStick (Member, Author) commented Mar 5, 2024

Thanks for these comments @BatoolMM. They are super useful. I will check about the 'Note' variable before I merge, in case I did something silly and forgot to include it somewhere I was meant to.

@RayStick (Member, Author) commented Mar 6, 2024

@BatoolMM I am going to add this to a new Issue to be investigated later. There doesn't seem to be anything obvious to me, but I do see this same package warning as well. Everything seems to be functioning as intended, but I would like to understand the warning better.

Also, it seems the "Note" column doesn’t come up in the final review table, although it does appear in the CSV file. Or this could have been only me.

This part is intentional - I was trying to not print out too much text to the console. We could show the note (during the variable review) if that was deemed helpful though.

Based on this, I will merge as long as @Rainiefantasy is happy with my responses above!

@Rainiefantasy (Contributor) commented:

Happy for you to merge and I can create new issues for the suggestions above! 😸

@BatoolMM (Member) commented Mar 6, 2024

@RayStick Please do feel free to merge 🚀 🚀 🚀

@Rainiefantasy (Contributor) commented Mar 6, 2024

FYI I've created issues for the comments above:

I assigned us both for all three for now as you mentioned you'd like help, but we can decide how to distribute later :)

@RayStick RayStick merged commit 33ed942 into main Mar 6, 2024
1 check passed
@RayStick (Member, Author) commented Mar 6, 2024

@BatoolMM and @Rainiefantasy - thank you both!

@RayStick RayStick deleted the improve-auto branch March 6, 2024 13:34
@RayStick (Member, Author) commented Mar 6, 2024

@all-contributors add @Rainiefantasy for pull request review and ideas

@allcontributors (Contributor) commented:

@RayStick

I've put up a pull request to add @Rainiefantasy! 🎉

Development
Successfully merging this pull request may close: Allow user to change answer (#70)
3 participants