
Improve manual checking of categorisations #67

Merged
RayStick merged 17 commits into main from improve-auto on Mar 6, 2024

Conversation

@RayStick (Member) commented Jan 31, 2024

Summary of changes

This PR improves the way the code handles auto-categorisations of variables, and how the user reviews all categorisations (manual or auto).

Closes #70

Summary of changes (user interaction)

  • The auto-categorisations are now displayed to the user, who can choose to manually change them
  • A user can edit the last data element they categorised
  • A user can review all their categorisations for that data class, and change one or many

Summary of code refactoring

  • The code that captures the user's responses is now written in its own function (user_categorisation), which the domain_mapping function can call. Previously, the same chunk of code was repeated in multiple locations throughout domain_mapping; this removes that unnecessary repetition.
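The shape of this refactor can be sketched as below. Only the function names user_categorisation and domain_mapping come from this PR; the arguments, prompt text, and body are illustrative assumptions, not the package's actual code:

```r
# Illustrative sketch of the refactor: the previously repeated
# prompt-and-capture code lives in one helper that domain_mapping
# calls from every site that needs a user categorisation.
user_categorisation <- function(data_element) {
  cat("Categorise data element:", data_element, "\n")
  response <- readline(prompt = "Enter domain code(s), comma separated: ")
  # Return the individual codes with surrounding whitespace removed
  trimws(unlist(strsplit(response, ",")))
}

domain_mapping <- function(data_elements) {
  # One call site per element instead of duplicated prompt blocks
  lapply(data_elements, user_categorisation)
}
```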

Summary of documentation

  • README has been updated to reflect the above changes to the user interaction

@Rainiefantasy could you focus on what the code is doing, and do some user testing to check it does not break?

@BatoolMM could you focus on more of the 'R package stuff' and documentation to ensure the checks are still passing etc.

@RayStick RayStick changed the title Improve auto Improve auto categorisations Jan 31, 2024
@RayStick RayStick self-assigned this Jan 31, 2024
@RayStick RayStick changed the title Improve auto categorisations Improve manual checking of categorisations Feb 5, 2024
@RayStick RayStick marked this pull request as ready for review February 5, 2024 15:10
@RayStick RayStick marked this pull request as draft February 7, 2024 14:03
@RayStick RayStick changed the title Improve manual checking of categorisations [WIP] Improve manual checking of categorisations Feb 7, 2024
@RayStick RayStick marked this pull request as ready for review February 28, 2024 10:27
@RayStick RayStick changed the title [WIP] Improve manual checking of categorisations Improve manual checking of categorisations Feb 28, 2024
@RayStick RayStick added this to the Before 0.2.0 release milestone Feb 28, 2024
adding library(devtools) line to import library, line 74
removed line 74
@Rainiefantasy (Contributor) commented Mar 4, 2024

Some initial feedback:
It looks really great - I've followed the demo and it's worked so that's a yay :)
Some things I noticed when going through the demo (sort of grouped the feedback in bold, enhancements are extras)

- 1. [**potential bug?**] I didn't see the prompt mentioned: _'If you make a mistake, the next prompt allows you to redo. Or press enter if you are happy to continue.'_ Do you see it, or did you remove this functionality and perhaps the README needs updating?
- 2. [potential bug?] When you finish processing the variables, again there's an expected prompt to go back and redo the categorisations: _'Would you like to review your categorisations? (Y/N)'_ which I didn't see
+ 3. [structure] It may be useful to have the context of the domains, output, etc. towards the beginning of the tutorial, OR a footnote to indicate this information can be found at the bottom of the README. For someone who doesn't know much about the repo/SAIL/domain data, it made more sense to have a bit of context before trying to classify the features, but maybe that's just me! :)
- 4. [enhancement] When looking at a new data class/table, it'd be useful to indicate which table you're looking at now; perhaps it'd also help you decide how many variables to process for the table?
- 5. [enhancement] It may help to have a pop-up to indicate when a variable is being auto-classified and skipped?
+ 6. printing the variables for the user to see before choosing the range of variables to categorise
+ 7. [enhancement] To prevent the user from typing in whatever, errors could be raised if the user isn't, for example, entering comma-separated integer input (I tried all sorts and it doesn't seem to throw an error), or when intending to write 'redo' but misspelling it, which is treated as a skip/enter. But this is perhaps a bit extra, especially since the output is for the user's own use and isn't going to break anything downstream.

Hope that helps for now! :)
will tag if I think of anything else, and let me know if there's anything more specific you had in mind in terms of feedback

Edit (5/03/2024 11:30): I've changed the above according to the new changes I pulled using your guide yesterday!

@RayStick (Member, Author) commented Mar 4, 2024

@Rainiefantasy thanks very much for this user testing. From your feedback, I think you may have run the code on the main branch rather than with the latest updates from the improve-auto branch? Can I ask you to double-check that you are running the code in improve-auto, and then see which of your points above remain relevant? Thanks! I will send you some direct tips on getting that running.

@Rainiefantasy (Contributor) commented Mar 5, 2024

Thanks Rachael - I've updated the above comment after pulling the latest version :) I wasn't sure of the best way to indicate which points are still relevant, so I've used git-diff-style markers: green (+) for relevant, red (-) for irrelevant :). I.e. only points 3, 6 and 7 are relevant.

@RayStick (Member, Author) commented Mar 5, 2024

Super helpful, thanks @Rainiefantasy

  3. [structure] It may be useful to have the context of the domains, output, etc towards the beginning of the tutorial, OR, a footnote to indicate this information can be found at the bottom of the readme. From someone who doesn't know much about the repo/SAIL/domain data, it made more sense to have a bit of context before trying to classify the features, but maybe that's just me! :)

I see what you mean. Can you write this up in a GitHub Issue and mark it with the 'enhancement' and 'question' labels? See if it relates to an existing issue and add it on, but I think it might not. Then we can think about how best to approach it, taking into account the type of users that are likely to use this package.

  6. printing the variables for user to see before choosing the range of variables to categorise

I like that feature. Can you write this up in a GitHub Issue and mark it with the 'enhancement' label? See if it relates to an existing issue and add it on, but I think it might not.

  7. [enhancement] to prevent user from typing in whatever, there could be errors raised if user isn't for example, entering a comma separated integer type input (I tried all sorts and it doesn't seem to throw an error), or when intending to write 'redo' but misspells, it's considered as a skip/enter. but this is a bit extra perhaps especially since the output is for the users own use and isn't going to break anything downstream

I'd really like to sort this, but I'm not sure how to code it yet. Is this captured in the existing issue #71? If so, please comment on the issue or edit the issue text with any other thoughts. I would welcome help on solving it!
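One possible approach to point 7, sketched under assumptions (the function name, prompt text, and error message below are illustrative, not from the package): validate that the input parses as comma-separated integers, and re-prompt otherwise.

```r
# Illustrative sketch: keep asking until the user supplies
# comma-separated integers, instead of silently accepting anything.
ask_for_integers <- function(prompt_text = "Enter variable numbers (comma separated): ") {
  repeat {
    response <- readline(prompt = prompt_text)
    parts <- trimws(unlist(strsplit(response, ",")))
    # Each comma-separated part must be a run of digits
    if (length(parts) > 0 && all(grepl("^[0-9]+$", parts))) {
      return(as.integer(parts))
    }
    cat("Invalid input - please enter integers separated by commas, e.g. 1,3,5\n")
  }
}
```

A similar exact-match check (e.g. `response %in% c("redo", "")`) would catch misspellings of 'redo' rather than treating them as a skip.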

If you are happy with those responses, please approve the PR :)
Happy to sort out some of them beforehand if you think we should do that before merging these changes into main.

@RayStick (Member, Author) commented Mar 5, 2024

@BatoolMM if you have capacity, could you review this PR, focusing more on the 'R package stuff' and documentation, and ensuring that all the checks are still passing etc.?

@Rainiefantasy reviewed the PR, focusing on what the code was doing, and did some user testing to check it did not break. I am open to more user testing, but I think that can also happen after this PR is merged, if people agree. We have captured the improvements that are needed in GitHub issues; see this milestone: https://github.com/aim-rsf/browseMetadata/milestone/2

@BatoolMM (Member) commented Mar 5, 2024

This is fantastic; the package has become more user-friendly and intuitive. The ability to edit the last data element and review all the categorisations at the end is wonderful.

I did some checks on the documentation, functionality, and dependencies, and everything appears to be working well. However, I encountered a minor warning concerning the newly introduced "Note" column. The way this variable is defined seems somewhat different from the others, resulting in this message:

domain_mapping: no visible binding for global variable ‘Note’
  Undefined global functions or variables:
    Note

I am not sure whether it hasn't been explicitly defined in the global environment or within the function's scope - I need to spend more time on the code.

Also, it seems the "Note" column doesn’t come up in the final review table, although it does appear in the CSV file. Or this could have been only me.

When I added at the top:

utils::globalVariables(c("Note"))

the warning stopped - but I believe no immediate changes are necessary, and the update is ready to be merged if you are comfortable with that.
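For context, this kind of R CMD check NOTE typically arises when a column name is referenced as a bare symbol inside non-standard evaluation (e.g. a dplyr or subset call), so the checker cannot see where the symbol is defined. A minimal sketch of the two common remedies - only the column name `Note` and the `utils::globalVariables()` call are from this thread; the surrounding dplyr code is illustrative, not the package's actual implementation:

```r
# Remedy 1 (what was tried above): declare the symbol at package level,
# conventionally in a file such as R/globals.R:
utils::globalVariables(c("Note"))

# Remedy 2: use the rlang .data pronoun inside tidyverse verbs, so the
# column is looked up in the data frame rather than the environment
# (requires importing .data from rlang, e.g. @importFrom rlang .data):
df <- dplyr::mutate(df, Note = trimws(.data$Note))
```

Remedy 2 is often preferred because it silences the NOTE per use and fails loudly if the column is genuinely missing, whereas globalVariables suppresses the check globally.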

@RayStick (Member, Author) commented Mar 5, 2024

Thanks for these comments @BatoolMM. They are super useful. I will check about the 'Note' variable before I merge, in case I did something silly and forgot to include it somewhere I was meant to.

@RayStick (Member, Author) commented Mar 6, 2024

@BatoolMM I am going to add this to a new Issue to be investigated later. There doesn't seem to be anything obvious to me, but I do see this same package warning as well. Everything seems to be functioning as intended, but I would like to understand the warning better.

Also, it seems the "Note" column doesn’t come up in the final review table, although it does appear in the CSV file. Or this could have been only me.

This part is intentional - I was trying to not print out too much text to the console. We could show the note (during the variable review) if that was deemed helpful though.

Based on this, I will merge as long as @Rainiefantasy is happy with my responses above!

@Rainiefantasy (Contributor) commented:

Happy for you to merge and I can create new issues for the suggestions above! 😸

@BatoolMM (Member) commented Mar 6, 2024

@RayStick Please do feel free to merge 🚀 🚀 🚀

@Rainiefantasy (Contributor) commented Mar 6, 2024

FYI I've created issues for the comments above:

I assigned us both for all three for now as you mentioned you'd like help, but we can decide how to distribute later :)

@RayStick RayStick merged commit 33ed942 into main Mar 6, 2024
1 check passed
@RayStick (Member, Author) commented Mar 6, 2024

@BatoolMM and @Rainiefantasy - thank you both!

@RayStick RayStick deleted the improve-auto branch March 6, 2024 13:34
@RayStick (Member, Author) commented Mar 6, 2024

@all-contributors add @Rainiefantasy for pull request review and ideas

@allcontributors (Contributor) commented:

@RayStick

I've put up a pull request to add @Rainiefantasy! 🎉

Development
Successfully merging this pull request may close: Allow user to change answer (#70)
3 participants