
dataset_summarize() output > Dataset assessment 'value' shows incomplete/incorrect information. #80

Open
twey2 opened this issue Jul 8, 2024 · 2 comments
Labels
bug Something isn't working TO TEST

Comments


twey2 commented Jul 8, 2024

I'm not positive what the 'value' column in the output of dataset_summarize() > Dataset assessment is supposed to be showing, but I think there is an error. For categorical variables, it sometimes shows all of the category values, separated by semicolons. But frequently, it shows only an incomplete list of the category values, even when all values show up in the variable summaries (Variables summary (all) and Categorical variable summary).
For example, in a dataset I just summarized, variable_3 has values "Male" and "Female".
image
These show up correctly in 'Categorical variable summary':
image
image
But in 'Dataset assessment', only "Female" shows up.
image
In other variables, all or some of the category values might appear in 'Dataset assessment'.
In at least one case, the value is modified/incorrect ("BModerna45O" is changed to "bmoderna45o"). It's very confusing and unclear why sometimes there are errors and other times not.

More generally, the 'value' column is currently confusing. It shows a mix of types of information (category values, the word "text" for text variables, etc.). Instead of 'value', maybe the column should be called "Description of content" or something else. But I think the objective and presentation of this column could be re-evaluated.

Contributor

GuiFabre commented Oct 9, 2024

Hello @twey2,
When you get to this issue, can you tell me whether the information has been corrected as expected?
Thank you!

Author

twey2 commented Oct 10, 2024

I think this is the test/message mismatch we were just discussing today, so the question still applies to the current summary reports. There are two possible related tests/messages: one for the whole variable, another for specific values within the variable. There is already another test/message for "[INFO] - Categorical values present in dataset that do not match categorical values in data dictionary", so this message should correspond to the check for whether a variable is categorical both in the data dictionary (has categories in the Categories sheet) and in the dataset (is a factor with levels defined). So it looks like the test script needs to be modified to match the message, but this still needs to be verified.
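
To make the distinction between the two checks concrete, here is a minimal sketch in Python. It is not the package's actual code. The function names, the message strings, the `vaccine` and `site` variables, and the dict-based inputs are all hypothetical: `dd_categories` stands in for the Categories sheet, and `dataset_levels` stands in for the factor levels of each dataset column (`None` when the column is not a factor). Comparison is exact and case-sensitive, so a value such as "BModerna45O" is never altered.

```python
def check_variable_level(name, dd_categories, dataset_levels):
    """Whole-variable check: is the variable categorical on both sides?

    Returns None when both sides agree, otherwise a short description
    of the mismatch (hypothetical wording, for illustration only).
    """
    in_dd = bool(dd_categories.get(name))          # has categories declared
    in_ds = dataset_levels.get(name) is not None   # is a factor with levels
    if in_dd == in_ds:
        return None
    if in_dd:
        return "categorical in data dictionary only"
    return "categorical in dataset only"


def check_value_level(name, dd_categories, dataset_levels):
    """Value check: dataset levels that are not declared in the dictionary.

    Values are compared as-is (no lowercasing) and returned in dataset order.
    """
    declared = dd_categories.get(name) or []
    levels = dataset_levels.get(name) or []
    return [value for value in levels if value not in declared]


# Hypothetical inputs mirroring the examples in this issue.
dd_categories = {
    "variable_3": ["Male", "Female"],
    "vaccine": ["BModerna45O"],
    "site": ["A", "B"],
}
dataset_levels = {
    "variable_3": ["Male", "Female"],
    "vaccine": ["BModerna45O", "Other"],
    "site": None,  # not a factor in the dataset
}

variable_3_mismatch = check_variable_level("variable_3", dd_categories, dataset_levels)
site_mismatch = check_variable_level("site", dd_categories, dataset_levels)
vaccine_unmatched = check_value_level("vaccine", dd_categories, dataset_levels)
variable_3_unmatched = check_value_level("variable_3", dd_categories, dataset_levels)
```

With this separation, each message in the assessment would map to exactly one check: the whole-variable message fires only when `check_variable_level` reports a mismatch, and the existing "[INFO]" message fires only when `check_value_level` returns a non-empty list.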

@GuiFabre GuiFabre added TO TEST bug Something isn't working and removed To validate labels Oct 25, 2024