dataset_summarize() output > Dataset assessment 'value' shows incomplete/incorrect information. #80

twey2 · 2024-07-08T19:07:49Z

I'm not sure what the 'value' column in the 'Dataset assessment' output of dataset_summarize() is supposed to show, but I think there is an error. For categorical variables, it sometimes shows all of the category values, separated by semicolons. But it frequently shows only an incomplete list of the category values, even when all values appear in the variable summaries ('Variables summary (all)' and 'Categorical variable summary').
For example, in a dataset I just summarized, variable_3 has values "Male" and "Female".

These show up correctly in 'Categorical variable summary':

But in 'Dataset assessment', only "Female" shows up.

In other variables, all or some of the category values might appear in 'Dataset assessment'.
In at least one case, the value is modified/incorrect ("BModerna45O" is changed to "bmoderna45o"). It's very confusing and unclear why sometimes there are errors and other times not.

More generally, the 'value' column is currently confusing. It shows a mix of types of information (category values, the word "text" for text variables, etc.). Instead of 'value', maybe the column should be called "Description of content" or something else. But I think the objective and presentation of this column could be re-evaluated.

GuiFabre · 2024-10-09T03:29:36Z

Hello @twey2,
Can you tell me, when you get to this issue, whether the information has been corrected as expected?
Thank you!

twey2 · 2024-10-10T20:45:35Z

I think this is the test/message mismatch we were just discussing today, so the question still applies to the current summary reports. There are two possible related tests/messages: one for the whole variable, another for specific values within the variable. There is already another test/message for "[INFO] - Categorical values present in dataset that do not match categorical values in data dictionary", so this message should correspond to the check for whether a variable is categorical in both the data dictionary (has categories in the Categories sheet) and the dataset (is a factor with levels defined). So it looks like the test script needs to be modified to match the message, but this still needs to be verified.
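
To make the whole-variable check concrete, here is a rough sketch in base R. It is only an illustration under my own assumptions: the helper `check_categorical_consistency()`, the example data, and the column names `variable` and `name` in the stand-in Categories table are all made up for this comment and are not the package's actual code.

```r
# Hypothetical example dataset: variable_3 is a factor, variable_4 is plain text.
dataset <- data.frame(
  variable_3 = factor(c("Male", "Female", "Female"), levels = c("Male", "Female")),
  variable_4 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the Categories sheet of a data dictionary.
categories <- data.frame(
  variable = c("variable_3", "variable_3", "variable_4"),
  name     = c("Male", "Female", "a"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each variable, report whether it is categorical in
# the data dictionary (has rows in `categories`) and in the dataset (is a
# factor with at least one level), and whether the two agree.
check_categorical_consistency <- function(dataset, categories) {
  vars  <- names(dataset)
  in_dd <- vars %in% unique(categories$variable)
  in_ds <- vapply(
    dataset,
    function(x) is.factor(x) && nlevels(x) > 0,
    logical(1),
    USE.NAMES = FALSE
  )
  data.frame(
    variable          = vars,
    categorical_in_dd = in_dd,
    factor_in_dataset = in_ds,
    consistent        = in_dd == in_ds,
    stringsAsFactors  = FALSE
  )
}

result <- check_categorical_consistency(dataset, categories)
print(result)
```

In this made-up example, variable_3 is categorical in both places, while variable_4 has a category in the dictionary but is not a factor in the dataset, so only variable_4 would be flagged by the whole-variable test. The value-level comparison would stay in the separate "[INFO]" check mentioned above.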

GuiFabre added the To validate label Oct 9, 2024

GuiFabre added the TO TEST and bug labels and removed the To validate label Oct 25, 2024
