New Collabra paper draft #568
Conversation
Should we streamline the review process for Collabra?
@bwiernik, you mentioned the paper would need some polishing for Collabra. Any specific suggestions on the current version? I did add the height-weight code demo as discussed earlier. @DominiqueMakowski mentioned maybe arguing more strongly in favour of our multiple-methods approach, but I am not really able to do that. So Dom, would you like to make an attempt? If not, no worries, we can leave it as is. Also, there is no
Also, in the paper, we cite easystats as follows:
However, that does not correspond to the order of authors on the package website or using
So, @IndrajeetPatil, which citation/author order is correct?
Why are we citing the easystats meta-paper?! I don't think we should cite it; we haven't done so for any of the other publications. We can just mention it, and even link to the GitHub organization, but I don't think we need to cite it here. That said, we can cite the relevant JOSS papers. E.g., in the ggstatsplot paper, this is what I do:
At some point, we will need to decide on the order, but I feel it's a bit early for that.
I believe @DominiqueMakowski added the reference to easystats. I don't think it's necessarily a bad idea, even if it's early and things could change. I like, for instance, referring to the website; I mean, that's the outcome of using
I had not realized that the two titles were different, though. As you say, the first one is probably the meta-paper, whereas in the second one we're simply citing the website/package, so I think that's fine?
Would anybody else like to give the paper a last reread before I submit to Collabra, @easystats/core-team? I would like to submit next weekend if nobody requests changes. Thanks.
papers/Collabra/paper.Rmd (outdated)
## What happens after?
See comment in the thread.
It would be nice to have a discussion or some thoughts about what to do after outlier detection: re-run analyses? Show results with and without outliers? Describe the outlying sample? I don't think there is one single best approach, but we could at least mention some caveats and important considerations.
One thing we might highlight is the need to report the characteristics of the outliers (how many were removed, the percentage, and possibly their features). For instance, we could add an `outliers` argument to `report_participants()` and `report_sample()` that takes a vector of outliers / the output of `check_outliers()` and adds a description of the outliers.

In `report_participants()`, it could look like: "Description of whole sample. Out of this sample, X (3.54%) of participants were flagged as outliers (demographic summary), leaving X participants in the final set (new demographics without)."

In `report_sample()`: @strengejacke, any ideas on how to present it?
> It would be nice to have a discussion or some thoughts about what to do after outlier detection: re-run analyses? Show results with and without outliers?
We already have a section called "Handling Outliers", where we describe the different types of outliers, what to do for each type, and whether to keep, exclude, or winsorize depending on the case. Does that correspond to what you wanted?
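For readers skimming this thread, those three strategies reduce to something like the base-R sketch below; the robust z-score cutoff and the winsorizing percentiles are arbitrary illustrations, not the paper's recommendations:

```r
x <- c(2.1, 1.9, 2.4, 2.2, 2.0, 9.8)       # one extreme value
is_out <- abs(x - median(x)) / mad(x) > 3  # robust z-score flag

x_keep <- x            # keep: leave the data untouched (but report the flags)
x_excl <- x[!is_out]   # exclude: drop the flagged observations

# Winsorize: cap extreme values at chosen percentiles instead of dropping them
q <- quantile(x, c(0.05, 0.95))
x_wins <- pmin(pmax(x, q[1]), q[2])
```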
> One thing we might highlight is the need to report the characteristics of the outliers (how many were removed, the percentage, and possibly their features).
We already kind of do that in the transparency section, no? (I added a mention of the percentage, though; thanks.)
> we could add an `outliers` argument to `report_participants()` and `report_sample()` that takes a vector of outliers / the output of `check_outliers()` and adds a description of the outliers.
It's true that we don't mention describing the characteristics of the outliers; perhaps I could add a paragraph on that. But do we really need a new function, when we can simply subset the data directly with the outlier object or, as you say, with a vector of outliers, if you have it (sketched more fully after this comment)?
report_participants(data[which(outliers), ])
And I wonder whether that assumes homogeneity of the outliers, which should not hold for random outliers. Even if they were homogeneous, there would be no way of knowing unless you also reported measures of homogeneity. And with as few observations as 1 to maybe 5 outliers, how meaningful would these data be? Especially if they are, e.g., error outliers that get excluded, any sample description would be meaningless. Maybe only for interesting outliers, to try to figure out a pattern? Please convince me why you think this is important :)
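A self-contained version of that subsetting idea; the example data frame and its columns are invented, while `check_outliers()` and `report_participants()` are the existing functions:

```r
library(performance)  # check_outliers()
library(report)       # report_participants()

dat <- data.frame(
  Age = c(22, 23, 54, 21, 26, 42, 18, 32, 24, 89),
  Sex = c("F", "F", "M", "M", "M", "F", "F", "F", "M", "F")
)

# Flag outliers on the numeric variable
outliers <- check_outliers(dat["Age"], method = "zscore_robust")

# Describe only the flagged participants
report_participants(dat[which(outliers), ])

# Describe the retained sample after exclusion
report_participants(dat[!as.logical(outliers), ])
```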
papers/Collabra/cover_letter.Rmd (outdated)
- In this sense, the paper fits very well with the special issue "Advances in Statistical Computing", as it essentially communicates to the wider public current advances in the statistical computing of outlier detection algorithms and their implementation in currently available open-source and free software. This makes the manuscript relevant to data science, behavioural science, and statistical computing more generally.
- It explains the key approaches, highlights recommendations, and shows how users can adopt them in their R analyses with just one function. The manuscript covers univariate, multivariate, and model-based statistical outlier detection methods, their recommended thresholds, standard output, and plotting methods, among other things.
+ Beyond acting as a concise review of outlier treatment procedures and a practical tutorial, we also describe a new method (a consensus-based approach) and discuss its benefits and limitations. In this sense, the paper fits very well with the scope of the journal, as it essentially communicates to the wider public current advances in the statistical computing of outlier detection algorithms and their implementation in currently available open-source and free software. This makes the manuscript relevant to data science, behavioural science, and statistical computing more generally.
Actually, we don't discuss the benefits and limitations of the consensus-based approach right now. @DominiqueMakowski, what are the method's benefits and limitations exactly? Since I believe you came up with the method, if you give me some pointers I may be able to flesh them out in the paper.
Alright, so I have finished updating the cover letter and the manuscript, and writing the response to reviewers. @IndrajeetPatil @strengejacke @DominiqueMakowski @mattansb, this is the last opportunity to review those documents before I resubmit. Also, two small things for the paper:
I'll create a new PR for JOSE. Should we merge this one, then?
Merge? Or delete?
Hmm, delete, probably, yeah. I was thinking we might want to keep archives of the files and such, but it was bugging the JOSE submission system anyway, so I deleted it in the other PR as well.
New Collabra paper draft.
Real-time PDF: https://github.com/easystats/performance/blob/collabra_paper/papers/Collabra/paper.pdf
Edit: reminder to use [skip ci] in this PR
Follow-up on #544