
New Collabra paper draft #568

Closed · wants to merge 6 commits

Conversation

@rempsyc (Member) commented Apr 2, 2023

New Collabra paper draft.

Real-time PDF: https://github.com/easystats/performance/blob/collabra_paper/papers/Collabra/paper.pdf

Edit: reminder to use [skip ci] in this PR

Follow-up on #544

@rempsyc (Member, Author) commented Apr 2, 2023

Should we streamline the review process for Collabra?

> Streamlined review?
>
> Authors whose articles have been rejected within the previous 365 days from other journals for reasons that are not due to lack of scientific, methodological, or ethical rigor are welcome to submit prior reviews and decision letters along with their submission, and request a streamlined review in their cover letter.

@easystats deleted a comment from codecov-commenter Apr 2, 2023
@rempsyc (Member, Author) commented Apr 2, 2023

@bwiernik, you mentioned the paper would need some polishing for Collabra. Any specific suggestions on the current version?

I did add the height-weight code demo as discussed earlier.

@DominiqueMakowski mentioned maybe arguing more strongly in favour of our multiple-methods approach, but I am not really able to do that. So Dom, would you like to make an attempt? If not, no worries, we can leave it as is.

Also, there is no rticles template for Collabra, so I didn't know which template to use for the paper. Since they use double-blind review, I used the preprint archive template for now, because I didn't want to have to manually change all the meta-info parameters back to classic R Markdown again.

@rempsyc (Member, Author) commented Apr 2, 2023

Also, in the paper, we cite easystats as follows:

Lüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Brenton M. Wiernik, Etienne Bacher, Rémi Thériault, and Dominique Makowski. 2022. “Easystats: Framework for Easy Statistical Modeling, Visualization, and Reporting.” CRAN. https://easystats.github.io/easystats/.

However, that does not correspond to the order of authors on the package website or using report::cite_easystats().

Lüdecke, D., Makowski, D., Ben-Shachar, M. S., Patil, I., Wiernik, B. M., Bacher, Etienne, & Thériault, R. (2023). easystats: Streamline model interpretation, visualization, and reporting (0.6.0) [R package]. https://easystats.github.io/easystats/ (Original work published 2019)

So @IndrajeetPatil which citation/author order is correct!?

@IndrajeetPatil (Member) commented:

> Also, in the paper, we cite easystats as follows:
>
> Lüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Brenton M. Wiernik, Etienne Bacher, Rémi Thériault, and Dominique Makowski. 2022. “Easystats: Framework for Easy Statistical Modeling, Visualization, and Reporting.” CRAN. https://easystats.github.io/easystats/.
>
> However, that does not correspond to the order of authors on the package website or using report::cite_easystats().
>
> Lüdecke, D., Makowski, D., Ben-Shachar, M. S., Patil, I., Wiernik, B. M., Bacher, Etienne, & Thériault, R. (2023). easystats: Streamline model interpretation, visualization, and reporting (0.6.0) [R package]. https://easystats.github.io/easystats/ (Original work published 2019)
>
> So @IndrajeetPatil which citation/author order is correct!?

Why are we citing the easystats meta-paper?! I don't think we should cite it. We haven't done so for any of the other publications. We can just mention it, and even link to the GitHub organization, but I don't think we need to cite it here. That said, we can cite the relevant JOSS papers. E.g., in the ggstatsplot paper, this is what I do:

[Screenshot: citation of the relevant JOSS papers in the ggstatsplot paper]

> So @IndrajeetPatil which citation/author order is correct!?

At some point, we will need to decide on the order, but I feel it's a bit early for that.

@rempsyc (Member, Author) commented Apr 3, 2023

I believe @DominiqueMakowski added the reference to easystats. I don't think it's necessarily a bad idea, even if it's early and things could change. I like, for instance, referring to the website.

I mean, that's the output of citation("easystats"), so people are probably already citing it that way. If that's a problem, maybe we need to change it?
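
For reference, the two citations can be compared directly in R (a minimal sketch, assuming easystats and report are installed; the output itself is not reproduced here):

```r
# Citation from the CITATION file shipped with the easystats package
citation("easystats")

# Citation text assembled by report, as quoted above
report::cite_easystats()
```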

I had not realized that the two titles were different though. As you say, the first one is probably the meta-paper, whereas in the second one, we're simply citing the website/package, so I think that's fine?

@rempsyc (Member, Author) commented Apr 9, 2023

Would anybody else like to make a last reread of the paper before I submit to Collabra, @easystats/core-team? I would like to submit next weekend if nobody requests changes. Thanks.


## What happens after?

See comment in the thread.

A member commented:

It would be nice to have a discussion or some thoughts about what to do after outlier detection: re-run analyses? Show results with and without outliers? Describe the outlying sample? I don't think there is a single best approach, but we could at least mention some caveats and important points.

One thing we might highlight is the need to report the characteristics of the outliers (how many were removed, the percentage, and possibly their features). For instance, we could add an argument, outliers, to report_participants() and report_sample() that takes a vector of outliers / the output of check_outliers(), and would add a description of the outliers.

In report_participants(), it could look like: "Description of the whole sample. Out of this sample, X participants (3.54%) were flagged as outliers (demographic summary), leaving X participants in the final set (new demographics without them)."

In report_sample(): @strengejacke, any ideas on how to present it?
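
A rough illustration of the proposal above (purely hypothetical: no `outliers` argument exists in report_participants() or report_sample() at the time of writing; `data` and the column names are placeholders):

```r
# Hypothetical interface sketch (not an existing report API):
# report_participants(data, outliers = flagged)

# Something similar can already be assembled manually:
library(performance)
library(report)

flagged <- check_outliers(data[c("score_1", "score_2")])  # flag outliers
n_flagged <- sum(as.numeric(flagged))                     # count them

paste0(
  report_participants(data),
  " Out of this sample, ", n_flagged, " participants (",
  round(100 * n_flagged / nrow(data), 2),
  "%) were flagged as outliers."
)
```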

@rempsyc (Member, Author) commented Apr 17, 2023

> It would be nice to have a discussion or some thoughts about what to do after outlier detection: re-run analyses? Show results with and without outliers?

We do already have a section called "Handling Outliers", where we describe the different types of outliers, what to do for each type, and whether to keep, exclude, or winsorize depending on the case. Does that correspond to what you wanted?
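
For context, the handling options mentioned there can be sketched roughly as follows (a minimal sketch, assuming datawizard is installed; `data` and `reaction_time` are placeholders):

```r
library(performance)
library(datawizard)

# Flag potential outliers on a variable of interest.
flagged <- check_outliers(data$reaction_time)

# Option 1: exclude the flagged observations.
data_excluded <- data[!flagged, ]

# Option 2: winsorize the variable instead of excluding observations.
data$reaction_time_w <- winsorize(data$reaction_time, threshold = 0.2)
```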

> One thing we might highlight is the need to report the characteristics of the outliers (how many were removed, the percentage, and possibly their features)

We already kind of do that in the transparency section, no? (I added a mention of the percentage, though, thanks.)

> we could add an argument, outliers, to report_participants() and report_sample() that takes a vector of outliers / the output of check_outliers(), and would add a description of the outliers.

It's true that we don't mention describing the characteristics of the outliers; perhaps I could add a paragraph on that. But do we really need a new function, when we can simply subset the data directly with the outlier object or, as you say, with a vector of outliers, if you have it?

`report_participants(data[which(outliers), ])`

And I wonder if that assumes homogeneity of the outliers, which should not hold for random outliers. Even if it did, there would be no way of knowing unless you also report measures of homogeneity. And with as few as 1 to maybe 5 outliers, how meaningful would these data be? Especially if they are, e.g., error outliers that are excluded, any sample description would be meaningless. Maybe only for interesting outliers, to try to figure out a pattern? Please convince me why you think this is important :)
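
A minimal sketch of that manual approach (assuming a data frame `data` containing the demographic columns report_participants() looks for, and following the subsetting pattern from the performance documentation; column names are placeholders):

```r
library(performance)
library(report)

# Flag multivariate outliers on the variables of interest.
outliers <- check_outliers(data[c("score_1", "score_2")])

# Describe the flagged participants...
report_participants(data[which(outliers), ])

# ...and the retained sample.
report_participants(data[!outliers, ])
```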

In this sense, the paper fits very well with the special issue "Advances in Statistical Computing", as it essentially communicates to the wider public current advances in the statistical computing of outlier detection algorithms and their implementation in currently available open source and free software. This makes the manuscript relevant to data science, behavioural science, and statistical computing more generally.
It explains the key approaches, highlights recommendations, and shows how users can adopt them in their R analyses with just one function. The manuscript covers univariate, multivariate, and model-based statistical outlier detection methods, their recommended thresholds, standard outputs, and plotting methods, among other things.

Beyond acting as a concise review of outlier treatment procedures and a practical tutorial, we also describe a new method (a consensus-based approach) and discuss its benefits and limitations. In this sense, the paper fits very well with the scope of the journal, as it essentially communicates to the wider public current advances in the statistical computing of outlier detection algorithms and their implementation in currently available open source and free software. This makes the manuscript relevant to data science, behavioural science, and statistical computing more generally.

@rempsyc (Member, Author) commented Apr 17, 2023

Actually, we don't discuss the benefits and limitations of the consensus-based approach right now. @DominiqueMakowski, what are the method's benefits and limitations exactly? Since I believe you came up with the method, if you give me some pointers I may be able to flesh it out in the paper.

@rempsyc (Member, Author) commented Apr 17, 2023

Alright, so I have finished updating the cover letter and the manuscript, and writing the response to reviewers.

@IndrajeetPatil @strengejacke @DominiqueMakowski @mattansb, this is the last opportunity to review those documents before I resubmit. Also, two small things for the paper:

  1. On p. 6 of the PDF, I added a summary decision table. Thoughts? Is it OK?
  2. On p. 7, I wanted to draw a nice scatter plot to show the model expectation without adding too many lines of code. But I was not able to do this using see, ggplot2, or base R, so I just used rempsyc::nice_scatter(). I don't know how I feel about using rempsyc here, though. Is it OK?

@rempsyc (Member, Author) commented May 13, 2023

I'll create a new PR for JOSE. Should we merge this then?

@rempsyc mentioned this pull request May 13, 2023
@mattansb (Member) commented:

Merge? Or delete?

@rempsyc (Member, Author) commented May 14, 2023

Hum, delete, probably, yeah. I was thinking of keeping it in case we want to keep archives or files and such, but it was bugging the JOSE submission system anyway, so I deleted it in the other PR as well.

@rempsyc closed this May 14, 2023
@IndrajeetPatil deleted the collabra_paper branch December 3, 2023 11:53