refactoring the correlation package #298

Open
3 of 4 tasks
mattansb opened this issue Aug 23, 2023 · 6 comments

Comments

@mattansb
Member

mattansb commented Aug 23, 2023

As discussed in our meeting yesterday, the correlation package should be broken down (and built up!) into the following "bits":

A simple 1:1 correlation function (currently done in cor_test())

(#261, #260)

  • input: 2 vectors
  • methods: any of the ~10 currently available
    • Pearson correlations can also be Bayesian (via BayesFactor)
  • output: a tidy data frame - with CIs and p-values
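
To make the input/output contract above concrete, here is a minimal, language-neutral sketch in Python (standard library only). It is not the package's R API: `pearson` and `cor_pair` are hypothetical names, only the Pearson method is shown, and the CI and p-value come from the Fisher z normal approximation, which is an assumption made for brevity and not necessarily what cor_test() does.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = math.fsum(x) / n, math.fsum(y) / n
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.fsum((a - mx) ** 2 for a in x)
    syy = math.fsum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cor_pair(x, y, ci=0.95):
    """Hypothetical helper: return one tidy 'row' (a dict) for a pair of vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 4:
        raise ValueError("the Fisher z interval needs at least 4 observations")
    r = pearson(x, y)
    z = math.atanh(r)              # Fisher z; fails if |r| is exactly 1
    se = 1 / math.sqrt(n - 3)      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + ci / 2)
    p = 2 * (1 - NormalDist().cdf(abs(z) / se))  # normal approximation, not the t test
    return {
        "r": r,
        "ci_low": math.tanh(z - crit * se),
        "ci_high": math.tanh(z + crit * se),
        "p": p,
        "n": n,
    }


row = cor_pair([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(row)
```

Returning one flat record per pair is what makes the "tidy" output easy to stack into a long table later.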

The methods for tetrachoric, polychoric, and biserial correlations can be improved, I think.

Things left to do:

  • Finish docs for cor_test()
  • address all TODOs
  • Fix unit tests
  • Allow x and y to be vectors?

A correlation "matrix" function

(#292, #217, #232)

  • input: a data frame (or data frames) with or without those handy select arguments.
    • should also support grouped data frames
  • methods: same as for the 1:1 variant
  • output: a tidy (long) data frame
    • ... that can be transformed into a matrix-like output (currently via the summary() method).
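
The long-to-matrix idea above can be sketched as follows, again as a language-neutral Python illustration and not the package's code. `cor_long` and `as_matrix` are hypothetical names, the "data frame" is assumed to be a plain dict of columns, and only Pearson is computed.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = math.fsum(x) / n, math.fsum(y) / n
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.fsum((a - mx) ** 2 for a in x)
    syy = math.fsum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cor_long(data):
    """Long (tidy) output: one record per unordered pair of columns."""
    return [
        {"var1": a, "var2": b, "r": pearson(data[a], data[b])}
        for a, b in combinations(data, 2)
    ]


def as_matrix(rows):
    """Pivot the long records into a symmetric nested-dict 'matrix'."""
    names = []
    for row in rows:
        for name in (row["var1"], row["var2"]):
            if name not in names:
                names.append(name)
    # Diagonal is 1.0; off-diagonal cells are filled from the long records.
    mat = {a: {b: (1.0 if a == b else None) for b in names} for a in names}
    for row in rows:
        mat[row["var1"]][row["var2"]] = row["r"]
        mat[row["var2"]][row["var1"]] = row["r"]
    return mat


data = {"a": [1, 2, 3, 4], "b": [2, 1, 4, 3], "c": [4, 3, 2, 1]}
rows = cor_long(data)
mat = as_matrix(rows)
for name, cells in mat.items():
    print(name, cells)
```

Keeping the long form as the primary result and deriving the matrix from it means extra columns (CIs, p-values) can ride along in the records without changing the pivot step.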

A function for part/partial correlation

(#311, #301, #204?, #181)

This function will also produce multilevel correlations (#253, #207)?

  • input: a data frame with those handy select arguments to control
    • Need to be able to control what x/y are and what z are, and if z is partialled out from x, y, or both.
    • should also support grouped data frames?
  • methods: only Pearson for now?
  • output: a tidy (long) data frame
    • ... that can be transformed into a matrix-like output (currently via the summary() method).
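
The choice of whether z is partialled out of x, y, or both can be illustrated with a small residual-based sketch. This is a language-neutral Python illustration under stated assumptions, not the planned R interface: a single covariate z, Pearson only, and hypothetical names (`residualize`, `part_cor`, `partial_out`).

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = math.fsum(x) / n, math.fsum(y) / n
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.fsum((a - mx) ** 2 for a in x)
    syy = math.fsum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(v, z):
    """Residuals of v after a simple least-squares regression on z."""
    n = len(v)
    mv, mz = math.fsum(v) / n, math.fsum(z) / n
    slope = (
        math.fsum((zi - mz) * (vi - mv) for zi, vi in zip(z, v))
        / math.fsum((zi - mz) ** 2 for zi in z)
    )
    return [vi - mv - slope * (zi - mz) for vi, zi in zip(v, z)]


def part_cor(x, y, z, partial_out="both"):
    """Partial ('both') or part/semi-partial ('x' or 'y') correlation."""
    if partial_out not in ("x", "y", "both"):
        raise ValueError("partial_out must be 'x', 'y', or 'both'")
    rx = residualize(x, z) if partial_out in ("x", "both") else x
    ry = residualize(y, z) if partial_out in ("y", "both") else y
    return pearson(rx, ry)


x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
z = [1, 3, 2, 5, 4, 7]
print(part_cor(x, y, z, "both"))  # partial correlation
print(part_cor(x, y, z, "x"))     # part correlation, z removed from x only
```

With several covariates the residualizing step would become a multiple regression, but the x/y/both switch would work the same way.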

Things to keep

  • The current plotting options in see are good.
  • All the cor_*() functions also, I think?
  • is.cor and isSquare (maybe rename to snake-case?)
  • z_fisher() (rename to fishers_z, which is more in line with the "named" statistic convention in effectsize?)

Also welcoming @TomGeva, who will be working on this with @bwiernik and myself

WIP can be found here: https://github.com/TomGeva/correlation2

@strengejacke
Member

Hi Tom, welcome on board! Great to see you participating in the easystats project! 🎉

@rempsyc
Member

rempsyc commented Aug 23, 2023

Welcome Tom! :) good luck with all this work 😉

@DominiqueMakowski
Member

Hi @TomGeva good to have you ☺️ Looking forward to that roadmap and don't hesitate to nag us if there's anything

@mattansb
Member Author

BTW, I have some code and a precompiled model (written for a client) to estimate full Bayesian correlation matrices (and partial correlation matrices) using {cmdstanr} (and {posterior}). We might want to use that instead of {BayesFactor}.

@TomGeva

TomGeva commented Dec 6, 2023

Happy to join! Thanks for the kind words 😁

@DominiqueMakowski
Member

We might want to use that instead of {BayesFactor}

Yeah, it'd be great to have more flexibility in terms of prior setting and all, assuming it preserves some of the advantages of BayesFactor (you don't need to compile Stan stuff, it's fast, and it gives BFs out of the box)


5 participants