First principles datasets #181

Open: gAldeia wants to merge 3 commits into master

gAldeia commented Sep 3, 2024

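Data comes from two symbolic regression repos:
- Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
- Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis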

All of them are datasets with a known first-principles equation, and they were used in their respective papers to show that symbolic regression can potentially retrieve the original equation when only observational data is available.

While some of them have only a few samples and others are synthetically generated, they are all challenging for symbolic regression methods and can be used to benchmark these algorithms.

The goal of adding them to PMLB is to help other users quickly set up experiments with this data.
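A minimal sketch of what a synthetic first-principles dataset of this kind looks like, assuming PMLB's convention of a tab-separated table whose response column is named `target`. The equation (Newton's law of gravitation), the column names `m1`, `m2`, `r`, and the sampling ranges are hypothetical illustrations, not one of the datasets in this PR. The sketch generates observations from a known equation, serializes them, and scores a candidate expression with R², the kind of check a symbolic regression benchmark might run.

```python
import csv
import io
import random


def newton_gravity(m1: float, m2: float, r: float, g: float = 6.674e-11) -> float:
    """Hypothetical ground-truth equation: F = G * m1 * m2 / r**2."""
    return g * m1 * m2 / r**2


def make_dataset(n_samples: int, seed: int = 0) -> list[dict[str, float]]:
    """Sample inputs uniformly (assumed ranges) and compute the target from the equation."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        m1 = rng.uniform(1.0, 10.0)
        m2 = rng.uniform(1.0, 10.0)
        r = rng.uniform(1.0, 5.0)
        rows.append({"m1": m1, "m2": m2, "r": r, "target": newton_gravity(m1, m2, r)})
    return rows


def to_pmlb_tsv(rows: list[dict[str, float]]) -> str:
    """Serialize as tab-separated text with the response in a column named 'target'."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination; 1.0 means the candidate reproduces the data exactly."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    tsv = to_pmlb_tsv(make_dataset(100))
    # Prints the header row: m1, m2, r, target (tab-separated).
    print(tsv.splitlines()[0])
    # Score a candidate expression, e.g. one returned by a symbolic regression run.
    parsed = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
    y = [float(row["target"]) for row in parsed]
    candidate = [
        6.674e-11 * float(row["m1"]) * float(row["m2"]) / float(row["r"]) ** 2
        for row in parsed
    ]
    print(r2_score(y, candidate))
```

Once the datasets are merged, a user would presumably skip the generation step and load them by name with the `pmlb` package, e.g. `fetch_data(name, return_X_y=True)`, then run the same kind of scoring against the recovered expression.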

I still need to write proper metadata for them. My understanding is that opening a PR triggers a GitHub Action that pushes some new files to my fork, which I should complete before the new datasets go to review. Please let me know if I got anything wrong and need to update it!

gAldeia and others added 3 commits September 3, 2024 18:48
Data comes from two symbolic regression repos:
- Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
- Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis

CI was failing to parse the contents of these specific ones.