Task and Dataset definition #3
sebffischer
started this conversation in
OpenML Design and Feature Requests
Replies: 2 comments 8 replies
My thoughts about this are:
We haven't had a proper procedure in place, which is why it's so hard to change things. We should fix this soon. When defining new versions of the metadata schema, it would be good to also look at standardization initiatives that have emerged in recent years:
I wanted to start a general discussion on the metadata of tasks and datasets.
One thing I have noticed is that it would be great to have more fields available in the task. There are a couple of concrete examples that I have in mind:
Multiple target variables
There are datasets where different target variables are possible (e.g. y1, y2).
Currently, as far as I know, the only way to handle this is to upload the dataset twice (once ignoring y1 and once ignoring y2). It would be nice if one could set a "features" field in a task. That way one would only have to upload the dataset once and could simply exclude y2 (when predicting y1) or y1 (when predicting y2) via the "features" field.
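To make the idea concrete, here is a rough sketch of a task that carries its own feature list. Everything in it (`TaskSpec`, `make_task`, the `exclude` argument, the column names) is hypothetical and only illustrates the proposal; none of it is an existing OpenML structure or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical task definition (not an existing OpenML structure)."""
    target: str
    features: tuple


def make_task(dataset_columns, target, exclude=()):
    """Build a task whose features are all columns except the target
    and any explicitly excluded columns."""
    if target not in dataset_columns:
        raise ValueError(f"unknown target column: {target!r}")
    features = tuple(
        c for c in dataset_columns if c != target and c not in exclude
    )
    return TaskSpec(target=target, features=features)


# One uploaded dataset, two tasks defined on top of it.
columns = ["x1", "x2", "y1", "y2"]
task_y1 = make_task(columns, target="y1", exclude=["y2"])
task_y2 = make_task(columns, target="y2", exclude=["y1"])
```

Both tasks point at the same dataset; only the target and the feature list differ.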
Group column
In general, I think there are cases where I want columns to be available for download even though they are neither features nor the target variable. One example is a group id.
Let's say I have a dataset with the column "group" that I don't want to use as a feature, but that I want to use, for example, to split my data locally (without an OpenML task). If I include it in the dataset (i.e. don't mark it as an ignore column), then it is included in every OpenML task that is created for the dataset, afaik.
If I don't include it, it is not available for download and I cannot use it to create the data splits.
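As an illustration of the local use case, here is a minimal sketch of a group-aware train/test split in plain Python. The function name, the `test_fraction` and `seed` parameters, and the toy group labels are my own assumptions; the point is only that the "group" column has to be downloadable for something like this to work, even though it is never used as a feature.

```python
import random


def group_train_test_split(groups, test_fraction=0.25, seed=0):
    """Split row indices so that all rows of a group land on the same side."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out at least one whole group for testing.
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# The "group" column is only used to build the split, not as a feature.
groups = ["a", "a", "b", "b", "c", "c", "d", "d"]
train_idx, test_idx = group_train_test_split(groups)
```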
Positive class
For classification problems it would be nice to be able to set a "positive class".
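A small sketch of why this matters: measures such as precision depend on which class is treated as positive, so the same predictions give different scores unless the task fixes that choice. The `precision` helper and the labels below are illustrative assumptions, not part of any OpenML API.

```python
def precision(y_true, y_pred, positive):
    """Precision with respect to the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


y_true = ["sick", "sick", "healthy", "healthy", "healthy"]
y_pred = ["sick", "healthy", "healthy", "healthy", "sick"]

# Same predictions, different score depending on the positive class.
p_sick = precision(y_true, y_pred, positive="sick")
p_healthy = precision(y_true, y_pred, positive="healthy")
```

With a "positive class" field on the task, everyone evaluating on that task would compute such measures the same way.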
Other fields / properties that would be possible but not so important imo are e.g. "weights" or "stratum".