-
-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] v2 Work on tracking constant columns efficiently #121
Conversation
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
…into constantsv2
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #121 +/- ##
==========================================
- Coverage 87.68% 87.68% -0.01%
==========================================
Files 28 28
Lines 2323 2322 -1
==========================================
- Hits 2037 2036 -1
Misses 286 286
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you try to avoid splitting on constant features in node_split
?
Wouldn't it be easier to preprocess X by just removing constant features from it? Which could be done before running any of the tree building code?
It is actually to track constants after a certain point. E.g. you split 4 times and then the rest of column 10 and 20 are constant so at any node underneath, there is no point in splitting on column 10 or 20 in the oblique combination. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we test that this feature works properly? The performance should stay very much the same, so we look for wall time differences? I also wonder how patch oblique works with it.
Perhaps depth of the tree on a very simple setup. |
Patch oblique we would not use this feature as of now. |
Anyone in the lab have exp benchmarking and profiling compiled code? Heh. It would be useful to determine if this vs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used time.perf_counter()
to measure the wall times, like the example here:
Closes: #115
Changes proposed in this pull request:
Some ideas for the next PR:
Before submitting
section of the
CONTRIBUTING
docs.Writing docstrings section of the
CONTRIBUTING
docs.After submitting