
[ENH] v2 Work on tracking constant columns efficiently #121

Closed · wants to merge 12 commits

Conversation

@adam2392 (Collaborator) commented Sep 2, 2023

Closes: #115

Changes proposed in this pull request:

  • Enables tracking of feature columns that are constant at a given split node. Since the tree is built sequentially, the number of known constants is passed down to lower levels, which use this information to skip those columns (see the sketch after this list).
  • According to my benchmarks, this does not improve runtime much, but I am still unable to run the asv benchmarks, so there may be room for improvement.
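
For context, here is a minimal Python sketch of the bookkeeping described above, assuming a scikit-learn-style splitter (the actual implementation is Cython; the function and names below are hypothetical):

```python
import numpy as np

def split_node(X, sample_indices, features, n_known_constants):
    """Hypothetical sketch: `features` is a permutation of column indices
    whose first `n_known_constants` entries are columns already known to be
    constant for every sample reaching this node."""
    n_total_constants = n_known_constants
    # Scan only columns not already known to be constant.
    for i in range(n_known_constants, len(features)):
        col = features[i]
        values = X[sample_indices, col]
        if values.max() - values.min() <= 1e-12:  # simplified constancy check
            # Newly found constant: swap it into the constant prefix so that
            # child nodes skip it without rechecking.
            features[i], features[n_total_constants] = (
                features[n_total_constants], features[i])
            n_total_constants += 1
        # ... otherwise, evaluate candidate splits on `col` here ...
    # Children inherit the enlarged constant count.
    return n_total_constants
```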

Some ideas for the next PR:

  1. Sample each projection vector inside the while loop instead of sampling the whole matrix at once. This lets us hash each sampled projection, so we can skip duplicates and ignore any constants found in earlier stages of the split (see the sketch below).
  2. Explore inlining methods.
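
A rough sketch of idea 1, assuming an oblique-style splitter that draws sparse two-column projections (all names here are illustrative, not the actual API):

```python
import numpy as np

def sample_projections(n_features, constant_mask, max_tries, rng):
    """Yield sparse projection vectors one at a time, hashing each one so
    duplicates are skipped, and never touching known-constant columns."""
    seen = set()
    valid_cols = np.flatnonzero(~constant_mask)  # exclude constant columns
    for _ in range(max_tries):
        cols = rng.choice(valid_cols, size=2, replace=False)
        signs = rng.choice([-1.0, 1.0], size=2)
        key = (tuple(cols), tuple(signs))
        if key in seen:  # this exact projection was already evaluated
            continue
        seen.add(key)
        proj = np.zeros(n_features)
        proj[cols] = signs
        yield proj  # caller evaluates the split on X @ proj

# Usage:
# rng = np.random.default_rng(0)
# for proj in sample_projections(10, np.zeros(10, dtype=bool), 25, rng):
#     ...
```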

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

@adam2392 marked this pull request as ready for review September 11, 2023 18:50
codecov bot commented Sep 11, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.01% ⚠️

Comparison is base (b582895) 87.68% compared to head (7efaee4) 87.68%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
- Coverage   87.68%   87.68%   -0.01%     
==========================================
  Files          28       28              
  Lines        2323     2322       -1     
==========================================
- Hits         2037     2036       -1     
  Misses        286      286              
| Files Changed | Coverage | Δ |
|---|---|---|
| sktree/tests/test_supervised_forest.py | 99.41% <100.00%> | (ø) |
| sktree/tree/tests/test_tree.py | 99.51% <100.00%> | (-0.01%) ⬇️ |
| sktree/tree/tests/test_utils.py | 98.79% <100.00%> | (ø) |


@sampan501 (Member) left a comment

Why do you try to avoid splitting on constant features in node_split?

Wouldn't it be easier to preprocess X by just removing the constant features from it, before running any of the tree-building code?

@adam2392 (Collaborator, Author)

It is actually to track constants that arise after a certain point. E.g., after you split 4 times, the remaining samples of columns 10 and 20 may be constant, so at any node underneath there is no point in including column 10 or 20 in the oblique combination. A global preprocessing pass only removes columns that are constant over the whole dataset; see the toy example below.
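
A toy illustration of this (hypothetical data): column 1 is not constant over the full dataset, so a preprocessing pass would keep it, yet it becomes constant for the samples reaching a deeper node:

```python
import numpy as np

X = np.array([[0.0, 5.0],
              [1.0, 5.0],
              [2.0, 5.0],
              [3.0, 9.0]])

node_samples = np.array([0, 1, 2])   # samples routed to some deeper node
print(np.ptp(X[:, 1]))               # 4.0 -> column 1 non-constant globally
print(np.ptp(X[node_samples, 1]))    # 0.0 -> constant at this node
```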

@PSSF23 (Member) left a comment

How do we test that this feature works properly? The predictive performance should stay much the same, so do we look for wall-time differences? I also wonder how patch oblique works with it.

@adam2392 (Collaborator, Author)

Perhaps compare the depth of the tree on a very simple setup; see the rough test sketch below.
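
Something along these lines, perhaps (a rough sketch only; the estimator import is assumed, and the exact assertion would need tuning since the random projection sampling differs between the two fits):

```python
import numpy as np
from sktree.tree import ObliqueDecisionTreeClassifier  # assumed import path

def test_constant_columns_do_not_deepen_tree():
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = (X[:, 0] > 0).astype(int)
    X_padded = np.hstack([X, np.ones((200, 5))])  # append constant columns

    depth = ObliqueDecisionTreeClassifier(random_state=0).fit(X, y).get_depth()
    depth_padded = (
        ObliqueDecisionTreeClassifier(random_state=0)
        .fit(X_padded, y)
        .get_depth()
    )
    # With constants tracked, padding should not inflate tree depth much.
    assert depth_padded <= depth + 2
```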

@adam2392 (Collaborator, Author)

As of now, we would not use this feature for patch oblique.

@adam2392 (Collaborator, Author)

> How do we test that this feature works properly? The predictive performance should stay much the same, so do we look for wall-time differences? I also wonder how patch oblique works with it.

Does anyone in the lab have experience benchmarking and profiling compiled code? Heh. It would be useful to determine whether this branch is faster than main via one of the benchmarks in benchmarks_nonasv/ or benchmarks/.

@PSSF23 (Member) left a comment

I used time.perf_counter() to measure wall times, as in the example here:

https://github.com/neurodata/SDTF/blob/eb2545b8cd50503723d619497059510e37b3e7ad/benchmarks/code/cifar10.py#L35-L38
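
That pattern boils down to this (the timed workload below is a placeholder for the actual fit call):

```python
import time

start = time.perf_counter()
sum(i * i for i in range(10**6))  # placeholder for the code under measurement
elapsed = time.perf_counter() - start
print(f"wall time: {elapsed:.3f} s")
```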
