Add a configurable filter list to HSC data set #53

mtauraso · 2024-08-27T23:18:23Z

This is based on PR #49, and it's important for @mtauraso to change the base branch to main before merging.

I've set the base branch properly so the change is just what's added to the PR #49 branch.

Given dataloader:filters as config, the dataloader will:

- Only scan files which are part of its filter set
- Prune objects where the full list of filters provided are not present on the filesystem.
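The two behaviors above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `<object_id>_<filter>.fits` filename layout, the `FILENAME_RE` pattern, and the `prune_objects` helper are all assumptions made for the example, and `filters` stands in for whatever list `dataloader:filters` provides.

```python
import re
from collections import defaultdict

# Hypothetical filename layout: "<object_id>_<filter>.fits". The real HSC
# cutout naming scheme is not shown in this PR, so this is an assumption.
FILENAME_RE = re.compile(r"^(?P<object_id>\d+)_(?P<filter>[A-Za-z0-9-]+)\.fits$")


def prune_objects(filenames, filters):
    """Return {object_id: {filter: filename}} for objects that have every filter."""
    wanted = set(filters)
    found = defaultdict(dict)
    for name in filenames:
        m = FILENAME_RE.match(name)
        # Skip files that don't match the pattern or are outside the filter set.
        if m is None or m["filter"] not in wanted:
            continue
        found[m["object_id"]][m["filter"]] = name
    # Keep only objects where the full list of requested filters is present.
    return {obj: files for obj, files in found.items() if wanted.issubset(files)}
```

For example, with `filters = ["HSC-G", "HSC-R"]`, an object that only has a G-band file on disk would be dropped, while an object with both G and R files would be kept.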

github-actions · 2024-08-27T23:21:44Z

Before [`79293c8`]	After [`78c2f2c`]	Ratio	Benchmark (Parameter)
3.39±0.7s	1.19±0.9s	~0.35	benchmarks.time_computation
320	3.63k	11.35	benchmarks.mem_list


codecov · 2024-08-27T23:22:05Z

Codecov Report

Attention: Patch coverage is 94.73684% with 4 lines in your changes missing coverage. Please review.

Project coverage is 47.08%. Comparing base (3bffb95) to head (7693a07).
Report is 11 commits behind head on issue/35/cutout-interface-cleanup.

Files	Patch %	Lines
src/fibad/data_loaders/hsc_data_loader.py	94.73%	4 Missing ⚠️

Additional details and impacted files

@@                          Coverage Diff                          @@
##           issue/35/cutout-interface-cleanup      #53      +/-   ##
=====================================================================
+ Coverage                              44.08%   47.08%   +3.00%     
=====================================================================
  Files                                     16       16              
  Lines                                    549      584      +35     
=====================================================================
+ Hits                                     242      275      +33     
- Misses                                   307      309       +2



aritraghsh09

Looks good! One minor comment.

aritraghsh09 · 2024-08-28T00:30:10Z

src/fibad/data_loaders/hsc_data_loader.py

+            m = re.match(full_regex, filename)
+
+            # Skip files that don't match the pattern.
+            if m is None:


If it doesn't make the process super slow, can we log the name of the file being skipped?

mtauraso · 2024-08-29

I'm more worried about log spam. Adding a debug or info level log here shouldn't slow things down unless the log is being emitted to a console.

I am thinking though that the better solution is to output that manifest FITS table, which will have all the skipped files explicitly and not create a potential foot-gun for people changing the logging level to info/debug.

drewoldag · 2024-08-29

I would advocate for @mtauraso's approach here. Perhaps a middle ground would be logging some summary metrics at the end along with a message saying to look in the manifest FITS table for skipped files?

aritraghsh09 · 2024-08-29

Yes, putting the info in the manifest table sounds good! I also like @drewoldag's idea of some summary metrics at the end if it's easy to implement!
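A rough sketch of the middle ground discussed in this thread: collect skipped files during the scan and emit a single summary log line at the end, leaving the per-file detail to the manifest. The `scan_files` helper and its return shape are hypothetical, and writing the manifest table itself is left out; only the `re.match` skip check mirrors the diff above.

```python
import logging
import re

logger = logging.getLogger(__name__)


def scan_files(filenames, full_regex):
    """Split filenames into matched and skipped, logging only a summary."""
    matched, skipped = [], []
    for filename in filenames:
        m = re.match(full_regex, filename)
        # Skip files that don't match the pattern, but remember them.
        if m is None:
            skipped.append(filename)
            continue
        matched.append(filename)

    # One summary line instead of one log line per skipped file.
    logger.info(
        "Scanned %d files: %d matched, %d skipped. "
        "See the manifest table for the skipped file names.",
        len(matched) + len(skipped),
        len(matched),
        len(skipped),
    )
    return matched, skipped
```

Because the skipped names are returned rather than logged, the caller can record them in the manifest, and the log volume stays constant no matter how many files are skipped or which logging level is set.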

mtauraso requested review from aritraghsh09 and drewoldag August 27, 2024 23:18

mtauraso self-assigned this Aug 27, 2024

mtauraso linked an issue Aug 27, 2024 that may be closed by this pull request

Make HSC data loader accept a list of filters #34

Closed

mtauraso changed the base branch from main to issue/35/cutout-interface-cleanup August 27, 2024 23:25

Adding configurable list of filters to dataloader

5277c7f

Given dataloader:filters as config, the dataloader will:

- Only scan files which are part of its filter set
- Prune objects where the full list of filters provided are not present on the filesystem.

mtauraso force-pushed the issue/34/filter-list branch from 7693a07 to 5277c7f Compare August 27, 2024 23:27

aritraghsh09 approved these changes Aug 28, 2024


Base automatically changed from issue/35/cutout-interface-cleanup to main August 29, 2024 18:12

mtauraso merged commit 2fb04de into main Aug 29, 2024

mtauraso deleted the issue/34/filter-list branch August 29, 2024 18:17
