Based on what classifiers use, define "Semantically Compressed" data set #200

Open

wmwv opened this issue Jul 13, 2023 · 2 comments
Labels
Discussion Requesting feedback from specialists, and discussion amongst those interested Enhancement New feature or request Pipeline: Science Components producing science output

Comments

@wmwv
Collaborator

wmwv commented Jul 13, 2023

Based on the information used by downstream classifiers, identify what information must be kept in alert packets for them to retain significant utility.

The current LSST packets are, very loosely, half pixels and half derived numbers.

  1. Pixel-based classifiers. Should these be assumed to always use the full alert packet? If a classifier is pixel-based, it may not need most of the derived numbers.
  2. Non-pixel-based classifiers. Drop the stamps. Do these classifiers use the summary statistics?
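To make the split concrete, here is a minimal sketch of what "semantically compressing" an alert for non-pixel-based classifiers could look like: drop the cutout stamps, keep the derived numbers. The field names loosely follow the ZTF alert schema but should be treated as illustrative, not as the broker's actual implementation.

```python
# Sketch of semantic compression: strip the pixel cutouts from an alert
# dict, keeping only the derived/summary fields. Field names are
# illustrative (loosely ZTF-like), not the broker's real schema.

STAMP_FIELDS = {"cutoutScience", "cutoutTemplate", "cutoutDifference"}

def semantically_compress(alert: dict) -> dict:
    """Return a copy of the alert with the stamp cutouts dropped."""
    return {k: v for k, v in alert.items() if k not in STAMP_FIELDS}

alert = {
    "objectId": "ZTF18abcdefg",
    "candidate": {"magpsf": 18.3, "sigmapsf": 0.05, "fid": 2},
    "cutoutScience": {"stampData": b"\x89PNG..."},
    "cutoutTemplate": {"stampData": b"\x89PNG..."},
    "cutoutDifference": {"stampData": b"\x89PNG..."},
}

lite = semantically_compress(alert)
assert "cutoutScience" not in lite
assert lite["candidate"]["magpsf"] == 18.3
```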
@troyraen
Collaborator

The current module called "lite" performs this semantic compression, so (rephrasing the issue) we should revisit which fields that module drops versus retains.

Currently:

  • All fields that are needed by downstream modules within the broker are kept, plus a couple more (I think) but not many. We should revisit with an eye toward what downstream users might want/need.
  • The stamps are dropped, so pixel-based classifiers need to get the alert packet from Cloud Storage. There is a technical issue with including the stamps in the Pub/Sub stream that is not insurmountable but would need to be addressed. (I'd have to refresh my memory on the details, but it's something to do with the fact that the stamps, as provided, cannot be re-serialized to JSON. Most of our streams use JSON because that's Pub/Sub's default, but Avro is an option and maybe we should do that anyway to more closely match the format sent by surveys.)
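The JSON serialization problem mentioned above can be illustrated with a short, self-contained sketch: Python's `json` module cannot serialize raw bytes (which is roughly the shape of the stamp data), so shipping stamps in a JSON Pub/Sub stream requires an extra encoding step. This is a generic illustration of the failure mode, not the broker's actual code.

```python
import base64
import json

stamp = b"\x89PNG\r\n\x1a\n..."  # stand-in for raw cutout bytes

# json.dumps cannot serialize raw bytes -- roughly why the stamps
# can't ride along in a JSON Pub/Sub message as-is.
try:
    json.dumps({"cutoutScience": stamp})
except TypeError:
    pass  # "Object of type bytes is not JSON serializable"

# One workaround: base64-encode before serializing; consumers decode.
msg = json.dumps({"cutoutScience": base64.b64encode(stamp).decode("ascii")})
roundtrip = base64.b64decode(json.loads(msg)["cutoutScience"])
assert roundtrip == stamp
```

Avro, by contrast, has a native `bytes` type, so switching the streams to Avro (as suggested above) would sidestep the encoding step and more closely match the format the surveys send.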

@troyraen troyraen added Enhancement New feature or request Discussion Requesting feedback from specialists, and discussion amongst those interested Pipeline: Science Components producing science output labels Jul 13, 2023
@wmwv
Collaborator Author

wmwv commented Jul 13, 2023

So a worked example of a pixel-based classifier would be helpful for understanding the performance and cost of pulling the packet from Cloud Storage.
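As a starting point for such a worked example, the retrieval step might look like the sketch below. The bucket name and blob path layout are hypothetical (check the broker's actual naming convention); `download_as_bytes` is a real `google-cloud-storage` client method, and the import is deferred so the path helper can be used without credentials.

```python
def alert_blob_name(object_id: str, candid: int) -> str:
    """Build the storage path for an alert packet.

    The path layout here is hypothetical -- substitute the broker's
    actual bucket naming convention.
    """
    return f"alerts/{object_id}/{candid}.avro"

def fetch_alert_packet(bucket_name: str, object_id: str, candid: int) -> bytes:
    """Download the full alert packet (stamps included) from Cloud Storage."""
    from google.cloud import storage  # deferred: needs credentials at runtime

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(alert_blob_name(object_id, candid))
    return blob.download_as_bytes()
```

Timing `fetch_alert_packet` over a batch of alerts would give a first estimate of the per-alert latency a pixel-based classifier pays for not having stamps in the stream.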
