Generate NN archive from training configs #17

Merged

jkbmrz merged 29 commits into dev from nnarchive-gen on Mar 20, 2024
Conversation

@jkbmrz (Contributor) commented Mar 13, 2024:

This PR introduces a new generator class (luxonis_train.core.Archiver) that enables automatic generation of NN archives from training configs. The functionality is exposed both through a command-line interface (CLI) and through callbacks (which also upload the NN archive to MLFlow). Furthermore, basic test coverage is included to ensure the generated archives are correct.

The added features are functional, but a few issues remain open, so any insights or suggestions regarding the current implementation would be greatly appreciated!
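For illustration, a minimal sketch of how the generator might be driven from Python — the constructor and method names here (Archiver(cfg=...), .archive()) are assumptions made for the sketch, not necessarily the final API:

from luxonis_train.core import Archiver

# Hypothetical usage: build an NN archive from a training config.
archiver = Archiver(cfg="configs/classification_model.yaml")
archiver.archive()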

Open issues:

  • (model formats): Currently, the generator can only archive the ONNX model format. Do we want to also support other formats (e.g. BLOB, DLC, XML/BIN…)? If so, how should we approach models with multiple executables (e.g. the path parameter in the NN archive Metadata class requires a single path to the model executable, so I'm not sure what to do when there are several)?
  • (is_softmax parameter, e.g. in the HeadClassification class): Currently, the decision whether the output is already softmaxed is made by iterating over the ONNX nodes and checking whether one of them is a softmax node. However, I'm not sure how bulletproof that is, or whether it can be implemented for other model formats. Is there another way to determine this parameter (or should it perhaps even be hard-coded for specific architectures)?
  • (head_outputs classes): I’m wondering how to determine which outputs belong to specific head output parameters (cc @conorsim). For example:
    • yolo_outputs parameter in EfficientBBoxHead. Is it sensible to expect that all outputs of the model always belong here? If not, how to determine the outputs to be listed?
    • predictions parameter in ClassificationHead. Is it sensible to expect that models with ClassificationHead will always only have one output? If not, how to determine which of the outputs is the right one?
    • boxes and scores parameters in ObjectDetectionSSD. How to determine which output belongs to which parameter for an arbitrary SSD network?
  • (tests): I've implemented a setup class method that creates a dummy LDF, config, and ONNX model and then runs the archiving (see the sketch after this list). The tests then inspect basic characteristics of the constructed archive file, and finally a teardown class method deletes all the constructed files. Is this approach OK, or is there a better way to tackle this?
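A minimal sketch of that setup/teardown pattern, for concreteness — the class name, paths, and the stand-in archiving step are placeholders, not the PR's actual test code:

import os
import shutil
import tempfile
import unittest


class TestArchiver(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Scratch directory for the dummy LDF, config, and ONNX model.
        cls.tmp_dir = tempfile.mkdtemp()
        cls.archive_path = os.path.join(cls.tmp_dir, "archive.tar.xz")
        # Stand-in for building the dummy artifacts and running the
        # archiving (the real test would invoke the Archiver here):
        with open(cls.archive_path, "wb") as f:
            f.write(b"placeholder")

    @classmethod
    def tearDownClass(cls):
        # Delete everything the setup constructed.
        shutil.rmtree(cls.tmp_dir, ignore_errors=True)

    def test_archive_file_exists(self):
        # One basic characteristic: the archive file was actually produced.
        self.assertTrue(os.path.exists(self.archive_path))


if __name__ == "__main__":
    unittest.main()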

github-actions bot commented Mar 13, 2024:

Test Results

6 files, 6 suites, 54m 2s ⏱️
38 tests: 38 ✅, 0 💤, 0 ❌
228 runs: 228 ✅, 0 💤, 0 ❌

Results for commit e4ca5bf.

♻️ This comment has been updated with latest results.

@jkbmrz (Contributor, Author) commented Mar 13, 2024:

Any ideas why we are getting all these import errors during the checks? Adding the missing packages to requirements.txt might help, but it's strange because imports fail even for some packages that were imported before my PR (e.g. cv2).

@conorsim (Collaborator) left a comment:

Nice, thank you! Just adding some comments for now since there are some questions to sort out. Will reply to those separately.

# head_outputs["prototype_output_name"]  # TODO: implement
elif head_name == "ObjectDetectionSSD":
    raise NotImplementedError
# head_outputs["anchors"]  # TODO: implement
Collaborator:

I don't think we support any architectures that will require anchors at the moment, just FYI

Collaborator:

The ImplicitKeypointBBoxHead does require anchors.

Collaborator:

Ah, thanks for pointing that out. We actually need to make a change to the NN archive then.

Collaborator:

(But I'm realizing I need all the object detection parameters in the keypoint head in the NN archive anyway.)

elif head_name == "BiSeNetHead":
parameters["is_softmax"] = self._is_softmax(executable_path) # TODO: test
elif head_name == "ImplicitKeypointBBoxHead":
raise NotImplementedError
Collaborator:

I think we should be able to implement this one now with the latest updates. The ONNX for ImplicitKeypointBBoxHead should have just one output, which will be predictions in the NN archive.

Contributor (Author):

Adding support in commit 3c0ddc3.


model = onnx.load(executable_path)
for node in model.graph.node:
    if node.op_type.lower() == "softmax":
Collaborator:

Yeah, I'm not sure this is a good general solution. I think it could just be hardcoded based on the classification or segmentation head. Unless it also becomes a training config option, in which case we could pull it from there (CC @kozlov721).

Collaborator:

We could run the ONNX model and check that the outputs sum to 1. It's also not an optimal solution though.
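For concreteness, a rough sketch of such a check, assuming onnxruntime is available, float32 inputs, and that the softmax axis is the last one — an illustration, not code from this PR:

import numpy as np
import onnxruntime as ort


def outputs_look_softmaxed(model_path: str, atol: float = 1e-4) -> bool:
    # Run the model on random inputs and test whether every output
    # sums to 1 along its last axis (a softmax-like signature).
    session = ort.InferenceSession(model_path)
    feeds = {}
    for inp in session.get_inputs():
        # Replace dynamic/symbolic dimensions with 1 for the dummy input.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
    outputs = session.run(None, feeds)
    return all(np.allclose(out.sum(axis=-1), 1.0, atol=atol) for out in outputs)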

Collaborator:

Yeah, that's not a bad idea either. I think we could hardcode it into each classification/segmentation head to keep it simple for now, though.

Comment on lines 327 to 328
elif head_name == "ObjectDetectionSSD":
raise NotImplementedError # TODO: boxes, scores
Collaborator:

Yeah, I'm not sure we even need this case for SSD right now, since AFAIK we don't have MobileNet SSD implemented in this library.

Contributor (Author):

Removing support in commit 87bd9b2.

Collaborator:

I also confirmed with @tersekmatija, by the way, that we don't have plans to support MobileNet SSD in luxonis-train.

Comment on lines 329 to 334
elif head_name == "SegmentationHead":
raise NotImplementedError # TODO: predictions
elif head_name == "BiSeNetHead":
raise NotImplementedError
elif head_name == "ImplicitKeypointBBoxHead":
raise NotImplementedError
Collaborator:

Just noting we'll also want to implement these once the questions are answered.

@jkbmrz (Contributor, Author) commented Mar 14, 2024:

Adding support in commit 04bc590.

luxonis_train/nodes/efficient_bbox_head.py (review thread resolved)
yaml_file.write(yaml_str)

# make model
model = torchvision.models.mobilenet_v2(pretrained=False)
Collaborator:

I think it would make sense to use ONNX from luxonis-train instead of torchvision. It's not critical or anything, so we don't necessarily need to make the change in this PR, but I'd say the primary goal is to make sure all the heads in luxonis-train can correspond to heads in the NN archive.

Contributor (Author):

I agree, will look into it!

Contributor (Author):

Added in commit 5e59c3a. For now, all tests are based on ClassificationModel. Later, we can extend the tests to all other models.

@conorsim (Collaborator):

> Any ideas why we are getting all these import errors during the checks? Adding the missing packages to requirements.txt might help, but it's strange because imports fail even for some packages that were imported before my PR (e.g. cv2).

I think this is because you import cv2 in the tests you added, and I think OpenCV is not actually a requirement of this repo. But CC @kozlov721 as well.

@conorsim (Collaborator):

> (model formats): Currently, the generator can only archive the ONNX model format. Do we want to also support other formats (e.g. BLOB, DLC, XML/BIN…)? If so, how should we approach models with multiple executables (e.g. the path parameter in the NN archive Metadata class requires a single path to the model executable, so I'm not sure what to do when there are several)?

In my opinion, we only need to support ONNX, especially for now. We might still need to address the secondary executable path issue for ONNX as well, but since we don't have the instance segmentation model implemented at the moment, I wouldn't worry about it in this PR.

@conorsim (Collaborator):

> (is_softmax parameter, e.g. in the HeadClassification class): Currently, the decision whether the output is already softmaxed is made by iterating over the ONNX nodes and checking whether one of them is a softmax node. However, I'm not sure how bulletproof that is, or whether it can be implemented for other model formats. Is there another way to determine this parameter (or should it perhaps even be hard-coded for specific architectures)?

Yeah, I left another comment about this, but in my opinion it can either be:
1/ hardcoded for each architecture/head in this repo, or
2/ an option in the training config which we can pull.

@conorsim (Collaborator):

> (head_outputs classes): I'm wondering how to determine which outputs belong to specific head output parameters (cc @conorsim). For example:
> yolo_outputs parameter in EfficientBBoxHead. Is it sensible to expect that all outputs of the model always belong here? If not, how to determine the outputs to be listed?
> predictions parameter in ClassificationHead. Is it sensible to expect that models with ClassificationHead will always only have one output? If not, how to determine which of the outputs is the right one?
> boxes and scores parameters in ObjectDetectionSSD. How to determine which output belongs to which parameter for an arbitrary SSD network?

yolo_outputs: Yes, I think it is sensible to expect that all outputs of the model always belong here.

predictions: Yeah, this is the "default" configuration in NN archive where there is only one ONNX output. I think it's safe to assume only one ONNX output in this case.

boxes and scores: Let's not worry about this for now, since it's not implemented in luxonis-train.
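A sketch of how those first two rules could look inside the archiver — collect_head_outputs is a hypothetical helper, not code from this PR; the output names are read from the loaded ONNX graph:

import onnx


def collect_head_outputs(head_name: str, executable_path: str) -> dict:
    # Read the output names straight from the ONNX graph.
    model = onnx.load(executable_path)
    output_names = [output.name for output in model.graph.output]
    if head_name == "EfficientBBoxHead":
        # All model outputs belong to yolo_outputs.
        return {"yolo_outputs": output_names}
    if head_name == "ClassificationHead":
        # The "default" NN archive configuration: exactly one ONNX output.
        assert len(output_names) == 1, "expected a single output"
        return {"predictions": output_names[0]}
    raise NotImplementedError(head_name)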

@conorsim (Collaborator):

> (tests): I've implemented a setup class method that creates a dummy LDF, config, and ONNX model and then runs the archiving. The tests then inspect basic characteristics of the constructed archive file, and finally a teardown class method deletes all the constructed files. Is this approach OK, or is there a better way to tackle this?

Yeah, this seems good to me. It would be nice to also add tests for each type of head in luxonis-train once the other questions are resolved.

@kozlov721 (Collaborator):

> Any ideas why we are getting all these import errors during the checks? Adding the missing packages to requirements.txt might help, but it's strange because imports fail even for some packages that were imported before my PR (e.g. cv2).

> I think this is because you import cv2 in the tests you added, and I think OpenCV is not actually a requirement of this repo. But CC @kozlov721 as well.

opencv-python should be installed as part of luxonis-ml, so I'm not sure what the issue is exactly. We can also add it to the requirements of luxonis-train, but it's strange.

@jkbmrz (Contributor, Author) commented Mar 14, 2024:

I've added opencv-python to requirements.txt, but now other packages are reported missing. I'm not sure why, but maybe luxonis-ml isn't installed properly. Might it be possible that I disrupted something by listing its dev version in requirements.txt here?

@kozlov721 (Collaborator):

> I've added opencv-python to requirements.txt, but now other packages are reported missing. I'm not sure why, but maybe luxonis-ml isn't installed properly. Might it be possible that I disrupted something by listing its dev version in requirements.txt here?

You're right, it seems you accidentally removed the [all] specifier, so it only installs luxonis-ml[utils]. Should be fixed now.
The mlflow package was also missing; I hotfixed that for now, and I'll update luxonis-ml so it also installs all the optional filesystem packages when used with the [all] specifier.
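For illustration, the difference in requirements.txt would look something like this — the git reference below is a placeholder for the actual dev pin, not the exact line from the repo:

# Only the utils extra — optional packages like opencv-python end up missing:
luxonis-ml[utils] @ git+https://github.com/luxonis/luxonis-ml.git@dev

# All optional dependency groups:
luxonis-ml[all] @ git+https://github.com/luxonis/luxonis-ml.git@dev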

github-actions bot commented Mar 14, 2024:

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines  Covered  Coverage  Threshold  Status
4658   3694     79%       0%         🟢

New Files

File                                                Coverage  Status
luxonis_train/callbacks/archive_on_train_end.py     38%       🟢
luxonis_train/core/archiver.py                      72%       🟢
luxonis_train/nodes/enums/head_categorization.py    100%      🟢
TOTAL                                               70%       🟢

Modified Files

File                                                Coverage  Status
luxonis_train/main.py                               50%       🟢
luxonis_train/callbacks/__init__.py                 100%      🟢
luxonis_train/core/__init__.py                      100%      🟢
luxonis_train/core/core.py                          83%       🟢
luxonis_train/nodes/efficient_bbox_head.py          100%      🟢
luxonis_train/nodes/implicit_keypoint_bbox_head.py  92%       🟢
luxonis_train/utils/config.py                       95%       🟢
TOTAL                                               89%       🟢

updated for commit: e4ca5bf by action🐍

@jkbmrz (Contributor, Author) commented Mar 14, 2024:

Based on the discussion above, two things remain unresolved:

1. Determining the is_softmax parameter. As suggested by @conorsim, we could go for:

> 1/ hardcoded for each architecture/head in this repo, or
> 2/ an option in the training config which we can pull.

I'd go for option 1 for now, as we could implement it by adding a simple enum like:

class ImplementedHeadsSoftmaxed(Enum):
    ClassificationHead = True/False
    EfficientBBoxHead = None
    ImplicitKeypointBBoxHead = None
    SegmentationHead = True/False
    BiSeNetHead = True/False

What do you think, @kozlov721?
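The archiver could then resolve the parameter with a simple lookup — a sketch continuing the enum above, assuming from enum import Enum and concrete boolean values in place of the True/False placeholders:

def get_is_softmax(head_name: str):
    # Look up the hard-coded answer for the given head;
    # None marks heads where is_softmax does not apply.
    return ImplementedHeadsSoftmaxed[head_name].value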

2. Expanding tests. As @conorsim wrote:

> It would be nice to also add tests for each type of head in luxonis-train once the other questions are resolved.

If I understand correctly, the tests should generate a model for each of the existing heads (ClassificationHead, SegmentationHead, BiSeNetHead, EfficientBBoxHead, ImplicitKeypointBBoxHead) and check whether the head parameters (including head_outputs) are set correctly?

@conorsim (Collaborator):

> If I understand correctly, the tests should generate a model for each of the existing heads (ClassificationHead, SegmentationHead, BiSeNetHead, EfficientBBoxHead, ImplicitKeypointBBoxHead) and check whether the head parameters (including head_outputs) are set correctly?

Yeah, this is what I meant.

@jkbmrz (Contributor, Author) commented Mar 18, 2024:

I've resolved all the comments above except for adding tests for each of the existing heads (those will be added in a separate PR). We can merge as soon as @kozlov721 approves!

@kozlov721 (Collaborator) left a comment:

LGTM

@jkbmrz merged commit e1ab39b into dev on Mar 20, 2024
10 checks passed
@jkbmrz deleted the nnarchive-gen branch on March 20, 2024 at 08:07
@kozlov721 mentioned this pull request on Oct 9, 2024
kozlov721 added a commit that referenced this pull request on Oct 9, 2024
* add archiver CLI

* add archiver callback

* add max_det parameter to EfficientBBoxHead

* add enum to categorize tasks for the implemented heads

* add archiver tests

* adjust Archiver to new nn archive format

* pre-commit formatting

* add LDF creation and adjust to new nn archive format

* update requirements.txt

* add opencv-python to requirements.txt

* add support for ImplicitKeypointBBoxHead

* remove support for ObjectDetectionSSD

* Update requirements.txt

* Added mlflow and removed opencv

* [Automated] Updated coverage badge

* add support for SegmentationHead and BiSeNetHead

* base archiver tests on model from luxonis-train instead of torchvision

* adjust head parameters to changes in NN Archive

* adjust keypoint detection head parameters to changes in NN Archive

* bugfix - make sure self.max_det is used in nms

* add max_det parameter to ImplicitKeypointBBoxHead

* adjust task categorization for ImplicitKeypointBBoxHead

* fixing Windows PermissionError occurring on file deletion

* fixing Windows PermissionError occurring on file deletion due to unreleased logging handlers

* add method to remove file handlers keeping the log file open

* add a logging statement at the end of archiving

* add optuna_integration to requirements.txt

* add hard-coded solution to determining is_softmax parameter

* added help

---------

Co-authored-by: Martin Kozlovský <[email protected]>
Co-authored-by: GitHub Actions <[email protected]>