Generate NN archive from training configs #17

Merged

jkbmrz merged 29 commits into dev from nnarchive-gen on Mar 20, 2024
Conversation

@jkbmrz (Contributor) commented Mar 13, 2024:

This PR introduces a new generator class (luxonis_train.core.Archiver) that enables automatic generation of NN archives from training configs. The functionality is exposed both through a command-line interface (CLI) and through callbacks (which also upload the NN archive to MLFlow). Furthermore, basic test coverage is included to ensure the generated archives are correct.

The added features are functional, but a few issues remain open, so any insights or suggestions regarding the current implementation would be greatly appreciated!
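For illustration, a minimal sketch of how the generator might be driven from Python — the constructor and method names here (Archiver(cfg=...), .archive()) are assumptions made for the sketch, not necessarily the final API:

from luxonis_train.core import Archiver

# Hypothetical usage: build an NN archive from a training config.
archiver = Archiver(cfg="configs/classification_model.yaml")
archiver.archive()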

Open issues:

  • (model formats): Currently, the generator can only archive the ONNX model format. Do we want to also support other formats (e.g. BLOB, DLC, XML/BIN…)? If so, how should we approach models with multiple executables (e.g. the path parameter in the NN archive Metadata class requires a single path to the model executable, so I'm not sure what to do when there are several)?
  • (is_softmax parameter, e.g. in the HeadClassification class): Currently, the decision whether the output is already softmaxed is made by iterating over the ONNX nodes and checking whether one of them is a softmax node. However, I'm not sure how bulletproof that is, or whether it can be implemented for other model formats. Is there another way to determine this parameter (or should it perhaps even be hard-coded for specific architectures)?
  • (head_outputs classes): I’m wondering how to determine which outputs belong to specific head output parameters (cc @conorsim). For example:
    • yolo_outputs parameter in EfficientBBoxHead. Is it sensible to expect that all outputs of the model always belong here? If not, how to determine the outputs to be listed?
    • predictions parameter in ClassificationHead. Is it sensible to expect that models with ClassificationHead will always only have one output? If not, how to determine which of the outputs is the right one?
    • boxes and scores parameters in ObjectDetectionSSD. How to determine which output belongs to which parameter for an arbitrary SSD network?
  • (tests): I've implemented a setup class method that creates a dummy LDF, config, and ONNX model and then runs the archiving (see the sketch after this list). The tests then inspect basic characteristics of the constructed archive file, and finally a teardown class method deletes all the constructed files. Is this approach OK, or is there a better way to tackle this?
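A minimal sketch of that setup/teardown pattern, for concreteness — the class name, paths, and the stand-in archiving step are placeholders, not the PR's actual test code:

import os
import shutil
import tempfile
import unittest


class TestArchiver(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Scratch directory for the dummy LDF, config, and ONNX model.
        cls.tmp_dir = tempfile.mkdtemp()
        cls.archive_path = os.path.join(cls.tmp_dir, "archive.tar.xz")
        # Stand-in for building the dummy artifacts and running the
        # archiving (the real test would invoke the Archiver here):
        with open(cls.archive_path, "wb") as f:
            f.write(b"placeholder")

    @classmethod
    def tearDownClass(cls):
        # Delete everything the setup constructed.
        shutil.rmtree(cls.tmp_dir, ignore_errors=True)

    def test_archive_file_exists(self):
        # One basic characteristic: the archive file was actually produced.
        self.assertTrue(os.path.exists(self.archive_path))


if __name__ == "__main__":
    unittest.main()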

github-actions bot commented Mar 13, 2024:

Test Results

6 files, 6 suites, 54m 2s ⏱️
38 tests: 38 ✅, 0 💤, 0 ❌
228 runs: 228 ✅, 0 💤, 0 ❌

Results for commit e4ca5bf.

♻️ This comment has been updated with latest results.

@jkbmrz (Contributor, Author) commented Mar 13, 2024:

Any ideas why we are getting all these import errors during the checks? Adding the missing packages to requirements.txt might help, but it's strange because imports fail even for some packages that were imported before my PR (e.g. cv2).

@conorsim (Collaborator) left a comment:

Nice, thank you! Just adding some comments for now since there are some questions to sort out. Will reply to those separately.

# head_outputs["prototype_output_name"]  # TODO: implement
elif head_name == "ObjectDetectionSSD":
    raise NotImplementedError
# head_outputs["anchors"]  # TODO: implement
Collaborator:

I don't think we support any architectures that will require anchors at the moment, just FYI

Collaborator:

The ImplicitKeypointBBoxHead does require anchors.

Collaborator:

Ah, thanks for pointing that out. We actually need to make a change to the NN archive then.

Collaborator:

(But I'm realizing I need all the object detection parameters in the keypoint head in the NN archive anyway.)

elif head_name == "BiSeNetHead":
parameters["is_softmax"] = self._is_softmax(executable_path) # TODO: test
elif head_name == "ImplicitKeypointBBoxHead":
raise NotImplementedError
Collaborator:

I think we should be able to implement this one now with the latest updates. The ONNX for ImplicitKeypointBBoxHead should have just one output, which will be predictions in the NN archive.

Contributor (Author):

Adding support in commit 3c0ddc3.


model = onnx.load(executable_path)
for node in model.graph.node:
    if node.op_type.lower() == "softmax":
Collaborator:

Yeah, I'm not sure this is a good general solution. I think it could just be hardcoded based on the classification or segmentation head. Unless it also becomes a training config option, in which case we could pull it from there (CC @kozlov721).

Collaborator:

We could run the ONNX model and check that the outputs sum to 1. It's also not an optimal solution though.
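For concreteness, a rough sketch of such a check, assuming onnxruntime is available, float32 inputs, and that the softmax axis is the last one — an illustration, not code from this PR:

import numpy as np
import onnxruntime as ort


def outputs_look_softmaxed(model_path: str, atol: float = 1e-4) -> bool:
    # Run the model on random inputs and test whether every output
    # sums to 1 along its last axis (a softmax-like signature).
    session = ort.InferenceSession(model_path)
    feeds = {}
    for inp in session.get_inputs():
        # Replace dynamic/symbolic dimensions with 1 for the dummy input.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
    outputs = session.run(None, feeds)
    return all(np.allclose(out.sum(axis=-1), 1.0, atol=atol) for out in outputs)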

Collaborator:

Yeah, that's not a bad idea either. I think we could hardcode it into each classification/segmentation head to keep it simple for now, though.

Comment on lines 327 to 328
elif head_name == "ObjectDetectionSSD":
raise NotImplementedError # TODO: boxes, scores
Collaborator:

Yeah, I'm not sure we even need this case for SSD right now, since AFAIK we don't have MobileNet SSD implemented in this library.

Contributor (Author):

Removing support in commit 87bd9b2.

Collaborator:

I also confirmed with @tersekmatija, by the way, that we don't have plans to support MobileNet SSD in luxonis-train.

Comment on lines 329 to 334
elif head_name == "SegmentationHead":
raise NotImplementedError # TODO: predictions
elif head_name == "BiSeNetHead":
raise NotImplementedError
elif head_name == "ImplicitKeypointBBoxHead":
raise NotImplementedError
Collaborator:

Just noting we'll also want to implement these once the questions are answered.

@jkbmrz (Contributor, Author) commented Mar 14, 2024:

Adding support in commit 04bc590.

luxonis_train/nodes/efficient_bbox_head.py (review thread resolved)
yaml_file.write(yaml_str)

# make model
model = torchvision.models.mobilenet_v2(pretrained=False)
Collaborator:

I think it would make sense to use ONNX from luxonis-train instead of torchvision. It's not critical or anything, so we don't necessarily need to make the change in this PR, but I'd say the primary goal is to make sure all the heads in luxonis-train can correspond to heads in the NN archive.

Contributor (Author):

I agree, will look into it!

Contributor (Author):

Added in commit 5e59c3a. For now, all tests are based on ClassificationModel. Later, we can extend the tests to all other models.

@conorsim (Collaborator):

> Any ideas why we are getting all these import errors during the checks? Adding the missing packages to requirements.txt might help, but it's strange because imports fail even for some packages that were imported before my PR (e.g. cv2).

I think this is because you import cv2 in the tests you added, and I think OpenCV is not actually a requirement of this repo. But CC @kozlov721 as well.

@conorsim (Collaborator):

> (model formats): Currently, the generator can only archive the ONNX model format. Do we want to also support other formats (e.g. BLOB, DLC, XML/BIN…)? If so, how should we approach models with multiple executables (e.g. the path parameter in the NN archive Metadata class requires a single path to the model executable, so I'm not sure what to do when there are several)?

In my opinion, we only need to support ONNX, especially for now. We might still need to address the secondary executable path issue for ONNX as well, but since we don't have the instance segmentation model implemented at the moment, I wouldn't worry about it in this PR.

@conorsim (Collaborator):

> (is_softmax parameter, e.g. in the HeadClassification class): Currently, the decision whether the output is already softmaxed is made by iterating over the ONNX nodes and checking whether one of them is a softmax node. However, I'm not sure how bulletproof that is, or whether it can be implemented for other model formats. Is there another way to determine this parameter (or should it perhaps even be hard-coded for specific architectures)?

Yeah, I left another comment about this, but in my opinion it can either be:
1/ hardcoded for each architecture/head in this repo, or
2/ an option in the training config which we can pull.

@conorsim (Collaborator):

> (head_outputs classes): I'm wondering how to determine which outputs belong to specific head output parameters (cc @conorsim). For example:
> yolo_outputs parameter in EfficientBBoxHead. Is it sensible to expect that all outputs of the model always belong here? If not, how to determine the outputs to be listed?
> predictions parameter in ClassificationHead. Is it sensible to expect that models with ClassificationHead will always only have one output? If not, how to determine which of the outputs is the right one?
> boxes and scores parameters in ObjectDetectionSSD. How to determine which output belongs to which parameter for an arbitrary SSD network?

yolo_outputs: Yes, I think it is sensible to expect that all outputs of the model always belong here.

predictions: Yeah, this is the "default" configuration in NN archive where there is only one ONNX output. I think it's safe to assume only one ONNX output in this case.

boxes and scores: Let's not worry about this for now, since it's not implemented in luxonis-train.
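A sketch of how those first two rules could look inside the archiver — collect_head_outputs is a hypothetical helper, not code from this PR; the output names are read from the loaded ONNX graph:

import onnx


def collect_head_outputs(head_name: str, executable_path: str) -> dict:
    # Read the output names straight from the ONNX graph.
    model = onnx.load(executable_path)
    output_names = [output.name for output in model.graph.output]
    if head_name == "EfficientBBoxHead":
        # All model outputs belong to yolo_outputs.
        return {"yolo_outputs": output_names}
    if head_name == "ClassificationHead":
        # The "default" NN archive configuration: exactly one ONNX output.
        assert len(output_names) == 1, "expected a single output"
        return {"predictions": output_names[0]}
    raise NotImplementedError(head_name)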

@conorsim (Collaborator):

> (tests): I've implemented a setup class method that creates a dummy LDF, config, and ONNX model and then runs the archiving. The tests then inspect basic characteristics of the constructed archive file, and finally a teardown class method deletes all the constructed files. Is this approach OK, or is there a better way to tackle this?

Yeah, this seems good to me. It would be nice to also add tests for each type of head in luxonis-train once the other questions are resolved.

@kozlov721 (Collaborator):

> Any ideas why we are getting all these import errors during the checks? Adding the missing packages to requirements.txt might help, but it's strange because imports fail even for some packages that were imported before my PR (e.g. cv2).

> I think this is because you import cv2 in the tests you added, and I think OpenCV is not actually a requirement of this repo. But CC @kozlov721 as well.

opencv-python should be installed as part of luxonis-ml, so I'm not sure what the issue is exactly. We can also add it to the requirements of luxonis-train, but it's strange.

@jkbmrz (Contributor, Author) commented Mar 14, 2024:

I've added opencv-python to requirements.txt, but now other packages are reported missing. I'm not sure why, but maybe luxonis-ml isn't installed properly. Might it be possible that I disrupted something by listing its dev version in requirements.txt here?

@kozlov721 (Collaborator):

> I've added opencv-python to requirements.txt, but now other packages are reported missing. I'm not sure why, but maybe luxonis-ml isn't installed properly. Might it be possible that I disrupted something by listing its dev version in requirements.txt here?

You're right, it seems you accidentally removed the [all] specifier, so it only installs luxonis-ml[utils]. Should be fixed now.
The mlflow package was also missing; I hotfixed that for now, and I'll update luxonis-ml so it also installs all the optional filesystem packages when used with the [all] specifier.
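For illustration, the difference in requirements.txt would look something like this — the git reference below is a placeholder for the actual dev pin, not the exact line from the repo:

# Only the utils extra — optional packages like opencv-python end up missing:
luxonis-ml[utils] @ git+https://github.com/luxonis/luxonis-ml.git@dev

# All optional dependency groups:
luxonis-ml[all] @ git+https://github.com/luxonis/luxonis-ml.git@dev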

github-actions bot commented Mar 14, 2024:

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines  Covered  Coverage  Threshold  Status
4658   3694     79%       0%         🟢

New Files

File                                                Coverage  Status
luxonis_train/callbacks/archive_on_train_end.py     38%       🟢
luxonis_train/core/archiver.py                      72%       🟢
luxonis_train/nodes/enums/head_categorization.py    100%      🟢
TOTAL                                               70%       🟢

Modified Files

File                                                Coverage  Status
luxonis_train/main.py                               50%       🟢
luxonis_train/callbacks/__init__.py                 100%      🟢
luxonis_train/core/__init__.py                      100%      🟢
luxonis_train/core/core.py                          83%       🟢
luxonis_train/nodes/efficient_bbox_head.py          100%      🟢
luxonis_train/nodes/implicit_keypoint_bbox_head.py  92%       🟢
luxonis_train/utils/config.py                       95%       🟢
TOTAL                                               89%       🟢

updated for commit: e4ca5bf by action🐍

@jkbmrz (Contributor, Author) commented Mar 14, 2024:

Based on the discussion above, two things remain unresolved:

1. Determining the is_softmax parameter. As suggested by @conorsim, we could go for:

> 1/ hardcoded for each architecture/head in this repo, or
> 2/ an option in the training config which we can pull.

I'd go for option 1 for now, as we could implement it by adding a simple enum like:

class ImplementedHeadsSoftmaxed(Enum):
    ClassificationHead = True/False
    EfficientBBoxHead = None
    ImplicitKeypointBBoxHead = None
    SegmentationHead = True/False
    BiSeNetHead = True/False

What do you think, @kozlov721?
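The archiver could then resolve the parameter with a simple lookup — a sketch continuing the enum above, assuming from enum import Enum and concrete boolean values in place of the True/False placeholders:

def get_is_softmax(head_name: str):
    # Look up the hard-coded answer for the given head;
    # None marks heads where is_softmax does not apply.
    return ImplementedHeadsSoftmaxed[head_name].value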

2. Expanding tests. As @conorsim wrote:

> It would be nice to also add tests for each type of head in luxonis-train once the other questions are resolved.

If I understand correctly, the tests should generate a model for each of the existing heads (ClassificationHead, SegmentationHead, BiSeNetHead, EfficientBBoxHead, ImplicitKeypointBBoxHead) and check whether the head parameters (including head_outputs) are set correctly?

@conorsim (Collaborator):

> If I understand correctly, the tests should generate a model for each of the existing heads (ClassificationHead, SegmentationHead, BiSeNetHead, EfficientBBoxHead, ImplicitKeypointBBoxHead) and check whether the head parameters (including head_outputs) are set correctly?

Yeah, this is what I meant.

@jkbmrz (Contributor, Author) commented Mar 18, 2024:

I've resolved all the comments above except for adding tests for each of the existing heads (those will be added in a separate PR). We can merge as soon as @kozlov721 approves!

@kozlov721 (Collaborator) left a comment:

LGTM

@jkbmrz merged commit e1ab39b into dev on Mar 20, 2024
10 checks passed
@jkbmrz deleted the nnarchive-gen branch on March 20, 2024 at 08:07
@kozlov721 mentioned this pull request on Oct 9, 2024
kozlov721 added a commit that referenced this pull request on Oct 9, 2024
* add archiver CLI

* add archiver callback

* add max_det parameter to EfficientBBoxHead

* add enum to categorize tasks for the implemented heads

* add archiver tests

* adjust Archiver to new nn archive format

* pre-commit formatting

* add LDF creation and adjust to new nn archive format

* update requirements.txt

* add opencv-python to requirements.txt

* add support for ImplicitKeypointBBoxHead

* remove support for ObjectDetectionSSD

* Update requirements.txt

* Added mlflow and removed opencv

* [Automated] Updated coverage badge

* add support for SegmentationHead and BiSeNetHead

* base archiver tests on model from luxonis-train instead of torchvision

* adjust head parameters to changes in NN Archive

* adjust keypoint detection head parameters to changes in NN Archive

* bugfix - make sure self.max_det is used in nms

* add max_det parameter to ImplicitKeypointBBoxHead

* adjust task categorization for ImplicitKeypointBBoxHead

* fixing Windows PermissionError occurring on file deletion

* fixing Windows PermissionError occurring on file deletion due to unreleased logging handlers

* add method to remove file handlers keeping the log file open

* add a logging statement at the end of archiving

* add optuna_integration to requirements.txt

* add hard-coded solution to determining is_softmax parameter

* added help

---------

Co-authored-by: Martin Kozlovský <[email protected]>
Co-authored-by: GitHub Actions <[email protected]>