Add support for MLprogram in ort_coreml #116
Draft
ONNX Runtime's Core ML execution provider supports two model formats: NeuralNetwork and MLProgram. NeuralNetwork is the default choice because it covers a wider range of operators, but it does not support FP16 precision, so with an FP16 model all nodes fall back to the CPUExecutionProvider.
The MLProgram format, while newer and currently covering fewer operators, does support FP16 and is under active development (recent GitHub PRs suggest it is maturing quickly, with tens of new operators being added). Although it may be slower today because of the limited operator coverage, once coverage becomes comprehensive, the potential CPU/GPU acceleration from FP16 could make it outperform the NeuralNetwork format.
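For illustration, here is a minimal sketch (using the ONNX Runtime Python API) of how the MLProgram format could be selected. The option names (`ModelFormat`, `MLComputeUnits`) follow the provider options of recent ONNX Runtime releases; older releases configure the Core ML EP through flags instead, so the exact spelling should be checked against the CoreML EP documentation for the version in use.

```python
import onnxruntime as ort

# Provider options for the Core ML EP (names as in recent ORT releases;
# verify against your ORT version's CoreML EP docs).
mlprogram_options = {
    "ModelFormat": "MLProgram",  # default would be "NeuralNetwork"
    "MLComputeUnits": "ALL",     # let Core ML choose among CPU / GPU / ANE
}

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[
        ("CoreMLExecutionProvider", mlprogram_options),
        "CPUExecutionProvider",  # fallback for nodes Core ML cannot take
    ],
)
```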
In ONNX Runtime, the ONNX model is converted to a Core ML model and saved to disk, which is then loaded through Apple's Core ML framework. Choosing FP16 inputs with the MLProgram format significantly reduces both memory and disk usage, since the Core ML model is stored in the more compact FP16 representation. The ANE always computes in FP16 internally regardless of input precision, so FP16 inputs do not speed up the Neural Engine itself, but the storage savings remain valuable.
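As a sketch of how an FP16 input model could be produced, one common route is the float16 converter from the `onnxconverter-common` package; the file names below are placeholders.

```python
import onnx
from onnxconverter_common import float16

# Convert an FP32 ONNX model to FP16 so that the resulting Core ML
# (MLProgram) model is stored with FP16 weights. keep_io_types=True keeps
# the graph inputs/outputs in FP32, so calling code does not need to change.
model_fp32 = onnx.load("model.onnx")
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)
onnx.save(model_fp16, "model_fp16.onnx")
```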
Moreover, FP16 inputs may accelerate computation on the GPU and CPU, since both support FP16 (though it is not enabled by default). However, the exact FP16 behavior in ONNX Runtime remains unclear because of its layered execution flow: ORT first decides which nodes to assign to Core ML and runs the rest on the CPUExecutionProvider, and Core ML then further distributes its nodes across CPU, GPU, and ANE.
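One way to observe the first level of this partitioning (ORT's CoreML-vs-CPU split) is to enable verbose session logging, which prints node placement decisions during session creation. A small sketch, assuming the Python API:

```python
import onnxruntime as ort

# Verbose logging shows which nodes ORT assigns to CoreMLExecutionProvider
# and which stay on CPUExecutionProvider. The further CPU/GPU/ANE split is
# decided inside Apple's Core ML framework and is not visible in these logs.
so = ort.SessionOptions()
so.log_severity_level = 0  # 0 = VERBOSE

session = ort.InferenceSession(
    "model_fp16.onnx",  # placeholder path
    sess_options=so,
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())
```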
For more details on FP16 behavior, refer to this documentation: 16-bit precision in Core ML on ANE.
MLProgram-related PRs: microsoft/onnxruntime#19347, microsoft/onnxruntime#22068, microsoft/onnxruntime#22480, microsoft/onnxruntime#22710, and others.