
Add support for MLprogram in ort_coreml #116

Draft
wants to merge 5 commits into master
Conversation

yuygfgg (Contributor) commented Nov 5, 2024

ONNX Runtime's CoreML execution provider supports two Core ML model formats: NeuralNetwork and MLProgram. NeuralNetwork is the default choice as it supports a wider range of operators, but it does not support FP16 precision (so, with an FP16 model, all nodes fall back to the CPUExecutionProvider).

The MLProgram format, while newer and currently supporting fewer operators, does support FP16 and is under active development (recent GitHub PRs suggest it will mature rapidly, adding tens of new operators). Although it may be slower now due to limited operator support, once it achieves comprehensive coverage, the potential CPU/GPU acceleration through FP16 could make it outperform the NeuralNetwork format.

In ONNX Runtime, the ONNX model is converted to a Core ML model and saved to disk, which is then loaded via Apple's Core ML framework. By choosing FP16 inputs with the MLProgram format, we can significantly reduce both memory and disk usage, since the Core ML model is stored in the more compact FP16 format. While the ANE always computes in FP16 internally regardless of input precision, making FP16 acceleration unnecessary for the Neural Engine itself, the storage benefits remain valuable.

Moreover, FP16 inputs may accelerate computation on the GPU and CPU, as both support FP16 (though it is not enabled by default). However, the exact FP16 behavior in ONNX Runtime remains unclear due to its complex execution flow: ORT first decides which nodes to assign to the CoreML execution provider and uses the CPUExecutionProvider for the rest, and Core ML then further distributes its nodes among the CPU, GPU, and ANE.
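For reference, a minimal sketch of what selecting the MLProgram format looks like through ONNX Runtime's Python API; the model path is a placeholder, and the provider-option names follow the onnxruntime >= 1.20 CoreML EP documentation, so treat them as assumptions rather than the exact flags this PR wires into ort_coreml (the C API exposes the same choice as the COREML_FLAG_CREATE_MLPROGRAM provider flag):

```python
import onnxruntime as ort

# Sketch: request the MLProgram model format instead of the default
# NeuralNetwork one. "model.onnx" is a placeholder path; option names
# are assumed from the onnxruntime >= 1.20 CoreML EP documentation.
providers = [
    ("CoreMLExecutionProvider", {
        "ModelFormat": "MLProgram",   # default is "NeuralNetwork"
        "MLComputeUnits": "ALL",      # allow CPU, GPU, and ANE
    }),
    "CPUExecutionProvider",           # fallback for unsupported nodes
]

session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())
```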


For more details on FP16 behavior, refer to this documentation: 16-bit precision in Core ML on ANE.

MLProgram-related PRs: microsoft/onnxruntime#19347 microsoft/onnxruntime#22068 microsoft/onnxruntime#22480 microsoft/onnxruntime#22710, among others.

This enables FP16 computation on the ANE instead of allocating everything to the CPU. However, MLProgram is not well supported at the moment, covering far fewer operators than the regular NeuralNetwork format.
yuygfgg (Contributor, Author) commented Nov 5, 2024

Need onnxruntime >= 1.20.0

WolframRhodium (Contributor) commented:

Interesting and thanks for the information.

WolframRhodium (Contributor) commented Nov 6, 2024

Please use snake case and place the ml_program param at the end of the param list.
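A hypothetical illustration of the requested placement; every parameter name other than ml_program is invented for this sketch and is not taken from the project's actual API:

```python
# Hypothetical signature only: the surrounding parameters are placeholders.
# The point is the snake_case name and that the new option is appended
# after the existing params, so current callers are unaffected.
def ort_coreml(network_path: str,
               num_streams: int = 1,
               fp16: bool = False,
               ml_program: bool = False):
    ...
```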
