Continue updating the Apple backend #741

freedomtan · 2023-06-27T02:43:49Z

Some ideas we can try to improve the Apple backend:

In the WWDC 2023 session "Use Core ML Tools for machine learning model compression", https://developer.apple.com/wwdc23/10047, Apple folks claimed that Apple's new quantization scheme could help reduce inference latency.
- Apple used that to convert Stable Diffusion models, see https://github.com/apple/ml-stable-diffusion
Some models definitely still have room for improvement, e.g.,
- the MobileDet was converted with freedom's quick-and-dirty script, and
- the MobileBERT could be improved by referring to Apple's Transformer on Neural Engine guide, https://machinelearning.apple.com/research/neural-engine-transformers

RSMNYS · 2023-10-09T07:58:05Z

@freedomtan for further improvements, should we use the saved models from your repo (MobileBERT, MobileDet)? Or can we use models from TensorFlow Hub (at least for MobileBERT: https://tfhub.dev/tensorflow/mobilebert_en_uncased_L-24_H-128_B-512_A-4_F-4_OPT)? As far as I can see, the saved models are TF 1 models. However, the inputs of the new model are different from ours.

freedomtan · 2023-10-10T00:47:26Z

I am not proud of my repo :-)
For MobileBERT: whatever we do, it should be compatible (and mathematically equivalent) with what Google colleagues contributed at https://github.com/mlcommons/mobile_open/tree/main/language/bert.
For MobileDet: see https://github.com/mlcommons/mobile_open/tree/main/vision/mobilenet

We should check the accuracy of the models.

As far as I can tell, https://tfhub.dev/tensorflow/mobilebert_en_uncased_L-24_H-128_B-512_A-4_F-4_OPT is not for SQuAD (hence not compatible).

RSMNYS · 2023-10-30T19:30:33Z

Hi guys! I've converted MobileBERT to the *.mlpackage format using coremltools 7 and TensorFlow 2.12, and also optimised the model using quantization. Currently I have a problem using the *.mlpackage format in our application. The problem arises during on-device compilation to produce the .mlmodelc. I've tried compiling on the Mac itself and then using the compiled model, but then there is an issue loading its content. I'm working on resolving this so we can see how accurate the optimised model is.
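
For reference, a conversion of this kind could look roughly like the sketch below with coremltools 7. This is not the script that was actually used: the function name, the three path arguments, and the choice of linear symmetric weight quantization are assumptions for illustration, and the quantizer's default dtype is relied on rather than set explicitly.

```python
def convert_and_quantize(saved_model_dir, out_fp16, out_quantized):
    """Convert a TF SavedModel to an ML program, then quantize its weights.

    All three arguments are hypothetical paths chosen by the caller;
    the two outputs should end in .mlpackage.
    """
    # Imported lazily so this file can be loaded without coremltools installed.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    # ML programs are saved as .mlpackage directories.
    mlmodel = ct.convert(
        saved_model_dir,
        source="tensorflow",
        convert_to="mlprogram",
        compute_precision=ct.precision.FLOAT16,
    )
    mlmodel.save(out_fp16)

    # Post-training weight quantization with the default (8-bit) dtype.
    op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric")
    config = cto.OptimizationConfig(global_config=op_config)
    quantized = cto.linear_quantize_weights(mlmodel, config=config)
    quantized.save(out_quantized)
    return quantized
```

Saving both variants makes it possible to compare the unquantized and quantized packages separately in Xcode's Performance tab.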

While working on the task I found the following issues/possible improvements:

- For on-device model compilation we use a deprecated method that compiles the model synchronously. There are alternatives that are async or take a callback, so it would be good to revise this. The problem here is that the app expects a configured CoreMLExecutor right after init.
- Flutter has some issues debugging on a device with iOS 17 (so we can't debug directly). Some patches exist, but the Flutter version recommended in the doc is not the official one and can't be updated. Maybe we can revise this as well and try to update Flutter to the latest version; at least it will be easier to maintain in future.

freedomtan · 2023-10-31T04:58:26Z

@RSMNYS I don't really get what you ran into. From my past experience, if we can make //flutter/cpp/binary:main work on macOS, the app will mostly work on iOS.

And for performance, please first check whether you get a latency improvement in Xcode's / Instruments' Core ML Performance Report.

RSMNYS · 2023-10-31T06:13:40Z

The thing is, it works for main (when running tests), and it loads the ML program with no issues. But in the app the error says it can't read the spec. I will continue with this today.

RSMNYS · 2023-11-06T12:15:07Z

Hi guys! Here are the inference results for the original MobileBERT (mlmodel) and the newly converted models (mlpackage and optimized mlpackage). For the optimized one we used the default int8 quantized data type. As we can see, the converted mlpackage has worse results than the original mlmodel; we need to check what the problem could be. As for the quantized model, everything seems correct: we used a lower-precision data type (int8 rather than float16), which is why the results are worse.

An .mlpackage is a directory, not a single file, so to get a fingerprint for it we need to archive it. To handle the archive with the mlpackage correctly I've adjusted the archive_cached_helper, so now the app can load the mlpackage and run inference. There are still some difficulties with the model path after the app restarts, because the logic returns only the path to the archive's folder.
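
To illustrate the archiving step, here is a minimal sketch using only the Python standard library. It is not the app's archive_cached_helper; the function names are hypothetical, and SHA-256 is an assumption about the fingerprint (the app may use a different checksum).

```python
import hashlib
import shutil
from pathlib import Path


def zip_package(package_dir):
    """Zip an .mlpackage directory so it can be shipped as a single file."""
    package = Path(package_dir)
    # Creates <parent>/<name>.mlpackage.zip with the package directory
    # itself as the top-level entry inside the archive.
    archive = shutil.make_archive(
        str(package), "zip", root_dir=str(package.parent), base_dir=package.name
    )
    return Path(archive)


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The checksum is computed over the archive file rather than the directory, which is why the package has to be archived before it can be fingerprinted.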

In our case we have: https://github.com/RSMNYS/mobile_models/raw/main/v3_0/CoreML/MobileBERT.zip. After download and extraction the model is saved to ../raw/main/v3_0/CoreML/MobileBERT/MobileBERT.mlpackage. After an app restart (the app uses cached resources) the app returns this model_path: ../raw/main/v3_0/CoreML/MobileBERT, which is not correct. I think we can resolve this by introducing a new property in the pbtxt settings, model_name, so we can compose the model_path correctly and support model types that are not a single file but a package (directory). Let's discuss.
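
The path composition could work roughly as in the sketch below. It is written in Python only for illustration (the app itself is not Python); resolve_model_path is a hypothetical helper, and the fallback of picking the only *.mlpackage in the folder is an assumption, not something the app does today.

```python
from pathlib import Path


def resolve_model_path(cached_path, model_name=None):
    """Return the path of the model inside an extracted archive folder."""
    path = Path(cached_path)
    if model_name:
        # Proposed behaviour: model_name comes from the pbtxt settings.
        return path / model_name
    if path.is_dir() and path.suffix != ".mlpackage":
        # Assumed fallback: use the package if the folder holds exactly one.
        packages = sorted(path.glob("*.mlpackage"))
        if len(packages) == 1:
            return packages[0]
    # Single-file models (or an already resolved package) are returned as-is.
    return path
```

With model_name set to MobileBERT.mlpackage, the cached folder ../raw/main/v3_0/CoreML/MobileBERT resolves to ../raw/main/v3_0/CoreML/MobileBERT/MobileBERT.mlpackage.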

freedomtan · 2023-11-06T12:53:07Z

@RSMNYS Please check model performance with the Xcode Performance tab and/or Core ML Instruments first. For a performance benchmark, it's hard to ask people to believe that we have an "improved" model which is 1 - (92.14/121.71) = 24% slower than the original one.

anhappdev · 2023-11-07T06:42:14Z

@RSMNYS Can you try renaming ‘MobileBERT.zip’ to ‘MobileBERT.mlpackage.zip’?

freedomtan · 2023-11-14T06:24:03Z

@RSMNYS please share your forked repo or the .mlpackage model.

RSMNYS · 2023-11-16T07:05:56Z

@freedomtan here is the forked repo: https://github.com/RSMNYS/mobile_models/raw/main/v3_0/CoreML/MobileBERT.mlpackage.zip

freedomtan · 2023-11-16T07:50:47Z

Let's check something basic.

Did you try to open the model you converted in Xcode and run it in the Performance tab (or with Core ML Instruments)? When I tried to run your .mlpackage model with "CPU and ANE", I got error messages. Most likely, there is something wrong. And if you compare running my .mlmodel and your .mlpackage, you can see that the .mlmodel is faster because it runs on CPU+ANE while your .mlpackage runs on GPU+ANE.
I converted the .mlmodel I had at https://github.com/freedomtan/coreml_models_for_mlperf/tree/main/mobilebert to .mlpackage by opening it in Xcode 15.0, clicking the "Edit" button, and then accepting the conversion. I got a model in .mlpackage format, and running it with CPU and ANE is roughly as good as running the .mlmodel one.
If you want to debug the converted model, maybe you can start from its MIL: https://apple.github.io/coremltools/docs-guides/source/model-intermediate-language.html

RSMNYS · 2023-11-16T09:34:59Z

@freedomtan here is my test with the converted model:

Sometimes Xcode fails to create the report, but I believe this is an Xcode issue: it sometimes shows me the wrong operating system, although in the result everything is listed correctly.

Can you please share the General tab for the converted model, the results, and your Xcode version?

freedomtan · 2023-11-16T09:40:06Z

@RSMNYS I meant "CPU and ANE". "GPU and ANE" is the reason why your model is slower.

freedomtan · 2023-11-28T06:37:40Z

@freedomtan to post profiling results showing how the old Core ML model performs on a couple of devices.

freedomtan · 2023-11-29T01:40:26Z

@RSMNYS
With the MobileBERT.mlmodel here, https://github.com/freedomtan/coreml_models_for_mlperf/tree/main/mobilebert,
comparing my .mlmodel and your .mlpackage in Instruments, you can see, as I said, that the GPU takes much longer than the CPU.

With coremltools's converter, you can try to convert a TF model to MIL by setting the convert_to parameter to milinternal https://apple.github.io/coremltools/source/coremltools.converters.convert.html.
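
A minimal sketch of that conversion is shown below. The function name is hypothetical, and tf_model stands for whatever TensorFlow model (for example a SavedModel directory) is being converted; the call itself only uses the convert_to option mentioned above.

```python
def tf_to_mil_text(tf_model):
    """Convert a TensorFlow model to a MIL program and return it as text."""
    # Imported lazily so this file can be loaded without coremltools installed.
    import coremltools as ct

    # With convert_to="milinternal" the converter returns the MIL program
    # itself instead of a Core ML model.
    program = ct.convert(tf_model, source="tensorflow", convert_to="milinternal")
    return str(program)
```

Printing the returned text gives a readable listing of the ops, which can be compared against the graph shown by netron.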

[Two screenshots omitted: freedom's .mlmodel and Sergie's .mlpackage]

freedomtan · 2023-12-04T03:00:06Z

@RSMNYS

I dug into it a bit over the past weekend. Some information that may be useful.

We can check the graphs of both .mlmodel and .mlpackage with netron.
- for .mlmodel: simply netron MobileBERT.mlmodel works
- for .mlpackage: there is a model.mlmodel in MobileBERT.mlpackage/Data/com.apple.CoreML/
We can get MIL programs matching the graphs.
- for .mlmodel: use convert_to = 'milinternal' as mentioned above
- for .mlpackage: xcrun coremlcompiler compile MobileBERT.mlpackage /tmp, then we can find model.mil in /tmp/MobileBERT.mlmodelc/
With the graphs and MIL programs, it's possible to check the first 10 ops and 8 ops of .mlmodel and .mlpackage, respectively.
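
The two .mlpackage steps can be sketched as follows. The helper names are hypothetical, the compile step needs macOS with Xcode installed, and the sketch assumes the Core ML compiler is invoked as xcrun coremlcompiler.

```python
import subprocess
from pathlib import Path


def package_graph_file(package_dir):
    """Path of the model.mlmodel inside an .mlpackage, for opening in netron."""
    return Path(package_dir) / "Data" / "com.apple.CoreML" / "model.mlmodel"


def compile_and_find_mil(package_dir, out_dir="/tmp"):
    """Compile an .mlpackage and return where its model.mil is expected."""
    package = Path(package_dir)
    # Runs the Core ML compiler; raises CalledProcessError if it fails.
    subprocess.run(
        ["xcrun", "coremlcompiler", "compile", str(package), str(out_dir)],
        check=True,
    )
    return Path(out_dir) / (package.stem + ".mlmodelc") / "model.mil"
```

For MobileBERT.mlpackage and the default output directory, the second helper returns /tmp/MobileBERT.mlmodelc/model.mil.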

And then, it should be possible to tweak the .mil program.

RSMNYS · 2024-01-04T15:59:04Z

@freedomtan I did some more testing with MobileBERT.mlpackage. I set different precisions for the model, Float16 and Float32, and here are the results:

FLOAT16

All units: 8.33 ms - 1900 operations on NE, 8 ops on GPU
CPU only: can't run
CPU & GPU: 31.45 ms - all operations run on GPU
CPU & NE: can't run

FLOAT32

All units: 31.11 ms - all operations run on GPU only
CPU only: 69.25 ms
CPU & GPU: 30.7 ms - all operations run on GPU only
CPU & NE: 69.13 ms - all operations run on CPU only

I also found this description: "ML programs use a GPU runtime that is backed by the Metal Performance Shaders Graph framework." So could it be that the mlpackage is optimised to run operations on the GPU (to utilise parallel execution)? And since NLP models are sequential in nature, running on the GPU is not so beneficial (in terms of QPS). We can check other (vision) models to see whether the operations are faster in that case. Checking more.
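
A comparison like this can also be scripted on a Mac as a rough cross-check. The sketch below is an assumption-laden illustration, not how the numbers above were produced: the function names are hypothetical, inputs must be a dict matching the model's input names and shapes, and timings taken from Python include call overhead and say nothing about which unit each op ran on, so Xcode's report remains the reference.

```python
import time
from statistics import median


def median_latency_ms(predict, inputs, warmup=3, runs=20):
    """Median wall-clock time of predict(inputs) in milliseconds."""
    for _ in range(warmup):
        predict(inputs)  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(inputs)
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


def latency_by_compute_unit(
    model_path, inputs, unit_names=("ALL", "CPU_ONLY", "CPU_AND_GPU", "CPU_AND_NE")
):
    """Time one model under several compute-unit settings (macOS only)."""
    # Imported lazily so median_latency_ms is usable without coremltools.
    import coremltools as ct

    results = {}
    for name in unit_names:
        # A configuration that "can't run" is expected to raise here.
        model = ct.models.MLModel(
            model_path, compute_units=getattr(ct.ComputeUnit, name)
        )
        results[name] = median_latency_ms(model.predict, inputs)
    return results
```

Using the median rather than the mean keeps a single slow run (for example the first call after loading) from dominating the result.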

freedomtan · 2024-01-05T07:51:49Z

@RSMNYS
The float16 and float32 results don't surprise me at all. As far as I know,

CPU: fp16, fp32 (and maybe bf16)
GPU: fp16, bf16, and fp32
ANE: fp16

I recommend

for .mlmodel and .mlpackage we discussed,

[Two screenshots omitted: mlmodel and mlpackage]

As you can see, running on GPU is slower.

With the MIL program and netron, we can find what the 10 and 8 ops in the mlmodel and mlpackage are, respectively.

mlmodel:
- 0: buffer conversion
- 1-3: the 2 nodes after 'segment_ids'
- 4-5: the 2 nodes after 'input_mask'
- 6-7: the 3 nodes after 'input_ids'
- 8-9: loadConst and the add after 3
mlpackage:
- 0-2: buffer conversion
- 3: the node after 'input_ids'
- 4: the gather after 'segment_ids' → cast, cast folded
- 5, 6 : the cast and reshape after 'input_mask'
- 7: the gather after 3, the expand_dims, cast folded

Maybe we can change the ML program manually to check what stopped the 8 ops from running on CPU.

freedomtan added the priority:low label Jun 27, 2023
