std::bad_alloc Exception When Loading Large Model on iOS with MediaPipe #5757
Labels
platform:ios (MediaPipe iOS issues)
task:LLM inference (issues related to MediaPipe LLM Inference Gen AI setup)
type:others (issues not falling under bug, performance, support, build and install, or feature)
I'm experiencing a std::bad_alloc exception when attempting to load a large model (~2.16 GB) using MediaPipe's LLM inference capabilities on an iPhone 16 Pro. The app crashes during model initialization due to what appears to be a memory allocation issue.
Environment:
Device: iPhone 16 Pro
iOS Version: latest
MediaPipe Version: latest
Xcode Version: 16.1
Steps to Reproduce:
Model Preparation:
Use a large .task model file of approximately 2.16 GB (e.g., Llama-3.2-1b-q8.task).
The model is downloaded at runtime and stored in the app's documents directory to avoid bundling it with the app.
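For reference, the path passed to MediaPipe is resolved from the documents directory roughly like this (a minimal sketch; the file name matches the model mentioned above):

import Foundation

// Resolve the downloaded .task file inside the app's documents directory.
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
let modelFileURL = documentsURL.appendingPathComponent("Llama-3.2-1b-q8.task")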
Model Initialization Code:
Initialize the model using the following code snippet:
import MediaPipeTasksGenAI

init(model: Model) throws {
    // Configure the inference engine with the downloaded model file.
    let options = LlmInference.Options(modelPath: model.modelFileURL.path)
    options.maxTokens = 512
    inference = try LlmInference(options: options)

    // Create a session with the desired sampling parameters.
    let sessionOptions = LlmInference.Session.Options()
    sessionOptions.temperature = 0.2
    sessionOptions.randomSeed = 2222
    session = try LlmInference.Session(llmInference: inference, options: sessionOptions)
}
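The initializer above lives in a small app-side wrapper around LlmInference; a simplified call site looks roughly like this (the ChatEngine and Model type names are illustrative names from my app, not MediaPipe API):

// Illustrative call site; ChatEngine is the wrapper containing the init above.
do {
    let model = Model(modelFileURL: modelFileURL)
    let engine = try ChatEngine(model: model) // the crash happens during this initialization
    _ = engine // engine.session would be used for generation once this succeeds
} catch {
    print("Failed to initialize LLM inference: \(error)")
}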
Run the App:
Launch the app on the iPhone 16 Pro.
The app attempts to initialize the model using the above code.
Expected Behavior:
The model should initialize successfully, allowing for on-device inference using MediaPipe's LLM capabilities.
Actual Behavior:
The app crashes with a std::bad_alloc exception during model initialization.
Here are the relevant logs and error messages:
Is there a recommended way to load large models using MediaPipe on iOS devices without exceeding memory limits?
Are there any best practices or techniques within MediaPipe or TensorFlow Lite to handle large models efficiently on mobile devices?
Can MediaPipe support loading models in a way that mitigates high memory consumption, such as streaming parts of the model or more efficient memory management during initialization?
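In the meantime, the only app-side mitigation I can think of is to fail fast before initialization when there is clearly not enough headroom, roughly as in the sketch below. The helper function and the peak-memory estimate are my own assumptions, not MediaPipe API; os_proc_available_memory() (iOS 13+) reports how much memory the process can still allocate before hitting its limit.

import Foundation
import os
import MediaPipeTasksGenAI

// Hypothetical fail-fast helper (not part of MediaPipe): refuse to construct
// LlmInference when the process lacks the headroom to map a ~2 GB model, so the
// app can surface an error instead of dying with std::bad_alloc.
enum ModelLoadError: Error {
    case insufficientMemory(availableBytes: Int, estimatedPeakBytes: Int)
}

func makeInference(modelPath: String, estimatedPeakBytes: Int) throws -> LlmInference {
    // Memory the process may still allocate before hitting its jetsam limit.
    let available = Int(os_proc_available_memory())
    guard available > estimatedPeakBytes else {
        throw ModelLoadError.insufficientMemory(availableBytes: available,
                                                estimatedPeakBytes: estimatedPeakBytes)
    }
    let options = LlmInference.Options(modelPath: modelPath)
    options.maxTokens = 512
    return try LlmInference(options: options)
}

This only avoids the hard crash, though; it does not address the underlying question of loading the model with lower peak memory.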