Run dhat or similar memory tools on a natively built version of the browsermt marian-dev fork #898

Open
gregtatum opened this issue Oct 23, 2024 · 0 comments

In Firefox, the inference engine uses a large amount of memory when running as wasm. There aren't good memory tools for analyzing wasm, so we should instead compile the engine natively and analyze its memory there.

Some details from another document:

This worker has a copy of the models (~20-50 MB) and a copy of the engine binary (in the range of a few MB). However, when the engine is running, the memory balloons to ~250 MB of RSS, and ~450 MB of reserved wasm heap memory. It's unclear without further analysis where exactly this memory is coming from, but my assumption is that the model gets copied to the ExpressionGraph class in Marian. Marian has its own allocator called a Workspace. The tensors are allocated there.
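As a rough first check on a native build, before any profiler is wired in, the peak RSS of the process can be read around the model load. The following is a minimal sketch, assuming a POSIX system; `peakRssNative` and `touchMemory` are hypothetical names, and `touchMemory` only stands in for loading a model and translating. Note that `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS.

```cpp
#include <sys/resource.h>  // getrusage, struct rusage (POSIX)

#include <cstddef>
#include <vector>

// Peak resident set size of the current process, in the platform's native
// unit (kilobytes on Linux, bytes on macOS). Returns -1 on failure.
long peakRssNative() {
  struct rusage usage {};
  if (getrusage(RUSAGE_SELF, &usage) != 0) {
    return -1;
  }
  return usage.ru_maxrss;
}

// Hypothetical stand-in for "load the model and run a translation":
// allocates `bytes` and fills every byte, so the pages become resident.
std::vector<char> touchMemory(std::size_t bytes) {
  return std::vector<char>(bytes, 1);
}
```

Calling `peakRssNative()` before and after the real model load gives a coarse number to compare against the ~250 MB figure above. It does not say where the memory comes from, which is what the profiling work is for.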

The work here would be to integrate dhat or some other memory tool into a build of marian-dev, and run one of our quantized models in it. This should tell us the call sites where memory is being allocated. We can also enlist some other Firefox platform experts who can help analyze things once we have something working through a taskfile command.
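To illustrate the kind of report we are after (bytes attributed to each allocation site), here is a minimal sketch of a per-site tally. This is not dhat and not part of Marian: `AllocationTally` is a hypothetical helper, and the site labels are passed in by hand, whereas a real tool records the call stack automatically.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of Marian or dhat): accumulates the number
// of bytes requested at each manually labelled call site.
class AllocationTally {
 public:
  // Adds `bytes` to the running total for `site`.
  void record(const std::string& site, std::size_t bytes) {
    bytesBySite_[site] += bytes;
    total_ += bytes;
  }

  // Bytes recorded so far for `site`, or 0 if the site was never seen.
  std::size_t bytesAt(const std::string& site) const {
    auto it = bytesBySite_.find(site);
    return it == bytesBySite_.end() ? 0 : it->second;
  }

  // Bytes recorded across all sites.
  std::size_t total() const { return total_; }

  // Label of the site with the most bytes, or "" if nothing was recorded.
  std::string largestSite() const {
    std::string best;
    std::size_t bestBytes = 0;
    for (const auto& entry : bytesBySite_) {
      if (entry.second > bestBytes) {
        best = entry.first;
        bestBytes = entry.second;
      }
    }
    return best;
  }

 private:
  std::map<std::string, std::size_t> bytesBySite_;
  std::size_t total_ = 0;
};
```

With a real profiler the equivalent of `largestSite()` is what we would look at first, for example to confirm or rule out the assumption that the model is copied into the ExpressionGraph.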
