Run dhat or similar memory tools on a natively built version of the browsermt marian-dev fork #898

gregtatum · 2024-10-23T21:19:30Z

In Firefox, the memory footprint of the inference engine is quite large in Wasm. There aren't good memory tools for analyzing Wasm, so we should instead compile the engine natively and analyze its memory there.

Some details from another document.

This worker has a copy of the models (~20-50 MB) and a copy of the engine binary (a few MB). However, when the engine is running, memory balloons to ~250 MB of RSS and ~450 MB of reserved Wasm heap memory. Without further analysis it's unclear exactly where this memory comes from, but my assumption is that the model gets copied into Marian's ExpressionGraph class. Marian has its own allocator, called a Workspace, and the tensors are allocated there.
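The Workspace hypothesis can be made concrete with a toy model. Below is a minimal Python sketch of a preallocating bump (arena) allocator. It is an assumption that Marian's Workspace behaves roughly like this; the `Arena` class and all sizes are hypothetical and chosen only for illustration. What it shows is that the reserved block is paid for up front regardless of how much the tensors actually use, and that a heap profiler would likely attribute the whole block to the single call site that reserved it rather than to individual tensors.

```python
KB = 1024


class Arena:
    """Toy preallocating bump allocator (hypothetical; not Marian's Workspace).

    One large block is reserved up front. Allocations only advance an offset
    into that block, and everything is released at once by reset().
    """

    def __init__(self, reserved_bytes: int, alignment: int = 256) -> None:
        self.reserved = reserved_bytes
        self.alignment = alignment
        # A single large allocation: a heap profiler sees only this call site.
        self.buffer = bytearray(reserved_bytes)
        self.offset = 0  # bytes currently handed out
        self.peak = 0    # high-water mark of handed-out bytes

    def alloc(self, nbytes: int) -> int:
        """Return the start offset of a new region of at least nbytes."""
        # Round the request up to a multiple of the alignment.
        size = -(-nbytes // self.alignment) * self.alignment
        if self.offset + size > self.reserved:
            raise MemoryError(
                f"arena exhausted: need {size} bytes, {self.reserved - self.offset} free"
            )
        start = self.offset
        self.offset += size
        self.peak = max(self.peak, self.offset)
        return start

    def reset(self) -> None:
        """Release all regions at once; the reserved block stays allocated."""
        self.offset = 0


# Illustrative sizes only: reserve 64 KB, then run three "batches".
arena = Arena(reserved_bytes=64 * KB)
for _batch in range(3):
    arena.alloc(10 * KB)     # e.g. activations for one batch
    arena.alloc(3 * KB + 1)  # an odd-sized tensor, padded up to the alignment
    arena.reset()

print(f"reserved={arena.reserved} peak={arena.peak} in_use={arena.offset}")
# prints: reserved=65536 peak=13568 in_use=0
```

Here peak tensor usage is only about a fifth of the reservation, yet the full block stays allocated. If the Workspace behaves similarly, part of the gap between the model size and the ~450 MB reserved heap could be reservation rather than live tensors; the profiler output should confirm or rule that out.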

The work here would be to integrate dhat or some other memory tool into a build of marian-dev and run one of our quantized models with it. This should tell us the call sites where memory is being allocated. Once we have this working through a Taskfile command, we can also enlist other Firefox platform experts to help analyze the results.
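As a possible starting point, here is a minimal Python sketch that wraps a natively built decoder in Valgrind's DHAT heap profiler (`valgrind --tool=dhat`). The decoder path `build/marian-decoder`, the model config `config.yml`, and the helper names `dhat_command` and `run_dhat` are hypothetical placeholders, and the sketch assumes the decoder reads source sentences from stdin. At module level it only prints the command (a dry run), so it is safe to run without Valgrind installed.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def dhat_command(decoder: Path, decoder_args: list[str], out_file: Path) -> list[str]:
    """Build a command line that runs `decoder` under Valgrind's DHAT tool."""
    return [
        "valgrind",
        "--tool=dhat",                  # heap profiler: records allocation sites
        f"--dhat-out-file={out_file}",  # profile file, viewable in DHAT's dh_view.html
        str(decoder),
        *decoder_args,
    ]


def run_dhat(decoder: Path, decoder_args: list[str], out_file: Path, source_text: str) -> None:
    """Run the profiled decoder, feeding it source sentences on stdin."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind is not installed or not on PATH")
    cmd = dhat_command(decoder, decoder_args, out_file)
    print("running:", shlex.join(cmd))
    # Valgrind slows the program down considerably, so keep the input small.
    subprocess.run(cmd, input=source_text, text=True, check=True)


# Dry run with hypothetical paths: print the command instead of executing it.
cmd = dhat_command(Path("build/marian-decoder"), ["-c", "config.yml"], Path("dhat.out.json"))
print(shlex.join(cmd))
```

The native build should include debug symbols (for example CMake's `RelWithDebInfo` build type, assuming the fork's build honors it) so that DHAT reports readable call stacks. A wrapper like `run_dhat` is the kind of thing a Taskfile command could call.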

gregtatum added the inference label Oct 23, 2024
