Commit
Using `torch.as_tensor`, we can alias the tensor rather than copy it during GGUF file loading. This avoids duplicating the entire tensor contents when tracing torch programs, which substantially decreases memory usage on large models. For example, with LLaMA 70B, memory allocated for tensors decreased from 60+ GB to 2 GB.
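A minimal sketch of the aliasing behavior this change relies on. It assumes the loaded tensor data is available as a NumPy array; the `data` array is a hypothetical stand-in, not the loader's actual code. It contrasts `torch.as_tensor`, which shares the array's buffer, with `torch.tensor`, which always copies:

```python
import numpy as np
import torch

# Hypothetical stand-in for tensor data read from a GGUF file.
# Assumption: the loader exposes the data as a NumPy array.
data = np.arange(6, dtype=np.float32).reshape(2, 3)

# as_tensor reuses the NumPy buffer when dtype and device already match (CPU here).
aliased = torch.as_tensor(data)

# torch.tensor always allocates new storage and copies the contents.
copied = torch.tensor(data)

# Mutating the source array is visible through the alias but not the copy.
data[0, 0] = 42.0

aliased_value = aliased[0, 0].item()
copied_value = copied[0, 0].item()

# The alias points at the same memory address as the NumPy array.
same_buffer = aliased.data_ptr() == data.ctypes.data
copy_same_buffer = copied.data_ptr() == data.ctypes.data
```

Note that the alias only holds when no conversion is needed: passing a different `dtype` or `device` to `torch.as_tensor` forces a copy.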