Alias gguf tensors instead of copy
Using `torch.as_tensor` we can alias the tensor rather than copy it during
gguf file loading. This avoids duplicating the entire tensor contents
when tracing torch programs, which substantially decreases memory usage
on large models.

For example, LLaMa 70b decreased memory allocation from 60+ GB to 2 GB for
tensors.
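
The difference between copying and aliasing can be seen in a minimal sketch. It assumes the loaded data is an ordinary NumPy array (the array `data` below is made up for illustration and is not taken from the gguf loader) and that no dtype or device change is requested, which is the case where `torch.as_tensor` can share the buffer.

```python
import numpy as np
import torch

# Stand-in for tensor data read from a file (illustrative only).
data = np.arange(6, dtype=np.float32)

copied = torch.tensor(data)      # always allocates a new buffer and copies
aliased = torch.as_tensor(data)  # reuses the NumPy buffer when dtype/device match

# Compare the underlying buffer addresses.
print(copied.data_ptr() == data.ctypes.data)
print(aliased.data_ptr() == data.ctypes.data)

# A write through NumPy is visible only in the aliased tensor.
data[0] = 42.0
print(copied[0].item(), aliased[0].item())
```

Because the aliased tensor shares storage with the source array, the source must stay alive and unmodified for as long as the tensor is in use.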
rsuderman committed Sep 4, 2024
1 parent 944e358 commit d8eddb0
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions sharktank/sharktank/types/gguf_interop/base.py
@@ -80,9 +80,10 @@ def _externalize_tensor(
     # Important: The annotation tag must be set on the actual leaf tensor
     # which is stored in the root theta. This means that any shaping or
     # data type massaging has to happen *before* annotating.
-    data_tensor = torch.tensor(data)
     if logical_shape is not None:
-        data_tensor = data_tensor.reshape(logical_shape)
+        data_tensor = torch.as_tensor(data.reshape(logical_shape))
+    else:
+        data_tensor = torch.as_tensor(data)
     ExternalTensorTrait(external_name=name, external_scope="").set(data_tensor)
     return data_tensor
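
The reshape-then-alias branch in this change can be sketched in isolation. The helper name `alias_with_shape` is hypothetical and is not part of sharktank; the sketch also assumes a contiguous NumPy array, for which `reshape` returns a view. For a non-contiguous array, `reshape` may copy, and the result would then alias the copy rather than the original buffer.

```python
import numpy as np
import torch

def alias_with_shape(data, logical_shape=None):
    # Hypothetical helper mirroring the branch in the diff above.
    if logical_shape is not None:
        # Reshape on the NumPy side first; for contiguous data this is a view.
        return torch.as_tensor(data.reshape(logical_shape))
    return torch.as_tensor(data)

flat = np.zeros(12, dtype=np.float32)
t = alias_with_shape(flat, (3, 4))

# Writing to the flat array shows up in the reshaped tensor:
# flat index 5 maps to row 1, column 1 of a (3, 4) layout.
flat[5] = 7.0
print(tuple(t.shape), t[1, 1].item())
```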
