
The results from global tensors slicing are different from creating tensors for each inference #4227

Closed
OswaldoBornemann opened this issue Oct 29, 2024 · 1 comment

@OswaldoBornemann

My use case: with a Llama TensorRT engine, I need to run inference in a while loop.

In my first implementation, I created the I/O tensors in each iteration, which turned out to be too slow because of the per-iteration allocation overhead.

import tensorrt as trt
import torch
from collections import OrderedDict

def allocate_buffer(self,
                    shape_dict=None,
                    engine=None,
                    context=None):
    tensors = OrderedDict()

    for binding in range(engine.num_io_tensors):
        name = engine.get_tensor_name(binding)

        # Prefer the caller-supplied shape; otherwise ask the context.
        if shape_dict and name in shape_dict:
            shape = shape_dict[name]
        else:
            shape = context.get_tensor_shape(name)

        dtype = trt.nptype(engine.get_tensor_dtype(name))
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            context.set_input_shape(name, shape)

        # numpy_to_torch_dtype_dict maps numpy dtypes to torch dtypes
        # (defined elsewhere in my module).
        tensor = torch.empty(tuple(shape),
                             dtype=numpy_to_torch_dtype_dict[dtype]).to(device=self.device)
        tensors[name] = tensor

    return tensors
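
For context, here is roughly how buffers like these get bound and executed with the TensorRT tensor-address API. This is a minimal sketch, where self.engine, self.context, the shape_dict variable, and the stream handling are assumptions for illustration rather than code from this issue:

    # Sketch: bind the freshly allocated buffers and run one step.
    stream = torch.cuda.Stream()

    # shape_dict holds this iteration's actual input shapes.
    tensors = self.allocate_buffer(shape_dict=shape_dict,
                                   engine=self.engine,
                                   context=self.context)

    # TensorRT reads and writes each I/O tensor through a raw device pointer.
    for name, tensor in tensors.items():
        self.context.set_tensor_address(name, tensor.data_ptr())

    self.context.execute_async_v3(stream_handle=stream.cuda_stream)
    stream.synchronize()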

Then I tried initializing the tensors once, globally, when initializing the model, like below.

def initial_global_tensors(self):
    # Allocate every I/O tensor once, at its maximum shape.
    tensor_shapes = {
        "position": (1, 2000),
        "inputs_embeds": (1, 2000, 1792),
        "lm_logits": (1, 1, 8212),
    }

    # Per-layer KV-cache inputs and outputs for the 12 layers.
    for i in range(12):
        tensor_shapes[f"past_key_in{i}"] = (1, 16, 2000, 112)
        tensor_shapes[f"past_value_in{i}"] = (1, 16, 2000, 112)
        tensor_shapes[f"past_key{i}"] = (1, 16, 2000, 112)
        tensor_shapes[f"past_value{i}"] = (1, 16, 2000, 112)

    self.global_tensors = {
        name: torch.zeros(shape, dtype=torch.int64 if 'position' in name else torch.float32).to(device=self.device)
        for name, shape in tensor_shapes.items()
    }
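
Preallocating at the maximum shapes is affordable here; a quick back-of-the-envelope check (my own arithmetic from the shapes above, not a figure reported in the issue) puts the total at roughly 0.7 GB:

    # Rough footprint of the preallocated buffers, in bytes.
    kv = 1 * 16 * 2000 * 112 * 4    # one float32 KV tensor: ~14.3 MB
    total = 48 * kv                 # 12 layers x 4 KV tensors each
    total += 1 * 2000 * 1792 * 4    # inputs_embeds (float32)
    total += 1 * 2000 * 8           # position (int64)
    total += 1 * 1 * 8212 * 4       # lm_logits (float32)
    print(f"{total / 1e9:.2f} GB")  # -> 0.70 GB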

Then, in each iteration, I can slice those tensors down to the actual input shapes:

def set_shape(self,
              shape_dict=None,
              engine=None,
              context=None):
    tensors = OrderedDict()

    for binding in range(engine.num_io_tensors):
        name = engine.get_tensor_name(binding)

        if shape_dict and name in shape_dict:
            shape = shape_dict[name]
        else:
            shape = context.get_tensor_shape(name)

        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            context.set_input_shape(name, shape)

        original_tensor = self.global_tensors[name]

        # narrow() returns a view into the global tensor; no data is copied.
        sliced_tensor = original_tensor
        for dim, size in enumerate(shape):
            sliced_tensor = sliced_tensor.narrow(dim, 0, size)
        tensors[name] = sliced_tensor

    return tensors

However, the results from slicing the global tensors differ from those I get when creating fresh tensors for each iteration, and I have no idea why.

@OswaldoBornemann
Author

I found the solution: sliced_tensor needs to be made contiguous.
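
The root cause: narrow() on any dimension other than the first returns a non-contiguous view, and data_ptr() still points into the big global buffer with the original strides. TensorRT treats that address as a densely packed buffer of the sliced shape, so it reads and writes the wrong elements. A minimal sketch of the fix inside the set_shape loop above (a reconstruction of the change, not the exact patch):

    # Materialize each slice as a densely packed buffer before
    # handing its address to TensorRT.
    sliced_tensor = original_tensor
    for dim, size in enumerate(shape):
        sliced_tensor = sliced_tensor.narrow(dim, 0, size)
    tensors[name] = sliced_tensor.contiguous()

Note that .contiguous() copies whenever the view is not already contiguous, so outputs land in the copy rather than in self.global_tensors; if the results must persist in the global buffers, they have to be copied back after inference.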
