
The results from global tensors slicing are different from creating tensors for each inference #4227

Closed
OswaldoBornemann opened this issue Oct 29, 2024 · 1 comment

@OswaldoBornemann

My use case: with a Llama TensorRT engine, I need to run inference in a while loop.

In my first implementation, I created the I/O tensors in each iteration, which turned out to be too slow because of the per-iteration allocation overhead.

import tensorrt as trt
import torch
from collections import OrderedDict

def allocate_buffer(self,
                    shape_dict=None,
                    engine=None,
                    context=None):
    tensors = OrderedDict()

    for binding in range(engine.num_io_tensors):
        name = engine.get_tensor_name(binding)

        # Prefer the caller-supplied shape; otherwise ask the context.
        if shape_dict and name in shape_dict:
            shape = shape_dict[name]
        else:
            shape = context.get_tensor_shape(name)

        dtype = trt.nptype(engine.get_tensor_dtype(name))
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            context.set_input_shape(name, shape)

        # numpy_to_torch_dtype_dict maps numpy dtypes to torch dtypes
        # (defined elsewhere in my module).
        tensor = torch.empty(tuple(shape),
                             dtype=numpy_to_torch_dtype_dict[dtype]).to(device=self.device)
        tensors[name] = tensor

    return tensors
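
For context, here is roughly how buffers like these get bound and executed with the TensorRT tensor-address API. This is a minimal sketch, where self.engine, self.context, the shape_dict variable, and the stream handling are assumptions for illustration rather than code from this issue:

    # Sketch: bind the freshly allocated buffers and run one step.
    stream = torch.cuda.Stream()

    # shape_dict holds this iteration's actual input shapes.
    tensors = self.allocate_buffer(shape_dict=shape_dict,
                                   engine=self.engine,
                                   context=self.context)

    # TensorRT reads and writes each I/O tensor through a raw device pointer.
    for name, tensor in tensors.items():
        self.context.set_tensor_address(name, tensor.data_ptr())

    self.context.execute_async_v3(stream_handle=stream.cuda_stream)
    stream.synchronize()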

Then I tried initializing the tensors once, globally, when initializing the model, like below.

def initial_global_tensors(self):
    # Allocate every I/O tensor once, at its maximum shape.
    tensor_shapes = {
        "position": (1, 2000),
        "inputs_embeds": (1, 2000, 1792),
        "lm_logits": (1, 1, 8212),
    }

    # Per-layer KV-cache inputs and outputs for the 12 layers.
    for i in range(12):
        tensor_shapes[f"past_key_in{i}"] = (1, 16, 2000, 112)
        tensor_shapes[f"past_value_in{i}"] = (1, 16, 2000, 112)
        tensor_shapes[f"past_key{i}"] = (1, 16, 2000, 112)
        tensor_shapes[f"past_value{i}"] = (1, 16, 2000, 112)

    self.global_tensors = {
        name: torch.zeros(shape, dtype=torch.int64 if 'position' in name else torch.float32).to(device=self.device)
        for name, shape in tensor_shapes.items()
    }
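
Preallocating at the maximum shapes is affordable here; a quick back-of-the-envelope check (my own arithmetic from the shapes above, not a figure reported in the issue) puts the total at roughly 0.7 GB:

    # Rough footprint of the preallocated buffers, in bytes.
    kv = 1 * 16 * 2000 * 112 * 4    # one float32 KV tensor: ~14.3 MB
    total = 48 * kv                 # 12 layers x 4 KV tensors each
    total += 1 * 2000 * 1792 * 4    # inputs_embeds (float32)
    total += 1 * 2000 * 8           # position (int64)
    total += 1 * 1 * 8212 * 4       # lm_logits (float32)
    print(f"{total / 1e9:.2f} GB")  # -> 0.70 GB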

Then, in each iteration, I can slice those tensors down to the actual input shapes:

def set_shape(self,
              shape_dict=None,
              engine=None,
              context=None):
    tensors = OrderedDict()

    for binding in range(engine.num_io_tensors):
        name = engine.get_tensor_name(binding)

        if shape_dict and name in shape_dict:
            shape = shape_dict[name]
        else:
            shape = context.get_tensor_shape(name)

        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            context.set_input_shape(name, shape)

        original_tensor = self.global_tensors[name]

        # narrow() returns a view into the global tensor; no data is copied.
        sliced_tensor = original_tensor
        for dim, size in enumerate(shape):
            sliced_tensor = sliced_tensor.narrow(dim, 0, size)
        tensors[name] = sliced_tensor

    return tensors

However, the results from slicing the global tensors differ from those I get when creating fresh tensors for each iteration, and I have no idea why.

@OswaldoBornemann
Author

I found the solution: sliced_tensor needs to be made contiguous.
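
The root cause: narrow() on any dimension other than the first returns a non-contiguous view, and data_ptr() still points into the big global buffer with the original strides. TensorRT treats that address as a densely packed buffer of the sliced shape, so it reads and writes the wrong elements. A minimal sketch of the fix inside the set_shape loop above (a reconstruction of the change, not the exact patch):

    # Materialize each slice as a densely packed buffer before
    # handing its address to TensorRT.
    sliced_tensor = original_tensor
    for dim, size in enumerate(shape):
        sliced_tensor = sliced_tensor.narrow(dim, 0, size)
    tensors[name] = sliced_tensor.contiguous()

Note that .contiguous() copies whenever the view is not already contiguous, so outputs land in the copy rather than in self.global_tensors; if the results must persist in the global buffers, they have to be copied back after inference.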
