- Introduction
- Backends
- Comparator
- Putting It All Together
- Enabling PyTorch
- Examples
- Python API Reference Documentation
The Polygraphy API consists broadly of two major components: Backends and the Comparator.
NOTE: To help you get started with the API, you can use the run tool with the --gen-script option to auto-generate template scripts that use the Polygraphy API.
A Polygraphy backend provides an interface for a deep learning framework. Backends consist of two components: Loaders and Runners.
A Loader is a functor or callable that loads or operates on models in some way.
Existing Loaders can be composed for more advanced behaviors.
For example, we can implement a conversion like ONNX -> TensorRT Network -> TensorRT Engine:
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath
build_engine = EngineFromNetwork(NetworkFromOnnxPath("/path/to/model.onnx"))
build_engine is a callable that will build a TensorRT engine.
Polygraphy also provides immediately evaluated functional variants of each Loader.
These use the same names, but in snake_case instead of PascalCase, and expose the same APIs.
Using the functional loaders, the conversion above would be:
from polygraphy.backend.trt import engine_from_network, network_from_onnx_path
engine = engine_from_network(network_from_onnx_path("/path/to/model.onnx"))
engine is a TensorRT engine as opposed to a callable.
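The difference between the two styles can be illustrated without TensorRT installed. The following is a minimal sketch in plain Python: ParseModel, BuildEngine, and their snake_case counterparts are hypothetical stand-ins rather than Polygraphy classes, and the "model" is just a string.

```python
# Toy illustration of deferred (PascalCase) vs. immediate (snake_case) loaders.
# Every name below is a hypothetical stand-in, not part of Polygraphy.

calls = []  # records the order in which work actually happens


class ParseModel:
    """Deferred loader: stores its arguments and does the work when called."""

    def __init__(self, path):
        self.path = path

    def __call__(self):
        calls.append("parse")
        return f"network({self.path})"


class BuildEngine:
    """Deferred loader that composes with another loader or a ready value."""

    def __init__(self, network):
        self.network = network

    def __call__(self):
        # Accept either a callable loader or an already-loaded object.
        network = self.network() if callable(self.network) else self.network
        calls.append("build")
        return f"engine({network})"


def parse_model(path):
    """Immediately evaluated variant of ParseModel."""
    return ParseModel(path)()


def build_engine(network):
    """Immediately evaluated variant of BuildEngine."""
    return BuildEngine(network)()


deferred = BuildEngine(ParseModel("model.onnx"))
work_done_before_call = list(calls)  # still empty: nothing has run yet
engine_lazy = deferred()             # parsing and building happen here

engine_eager = build_engine(parse_model("model.onnx"))  # runs immediately
```

In this sketch, composing the PascalCase objects does no work until the outermost callable is invoked, which is what lets a runner decide when the expensive step happens.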
A Runner uses a loader to load a model and can then run inference (see BaseRunner).
IMPORTANT: Runners may reuse their output buffers. Thus, if you need to save outputs from multiple inferences, you should make a copy of the outputs with copy.deepcopy(outputs).
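To see why the copy matters, consider a toy runner that writes every result into the same buffer. ReusingRunner is a hypothetical class written for this illustration, and plain Python lists stand in for numpy buffers.

```python
import copy


class ReusingRunner:
    """Hypothetical runner that writes every result into one shared buffer."""

    def __init__(self):
        self._outputs = {"out": [0.0]}  # allocated once, reused on every call

    def infer(self, feed_dict):
        self._outputs["out"][0] = feed_dict["x"] * 2.0
        return self._outputs


runner = ReusingRunner()

# Without copying, both saved entries alias the same buffer,
# so the first result is overwritten by the second inference.
aliased = [runner.infer({"x": 1.0}), runner.infer({"x": 2.0})]

# With copy.deepcopy, each result is preserved independently.
saved = [copy.deepcopy(runner.infer({"x": x})) for x in (1.0, 2.0)]
```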
To use a runner, you just need to activate it, and then call infer().
Note that activating a runner can be very expensive, so you should minimize the number of times you activate a runner; ideally, activate it only once.
It is recommended to use a context manager to activate and deactivate the runner rather than calling the functions manually:
from polygraphy.backend.trt import TrtRunner
with TrtRunner(build_engine) as runner:
    outputs = runner.infer(feed_dict={"input0": input_data})
Generally, you do not need to write custom runners unless you want to support a new backend.
If you do, in the simplest case you only need to implement two functions:
- infer_impl: Accepts a dictionary of numpy buffers, runs inference, and finally returns a dictionary containing the outputs.
- get_input_metadata_impl: Returns a TensorMetadata mapping input names to their shapes and data types. You may use None, negative numbers, or strings to indicate dynamic dimensions.
For more advanced runners, where some setup is required, you may also need to implement the activate_impl() and deactivate_impl() functions.
For example, in the TrtRunner, engines are created in activate_impl() and destroyed in deactivate_impl().
Importantly, the GPU is not used at all until these functions have been called (notice, for example, that in the TrtRunner, the CUDA runtime library is only loaded in the activate_impl() function).
This allows the Comparator to optionally provide each runner with exclusive access to the GPU, to prevent any interference between runners.
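The lifecycle described above can be sketched in plain Python. ToyBaseRunner is a hypothetical stand-in written for this illustration; it is not Polygraphy's BaseRunner, and the metadata here is a plain dict rather than a TensorMetadata. Only the *_impl method names are taken from the text.

```python
class ToyBaseRunner:
    """Hypothetical base class mirroring the activate/infer/deactivate pattern."""

    def __init__(self):
        self.is_active = False

    def __enter__(self):
        self.activate_impl()
        self.is_active = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate_impl()
        self.is_active = False

    def infer(self, feed_dict):
        if not self.is_active:
            raise RuntimeError("runner must be activated before infer()")
        return self.infer_impl(feed_dict)

    # Optional hooks: only runners that need setup override these.
    def activate_impl(self):
        pass

    def deactivate_impl(self):
        pass


class DoublingRunner(ToyBaseRunner):
    """Toy backend: multiplies every input value by a scale factor."""

    def __init__(self):
        super().__init__()
        self.resource = None  # nothing is allocated at construction time

    def activate_impl(self):
        self.resource = {"scale": 2}  # stand-in for an expensive engine

    def deactivate_impl(self):
        self.resource = None  # release the resource

    def get_input_metadata_impl(self):
        # name -> (dtype, shape); None marks a dynamic dimension.
        return {"x": ("float32", (None, 3))}

    def infer_impl(self, feed_dict):
        scale = self.resource["scale"]
        return {"y": [value * scale for value in feed_dict["x"]]}


runner = DoublingRunner()
resource_before_activation = runner.resource
with runner:
    outputs = runner.infer({"x": [1, 2, 3]})
```

Because the resource only exists between activation and deactivation, a caller can activate runners one at a time so that no two of them hold the device simultaneously.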
The Comparator is used to run inference for runners, and then compare accuracy (see Comparator.py).
This process is divided into two phases:
- Running inference:
  run_results = Comparator.run(runners)
  This function accepts a list of runners and returns a RunResults object (see Comparator.py) containing the inference outputs of each run. It also accepts an optional data_loader argument to control the input data. If not provided, it will use the default data loader. Comparator.run() continues until inputs from the data loader are exhausted.
- Comparing results:
  Comparator.compare_accuracy(run_results)
  This function accepts the results returned by Comparator.run and compares them between runners.
IMPORTANT: The Comparator is designed for scenarios where you need to compare a small number of inputs across multiple runners. It is not a good idea to use it to validate a model with an entire dataset! Instead, runners should be used directly for such cases (see the example).
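The two phases can be pictured with a small stand-alone sketch. The run and compare_accuracy functions below are simplified, hypothetical re-implementations for illustration only: runners are plain functions, outputs are Python lists, every runner is compared against the first one, and the tolerances are arbitrary values chosen for this example, not Polygraphy's defaults.

```python
import math


def run(runners, data):
    """Phase 1: feed every input to every runner.

    Returns {runner name: [output dict for each iteration]}.
    """
    return {name: [fn(feed) for feed in data] for name, fn in runners.items()}


def compare_accuracy(results, rel_tol=1e-5, abs_tol=1e-5):
    """Phase 2: check every runner's outputs against the first runner's."""
    names = list(results)
    baseline = results[names[0]]
    for name in names[1:]:
        for expected, actual in zip(baseline, results[name]):
            for key, expected_values in expected.items():
                pairs = zip(expected_values, actual[key])
                if not all(
                    math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
                    for e, a in pairs
                ):
                    return False
    return True


# Two toy "runners" that compute the same thing in slightly different ways.
runners = {
    "reference": lambda feed: {"y": [v * 0.1 for v in feed["x"]]},
    "candidate": lambda feed: {"y": [v / 10 for v in feed["x"]]},
}
data = [{"x": [1.0, 2.0, 3.0]}, {"x": [4.0, 5.0, 6.0]}]

results = run(runners, data)
matched = compare_accuracy(results)

# A runner with a deliberate error is flagged as a mismatch.
broken = dict(runners, candidate=lambda feed: {"y": [v + 1 for v in feed["x"]]})
mismatch = compare_accuracy(run(broken, data))
```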
A data loader is used by the Comparator to load input data to feed to each runner (for example, see Polygraphy's default data loader).
A data loader can be any generator or iterable that yields a dictionary of input buffers. In the simplest case, this can just be a list of dicts.
In case you don't know details about the inputs ahead of time, you can access the input_metadata property in your data loader, which will be set to a TensorMetadata instance by the Comparator.
NOTE: Polygraphy provides a default DataLoader class that uses numpy to generate random input buffers.
The input data can be bounded via parameters to the constructor.
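As a rough illustration of a custom data loader, the sketch below shows both the list-of-dicts form and an iterable class that reads input_metadata. RangeDataLoader and its parameters are hypothetical; a plain dict of name -> (dtype, shape) stands in for TensorMetadata, flat Python lists stand in for numpy buffers, and the metadata is set by hand here where the Comparator would normally set it.

```python
import math
import random

# Simplest case: a list of dicts, one dict of input buffers per iteration.
fixed_inputs = [{"x": [0.0, 1.0]}, {"x": [2.0, 3.0]}]


class RangeDataLoader:
    """Hypothetical data loader yielding bounded random inputs."""

    def __init__(self, iterations=2, val_range=(0.0, 1.0), seed=0):
        self.iterations = iterations
        self.val_range = val_range
        self.seed = seed
        # Expected to be assigned before iteration:
        # maps input name -> (dtype name, shape tuple).
        self.input_metadata = None

    def __iter__(self):
        if self.input_metadata is None:
            raise ValueError("input_metadata must be set before iterating")
        rng = random.Random(self.seed)  # reseeded so each pass is identical
        low, high = self.val_range
        for _ in range(self.iterations):
            feed_dict = {}
            for name, (_dtype, shape) in self.input_metadata.items():
                count = math.prod(shape)  # flat buffer with one value per element
                feed_dict[name] = [rng.uniform(low, high) for _ in range(count)]
            yield feed_dict


loader = RangeDataLoader(iterations=3, val_range=(0.0, 2.0))
loader.input_metadata = {"x": ("float32", (2, 2))}  # set by hand in this sketch
batches = list(loader)
```

Reseeding at the start of each pass is a design choice in this sketch: it means every runner that iterates over the loader sees the same inputs, which is what makes a later comparison meaningful.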
Now that you know the basic components of Polygraphy, let's take a look at how they fit together.
In this example, we will write a script that:
- Builds a TensorRT engine from an ONNX model
- Bounds input values in the range [0, 2]
- Runs inference using ONNX-Runtime and TensorRT
- Compares the results and checks that they match
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import TrtRunner, EngineFromNetwork, NetworkFromOnnxPath
from polygraphy.comparator import Comparator, DataLoader
model_path = "/path/to/model.onnx"
build_onnxrt_session = SessionFromOnnx(model_path)
build_engine = EngineFromNetwork(NetworkFromOnnxPath(model_path))
runners = [
    OnnxrtRunner(build_onnxrt_session),
    TrtRunner(build_engine),
]
data_loader = DataLoader(val_range=(0, 2))
run_results = Comparator.run(runners, data_loader=data_loader)
assert bool(Comparator.compare_accuracy(run_results))
In order to enable PyTorch, you need to provide three things to the PytRunner:
- A model loader: In the simplest case, this can be a callable that returns a torch.nn.Module.
- input_metadata: A TensorMetadata describing the inputs of the model. This maps input names to their shapes and data types. As with other runners, None may be used to indicate dynamic dimensions.
  NOTE: Other runners are able to automatically determine input metadata by inspecting the model definition, but because of the way PyTorch is implemented, it is difficult to write a generic function to determine model inputs from a torch.nn.Module.
- output_names: A list of output names. This is used by the Comparator to match PytRunner outputs to those of other runners.
You can find complete code examples that use the Polygraphy Python API here.
For more details, see the Polygraphy Python API reference documentation.
To build the API documentation, first install required packages:
python -m pip install -r docs/requirements.txt
and then use the make target to build docs:
make docs
The HTML documentation will be generated under build/docs.
To view the docs, open build/docs/index.html in a browser or HTML viewer.