About this page

This is an API reference for ONNX in BentoML. Please refer to the ONNX guide for more information about how to use ONNX in BentoML.

bentoml.onnx.save_model(name: Tag | str, model: onnx.ModelProto, *, signatures: dict[str, ModelSignatureDict] | dict[str, ModelSignature] | None = None, labels: dict[str, str] | None = None, custom_objects: dict[str, t.Any] | None = None, external_modules: t.List[ModuleType] | None = None, metadata: dict[str, t.Any] | None = None) → bentoml.Model

Save an ONNX model instance to the BentoML model store.

Parameters:

  • name (Tag | str) – The name to give to the model in the BentoML store. This must be a valid Tag name.

  • model (ModelProto) – The ONNX model to be saved.

  • signatures (dict[str, ModelSignatureDict], optional) – Signatures of the predict methods to be used. If not provided, the signatures default to {"run": {"batchable": False}}. See ModelSignature for more details. bentoml.onnx internally uses onnxruntime.InferenceSession to run inference. When the original model is converted to ONNX format and loaded by onnxruntime.InferenceSession, the inference method of the original model becomes the run method of the onnxruntime.InferenceSession. The signatures here therefore refer to the run method of onnxruntime.InferenceSession, so the only allowed method name in signatures is run.

  • labels (dict[str, str], optional) – A default set of management labels to be associated with the model. An example is {"training-set": "data-1"}.

  • custom_objects (dict[str, Any], optional) –

    Custom objects to be saved with the model. An example is {"my-normalizer": normalizer}.

    Custom objects are currently serialized with cloudpickle, but this implementation is subject to change.

  • external_modules (List[ModuleType], optional, default to None) – User-defined additional Python modules to be saved alongside the model or custom objects, e.g. a tokenizer module, a preprocessor module, or a model configuration module.

  • metadata (dict[str, Any], optional) –

    Metadata to be associated with the model. An example is {"bias": 4}.

    Metadata is intended for display in a model management UI and therefore must be a default Python type, such as str or int.


Returns:

A BentoML model containing the saved ONNX model instance.

Return type:

bentoml.Model


Example:

import os

import bentoml
import onnx
import torch
import torch.nn as nn

class ExtendedModel(nn.Module):
    def __init__(self, D_in, H, D_out):
        # In the constructor we instantiate two nn.Linear modules and assign them as
        # member variables.
        super().__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x, bias):
        # In the forward function we accept a Tensor of input data and a bias Tensor.
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred + bias

N, D_in, H, D_out = 64, 1000, 100, 1
x = torch.randn(N, D_in)
model = ExtendedModel(D_in, H, D_out)

input_names = ["x", "bias"]
output_names = ["output1"]

tmpdir = "/tmp/model"
os.makedirs(tmpdir, exist_ok=True)
model_path = os.path.join(tmpdir, "test_torch.onnx")

# Export the PyTorch model to an ONNX file, tracing it with example inputs.
torch.onnx.export(
    model,
    (x, torch.Tensor([1.0])),
    model_path,
    input_names=input_names,
    output_names=output_names,
)

# save_model expects an onnx.ModelProto, so load the exported file first.
onnx_model = onnx.load(model_path)
bento_model = bentoml.onnx.save_model("onnx_model", onnx_model, signatures={"run": {"batchable": True}})
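
The constraints on signatures and metadata described for save_model above can be checked before saving. The following is a minimal sketch only: check_save_args, ALLOWED_METHODS, and PLAIN_TYPES are hypothetical names that are not part of the bentoml API, and the set of "default Python types" is an assumption based on the "such as str or int" wording.

```python
# Hypothetical pre-flight check; none of these names exist in bentoml.
ALLOWED_METHODS = {"run"}  # bentoml.onnx only exposes InferenceSession.run
# Assumed reading of "default Python type"; nested values are not inspected.
PLAIN_TYPES = (str, int, float, bool, list, dict, tuple, type(None))


def check_save_args(signatures=None, metadata=None):
    """Return a list of problems found; an empty list means the arguments look fine."""
    problems = []
    # Documented default: a single non-batchable "run" method.
    signatures = signatures or {"run": {"batchable": False}}
    for method in signatures:
        if method not in ALLOWED_METHODS:
            problems.append(f"unsupported signature method: {method!r}")
    for key, value in (metadata or {}).items():
        if not isinstance(value, PLAIN_TYPES):
            problems.append(f"metadata[{key!r}] is not a plain Python type")
    return problems


print(check_save_args({"run": {"batchable": True}}, {"bias": 4}))  # []
print(check_save_args({"predict": {"batchable": True}}, {"bias": object()}))
```

The second call reports two problems: a method name other than run, and a metadata value that is not a plain Python type.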

bentoml.onnx.load_model(bento_model: str | Tag | bentoml.Model, *, providers: ProvidersType | None = None, session_options: ort.SessionOptions | None = None) → ort.InferenceSession

Load the ONNX model with the given tag from the local BentoML model store.

Parameters:

  • bento_model (str | Tag | Model) – Either the tag of the model to get from the store, or a bentoml.Model instance to load the model from.

  • providers (List[Union[str, Tuple[str, Dict[str, Any]]]], optional, default to None) – The execution providers to use, as specified by the user. By default, BentoML uses ["CPUExecutionProvider"] when loading a model.

  • session_options (onnxruntime.SessionOptions, optional, default to None) – Session options for the use case at hand. Defaults to None.


Returns:

An instance of an ONNX Runtime inference session created from the ONNX model loaded from the model store.

Return type:

onnxruntime.InferenceSession


Example:

import bentoml
sess = bentoml.onnx.load_model("my_onnx_model")
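
The providers argument accepts plain provider names as well as (name, options) tuples. As a minimal sketch of how such a list could be assembled, the helper below (build_providers is a hypothetical name, not part of bentoml or onnxruntime) puts an optional CUDA entry first and always ends with the CPU provider as a fallback. It assumes an ONNX Runtime build with CUDA support is installed when use_cuda is set.

```python
from typing import Any, Dict, List, Tuple, Union

# Mirrors the documented type: List[Union[str, Tuple[str, Dict[str, Any]]]]
Provider = Union[str, Tuple[str, Dict[str, Any]]]


def build_providers(use_cuda: bool = False, device_id: int = 0) -> List[Provider]:
    """Build a provider list, most preferred first, always ending with CPU."""
    providers: List[Provider] = []
    if use_cuda:
        # Tuple form: provider name plus an options dict for that provider.
        providers.append(("CUDAExecutionProvider", {"device_id": device_id}))
    providers.append("CPUExecutionProvider")
    return providers


print(build_providers())  # ['CPUExecutionProvider']
print(build_providers(use_cuda=True))

# With BentoML and ONNX Runtime installed, the list would be passed along as:
#   sess = bentoml.onnx.load_model("my_onnx_model", providers=build_providers(use_cuda=True))
```

Keeping the CPU provider last means the session still has a usable provider to fall back on after the preferred one.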

bentoml.onnx.get(tag_like: str | Tag) → Model

Get the BentoML model with the given tag.


Parameters:

  • tag_like (str | Tag) – The tag of the model to retrieve from the model store.


Returns:

A BentoML Model with the matching tag.

Return type:

bentoml.Model


Example:

import bentoml
# target model must be from the BentoML model store
model = bentoml.onnx.get("onnx_resnet50")
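
A tag string is written as a name, optionally followed by a colon and a version; when the version is left out, as in the example above, the store resolves the tag to the latest version. The sketch below only illustrates that string format. split_tag is a hypothetical helper, not a BentoML function, and it does not perform the validation that BentoML's own Tag type applies.

```python
from typing import Optional, Tuple


def split_tag(tag_like: str) -> Tuple[str, Optional[str]]:
    """Split 'name:version' into its parts; a missing version yields None."""
    name, sep, version = tag_like.partition(":")
    if not name:
        raise ValueError(f"tag has no name: {tag_like!r}")
    return name, (version if sep and version else None)


print(split_tag("onnx_resnet50"))         # ('onnx_resnet50', None)
print(split_tag("onnx_resnet50:latest"))  # ('onnx_resnet50', 'latest')
```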