BentoML SDK

Service decorator

bentoml.service(inner: type[T], /) → Service[T]
bentoml.service(inner: None = None, /, *, image: Image | None = None, envs: list[dict[str, Any]] | None = None, **kwargs: Unpack) → _ServiceDecorator

Mark a class as a BentoML service.

Example

@service(traffic={"timeout": 60})
class InferenceService:
    @api
    def predict(self, input: str) -> str:
        return input

bentoml.runner_service(runner: Runner, **kwargs: Unpack) → Service[Any]

Make a service from a legacy Runner.
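
For example (a minimal sketch; the iris_clf model tag is hypothetical and assumes a model saved with the legacy API):

import bentoml

# Hypothetical tag of a model saved with the legacy API.
runner = bentoml.models.get("iris_clf:latest").to_runner()
svc = bentoml.runner_service(runner)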

bentoml.mount_asgi_app(app: ASGIApp, *, path: str = '/', name: str | None = None) → Callable[[T], T]

Mount an ASGI app (e.g. a FastAPI or Quart app) onto a service class so that it is served under the given path alongside the service's own endpoints.
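
A sketch of mounting a FastAPI app onto a service (the FastAPI app and its route are illustrative):

from fastapi import FastAPI

import bentoml

app = FastAPI()

@app.get("/healthz")
def healthz() -> dict:
    # Plain FastAPI route, served under the mount path.
    return {"status": "ok"}

@bentoml.service
@bentoml.mount_asgi_app(app, path="/v1")
class MyService:
    @bentoml.api
    def predict(self, input: str) -> str:
        return input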

Service API

bentoml.api(func: t.Callable[t.Concatenate[t.Any, P], R]) → APIMethod[P, R]
bentoml.api(*, route: str | None = None, name: str | None = None, input_spec: type[IODescriptor] | None = None, output_spec: type[IODescriptor] | None = None, batchable: bool = False, batch_dim: int | tuple[int, int] = 0, max_batch_size: int = 100, max_latency_ms: int = 60000) → t.Callable[[t.Callable[t.Concatenate[t.Any, P], R]], APIMethod[P, R]]

Make a BentoML API method. This decorator can be used either with or without arguments.

Parameters:
  • func – The function to be wrapped.

  • route – The route of the API, e.g. "/predict".

  • name – The name of the API.

  • input_spec – The input spec of the API, should be a subclass of pydantic.BaseModel.

  • output_spec – The output spec of the API, should be a subclass of pydantic.BaseModel.

  • batchable – Whether the API is batchable.

  • batch_dim – The batch dimension of the API.

  • max_batch_size – The maximum batch size of the API.

  • max_latency_ms – The maximum latency of the API, in milliseconds.

Note that when you enable batching, batch_dim can be a tuple or a single value.

  • For a tuple (input_dim, output_dim):

    • input_dim: Determines along which dimension the input arrays should be batched (or stacked) together before sending them for processing. For example, if you are working with 2-D arrays and input_dim is set to 0, BentoML will stack the arrays along the first dimension. This means if you have two 2-D input arrays with dimensions 5x2 and 10x2, specifying an input_dim of 0 would combine these into a single 15x2 array for processing.

    • output_dim: After the inference is done, the output array needs to be split back into the original batch sizes. The output_dim indicates along which dimension the output array should be split. In the example above, if the inference process returns a 15x2 array and output_dim is set to 0, BentoML will split this array back into the original sizes of 5x2 and 10x2, based on the recorded boundaries of the input batch. This ensures that each requester receives the correct portion of the output corresponding to their input.

  • If you specify a single value for batch_dim, this value will apply to both input_dim and output_dim. In other words, the same dimension is used for both batching inputs and splitting outputs.

Image illustration of batch_dim

This image illustrates the concept of batch_dim in the context of processing 2-D arrays.

(Figure: batch-dim-example.png)

On the left side, there are two 2-D arrays of size 5x2, represented by blue and green boxes. The arrows show two different paths that these arrays can take depending on the batch_dim configuration:

  • The top path has batch_dim=(0,0). This means that batching occurs along the first dimension (the number of rows). The two arrays are stacked on top of each other, resulting in a new combined array of size 10x2, which is sent for inference. After inference, the result is split back into two separate 5x2 arrays.

  • The bottom path has batch_dim=(1,1). This implies that batching occurs along the second dimension (the number of columns). The two arrays are concatenated side by side, forming a larger array of size 5x4, which is processed by the model. After inference, the output array is split back into the original dimensions, resulting in two separate 5x2 arrays.
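
As a sketch, a batchable endpoint over 2-D arrays could look like the following (the service name and doubling "model" are illustrative):

import numpy as np

import bentoml

@bentoml.service
class BatchedInference:
    @bentoml.api(batchable=True, batch_dim=0, max_batch_size=64, max_latency_ms=5000)
    def predict(self, input: np.ndarray) -> np.ndarray:
        # Receives the stacked batch, e.g. a 15x2 array combined from 5x2 and
        # 10x2 requests; BentoML splits the result back along dimension 0 and
        # routes each slice to its original caller.
        return input * 2  # stand-in for real model inference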

bentoml.task(func: t.Callable[t.Concatenate[t.Any, P], R]) → APIMethod[P, R]
bentoml.task(*, route: str | None = None, name: str | None = None, input_spec: type[IODescriptor] | None = None, output_spec: type[IODescriptor] | None = None, batchable: bool = False, batch_dim: int | tuple[int, int] = 0, max_batch_size: int = 100, max_latency_ms: int = 60000) → t.Callable[[t.Callable[t.Concatenate[t.Any, P], R]], APIMethod[P, R]]

Mark a method as a BentoML async task. This decorator can be used either with or without arguments.

Parameters:
  • func – The function to be wrapped.

  • route – The route of the API, e.g. "/predict".

  • name – The name of the API.

  • input_spec – The input spec of the API, should be a subclass of pydantic.BaseModel.

  • output_spec – The output spec of the API, should be a subclass of pydantic.BaseModel.

  • batchable – Whether the API is batchable.

  • batch_dim – The batch dimension of the API.

  • max_batch_size – The maximum batch size of the API.

  • max_latency_ms – The maximum latency of the API, in milliseconds.
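
A minimal sketch of a task endpoint (names are illustrative):

import bentoml

@bentoml.service
class ImageService:
    @bentoml.task
    def generate(self, prompt: str) -> str:
        # Long-running work executed in the background.
        return f"generated for: {prompt}"

Clients typically submit a task and retrieve the result later, e.g. task = client.generate.submit(prompt='...') followed by task.get() once the task has completed (method names assume the BentoML Python client).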

bentoml.depends

bentoml.depends(*, url: str | None = None, deployment: str | None = None, cluster: str | None = None) → Dependency[None]
bentoml.depends(on: Service[T], *, url: str | None = None, deployment: str | None = None, cluster: str | None = None) → Dependency[T]

Create a dependency on another service or deployment.

Parameters:
  • on – Service[T] | None: The service to depend on.

  • url – str | None: The URL of the service to depend on.

  • deployment – str | None: The deployment of the service to depend on.

  • cluster – str | None: The cluster of the service to depend on.

Examples:

@bentoml.service
class MyService:
    # depends on a service
    svc_a = bentoml.depends(SVC_A)
    # depends on a deployment
    svc_b = bentoml.depends(deployment="ci-iris")
    # depends on a remote service with url
    svc_c = bentoml.depends(url="http://192.168.1.1:3000")
    # For the latter two cases, `on` can also be given so the dependency is accurately typed:
    svc_d = bentoml.depends(url="http://192.168.1.1:3000", on=SVC_D)
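
Once declared, a dependency is called like a local instance from inside an API method (a sketch; SVC_A and its predict method follow the example above):

@bentoml.service
class Gateway:
    svc_a = bentoml.depends(SVC_A)

    @bentoml.api
    def predict(self, input: str) -> str:
        # The call is proxied to SVC_A, remotely when it is deployed separately.
        return self.svc_a.predict(input)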

bentoml.validators

class bentoml.validators.PILImageEncoder

Bases: object

decode(obj: bytes | t.BinaryIO | UploadFile | PILImage.Image) → t.Any
encode(obj: PILImage.Image) → bytes

class bentoml.validators.FileSchema(format: str = 'binary', content_type: str | None = None)

Bases: object

content_type: str | None
decode(obj: bytes | BinaryIO | UploadFile | PurePath | str) → Any
encode(obj: Path) → bytes
format: str

class bentoml.validators.TensorSchema(format: TensorFormat, dtype: t.Optional[str] = None, shape: t.Optional[t.Tuple[int, ...]] = None)

Bases: object

property dim: int | None
dtype: t.Optional[str]
encode(arr: TensorType, info: core_schema.SerializationInfo) → t.Any
format: TensorFormat
property framework_dtype: Any
shape: t.Optional[t.Tuple[int, ...]]
validate(obj: Any) → Any

class bentoml.validators.DataframeSchema(orient: str = 'records', columns=None)

Bases: object

columns: tuple[str] | None
encode(df: pd.DataFrame, info: core_schema.SerializationInfo) → t.Any
orient: str
validate(obj: t.Any) → pd.DataFrame

class bentoml.validators.ContentType(content_type: str)

Bases: BaseMetadata

content_type: str

class bentoml.validators.Shape(dimensions: tuple[int, ...])

Bases: BaseMetadata

dimensions: tuple[int, ...]

class bentoml.validators.DType(dtype: str)

Bases: BaseMetadata

dtype: str
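
These validators are typically attached to API parameters via typing.Annotated (a sketch; the shape, dtype, and content type shown are illustrative):

from pathlib import Path
from typing import Annotated

import numpy as np

import bentoml
from bentoml.validators import ContentType, DType, Shape

@bentoml.service
class TensorService:
    @bentoml.api
    def embed(
        self,
        # Validated as a float32 tensor of shape (3, 224, 224).
        image: Annotated[np.ndarray, Shape((3, 224, 224)), DType("float32")],
    ) -> np.ndarray:
        return image.mean(axis=0)

    @bentoml.api
    def transcribe(
        self,
        # Restricts uploads to the given MIME type.
        audio: Annotated[Path, ContentType("audio/mpeg")],
    ) -> str:
        return audio.name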