BentoML SDK
Service decorator
- bentoml.service(inner: type[T], /) → Service[T]
- bentoml.service(inner: None = None, /, *, image: Image | None = None, envs: list[dict[str, Any]] | None = None, **kwargs: Unpack) → _ServiceDecorator
Mark a class as a BentoML service.
Example:

```python
@service(traffic={"timeout": 60})
class InferenceService:
    @api
    def predict(self, input: str) -> str:
        return input
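```

The decorator also accepts the configuration keyword arguments shown in the signature above, such as envs and traffic. A minimal sketch, assuming env vars are passed as name/value dicts and that the timeout is in seconds (the values here are illustrative, not prescribed defaults):

```python
import bentoml

@bentoml.service(
    envs=[{"name": "HF_TOKEN"}],  # assumption: env vars as name/value dicts
    traffic={"timeout": 120},
)
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        return text[:100]  # stand-in for real model inference
```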
Service API
- bentoml.api(func: t.Callable[t.Concatenate[t.Any, P], R]) → APIMethod[P, R]
- bentoml.api(*, route: str | None = None, name: str | None = None, input_spec: type[IODescriptor] | None = None, output_spec: type[IODescriptor] | None = None, batchable: bool = False, batch_dim: int | tuple[int, int] = 0, max_batch_size: int = 100, max_latency_ms: int = 60000) → t.Callable[[t.Callable[t.Concatenate[t.Any, P], R]], APIMethod[P, R]]
Make a BentoML API method. This decorator can be used either with or without arguments.
- Parameters:
  - func – The function to be wrapped.
  - route – The route of the API, e.g. "/predict".
  - name – The name of the API.
  - input_spec – The input spec of the API, should be a subclass of pydantic.BaseModel.
  - output_spec – The output spec of the API, should be a subclass of pydantic.BaseModel.
  - batchable – Whether the API is batchable.
  - batch_dim – The batch dimension of the API.
  - max_batch_size – The maximum batch size of the API.
  - max_latency_ms – The maximum latency of the API, in milliseconds.
Note that when you enable batching, batch_dim can be a tuple or a single value.

For a tuple (input_dim, output_dim):

- input_dim: Determines along which dimension the input arrays should be batched (or stacked) together before they are sent for processing. For example, if you are working with 2-D arrays and input_dim is set to 0, BentoML will stack the arrays along the first dimension. This means that if you have two 2-D input arrays with dimensions 5x2 and 10x2, specifying an input_dim of 0 combines these into a single 15x2 array for processing.
- output_dim: After inference is done, the output array needs to be split back into the original batch sizes. output_dim indicates along which dimension the output array should be split. In the example above, if the inference process returns a 15x2 array and output_dim is set to 0, BentoML will split this array back into the original sizes of 5x2 and 10x2, based on the recorded boundaries of the input batch. This ensures that each requester receives the portion of the output corresponding to their input.

If you specify a single value for batch_dim, this value applies to both input_dim and output_dim. In other words, the same dimension is used for batching inputs and splitting outputs.
Image illustration of batch_dim

This image illustrates the concept of batch_dim in the context of processing 2-D arrays. On the left side, there are two 2-D arrays of size 5x2, represented by blue and green boxes. The arrows show two different paths that these arrays can take depending on the batch_dim configuration:

- The top path has batch_dim=(0, 0): batching occurs along the first dimension (the number of rows). The two arrays are stacked on top of each other, resulting in a combined array of size 10x2, which is sent for inference. After inference, the result is split back into two separate 5x2 arrays.
- The bottom path has batch_dim=(1, 1): batching occurs along the second dimension (the number of columns). The two arrays are concatenated side by side, forming a larger array of size 5x4, which is processed by the model. After inference, the output array is split back into the original dimensions, resulting in two separate 5x2 arrays.
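A minimal sketch of a batchable endpoint under these semantics (the service name, the NumPy input type, and the doubling operation are illustrative assumptions):

```python
import numpy as np
import bentoml

@bentoml.service
class BatchedInference:
    @bentoml.api(batchable=True, batch_dim=(0, 0), max_batch_size=64, max_latency_ms=500)
    def predict(self, arrays: np.ndarray) -> np.ndarray:
        # BentoML stacks concurrent requests along dimension 0 before this
        # call and splits the returned array along dimension 0 afterwards,
        # so the method body only ever sees one combined batch.
        return arrays * 2.0  # stand-in for real model inference
```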
- bentoml.task(func: t.Callable[t.Concatenate[t.Any, P], R]) → APIMethod[P, R]
- bentoml.task(*, route: str | None = None, name: str | None = None, input_spec: type[IODescriptor] | None = None, output_spec: type[IODescriptor] | None = None, batchable: bool = False, batch_dim: int | tuple[int, int] = 0, max_batch_size: int = 100, max_latency_ms: int = 60000) → t.Callable[[t.Callable[t.Concatenate[t.Any, P], R]], APIMethod[P, R]]
Mark a method as a BentoML async task. This decorator can be used either with or without arguments.
- Parameters:
  - func – The function to be wrapped.
  - route – The route of the API, e.g. "/predict".
  - name – The name of the API.
  - input_spec – The input spec of the API, should be a subclass of pydantic.BaseModel.
  - output_spec – The output spec of the API, should be a subclass of pydantic.BaseModel.
  - batchable – Whether the API is batchable.
  - batch_dim – The batch dimension of the API.
  - max_batch_size – The maximum batch size of the API.
  - max_latency_ms – The maximum latency of the API, in milliseconds.
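A minimal sketch of a task endpoint, with the client-side submit-and-poll pattern shown in comments (the service name, the transcode example, and the local server URL are illustrative assumptions):

```python
import bentoml

@bentoml.service
class BatchWorker:
    @bentoml.task
    def transcode(self, path: str) -> str:
        # Long-running work executes in the background; clients submit the
        # task and poll for the result instead of holding the connection open.
        return path.replace(".mov", ".mp4")  # stand-in for real work

# Client side (assumption: the service is running locally on port 3000):
# client = bentoml.SyncHTTPClient("http://localhost:3000")
# handle = client.transcode.submit(path="clip.mov")  # returns immediately
# result = handle.get()                              # blocks until the task finishes
```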
bentoml.depends
- bentoml.depends(*, url: str | None = None, deployment: str | None = None, cluster: str | None = None) → Dependency[None]
- bentoml.depends(on: Service[T], *, url: str | None = None, deployment: str | None = None, cluster: str | None = None) → Dependency[T]
Create a dependency on another service or deployment.
- Parameters:
  - on – Service[T] | None: The service to depend on.
  - url – str | None: The URL of the service to depend on.
  - deployment – str | None: The deployment of the service to depend on.
  - cluster – str | None: The cluster of the service to depend on.
Examples:

```python
@bentoml.service
class MyService:
    # depends on a service
    svc_a = bentoml.depends(SVC_A)
    # depends on a deployment
    svc_b = bentoml.depends(deployment="ci-iris")
    # depends on a remote service with url
    svc_c = bentoml.depends(url="http://192.168.1.1:3000")
    # For the latter two cases, the service can be given to provide
    # more accurate types:
    svc_d = bentoml.depends(url="http://192.168.1.1:3000", on=SVC_D)
```
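Once declared, the dependency resolves to a client for the target service, so its APIs can be called from the depending service's own endpoints. A hedged sketch (the Preprocessing service and its preprocess API are assumptions for illustration):

```python
import bentoml

@bentoml.service
class Preprocessing:
    @bentoml.api
    def preprocess(self, text: str) -> str:
        return text.strip().lower()

@bentoml.service
class Predictor:
    preprocessing = bentoml.depends(Preprocessing)

    @bentoml.api
    def predict(self, text: str) -> str:
        # Calls the dependent service; BentoML routes the call locally or
        # remotely depending on how the services are deployed.
        cleaned = self.preprocessing.preprocess(text)
        return cleaned  # stand-in for real prediction
```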
bentoml.validators
- class bentoml.validators.FileSchema(format: str = 'binary', content_type: str | None = None)
  Bases: object
- class bentoml.validators.TensorSchema(format: TensorFormat, dtype: t.Optional[str] = None, shape: t.Optional[t.Tuple[int, ...]] = None)
  Bases: object
  - format: TensorFormat
- class bentoml.validators.DataframeSchema(orient: str = 'records', columns=None)
  Bases: object
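These schemas are typically attached to API parameter types as Annotated metadata so BentoML can validate and document the payload. A hedged sketch (the "numpy-array" format string and the shape/dtype values are assumptions for illustration):

```python
from typing import Annotated

import numpy as np
import pandas as pd
import bentoml
from bentoml.validators import DataframeSchema, TensorSchema

@bentoml.service
class Validated:
    @bentoml.api
    def embed(
        self,
        # Assumption: "numpy-array" is a valid TensorFormat value.
        tensor: Annotated[
            np.ndarray, TensorSchema(format="numpy-array", dtype="float32", shape=(-1, 4))
        ],
    ) -> np.ndarray:
        return tensor

    @bentoml.api
    def count_rows(
        self,
        df: Annotated[pd.DataFrame, DataframeSchema(orient="records")],
    ) -> int:
        return len(df)
```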