Deploying Bento#

Deployment Overview#

BentoML is designed to provide a unified packaging format for deploying AI applications via a wide range of serving patterns, including real-time inference APIs, offline batch inference, streaming inference, and custom integrations.

For online API use cases, here are the three most common cloud deployment solutions:

  • ☁️ Deploy to BentoCloud - Serverless cloud for AI and the best place for AI teams to deploy and operate BentoML. Sign up here for early access.

  • 🦄️ Deploy on Kubernetes with Yatai - Cloud-native AI deployment on Kubernetes, with advanced auto-scaling and CI/CD workflows. Requires a professional DevOps team to maintain and operate.

  • 🚀 Fast Cloud Deployment with BentoCTL - Great for proof-of-concept deployments running directly on public cloud services (EC2, ECS, SageMaker, Lambda, GCP, etc.). Requires working knowledge of cloud services and their limitations for AI-specific workloads.

Feature comparison across deployment options:

|  | 🍱 BentoCloud | Yatai on Kubernetes | Cloud Deployment with BentoCTL |
| --- | --- | --- | --- |
| Auto-scaling | ✅ Fast auto-scaling optimized for AI | ✅ Kubernetes-native with custom metrics | Only available on some cloud services (e.g. ECS); requires manual configuration |
| Serverless | ✅ Scaling at individual Model/Runner level | Not supported | Supported on AWS Lambda and GCP Functions, with limitations on model size and access to GPU |
| GPU Support | ✅ | ✅ | Supported on EC2 and AWS SageMaker; requires manual configuration |
| Observability | ✅ Auto-generated dashboards for key metrics | Requires manual configuration | Requires manual configuration with the cloud provider |
| Endpoint Security | ✅ Access token management and authentication | Requires manual setup | Requires manual setup |
| UI and API | ✅ Web UI dashboards, REST API, CLI command, and Python API | ✅ CLI (kubectl) + Kubernetes CRD resource definitions | ✅ CLI (bentoctl, terraform) |
| CI/CD | ✅ Rich integrated API for programmatic access in CI/CD, supporting common GitOps and MLOps workflows | ✅ Cloud-native design supporting Kubernetes CRD and GitOps workflows | ✅ Native Terraform integration, easily customizable |
| Access control | ✅ Flexible API token management and role-based access control | Inherits Kubernetes' account and RBAC mechanism; no model/bento/endpoint-level access control | No access control besides basic cloud platform permissions such as creating/deleting resources |
All three deployment solutions above rely on BentoML's Docker containerization feature underneath. To ensure a smooth path to production with BentoML, it is important to understand the Bento specification, how to run inference with it, and how to build Docker images from a Bento. This is useful not only for testing a Bento's environment and lifecycle configurations, but also for building custom integrations with the BentoML ecosystem.
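
Running inference with a Bento locally is as simple as serving it straight from the bento store. A minimal sketch, assuming the iris_classifier Bento from the quickstart has already been built:

$ # Serve the latest version of the Bento locally in production mode on port 3000
$ bentoml serve iris_classifier:latest --production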

Docker Containers#

Containerizing Bentos as Docker images allows users to easily test a Bento's environment and dependency configurations locally. Once Bentos are built and saved to the bento store, they can be containerized with the CLI command bentoml containerize.

Start the Docker engine. Verify using docker info.

$ docker info

Run bentoml list to view available bentos in the store.

$ bentoml list

Tag                               Size        Creation Time        Path
iris_classifier:ejwnswg5kw6qnuqj  803.01 KiB  2022-05-27 00:37:08  ~/bentoml/bentos/iris_classifier/ejwnswg5kw6qnuqj
iris_classifier:h4g6jmw5kc4ixuqj  644.45 KiB  2022-05-27 00:02:08  ~/bentoml/bentos/iris_classifier/h4g6jmw5kc4ixuqj

Run bentoml containerize to start the containerization process.

$ bentoml containerize iris_classifier:latest

INFO [cli] Building docker image for Bento(tag="iris_classifier:ejwnswg5kw6qnuqj")...
[+] Building 21.2s (20/20) FINISHED
INFO [cli] Successfully built docker image "iris_classifier:ejwnswg5kw6qnuqj"
For Macs with Apple Silicon

Specify the --opt platform option to avoid potential compatibility issues with some Python libraries.

$ bentoml containerize --opt platform=linux/amd64 iris_classifier:latest
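
The resulting image targets linux/amd64, so Docker on Apple Silicon will run it under emulation. Passing Docker's standard --platform flag makes this explicit:

$ # Run an amd64 image on an arm64 host via emulation
$ docker run --platform linux/amd64 -p 3000:3000 iris_classifier:ejwnswg5kw6qnuqj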

View the built Docker image:

$ docker images

REPOSITORY          TAG                 IMAGE ID       CREATED         SIZE
iris_classifier     ejwnswg5kw6qnuqj    669e3ce35013   1 minute ago    1.12GB

Run the generated docker image:

$ docker run -p 3000:3000 iris_classifier:ejwnswg5kw6qnuqj
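
With the container running, the service can be verified with a test request. A minimal sketch, assuming the quickstart's classify endpoint, which accepts a JSON array of iris features:

$ # Send a test request to the containerized service
$ curl --request POST \
    --header "Content-Type: application/json" \
    --data '[5.1, 3.5, 1.4, 0.2]' \
    http://127.0.0.1:3000/classify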

See also

Containerization with different container engines goes into more detail on the containerization process and how to use different container runtimes.

Deploy with Yatai on Kubernetes#

Yatai helps ML teams deploy large-scale model serving workloads on Kubernetes. It standardizes BentoML deployment on Kubernetes, provides a UI and APIs for managing all your ML models and deployments in one place, and enables advanced GitOps and CI/CD workflows.

Yatai is Kubernetes-native: it provides a CRD for managing BentoML deployments and integrates well with other tools in the Kubernetes ecosystem.

To get started, get an API token from the Yatai Web UI and log in with the bentoml CLI:

bentoml yatai login --api-token {YOUR_TOKEN_GOES_HERE} --endpoint {YOUR_YATAI_ENDPOINT_URL}

Push your local Bentos to Yatai:

bentoml push iris_classifier:latest
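
Pushed Bentos can later be retrieved on any machine logged in to the same Yatai instance; bentoml pull is the counterpart command:

# Pull the Bento from Yatai into the local bento store
bentoml pull iris_classifier:latest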

Yatai is designed to be a cloud-native tool. For DevOps teams managing production model serving workloads alongside other Kubernetes resources, the best option is to use kubectl to create BentoDeployment objects directly in the cluster; these are reconciled by Yatai's deployment CRD controller.

# my_deployment.yaml
# Note: the apiVersion may differ across Yatai versions
apiVersion: serving.yatai.ai/v1alpha3
kind: BentoDeployment
metadata:
  name: demo
spec:
  bento_tag: iris_classifier:3oevmqfvnkvwvuqj
  resources:
    limits:
      cpu: 1000m
    requests:
      cpu: 500m

Apply it to the cluster:

kubectl apply -f my_deployment.yaml
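
Once applied, Yatai's controller reconciles the BentoDeployment into regular Kubernetes resources. A quick way to check progress, assuming the object was created in the default namespace:

# List BentoDeployment objects and the pods created for them
kubectl get bentodeployments
kubectl get pods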

Deploy with BentoCTL#

bentoctl is a CLI tool for deploying Bentos to run on any cloud platform. It supports all major cloud providers, including AWS, Azure, and Google Cloud.

Under the hood, bentoctl is powered by Terraform. bentoctl applies the required modifications to the Bento or service configuration, then generates Terraform templates for the target deployment platform for easy deployment.

The bentoctl deployment workflow is optimized for CI/CD and GitOps. It is highly customizable: users can fine-tune all configurations provided by the cloud platform. It is also extensible, allowing users to define additional Terraform templates to attach to a deployment.
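
For instance, a CI job can rebuild and redeploy on each release using the same commands shown in the walkthrough below. A minimal sketch, assuming cloud credentials and a deployment_config.yaml are already present in the CI environment (the Bento tag is illustrative):

# Rebuild the deployable artifact for the target platform
bentoctl build -b iris_classifier:latest -f ./deployment_config.yaml
# Apply the generated Terraform templates
terraform init
terraform apply -var-file=bentoctl.tfvars --auto-approve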

Here’s an example of using bentoctl to deploy to AWS Lambda. First, install the aws-lambda operator plugin:

bentoctl operator install aws-lambda

Initialize a bentoctl project. This enters an interactive mode that asks for the relevant deployment configuration:

$ bentoctl init

Bentoctl Interactive Deployment Config Builder

deployment config generated to: deployment_config.yaml
✨ generated template files.
  - bentoctl.tfvars

Deployment config will be saved to ./deployment_config.yaml:

api_version: v1
name: quickstart
operator:
    name: aws-lambda
template: terraform
spec:
    region: us-west-1
    timeout: 10
    memory_size: 512

Now we are ready to build the deployable artifacts required for this deployment. In most cases, this step will produce a new Docker image specific to the target deployment configuration:

bentoctl build -b iris_classifier:btzv5wfv665trhcu -f ./deployment_config.yaml

Next, use the terraform CLI to apply the generated deployment configs to AWS. This requires AWS credentials to be set up in the environment.

$ terraform init
$ terraform apply -var-file=bentoctl.tfvars --auto-approve

base_url = ""
function_name = "quickstart-function"
image_tag = ""

Testing the deployed endpoint:

URL=$(terraform output -json | jq -r .base_url.value)classify
curl -i \
    --header "Content-Type: application/json" \
    --request POST \
    --data '[5.1, 3.5, 1.4, 0.2]' \
    $URL

Learn More about BentoCTL#

Check out the BentoCTL docs here.

Supported cloud platforms include AWS Lambda, AWS SageMaker, AWS EC2, Google Cloud Run, Google Compute Engine, Azure Container Instances, and Heroku.

Deploy to BentoCloud#

BentoCloud is currently in private beta. Please contact us by requesting a demo here.