Cloud deployment

BentoCloud is an Inference Management Platform and Compute Orchestration Engine built on top of BentoML’s open-source serving framework. It provides a complete stack for building fast and scalable AI systems with any model, on any cloud.

Why developers love BentoCloud:

- Flexible Pythonic APIs for building inference APIs, batch jobs, and compound AI systems
- Blazing-fast cold starts with a container infrastructure stack rebuilt for ML/AI workloads
- Support for any ML framework and inference runtime (vLLM, TensorRT, Triton, etc.)
- Streamlined workflows across development, testing, deployment, monitoring, and CI/CD
- Easy access to various GPUs, such as L4 and A100, in our cloud or yours

Here is the workflow for deploying your AI service to BentoCloud:

Log in to BentoCloud

Visit the BentoML website to sign up.
Install BentoML.
```
pip install bentoml
```

Log in to BentoCloud with the bentoml cloud login command. Follow the on-screen instructions to create a new API token.

```
$ bentoml cloud login

? How would you like to authenticate BentoML CLI? [Use arrows to move]
> Create a new API token with a web browser
  Paste an existing API token
```

Deploy your first model

Clone the Hello world example.

```
git clone https://github.com/bentoml/quickstart.git
cd quickstart
```

Deploy it to BentoCloud from the project directory. Optionally, use the -n flag to set a name.

```
bentoml deploy -n my-first-bento
```

Note

By default, this command packages all files under the directory from which it is executed. To exclude specific files or directories, define them in a .bentoignore file.

Sample output:

```
🍱 Built bento summarization:ngfnciv5g6nxonry
Successfully pushed Bento "summarization:ngfnciv5g6nxonry"
✅ Created deployment "my-first-bento" in cluster "google-cloud-us-central-1"
💻 View Dashboard: https://demo.cloud.bentoml.com/deployments/my-first-bento
```

The first Deployment might take a minute or two. Wait until it’s fully ready:

```
✅ Deployment "my-first-bento" is ready: https://demo.cloud.bentoml.com/deployments/my-first-bento
```
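If you script deployments (for example, in a CI/CD pipeline), you may want to capture the key facts from this console output. The sketch below is a minimal example, assuming the lines keep the format shown in the sample output above; `deploy`, `parse_deploy_output`, and `DeployResult` are hypothetical helpers, not part of BentoML. Console output is not a stable interface, so treat the regular expressions as brittle.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeployResult:
    name: Optional[str]
    cluster: Optional[str]
    dashboard_url: Optional[str]
    ready: bool


def parse_deploy_output(output: str) -> DeployResult:
    """Extract the Deployment name, cluster, dashboard URL, and readiness from console text."""
    # Assumes the line format shown in the sample output; may change between versions.
    created = re.search(r'Created deployment "([^"]+)" in cluster "([^"]+)"', output)
    dashboard = re.search(r"View Dashboard: (\S+)", output)
    ready = re.search(r'Deployment "[^"]+" is ready', output)
    return DeployResult(
        name=created.group(1) if created else None,
        cluster=created.group(2) if created else None,
        dashboard_url=dashboard.group(1) if dashboard else None,
        ready=ready is not None,
    )


def deploy(project_dir: str, name: str) -> DeployResult:
    """Run `bentoml deploy -n NAME` inside project_dir and parse what it prints."""
    proc = subprocess.run(
        ["bentoml", "deploy", "-n", name],
        cwd=project_dir,
        capture_output=True,
        encoding="utf-8",  # the output contains emoji
        errors="replace",
        check=True,
    )
    # Look at both streams, since we do not assume which one the CLI writes to.
    return parse_deploy_output(proc.stdout + proc.stderr)
```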

On the BentoCloud console, navigate to the Deployments page, and click your Deployment. Once it’s up and running, you can interact with it using the Form section on the Playground tab.
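The Note earlier in this section says `bentoml deploy` excludes anything listed in `.bentoignore`. As a rough mental model only, assuming the file uses gitignore-style patterns (blank lines and `#` comments skipped, a trailing `/` meaning "directory"), the sketch below approximates which files would be left out. `is_ignored` and `filter_files` are hypothetical helpers built on Python's `fnmatch`; they are not BentoML's matcher and do not handle gitignore features such as `!` negation or anchored paths.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Approximate gitignore-style matching for one path relative to the project root."""
    parts = PurePosixPath(rel_path).parts
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments (assumed syntax)
        if pattern.endswith("/"):
            # Trailing slash: match only directory names, i.e. any parent component.
            dir_pattern = pattern.rstrip("/")
            if any(fnmatch(part, dir_pattern) for part in parts[:-1]):
                return True
        elif fnmatch(rel_path, pattern) or any(fnmatch(part, pattern) for part in parts):
            # Plain pattern: match the whole relative path or any single component.
            return True
    return False


def filter_files(rel_paths: Iterable[str], ignore_text: str) -> List[str]:
    """Return the paths that would be kept, given the contents of a .bentoignore file."""
    patterns = ignore_text.splitlines()
    return [p for p in rel_paths if not is_ignored(p, patterns)]


# Hypothetical .bentoignore contents and project files.
ignore_text = """
__pycache__/
*.ipynb
data/
"""
files = [
    "service.py",
    "bentofile.yaml",
    "notebook.ipynb",
    "data/train.csv",
    "__pycache__/service.cpython-311.pyc",
]
print(filter_files(files, ignore_text))  # ['service.py', 'bentofile.yaml']
```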

Call the Deployment endpoint

Retrieve the Deployment URL via the CLI. Replace my-first-bento if you used a different name.
```
bentoml deployment get my-first-bento -o json | jq '.endpoint_urls'
```
Note

Ensure jq is installed for processing JSON output.
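If jq is not available, the same extraction can be done in Python. This sketch assumes that `bentoml deployment get NAME -o json` prints a JSON object whose top-level `endpoint_urls` field (the one the jq filter selects) holds a list of URL strings; `get_endpoint_urls` and `extract_endpoint_urls` are hypothetical helper names.

```python
import json
import subprocess
from typing import List


def extract_endpoint_urls(deployment_json: str) -> List[str]:
    """Python equivalent of `jq '.endpoint_urls'`, assuming the field is a list of URLs."""
    data = json.loads(deployment_json)
    # Return an empty list if the field is missing or null.
    return list(data.get("endpoint_urls") or [])


def get_endpoint_urls(name: str) -> List[str]:
    """Fetch the Deployment as JSON via the CLI, then extract its endpoint URLs."""
    proc = subprocess.run(
        ["bentoml", "deployment", "get", name, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_endpoint_urls(proc.stdout)
```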

Create a BentoML client to call the exposed endpoint. Replace the example URL with your Deployment’s URL:

```
import bentoml

# Replace the URL with your own Deployment's endpoint URL.
with bentoml.SyncHTTPClient("https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai") as client:
    # Each API defined on the Service is available as a method on the client.
    result: str = client.summarize(
        text="Breaking News: In an astonishing turn of events, the small town of Willow Creek has been taken by storm as local resident Jerry Thompson's cat, Whiskers, performed what witnesses are calling a 'miraculous and gravity-defying leap.' Eyewitnesses report that Whiskers, an otherwise unremarkable tabby cat, jumped a record-breaking 20 feet into the air to catch a fly. The event, which took place in Thompson's backyard, is now being investigated by scientists for potential breaches in the laws of physics. Local authorities are considering a town festival to celebrate what is being hailed as 'The Leap of the Century.'",
    )
    print(result)
```
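The BentoML client is the simplest option, but any HTTP client can call the endpoint. As a minimal sketch, assuming the `summarize` API is exposed as `POST /summarize` and accepts a JSON body keyed by parameter name (the usual BentoML mapping for a call like `client.summarize(text=...)`), the request can be built with the Python standard library alone. `build_summarize_request` and `summarize` are hypothetical helpers, and `summarize` simply returns the raw response body as text.

```python
import json
import urllib.request


def build_summarize_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the `summarize` API without sending it."""
    url = base_url.rstrip("/") + "/summarize"
    body = json.dumps({"text": text}).encode("utf-8")  # API parameters as JSON fields
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(base_url: str, text: str, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_summarize_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (performs a network call against your Deployment):
# print(summarize("https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai", "Breaking News: ..."))
```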

Update the Deployment

To apply changes to your code, modify it locally and update the Deployment on BentoCloud by running:

```
bentoml deployment update my-first-bento --bento ./project/directory
```

For more information, see Manage Deployments.

Configure scaling

The replica count defaults to 1. You can update the minimum and maximum number of replicas allowed for scaling:

```
bentoml deployment update my-first-bento --scaling-min 0 --scaling-max 3
```
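When driving updates from a script, it can help to validate values before calling the CLI. The sketch below assembles the same `bentoml deployment update` arguments used in this guide (`--bento`, `--scaling-min`, `--scaling-max`). The bound checks are an illustrative local guard, not rules stated by BentoCloud; combining `--bento` with scaling flags in one call is an assumption; and `build_update_command` and `update_deployment` are hypothetical helper names.

```python
import subprocess
from typing import List, Optional


def build_update_command(
    name: str,
    bento: Optional[str] = None,
    scaling_min: Optional[int] = None,
    scaling_max: Optional[int] = None,
) -> List[str]:
    """Assemble `bentoml deployment update` arguments, checking scaling bounds first."""
    # Local sanity checks (hypothetical guard, not enforced by this helper's caller).
    if scaling_min is not None and scaling_min < 0:
        raise ValueError("scaling_min must be >= 0")
    if scaling_min is not None and scaling_max is not None and scaling_min > scaling_max:
        raise ValueError("scaling_min must not exceed scaling_max")
    cmd = ["bentoml", "deployment", "update", name]
    if bento is not None:
        cmd += ["--bento", bento]
    if scaling_min is not None:
        cmd += ["--scaling-min", str(scaling_min)]
    if scaling_max is not None:
        cmd += ["--scaling-max", str(scaling_max)]
    return cmd


def update_deployment(
    name: str,
    bento: Optional[str] = None,
    scaling_min: Optional[int] = None,
    scaling_max: Optional[int] = None,
) -> None:
    """Run the assembled command; raises CalledProcessError if the CLI fails."""
    subprocess.run(build_update_command(name, bento, scaling_min, scaling_max), check=True)


# Equivalent to the command above:
# update_deployment("my-first-bento", scaling_min=0, scaling_max=3)
```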

Cleanup

To terminate this Deployment, click Stop in the top-right corner of its details page, or run:

```
bentoml deployment terminate my-first-bento
```

More resources

If you are a first-time user of BentoCloud, we recommend you read the following documents to get started: