Triton Inference Server#
Time expected: 10 minutes
NVIDIA Triton Inference Server is a high-performance, open-source inference server for serving deep learning models. It is optimized to deploy models from multiple deep learning frameworks, including TensorRT, TensorFlow, and ONNX, to various deployment targets and cloud providers. Triton is also designed with optimizations to maximize hardware utilization through concurrent model execution and efficient batching strategies.
BentoML now supports running Triton Inference Server as a Runner. The following integration guide assumes that readers are familiar with the BentoML architecture. Check out our tutorial should you wish to learn more about the BentoML service definition.
For more information about Triton, please refer to the Triton Inference Server documentation.
The code examples in this guide can also be found in the example folder.
Why Integrate BentoML with Triton Inference Server?#
If you are an existing Triton user, the integration provides simpler ways to add custom logic in Python, deploy distributed multi-model inference graphs, unify model management across different ML frameworks and workflows, and standardise the model packaging format with versioning and collaboration features. If you are an existing BentoML user, the integration improves runner efficiency and throughput under high load thanks to Triton's efficient C++ runtime.
Prerequisites#
Make sure to have at least BentoML 1.0.16:
$ pip install -U "bentoml[triton]"
Note
Triton Inference Server is currently only available in production mode (the default mode) and will not work in development mode (the --development flag).
Additionally, you will need to have Triton Inference Server installed on your system. Refer to Triton's build documentation to set up your environment. The recommended way to run Triton is through a container (Docker/Podman). To pull the latest Triton container for testing, run:
$ docker pull nvcr.io/nvidia/tritonserver:<yy>.<mm>-py3
Note
<yy>.<mm>: the version of Triton you wish to use. For example, at the time of writing, the latest version is 23.01.
Finally, the example Bento built from the example project with the YOLOv5 model will be referenced throughout this guide.
Note
To develop your own Bento with Triton, you can refer to the example folder for more usage.
Get started with Triton Inference Server#
The Triton Inference Server architecture revolves around a model repository and an inference server. The model repository is a filesystem-based persistent volume that contains the model files and their respective configurations, which define how each model should be loaded and served. The inference server serves said models over either the HTTP/REST or gRPC protocol with various batching strategies.
BentoML provides a simple integration with Triton via Runner:
import bentoml
triton_runner = bentoml.triton.Runner("triton_runner", model_repository="/path/to/model_repository")
The argument model_repository is the path to said model repository that Triton can use to serve the model. Note that model_repository also supports S3 paths:
import bentoml

triton_runner = bentoml.triton.Runner(
    "triton_runner",
    model_repository="s3://bucket/path/to/model_repository",
    cli_args=["--load-model=torchscript_yolov5s", "--model-control-mode=explicit"],
)
Note
If models are saved on the local filesystem, using the Triton runner requires including the model repository explicitly through the include key in bentofile.yaml.
Note
The cli_args argument is a list of arguments that will be passed to the tritonserver command. For example, the --load-model argument is used to load a specific model from the model repository. See tritonserver --help for all available arguments.
From a developer perspective, remote invocation of Triton runners is similar to invoking any other BentoML runner.
Note
By default, bentoml.triton.Runner will run tritonserver with the gRPC protocol. To use the HTTP/REST protocol, pass tritonserver_type="http" to the Runner constructor.
import bentoml
triton_runner = bentoml.triton.Runner("triton_runner", model_repository="/path/to/model_repository", tritonserver_type="http")
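Like any other BentoML runner, the Triton runner is then attached to a Service. A minimal sketch, assuming the triton_runner created above and mirroring the "triton-integration" service shown later in this guide:

import bentoml

# triton_runner is the bentoml.triton.Runner created above.
svc = bentoml.Service("triton-integration", runners=[triton_runner])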
Triton Runner Signatures#
Normally, in a BentoML Runner, one can access the model signatures directly from the runner's attributes. For example, the model signature predict of an iris_classifier_runner (see service definition) can be accessed as iris_classifier_runner.predict.run.
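To illustrate, a minimal sketch of the usual runner call, assuming a hypothetical iris_clf model previously saved with bentoml.sklearn:

import bentoml

# Hypothetical model name; assumes an "iris_clf" sklearn model was saved beforehand.
iris_classifier_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
iris_classifier_runner.init_local()  # for local testing only

# The "predict" model signature is available directly as a runner attribute.
result = iris_classifier_runner.predict.run([[5.1, 3.5, 1.4, 0.2]])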
However, a Triton runner's attributes represent the individual models defined under the model repository. For example, if the model repository has the following structure:
model_repository
├── onnx_mnist
│   ├── 1
│   │   └── model.onnx
│   └── config.pbtxt
├── tensorflow_mnist
│   ├── 1
│   │   └── model.savedmodel/
│   └── config.pbtxt
└── torchscript_mnist
    ├── 1
    │   └── model.pt
    └── config.pbtxt
Then each model can be accessed as triton_runner.onnx_mnist, triton_runner.tensorflow_mnist, or triton_runner.torchscript_mnist and invoked using either run or async_run.
An example to demonstrate how to call the Triton runner:
import typing as t

import numpy as np
from PIL.Image import Image
from numpy.typing import NDArray

import bentoml

# triton_runner and svc are defined as shown earlier in this guide.


@svc.api(
    input=bentoml.io.Image.from_sample("./data/0.png"), output=bentoml.io.NumpyNdarray()
)
async def bentoml_torchscript_mnist_infer(im: Image) -> NDArray[t.Any]:
    arr = np.array(im) / 255.0
    arr = np.expand_dims(arr, (0, 1)).astype("float32")
    InferResult = await triton_runner.torchscript_mnist.async_run(arr)
    return InferResult.as_numpy("OUTPUT__0")
There are a few things to note here:
1. Triton runners should only be called within an API function. In other words, if triton_runner.torchscript_mnist.async_run is invoked in the global scope, it will not work. This is because Triton is not implemented natively in Python, and hence init_local is not supported:

   triton_runner.init_local()  # TritonRunner 'triton_runner' will not be available for development mode.

2. run and async_run for any Triton runner call take either all positional arguments or all keyword arguments. The arguments should be in the same order as the input/output signatures defined in config.pbtxt.

   For example, if the following config.pbtxt is used for torchscript_mnist:

   platform: "pytorch_libtorch"
   dynamic_batching {}
   input {
     name: "INPUT__0"
     data_type: TYPE_FP32
     dims: -1
     dims: 1
     dims: 28
     dims: 28
   }
   input {
     name: "INPUT__1"
     data_type: TYPE_FP32
     dims: -1
     dims: 1
     dims: 28
     dims: 28
   }
   output {
     name: "OUTPUT__0"
     data_type: TYPE_FP32
     dims: -1
     dims: 10
   }
   output {
     name: "OUTPUT__1"
     data_type: TYPE_FP32
     dims: -1
     dims: 10
   }

   Then run or async_run takes either two positional arguments or the two keyword arguments INPUT__0 and INPUT__1:

   # Both are valid
   triton_runner.torchscript_mnist.run(np.zeros((1, 28, 28)), np.zeros((1, 28, 28)))
   await triton_runner.torchscript_mnist.async_run(
       INPUT__0=np.zeros((1, 28, 28)), INPUT__1=np.zeros((1, 28, 28))
   )

   Mixing positional and keyword arguments will result in an error:

   triton_runner.torchscript_mnist.run(
       np.zeros((1, 28, 28)), INPUT__1=np.zeros((1, 28, 28))
   )  # throws errors

3. run and async_run return an InferResult object. Regardless of the protocol used, the InferResult object has the following methods:

   as_numpy(name: str) -> NDArray[T]: returns the result as a NumPy array. The argument is the name of the output defined in config.pbtxt.

   get_output(name: str) -> InferOutputTensor | dict[str, T]: returns the result as an InferOutputTensor (gRPC) or a dictionary (HTTP). The argument is the name of the output defined in config.pbtxt.

   get_response(self) -> ModelInferResponse | dict[str, T]: returns the entire response as a ModelInferResponse (gRPC) or a dictionary (HTTP). See the short sketch after this list.

   Using the above config.pbtxt as an example, the model has two outputs, OUTPUT__0 and OUTPUT__1.

   To get OUTPUT__0 as a NumPy array:

   InferResult = triton_runner.torchscript_mnist.run(np.zeros((1, 28, 28)), np.zeros((1, 28, 28)))
   return InferResult.as_numpy("OUTPUT__0")

   To get OUTPUT__1 as a JSON dictionary:

   InferResult = triton_runner.torchscript_mnist.run(np.zeros((1, 28, 28)), np.zeros((1, 28, 28)))
   return InferResult.get_output("OUTPUT__1", as_json=True)
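To inspect the entire raw response with get_response, a minimal sketch assuming the same torchscript_mnist model and the default gRPC protocol:

InferResult = triton_runner.torchscript_mnist.run(np.zeros((1, 28, 28)), np.zeros((1, 28, 28)))
# ModelInferResponse when using gRPC, a plain dictionary when using HTTP
raw_response = InferResult.get_response()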
Additionally, the Triton runner exposes all tritonclient functions.
The list below comprises all the model management APIs from tritonclient that are supported by Triton runners:
get_model_config
get_model_metadata
get_model_repository_index
is_model_ready
is_server_live
is_server_ready
load_model
unload_model
infer
stream_infer
The following advanced client APIs are also supported:
get_cuda_shared_memory_status
get_inference_statistics
get_log_settings
get_server_metadata
get_system_shared_memory_status
get_trace_settings
register_cuda_shared_memory
register_system_shared_memory
unregister_cuda_shared_memory
unregister_system_shared_memory
update_log_settings
update_trace_settings
Important: All of the client APIs are asynchronous. To use them, make sure to call them within an async @svc.api. See Sync vs Async APIs.

service.py

@svc.api(input=bentoml.io.Text.from_sample("onnx_mnist"), output=bentoml.io.JSON())
async def unload_model(input_model: str):
    await triton_runner.unload_model(input_model)
    return {"unloaded": input_model}
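As another illustration of these client APIs, the following is a minimal sketch that loads a model on demand and reports whether it is ready. It assumes the triton_runner and svc defined earlier, a model named onnx_mnist in the repository, and that the server was started with --model-control-mode=explicit:

@svc.api(input=bentoml.io.Text.from_sample("onnx_mnist"), output=bentoml.io.JSON())
async def model_status(input_model: str):
    # load_model only takes effect when --model-control-mode=explicit is set
    await triton_runner.load_model(input_model)
    ready = await triton_runner.is_model_ready(input_model)
    return {"model": input_model, "ready": ready}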
Packaging BentoService with Triton Inference Server#
To build your BentoService with Triton Inference Server, add the following to your bentofile.yaml or use bentoml.bentos.build:
service: service:svc
include:
  - /model_repository
  - /data/*.png
  - /*.py
exclude:
  - /__pycache__
  - /venv
  - /train.py
  - /build_bento.py
  - /containerize_bento.py
python:
  packages:
    - bentoml[triton]
docker:
  base_image: nvcr.io/nvidia/tritonserver:22.12-py3
Building this Bento with bentoml build:

$ bentoml build

Alternatively, build the Bento programmatically with bentoml.bentos.build:

if __name__ == "__main__":
    import bentoml

    bentoml.bentos.build(
        "service:svc",
        include=["/model_repository", "/data/*.png", "service.py"],
        exclude=["/__pycache__", "/venv"],
        docker={"base_image": "nvcr.io/nvidia/tritonserver:22.12-py3"},
    )
Notice that we are using nvcr.io/nvidia/tritonserver:22.12-py3 as our base image. This can be substituted with any other custom base image that has the tritonserver binary available. See Triton's documentation here to learn more about building or composing a custom Triton image.
Important: The provided Triton image from NVIDIA includes Python 3.8. Therefore, if you are developing your Bento with any other Python version, make sure that your service.py is compatible with Python 3.8.
Tip
To see all available options for Triton, run:
$ docker run --init --rm -p 3000:3000 triton-integration:gpu tritonserver --help
Current Caveats#
At the time of writing, there are a few caveats that you should be aware of when using TritonRunner:
Versioning Policy Limitations#
By default, the model configuration version policy is set to latest(n=1), meaning only the latest version of the model will be loaded into the Triton server.
Currently, TritonRunner only supports the latest policy.
If you have multiple versions of the same model in your BentoService, the runner will only consider the latest version.
For example, if the model repository has the following structure:
model_repository
├── onnx_mnist
│   ├── 1
│   │   └── model.onnx
│   ├── 2
│   │   └── model.onnx
│   └── config.pbtxt
...
Then triton_runner.onnx_mnist will reference the latest version of the model (in this case, version 2).
To use a specific version of said model, refer to the example below:
from __future__ import annotations

import typing as t

import numpy as np
from tritonclient.grpc.aio import InferInput
from tritonclient.grpc.aio import np_to_triton_dtype
from tritonclient.grpc.aio import InferRequestedOutput

import bentoml

if t.TYPE_CHECKING:
    from PIL.Image import Image
    from numpy.typing import NDArray

# triton runner
triton_runner = bentoml.triton.Runner(
    "triton_runner",
    "./model_repository",
    cli_args=[
        "--load-model=onnx_mnist",
        "--load-model=torchscript_yolov5s",
        "--model-control-mode=explicit",
    ],
)

svc = bentoml.Service("triton-integration", runners=[triton_runner])


@svc.api(
    input=bentoml.io.Image.from_sample("./data/0.png"), output=bentoml.io.NumpyNdarray()
)
async def predict_v1(input_data: Image) -> NDArray[t.Any]:
    arr = np.array(input_data) / 255.0
    arr = np.expand_dims(arr, (0, 1)).astype("float32")
    input_0 = InferInput("input_0", arr.shape, np_to_triton_dtype(arr.dtype))
    input_0.set_data_from_numpy(arr)
    output_0 = InferRequestedOutput("output_0")

    # Pass model_version explicitly to target version 1 of onnx_mnist.
    InferResult = await triton_runner.infer(
        "onnx_mnist", inputs=[input_0], model_version="1", outputs=[output_0]
    )
    return InferResult.as_numpy("output_0")
Inference Protocol and Metrics Server#
By default, TritonRunner uses Triton's inference protocol, which is available over both REST and gRPC.
The HTTP/REST API is disabled by default, though it can be enabled when creating the runner by passing tritonserver_type to the Runner:
triton_runner = bentoml.triton.Runner(
    "http_runner",
    "/path/to/model_repository",
    tritonserver_type="http",
)
Currently, TritonRunner does not support running the metrics server. If you are interested in having the metrics server supported, please open an issue on GitHub.
Additionally, BentoML will allocate a random port for the gRPC/HTTP server; hence, any grpc-port or http-port options passed to the Runner's cli_args will be ignored.
Adaptive Batching#
Adaptive batching is a feature supported by BentoML runners that allows for efficient batch size selection during inference. However, it's important to note that this feature is not compatible with TritonRunner.
TritonRunner is designed as a standalone Triton server, which means that the adaptive batching logic in BentoML runners is not invoked when using TritonRunner.
Fortunately, Triton supports its own solution for efficient batching called dynamic batching. Similar to adaptive batching, dynamic batching also allows for the selection of the optimal batch size during inference. To use dynamic batching in Triton, relevant settings can be specified in the model configuration file.
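The example config.pbtxt earlier in this guide already enables dynamic batching with an empty dynamic_batching {} block. To verify which configuration Triton actually loaded, a minimal sketch using the runner's get_model_config client API, assuming the default gRPC protocol and that the call accepts the client's as_json flag:

@svc.api(input=bentoml.io.Text.from_sample("torchscript_mnist"), output=bentoml.io.JSON())
async def model_config(input_model: str):
    # Returns the loaded model configuration, including any dynamic_batching settings.
    return await triton_runner.get_model_config(input_model, as_json=True)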
🚧 Help us improve the integration!
This integration is still in its early stages and we are looking for feedback and contributions to make it even better!
If you have any feedback or want to contribute any improvements to the Triton Inference Server integration, we would love to see your feature requests and pull requests!
Check out the BentoML development guide and documentation guide to get started.