BentoML Documentation¶



BentoML is a Unified Inference Platform for deploying and scaling AI systems with any model, on any cloud.

What is BentoML¶

BentoML is a Unified Inference Platform for deploying and scaling AI models with production-grade reliability, all without the complexity of managing infrastructure. It enables your developers to build AI systems 10x faster with custom models, scale efficiently in your cloud, and maintain complete control over security and compliance.

[Figure: overview of the BentoML inference platform]

To get started with BentoML:

How-tos¶

Build your custom AI APIs with BentoML.

Create online API Services
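
As a quick illustration, a minimal online API Service might look like the sketch below. The service name, endpoint, and method body are placeholders, not code from this page; a real Service would run model inference inside the endpoint:

    import bentoml

    @bentoml.service
    class Summarizer:
        # Each @bentoml.api method is exposed as an HTTP endpoint.
        @bentoml.api
        def summarize(self, text: str) -> str:
            # Placeholder logic standing in for real model inference.
            return f"Summary: {text[:100]}"

Running bentoml serve in the project directory starts a local HTTP server for the Service.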

Deploy your AI application to production with one command.

Create Deployments
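
In practice, that one command is the BentoML CLI's deploy command (for example, bentoml deploy run from your project directory, targeting your BentoCloud account).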

Configure fast autoscaling to achieve optimal performance.

Concurrency and autoscaling
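
Concurrency-based autoscaling is configured on the Service itself. A sketch with illustrative values (the concurrency target of 32 is an example, and external_queue assumes a BentoCloud deployment):

    import bentoml

    @bentoml.service(
        traffic={
            "concurrency": 32,       # target number of in-flight requests per instance
            "external_queue": True,  # buffer excess requests instead of rejecting them
        },
    )
    class MyService:
        ...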

Run model inference on GPUs with BentoML.

Work with GPUs
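
A Service can declare GPU resources so inference is scheduled onto a GPU when deployed. A minimal sketch (the resource value is illustrative, and the torch usage is an assumption about your framework):

    import bentoml
    import torch

    @bentoml.service(resources={"gpu": 1})
    class GPUService:
        def __init__(self) -> None:
            # Run on the GPU when one is available, otherwise fall back to CPU.
            self.device = "cuda" if torch.cuda.is_available() else "cpu"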

Develop with powerful cloud GPUs using your favorite IDE.

Develop with Codespaces

Load and serve your custom models with BentoML.

Load and manage models
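
Models saved to the BentoML model store can be referenced by tag inside a Service. A sketch, assuming a model was saved earlier under the placeholder tag my_model and stored as a single model.pt file:

    import bentoml
    import torch

    @bentoml.service
    class ModelService:
        # "my_model:latest" is a placeholder tag for a previously saved model.
        model_ref = bentoml.models.get("my_model:latest")

        def __init__(self) -> None:
            # path_of() resolves a file inside the stored model's directory.
            self.model = torch.load(self.model_ref.path_of("model.pt"))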

Stay informed¶

The BentoML team uses the following channels to announce important updates, such as major product releases, and to share tutorials, case studies, and community news.

To receive release notifications, star and watch the BentoML project on GitHub. For release notes and detailed changelogs, see the Releases page.