Unified AI Application Framework
BentoML is a framework for building reliable, scalable and cost-efficient AI applications. It comes with everything you need for model serving, application packaging, and production deployment.
Featured use cases
Deploy an LLM application using vLLM as the backend for high-throughput and memory-efficient inference.
Deploy a ControlNet application to influence image composition, adjust specific elements, and ensure spatial consistency.
Deploy an image generation application capable of creating high-quality visuals with just a single inference step.
Start your BentoML journey
The BentoML documentation provides detailed guidance on the project with hands-on tutorials and examples. If you are a first-time user of BentoML, we recommend that you read the following documents in order:
Gain a basic understanding of the BentoML open-source framework, its workflow, installation, and a quickstart example.
Create different BentoML projects for common machine learning scenarios, like large language models, image generation, embeddings, speech recognition, and more.
Dive into BentoML’s features and advanced use cases, including GPU support, clients, monitoring, and performance optimization.
The BentoML team uses the following channels to announce important updates, such as major product releases, and to share tutorials, case studies, and community news.