BentoML

Get Started

  • Hello world
  • Adaptive batching
  • Model composition
  • Async task queues
  • Packaging for deployment
  • Cloud deployment

Learn by Examples

  • Overview
  • LLM inference: vLLM
  • Agent: Function calling
  • Agent: LangGraph
  • LLM safety: ShieldGemma
  • RAG: Document ingestion and search
  • Stable Diffusion XL Turbo
  • ComfyUI: Deploy workflows as APIs
  • ControlNet
  • MLflow
  • XGBoost

Build with BentoML

  • Create online API Services
  • Define input and output types
  • Load and manage models
  • Work with GPUs
  • Call an API endpoint
  • Parallelize request handling
  • Define the runtime environment
  • Run distributed Services
  • Configure template arguments
  • Configure lifecycle hooks
  • Mount ASGI applications
  • Stream responses
  • Define a WebSocket endpoint
  • Add a UI with Gradio
  • Observability
    • Monitoring
    • Logging
    • Metrics
    • Tracing
  • Customize error responses
  • Test API endpoints

Scale with BentoCloud

  • Deployment
    • Create Deployments
    • Configure Deployments
    • Manage Deployments
    • Call Deployment endpoints
    • Create canary Deployments
    • Sandboxes
    • Batch inference jobs
    • Build CI/CD pipelines
  • Scaling
    • Concurrency and autoscaling
    • Scale across multiple regions with Gateways
  • Manage secrets
  • Manage API tokens
  • Develop with Codespaces
  • Administering
    • Manage users
    • Split staging and production environments
    • Bring Your Own Cloud
    • Configure standby instances

References

  • BentoML
    • Bento and model APIs
    • BentoML SDK
    • Bento build options
    • BentoML CLI
    • Client API
    • Framework APIs
      • Diffusers
      • ONNX
      • Scikit-Learn
      • Transformers
      • Flax
      • TensorFlow
      • TorchScript
      • XGBoost
      • Picklable Model
      • PyTorch
      • LightGBM
      • MLflow
      • CatBoost
      • fast.ai
      • EasyOCR
      • Keras
      • Ray
      • Detectron
    • Configurations
    • Batch inference
    • Exceptions
    • Container APIs
    • Types
  • BentoCloud
    • Deployment details
    • BentoCloud CLI
    • BentoCloud API
Copyright © 2022-2026, bentoml.com