CatBoost is a machine learning algorithm that uses gradient boosting on decision trees. It is available as an open source library. To learn more about CatBoost, see the CatBoost documentation.

BentoML provides native support for CatBoost, and this guide provides an overview of how to use BentoML with CatBoost.

Saving a trained CatBoost model

In this example, we will train a new model using UCI’s breast cancer dataset.

import bentoml

import catboost as cbt

from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

X = cancer.data
y = cancer.target

# default hyperparameters; adjust as needed
model = cbt.CatBoostClassifier()

# train the model
model.fit(X, y)

Use save_model to save the model instance to the BentoML model store:

bento_model = bentoml.catboost.save_model("catboost_cancer_clf", model)

To verify that the saved model can be loaded properly:

model = bentoml.catboost.load_model("catboost_cancer_clf:latest")

model.predict(cbt.Pool([[1.308e+01, 1.571e+01, 8.563e+01, 5.200e+02, 1.075e-01, 1.270e-01,
    4.568e-02, 3.110e-02, 1.967e-01, 6.811e-02, 1.852e-01, 7.477e-01,
    1.383e+00, 1.467e+01, 4.097e-03, 1.898e-02, 1.698e-02, 6.490e-03,
    1.678e-02, 2.425e-03, 1.450e+01, 2.049e+01, 9.609e+01, 6.305e+02,
    1.312e-01, 2.776e-01, 1.890e-01, 7.283e-02, 3.184e-01, 8.183e-02]]))

Building a Service using CatBoost

See also

Building a Service: more information on creating a prediction service with BentoML.

import bentoml

import numpy as np

from bentoml.io import NumpyNdarray

runner = bentoml.catboost.get("catboost_cancer_clf:latest").to_runner()

svc = bentoml.Service("cancer_clf", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def classify_cancer(input: np.ndarray) -> np.ndarray:
   # returns the predicted class label for each input row
   res = await runner.predict.async_run(input)
   return res
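
Once the service is running, it can be called over HTTP. The following is a minimal client sketch using only the Python standard library. It assumes the service is served locally on port 3000 and that the endpoint path matches the API function name, classify_cancer; SERVICE_URL, build_request and classify are illustrative names, not part of BentoML.

```python
import json
import urllib.request

# Assumed address: a locally served BentoML service, with the route
# named after the API function defined above.
SERVICE_URL = "http://localhost:3000/classify_cancer"


def build_request(rows, url=SERVICE_URL):
    """Build a POST request carrying a 2-D list of feature rows as JSON."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(rows, url=SERVICE_URL, timeout=10.0):
    """Send the rows to the service and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(rows, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Each row passed to classify should contain the 30 features of the breast cancer dataset, in the same order used for training.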

When constructing a bentofile.yaml, there are two ways to include CatBoost as a dependency, via python or conda:

python:
  packages:
    - catboost

conda:
  channels:
  - conda-forge
  dependencies:
  - catboost

Using Runners

See also

See the Using Runners doc for a general introduction to the Runner concept and its usage.

A CatBoost Runner can be created as follows:

runner = bentoml.catboost.get("model_name:model_version").to_runner()

runner.predict.run is generally a drop-in replacement for model.predict.

While a Pool can be passed to a CatBoost Runner, BentoML does not support adaptive batching for Pool objects.

To use BentoML's adaptive batching feature, pass either a NumPy ndarray or a Pandas DataFrame instead.
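
To give an intuition for why array-like inputs can be batched, here is a conceptual sketch only, not BentoML's implementation. Rows from several requests are concatenated along the first dimension, passed to a single predict call, and the predictions are split back per request. run_batched and toy_predict are hypothetical names, and plain Python lists stand in for NumPy arrays.

```python
from typing import Callable, List, Sequence

Row = List[float]


def run_batched(
    predict: Callable[[List[Row]], List[int]],
    requests: Sequence[List[Row]],
) -> List[List[int]]:
    """Stack rows from several requests, predict once, split results back."""
    # Concatenate along the batch dimension (dimension 0).
    merged = [row for rows in requests for row in rows]
    predictions = predict(merged)

    # Hand each request the slice of predictions matching its own rows.
    results, start = [], 0
    for rows in requests:
        results.append(predictions[start:start + len(rows)])
        start += len(rows)
    return results


def toy_predict(rows: List[Row]) -> List[int]:
    """Stand-in for model.predict: label a row 1 if its first feature > 0."""
    return [1 if row[0] > 0 else 0 for row in rows]
```

For example, run_batched(toy_predict, [[[1.0], [-1.0]], [[2.0]]]) makes one call to toy_predict with three rows and returns one list of predictions per request.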


The staged_predict callback is not yet supported with bentoml.catboost.

Using GPU

CatBoost Runners will automatically use task_type=GPU if a GPU is detected.

This behavior can be disabled using the BentoML configuration file:


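runners:
   # resources can be configured at the top level
   resources:
      nvidia.com/gpu: 0
   # or per runner, keyed by runner name (assumed here to be the runner from this guide)
   catboost_cancer_clf:
      resources:
         nvidia.com/gpu: 0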

Adaptive batching

See also

Adaptive Batching: a general introduction to adaptive batching in BentoML.

CatBoost’s model.predict accepts batch input for inference. Adaptive batching is disabled by default, but can be enabled by marking the predict signature as batchable when saving the model.


BentoML does not currently support adaptive batching for Pool input. In order to enable batching, use either a NumPy ndarray or a Pandas DataFrame instead.

bento_model = bentoml.catboost.save_model(
    "catboost_cancer_clf", model, signatures={"predict": {"batchable": True}}
)