XGBoost¶

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework.

This document explains how to serve and deploy an XGBoost model for predicting breast cancer with BentoML. You can find all the source code in examples/xgboost.

Prerequisites¶

Install dependencies¶

pip install xgboost bentoml scikit-learn

Train and save a model¶

This example uses the scikit-learn framework to load and preprocess the breast cancer dataset, which is then converted into an XGBoost-compatible format (DMatrix) to train the machine learning model.

import typing as t

import numpy as np
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.utils import Bunch

# Load the data
cancer = t.cast(Bunch, load_breast_cancer())
cancer_data = t.cast(np.ndarray, cancer.data)
cancer_target = t.cast(np.ndarray, cancer.target)
dt = xgb.DMatrix(cancer_data, label=cancer_target)

# Specify model parameters
param = {
    "max_depth": 3,
    "eta": 0.3,  # learning rate
    "objective": "multi:softprob",  # emit per-class probabilities
    "num_class": 2,
}

# Train the model
model = xgb.train(param, dt)
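
If you want to watch the loss as boosting rounds progress, xgb.train also accepts an evals watch list and an evals_result dictionary. A minimal sketch that reuses the training set as the watch list (with multi:softprob, XGBoost reports mlogloss by default):

evals_result = {}
model = xgb.train(
    param,
    dt,
    num_boost_round=10,  # the default number of boosting rounds
    evals=[(dt, "train")],  # datasets to evaluate after each round
    evals_result=evals_result,
)
print(evals_result["train"]["mlogloss"][-1])  # final training loss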

After training, use the bentoml.xgboost.save_model API to save the model to the BentoML Model Store, a local directory for storing and managing models. You can retrieve this model later in other Services to run predictions.

import bentoml

# Specify the model name and the model to be saved
bentoml.xgboost.save_model("cancer", model)
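
bentoml.xgboost.save_model also accepts optional labels and metadata keyword arguments for organizing models in the store, and it returns a model object whose tag you can print. A sketch with illustrative values:

import bentoml

# Attach optional labels and metadata when saving (values are illustrative)
bento_model = bentoml.xgboost.save_model(
    "cancer",
    model,
    labels={"owner": "bentoml-team", "stage": "demo"},
    metadata={"dataset": "breast_cancer"},
)
print(bento_model.tag)  # e.g. cancer:xa2npbboccvv7u4c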

To verify that the model has been successfully saved, run:

$ bentoml models list

Tag                      Module           Size       Creation Time
cancer:xa2npbboccvv7u4c  bentoml.xgboost  23.17 KiB  2024-06-19 07:51:21
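
You can also inspect a saved model from Python. The cancer:latest tag resolves to the most recently created version:

import bentoml

bento_model = bentoml.models.get("cancer:latest")
print(bento_model.tag)   # the resolved version tag
print(bento_model.path)  # where the model files are stored on disk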

Test the saved model¶

To ensure that the saved model works correctly, try loading it and running a prediction:

import bentoml
import xgboost as xgb

# Load the model by setting the model tag
booster = bentoml.xgboost.load_model("cancer:xa2npbboccvv7u4c")

# Predict using a sample
res = booster.predict(xgb.DMatrix([[1.308e+01, 1.571e+01, 8.563e+01, 5.200e+02, 1.075e-01, 1.270e-01,
    4.568e-02, 3.110e-02, 1.967e-01, 6.811e-02, 1.852e-01, 7.477e-01,
    1.383e+00, 1.467e+01, 4.097e-03, 1.898e-02, 1.698e-02, 6.490e-03,
    1.678e-02, 2.425e-03, 1.450e+01, 2.049e+01, 9.609e+01, 6.305e+02,
    1.312e-01, 2.776e-01, 1.890e-01, 7.283e-02, 3.184e-01, 8.183e-02]]))

print(res)

Expected result:

[[0.02664177 0.9733583 ]] # The probability of the sample belonging to class 0 and class 1

Create a BentoML Service¶

Create a separate service.py file where you define a BentoML Service to expose the model as a web service.

import bentoml
import numpy as np
import xgboost as xgb
import os

@bentoml.service(
    resources={"cpu": "2"},
    traffic={"timeout": 10},
)
class CancerClassifier:
    # Retrieve the latest version of the model from the BentoML Model Store
    bento_model = bentoml.models.get("cancer:latest")

    def __init__(self):
        self.model = bentoml.xgboost.load_model(self.bento_model)

        # Check resource availability
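        # Note: XGBoost 2.0+ deprecates the "predictor" and "gpu_id" parameters;
        # on newer versions, set_param({"device": "cuda"}) is the equivalent.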
        if os.getenv("CUDA_VISIBLE_DEVICES") not in (None, "", "-1"):
            self.model.set_param({"predictor": "gpu_predictor", "gpu_id": 0})  # type: ignore (incomplete XGBoost types)
        else:
            nthreads = os.getenv("OMP_NUM_THREADS")
            if nthreads:
                nthreads = max(int(nthreads), 1)
            else:
                nthreads = 1
            self.model.set_param(
                {"predictor": "cpu_predictor", "nthread": nthreads}
            )

    @bentoml.api
    def predict(self, data: np.ndarray) -> np.ndarray:
        return self.model.predict(xgb.DMatrix(data))

The Service code:

  • Uses the @bentoml.service decorator to define a BentoML Service. Optionally, you can set additional configurations like resource allocation and traffic timeout.

  • Retrieves the model from the Model Store and defines it as a class variable.

  • Checks resource availability like GPUs and the number of threads.

  • Uses the @bentoml.api decorator to expose the predict function as an API endpoint, which takes a NumPy array as input and returns a NumPy array. Note that the input data is converted into a DMatrix, the data structure XGBoost uses for datasets, as illustrated in the sketch below.
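
A minimal standalone sketch of that conversion, assuming a batch of four samples with the 30 features of the breast cancer dataset:

import numpy as np
import xgboost as xgb

# DMatrix is XGBoost's internal data container; NumPy arrays convert directly
batch = np.random.rand(4, 30).astype(np.float32)
dmat = xgb.DMatrix(batch)
print(dmat.num_row(), dmat.num_col())  # 4 30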

Run bentoml serve in your project directory to start the Service.

$ bentoml serve service:CancerClassifier

2024-06-19T08:37:31+0000 [WARNING] [cli] Converting 'CancerClassifier' to lowercase: 'cancerclassifier'.
2024-06-19T08:37:31+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:CancerClassifier" listening on http://localhost:3000 (Press CTRL+C to quit)

The server is active at http://localhost:3000. You can interact with it in different ways.

CURL:

curl -X 'POST' \
    'http://localhost:3000/predict' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
    "data": [
        [1.308e+01, 1.571e+01, 8.563e+01, 5.200e+02, 1.075e-01, 1.270e-01,
        4.568e-02, 3.110e-02, 1.967e-01, 6.811e-02, 1.852e-01, 7.477e-01,
        1.383e+00, 1.467e+01, 4.097e-03, 1.898e-02, 1.698e-02, 6.490e-03,
        1.678e-02, 2.425e-03, 1.450e+01, 2.049e+01, 9.609e+01, 6.305e+02,
        1.312e-01, 2.776e-01, 1.890e-01, 7.283e-02, 3.184e-01, 8.183e-02]
      ]
    }'

Python client:

import bentoml

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    result = client.predict(
        data=[
            [1.308e+01, 1.571e+01, 8.563e+01, 5.200e+02, 1.075e-01, 1.270e-01,
            4.568e-02, 3.110e-02, 1.967e-01, 6.811e-02, 1.852e-01, 7.477e-01,
            1.383e+00, 1.467e+01, 4.097e-03, 1.898e-02, 1.698e-02, 6.490e-03,
            1.678e-02, 2.425e-03, 1.450e+01, 2.049e+01, 9.609e+01, 6.305e+02,
            1.312e-01, 2.776e-01, 1.890e-01, 7.283e-02, 3.184e-01, 8.183e-02]
        ],
    )
    print(result)

Browser:

Visit http://localhost:3000, scroll down to Service APIs, specify the data, and click Execute.

(Screenshot: the Service UI showing the predict endpoint.)

Deploy to BentoCloud¶

After the Service is ready, you can deploy it to BentoCloud for better management and scalability. Sign up for a BentoCloud account and get $10 in free credits.

First, specify a configuration YAML file (bentofile.yaml) to define the build options for a Bento, the unified distribution format in BentoML containing source code, Python packages, model references, and so on. Here is an example file:

service: "service:CancerClassifier"
labels:
  owner: bentoml-team
  stage: demo
include:
  - "*.py"
python:
  packages:
    - xgboost
    - scikit-learn

Log in to BentoCloud by running bentoml cloud login, then run the following command to deploy the project.

bentoml deploy .
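
Recent BentoML releases also expose a deployment API if you prefer Python over the CLI. A hedged sketch (the exact signature may vary by version, so check the documentation for your release):

import bentoml

# Create a Deployment from the current project directory
# (assumes you have already run `bentoml cloud login`)
deployment = bentoml.deployment.create(bento="./")
print(deployment)  # describes the newly created Deployment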

Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL.
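
The client code from earlier works unchanged against the Deployment; only the URL differs. A sketch with a placeholder URL and dummy input:

import bentoml

# Replace the placeholder with your Deployment's exposed URL
with bentoml.SyncHTTPClient("https://cancer-classifier-example.bentoml.ai") as client:
    result = client.predict(data=[[0.0] * 30])  # 30 placeholder feature values
    print(result)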

(Screenshot: the Deployment running on BentoCloud.)

Note

For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image: build a Bento with bentoml build, then run bentoml containerize on the resulting tag.