{"id":2054,"date":"2022-04-05T17:38:40","date_gmt":"2022-04-05T17:38:40","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/04\/05\/customize-the-amazon-sagemaker-xgboost-algorithm-container\/"},"modified":"2022-04-05T17:38:40","modified_gmt":"2022-04-05T17:38:40","slug":"customize-the-amazon-sagemaker-xgboost-algorithm-container","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/04\/05\/customize-the-amazon-sagemaker-xgboost-algorithm-container\/","title":{"rendered":"Customize the Amazon SageMaker XGBoost algorithm container"},"content":{"rendered":"<div id=\"\">\n<p>The built-in <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> XGBoost algorithm provides a managed container to run the popular <a href=\"https:\/\/xgboost.readthedocs.io\/en\/stable\/\" target=\"_blank\" rel=\"noopener noreferrer\">XGBoost<\/a> machine learning (ML) framework, with added convenience of supporting advanced training or inference features like distributed training, dataset sharding for large-scale datasets, <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/a-b-testing-ml-models-in-production-using-amazon-sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">A\/B model testing<\/a>, or <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/multi-model-endpoints.html\" target=\"_blank\" rel=\"noopener noreferrer\">multi-model inference<\/a> endpoints. You can also extend this powerful algorithm to accommodate different requirements.<\/p>\n<p>Packaging the code and dependencies in a single container is a convenient and robust approach for long-term code maintenance, reproducibility, and auditing purposes. Modifying the container directly follows the base container faithfully and avoids duplicating existing functions already supported by the base container. 
In this post, we review the inner workings of the SageMaker XGBoost algorithm container and provide pragmatic scripts to directly customize the container.<\/p>\n<h2>SageMaker XGBoost container structure<\/h2>\n<p>The SageMaker built-in XGBoost algorithm is packaged as a stand-alone container, <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\" target=\"_blank\" rel=\"noopener noreferrer\">available on GitHub<\/a>, and can be extended under the developer-friendly Apache 2.0 open-source license. The container packages the <a href=\"https:\/\/xgboost.readthedocs.io\/en\/stable\/\" target=\"_blank\" rel=\"noopener noreferrer\">open-source XGBoost algorithm<\/a> and ancillary tools to run the algorithm in the SageMaker environment integrated with other AWS Cloud services. This allows you to train XGBoost models on a variety of <a href=\"https:\/\/sagemaker.readthedocs.io\/en\/stable\/api\/utility\/inputs.html\" target=\"_blank\" rel=\"noopener noreferrer\">data sources<\/a>, make <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/how-it-works-batch.html\" target=\"_blank\" rel=\"noopener noreferrer\">batch predictions<\/a> on offline data, or host an <a href=\"https:\/\/sagemaker.readthedocs.io\/en\/stable\/frameworks\/xgboost\/using_xgboost.html#deploy-open-source-xgboost-models\" target=\"_blank\" rel=\"noopener noreferrer\">inference endpoint<\/a> in a real-time <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/inference-pipeline-real-time.html\" target=\"_blank\" rel=\"noopener noreferrer\">pipeline<\/a>.<\/p>\n<p>The container supports training and inference operations with different entry points. For inference mode, the entry can be found in the main function in the <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/src\/sagemaker_xgboost_container\/serving.py\" target=\"_blank\" rel=\"noopener noreferrer\">serving.py script<\/a>. 
For real-time inference serving, the container runs a <a href=\"https:\/\/flask.palletsprojects.com\/en\/2.0.x\/\" target=\"_blank\" rel=\"noopener noreferrer\">Flask<\/a>-based <a href=\"https:\/\/github.com\/aws\/sagemaker-containers\/blob\/master\/src\/sagemaker_containers\/_server.py\" target=\"_blank\" rel=\"noopener noreferrer\">web server<\/a> that, when <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/src\/sagemaker_xgboost_container\/algorithm_mode\/serve.py#L206\" target=\"_blank\" rel=\"noopener noreferrer\">invoked<\/a>, receives an HTTP-encoded request containing the data, decodes the data into XGBoost\u2019s <a href=\"https:\/\/xgboost.readthedocs.io\/en\/latest\/python\/python_api.html#xgboost.DMatrix\" target=\"_blank\" rel=\"noopener noreferrer\">DMatrix<\/a> format, <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/src\/sagemaker_xgboost_container\/algorithm_mode\/serve.py#L135\" target=\"_blank\" rel=\"noopener noreferrer\">loads the model<\/a>, <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/src\/sagemaker_xgboost_container\/algorithm_mode\/serve.py#L76\" target=\"_blank\" rel=\"noopener noreferrer\">makes a prediction<\/a>, and returns an <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/src\/sagemaker_xgboost_container\/algorithm_mode\/serve.py#L239\" target=\"_blank\" rel=\"noopener noreferrer\">HTTP-encoded response<\/a>. 
These methods are encapsulated in the <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/src\/sagemaker_xgboost_container\/algorithm_mode\/serve.py#L59\" target=\"_blank\" rel=\"noopener noreferrer\">ScoringService<\/a> class, which can also be customized to a great extent through script mode (see the appendix).<\/p>\n<p>The entry point for training mode (algorithm mode) is the main function in <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/src\/sagemaker_xgboost_container\/training.py\" target=\"_blank\" rel=\"noopener noreferrer\">training.py<\/a>. The main function sets up the training environment and calls the training job function. It\u2019s flexible enough to allow for distributed or single-node training, or utilities like cross-validation. The heart of the training process can be found in the <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/src\/sagemaker_xgboost_container\/algorithm_mode\/train.py#L177\" target=\"_blank\" rel=\"noopener noreferrer\">train_job<\/a> function.<\/p>\n<p>The Dockerfiles that package the container can be found in the <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/tree\/master\/docker\/1.3-1\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>. Note that the container is built in two steps: a <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/docker\/1.3-1\/base\/Dockerfile.cpu\" target=\"_blank\" rel=\"noopener noreferrer\">base<\/a> container is built first, followed by the <a href=\"https:\/\/github.com\/aws\/sagemaker-xgboost-container\/blob\/master\/docker\/1.3-1\/final\/Dockerfile.cpu\" target=\"_blank\" rel=\"noopener noreferrer\">final<\/a> container on top.<\/p>\n<h2>Solution overview<\/h2>\n<p>You can modify and rebuild the container from the source code. However, this involves collecting and rebuilding all dependencies and packages from scratch. 
In this post, we discuss a more straightforward approach that builds directly on top of the already-built, publicly available SageMaker XGBoost algorithm container image.<\/p>\n<p>In this approach, we <a href=\"https:\/\/docs.docker.com\/engine\/reference\/commandline\/pull\/\" target=\"_blank\" rel=\"noopener noreferrer\">pull<\/a> a copy of the public SageMaker XGBoost image, modify the scripts or add packages, and rebuild the container on top. The modified container can be stored in a private repository. This way, we avoid rebuilding intermediary dependencies and instead build directly on top of the already-built libraries packaged in the official container.<\/p>\n<p>The following figure shows an overview of the script used to pull the public base image, modify and rebuild the image, and upload it to a private <a href=\"http:\/\/aws.amazon.com\/ecr\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Registry<\/a> (Amazon ECR) repository. The <a href=\"https:\/\/github.com\/aws-samples\/extending-sagemaker-xgboost-algorithm\/blob\/main\/src\/docker_build.sh\" target=\"_blank\" rel=\"noopener noreferrer\">bash script<\/a> in the accompanying code of this post performs all the workflow steps shown in the diagram. The accompanying <a href=\"https:\/\/github.com\/aws-samples\/extending-sagemaker-xgboost-algorithm\/blob\/main\/walkthrough.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">notebook<\/a> shows an example where the URI of a specific version of the SageMaker XGBoost algorithm is first retrieved and passed to the <a href=\"https:\/\/github.com\/aws-samples\/extending-sagemaker-xgboost-algorithm\/blob\/main\/src\/docker_build.sh\" target=\"_blank\" rel=\"noopener noreferrer\">bash script<\/a>, which replaces two of the Python scripts in the image, rebuilds it, and pushes the modified image to a private Amazon ECR repository. 
You can modify the accompanying code to suit your needs.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/18\/ML-2441-archdiag.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-34356\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/18\/ML-2441-archdiag.png\" alt=\"\" width=\"570\" height=\"529\"><\/a><\/p>\n<h2>Prerequisites<\/h2>\n<p>The <a href=\"https:\/\/github.com\/aws-samples\/extending-sagemaker-xgboost-algorithm\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repository<\/a> contains the code accompanying this post. You can run the <a href=\"https:\/\/github.com\/aws-samples\/extending-sagemaker-xgboost-algorithm\/blob\/main\/walkthrough.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">sample notebook<\/a> in your AWS account, or use the provided <a href=\"http:\/\/aws.amazon.com\/cloudformation\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a> stack to deploy it in a SageMaker notebook. You need the following prerequisites:<\/p>\n<ul>\n<li>An AWS account.<\/li>\n<li>Permissions to run SageMaker batch transform and training jobs, along with Amazon ECR privileges. 
The CloudFormation template creates sample <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) roles.<\/li>\n<\/ul>\n<h2>Deploy the solution<\/h2>\n<p>To create your solution resources using AWS CloudFormation, choose <strong>Launch Stack<\/strong>:<br \/><a href=\"https:\/\/console.aws.amazon.com\/cloudformation\/home?#\/stacks\/new?stackName=extending-xgboost-blogcfstack&amp;templateURL=https:\/\/aws-blogs-artifacts-public.s3.amazonaws.com\/artifacts\/ML-2441\/cft.yaml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15948 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/16\/2-LaunchStack.jpg\" alt=\"\" width=\"107\" height=\"20\"><\/a><\/p>\n<p>The stack deploys a SageMaker notebook preconfigured to clone the GitHub repository. The walkthrough <a href=\"https:\/\/github.com\/aws-samples\/extending-sagemaker-xgboost-algorithm\/blob\/main\/walkthrough.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">notebook<\/a> includes the steps to pull the public SageMaker XGBoost image for a given version, modify it, and push the custom container to a private Amazon ECR repository. The notebook uses the public <a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/abalone\" target=\"_blank\" rel=\"noopener noreferrer\">Abalone dataset<\/a> as a sample, trains a model using the SageMaker XGBoost built-in training mode, and reuses this model in the custom image to run batch transform jobs that produce predictions together with SHAP values.<\/p>\n<h2>Conclusion<\/h2>\n<p>SageMaker built-in algorithms provide a variety of features and functionalities, and can be extended further under the Apache 2.0 open-source license. 
In this post, we reviewed how to extend the production built-in container for the SageMaker XGBoost algorithm to meet production requirements such as backward compatibility of code and APIs.<\/p>\n<p>The sample notebook and helper <a href=\"https:\/\/github.com\/aws-samples\/extending-sagemaker-xgboost-algorithm\/blob\/main\/src\/docker_build.sh\" target=\"_blank\" rel=\"noopener noreferrer\">scripts<\/a> provide a convenient starting point to customize the SageMaker XGBoost container image to suit your needs. Give it a try!<\/p>\n<h2>Appendix: Script mode<\/h2>\n<p><a href=\"https:\/\/sagemaker-examples.readthedocs.io\/en\/latest\/sagemaker-script-mode\/sagemaker-script-mode.html\" target=\"_blank\" rel=\"noopener noreferrer\">Script mode<\/a> provides a way to modify many SageMaker built-in algorithms by providing an interface to replace the functions responsible for transforming the inputs and loading the model. Script mode isn\u2019t as flexible as directly modifying the container, but it provides a completely Python-based route to customize the built-in algorithm with no need to work directly with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Docker_(software)\" target=\"_blank\" rel=\"noopener noreferrer\">Docker<\/a>.<\/p>\n<p>In script mode, a <code>user-module<\/code> is provided to customize data decoding, loading of the model, and making predictions. The user module can define a <code>transformer_fn<\/code> that handles every step from processing the request to preparing the response. Alternatively, instead of defining <code>transformer_fn<\/code>, you can provide the custom methods <code>model_fn<\/code>, <code>input_fn<\/code>, <code>predict_fn<\/code>, and <code>output_fn<\/code> individually to customize loading the model, decoding the input, making predictions, and preparing the output. 
For a more thorough overview of script mode, see <a href=\"https:\/\/sagemaker-examples.readthedocs.io\/en\/latest\/sagemaker-script-mode\/sagemaker-script-mode.html\" target=\"_blank\" rel=\"noopener noreferrer\">Bring Your Own Model with SageMaker Script Mode<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-20750 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/15\/Peyman-Razaghi.jpg\" alt=\"\" width=\"100\" height=\"133\"><strong>Peyman Razaghi<\/strong> is a Data Scientist at AWS. He holds a PhD in information theory from the University of Toronto and was a post-doctoral research scientist at the University of Southern California (USC), Los Angeles. Before joining AWS, Peyman was a staff systems engineer at Qualcomm contributing to a number of notable international telecommunication standards. He has authored several peer-reviewed scientific research articles in statistics and systems engineering, and enjoys parenting and road cycling outside work.<\/p>\n
<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/customize-the-amazon-sagemaker-xgboost-algorithm-container\/<\/p>\n","protected":false},"author":0,"featured_media":2055,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2054"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=2054"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2054\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/2055"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=2054"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=2054"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=2054"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}