{"id":228,"date":"2020-09-15T02:08:29","date_gmt":"2020-09-15T02:08:29","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/15\/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks\/"},"modified":"2020-09-15T02:08:29","modified_gmt":"2020-09-15T02:08:29","slug":"using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/15\/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks\/","title":{"rendered":"Using the Amazon SageMaker Studio Image Build CLI to build container images from your Studio notebooks"},"content":{"rendered":"<div id=\"\">\n<p>The new Amazon SageMaker Studio Image Build convenience package allows data scientists and developers to easily build custom container images from your Studio notebooks via a <a target=\"_blank\" href=\"https:\/\/github.com\/aws-samples\/sagemaker-studio-image-build-cli\" rel=\"noopener noreferrer\">new CLI<\/a>. The new CLI eliminates the need to manually set up and connect to Docker build environments for building container images in Amazon SageMaker Studio.<\/p>\n<p>Amazon SageMaker Studio provides a fully integrated development environment for machine learning (ML). <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> offers a variety of <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/algos.html\" target=\"_blank\" rel=\"noopener noreferrer\">built-in algorithms<\/a>, <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/frameworks.html\" target=\"_blank\" rel=\"noopener noreferrer\">built-in frameworks<\/a>, and the flexibility to use any algorithm or framework by bringing your own container images. 
The Amazon SageMaker Studio Image Build CLI lets you build Amazon SageMaker-compatible Docker images directly from your Amazon SageMaker Studio environments. Prior to this feature, you could only build your Docker images from Amazon SageMaker Studio notebooks by setting up and connecting to secondary Docker build environments.<\/p>\n<p>You can now create container images directly from Amazon SageMaker Studio by using the simple CLI. The CLI removes the need to set up a secondary build environment, so you can spend your time on the ML problem you\u2019re trying to solve rather than on creating workflows for Docker builds. The new CLI automatically sets up a reusable build environment that you interact with via high-level commands. You essentially tell the CLI to build your image, without having to worry about the underlying workflow orchestrated through the CLI, and the output is a link to your <a href=\"http:\/\/aws.amazon.com\/ecr\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Registry<\/a> (Amazon ECR) image location. The following diagram illustrates this architecture.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15812 size-full\" title=\"SageMaker Studio architecture\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/11\/1-Flowchart.jpg\" alt=\"\" width=\"900\" height=\"453\"><\/p>\n<p>The CLI uses the following underlying AWS services:<\/p>\n<ul>\n<li>\n<strong>Amazon S3 <\/strong>\u2013 The new CLI packages your Dockerfile and container code, along with a buildspec.yml file used by <a href=\"https:\/\/aws.amazon.com\/codebuild\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CodeBuild<\/a>, into a .zip file stored in <a href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3). 
By default, this file is automatically cleaned up following the build to avoid unnecessary storage charges.<\/li>\n<li>\n<strong>AWS CodeBuild <\/strong>\u2013 CodeBuild is a fully managed build service that lets you build Docker images in a transient build environment. CodeBuild depends on a buildspec.yml file that contains the build commands and settings it uses to run your build. The new CLI generates this file automatically and then starts the container build using the packaged files from Amazon S3. <a href=\"https:\/\/aws.amazon.com\/codebuild\/pricing\" target=\"_blank\" rel=\"noopener noreferrer\">CodeBuild pricing<\/a> is pay-as-you-go and based on build minutes and the build compute used. By default, the CLI uses <code>general1.small<\/code> compute.<\/li>\n<li>\n<strong>Amazon ECR <\/strong>\u2013 Built Docker images are tagged and pushed to Amazon ECR. Amazon SageMaker expects training and inference images to be stored in Amazon ECR, so after the image is successfully pushed to the repository, you\u2019re ready to go. The CLI returns the URI of the image, which you can include in your Amazon SageMaker training and hosting calls.<\/li>\n<\/ul>\n<p>Now that we\u2019ve outlined the underlying AWS services and benefits of using the new Amazon SageMaker Studio Image Build convenience package to abstract your container build environments, let\u2019s explore how to get started using the CLI!<\/p>\n<h2>Prerequisites<\/h2>\n<p>To use the CLI, you need to ensure the Amazon SageMaker execution role used by your Studio notebook environment (or another <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role, if you prefer) has the required permissions to interact with the resources used by the CLI, including access to CodeBuild and Amazon ECR.<\/p>\n<p>Your role should have a trust policy that allows CodeBuild to assume it. 
See the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-json\">{\r\n  \"Version\": \"2012-10-17\",\r\n  \"Statement\": [\r\n    {\r\n      \"Effect\": \"Allow\",\r\n      \"Principal\": {\r\n        \"Service\": [\r\n          \"codebuild.amazonaws.com\"\r\n        ]\r\n      },\r\n      \"Action\": \"sts:AssumeRole\"\r\n    }\r\n  ]\r\n}\r\n<\/code><\/pre>\n<\/div>\n<p>You also need to make sure the appropriate permissions are included in your role to run the build in CodeBuild, create a repository in Amazon ECR, and push images to that repository. The following code is an example policy that you should modify as necessary to meet your needs and security requirements:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-json\">{\r\n    \"Version\": \"2012-10-17\",\r\n    \"Statement\": [\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"codebuild:DeleteProject\",\r\n                \"codebuild:CreateProject\",\r\n                \"codebuild:BatchGetBuilds\",\r\n                \"codebuild:StartBuild\"\r\n            ],\r\n            \"Resource\": \"arn:aws:codebuild:*:*:project\/sagemaker-studio*\"\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": \"logs:CreateLogStream\",\r\n            \"Resource\": \"arn:aws:logs:*:*:log-group:\/aws\/codebuild\/sagemaker-studio*\"\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"logs:GetLogEvents\",\r\n                \"logs:PutLogEvents\"\r\n            ],\r\n            \"Resource\": \"arn:aws:logs:*:*:log-group:\/aws\/codebuild\/sagemaker-studio*:log-stream:*\"\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": \"logs:CreateLogGroup\",\r\n            \"Resource\": \"*\"\r\n        },\r\n        {\r\n            \"Effect\": 
\"Allow\",\r\n            \"Action\": [\r\n                \"ecr:CreateRepository\",\r\n                \"ecr:BatchGetImage\",\r\n                \"ecr:CompleteLayerUpload\",\r\n                \"ecr:DescribeImages\",\r\n                \"ecr:DescribeRepositories\",\r\n                \"ecr:UploadLayerPart\",\r\n                \"ecr:ListImages\",\r\n                \"ecr:InitiateLayerUpload\",\r\n                \"ecr:BatchCheckLayerAvailability\",\r\n                \"ecr:PutImage\"\r\n            ],\r\n            \"Resource\": \"arn:aws:ecr:*:*:repository\/sagemaker-studio*\"\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": \"ecr:GetAuthorizationToken\",\r\n            \"Resource\": \"*\"\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n              \"s3:GetObject\",\r\n              \"s3:DeleteObject\",\r\n              \"s3:PutObject\"\r\n              ],\r\n            \"Resource\": \"arn:aws:s3:::sagemaker-*\/*\"\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"s3:CreateBucket\"\r\n            ],\r\n            \"Resource\": \"arn:aws:s3:::sagemaker*\"\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"iam:GetRole\",\r\n                \"iam:ListRoles\"\r\n            ],\r\n            \"Resource\": \"*\"\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": \"iam:PassRole\",\r\n            \"Resource\": \"arn:aws:iam::*:role\/*\",\r\n            \"Condition\": {\r\n                \"StringLikeIfExists\": {\r\n                    \"iam:PassedToService\": \"codebuild.amazonaws.com\"\r\n                }\r\n            }\r\n        }\r\n    ]\r\n}\r\n<\/code><\/pre>\n<\/div>\n<p>You must also install the package in your Studio notebook environment to be able use the convenience package. 
To install, simply use <code>pip install<\/code> within your notebook environment:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">!pip install sagemaker-studio-image-build<\/code><\/pre>\n<\/div>\n<h2>Using the CLI<\/h2>\n<p>After completing these prerequisites, you\u2019re ready to use the new CLI to build your custom bring-your-own Docker images from Amazon SageMaker Studio without worrying about the underlying setup and configuration of build services.<\/p>\n<p>To use the CLI, navigate to the directory containing your Dockerfile and run the following command:<\/p>\n<p><code>sm-docker build .<\/code><\/p>\n<p>Alternatively, you can explicitly identify the path to your Dockerfile using the <code>--file<\/code> argument:<\/p>\n<p><code>sm-docker build . --file \/path\/to\/Dockerfile<\/code><\/p>\n<p>It\u2019s that simple! The command automatically logs build output to your notebook and returns the image URI of your Docker image. The output looks similar to the following:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">[Container] 2020\/07\/11 06:07:24 Phase complete: POST_BUILD State: SUCCEEDED\r\n[Container] 2020\/07\/11 06:07:24 Phase context status code:  Message:\r\nImage URI: &lt;account-id&gt;.dkr.ecr.us-east-1.amazonaws.com\/sagemaker-studio-&lt;studioID&gt;:default-&lt;hash&gt;\r\n<\/code><\/pre>\n<\/div>\n<p>The CLI takes care of the rest. Let\u2019s take a deeper look at what the CLI is actually doing. 
The following diagram illustrates this process.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15813 size-full\" title=\"SageMaker Studio CLI process\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/11\/2-Flowchart.jpg\" alt=\"\" width=\"900\" height=\"471\"><\/p>\n<p>The workflow contains the following steps:<\/p>\n<ol>\n<li>The CLI automatically zips the directory containing your Dockerfile, generates the buildspec for AWS CodeBuild, and adds it to the final .zip package. By default, the final .zip package is put in the Amazon SageMaker default session S3 bucket. Alternatively, you can specify a custom bucket using the <code>--bucket<\/code> argument.<\/li>\n<li>After packaging your files for build, the CLI creates an ECR repository if one doesn\u2019t exist. By default, the ECR repository created has the naming convention of <code>sagemaker-studio-<\/code><em><code><span>&lt;studioID&gt;<\/span><\/code><\/em>.<\/li>\n<li>Finally, the CLI creates a temporary build project in CodeBuild and starts the build, which builds your container image, tags it, and pushes it to the ECR repository.<\/li>\n<\/ol>\n<p>The great part about the CLI is that you no longer have to set any of this up or worry about the underlying activities to build your container images from Amazon SageMaker Studio.<\/p>\n<p>You can also optionally customize your build by using supported arguments such as the following:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">--repository mynewrepo:1.0     &lt;== By default, the ECR repository uses the naming \r\n                                   sagemaker-studio-&lt;studio-domainid&gt;.  
You can set \r\n                                   this parameter to push to an existing repository  \r\n                                   or create a new repository with your preferred \r\n                                   naming. The default tagging strategy uses *user-profile-name*.\r\n                                   This parameter can also be used to customize the \r\n                                   tagging strategy. \r\n                                   \r\n                                   Usage: sm-docker build . --repository mynewrepo:1.0\r\n                                   \r\n--role &lt;iam-role-name&gt;         &lt;== By default, the CLI uses the SageMaker Execution\r\n                                   Role for interacting with the AWS Services the CLI \r\n                                   uses (CodeBuild, ECR). You can optionally specify \r\n                                   an alternative role that has the required permissions\r\n                                   specified in the prerequisites.\r\n                                   \r\n                                   Usage: sm-docker build . --role build-cli-role\r\n                                   \r\n--bucket &lt;bucket-name&gt;         &lt;== By default, the CLI uses the SageMaker default \r\n                                   session bucket for storing your packaged input \r\n                                   sent to CodeBuild.  You can optionally specify a\r\n                                   preferred S3 bucket to use. \r\n                                   \r\n                                   Usage: sm-docker build . --bucket codebuild-tmp-build\r\n                                   \r\n--no-logs                       &lt;== By default, the CLI will show the output logs of the\r\n                                    running CodeBuild build.  
This is typically useful\r\n                                    in case you need to debug the build; however, you \r\n                                    can optionally set this argument to suppress log\r\n                                    output.\r\n                                    \r\n                                    Usage: sm-docker build . --no-logs\r\n<\/code><\/pre>\n<\/div>\n<h2>Changes from Amazon SageMaker classic notebooks<\/h2>\n<p>To help illustrate the changes required when moving from bring-your-own Amazon SageMaker example notebooks or your own custom-developed notebooks, we\u2019ve provided two example notebooks that use the Amazon SageMaker Studio Image Build CLI:<\/p>\n<ul>\n<li>The <a href=\"https:\/\/github.com\/awslabs\/amazon-sagemaker-examples\/tree\/master\/aws_sagemaker_studio\/sagemaker_studio_image_build\/tensorflow_bring_your_own\" target=\"_blank\" rel=\"noopener noreferrer\">TensorFlow Bring Your Own<\/a> example notebook is based on the existing <a href=\"https:\/\/github.com\/awslabs\/amazon-sagemaker-examples\/tree\/master\/advanced_functionality\/tensorflow_bring_your_own\" target=\"_blank\" rel=\"noopener noreferrer\">TensorFlow Bring Your Own<\/a> example and is adapted to use the new CLI with Amazon SageMaker Studio.<\/li>\n<li>The <a href=\"https:\/\/github.com\/awslabs\/amazon-sagemaker-examples\/blob\/master\/aws_sagemaker_studio\/sagemaker_studio_image_build\/xgboost_bring_your_own\/Batch_Transform_BYO_XGB.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">BYO XGBoost notebook<\/a> demonstrates a typical data science user flow of data exploration and feature engineering, model training using a custom XGBoost container built using the CLI, and using Amazon SageMaker batch transform for offline or batch inference.<\/li>\n<\/ul>\n<p>The key change required to adapt your existing notebooks to the new CLI in Amazon SageMaker Studio is removing the <code>build_and_push.sh<\/code> script from 
your directory structure. In classic notebook instances, the <code>build_and_push.sh<\/code> script builds your Docker image and pushes it to Amazon ECR; in Studio, the new CLI replaces it. The following image compares the directory structures.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15814 size-full\" title=\"Directory structures\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/11\/3-Directory-structures.jpg\" alt=\"\" width=\"900\" height=\"413\"><\/p>\n<h2>Summary<\/h2>\n<p>This post discussed how you can simplify the build of your Docker images from Amazon SageMaker Studio by using the new Amazon SageMaker Studio Image Build CLI convenience package. It abstracts the setup of your Docker build environments by automatically setting up the underlying services and workflow necessary for building Docker images. This package allows you to interact with an abstracted build environment through simple CLI commands in Amazon SageMaker Studio so you can focus on building models! For more information, see the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-studio-image-build-cli\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><img decoding=\"async\" class=\"alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2019\/11\/25\/shelbees-100.jpg\"><\/p>\n<p><strong>Shelbee Eigenbrode<\/strong> is a solutions architect at Amazon Web Services (AWS). Her current areas of depth include DevOps combined with machine learning and artificial intelligence. She\u2019s been in technology for 22 years, spanning multiple roles and technologies. In her spare time she enjoys reading, spending time with her family, friends and her fur family (aka 
dogs).<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" class=\"alignleft\" src=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/images\/sagemaker-notebooks\/jaipreet-singh-100.jpg\"><\/p>\n<p><strong>Jaipreet Singh<\/strong> is a Senior Software Engineer on the Amazon SageMaker Studio team. He has been working on Amazon SageMaker since its inception in 2017 and has contributed to various Project Jupyter open-source projects. In his spare time, he enjoys hiking and skiing in the PNW.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15818 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/11\/SamLiu.jpg\" alt=\"\" width=\"101\" height=\"121\"><strong>Sam Liu<\/strong> is a product manager at Amazon Web Services (AWS). His current focus is the infrastructure and tooling of machine learning and artificial intelligence. Beyond that, he has 10 years of experience building machine learning applications in various industries. In his spare time, he enjoys making short videos for technical education or animal protection.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15884 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/12\/stefan-natu.jpg\" alt=\"\" width=\"100\" height=\"113\">Stefan Natu<\/strong> is a Sr. Machine Learning Specialist at Amazon Web Services. He is focused on helping financial services customers build and operationalize end-to-end machine learning solutions on AWS. His academic background is in theoretical physics, and in the past, he worked on a number of data science problems in retail and energy verticals. 
In his spare time, he enjoys reading machine learning blogs, traveling, playing the guitar, and exploring the food scene in New York City.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks\/<\/p>\n","protected":false},"author":0,"featured_media":229,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/228"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=228"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/228\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/229"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}