{"id":943,"date":"2021-09-25T06:46:45","date_gmt":"2021-09-25T06:46:45","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/09\/25\/customize-amazon-sagemaker-studio-using-lifecycle-configurations\/"},"modified":"2021-09-25T06:46:45","modified_gmt":"2021-09-25T06:46:45","slug":"customize-amazon-sagemaker-studio-using-lifecycle-configurations","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/09\/25\/customize-amazon-sagemaker-studio-using-lifecycle-configurations\/","title":{"rendered":"Customize Amazon SageMaker Studio using Lifecycle Configurations"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Studio<\/a> is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. It provides all the tools you need to take your models from experimentation to production while boosting your productivity. You can write code, track experiments, visualize data, and perform debugging and monitoring within a single, integrated visual interface.<\/p>\n<p>We\u2019re excited to announce Lifecycle Configuration for Studio, a new capability that enables developers to automate customization for your Studio development environments.<\/p>\n<p>Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting a new Studio notebook. You can use these shell scripts to automate customization for your Studio environments, such as installing JupyterLab extensions, preloading datasets, and setting up source code repositories.<\/p>\n<p>Previously, customizations to Studio environments were possible, but you needed to reapply them manually every time apps were deleted or recreated. 
Lifecycle configuration provides a way to automatically and repeatably apply your customizations.<\/p>\n<p>In this post, we show you how to use lifecycle configurations for three common customization use cases:<\/p>\n<ul>\n<li>Installing custom packages<\/li>\n<li>Configuring auto-shutdown of inactive notebook apps<\/li>\n<li>Setting up Git configuration<\/li>\n<\/ul>\n<p>For more examples, visit the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-studio-lifecycle-config-examples\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker Studio Lifecycle Configuration Samples repository on GitHub<\/a>.<\/p>\n<h2>Install custom packages on base kernel images<\/h2>\n<p>One common use case for lifecycle configuration is to install custom libraries so they\u2019re available right away whenever you start a new kernel app. Lifecycle configuration allows you to automate this process without the need to build a custom Studio image.<\/p>\n<p>Say that you need to install <a href=\"https:\/\/arrow.apache.org\/docs\/python\/parquet.html\" target=\"_blank\" rel=\"noopener noreferrer\">pyarrow<\/a> in your notebook environment so that you can work with a Parquet-formatted training dataset for your ML model. Let\u2019s see how to use lifecycle configuration to automate the installation of this dependency in the kernel.<\/p>\n<p>The following is the typical workflow for using lifecycle configuration in your apps:<\/p>\n<ol>\n<li>Write the script.<\/li>\n<li>Convert the script to a base64 encoded string.<\/li>\n<li>Create a lifecycle configuration entity via the <a href=\"http:\/\/aws.amazon.com\/cli\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Command Line Interface<\/a> (AWS CLI).<\/li>\n<li>Associate the lifecycle configuration to a domain or user profile.<\/li>\n<li>Start the Studio app with the specified lifecycle configuration.<\/li>\n<\/ol>\n<h3>Write the script<\/h3>\n<p>The following sample script installs pyarrow using the pip package manager. 
You can modify this script to install the dependencies you need for your own notebooks:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">#!\/bin\/bash\n# This script installs a single pip package on a SageMaker Studio Kernel Application\n\nset -eux\n\n# PARAMETERS\nPACKAGE=pyarrow\n\npip install --upgrade \"$PACKAGE\"\n<\/code><\/pre>\n<\/p><\/div>\n<p>One helpful practice when creating and debugging your own scripts is to use <code>set -eux<\/code>, which helps you see in the logs where a failure occurred. It prints each command as it runs (<code>-x<\/code>), stops the script as soon as a command fails (<code>-e<\/code>), and treats unset variables as errors (<code>-u<\/code>).<\/p>\n<p>Let\u2019s save the preceding script as a file called <code>install-package.sh<\/code>.<\/p>\n<h3>Convert the script to a base64 encoded string<\/h3>\n<p>When creating the lifecycle configuration, we pass the script contents as a base64 encoded string. This requirement prevents errors due to the encoding of spacing and line breaks. Here\u2019s how you can base64 encode the contents of the file we just created. 
In a terminal, use the following command:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">LCC_CONTENT=`openssl base64 -A -in install-package.sh`<\/code><\/pre>\n<\/p><\/div>\n<p>The base64-encoded script content is now saved in the <code>LCC_CONTENT<\/code> variable.<\/p>\n<h3>Create a lifecycle configuration entity via the AWS CLI<\/h3>\n<p>Now we can create the lifecycle configuration entity using the AWS CLI, specifying the base64-encoded content saved in the <code>LCC_CONTENT<\/code> variable as the lifecycle configuration content.<\/p>\n<p>At this point, you need to determine whether the lifecycle configuration should belong to the JupyterServer or KernelGateway app type:<\/p>\n<ul>\n<li><strong>JupyterServer<\/strong> \u2013 Enables access to the visual interface for Studio<\/li>\n<li><strong>KernelGateway<\/strong> \u2013 Enables access to the code run environment and kernels for your Studio notebooks and terminals<\/li>\n<\/ul>\n<p>In this case, because we want to customize the kernel environment that the notebook code runs in by installing additional custom packages, we should specify <code>KernelGateway<\/code> for the lifecycle configuration app type. 
In the following code, we name the created entity <code>install-pip-package-on-kernel<\/code>, but you are free to use your own name:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">aws sagemaker create-studio-lifecycle-config \\\n--studio-lifecycle-config-name install-pip-package-on-kernel \\\n--studio-lifecycle-config-content \"$LCC_CONTENT\" \\\n--studio-lifecycle-config-app-type KernelGateway\n<\/code><\/pre>\n<\/p><\/div>\n<p>After you create the lifecycle configuration, note the lifecycle configuration ARN returned in the response:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-json\">{\n    \"StudioLifecycleConfigArn\": \"arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config\/install-pip-package-on-kernel\"\n}\n<\/code><\/pre>\n<\/p><\/div>\n<p>Keep in mind that the lifecycle configuration entity is immutable. If you need to change one, create a new lifecycle configuration entity, update your apps to use it, and delete the old entity.<\/p>\n<h3>Associate the lifecycle configuration to a domain or user profile<\/h3>\n<p>Before you can use the lifecycle configuration, you need to associate it with a Studio domain or user profile. The set of lifecycle configurations specified in the domain or user profile settings determines which lifecycle configurations are available for the domain or user profile to use. You can add lifecycle configurations to either existing or new domains and user profiles. Note that lifecycle configurations attached to a domain are inherited by all users of that domain, but those attached to a user profile are scoped specifically to that user.<\/p>\n<p>You can use the AWS CLI to create a user profile that can use our new lifecycle configuration. 
Because this lifecycle configuration is associated with the <code>KernelGateway<\/code> app type, we add it to the list of lifecycle configuration ARNs under <code>KernelGatewayAppSettings<\/code>. Make sure to replace the domain ID in the following command. You can find your domain ID in the Studio Control Panel under <strong>Studio Summary<\/strong>.<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">aws sagemaker create-user-profile --domain-id d-abc123 \\\n--user-profile-name my-new-user \\\n--user-settings '{\n\"KernelGatewayAppSettings\": {\n  \"LifecycleConfigArns\":\n    [\"arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config\/install-pip-package-on-kernel\"]\n  }\n}'\n<\/code><\/pre>\n<\/p><\/div>\n<p>Alternatively, you can update an existing user profile to add the lifecycle configuration:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">aws sagemaker update-user-profile --domain-id d-abc123 \\\n--user-profile-name my-existing-user \\\n--user-settings '{\n\"KernelGatewayAppSettings\": {\n  \"LifecycleConfigArns\":\n    [\"arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config\/install-pip-package-on-kernel\"]\n  }\n}'\n<\/code><\/pre>\n<\/p><\/div>\n<h3>Start the app<\/h3>\n<p>After you add the lifecycle configuration to the domain or user, in the Studio user interface, go to the Launcher where you create new notebooks. 
Next to the image selection option (<strong>Select a SageMaker image<\/strong>), you can see the <strong>Select a start-up script<\/strong> option.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28244\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/16\/1-5600-start-the-app.jpg\" alt=\"\" width=\"800\" height=\"381\"><\/p>\n<p>Choose the script from the available ones for your user or domain on the drop-down menu.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28245\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/16\/2-5600-drop-down-menu.jpg\" alt=\"\" width=\"800\" height=\"409\"><\/p>\n<p>You can verify the contents of the script after you choose it.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28246\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/16\/3-5600-choose.jpg\" alt=\"\" width=\"800\" height=\"412\"><\/p>\n<p>The new notebook now uses the specified script.<\/p>\n<h2>Configure auto-shutdown of inactive kernels<\/h2>\n<p>Let\u2019s say you\u2019re an administrator for a Studio domain, and want to save costs by having notebook apps shut down automatically after long periods of inactivity. 
You can create a lifecycle configuration that installs the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-studio-auto-shutdown-extension\" target=\"_blank\" rel=\"noopener noreferrer\">Studio auto-shutdown JupyterLab extension<\/a> by default on users\u2019 JupyterServer apps, so users don\u2019t have to install it manually, and it stays enabled even if the JupyterServer app gets restarted.<\/p>\n<p>The following <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-studio-lifecycle-config-examples\/blob\/main\/scripts\/install-autoshutdown-extension\/on-jupyter-server-start.sh\" target=\"_blank\" rel=\"noopener noreferrer\">script from the Studio lifecycle configuration example scripts repository (install-autoshutdown-extension)<\/a> installs the extension:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">#!\/bin\/bash\n# Installs SageMaker Studio's Auto Shutdown Idle Kernel Sessions extension\n\nset -eux\n\nsudo yum -y install wget\nwget https:\/\/github.com\/aws-samples\/sagemaker-studio-auto-shutdown-extension\/raw\/main\/sagemaker_studio_autoshutdown-0.1.1.tar.gz\npip install sagemaker_studio_autoshutdown-0.1.1.tar.gz\njlpm config set cache-folder \/tmp\/yarncache\njupyter lab build --debug --minimize=False\n\n# restarts jupyter server\nnohup supervisorctl -c \/etc\/supervisor\/conf.d\/supervisord.conf restart jupyterlabserver\n<\/code><\/pre>\n<\/p><\/div>\n<p>Because we\u2019re customizing the JupyterServer app by installing a JupyterLab extension, this lifecycle configuration should be associated with the <code>JupyterServer<\/code> app type when creating the SageMaker entity:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">LCC_CONTENT=`openssl base64 -A -in install-autoshutdown.sh`  # install-autoshutdown.sh is a file with the above script contents\n\naws sagemaker create-studio-lifecycle-config \\\n--studio-lifecycle-config-name 
install-autoshutdown-extension \\\n--studio-lifecycle-config-content \"$LCC_CONTENT\" \\\n--studio-lifecycle-config-app-type JupyterServer\n<\/code><\/pre>\n<\/p><\/div>\n<p>We want to make this the default lifecycle configuration for all users in the domain. We can accomplish this by adding the lifecycle configuration to the default settings of the domain using the <code>DefaultResourceSpec<\/code> settings. This way, the script runs by default whenever users in the domain log in to Studio for the first time or restart Studio:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">aws sagemaker update-domain --domain-id d-abc123 \\\n--default-user-settings '{\n\"JupyterServerAppSettings\": {\n  \"DefaultResourceSpec\": {\n    \"LifecycleConfigArn\": \"arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config\/install-autoshutdown-extension\",\n    \"InstanceType\": \"system\"\n   },\n   \"LifecycleConfigArns\": [\n     \"arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config\/install-autoshutdown-extension\"\n   ]\n}}'\n<\/code><\/pre>\n<\/p><\/div>\n<p>For per-user overrides, you can specify a default lifecycle configuration in the user profile, which overrides any specified for the domain.<\/p>\n<h2>Set up Git configuration<\/h2>\n<p>Developers often store their code or notebooks in version-controlled Git repositories to collaborate with others. Typically, this requires developers to configure user information or credentials in their development environment.<\/p>\n<p>For example, before making any Git commits from your Studio environment, you want to configure the email and user name that are associated with commits. 
Typically, the commands look like the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">git config --global user.email \"you@example.com\"\ngit config --global user.name \"Your Name\"\n<\/code><\/pre>\n<\/p><\/div>\n<p>To persist these settings every time the JupyterServer app restarts, you can use the following <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-studio-lifecycle-config-examples\/blob\/main\/scripts\/set-git-config\/on-jupyter-server-start.sh\" target=\"_blank\" rel=\"noopener noreferrer\">script (set-git-config) from the example scripts repository<\/a> (make sure you modify it to add your own user name and email) as a <code>JupyterServer<\/code> lifecycle configuration script for your user profile.<\/p>\n<p>Another frequent use case is setting up Git credentials for authentication to remote repositories. Although you can use the <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) execution role used by Studio to automatically authenticate to <a href=\"https:\/\/aws.amazon.com\/codecommit\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CodeCommit<\/a> repositories, developers may also need to manually set up a password or developer token to connect to other repository sources such as GitHub.<\/p>\n<p>The following <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-studio-lifecycle-config-examples\/blob\/main\/scripts\/set-git-credentials\/on-jupyter-server-start.sh\" target=\"_blank\" rel=\"noopener noreferrer\">script (set-git-credentials) from the example scripts repository<\/a> shows you how to set up a workflow that retrieves a password or developer token from <a href=\"https:\/\/aws.amazon.com\/secrets-manager\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Secrets Manager<\/a> directly when authenticating to your remote repository. 
Storing passwords and tokens in Secrets Manager eliminates the need to store any sensitive information on the <a href=\"https:\/\/aws.amazon.com\/efs\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic File System<\/a> (Amazon EFS) storage instance backing your Studio domain. Make sure that you modify the script with your own user name, secret name and key, and Region.<\/p>\n<p>Similar to the previous example, to set up the lifecycle configuration, you complete the following steps:<\/p>\n<ol>\n<li>Base64 encode the script.<\/li>\n<li>Create a lifecycle configuration entity. For Git configuration scripts, use <code>JupyterServer<\/code> as the app type.<\/li>\n<li>Attach it to the Studio entity (domain or user profile) that should be able to use the lifecycle configuration. Because these scripts set up user-specific credentials, attach them at the user profile level.<\/li>\n<\/ol>\n<p>When you specify a default lifecycle configuration in the user profile, it overrides any default JupyterServer lifecycle configuration specified at the domain level. 
See the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-bash\">aws sagemaker update-user-profile --domain-id d-abc123 \\\n--user-profile-name my-user \\\n--user-settings '{\n\"JupyterServerAppSettings\": {\n  \"DefaultResourceSpec\": {\n    \"LifecycleConfigArn\": \"arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config\/set-git-config\",\n    \"InstanceType\": \"system\"\n  },\n  \"LifecycleConfigArns\": [\n     \"arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config\/set-git-config\"\n   ]\n}}'\n<\/code><\/pre>\n<\/p><\/div>\n<h2>Conclusion<\/h2>\n<p>In this post, we highlighted three use cases that represent common automation tasks for developers and data scientists, but you can find more examples of what you can do with lifecycle configurations in the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-studio-lifecycle-config-examples\" target=\"_blank\" rel=\"noopener noreferrer\">public repository of Studio lifecycle configuration scripts<\/a>.<\/p>\n<p>You can start using lifecycle configuration for Studio today, in all Regions where Studio is available.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-16952 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/09\/angandy.jpg\" alt=\"\" width=\"100\" height=\"134\"> Andrew Ang<\/strong> is a Deep Learning Architect at the Amazon ML Solutions Lab, where he helps AWS customers identify and build AI\/ML solutions to address their business problems.<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"alignleft size-full wp-image-2367\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2017\/11\/15\/sumit_thakur_100.jpg\" alt=\"\" width=\"100\" height=\"134\">Sumit Thakur<\/strong> is a 
Senior Product Manager for Amazon Machine Learning, where he loves working on products that make it easy for customers to get started with machine learning in the cloud. In his spare time, he likes connecting with nature and watching sci-fi TV series.<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-28445 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/23\/Ram-Vegiraju.jpg\" alt=\"\" width=\"99\" height=\"115\">Ram Vegiraju<\/strong> is an ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI\/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/10\/Rama-Thamman.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-18205 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/10\/Rama-Thamman.jpg\" alt=\"\" width=\"100\" height=\"127\"><\/a>Rama Thamman<\/strong> is a Software Development Manager with the AI Platforms team, leading the ML Migrations team.<\/p>\n
<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/customize-amazon-sagemaker-studio-using-lifecycle-configurations\/<\/p>\n","protected":false},"author":0,"featured_media":944,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/943"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=943"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/943\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/944"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=943"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=943"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=943"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}