{"id":652,"date":"2020-12-02T17:45:59","date_gmt":"2020-12-02T17:45:59","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/12\/02\/private-package-installation-in-amazon-sagemaker-running-in-internet-free-mode\/"},"modified":"2020-12-02T17:45:59","modified_gmt":"2020-12-02T17:45:59","slug":"private-package-installation-in-amazon-sagemaker-running-in-internet-free-mode","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/12\/02\/private-package-installation-in-amazon-sagemaker-running-in-internet-free-mode\/","title":{"rendered":"Private package installation in Amazon SageMaker running in internet-free mode"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Studio<\/a> notebooks and <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> notebook instances are internet-enabled by default. However, many regulated industries, such as financial industries, healthcare, telecommunications, and others, require that network traffic traverses their own <a href=\"http:\/\/aws.amazon.com\/vpc\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Virtual Private Cloud<\/a> (Amazon VPC) to restrict and control which traffic can go through public internet. Although you can disable direct internet access to Sagemaker Studio notebooks and notebook instances, you need to ensure that your data scientists can still gain access to popular packages. Therefore, you may choose to build your own isolated dev environments that contain your choice of packages and kernels.<\/p>\n<p>In this post, we learn how to set up such an environment for Amazon SageMaker notebook instances and SageMaker Studio. 
We also describe how to integrate this environment with <a href=\"https:\/\/aws.amazon.com\/codeartifact\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CodeArtifact<\/a>, which is a fully managed artifact repository that makes it easy for organizations of any size to securely store, publish, and share software packages used in their software development process.<\/p>\n<h2>Solution overview<\/h2>\n<p>In this post, we cover the following steps:<\/p>\n<ol>\n<li>Set up Amazon SageMaker for internet-free mode.<\/li>\n<li>Set up the Conda repository using <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3). You create a bucket that hosts your Conda channels.<\/li>\n<li>Set up the Python Package Index (PyPI) repository using CodeArtifact. You create a repository and set up <a href=\"https:\/\/aws.amazon.com\/privatelink\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS PrivateLink<\/a> endpoints for CodeArtifact.<\/li>\n<li>Build an isolated dev environment with Amazon SageMaker notebook instances. In this step, you use the lifecycle configuration feature to build a custom Conda environment and configure your PyPI client.<\/li>\n<li>Install packages in SageMaker Studio notebooks. In this last step, you can <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio-byoi-create.html\" target=\"_blank\" rel=\"noopener noreferrer\">create a custom Amazon SageMaker image<\/a> and install the packages through the Conda or pip client.<\/li>\n<\/ol>\n<h2>Setting up Amazon SageMaker for internet-free mode<\/h2>\n<p>We assume that you have already set up a VPC that lets you provision a private, isolated section of the AWS Cloud where you can launch AWS resources in a virtual network. You use it to host Amazon SageMaker and other components of your data science environment. 
For more information about building secure environments or well-architected pillars, see the following whitepaper, <a href=\"https:\/\/d1.awsstatic.com\/whitepapers\/architecture\/wellarchitected-Financial-Services-Industry-Lens.pdf?did=wp_card&amp;trk=wp_card\" target=\"_blank\" rel=\"noopener noreferrer\">Financial Services Industry Lens: AWS Well-Architected Framework<\/a>.<\/p>\n<h3>Creating an Amazon SageMaker notebook instance<\/h3>\n<p>You can disable internet access for Amazon SageMaker notebooks, and also associate them with your secure VPC environment, which allows you to apply network-level control, such as access to resources through security groups, or to control ingress and egress traffic of data.<\/p>\n<ol>\n<li>On the Amazon SageMaker console, choose <strong>Notebook instances<\/strong> in the navigation pane.<\/li>\n<li>Choose <strong>Create notebook instance<\/strong>.<\/li>\n<li>For <strong>IAM role<\/strong>, choose your role.<\/li>\n<li>For <strong>VPC<\/strong>, choose your VPC.<\/li>\n<li>For <strong>Subnet<\/strong>, choose your subnet.<\/li>\n<li>For <strong>Security group(s)<\/strong>, choose your security group.<\/li>\n<li>For <strong>Direct internet access<\/strong>, select <strong>Disable \u2014 use VPC only<\/strong>.<\/li>\n<li>Choose <strong>Create notebook instance<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18963\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/25\/Private-package-installation-1.jpg\" alt=\"\" width=\"800\" height=\"550\"><\/p>\n<ol start=\"9\">\n<li>Connect to your notebook instance from your VPC instead of connecting over the public internet.<\/li>\n<\/ol>\n<p>Amazon SageMaker notebook instances support VPC <a href=\"https:\/\/docs.aws.amazon.com\/AmazonVPC\/latest\/UserGuide\/vpce-interface.html\" target=\"_blank\" rel=\"noopener noreferrer\">interface endpoints<\/a>. 
When you use a VPC interface endpoint, communication between your VPC and the notebook instance is conducted entirely and securely within the AWS network instead of the public internet. For instructions, see <a href=\"https:\/\/docs.aws.amazon.com\/AmazonVPC\/latest\/UserGuide\/vpce-interface.html#create-interface-endpoint\" target=\"_blank\" rel=\"noopener noreferrer\">Creating an interface endpoint<\/a>.<\/p>\n<h3>Setting up SageMaker Studio<\/h3>\n<p>Similar to Amazon SageMaker notebook instances, you can <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/securing-amazon-sagemaker-studio-connectivity-using-a-private-vpc\/\" target=\"_blank\" rel=\"noopener noreferrer\">launch SageMaker Studio in a VPC<\/a> of your choice, and also disable direct internet access to add an additional layer of security.<\/p>\n<ol>\n<li>On the Amazon SageMaker console, choose <strong>Amazon SageMaker Studio<\/strong> in the navigation pane.<\/li>\n<li>Choose <strong>Standard setup.<\/strong>\n<\/li>\n<li>To disable direct internet access, in the <strong>Network<\/strong> section, select the <strong>VPC only<\/strong> network access type for when you onboard to Studio or call the <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_CreateDomain.html\" target=\"_blank\" rel=\"noopener noreferrer\">CreateDomain<\/a> API.<\/li>\n<\/ol>\n<p>Doing so prevents Amazon SageMaker from providing internet access to your SageMaker Studio notebooks.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18964\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/25\/Private-package-installation-2.jpg\" alt=\"\" width=\"800\" height=\"318\"><\/p>\n<ol start=\"4\">\n<li>Create interface endpoints (via AWS PrivateLink) to access the following (and other AWS services you may require):\n<ol>\n<li>Amazon SageMaker API<\/li>\n<li>Amazon SageMaker runtime<\/li>\n<li>Amazon S3<\/li>\n<li>\n<a 
href=\"https:\/\/docs.aws.amazon.com\/STS\/latest\/APIReference\/welcome.html\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Security Token Service<\/a> (AWS STS)<\/li>\n<li><a href=\"http:\/\/aws.amazon.com\/cloudwatch\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch<\/a><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>Setting up a custom Conda repository using Amazon S3<\/h2>\n<p>Amazon SageMaker notebooks come with multiple environments already installed. The different Jupyter kernels in Amazon SageMaker notebooks are separate Conda environments. If you want to use an external library in a specific kernel, you can install the library in the environment for that kernel. This is typically done using <code>conda install<\/code>. When you use a <code>conda<\/code> command to install a package, Conda environment searches a set of default channels, which are usually online or remote channels (URLs) that host the Conda packages. However, because we assume the notebook instances don\u2019t have internet access, we modify those Conda channel paths to a private repository where our packages are stored.<\/p>\n<ol>\n<li>Build such custom channel is to create a bucket in Amazon S3.<\/li>\n<li>Copy the packages into the bucket.<\/li>\n<\/ol>\n<p>These packages can be either approved packages among the organization or the custom packages built using <code>conda build<\/code>. These packages need to be indexed periodically or as soon as there is an update. 
The methods to index packages are out of scope of this post.<\/p>\n<p>Because we set up the notebook to not allow direct internet access, the notebook can\u2019t connect to the S3 buckets that contain the channels unless you create a VPC endpoint.<\/p>\n<ol start=\"3\">\n<li>Create an Amazon S3 VPC endpoint to send the traffic through the VPC instead of the public internet.<\/li>\n<\/ol>\n<p>By creating a VPC endpoint, you allow your notebook instance to access the bucket where you stored the channels and their packages.<\/p>\n<ol start=\"4\">\n<li>We recommend that you also create a custom resource-based bucket policy that allows only requests from your private VPC to access your S3 buckets. For instructions, see <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-endpoints-s3.html\" target=\"_blank\" rel=\"noopener noreferrer\">Endpoints for Amazon S3<\/a>.<\/li>\n<li>Replace the default channels of the Conda environment in your Amazon SageMaker notebooks with your custom channel (we do that in the next step when we build the isolated dev environment):\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># remove default channel from the .condarc\r\nconda config --remove channels 'defaults'\r\n# add the conda channels to the .condarc file\r\nconda config --add channels 's3:\/\/user-conda-repository\/main\/'\r\nconda config --add channels 's3:\/\/user-conda-repository\/condaforge\/'<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<h2>Setting up a custom PyPI repository using CodeArtifact<\/h2>\n<p>Data scientists typically use package managers such as pip, Maven, npm, and others to install packages into their environments. 
By default, when you use pip to install a package, it downloads the package from the public <a href=\"https:\/\/pypi.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">PyPI repository<\/a>. To secure your environment, you can use private package management tools either on premises, such as Artifactory or Nexus, or on AWS, such as CodeArtifact. This lets you allow access only to approved packages and perform safety checks. Alternatively, you may choose to use a private PyPI mirror set up on <a href=\"http:\/\/aws.amazon.com\/ecs\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Service<\/a> (Amazon ECS) or <a href=\"https:\/\/aws.amazon.com\/fargate\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Fargate<\/a> to mirror the public PyPI repository in your private environment. For more information on this approach, see <a href=\"https:\/\/sagemaker-workshop.com\/security_for_sysops.html\" target=\"_blank\" rel=\"noopener noreferrer\">Building Secure Environments<\/a>.<\/p>\n<p>If you want to use pip to install Python packages, you can use CodeArtifact to control access to and validate the safety of the Python packages. CodeArtifact is a managed artifact repository service that helps developers and organizations securely store and share the software packages used in their development, build, and deployment processes. The CodeArtifact integration with <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM), support for <a href=\"http:\/\/aws.amazon.com\/cloudtrail\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudTrail<\/a>, and encryption with <a href=\"http:\/\/aws.amazon.com\/kms\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Key Management Service<\/a> (AWS KMS) gives you visibility and the ability to control who has access to the packages.<\/p>\n<p>You can configure CodeArtifact to fetch software packages from public repositories such as PyPI. 
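<\/p>\n<p>As a sketch, the domain, an intermediate repository with an external connection to public PyPI, and a downstream shared repository could be created with the AWS CLI (the domain and repository names here are the examples used in this post):<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># Sketch: create a CodeArtifact domain and repositories (example names)\r\naws codeartifact create-domain --domain my-org\r\naws codeartifact create-repository --domain my-org --repository pypi-store\r\naws codeartifact associate-external-connection --domain my-org \\\r\n    --repository pypi-store --external-connection \"public:pypi\"\r\naws codeartifact create-repository --domain my-org \\\r\n    --repository my-shared-python-repository --upstreams repositoryName=pypi-store<\/code><\/pre>\n<\/div>\n<p>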
PyPI helps you find and install software developed and shared by the Python community. When you pull a package from PyPI, CodeArtifact automatically downloads and stores application dependencies from the public repositories, so recent versions are always available to you.<\/p>\n<h3>Creating a repository for PyPI<\/h3>\n<p>You can create a repository using the CodeArtifact console or the <a href=\"http:\/\/aws.amazon.com\/cli\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Command Line Interface (AWS CLI)<\/a>. Each repository is associated with the AWS account that you use when you create it. The following screenshot shows the view of choosing your AWS account on the CodeArtifact console.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-19181 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/30\/img3.png\" alt=\"\" width=\"800\" height=\"345\"><\/p>\n<p>A repository can have one or more CodeArtifact repositories associated with it as upstream repositories. Upstream repositories facilitate two needs.<\/p>\n<p>First, they allow a package manager client to access the packages contained in more than one repository using a single URL endpoint.<\/p>\n<p>Second, when you create a repository, it doesn\u2019t contain any packages. If an upstream repository has an external connection to a public repository, the repositories that are downstream from it can pull packages from that public repository. For example, the repository <code>my-shared-python-repository<\/code> has an upstream repository named <a href=\"https:\/\/console.aws.amazon.com\/codesuite\/codeartifact\/d\/079329190341\/my-org\/r\/pypi-store?region=us-east-1\" target=\"_blank\" rel=\"noopener noreferrer\">pypi-store<\/a>, which acts as an intermediate repository that connects your repository to an external connection (your <a href=\"https:\/\/pypi.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">PyPI<\/a> repository). 
In this case, a package manager that is connected to <code>my-shared-python-repository<\/code> can pull packages from the PyPI public repository. The following screenshot shows this package flow.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18966\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/25\/Private-package-installation-4.jpg\" alt=\"\" width=\"800\" height=\"288\"><\/p>\n<p>For instructions on creating a CodeArtifact repository, see <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/software-package-management-with-aws-codeartifact\/\" target=\"_blank\" rel=\"noopener noreferrer\">Software Package Management with AWS CodeArtifact<\/a>.<\/p>\n<p>Because we disable internet access for the Amazon SageMaker notebooks, in the next section, we set up AWS PrivateLink endpoints to make sure all the traffic for installing the package in the notebooks traverses through the VPC.<\/p>\n<h3>Setting up AWS PrivateLink endpoints for CodeArtifact<\/h3>\n<p>You can configure CodeArtifact to use an interface VPC endpoint to improve the security of your VPC. When you use an interface VPC endpoint, you don\u2019t need an internet gateway, NAT device, or virtual private gateway. To create VPC endpoints for CodeArtifact, you can use the AWS CLI or Amazon VPC console. For this post, we use the <a href=\"http:\/\/aws.amazon.com\/ec2\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Compute Cloud<\/a> (Amazon EC2) <code>create-vpc-endpoint<\/code> AWS CLI command. 
The following two VPC endpoints are required so that all requests to CodeArtifact are in the AWS network.<\/p>\n<p>The following command creates an endpoint to access CodeArtifact repositories:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws ec2 create-vpc-endpoint --vpc-id vpcid --vpc-endpoint-type Interface \r\n  --service-name com.amazonaws.region.codeartifact.api --subnet-ids subnetid \r\n  --security-group-ids groupid<\/code><\/pre>\n<\/div>\n<p>The following command creates an endpoint to access package managers and build tools:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws ec2 create-vpc-endpoint --vpc-id vpcid --vpc-endpoint-type Interface \r\n  --service-name com.amazonaws.region.codeartifact.repositories --subnet-ids subnetid \r\n  --security-group-ids groupid --private-dns-enabled<\/code><\/pre>\n<\/div>\n<p>CodeArtifact uses Amazon S3 to store package assets. To pull packages from CodeArtifact, you must create a gateway endpoint for Amazon S3. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws ec2 create-vpc-endpoint --vpc-id vpcid --service-name com.amazonaws.region.s3 \r\n  --route-table-ids routetableid<\/code><\/pre>\n<\/div>\n<h2>Building your dev environment<\/h2>\n<p>Amazon SageMaker periodically updates the Python and dependency versions in the environments installed on the Amazon SageMaker notebook instances (when you stop and start) or in the images launched in SageMaker Studio. This might cause some incompatibility if you have your own managed package repositories and dependencies. 
You can freeze your dependencies in internet-free mode so that:<\/p>\n<ul>\n<li>You\u2019re not affected by periodic updates from Amazon SageMaker to the base environment<\/li>\n<li>You have better control over the dependencies in your environments and can get ample time to update or upgrade your dependencies<\/li>\n<\/ul>\n<h3>Using Amazon SageMaker notebook instances<\/h3>\n<p>To create your own dev environment with specific versions of Python and dependencies, you can use <em>lifecycle configuration<\/em> scripts. A lifecycle configuration provides shell scripts that run only when you create the notebook instance or whenever you start one. When you create a notebook instance, you can create a new lifecycle configuration and the scripts it uses or apply one that you already have. Amazon SageMaker has a <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-notebook-instance-lifecycle-config-samples\/tree\/master\/scripts\/persistent-conda-ebs\" target=\"_blank\" rel=\"noopener noreferrer\">lifecycle config script sample<\/a> that you can use and modify to create isolated dependencies as described earlier. With this script, you can do the following:<\/p>\n<ul>\n<li>Build an isolated installation of Conda<\/li>\n<li>Create a Conda environment with it<\/li>\n<li>Make the environment available as a kernel in Jupyter<\/li>\n<\/ul>\n<p>This makes sure that dependencies in that kernel aren\u2019t affected by the upgrades that Amazon SageMaker periodically rolls out to the underlying AMI. This script installs a custom, persistent installation of Conda on the notebook instance\u2019s EBS volume, and ensures that these custom environments are available as kernels in Jupyter. We add Conda and CodeArtifact configuration to this script.<\/p>\n<p>The on-create script downloads and installs a custom Conda installation to the EBS volume via Miniconda. 
Any relevant packages can be installed here.<\/p>\n<ol>\n<li>Set up CodeArtifact.<\/li>\n<li>Set up your Conda channels.<\/li>\n<li>Install <code>ipykernel<\/code> to make sure that the custom environment can be used as a Jupyter kernel.<\/li>\n<li>Make sure the notebook instance has internet connectivity to download the Miniconda installer.<\/li>\n<\/ol>\n<p>The on-create script installs the <code>ipykernel<\/code> library so you can use custom environments as Jupyter kernels, and uses <code>pip install<\/code> and <code>conda install<\/code> to install libraries. You can adapt the script to create custom environments and install the libraries that you want. Amazon SageMaker doesn\u2019t update these libraries when you stop and restart the notebook instance, so you can make sure that your custom environment has the specific library versions that you want. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">#!\/bin\/bash\r\n\r\n set -e\r\n sudo -u ec2-user -i &lt;&lt;'EOF'\r\n unset SUDO_UID\r\n\r\n # Configure common package managers to use CodeArtifact\r\n aws codeartifact login --tool pip --domain my-org --domain-owner &lt;000000000000&gt; --repository my-shared-python-repository --endpoint-url https:\/\/vpce-xxxxx.api.codeartifact.us-east-1.vpce.amazonaws.com \r\n\r\n # Install a separate conda installation via Miniconda\r\n WORKING_DIR=\/home\/ec2-user\/SageMaker\/custom-miniconda\r\n mkdir -p \"$WORKING_DIR\"\r\n wget https:\/\/repo.anaconda.com\/miniconda\/Miniconda3-4.6.14-Linux-x86_64.sh -O \"$WORKING_DIR\/miniconda.sh\"\r\n bash \"$WORKING_DIR\/miniconda.sh\" -b -u -p \"$WORKING_DIR\/miniconda\" \r\n rm -rf \"$WORKING_DIR\/miniconda.sh\"\r\n\r\n # Create a custom conda environment\r\n source \"$WORKING_DIR\/miniconda\/bin\/activate\"\r\n\r\n # remove default channel from the .condarc \r\n conda config --remove channels 'defaults'\r\n # add the conda channels to the .condarc file\r\n conda config --add channels 
's3:\/\/user-conda-repository\/main\/'\r\n conda config --add channels 's3:\/\/user-conda-repository\/condaforge\/'\r\n\r\n KERNEL_NAME=\"custom_python\"\r\n PYTHON=\"3.6\"\r\n\r\n conda create --yes --name \"$KERNEL_NAME\" python=\"$PYTHON\"\r\n conda activate \"$KERNEL_NAME\"\r\n\r\n pip install --quiet ipykernel\r\n\r\n # Customize these lines as necessary to install the required packages\r\n conda install --yes numpy\r\n pip install --quiet boto3\r\n\r\n EOF\r\n<\/code><\/pre>\n<\/div>\n<p>The on-start script uses the custom Conda environment created in the on-create script, and uses the <code>ipykernel<\/code> package to add the custom environments as kernels in Jupyter, so that they appear in the drop-down list in the Jupyter <strong>New<\/strong> menu. It also logs in to CodeArtifact to enable installing the packages from the custom repository. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">#!\/bin\/bash\r\n\r\nset -e\r\n\r\nsudo -u ec2-user -i &lt;&lt;'EOF'\r\nunset SUDO_UID\r\n\r\n# Get pip artifact\r\n\/home\/ec2-user\/SageMaker\/aws\/aws codeartifact login --tool pip --domain &lt;my-org&gt; --domain-owner &lt;xxxxxxxxx&gt; --repository &lt;my-shared-python-repository&gt; --endpoint-url &lt;https:\/\/vpce-xxxxxxxx.api.codeartifact.us-east-1.vpce.amazonaws.com&gt; \r\n\r\nWORKING_DIR=\/home\/ec2-user\/SageMaker\/custom-miniconda\/\r\nsource \"$WORKING_DIR\/miniconda\/bin\/activate\"\r\n\r\nfor env in $WORKING_DIR\/miniconda\/envs\/*; do\r\n    BASENAME=$(basename \"$env\")\r\n    source activate \"$BASENAME\"\r\n    python -m ipykernel install --user --name \"$BASENAME\" --display-name \"Custom ($BASENAME)\"\r\ndone\r\n\r\n\r\nEOF\r\n\r\necho \"Restarting the Jupyter server...\"\r\nrestart jupyter-server<\/code><\/pre>\n<\/div>\n<p>CodeArtifact authorization tokens are valid for a default period of 12 hours. 
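<\/p>\n<p>For reference, a crontab entry that re-runs the login before the token expires might look like the following (a sketch; the domain, repository, and endpoint values are the same placeholders used in the scripts above):<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># Sketch: refresh the CodeArtifact pip token every 6 hours (placeholder values)\r\n(crontab -l 2&gt;\/dev\/null; echo \"0 *\/6 * * * aws codeartifact login --tool pip --domain my-org --domain-owner 000000000000 --repository my-shared-python-repository --endpoint-url https:\/\/vpce-xxxxx.api.codeartifact.us-east-1.vpce.amazonaws.com\") | crontab -<\/code><\/pre>\n<\/div>\n<p>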
You can add a cron job to the on-start script to refresh the token automatically, or log in to CodeArtifact again in the Jupyter notebook terminal.<\/p>\n<h3>Using SageMaker Studio notebooks<\/h3>\n<p>You can <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio-byoi-create.html\" target=\"_blank\" rel=\"noopener noreferrer\">create your own custom Amazon SageMaker images<\/a> in your private dev environment in SageMaker Studio. You can add the custom kernels, packages, and any other files required to run a Jupyter notebook in your image. It gives you the control and flexibility to do the following:<\/p>\n<ul>\n<li>Install your own custom packages in the image<\/li>\n<li>Configure the images to be integrated with your custom repositories for package installation by users<\/li>\n<\/ul>\n<p>For example, you can install a selection of R or Python packages when building the image:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># Dockerfile\r\nRUN conda install --quiet --yes \\\r\n    'r-base=4.0.0' \\\r\n    'r-caret=6.*' \\\r\n    'r-crayon=1.3*' \\\r\n    'r-devtools=2.3*' \\\r\n    'r-forecast=8.12*' \\\r\n    'r-hexbin=1.28*'<\/code><\/pre>\n<\/div>\n<p>Or you can set up Conda in the image to use just your own custom channels in Amazon S3 to install packages by changing the configuration of Conda channels:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># Dockerfile\r\nRUN \\\r\n    # add the conda channels to the .condarc file\r\n    conda config --add channels 's3:\/\/my-conda-repository\/_conda-forge\/' &amp;&amp; \\\r\n    conda config --add channels 's3:\/\/my-conda-repository\/main\/' &amp;&amp; \\\r\n    # remove defaults from the .condarc\r\n    conda config --remove channels 'defaults'<\/code><\/pre>\n<\/div>\n<p>You should use the CodeArtifact login command in SageMaker Studio to fetch credentials for use with pip:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># PyPIconfig.sh\r\n# 
Configure common package managers to use CodeArtifact\r\n aws codeartifact login --tool pip --domain my-org --domain-owner &lt;000000000000&gt; --repository my-shared-python-repository --endpoint-url https:\/\/vpce-xxxxx.api.codeartifact.us-east-1.vpce.amazonaws.com <\/code><\/pre>\n<\/div>\n<p>CodeArtifact needs authorization tokens. You can add a cron job into the image to run the preceding command periodically. Alternatively, you can run it manually when the notebook starts. To make it simple for your users, you can add the preceding command to a shell script (such as <code>PyPIconfig.sh<\/code>) and copy the file into the image to be loaded in SageMaker Studio. In your Dockerfile, add the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># Dockerfile\r\nCOPY PyPIconfig.sh \/home\/PyPIconfig.sh<\/code><\/pre>\n<\/div>\n<p>For ease of use, <code>PyPIconfig.sh<\/code> is available in \/home on SageMaker Studio. You can easily run it to configure your pip client in SageMaker Studio and fetch an authorization token from CodeArtifact using your AWS credentials.<\/p>\n<p>Now, you can build and push your image into <a href=\"http:\/\/aws.amazon.com\/ecr\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Registry<\/a> (Amazon ECR). Finally, attach the image to multiple users (by attaching to a domain) or a single user (by attaching to the user\u2019s profile) in SageMaker Studio. 
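<\/p>\n<p>The build-and-push step can be sketched as follows (a sketch with a hypothetical image name; the account ID, Region, and role ARN are placeholders):<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># Sketch: build the image, push it to Amazon ECR, and register it with SageMaker\r\n# (hypothetical image name; account ID, Region, and role ARN are placeholders)\r\naws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com\r\ndocker build -t custom-studio-image .\r\ndocker tag custom-studio-image 111122223333.dkr.ecr.us-east-1.amazonaws.com\/custom-studio-image:latest\r\ndocker push 111122223333.dkr.ecr.us-east-1.amazonaws.com\/custom-studio-image:latest\r\n# Register the image so it can be attached to a domain or user profile\r\naws sagemaker create-image --image-name custom-studio-image --role-arn arn:aws:iam::111122223333:role\/SageMakerImageRole\r\naws sagemaker create-image-version --image-name custom-studio-image --base-image 111122223333.dkr.ecr.us-east-1.amazonaws.com\/custom-studio-image:latest<\/code><\/pre>\n<\/div>\n<p>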
The following screenshot shows the configuration on the SageMaker Studio control panel.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-19182 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/30\/img5-1.png\" alt=\"\" width=\"800\" height=\"321\"><\/p>\n<p>For more information about building a custom image and attaching it to SageMaker Studio, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio-byoi-create-sdk.html\" target=\"_blank\" rel=\"noopener noreferrer\">Bring your own custom SageMaker image tutorial<\/a>.<\/p>\n<h2>Installing the packages<\/h2>\n<p>In Amazon SageMaker notebook instances, as soon as you start the Jupyter notebook, you see a new kernel in Jupyter in the drop-down list of kernels (see the following screenshot). This environment is isolated from other default Conda environments.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18968\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/25\/Private-package-installation-6.jpg\" alt=\"\" width=\"800\" height=\"435\"><\/p>\n<p>In your notebook, when you use <code>pip install <em>&lt;package name&gt;<\/em><\/code>, the Python package manager client connects to your custom repository instead of the public repositories. Also, if you use <code>conda install <em>&lt;package name&gt;<\/em><\/code>, the notebook instance uses the packages in your Amazon S3 channels to install it. See the following screenshot of this code.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-19183 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/30\/img7.png\" alt=\"\" width=\"800\" height=\"339\"><\/p>\n<p>In SageMaker Studio, the custom images appear in the image selector dialog box of the SageMaker Studio Launcher. 
As soon as you select your own custom image, the kernel you installed in the image appears in the kernel selector dialog box. See the following screenshot.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18970\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/25\/Private-package-installation-8.jpg\" alt=\"\" width=\"800\" height=\"453\"><\/p>\n<p>As mentioned before, CodeArtifact authorization tokens are valid for a default period of 12 hours. If you\u2019re using CodeArtifact, you can open a terminal or notebook in SageMaker Studio and run the <code>PyPIconfig.sh<\/code> file to configure your client or refresh your expired token:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># Configure PyPI package managers to use CodeArtifact\r\n \/home\/PyPIconfig.sh<\/code><\/pre>\n<\/div>\n<p>The following screenshot shows your view in SageMaker Studio.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-19184 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/30\/img9.png\" alt=\"\" width=\"800\" height=\"215\"><\/p>\n<h2>Conclusion<\/h2>\n<p>This post demonstrated how to build a private environment for Amazon SageMaker notebook instances and SageMaker Studio to have better control over the dependencies in your environments. To build the private environment, we used the lifecycle configuration feature in notebook instances. The sample lifecycle config scripts are available on the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-notebook-instance-lifecycle-config-samples\/tree\/master\/scripts\/persistent-conda-ebs\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>. To install custom packages in SageMaker Studio, we built a custom image and attached it to SageMaker Studio. 
For more information about this feature, see <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/bringing-your-own-custom-container-image-to-amazon-sagemaker-studio-notebooks\/\" target=\"_blank\" rel=\"noopener noreferrer\">Bringing your own custom container image to Amazon SageMaker Studio notebooks<\/a>. For this solution, we used CodeArtifact, which makes it easy to build a PyPI repository for approved Python packages across the organization. For more information, see <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/software-package-management-with-aws-codeartifact\/\" target=\"_blank\" rel=\"noopener noreferrer\">Software Package Management with AWS CodeArtifact<\/a>.<\/p>\n<p>Give CodeArtifact a try, and share your feedback and questions in the comments.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-12219 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/05\/07\/saeed-aghabozorgi.jpg\" alt=\"\" width=\"100\" height=\"135\">Saeed Aghabozorgi, Ph.D.,<\/strong> is a Senior ML Specialist at AWS, with a track record of developing enterprise-level solutions that substantially increase customers\u2019 ability to turn their data into actionable knowledge. He is also a researcher in the artificial intelligence and machine learning field.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15884 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/12\/stefan-natu.jpg\" alt=\"\" width=\"100\" height=\"113\"><strong>Stefan Natu\u00a0<\/strong>is a Sr. Machine Learning Specialist at AWS. He is focused on helping financial services customers build end-to-end machine learning solutions on AWS. 
In his spare time, he enjoys reading machine learning blogs, playing the guitar, and exploring the food scene in New York City.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/private-package-installation-in-amazon-sagemaker-running-in-internet-free-mode\/<\/p>\n","protected":false},"author":0,"featured_media":653,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/652"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=652"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/652\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/653"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=652"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=652"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=652"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}