{"id":1956,"date":"2022-03-10T19:51:53","date_gmt":"2022-03-10T19:51:53","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/03\/10\/secure-amazon-s3-access-for-isolated-amazon-sagemaker-notebook-instances\/"},"modified":"2022-03-10T19:51:53","modified_gmt":"2022-03-10T19:51:53","slug":"secure-amazon-s3-access-for-isolated-amazon-sagemaker-notebook-instances","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/03\/10\/secure-amazon-s3-access-for-isolated-amazon-sagemaker-notebook-instances\/","title":{"rendered":"Secure Amazon S3 access for isolated Amazon SageMaker notebook instances"},"content":{"rendered":"<div id=\"\">\n<p>In this post, we will demonstrate how to securely launch notebook instances in a private subnet of an <a href=\"http:\/\/aws.amazon.com\/vpc\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Virtual Private Cloud<\/a> (Amazon VPC), with internet access disabled, and to securely connect to <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) using VPC endpoints. This post is for network and security architects that support decentralized data science teams on AWS.<\/p>\n<p>SageMaker notebook instances can be deployed in a private subnet and we recommend deploying them without internet access. Securing your notebook instances within a private subnet helps prevent unauthorized internet access to your notebook instances, which may contain sensitive information.<\/p>\n<p>The examples in this post will use Notebook instance <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/notebook-lifecycle-config.html\">Lifecycle Configurations<\/a> (LCCs) to connect to an S3 VPC endpoint and download idle-usage detection and termination scripts onto the notebook instance. 
These scripts are configured to run as cron jobs and help save costs by automatically stopping idle notebook instances.<\/p>\n<h2>Solution overview<\/h2>\n<p>The following diagram describes the solution we implement. We create a SageMaker notebook instance in a private subnet of a VPC. We attach to that notebook instance a lifecycle configuration that copies an idle-shutdown script from Amazon S3 to the notebook instance at boot time (when starting a stopped notebook instance). The lifecycle configuration accesses the S3 bucket via <a href=\"https:\/\/aws.amazon.com\/privatelink\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS PrivateLink<\/a>.<\/p>\n<p>This architecture allows our internet-disabled SageMaker notebook instance to access S3 files without traversing the public internet. Because the network traffic does not traverse the public internet, we significantly reduce the number of vectors bad actors can exploit in order to compromise the security posture of the notebook instance.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-01-HighLevelArch.png\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-31104 size-large alignnone\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-01-HighLevelArch-1024x358.png\" alt=\"High Level Architecture\" width=\"1024\" height=\"358\"><\/a><\/p>\n<h2>Prerequisites<\/h2>\n<p>We assume you have an AWS account, in addition to an Amazon VPC with at least one private subnet that is isolated from the internet. If you do not know how to create a VPC with a public\/private subnet, check out <a href=\"https:\/\/docs.aws.amazon.com\/AmazonECS\/latest\/developerguide\/create-public-private-vpc.html\">this guide<\/a>. A subnet is isolated from the internet if its route table doesn\u2019t forward traffic to the internet through a NAT gateway or an internet gateway. 
The following screenshot shows an example of an isolated route table. Traffic stays within the subnet; there are no NAT gateways or internet gateways that could forward traffic to the internet.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-02-PrereqRouteTable.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-31103 alignnone\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-02-PrereqRouteTable-1024x294.png\" alt=\"Prerequisite Route Table\" width=\"1024\" height=\"294\"><\/a><\/p>\n<p>Additionally, we need an S3 bucket. Any S3 bucket with the secure default configuration settings can work. Make sure you have read and write access to this bucket from the user account. This is important when we test our solution.\u00a0 This entry in the <a href=\"https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/userguide\/s3-access-control.html\">S3 User Guide<\/a> should clarify how to do this.<\/p>\n<p>Now we create a SageMaker notebook instance. The notebook instance should be deployed into an isolated subnet with <strong>Direct Internet Access<\/strong> selected as <strong>Disabled<\/strong>.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-03-NotebookInstanceConfig.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-31102\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-03-NotebookInstanceConfig.png\" alt=\"Notebook Instance Configuration\" width=\"725\" height=\"558\"><\/a><\/p>\n<p>We also need to configure this notebook to run as the root user. 
Under Permissions and encryption, choose Enable for the Root access setting.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/14\/ML-4423-12-RootConfig.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33074\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/14\/ML-4423-12-RootConfig.png\" alt=\"Root Config\" width=\"635\" height=\"393\"><\/a><\/p>\n<p>Once these settings have been configured, choose <strong>Create notebook instance<\/strong> at the bottom of the window.<\/p>\n<h2>Configure access to Amazon S3<\/h2>\n<p>To configure access to Amazon S3, complete the following steps:<\/p>\n<ol>\n<li>On the Amazon S3 console, navigate to the S3 bucket you use to store scripts.<\/li>\n<\/ol>\n<p>Access to objects in this bucket is only granted if explicitly allowed via an <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) policy.<\/p>\n<ol start=\"2\">\n<li>In this bucket, create a folder called lifecycle-configurations.<\/li>\n<li>Copy the following <a href=\"https:\/\/raw.githubusercontent.com\/aws-samples\/amazon-sagemaker-notebook-instance-lifecycle-config-samples\/master\/scripts\/auto-stop-idle\/autostop.py\" target=\"_blank\" rel=\"noopener noreferrer\">script from GitHub<\/a> and save it in your S3 bucket with the key <code>lifecycle-configurations\/autostop.py<\/code>.<\/li>\n<\/ol>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-04-NotebookConsoleView.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-large wp-image-31101\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-04-NotebookConsoleView-1024x453.png\" alt=\"Notebook Console View\" width=\"1024\" 
height=\"453\"><\/a><\/p>\n<p>We can now begin modifying our network to allow access between Amazon S3 and our isolated notebook instance.<\/p>\n<ol start=\"4\">\n<li>Write a least privilege IAM policy defining access to this bucket and the lifecycle policy script.<\/li>\n<li>Create an AWS PrivateLink gateway endpoint to Amazon S3.<\/li>\n<li>Create a SageMaker lifecycle configuration that requests the <code>autostop.py<\/code> script from Amazon S3 via an API call.<\/li>\n<li>Attach the lifecycle configuration to the notebook instance.<\/li>\n<\/ol>\n<p>After you implement these steps, we can test the configuration by performing an Amazon S3 CLI command in a notebook cell. If the command is successful, we have successfully implemented least privilege access to Amazon S3 from an isolated network location with AWS PrivateLink.<\/p>\n<p>A more robust test would be to leave the notebook instance idle and allow the lifecycle policy to run as expected. If all goes well, the notebook instance should shut down after a 5-minute idle period.<\/p>\n<h2>Configure AWS PrivateLink for Amazon S3<\/h2>\n<p>AWS PrivateLink is a networking service that creates private endpoints in your VPC for other AWS services like <a href=\"http:\/\/aws.amazon.com\/ec2\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Compute Cloud<\/a> (Amazon EC2), Amazon S3, and <a href=\"http:\/\/aws.amazon.com\/sns\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Notification Service<\/a> (Amazon SNS). These endpoints facilitate API requests to other AWS services through your VPC instead of through the public internet. 
This is the crucial component that allows our solution to privately and securely access the S3 bucket that contains our lifecycle configuration script.<\/p>\n<ol>\n<li>On the Amazon VPC console, choose <strong>Endpoints<\/strong>.<\/li>\n<\/ol>\n<p>The list of endpoints is empty by default.<\/p>\n<ol start=\"2\">\n<li>Choose <strong>Create endpoint<\/strong>.<\/li>\n<li>For <strong>Service category<\/strong>, select <strong>AWS services<\/strong>.<\/li>\n<li>For <strong>Service Name<\/strong>, search for S3 and select the gateway option.<\/li>\n<li>For <strong>VPC<\/strong>, choose the VPC that contains the private subnets you created earlier.<\/li>\n<li>For <strong>Configure route tables<\/strong>, select the default route table for that VPC.<\/li>\n<li>Under <strong>Policy<\/strong>, select the <strong>Custom<\/strong> option and enter the following policy code:<\/li>\n<\/ol>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-05-PrivateLinkConfig.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-large wp-image-31100\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-05-PrivateLinkConfig-1024x583.png\" alt=\"Private Link Configuration\" width=\"1024\" height=\"583\"><\/a><\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">{\n  \"Version\": \"2008-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Principal\": \"*\",\n      \"Action\": [\n        \"s3:Get*\",\n        \"s3:List*\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::<strong>&lt;bucket-name&gt;<\/strong>\",\n        \"arn:aws:s3:::<strong>&lt;bucket-name&gt;<\/strong>\/lifecycle-configurations\/*\"\n      ]\n    }\n  ]\n}<\/code><\/pre>\n<\/p><\/div>\n<p>This policy document allows read-only access to the S3 bucket and to the objects under its lifecycle-configurations prefix. 
This policy restricts S3 operations through the endpoint to this bucket only; we can add more buckets to the <code>Resource<\/code> element as needed. Although this endpoint\u2019s policy isn\u2019t least privilege access for our notebook instance, it still protects our S3 bucket resources from being modified by resources in this VPC.<\/p>\n<ol start=\"8\">\n<li>To create this endpoint with the AWS CLI, run the following command:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws ec2 create-vpc-endpoint --vpc-endpoint-type Gateway --vpc-id vpc-id --service-name com.amazonaws.region.s3 --route-table-ids route-table-id --policy-document '{\n    \"Version\": \"2008-10-17\",\n    \"Statement\": [\n      {\n        \"Effect\": \"Allow\",\n        \"Principal\": \"*\",\n        \"Action\": [\n          \"s3:Get*\",\n          \"s3:List*\"\n        ],\n        \"Resource\": [\n          \"arn:aws:s3:::<strong>&lt;bucket-name&gt;<\/strong>\",\n          \"arn:aws:s3:::<strong>&lt;bucket-name&gt;<\/strong>\/lifecycle-configurations\/*\"\n        ]\n      }\n    ]\n}'<\/code><\/pre>\n<\/p><\/div>\n<p>Gateway endpoints automatically modify the specified route tables to route S3 traffic through this endpoint. Although a route has been added, our VPC is still isolated. 
The route points to a managed prefix list, or a list of predefined IP addresses, used by the endpoint service to route traffic through this VPC to the Amazon S3 PrivateLink endpoint.<\/p>\n<h2>Modify the SageMaker notebook instance IAM role<\/h2>\n<p>We start by crafting a least privilege IAM policy for our notebook instance role\u2019s policy document.<\/p>\n<ol>\n<li>On the IAM console, choose <strong>Policies<\/strong>.<\/li>\n<li>Choose <strong>Create policy<\/strong>.<\/li>\n<li>On the <strong>JSON<\/strong> tab, enter the following code:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-iam\">{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Sid\": \"S3LifecycleConfigurationReadPolicy\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::<strong>&lt;bucket-name&gt;<\/strong>\",\n        \"arn:aws:s3:::<strong>&lt;bucket-name&gt;<\/strong>\/lifecycle-configurations\/*\"\n      ]\n    }\n  ]\n}<\/code><\/pre>\n<\/p><\/div>\n<p>This policy is an example of <em>least privilege access<\/em>, a security paradigm that is foundational to a <a href=\"https:\/\/aws.amazon.com\/blogs\/security\/zero-trust-architectures-an-aws-perspective\/\">Zero Trust<\/a> architecture. This policy allows requests for GetObject and ListBucket API calls only, specifically on the Amazon S3 resources that manage our lifecycle policies. 
This IAM policy document can only be applied in instances where you\u2019re downloading lifecycle policies from Amazon S3.<\/p>\n<ol start=\"4\">\n<li>Save this policy as <code>S3LifecycleConfigurationReadPolicy<\/code>.<\/li>\n<li>In the navigation pane, choose <strong>Roles<\/strong>.<\/li>\n<li>Search for and choose the role attached to the isolated notebook instances and edit the role\u2019s policy document.<\/li>\n<li>Search for the newly created policy and attach it to this role\u2019s policy document.<\/li>\n<\/ol>\n<p>Now your isolated notebook has permissions to access Amazon S3 via the <code>GetObject<\/code> and <code>ListBucket<\/code> API calls. We can test this by running the following snippet in a notebook cell:<\/p>\n<p><code>!aws s3api get-object --bucket <strong>&lt;bucket-name&gt;<\/strong> --key lifecycle-configurations\/autostop.py autostop.py<\/code><\/p>\n<p>At this point in the configuration, you should no longer see a permission denied error, but a timeout error. This is good; it means we have permission to access Amazon S3 but we haven\u2019t established the network connectivity to do so. 
The S3 gateway endpoint from the previous section provides that connectivity; if you have already created it, the command succeeds instead of timing out.<\/p>\n<p>Alternatively, we can create the IAM policy and role via the <a href=\"http:\/\/aws.amazon.com\/cli\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Command Line Interface<\/a> (AWS CLI).<\/p>\n<ol start=\"8\">\n<li>Create the following policy and save the ARN from the output for a later step:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws iam create-policy --policy-name S3LifecycleConfigurationReadPolicy --policy-document '{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Sid\": \"S3LifecycleConfigurationReadPolicy\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::<strong>&lt;bucket-name&gt;<\/strong>\",\n        \"arn:aws:s3:::<strong>&lt;bucket-name&gt;<\/strong>\/lifecycle-configurations\/*\"\n      ]\n    }\n  ]\n}'<\/code><\/pre>\n<\/p><\/div>\n<ol start=\"9\">\n<li>Create the role:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws iam create-role --role-name GeneralIsolatedNotebookRole --assume-role-policy-document '{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Sid\": \"\",\n      \"Effect\": \"Allow\",\n      \"Principal\": {\n        \"Service\": \"sagemaker.amazonaws.com\"\n      },\n      \"Action\": \"sts:AssumeRole\"\n    }\n  ]\n}'<\/code><\/pre>\n<\/p><\/div>\n<ol start=\"10\">\n<li>Attach our custom policy to the new role:<\/li>\n<\/ol>\n<p><code>aws iam attach-role-policy --role-name GeneralIsolatedNotebookRole --policy-arn policy-arn<\/code><\/p>\n<ol start=\"11\">\n<li>Repeat these steps to create a new policy called <code>StopNotebookInstance<\/code>.<\/li>\n<\/ol>\n<p>This policy gives the <code>autostop.py<\/code> script the ability to shut down the notebook instance. 
The JSON for this policy is as follows:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Sid\": \"VisualEditor0\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"sagemaker:StopNotebookInstance\",\n        \"sagemaker:DescribeNotebookInstance\"\n      ],\n      \"Resource\": \"arn:aws:sagemaker:region-name:account-id:notebook-instance\/*\"\n    }\n  ]\n}<\/code><\/pre>\n<\/p><\/div>\n<ol start=\"12\">\n<li>Create and attach this policy to the notebook instance\u2019s role using either the AWS Console for IAM or the AWS CLI.<\/li>\n<\/ol>\n<p>We allow this policy to act on any notebook instance in this account. This is acceptable because we want to reuse this policy for additional notebook instances. For your implementation, be sure to craft separate least privilege policies for any additional SageMaker actions that a specific notebook takes.<\/p>\n<h2>Create a lifecycle configuration<\/h2>\n<p>Lifecycle configurations are bash scripts that run on the notebook instance at startup. This feature makes lifecycle configurations flexible and powerful, but limited by the capabilities of the bash programming language. A common design pattern is to run secondary scripts written in a high-level programming language like Python. This pattern allows us to manage lifecycle configurations in source control. We can also define fairly complex state management logic using a high-level language.<\/p>\n<p>The following lifecycle configuration is a bash script that copies a Python script from Amazon S3. After copying the file, the bash script creates a new entry in cron that runs the Python script every 5 minutes. The Python script makes an API call to the Jupyter process running on the notebook instance. The script uses this API to determine whether the notebook instance has been idle for the timeout duration. 
If the script determines the notebook instance has been idle for the last 5 minutes, it shuts down the notebook instance. This is a good practice for cost and emissions savings. The 5-minute idle timeout can be modified by changing the value of the <code>IDLE_TIME<\/code> variable, which is expressed in seconds.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">#!\/bin\/bash\nset -e\n# Idle timeout in seconds (300 seconds = 5 minutes)\nIDLE_TIME=300\numask 022\necho \"Fetching the autostop script\"\naws s3 cp s3:\/\/<strong>&lt;bucket-name&gt;<\/strong>\/lifecycle-configurations\/autostop.py \/\necho \"Starting the SageMaker autostop script in cron\"\n# Append the job to the existing crontab and run it every 5 minutes\n(crontab -l 2&gt;\/dev\/null; echo \"*\/5 * * * * \/usr\/bin\/python \/autostop.py --time $IDLE_TIME --ignore-connections\") | crontab -<\/code><\/pre>\n<\/p><\/div>\n<p>To create a lifecycle configuration, complete the following steps:<\/p>\n<ol>\n<li>On the SageMaker console, choose <strong>Notebooks<\/strong>.<\/li>\n<li>Choose <strong>Lifecycle configurations<\/strong>.<\/li>\n<li>Choose <strong>Create configuration<\/strong>.<\/li>\n<li>On the <strong>Start notebook<\/strong> tab, enter the preceding bash script.<\/li>\n<li>Provide a descriptive name for the script.<\/li>\n<li>Choose <strong>Create configuration<\/strong>.<\/li>\n<\/ol>\n<p>You can also create the lifecycle configuration with the AWS CLI (see the following code). Note that the script itself must be base64 encoded. Keep this in mind when using the AWS CLI to create these configurations.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws sagemaker create-notebook-instance-lifecycle-config --notebook-instance-lifecycle-config-name auto-stop-idle-from-s3 --on-start Content='base64-encoded-script'<\/code><\/pre>\n<\/p><\/div>\n<p>After you create the lifecycle configuration, it appears in the list of available configurations.<\/p>\n<ol start=\"7\">\n<li>From here, navigate back to your notebook instance. 
If the notebook instance is running, turn it off by selecting the notebook instance and choosing <strong>Stop<\/strong> on the top left corner.<\/li>\n<li>Choose <strong>Edit<\/strong> in the section <strong>Notebook instance settings<\/strong>.<\/li>\n<li>Select your new lifecycle configuration from the list and choose <strong>Update notebook instance<\/strong>.<\/li>\n<\/ol>\n<p>The ARN of the lifecycle configuration is now attached to your notebook instance.<\/p>\n<p>To do this in the AWS CLI, run the following command:<\/p>\n<p><code>aws sagemaker update-notebook-instance --notebook-instance-name notebook-name --lifecycle-config-name lifecycle-config-name<\/code><\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-09-ReconfiguredNotebook.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-large wp-image-31094\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-09-ReconfiguredNotebook-1024x404.png\" alt=\"Reconfigured Notebook with Lifecycle Policy\" width=\"1024\" height=\"404\"><\/a><\/p>\n<h2>Test Amazon S3 network access from an isolated notebook instance<\/h2>\n<p>To test this process, we need to make sure we can copy the Python file from Amazon S3 into our isolated notebook instance. Because we configured our lifecycle configuration to run on notebook startup, we only need to start our notebook instance to run the test. When our notebook starts, open a Jupyter notebook and examine the local file system. 
Our <code>autostop.py<\/code> script from the S3 bucket has now been installed onto our notebook instance.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-10-Evidence.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-large wp-image-31096\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-10-Evidence-1024x227.png\" alt=\"File Transfer Test\" width=\"1024\" height=\"227\"><\/a><\/p>\n<p>If your notebook has root permissions, you can even examine the notebook\u2019s crontab by running the following:<\/p>\n<p>We need to run this command as the root user because the LCC adds the cron job to the cron service as the root user. This proves that the <code>autostop.py<\/code> script has been added to the crontab on notebook startup. Because this command opens the cron file, you have to manually stop the kernel command to view the output.<\/p>\n<p data-wp-editing=\"1\"><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-11-Evidence2.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-large wp-image-31095\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/ML-4423-11-Evidence2-1024x186.png\" alt=\"Crontab Verification\" width=\"1024\" height=\"186\"><\/a><\/p>\n<h2>Clean up<\/h2>\n<p>When you destroy the VPC endpoint, the notebook instance loses access to the S3 bucket. This introduces a timeout error on notebook startup. Remove the lifecycle configuration from the notebook instance. To do this, select the notebook instance within the Amazon SageMaker service of the AWS Management Console and choose <strong>Edit <\/strong>in the section <strong>Notebook instance settings<\/strong>. 
Now the notebook instance doesn\u2019t attempt to pull the <code>autostop.py<\/code> script from Amazon S3.<\/p>\n<h2>Conclusion<\/h2>\n<p>SageMaker allows you to provision notebook instances within a private subnet of a VPC. Optionally, you can also disable internet access for these notebooks to improve their security posture. Disabling internet access adds defense in depth against bad actors, and allows data scientists to work with notebooks in a secure environment.<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/frgud-headshot.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-31120 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/19\/frgud-headshot.jpg\" alt=\"frgud Headshot\" width=\"79\" height=\"117\"><\/a><strong>Dan Ferguson<\/strong> is a Solutions Architect at Amazon Web Services, focusing primarily on Private Equity &amp; Growth Equity investments into late-stage startups.<\/p>\n
<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/secure-amazon-s3-access-for-isolated-amazon-sagemaker-notebook-instances\/<\/p>\n","protected":false},"author":0,"featured_media":1957,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1956"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1956"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1956\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1957"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1956"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1956"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1956"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}