{"id":750,"date":"2021-01-12T04:08:34","date_gmt":"2021-01-12T04:08:34","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2021\/01\/12\/hosting-a-private-pypi-server-for-amazon-sagemaker-studio-notebooks-in-a-vpc\/"},"modified":"2021-01-12T04:08:34","modified_gmt":"2021-01-12T04:08:34","slug":"hosting-a-private-pypi-server-for-amazon-sagemaker-studio-notebooks-in-a-vpc","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/01\/12\/hosting-a-private-pypi-server-for-amazon-sagemaker-studio-notebooks-in-a-vpc\/","title":{"rendered":"Hosting a private PyPI server for Amazon SageMaker Studio notebooks in a VPC"},"content":{"rendered":"<div id=\"\">\n<p>Amazon SageMaker Studio notebooks provide a full-featured <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/amazon-sagemaker-studio-the-first-fully-integrated-development-environment-for-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">integrated development environment<\/a> (IDE) for flexible machine learning (ML) experimentation and development. Security measures protect and support this versatile, collaborative environment. In some cases, such as to protect sensitive data or meet regulatory requirements, security protocols require that public internet access be disabled in the development environment.<\/p>\n<p>Typically, developers have access to the public internet and can install any new libraries they want to import. You can install Python packages from the public Python Package Index (PyPI), a Python software repository, using standard tools such as pip. 
You can find hundreds of thousands of packages, including common packages such as NumPy, Pandas, Matplotlib, Pytest, Requests, Django, and BeautifulSoup.<\/p>\n<p>In a development environment with internet access disabled, you can instead mirror packages and host them on a PyPI server in your own <a href=\"https:\/\/aws.amazon.com\/vpc\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Virtual Private Cloud (Amazon VPC)<\/a>. A VPC is a logically isolated virtual network into which you can launch resources, such as <a href=\"https:\/\/aws.amazon.com\/ec2\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Compute Cloud (Amazon EC2)<\/a> instances and SageMaker Studio domains. You have fine-grained access control over its network connectivity. You can specify an IP address range for the VPC and associate security groups to control its inbound and outbound traffic. You can also add subnets that use a subset of IP addresses within the VPC, and choose whether each subnet is open to the public internet or is private.<\/p>\n<p>When you use a local PyPI server with this architecture and install Python libraries from your SageMaker Studio notebook, you connect to your private server instead of a public package index, and all traffic remains within a single secured VPC and private subnet.<\/p>\n<p>SageMaker Studio recently launched <a href=\"https:\/\/aws.amazon.com\/about-aws\/whats-new\/2020\/10\/now-launch-amazon-sagemaker-in-your-amazon-vpc\/\" target=\"_blank\" rel=\"noopener noreferrer\">VPC integration<\/a> to meet these security needs. You can now launch Studio notebooks within a private VPC, disabling internet access. To install Python packages within this secure environment, you can configure an EC2 instance in your VPC that acts as a PyPI server for your notebooks. 
This enables you to maintain productivity and ease of package installation while working within a private environment that isn\u2019t accessible from the public internet.<\/p>\n<h2>Solution overview<\/h2>\n<p>This solution creates a private PyPI server on an EC2 instance, and connects it to a SageMaker Studio notebook through network configuration including a VPC, private subnet, security group, and elastic network interface. The following diagram illustrates this architecture.<\/p>\n<h2><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-20029\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/12\/17\/ML-1349-1.jpg\" alt=\"The following diagram illustrates this architecture.\" width=\"800\" height=\"381\"><\/h2>\n<p>You complete the following steps to implement this solution:<\/p>\n<ol>\n<li>Launch an EC2 instance within a VPC, subnet, and security group.<\/li>\n<li>Configure the instance to function as a private PyPI server.<\/li>\n<li>Create a VPC endpoint and add security group rules.<\/li>\n<li>Create a VPC-only SageMaker Studio domain, user, and notebook with the necessary permissions and networking.<\/li>\n<li>Install a Python package from the PyPI server onto the SageMaker Studio notebook.<\/li>\n<\/ol>\n<h2>Prerequisites<\/h2>\n<p>This is an intermediate-level solution with the following prerequisites:<\/p>\n<h2>Launching an EC2 instance<\/h2>\n<p>For this post, we launch a new EC2 instance in the <code>us-east-2<\/code> Region. 
For the full list of available Regions supporting SageMaker Studio, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/regions-quotas.html\" target=\"_blank\" rel=\"noopener noreferrer\">Supported Regions and Quotas<\/a>.<\/p>\n<ol>\n<li>On the Amazon EC2 console, launch a new instance in a Region supporting SageMaker Studio.<\/li>\n<li>Choose an Amazon Linux 2 AMI.<\/li>\n<li>Choose a t2.medium instance (or a larger t2 instance, if preferred).<\/li>\n<li>On the <strong>Step 3: Configure Instance Details<\/strong> page, for <strong>Network<\/strong>, choose your VPC.<\/li>\n<li>For <strong>Subnet<\/strong>, choose your subnet.<\/li>\n<\/ol>\n<p>You can use the default VPC and subnet, use other existing resources, or create new ones. Make sure to note the VPC and subnet you select for later reference.<\/p>\n<ol start=\"6\">\n<li>Leave all other settings as-is.<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20488 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/11\/Step-3-Configure-Instance-Details.jpg\" alt=\"\" width=\"800\" height=\"324\">\n<\/li>\n<li>Use default storage and tag settings.<\/li>\n<li>On the <strong>Step 6: Configure Security Group<\/strong> page, for <strong>Assign a security group<\/strong>, select <strong>Create a new security group.<\/strong>\n<\/li>\n<li>For <strong>Security group name<\/strong>, enter <code>studio-SG<\/code>.<\/li>\n<li>For <strong>Type<\/strong>, choose <strong>SSH<\/strong> on port range 22.<\/li>\n<li>For <strong>Source<\/strong>, choose <strong>My IP<\/strong>.<\/li>\n<\/ol>\n<p>This restricts SSH access to the instance to your current public IP address.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20489 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/11\/Step-6-Configure-Security-Group.jpg\" alt=\"\" width=\"800\" height=\"218\"><\/p>\n<ol start=\"12\">\n<li>Create a new key 
pair, <code>studio-host<\/code>.<\/li>\n<li>Launch the instance.<\/li>\n<\/ol>\n<p>For more information about launching an instance, see <a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/EC2_GetStarted.html\" target=\"_blank\" rel=\"noopener noreferrer\">Tutorial: Getting started with Amazon EC2 Linux instances<\/a>.<\/p>\n<h2>Configuring the instance as a PyPI server<\/h2>\n<p>To configure your instance, complete the following steps:<\/p>\n<ol>\n<li>Open a terminal window and navigate to the directory containing your .pem file.<\/li>\n<li>Change the key permissions and SSH onto your instance, substituting in the public IP address and Region:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">chmod 400 studio-host.pem\r\nssh -i \"studio-host.pem\" ec2-user@ec2-x-x-x-x.{region}.compute.amazonaws.com<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<p>If needed, you can find the SSH command by selecting your instance on the console, choosing <strong>Connect<\/strong>, and navigating to the <strong>SSH Client<\/strong> tab.<\/p>\n<ol start=\"3\">\n<li>Install <a href=\"https:\/\/pip.pypa.io\/en\/stable\/\" target=\"_blank\" rel=\"noopener noreferrer\">pip<\/a>, which you use to install Python packages, and <a href=\"https:\/\/bandersnatch.readthedocs.io\/en\/latest\/\" target=\"_blank\" rel=\"noopener noreferrer\">bandersnatch<\/a>, which you use to mirror packages from the public PyPI server onto your instance. 
For this post, we use the package <a href=\"https:\/\/github.com\/awslabs\/aws-data-wrangler\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Data Wrangler<\/a>, an AWS Professional Services open-source library that integrates Pandas DataFrames with AWS services:\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">sudo yum install python3-pip\r\nsudo pip3 install multidict==4.7.6\r\nsudo pip3 install yarl==1.6.0\r\nsudo pip3 install bandersnatch<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<p>You now configure bandersnatch to specify packages and their versions to mirror.<\/p>\n<ol start=\"4\">\n<li>Open a config file:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo vim \/etc\/bandersnatch.conf<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<ol start=\"5\">\n<li>Enter the following file contents:\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">[mirror]\r\ndirectory = \/pypi\r\nmaster = https:\/\/pypi.org\r\ntimeout = 10\r\nworkers = 3\r\nhash-index = false\r\nstop-on-error = false\r\njson = false\r\n\r\n[plugins]\r\nenabled =\r\n    whitelist_project\r\n    allowlist_release\r\n\r\n[whitelist]\r\npackages =\r\n    awswrangler==1.10.0\r\n    pyarrow==2.0.0\r\n    SQLAlchemy==1.3.10\r\n    s3fs==0.4.2\r\n    numpy==1.18.4\r\n    sqlalchemy-redshift==0.7.9\r\n    boto3==1.15.10\r\n    pandas==1.1.0\r\n    psycopg2-binary==2.8.0\r\n    pymysql==0.9.3\r\n    botocore==1.18.10\r\n    fsspec==0.7.4\r\n    s3transfer==0.3.2\r\n    jmespath==0.9.4\r\n    pytz==2019.3\r\n    python-dateutil==2.8.1\r\n    urllib3==1.25.8\r\n    six==1.14.0\r\n<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<ol start=\"6\">\n<li>Mirror the libraries and list the directory contents to view that the libraries have been copied onto the instance:\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">sudo \/usr\/local\/bin\/bandersnatch mirror\r\nls \/pypi\/web\/simple\/<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<p>You must configure pip so that when 
pip is run to install packages, they are searched for within your private PyPI server instead of on the public server. The pip config file already exists; you add two lines to it.<\/p>\n<ol start=\"7\">\n<li>Open the file:\n          <\/li>\n<\/ol>\n<ol start=\"8\">\n<li>Ensure your pip config file reads as follows, adding the last two lines:\n<div class=\"hide-language\">\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">[global] \r\ndisable_pip_version_check = 1 \r\nformat = columns \r\nindex-url = http:\/\/localhost\/simple \r\ntrusted-host = localhost<\/code><\/pre>\n<\/div>\n<\/div>\n<\/li>\n<\/ol>\n<ol start=\"9\">\n<li>Install and configure nginx so that the instance can function as a private web server:\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">sudo amazon-linux-extras install nginx1\r\nsudo vim \/etc\/nginx\/nginx.conf<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<ol start=\"10\">\n<li>Update the server section of the nginx config file to change the <code>server_name<\/code> to <code>localhost<\/code>, listen on the private IP address, and add the root and index locations. 
The server section of the nginx config file should be as follows:\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">server {\r\n        <strong>listen x.x.x.x:80;<\/strong>\r\n        listen       80;\r\n        listen       [::]:80;\r\n        <strong>server_name localhost;<\/strong>\r\n        root         \/usr\/share\/nginx\/html;\r\n\r\n        # Load configuration files for the default server block.\r\n        include \/etc\/nginx\/default.d\/*.conf;\r\n\r\n       <strong> location \/ { root \/pypi\/web\/; index index.html index.htm index.php; }<\/strong>\r\n\r\n        error_page 404 \/404.html;\r\n            location = \/40x.html {\r\n        }\r\n\r\n        error_page 500 502 503 504 \/50x.html;\r\n            location = \/50x.html {\r\n        }\r\n    }\r\n<\/code><\/pre>\n<\/div>\n<\/li>\n<li>Start the server and install the package locally to test it out:\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">sudo service nginx start\r\npip3 install --user awswrangler<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<p>Note that the packages are collected from the localhost, not the public package index.<\/p>\n<p>You now have a private PyPI server ready for use.<\/p>\n<h2>Creating a VPC endpoint<\/h2>\n<p>VPC endpoints allow resources within a VPC to access AWS services. For this solution, you will create an endpoint for the SageMaker API. 
You can extend this solution by adding more endpoints for other services you need to access from your notebook.<\/p>\n<p>There are two types of VPC endpoints:<\/p>\n<ul>\n<li>\n<strong>Interface endpoints<\/strong> \u2013 Elastic network interfaces within a subnet that serve as entry points for traffic destined to a supported AWS service, such as SageMaker<\/li>\n<li>\n<strong>Gateway endpoints<\/strong> \u2013 Only supported for Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB<\/li>\n<\/ul>\n<ol>\n<li>On the Amazon VPC console, choose <strong>Endpoints.<\/strong>\n<\/li>\n<li>Choose <strong>Create Endpoint.<\/strong>\n<\/li>\n<li>Create the SageMaker API endpoint <code>com.amazonaws.{region}<\/code>.<code>sagemaker.api<\/code>.<\/li>\n<li>Make sure you choose the same VPC, subnet, and security group used by your EC2 instance.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-20032\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/12\/17\/ML-1349-4.jpg\" alt=\"Make sure you choose the same VPC, subnet, and security group used by your EC2 instance.\" width=\"800\" height=\"337\"><\/p>\n<p>When finished, your endpoint is listed as shown in the following screenshot.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20484 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/11\/Endpoint-ID.jpg\" alt=\"\" width=\"800\" height=\"53\"><\/p>\n<p>For more information about VPC endpoints, including the distinction between interface endpoints and gateway endpoints, see <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-endpoints.html\" target=\"_blank\" rel=\"noopener noreferrer\">VPC endpoints<\/a>.<\/p>\n<h3>Editing your security group rules<\/h3>\n<p>Edit your security group to add an inbound rule allowing all traffic from within the security group. 
This allows the Studio notebook to communicate with the EC2 instance because they both reside within this security group.<\/p>\n<p>You can search for the security group name on the Amazon EC2 console, and you receive a suggested ID.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20530 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/12\/Edit-inbound-rules.jpg\" alt=\"\" width=\"800\" height=\"381\"><\/p>\n<p>After you add the rule, the security group has two inbound rules: one allowing SSH on port 22 from your IP to connect to the EC2 instance, and another allowing all traffic from within the security group.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20485 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/11\/Inbound-Rules.jpg\" alt=\"\" width=\"800\" height=\"209\"><\/p>\n<p>For more information about security groups, see <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/VPC_SecurityGroups.html\" target=\"_blank\" rel=\"noopener noreferrer\">Security groups for your VPC<\/a>.<\/p>\n<h2>Creating VPC-only SageMaker Studio resources<\/h2>\n<p>All SageMaker Studio resources reside within a domain, with a maximum of one domain per Region in an AWS account. A domain contains one or more users, and as a user you can open a Studio notebook. For more information about creating a domain, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_CreateDomain.html\" target=\"_blank\" rel=\"noopener noreferrer\">CreateDomain<\/a>.<\/p>\n<p>With the recent release of VPC support for Studio, you can choose from two networking options: public internet only and VPC only. 
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio-notebooks-and-internet-access.html\" target=\"_blank\" rel=\"noopener noreferrer\">Connect SageMaker Studio Notebooks to Resources in a VPC<\/a> and <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/securing-amazon-sagemaker-studio-connectivity-using-a-private-vpc\/\" target=\"_blank\" rel=\"noopener noreferrer\">Securing Amazon SageMaker Studio connectivity using a private VPC<\/a>. For this post, we create a VPC-only domain.<\/p>\n<ol>\n<li>On the SageMaker Studio console, select <strong>Standard setup<\/strong>.<\/li>\n<\/ol>\n<p>This allows for detailed configuration.<\/p>\n<ol start=\"2\">\n<li>For <strong>Authentication method<\/strong>, select <strong>AWS Identity and Access Management (IAM)<\/strong>.<img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-20035\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/12\/17\/ML-1349-7.jpg\" alt=\"For Authentication method, select AWS Identity and Access Management (IAM).\" width=\"800\" height=\"346\">\n<\/li>\n<li>Under <strong>Permissions<\/strong>, choose <strong>Create a new role<\/strong>.<\/li>\n<li>Use the default settings.<\/li>\n<li>Choose <strong>Create role<\/strong>.<\/li>\n<\/ol>\n<p>This creates a new SageMaker execution role.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20486 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/11\/Permission.jpg\" alt=\"\" width=\"800\" height=\"257\"><\/p>\n<ol start=\"6\">\n<li>In the <strong>Network and Storage<\/strong> section, configure your VPC and subnet to match those of the EC2 instance.<\/li>\n<li>For <strong>Network Access for Studio<\/strong>, select <strong>VPC Only<\/strong>.<\/li>\n<li>For <strong>Security group(s)<\/strong>, choose the same security group as used for the EC2 
instance.<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20531 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/12\/vpc.jpg\" alt=\"\" width=\"800\" height=\"612\">\n<\/li>\n<li>Choose<strong> Submit<\/strong>.<\/li>\n<\/ol>\n<p>Wait approximately a minute to see the banner notification that SageMaker Studio is ready.<\/p>\n<p>You now create a Studio user within the domain.<\/p>\n<ol start=\"10\">\n<li>Choose <strong>Add user<\/strong>.<\/li>\n<li>Give the user a name (for example, <code>studio-user<\/code>).<\/li>\n<li>Choose the role you just created, <code>AmazonSageMaker-ExecutionRole-&lt;timestamp when the role was created&gt;<\/code>.<\/li>\n<li>Choose <strong>Submit<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20487 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/11\/SageMaker-Studio-Control-Panel.jpg\" alt=\"\" width=\"800\" height=\"428\"><\/p>\n<p>This concludes the initial SageMaker Studio resource creation. You now have a Studio domain and user ready for use and can proceed with creating and using a notebook.<\/p>\n<h2>Installing a Python package onto the SageMaker Studio notebook<\/h2>\n<p>To start using the PyPI server from the SageMaker Studio notebook, complete the following steps:<\/p>\n<ol>\n<li>On the SageMaker Studio Control Panel, choose <strong>Open Studio <\/strong>next to the user name.<\/li>\n<li>Wait for your Studio environment to load.<\/li>\n<\/ol>\n<p>You can now see the Studio UI. 
For more information, see the <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio-ui.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Studio UI Overview<\/a>.<\/p>\n<ol start=\"3\">\n<li>Use the default SageMaker JumpStart Data Science image and create a new <strong>Notebook Python 3<\/strong>.<\/li>\n<li>Wait a few minutes for the image to launch and your notebook to be available.<\/li>\n<\/ol>\n<p>If you try to run a command before the notebook is available, you get the message: <code>Note: The kernel is still starting. Please execute this cell again after the kernel is started.<\/code> After your image has launched, you see it listed under <strong>Kernel Sessions<\/strong>, along with items for <strong>Running Instances<\/strong> and <strong>Running Apps<\/strong>. The kernel runs within the app, and the app runs on the instance.<\/p>\n<p>Now you\u2019re ready to configure your notebook. The first step is pip configuration, so that when you install a package using pip, your notebook searches for the package on the private PyPI server instead of through the public internet at pypi.org.<\/p>\n<ol start=\"5\">\n<li>Run the following command in a notebook cell, substituting your EC2 instance\u2019s private IP address:\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">!printf '[global]\\nindex-url = http:\/\/x.x.x.x\/simple\\ntrusted-host = x.x.x.x' | sudo tee \/etc\/pip.conf<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<ol start=\"6\">\n<li>To check that the file was successfully written, run the following command:\n          <\/li>\n<\/ol>\n<p>Now you\u2019re ready to install Python packages from your server.<\/p>\n<ol start=\"7\">\n<li>To see that AWS Data Wrangler isn\u2019t installed by default, try to import it with the command:\n          <\/li>\n<\/ol>\n<ol start=\"8\">\n<li>Install the package and append its install location to your Python path:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">!pip install 
awswrangler\r\nimport sys\r\nsys.path.append('\/home\/sagemaker-user\/.local\/lib\/python3.7\/site-packages')<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<p>The library was installed from your private server\u2019s index, as you specified in the pip config file, http:\/\/{EC2-IP}\/simple.<\/p>\n<div class=\"hide-language\">\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20308 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/05\/ML-1349-NEW.jpg\" alt=\"The library was installed from our private server\u2019s index, as you specified in the pip config file,\" width=\"800\" height=\"416\"><\/p>\n<\/div>\n<ol start=\"9\">\n<li>Now that the package has been installed, you can import the package smoothly:\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-20309 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/01\/05\/ML-1349-12-NEW.jpg\" alt=\"Now that the package has been installed, you can import the package smoothly:\" width=\"800\" height=\"182\"><\/p>\n<\/li>\n<\/ol>\n<p>Now your notebook is ready for development, including installation of the Python libraries of your choice! Moreover, your PyPI server remains operational and available even when you delete your notebooks or use multiple notebooks. 
Your PyPI server is separate from your development environment, giving you freedom to manage your notebook resources in the way that best suits your needs.<\/p>\n<h3>Cleaning up<\/h3>\n<p>To clean up your resources, complete the following steps:<\/p>\n<ol>\n<li>Shut down the running instance in the SageMaker Studio notebook.<\/li>\n<li>Delete any of the user\u2019s remaining apps on the SageMaker Studio console, including the default app.<\/li>\n<li>Delete the SageMaker Studio user.<\/li>\n<li>Delete Studio in the SageMaker Studio Control Panel.<\/li>\n<li>Stop the EC2 instance.<\/li>\n<li>Terminate the EC2 instance.<\/li>\n<li>Delete the IAM role, VPC endpoint, <code>studio-SG<\/code> security group, and <a href=\"https:\/\/aws.amazon.com\/efs\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic File System (Amazon EFS)<\/a> file system.<\/li>\n<li>Delete the rules in the inbound and outbound NFS security groups.<\/li>\n<li>Delete the security groups.<\/li>\n<\/ol>\n<h2>Conclusion<\/h2>\n<p>This post demonstrated how to get started with SageMaker Studio in VPC-only mode, while retaining the ability to install Python packages by hosting a private PyPI server. 
Now you can move forward with your ML development in notebooks residing within this secure environment.<\/p>\n<p>We invite you to explore other exciting applications of SageMaker Studio, including <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/streamline-modeling-with-amazon-sagemaker-studio-and-amazon-experiments-sdk\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Experiments<\/a> and <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/scheduling-jupyter-notebooks-on-sagemaker-ephemeral-instances\/\" target=\"_blank\" rel=\"noopener noreferrer\">scheduling notebooks on SageMaker ephemeral instances<\/a>.<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-20045 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/12\/17\/Julia-Kroll.jpg\" alt=\"\" width=\"100\" height=\"129\"><strong>Julia Kroll<\/strong> is a Data &amp; Machine Learning Engineer for AWS Professional Services. 
She works with enterprise and public sector customers to build data lake, analytics, and machine learning solutions.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/hosting-a-private-pypi-server-for-amazon-sagemaker-studio-notebooks-in-a-vpc\/<\/p>\n","protected":false},"author":0,"featured_media":751,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/750"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=750"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/750\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/751"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=750"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=750"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=750"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}