{"id":1143,"date":"2021-11-03T08:40:24","date_gmt":"2021-11-03T08:40:24","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/11\/03\/host-rstudio-connect-and-package-manager-for-ml-development-in-rstudio-on-amazon-sagemaker\/"},"modified":"2021-11-03T08:40:24","modified_gmt":"2021-11-03T08:40:24","slug":"host-rstudio-connect-and-package-manager-for-ml-development-in-rstudio-on-amazon-sagemaker","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/11\/03\/host-rstudio-connect-and-package-manager-for-ml-development-in-rstudio-on-amazon-sagemaker\/","title":{"rendered":"Host RStudio Connect and Package Manager for ML development in RStudio on Amazon SageMaker"},"content":{"rendered":"<div id=\"\">\n<p>Today, we <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/announcing-fully-managed-rstudio-on-amazon-sagemaker-for-data-scientists\/\" target=\"_blank\" rel=\"noopener noreferrer\">announced RStudio on Amazon SageMaker<\/a>, the first machine learning (ML) integrated development environment (IDE) in the cloud for data scientists working in R. The open-source language <a href=\"https:\/\/www.r-project.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">R<\/a> and its rich ecosystem of more than <a href=\"https:\/\/cran.r-project.org\/web\/packages\/\" target=\"_blank\" rel=\"noopener noreferrer\">18,000 packages<\/a> have been a top choice for statisticians, quant analysts, data scientists, and ML engineers. RStudio on SageMaker makes it easy for data scientists to run statistical analysis, build ML models, and create data science content in a centralized environment for the team without worrying about the compute infrastructure.<\/p>\n<p>Alongside RStudio Workbench, the RStudio suite for R developers includes RStudio Connect and RStudio Package Manager. 
<a href=\"https:\/\/www.rstudio.com\/products\/connect\/\" target=\"_blank\" rel=\"noopener noreferrer\">RStudio Connect<\/a> makes it easy to surface ML and data science insights from data scientists\u2019 complex work and put them in the hands of decision-makers. RStudio Connect is designed to allow data scientists to publish insights, dashboards, and web applications. RStudio Connect also makes hosting and managing content simple and scalable for wide consumption.<\/p>\n<p><a href=\"https:\/\/www.rstudio.com\/products\/package-manager\/\" target=\"_blank\" rel=\"noopener noreferrer\">RStudio Package Manager<\/a> helps organize and centralize R packages across ML teams and organizations. As data scientists develop their ML models, they need various packages with different capabilities for their ML use cases in RStudio. For enterprise users, manually managing the sources and versions of these packages across numerous public repositories is error-prone and time-consuming. RStudio Package Manager mitigates these issues by managing the package repository centrally for your organization so that data scientists can install packages quickly and securely, and ensure project reproducibility and repeatability. Security and reproducibility are especially important in regulated industries such as healthcare and finance.<\/p>\n<p>In this post, we first show you how to architect and deploy RStudio Connect and RStudio Package Manager with a well-architected solution in AWS. We then show you how to use RStudio Connect and RStudio Package Manager from RStudio on SageMaker. We use a <a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/breast+cancer+wisconsin+%28original%29\" target=\"_blank\" rel=\"noopener noreferrer\">UCI breast cancer dataset<\/a> to build out several types of ML content in the R language in RStudio on SageMaker. 
The ML content we demonstrate in the post includes R Markdown and an R Shiny application.<\/p>\n<h2>Solution overview<\/h2>\n<p>The solution architecture is based on professional versions of RStudio Connect and RStudio Package Manager Docker containers. RStudio Connect and RStudio Package Manager are configured across two <a href=\"https:\/\/docs.aws.amazon.com\/AmazonRDS\/latest\/UserGuide\/Concepts.RegionsAndAvailabilityZones.html\" target=\"_blank\" rel=\"noopener noreferrer\">Availability Zones<\/a> for high availability. Both RStudio Connect and RStudio Package Manager containers support automatic scaling to handle incoming traffic based on the number of incoming requests and on memory and CPU usage within the containers.<\/p>\n<p>Container images are stored in and fetched from <a href=\"https:\/\/aws.amazon.com\/ecr\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Registry<\/a> (Amazon ECR) with vulnerability scanning enabled. Vulnerability issues should be addressed before deploying the images.<\/p>\n<p>The following diagram illustrates the solution architecture.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-5364-image001-1.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29879 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-5364-image001-1.png\" alt=\"\" width=\"1544\" height=\"1541\"><\/a><\/p>\n<p>The following are the steps in the solution workflow:<\/p>\n<ol>\n<li>R users access RStudio Connect and RStudio Package Manager via <a href=\"https:\/\/aws.amazon.com\/route53\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Route 53<\/a>. 
Route 53 is a DNS service for incoming requests.<\/li>\n<li>Route 53 resolves incoming requests and forwards those to <a href=\"https:\/\/aws.amazon.com\/waf\" target=\"_blank\" rel=\"noopener noreferrer\">AWS WAF<\/a> for security checks.<\/li>\n<li>Valid requests reach an <a href=\"https:\/\/aws.amazon.com\/elasticloadbalancing\/application-load-balancer\/\" target=\"_blank\" rel=\"noopener noreferrer\">Application Load Balancer<\/a> (ALB), which forwards these to the <a href=\"https:\/\/aws.amazon.com\/ecs\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Service<\/a> (Amazon ECS) cluster. The ALB checks incoming requests for an HTTPS certificate, which is issued and validated by <a href=\"https:\/\/aws.amazon.com\/certificate-manager\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Certificate Manager<\/a>.<\/li>\n<li>Amazon ECS controls the containers in a cluster of <a href=\"http:\/\/aws.amazon.com\/ec2\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Compute Cloud<\/a> (Amazon EC2) instances (EC2 launch type) in an <a href=\"https:\/\/docs.aws.amazon.com\/autoscaling\/ec2\/userguide\/AutoScalingGroup.html\" target=\"_blank\" rel=\"noopener noreferrer\">Auto Scaling group<\/a> and is responsible for scaling up and down the number of containers as needed using an <a href=\"https:\/\/docs.aws.amazon.com\/AmazonECS\/latest\/developerguide\/cluster-capacity-providers.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon ECS capacity provider<\/a>.<\/li>\n<li>Incoming requests are processed by the RStudio Connect server on any of the available RStudio Connect containers; users are authenticated and applications are rendered on the web browser. 
RStudio Package Manager requests are routed to the Package Manager container.<\/li>\n<li><a href=\"https:\/\/aws.amazon.com\/rds\/aurora\/serverless\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Aurora Serverless<\/a> PostgreSQL databases are used to provide high availability utilizing multiple containers for both RStudio Connect and RStudio Package Manager. Aurora backs up the serverless cluster databases automatically. Data on Aurora is encrypted at rest using <a href=\"https:\/\/aws.amazon.com\/kms\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Key Management Service<\/a> (AWS KMS).<\/li>\n<li><a href=\"https:\/\/aws.amazon.com\/efs\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic File System<\/a> (Amazon EFS) provides the persistent file system required by RStudio Connect and RStudio Package Manager. Data on Amazon EFS is encrypted at rest using AWS KMS. Amazon EFS is an NFS file system that stores data in multiple Availability Zones in an <a href=\"https:\/\/aws.amazon.com\/about-aws\/global-infrastructure\/regions_az\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Region<\/a> for data durability and high availability. Files created on the RStudio Connect and RStudio Package Manager container Amazon EFS mounts are automatically backed up by Amazon EFS.<\/li>\n<li>If the user session communicates with the public internet, outbound requests are sent to a <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-nat-gateway.html\" target=\"_blank\" rel=\"noopener noreferrer\">NAT gateway<\/a> from the private container subnet.<\/li>\n<li>The NAT gateway sends outbound requests to be processed via an <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/VPC_Internet_Gateway.html\" target=\"_blank\" rel=\"noopener noreferrer\">internet gateway<\/a>. 
Routes to the internet can also be configured by <a href=\"https:\/\/aws.amazon.com\/transit-gateway\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Transit Gateway<\/a>.<\/li>\n<\/ol>\n<p>We use <a href=\"https:\/\/aws.amazon.com\/cdk\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Cloud Development Kit<\/a> (AWS CDK) for Python to develop the infrastructure code and store the code in an <a href=\"https:\/\/aws.amazon.com\/codecommit\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CodeCommit<\/a> repository, so that <a href=\"https:\/\/aws.amazon.com\/codepipeline\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CodePipeline<\/a> can integrate the AWS CDK stacks for automated builds.<\/p>\n<p>The deployment code uses <a href=\"https:\/\/docs.aws.amazon.com\/Route53\/latest\/DeveloperGuide\/AboutHZWorkingWith.html\" target=\"_blank\" rel=\"noopener noreferrer\">Route 53 public hosted zones<\/a> to serve RStudio Connect and RStudio Package Manager at publicly accessible URLs. Alternatively, you can use <a href=\"https:\/\/docs.aws.amazon.com\/Route53\/latest\/DeveloperGuide\/hosted-zones-private.html\" target=\"_blank\" rel=\"noopener noreferrer\">Route 53 private hosted zones<\/a> for the RStudio Connect and RStudio Package Manager containers with an internal ALB, which provides private endpoints for users coming from RStudio on SageMaker in a VPC-only connectivity mode. In that case, you don\u2019t need a preexisting public domain in your AWS account. 
However, you need to fetch the public Docker images (<a href=\"https:\/\/hub.docker.com\/r\/rstudio\/rstudio-connect\" target=\"_blank\" rel=\"noopener noreferrer\">RStudio Connect<\/a>, <a href=\"https:\/\/hub.docker.com\/r\/rstudio\/rstudio-package-manager\" target=\"_blank\" rel=\"noopener noreferrer\">RStudio Package Manager<\/a>) and store those in a private Amazon ECR repository and point the deployment code to those images for the infrastructure build.<\/p>\n<p>If all communications between AWS services must stay within AWS, you can use <a href=\"https:\/\/aws.amazon.com\/privatelink\" target=\"_blank\" rel=\"noopener noreferrer\">AWS PrivateLink<\/a> to configure VPC endpoints for AWS services. AWS PrivateLink makes sure that inter-service traffic is not exposed to the internet for AWS service endpoints.<\/p>\n<p>You can also refer to the <a href=\"https:\/\/docs.rstudio.com\/rstudio-team\/cloudformation\/\" target=\"_blank\" rel=\"noopener noreferrer\">RStudio Team solution from RStudio<\/a> to learn how to deploy an RStudio technology stack on Amazon EC2 in AWS as an alternative to the solution discussed in this post.<\/p>\n<h2>Prerequisites<\/h2>\n<p>To deploy the AWS CDK stacks from the source code, you need to review and perform the prerequisites described in the accompanying <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source\/tree\/rsc-rspm#prerequisites\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repository<\/a> to make sure you have the necessary resources to proceed.<\/p>\n<h2>Launch the solution<\/h2>\n<ol>\n<li>Clone the <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source.git\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repository<\/a>, check out the <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source\/tree\/rsc-rspm\">rsc-rspm branch<\/a>, and move into the aws-fargate-with-rstudio-open-source folder.\n          <\/li>\n<li><a 
href=\"https:\/\/docs.aws.amazon.com\/codecommit\/latest\/userguide\/how-to-create-repository.html\" target=\"_blank\" rel=\"noopener noreferrer\">Create a CodeCommit repository<\/a> to hold the source code for installation of RStudio Connect\/RStudio Package Manager with the following command:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws codecommit --profile <span>&lt;profile of AWS account&gt;<\/span> create-repository --repository-name <span>&lt;name of repository&gt;<\/span><\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Pass the required parameters in <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source\/blob\/rsc-rspm\/cdk.json\" target=\"_blank\" rel=\"noopener noreferrer\">cdk.json<\/a>\u00a0following Step 3 in the <strong>Installation Steps<\/strong> section of the <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source\/tree\/rsc-rspm#installation-steps\" target=\"_blank\" rel=\"noopener noreferrer\">readme<\/a> file.<\/li>\n<li>Install the <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source\/blob\/rsc-rspm\/setup.py#L37-L59\" target=\"_blank\" rel=\"noopener noreferrer\">package requirements<\/a>\u00a0for the AWS CDK application:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">python3 -m pip install -r requirements.txt<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Before committing the code into the CodeCommit repository, synthesize the AWS CDK stacks. This ensures all the necessary context values are populated into the <code>cdk.context.json<\/code> file and avoids the dummy values being mapped.\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">cdk synth --profile <span>&lt;AWS CLI profile of the account&gt;<\/span><\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Commit the changes into the CodeCommit repo you created. 
Follow Step 5 in the <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source\/tree\/rsc-rspm#installation-steps\" target=\"_blank\" rel=\"noopener noreferrer\">Installation Steps<\/a> of the readme if you need help with the Git commands.<\/li>\n<li>Deploy the AWS CDK stacks to install RStudio Connect\/RStudio Package Manager using CodePipeline. This step takes around 30 minutes.\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">cdk deploy --profile <span>&lt;AWS CLI profile of the account&gt;<\/span><\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Navigate to the <a href=\"https:\/\/us-west-2.console.aws.amazon.com\/codesuite\/codepipeline\/pipelines?region=us-west-2\" target=\"_blank\" rel=\"noopener noreferrer\">CodePipeline console<\/a> (the link takes you to the <code>us-west-2<\/code> Region). Monitor the pipeline and confirm that the services are built successfully.<\/li>\n<\/ol>\n<p>The pipeline name is <code>RSC-RSPM-App-Pipeline-<span>&lt;instance&gt;<\/span><\/code>. From this point onwards, the pipeline is triggered on commits to the CodeCommit repository you created. 
There is no need to run <code>cdk deploy<\/code> (Step 7) anymore.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image003.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29846\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image003.jpg\" alt=\"\" width=\"1341\" height=\"472\"><\/a><\/p>\n<ol start=\"9\">\n<li>When the pipeline installation is complete, you can access RStudio Connect and RStudio Package Manager using the following URLs, where\u00a0<code>r53_base_domain<\/code> and <code>instance<\/code> are parameters you passed into <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source\/blob\/rsc-rspm\/cdk.json\" target=\"_blank\" rel=\"noopener noreferrer\">cdk.json<\/a>:\n<ol type=\"a\">\n<li><code>https:\/\/connect.<span>&lt;instance&gt;<\/span>.<span>&lt;r53_base_domain&gt;<\/span><\/code><\/li>\n<li><code>https:\/\/package.<span>&lt;instance&gt;<\/span>.<span>&lt;r53_base_domain&gt;<\/span><\/code><\/li>\n<\/ol>\n<\/li>\n<li>You can use <a href=\"https:\/\/aws.amazon.com\/blogs\/containers\/new-using-amazon-ecs-exec-access-your-containers-fargate-ec2\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon ECS Exec<\/a> to log in to both RStudio Connect and RStudio Package Manager containers. Follow the <a href=\"https:\/\/github.com\/aws-samples\/aws-fargate-with-rstudio-open-source\/tree\/rsc-rspm#notes-on-using-amazon-ecs-exec\" target=\"_blank\" rel=\"noopener noreferrer\">readme<\/a> for instructions.<\/li>\n<\/ol>\n<h2>Manage packages with RStudio Package Manager<\/h2>\n<p>RStudio Package Manager helps enable consistency and standardization of R packages across an organization. In RStudio Package Manager, an IT administrator can include an approved package in the repository. 
Multiple groups can be created to have access to different packages or package versions. RStudio Package Manager also handles all the updating and versioning of the packages. The administrator can enable automatic updates to the packages, or can configure RStudio Package Manager so that the packages can only be updated manually, which provides more isolation between RStudio Package Manager and the CRAN service.<\/p>\n<h3>Configure RStudio Package Manager<\/h3>\n<p>We can create a repository that pulls the packages from the RStudio CRAN by using the <a href=\"https:\/\/docs.rstudio.com\/rspm\/admin\/quickstarts.html#quickstart-curated-and-local\" target=\"_blank\" rel=\"noopener noreferrer\">following commands<\/a>. To run these commands, we need to open a shell in the RStudio Package Manager container using Amazon ECS Exec.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># Initiate a sync\nrspm sync --wait \n# Create a repository:\nrspm create repo --name=dev-cran --description='Access CRAN packages'\n# Subscribe the repository to the cran source\nrspm subscribe --repo=dev-cran --source=cran \n<\/code><\/pre>\n<\/p><\/div>\n<p>The commands create a repository and subscribe it to the built-in source named <code>cran<\/code>. When this is complete, the <code>dev-cran<\/code> repository is available in the web interface of RStudio Package Manager, as shown in the following screenshot. 
This web interface is accessible by the administrator as well as the users who have the URL for it.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image005.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29847\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image005.jpg\" alt=\"\" width=\"1445\" height=\"1259\"><\/a><\/p>\n<p>In addition to serving CRAN packages, repositories can be created to distribute local packages, Git packages, local packages along with CRAN packages, a subset of approved CRAN and local packages, and bleeding edge packages from GitHub. For further details on how to create repositories, see <a href=\"https:\/\/docs.rstudio.com\/rspm\/admin\/quickstarts.html#quickstart-cran\" target=\"_blank\" rel=\"noopener noreferrer\">Serving CRAN Packages<\/a>. In addition, RStudio Package Manager supports <a href=\"https:\/\/docs.rstudio.com\/rspm\/admin\/getting-started\/configuration\/#quickstart-bioconductor\" target=\"_blank\" rel=\"noopener noreferrer\">Bioconductor<\/a>. Bioconductor is a commonly used ecosystem of R packages in life sciences. We can combine Bioconductor packages with CRAN as well as local packages in RStudio Package Manager.<\/p>\n<h3>RStudio Package Manager package versions<\/h3>\n<p>In the web interface of RStudio Package Manager, on the <strong>Setup<\/strong> tab, you can choose a repository by date in a calendar view. 
You can also choose whether to use the latest version of the packages, or freeze the packages to a particular snapshot, as shown in the following screenshot.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image007.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29848\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image007.jpg\" alt=\"\" width=\"1433\" height=\"1062\"><\/a><\/p>\n<p>On the <strong>Setup<\/strong> tab, we can also see what system prerequisites might be needed for the repository\u2019s packages, along with the commands to install them.<\/p>\n<h2>Configure an RStudio on SageMaker domain to use RStudio Connect and RStudio Package Manager<\/h2>\n<p>When creating a SageMaker domain with RStudio, you have an option to set a default RStudio Connect server and RStudio Package Manager repository for all users in your SageMaker domain. During the SageMaker domain creation process, as detailed in the <strong>Create a SageMaker domain with RStudio<\/strong> section in <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/get-started-with-rstudio-on-amazon-sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting Started with RStudio on Amazon SageMaker<\/a>,\u00a0you can configure default RStudio Connect and RStudio Package Manager URLs for all user profiles in <strong>Step 3: RStudio settings<\/strong>. For <strong>RStudio Connect<\/strong>, enter the RStudio Connect server URL. 
For <strong>RStudio Package Manager<\/strong>, enter a CRAN or a Bioconductor repository.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image009.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29849\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image009.png\" alt=\"\" width=\"2052\" height=\"977\"><\/a><\/p>\n<p>The default URLs are configured and saved in <code>\/etc\/rstudio\/rsession.conf<\/code> for all users on RStudio on SageMaker. You can verify the default repository in the R console with <code>options('repos')<\/code>. You should see a repository pointing to your RStudio Package Manager. As for the default RStudio Connect URL, it\u2019s automatically populated when you one-click publish a piece of R content.<\/p>\n<h3>Updating a repository from RStudio Package Manager in an R session<\/h3>\n<p>If you already have a working RStudio on SageMaker and want to use a different repository, you can configure your R session in RStudio on SageMaker to use a repository from your RStudio Package Manager with the following steps:<\/p>\n<ol>\n<li>In an R Session, on the <strong>Tools <\/strong>menu, choose <strong>Global Options<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image011.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29850\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image011.jpg\" alt=\"\" width=\"764\" height=\"395\"><\/a><\/li>\n<li>Choose <strong>Packages<\/strong> and then choose <strong>Change<\/strong>.<br \/><a 
href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image013.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29851\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image013.jpg\" alt=\"\" width=\"606\" height=\"635\"><\/a><\/li>\n<li>In the <strong>Custom<\/strong> field, enter the URL for the selected repository (found on the <strong>Setup<\/strong> tab of the RStudio Package Manager web interface), and choose <strong>OK<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-5364-image015-1.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29866 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-5364-image015-1.jpg\" alt=\"\" width=\"600\" height=\"627\"><\/a><\/li>\n<li>Choose <strong>OK<\/strong> again, and we\u2019re done!<\/li>\n<\/ol>\n<p>Now, the packages that we install in RStudio are sourced from the selected repository on your RStudio Package Manager server. You can verify this with <code>options('repos')<\/code> or by installing a package and checking where it is downloaded from. 
For more details, see <a href=\"https:\/\/docs.rstudio.com\/rspm\/admin\/rstudio-server\/#checking-for-success\" target=\"_blank\" rel=\"noopener noreferrer\">Checking For Success<\/a>.<\/p>\n<h3>Update RStudio Connect account in an R session<\/h3>\n<p>If you already have a working RStudio on SageMaker and want to use a different RStudio Connect server than the default, complete the following steps:<\/p>\n<ol>\n<li>On the <strong>Tools <\/strong>menu, choose <strong>Global Options<\/strong>.<\/li>\n<li>Choose <strong>Publishing<\/strong>.<\/li>\n<li>Choose <strong>Connect<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image017.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29853\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image017.jpg\" alt=\"\" width=\"597\" height=\"627\"><\/a><\/li>\n<li>Choose <strong>RStudio Connect<\/strong>.<\/li>\n<li>Enter your server public URL, for example, <code>https:\/\/xxxx.rstudioconnect.com<\/code>, and choose <strong>Next<\/strong>.<\/li>\n<\/ol>\n<p>If this is your first time connecting, a new page appears asking you to log in with an account.<\/p>\n<ol start=\"6\">\n<li>Choose <strong>Connect<\/strong> to proceed.<\/li>\n<li>Choose <strong>Connect Account<\/strong> in the dialog in RStudio.<\/li>\n<\/ol>\n<p>You should see your RStudio Connect user profile and server URL in the list.<\/p>\n<ol start=\"8\">\n<li>Choose <strong>Apply <\/strong>then <strong>OK<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image019.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29854\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image019.jpg\" alt=\"\" width=\"598\" height=\"624\"><\/a><\/li>\n<\/ol>\n<p>For more information, see <a href=\"https:\/\/docs.rstudio.com\/how-to-guides\/pre-tasks\/connect-account\/\" target=\"_blank\" rel=\"noopener noreferrer\">Connect your RStudio Account<\/a> and <a href=\"https:\/\/docs.rstudio.com\/connect\/user\/connecting\/\" target=\"_blank\" rel=\"noopener noreferrer\">Connecting: RStudio IDE<\/a>.<\/p>\n<p>The RStudio Connect server is now connected to RStudio on Amazon SageMaker. We\u2019re ready to build some great content and publish it.<\/p>\n<h2>Build ML content in RStudio on Amazon SageMaker<\/h2>\n<p>You can easily create an analysis within RStudio on Amazon SageMaker and push-button publish it to your RStudio Connect server so that your collaborators can consume your analysis. For this post, we use a <a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/breast+cancer+wisconsin+%28original%29\" target=\"_blank\" rel=\"noopener noreferrer\">UCI breast cancer dataset<\/a> from <code>mlbench<\/code> to walk through two common publication use cases: an R Markdown document and a Shiny app.<\/p>\n<h3>R Markdown<\/h3>\n<p>R Markdown is a great tool to run your analyses in R as part of a markdown file and share them in RStudio Connect. 
In <code><a href=\"https:\/\/github.com\/aws\/amazon-sagemaker-examples\/tree\/master\/r_examples\/rsconnect_rmarkdown\/breast_cancer_eda.Rmd\" target=\"_blank\" rel=\"noopener noreferrer\">rsconnect_rmarkdown\/breast_cancer_eda.Rmd<\/a><\/code>, we run two simple analyses with plots on the dataset, alongside explanatory text in markdown:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">```{r breastcancer}\ndata(BreastCancer)\ndf &lt;- BreastCancer\n# convert input values to numeric\nfor(i in 2:10) {\n  df[,i] &lt;- as.numeric(as.character(df[,i]))\n}\nsummary(df)\n```\n\n```{r cl_thickness, echo=FALSE}\nggplot(df, aes(x=Cl.thickness))+\n       geom_histogram(color=\"black\", fill=\"white\", binwidth = 1)+\n       facet_grid(Class ~ .)\n```<\/code><\/pre>\n<\/p><\/div>\n<p>We can preview the file by choosing <strong>Knit<\/strong> and publish it to RStudio Connect by choosing <strong>Publish<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image021.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29855\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image021.jpg\" alt=\"\" width=\"752\" height=\"384\"><\/a>Besides R Markdown, you are often building an interactive application or dashboard with Shiny. Let\u2019s look at how we can publish Shiny apps from RStudio on Amazon SageMaker to RStudio Connect.<\/p>\n<h3>Shiny application<\/h3>\n<p><a href=\"https:\/\/shiny.rstudio.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Shiny<\/a> is an R package that makes it easy to create interactive web applications programmatically. Data scientists commonly use Shiny applications to share their analyses and models with stakeholders. 
In <code><a href=\"https:\/\/github.com\/aws\/amazon-sagemaker-examples\/tree\/master\/r_examples\/rsconnect_shiny\/breast-cancer-app\/\" target=\"_blank\" rel=\"noopener noreferrer\">rsconnect_shiny\/breast-cancer-app\/<\/a><\/code>, we develop an ML model in <code><a href=\"https:\/\/github.com\/aws\/amazon-sagemaker-examples\/tree\/master\/r_examples\/rsconnect_shiny\/breast-cancer-app\/breast_cancer_modeling.r\" target=\"_blank\" rel=\"noopener noreferrer\">breast_cancer_modeling.r<\/a><\/code> and create a web application to allow users to interact with the data and ML model.<\/p>\n<p>To publish, open <code>app.R<\/code> and choose<strong> Publish<\/strong>. Select both <code>app.R<\/code> and <code>breast_cancer_modeling.r<\/code> to publish.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image023.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29856\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-5364-image023.jpg\" alt=\"\" width=\"577\" height=\"417\"><\/a><\/p>\n<p>In the application, you can choose two features to visualize in the plot and select data points in the plot to see the actual data and the model\u2019s predictions of whether they are benign or malignant cancer cases. By sliding the probability threshold, you can interact with the model and get different classification counts. 
You can see the dashboard in action in the following screenshot.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/shiny-dashboard-breast-cancer2.gif\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29883 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/shiny-dashboard-breast-cancer2.gif\" alt=\"\" width=\"1882\" height=\"1276\"><\/a><\/p>\n<h2>Conclusion<\/h2>\n<p>In this post, we showed you how to deploy RStudio Connect and RStudio Package Manager servers in AWS with an architecture based on AWS Fargate and Amazon ECS, using AWS CDK. With RStudio Connect and RStudio Package Manager running in the cloud, we showed you how to use them from RStudio on Amazon SageMaker. Then we demonstrated how to deploy R-based materials such as R Markdown and Shiny applications to the RStudio Connect instance based on a breast cancer prediction use case.<\/p>\n<p>Having an RStudio Connect instance in the cloud not only enables your ML and data science teams to collaborate more effectively, but also makes sharing ML insights across stakeholders and business units much easier. This in turn promotes the use of ML in your organization for a better business outcome. With RStudio Package Manager, you can quickly and securely manage, serve, and install R packages from trusted sources to ensure project reproducibility.<\/p>\n<p>You can learn more about RStudio on SageMaker from a data scientist\u2019s perspective in the post <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/announcing-fully-managed-rstudio-on-amazon-sagemaker-for-data-scientists\" target=\"_blank\" rel=\"noopener noreferrer\">Announcing Fully Managed RStudio on Amazon SageMaker for Data Scientists<\/a>. 
You can also learn more about how to set up and administer RStudio on SageMaker in the post <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/get-started-with-rstudio-on-amazon-sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with RStudio on Amazon SageMaker<\/a>. To learn more about Amazon SageMaker Studio, the first IDE for ML in the cloud, see <a href=\"https:\/\/aws.amazon.com\/sagemaker\/studio\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Studio<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/08\/18\/Michael-Hsieh.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-27322 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/08\/18\/Michael-Hsieh.jpg\" alt=\"\" width=\"100\" height=\"111\"><\/a>Michael Hsieh<\/strong> is a Senior AI\/ML Specialist Solutions Architect. He works with customers to advance their ML journey with a combination of Amazon Machine Learning offerings and his ML domain knowledge. As a Seattle transplant, he loves exploring the natural beauty the region has to offer, such as the hiking trails, scenic kayaking in SLU, and the sunset at Shilshole Bay.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/Chayan-Panda-1.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-29860 size-full alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/Chayan-Panda-1.jpg\" alt=\"\" width=\"100\" height=\"100\"><\/a>Chayan Panda<\/strong> is a Cloud Infrastructure Architect. 
He provides advisory services and thought leadership to AWS customers on robust solution design for cloud migrations, cloud infrastructure (security, network, DevOps), Greenfield platform implementations, big data\/AI\/ML, and serverless and database solutions. When he is not obsessing about customers, he enjoys a short run, music, a book, or travel with his family.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/Farooq-Sabir-1.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-29861 size-full alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/Farooq-Sabir-1.jpg\" alt=\"\" width=\"100\" height=\"100\"><\/a>Farooq Sabir<\/strong> is a Senior AI\/ML Specialist Solutions Architect. He helps customers solve their business problems using data science, machine learning, and artificial intelligence.<\/p>\n
<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/host-rstudio-connect-and-package-manager-for-ml-development-in-rstudio-on-amazon-sagemaker\/<\/p>\n","protected":false},"author":0,"featured_media":1144,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1143"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1143"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1143\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1144"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1143"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1143"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1143"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}