{"id":1093,"date":"2021-10-28T08:40:18","date_gmt":"2021-10-28T08:40:18","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/10\/28\/enhance-your-machine-learning-development-by-using-a-modular-architecture-with-amazon-sagemaker-projects\/"},"modified":"2021-10-28T08:40:18","modified_gmt":"2021-10-28T08:40:18","slug":"enhance-your-machine-learning-development-by-using-a-modular-architecture-with-amazon-sagemaker-projects","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/10\/28\/enhance-your-machine-learning-development-by-using-a-modular-architecture-with-amazon-sagemaker-projects\/","title":{"rendered":"Enhance your machine learning development by using a modular architecture with Amazon SageMaker projects"},"content":{"rendered":"<div id=\"\">\n<p>One of the main challenges in a machine learning (ML) project implementation is the variety and high number of development artifacts and tools used. This includes code in notebooks, modules for data processing and transformation, environment configuration, inference pipeline, and orchestration code. In production workloads, the ML model created within your development framework is almost never the end of the work, but is a part of a larger application or workflow.<\/p>\n<p>Another challenge is the varied nature of ML development activities performed by different user roles. For example, the DevOps engineer develops infrastructure components, such as CI\/CD automation, builds production inference pipelines, and configures security and networking. The data engineer is typically focused on data processing and transformation workflows. 
The data scientist or ML engineer delivers ML models and model building, training, and validation pipelines.<\/p>\n<p>These challenges call for an architecture and framework that facilitate separation of concerns by allowing each development role to work on their own part of the system, and hide the complexity of integration, security, and environment configuration.<\/p>\n<p>This post illustrates how to introduce a modular component-based architecture in your ML application by implementing reusable, self-contained, and consistent components with <a href=\"https:\/\/aws.amazon.com\/pm\/sagemaker\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a>.<\/p>\n<h2>Solution overview<\/h2>\n<p>As an example of an ML workflow that spans several development domains, the proposed solution implements a use case of an automated pipeline for data transformation, feature extraction, and ingestion into <a href=\"https:\/\/aws.amazon.com\/sagemaker\/feature-store\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Feature Store<\/a>.<\/p>\n<p>On a high level, the workflow comprises the following functional steps:<\/p>\n<ol>\n<li>An upstream data ingestion component uploads data objects to an <a href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) bucket.<\/li>\n<li>The data upload event launches a data processing and transformation process.<\/li>\n<li>The data transformation process extracts, processes, and transforms features, and ingests them into a designated <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/feature-store-getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">feature group<\/a> in Feature Store.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image002.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29641\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image002.png\" alt=\"\" width=\"2124\" height=\"691\"><\/a><\/li>\n<\/ol>\n<h2>Terminology<\/h2>\n<p>This section introduces the following important concepts and definitions.<\/p>\n<h3>ML component<\/h3>\n<p>An <em>ML component<\/em> is a construction unit that contains all the required resources, configuration, and workflows to perform a specific ML task. For example, the proposed data transformation and ingestion pipeline can be delivered as an ML component. ML components are designed for easy integration, helping you implement reproducible, governed, and secure ML applications. An ML component can encapsulate all the boilerplate code required to properly set up data access permissions, security keys, tagging, naming, and logging requirements for all resources.<\/p>\n<p>Implementing an ML component assumes that a dedicated DevOps or MLOps team designs, builds, tests, and distributes the components. The recipients of ML components are data scientists, data engineers, and ML engineers.<\/p>\n<p>This separation of development responsibilities brings higher agility, a faster time to market, and less manual heavy lifting, and results in higher quality and consistency of your ML workflows.<\/p>\n<h3>Amazon SageMaker project<\/h3>\n<p>SageMaker facilitates the development and distribution of ML components with <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-projects-whatis.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker projects<\/a>.<\/p>\n<p>A SageMaker <em>project<\/em> is a self-sufficient collection of resources that entitled users can instantiate and use. A project contains all the resources, artifacts, source code, orchestration, and permissions that are needed to perform a designated ML task or workflow. 
For example, SageMaker provides <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-projects-templates.html\" target=\"_blank\" rel=\"noopener noreferrer\">MLOps project templates<\/a> to automate setup and implementation of MLOps for your applications.<\/p>\n<p>You can implement a <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-projects-templates-custom.html\" target=\"_blank\" rel=\"noopener noreferrer\">custom SageMaker project template<\/a> to deliver a packaged ML workflow, which can be distributed and provisioned via an <a href=\"https:\/\/aws.amazon.com\/sagemaker\/studio\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Studio<\/a> IDE.<\/p>\n<p>When you implement custom reusable components with SageMaker projects, you can separate the development, testing, and distribution process for ML components from their use, and follow MLOps best practices.<\/p>\n<h3>Product portfolio<\/h3>\n<p>A project works together with two other AWS services, <a href=\"https:\/\/aws.amazon.com\/servicecatalog\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Service Catalog<\/a> and <a href=\"https:\/\/aws.amazon.com\/cloudformation\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a>, to provide an end-to-end, user-friendly integration in your SageMaker environment and Studio. You can combine multiple projects in a <em>portfolio<\/em>. A SageMaker project is called a <em>product<\/em> in the portfolio scope. A product portfolio is delivered via AWS Service Catalog into Studio. 
You can control who can view and provision specific products by associating user roles with a designated portfolio.<\/p>\n<h2>Solution architecture<\/h2>\n<p>The detailed component architecture of the solution is presented in the following diagram.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image004.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29642\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image004.png\" alt=\"\" width=\"1630\" height=\"1192\"><\/a><\/p>\n<p>A product portfolio (1) defines the automated Feature Store data ingestion product (2) together with the associated user roles that are allowed to use the portfolio and the containing products. CloudFormation templates define both the product portfolio (1) and the product (2). A CloudFormation template (3) contains all the resources, source code, configuration, and permissions that are needed to provision the product in your SageMaker environment.<\/p>\n<p>When AWS CloudFormation deploys the product, it creates a new SageMaker project (4).<\/p>\n<p>The SageMaker project implements the feature ingestion workflow (5). The workflow contains an <a href=\"https:\/\/aws.amazon.com\/lambda\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> function, which is launched by an <a href=\"https:\/\/aws.amazon.com\/eventbridge\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon EventBridge <\/a>rule each time new objects are uploaded into a monitored S3 bucket. The Lambda function starts a <a href=\"https:\/\/aws.amazon.com\/sagemaker\/pipelines\/\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker pipeline<\/a> (6), which is defined and provisioned as a part of the SageMaker project. 
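<\/p>\n<p>As a minimal sketch of what such a function can look like (this is not the solution\u2019s actual seed code; the pipeline name and the <code>InputDataUrl<\/code> parameter name are assumptions for illustration), the handler only needs to extract the uploaded object\u2019s location from the CloudTrail-based event and call the SageMaker API:<\/p>

```python
import json

# Name of the project-provisioned pipeline; an assumption for this sketch.
PIPELINE_NAME = "s3-fs-ingestion-pipeline"

def build_pipeline_parameters(event):
    """Map a CloudTrail-based EventBridge S3 event to SageMaker pipeline parameters."""
    request = event["detail"]["requestParameters"]
    s3_url = f"s3://{request['bucketName']}/{request['key']}"
    return [{"Name": "InputDataUrl", "Value": s3_url}]

def lambda_handler(event, context):
    # boto3 is preinstalled in the Lambda runtime; imported lazily here so
    # the pure helper above can be exercised without AWS credentials.
    import boto3
    sm = boto3.client("sagemaker")
    response = sm.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
        PipelineParameters=build_pipeline_parameters(event),
    )
    return {"statusCode": 200, "body": json.dumps(response["PipelineExecutionArn"])}
```

<p>The <code>start_pipeline_execution<\/code> call is asynchronous, so the function returns as soon as the pipeline run is queued.<\/p>\n<p>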
The pipeline implements data transformation and ingestion in Feature Store.<\/p>\n<p>The project also provisions CI\/CD automation (7) with an <a href=\"https:\/\/aws.amazon.com\/codecommit\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CodeCommit<\/a> repository with source code, <a href=\"https:\/\/aws.amazon.com\/codebuild\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CodeBuild<\/a> with a pipeline build script, and <a href=\"https:\/\/aws.amazon.com\/codepipeline\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CodePipeline<\/a> to orchestrate the build and deployment of the SageMaker pipeline (6).<\/p>\n<h3>ML pipeline<\/h3>\n<p>This solution implements an ML pipeline by using <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/pipelines.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Pipelines<\/a>, an ML workflow creation and orchestration framework. The pipeline contains a single step with an <a href=\"https:\/\/aws.amazon.com\/sagemaker\/data-wrangler\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Data Wrangler<\/a> processor for data transformation and ingestion into a feature group in Feature Store. 
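<\/p>\n<p>Conceptually, the ingestion step boils down to <code>put_record<\/code> calls against the Feature Store runtime. The following sketch illustrates the idea (the flat per-row dict and the <code>event_time<\/code> feature name are assumptions for illustration, not the solution\u2019s actual code):<\/p>

```python
from datetime import datetime, timezone

def to_feature_record(row):
    """Convert a flat dict of feature values into the Feature Store record format."""
    record = [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
    ]
    # Every record needs an event-time feature; the feature name is an assumption.
    event_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    record.append({"FeatureName": "event_time", "ValueAsString": event_time})
    return record

def ingest_rows(feature_group_name, rows):
    """Write rows into a feature group; requires AWS credentials."""
    import boto3  # imported lazily so to_feature_record stays usable offline
    fs_runtime = boto3.client("sagemaker-featurestore-runtime")
    for row in rows:
        fs_runtime.put_record(
            FeatureGroupName=feature_group_name,
            Record=to_feature_record(row),
        )
```

<p>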
The following diagram shows a data processing pipeline implemented by this solution.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image006.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29643\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image006.png\" alt=\"\" width=\"1182\" height=\"594\"><\/a><\/p>\n<p>Refer to <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/build-tune-and-deploy-an-end-to-end-churn-prediction-model-using-amazon-sagemaker-pipelines\/\" target=\"_blank\" rel=\"noopener noreferrer\">Build, tune, and deploy an end-to-end churn prediction model using Amazon SageMaker Pipelines<\/a> for an example of how to build and use a SageMaker pipeline.<\/p>\n<p>The rest of this post walks you through the implementation of a custom SageMaker project. We discuss how to do the following:<\/p>\n<ul>\n<li>Create a project with your resources<\/li>\n<li>Understand the project lifecycle<\/li>\n<li>View project resources<\/li>\n<li>Create a Studio domain and deploy a product portfolio<\/li>\n<li>Work with the project and run a data transformation and ingestion pipeline<\/li>\n<\/ul>\n<p>The <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repository<\/a> provides the full source code for the end-to-end solution. 
You can use this code as a starting point for your own custom ML components to deploy using this same reference architecture.<\/p>\n<h2>Author a SageMaker project template<\/h2>\n<p>To get started with a custom SageMaker project, you need the following resources, artifacts, and <a href=\"https:\/\/aws.amazon.com\/iam\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) roles and permissions:<\/p>\n<ul>\n<li>A CloudFormation template that defines an AWS Service Catalog<a href=\"https:\/\/docs.aws.amazon.com\/servicecatalog\/latest\/adminguide\/what-is_concepts.html\" target=\"_blank\" rel=\"noopener noreferrer\"> portfolio<\/a>.<\/li>\n<li>A CloudFormation template that defines a SageMaker project.<\/li>\n<li>IAM roles and permissions needed to run your project components and perform the project\u2019s tasks and workflows.<\/li>\n<li>Any source code delivered as a part of the project. The solution refers to this source code as the <em>seed code<\/em>.<\/li>\n<\/ul>\n<h3>Files in this solution<\/h3>\n<p>This solution contains all the source code needed to create your custom SageMaker project. 
The structure of the code repository is as follows:<\/p>\n<ul>\n<li><strong>cfn-templates folder<\/strong>: This folder contains the following:\n<ul>\n<li><strong>project-s3-fs-ingestion.yaml<\/strong> \u2013 A CloudFormation template with the SageMaker project<\/li>\n<li><strong>sm-project-sc-portfolio.yaml<\/strong> \u2013 A CloudFormation template with the product portfolio and managed policies with permissions needed to deploy the product<\/li>\n<\/ul>\n<\/li>\n<li><strong>project-seed-code\/s3-fs-ingestion folder<\/strong> \u2013 Contains the project seed code, including the SageMaker pipeline definition code, build scripts for the CI\/CD CodeBuild project, and source code for the Lambda function<\/li>\n<li><strong>notebooks folder<\/strong> \u2013 Contains the SageMaker notebooks to experiment with the project<\/li>\n<\/ul>\n<p>The following sections describe each part of the project authoring process and give examples of the source code.<\/p>\n<h3>AWS Service Catalog portfolio<\/h3>\n<p>An AWS Service Catalog portfolio is delivered as a CloudFormation template, which defines the following resources:<\/p>\n<ul>\n<li>Portfolio definition.<\/li>\n<li>Product definition.<\/li>\n<li>Product to portfolio association for each product.<\/li>\n<li>Portfolio to <a href=\"https:\/\/docs.aws.amazon.com\/IAM\/latest\/UserGuide\/intro-structure.html#intro-structure-principal\" target=\"_blank\" rel=\"noopener noreferrer\">IAM principal<\/a> association. This defines which IAM principals are allowed to deploy portfolio products.<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/servicecatalog\/latest\/adminguide\/constraints-launch.html\" target=\"_blank\" rel=\"noopener noreferrer\">Product launch role constraint<\/a>. 
This defines which IAM role AWS CloudFormation assumes when a user provisions the template.<\/li>\n<\/ul>\n<p>To make your project template available in Studio, you must add the following tag to the product:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">Tags:\n  - Key: 'sagemaker:studio-visibility'\n    Value: 'true'<\/code><\/pre>\n<\/p><\/div>\n<p>Refer to<a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-projects-templates-custom.html\" target=\"_blank\" rel=\"noopener noreferrer\"> Create Custom Project Templates<\/a> for more details on custom project templates.<\/p>\n<p>This solution contains an <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\/blob\/main\/cfn-templates\/sm-project-sc-portfolio.yaml\" target=\"_blank\" rel=\"noopener noreferrer\">example<\/a> of an AWS Service Catalog portfolio that contains a single product.<\/p>\n<h3>Product CloudFormation template<\/h3>\n<p>A CloudFormation template defines the product. 
The product\u2019s template is self-sufficient and contains all the resources, permissions, and artifacts that are needed to deliver the product\u2019s functionality.<\/p>\n<p>For the product to work with SageMaker projects, you must add the following parameters to your product template:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">  SageMakerProjectName:\n    Type: String\n    Description: Name of the project\n    MinLength: 1\n    MaxLength: 32\n    AllowedPattern: ^[a-zA-Z](-*[a-zA-Z0-9])*\n\n  SageMakerProjectId:\n    Type: String\n    Description: Service generated Id of the project.<\/code><\/pre>\n<\/p><\/div>\n<p>This solution contains a <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\/blob\/main\/cfn-templates\/project-s3-fs-ingestion.yaml\" target=\"_blank\" rel=\"noopener noreferrer\">product template<\/a> that creates several resources.<\/p>\n<p>For the data transformation and ingestion pipeline, the template creates the following:<\/p>\n<ul>\n<li>Source code for the SageMaker pipeline definition.<\/li>\n<li>A Lambda function to start the SageMaker pipeline whenever a new object is uploaded to the monitored S3 bucket.<\/li>\n<li>An IAM execution role for the Lambda function.<\/li>\n<li>An S3 bucket to keep an<a href=\"https:\/\/aws.amazon.com\/cloudtrail\/\" target=\"_blank\" rel=\"noopener noreferrer\"> AWS CloudTrail <\/a>log. You need a CloudTrail log to enable EventBridge notification for object put events on the monitored bucket. You use the CloudTrail-based notification instead of Amazon S3 notifications because you must not overwrite an existing Amazon S3 notification on the monitored bucket.<\/li>\n<li>A CloudTrail log configured to capture <code>WriteOnly<\/code> events on S3 objects under a specified S3 prefix.<\/li>\n<li>An EventBridge rule to launch the Lambda function whenever a new object is uploaded to the monitored S3 bucket. 
The EventBridge rule pattern monitors the events <code>PutObject<\/code> and <code>CompleteMultipartUpload<\/code>.<\/li>\n<\/ul>\n<p>For CI\/CD automation, the template creates the following:<\/p>\n<ul>\n<li>An S3 bucket to store CodePipeline artifacts<\/li>\n<li>A CodeCommit repository with the SageMaker pipeline definition<\/li>\n<li>An EventBridge rule to launch CodePipeline when the CodeCommit repository is updated<\/li>\n<li>A CodeBuild project to build the SageMaker pipeline<\/li>\n<li>A CodePipeline pipeline to orchestrate the build of the SageMaker pipeline<\/li>\n<\/ul>\n<h3>IAM roles and permissions<\/h3>\n<p>To launch and use a SageMaker project, you need two IAM roles:<\/p>\n<ul>\n<li><strong>An IAM role to launch a product from AWS Service Catalog<\/strong> \u2013 This role is assumed by AWS Service Catalog and contains the permissions needed to deploy resources using CloudFormation templates. The AWS Service Catalog-based approach allows data scientists and ML engineers to provision custom ML components and workflows centrally without requiring each ML user to hold highly privileged permissions policies or to go through a manual, non-reproducible deployment process.<\/li>\n<li><strong>An IAM role to use resources created by a SageMaker project<\/strong> \u2013 These resources include a CodePipeline pipeline, a SageMaker pipeline, and an EventBridge rule. The project\u2019s CloudFormation template explicitly specifies which resource uses which role.<\/li>\n<\/ul>\n<p>When you enable SageMaker projects for Studio users, the provisioning process creates two IAM roles in your AWS account: <code>AmazonSageMakerServiceCatalogProductsLaunchRole<\/code> and <code>AmazonSageMakerServiceCatalogProductsUseRole<\/code>. 
The<a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-projects-templates-sm.html\" target=\"_blank\" rel=\"noopener noreferrer\"> SageMaker-provided project templates<\/a> use these roles to deploy and operate the created resources. You can use these roles for your custom SageMaker projects, or you can create your own roles with a specific set of IAM permissions suited to your requirements. Make sure these roles are given all necessary permissions, specifically S3 bucket access, to perform their tasks.<\/p>\n<p>Refer to <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/security-iam-awsmanpol-sc.html\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Managed Policies for SageMaker projects and JumpStart<\/a> for more details on the default roles.<\/p>\n<p>If you create and assign any IAM roles to resources created by the project provisioning via AWS Service Catalog and AWS CloudFormation, the role <code>AmazonSageMakerServiceCatalogProductsLaunchRole<\/code> must have <code>iam:PassRole<\/code> permission for a role you pass to a resource. For example, this solution creates an IAM execution role for the Lambda function. 
The managed policy for <code>AmazonSageMakerServiceCatalogProductsLaunchRole<\/code> contains the corresponding permission statement:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">- Sid: FSIngestionPermissionPassRole\n  Effect: Allow\n  Action:\n    - 'iam:PassRole'\n  Resource:\n    - !Sub 'arn:aws:iam::${AWS::AccountId}:role\/*StartIngestionPipeline*'<\/code><\/pre>\n<\/p><\/div>\n<p>The following diagram shows all the IAM roles involved and which service or resource assumes which role.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image009.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29644\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image009.png\" alt=\"\" width=\"1620\" height=\"920\"><\/a><\/p>\n<p>The architecture contains the following components:<\/p>\n<ol>\n<li>The SageMaker Service Catalog products launch role. This role calls the <code>iam:PassRole<\/code> API for the SageMaker Service Catalog products use role (2) and the Lambda execution role (4).<\/li>\n<li>The SageMaker Service Catalog products use role. Project resources assume this role to perform their tasks.<\/li>\n<li>The SageMaker execution role. Studio notebooks use this role to access all resources, including S3 buckets.<\/li>\n<li>The Lambda execution role. 
The Lambda function assumes this role.<\/li>\n<li>The Lambda function <a href=\"https:\/\/docs.aws.amazon.com\/lambda\/latest\/dg\/access-control-resource-based.html\" target=\"_blank\" rel=\"noopener noreferrer\">resource policy<\/a> allows EventBridge to invoke the function.<\/li>\n<\/ol>\n<p>Refer to<a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-projects-studio-updates.html\" target=\"_blank\" rel=\"noopener noreferrer\"> SageMaker Studio Permissions Required to Use Projects<\/a> for more details on the Studio permission setup for projects.<\/p>\n<h3>Project seed code<\/h3>\n<p>If your custom SageMaker project uses CI\/CD workflow automation or contains any source code-based resources, you can deliver the seed code as a CodeCommit or third-party Git repository such as GitHub or Bitbucket. The project user owns the code and can customize it to implement their requirements.<\/p>\n<p>This solution delivers the seed code, which contains a SageMaker pipeline definition. The project also creates a CI\/CD workflow to build the SageMaker pipeline. Any commit to the source code repository launches the CodePipeline pipeline.<\/p>\n<h3>Project lifecycle<\/h3>\n<p>A project passes through distinct lifecycle stages: you create a project, use it and its resources, and delete the project when you don\u2019t need it anymore. 
Studio UX integrates end-to-end SageMaker projects including project resources, data lineage, and lifecycle control.<\/p>\n<h4>Create a project<\/h4>\n<p>You can provision a SageMaker project directly in your Studio IDE or via the <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_CreateProject.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker API.<\/a><\/p>\n<p>To create a new SageMaker project in Studio, complete the following steps:<\/p>\n<ol>\n<li>On the <strong>SageMaker resources<\/strong> page, choose <strong>Projects<\/strong> on the drop-down menu.<\/li>\n<li>Choose <strong>Create project<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image011-1.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29652\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image011-1.jpg\" alt=\"\" width=\"500\" height=\"540\"><\/a><\/li>\n<li>Choose <strong>Organization templates<\/strong>.<\/li>\n<li>Choose the template for the project you want to provision.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image013-2.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29659 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image013-2.jpg\" alt=\"\" width=\"800\" height=\"244\"><\/a><\/li>\n<li>Enter a name and optional description for your project.<\/li>\n<li>Under <strong>Project template parameters<\/strong>, provide your project-specific parameters.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image015new.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29658 size-full\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image015new.jpg\" alt=\"\" width=\"800\" height=\"659\"><\/a><\/li>\n<\/ol>\n<p>You can also use the Python SDK to create a project programmatically, as shown in this code snippet from the <code>01-feature-store-ingest-pipeline<\/code> notebook:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">import boto3\n\nsm = boto3.client(\"sagemaker\")\n\n# set project_parameters\n# project_parameters = [\n#    {\n#        'Key': 'PipelineDescription',\n#        'Value': 'Feature Store ingestion pipeline'\n#    },\n#       ...\n#]\n\nr = sm.create_project(\n    ProjectName=project_name,\n    ProjectDescription=\"Feature Store ingestion from S3\",\n    ServiceCatalogProvisioningDetails={\n        'ProductId': product_id,\n        'ProvisioningArtifactId': provisioning_artifact_ids,\n        'ProvisioningParameters': project_parameters\n    },\n)<\/code><\/pre>\n<\/p><\/div>\n<p>Each project is provisioned via an AWS Service Catalog and AWS CloudFormation process. If you have the corresponding IAM access policy, for example <a href=\"https:\/\/console.aws.amazon.com\/iam\/home#\/policies\/arn:aws:iam::aws:policy\/AWSCloudFormationReadOnlyAccess\" target=\"_blank\" rel=\"noopener noreferrer\">AWSCloudFormationReadOnlyAccess<\/a>, you can observe the project deployment on the AWS CloudFormation console. 
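<\/p>\n<p>Because <code>create_project<\/code> is asynchronous, it can be useful to wait until the project reaches a terminal status before using its resources. The following polling helper is a sketch of that pattern; the describe callable is injected so the loop can be tested without AWS access:<\/p>

```python
import time

def wait_for_project(describe, project_name, poll_seconds=10, timeout_seconds=600):
    """Poll until the project is ready.

    `describe` is any callable that takes the project name and returns a dict
    with a 'ProjectStatus' key, such as a thin wrapper around
    boto3.client("sagemaker").describe_project.
    """
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = describe(project_name)["ProjectStatus"]
        if status == "CreateCompleted":
            return status
        if status in ("CreateFailed", "DeleteFailed"):
            raise RuntimeError(f"Project {project_name} ended in status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Project {project_name} not ready after {timeout_seconds}s")
```

<p>With boto3, you would pass <code>lambda name: boto3.client(\"sagemaker\").describe_project(ProjectName=name)<\/code> as the describe callable.<\/p>\n<p>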
As shown in the following screenshot, you can browse stack info, events, resources, outputs, parameters, and the template.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image017.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29648\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image017.png\" alt=\"\" width=\"2578\" height=\"1164\"><\/a><\/p>\n<h4>View project resources<\/h4>\n<p>After you provision the project, you can browse SageMaker-specific project resources in the Studio IDE.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image019-2.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29657 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image019-2.jpg\" alt=\"\" width=\"800\" height=\"312\"><\/a><\/p>\n<p>You can also see all the resources created by the project deployment process on the AWS CloudFormation console.<\/p>\n<p>Any resource created by the project is automatically tagged with two tags: <code>sagemaker:project-name<\/code> and <code>sagemaker:project-id<\/code>, allowing for data and resource lineage.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image021.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29650\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/22\/ML6147-image021.png\" alt=\"\" width=\"1816\" height=\"486\"><\/a><\/p>\n<p>You can add your own tags to project resources, for example, to fulfill your specific resource tagging and naming requirements.<\/p>\n<h4>Delete project<\/h4>\n<p>If you don\u2019t need the provisioned project 
anymore, delete it to stop incurring charges and clean up the resources it created.<\/p>\n<p>At the time of writing this post, you must use the <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_DeleteProject.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker API<\/a> to delete a project. Sample Python code looks like the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">import boto3\n\nsm_client = boto3.client(\"sagemaker\")\nsm_client.delete_project(ProjectName=\"MyProject\")<\/code><\/pre>\n<\/p><\/div>\n<p>Deleting the project also initiates the deletion of the CloudFormation stack with the project template.<\/p>\n<p>A project can create other resources, such as objects in S3 buckets, ML models, feature groups, inference endpoints, or CloudFormation stacks. These resources may not be removed upon project deletion. Refer to the specific project documentation for how to perform a full cleanup.<\/p>\n<p>This solution provides a <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\/blob\/main\/notebooks\/99-clean-up.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">Studio notebook<\/a> to delete all the resources created by the project.<\/p>\n<h2>Deploy the solution<\/h2>\n<p>To deploy the solution, you must have administrator (or power user) permissions to package the CloudFormation templates, upload the templates to your S3 bucket, and run the deployment commands.<\/p>\n<p>To start working with the solution\u2019s notebooks, provision a project, and run a data transformation and ingestion pipeline, you must complete the following deployment steps from the solution\u2019s <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components#deployment\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub README file<\/a>:<\/p>\n<ol>\n<li>Clone the solution\u2019s<a 
href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\" target=\"_blank\" rel=\"noopener noreferrer\"> GitHub repo <\/a>to your local development environment.<\/li>\n<li>Create a Studio domain (instructions in the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components#create-sagemaker-studio\" target=\"_blank\" rel=\"noopener noreferrer\">README file<\/a>).<\/li>\n<li>Deploy the SageMaker project portfolio (instructions in the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components#deploy-sagemaker-project-portfolio\" target=\"_blank\" rel=\"noopener noreferrer\">README file<\/a>).<\/li>\n<li>Add custom permissions to the AWS Service Catalog launch and SageMaker execution IAM roles (instructions in the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components#add-permissions-to-service-catalog-launch-and-sagemaker-execution-iam-roles\" target=\"_blank\" rel=\"noopener noreferrer\">README file<\/a>).<\/li>\n<li>Start Studio and clone the GitHub repository into your SageMaker environment (instructions in the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components#start-studio\" target=\"_blank\" rel=\"noopener noreferrer\">README file<\/a>).<\/li>\n<\/ol>\n<h2>Solution walkthrough<\/h2>\n<p>The <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\/tree\/main\/notebooks\" target=\"_blank\" rel=\"noopener noreferrer\">delivered notebooks<\/a> take you through the following solution steps:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\/blob\/main\/notebooks\/00-setup.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">Setup<\/a>:\n<ul>\n<li>Set up the working environment, create an S3 bucket for data upload, download and explore the test dataset<\/li>\n<li>Optionally, create a Data Wrangler flow for data transformation and feature ingestion<\/li>\n<li>Create a 
feature group in Feature Store to store the extracted features<\/li>\n<li>Query the data from the feature group<\/li>\n<\/ul>\n<\/li>\n<li><a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\/blob\/main\/notebooks\/01-feature-store-ingest-pipeline.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">Feature Store ingestion pipeline<\/a>:\n<ul>\n<li>Provision a SageMaker project with a data pipeline<\/li>\n<li>Explore the project resources<\/li>\n<li>Test the data pipeline by uploading new data to the monitored S3 bucket<\/li>\n<li>Run the data pipeline on demand via Python SDK<\/li>\n<li>Query the data from the feature group<\/li>\n<\/ul>\n<\/li>\n<li><a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components\/blob\/main\/notebooks\/99-clean-up.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">Clean up<\/a>:\n<ul>\n<li>Delete the project and its resources<\/li>\n<li>Delete the feature group<\/li>\n<li>Delete project-provisioned S3 buckets and S3 objects<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Clean up<\/h2>\n<p>To avoid charges, you must remove all project-provisioned and generated resources from your AWS account.<\/p>\n<p>Follow the instructions in the solution\u2019s <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-reusable-components#clean-up\" target=\"_blank\" rel=\"noopener noreferrer\">README file<\/a>.<\/p>\n<h2>Call to action<\/h2>\n<p>In this post, you learned how to create ML components for your modular architecture using SageMaker projects. SageMaker projects offer a convenient and AWS-native method to package and deliver reusable units to implement ML workflows. Integrating SageMaker projects with SageMaker Pipelines and CI\/CD CodePipeline automation gives you powerful tools to follow MLOps best practices and increase the speed and quality of your development work.<\/p>\n<p>Your ML workflows and pipelines may benefit from being encapsulated into a reusable and parametrizable component. 
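<\/p>\n<p>As an illustration, provisioning such a project programmatically with the AWS SDK for Python (boto3) might look like the following sketch. The function name, the product ID, and the <code>S3DataPrefix<\/code> parameter key are hypothetical placeholders; use the product ID and provisioning parameter keys that your own project template registers in AWS Service Catalog:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">import boto3\n\ndef provision_project(project_name, product_id, s3_data_prefix):\n    # Create a SageMaker project from a Service Catalog product.\n    # product_id and the S3DataPrefix key are placeholders for the\n    # values your own project template defines.\n    sm_client = boto3.client(\"sagemaker\")\n    return sm_client.create_project(\n        ProjectName=project_name,\n        ServiceCatalogProvisioningDetails={\n            \"ProductId\": product_id,\n            \"ProvisioningParameters\": [\n                {\"Key\": \"S3DataPrefix\", \"Value\": s3_data_prefix},\n            ],\n        },\n    )<\/code><\/pre>\n<\/div>\n<p>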
Now you can implement this component using the described approach with SageMaker projects.<\/p>\n<h2>Additional references<\/h2>\n<p>For more hands-on examples of using SageMaker projects and pipelines for various use cases, see the following resources:<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/07\/20\/Yevgeniy-Ilyin-Author.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-26420 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/07\/20\/Yevgeniy-Ilyin-Author.jpg\" alt=\"\" width=\"120\" height=\"160\"><\/a>Yevgeniy Ilyin<\/strong>\u00a0is a Solutions Architect at AWS. He has over 20 years of experience working at all levels of software development and solutions architecture and has used programming languages from COBOL and Assembler to .NET, Java, and Python. He develops and codes cloud native solutions with a focus on big data, analytics, and data engineering.<\/p>\n
<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/enhance-your-machine-learning-development-by-using-a-modular-architecture-with-amazon-sagemaker-projects\/<\/p>\n","protected":false},"author":0,"featured_media":1094,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1093"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1093"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1093\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1094"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1093"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1093"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1093"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}