{"id":1255,"date":"2021-11-24T08:26:48","date_gmt":"2021-11-24T08:26:48","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/11\/24\/build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines\/"},"modified":"2021-11-24T08:26:48","modified_gmt":"2021-11-24T08:26:48","slug":"build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/11\/24\/build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines\/","title":{"rendered":"Build MLOps workflows with Amazon SageMaker projects, GitLab, and GitLab pipelines"},"content":{"rendered":"<div id=\"\">\n<p>Machine learning operations (MLOps) is key to effectively transitioning from experimentation to production. The practice gives you a repeatable mechanism to build, train, deploy, and manage machine learning models. To adopt MLOps quickly, you often need capabilities that use your existing toolsets and expertise. <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-projects-whatis.html\" target=\"_blank\" rel=\"noopener noreferrer\">Projects<\/a> in <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> give organizations the ability to easily set up and standardize developer environments for data scientists and CI\/CD (continuous integration, continuous delivery) systems for MLOps engineers. With SageMaker projects, MLOps engineers or organization administrators can define templates that bootstrap the ML workflow with source version control, automated ML pipelines, and seed code to quickly start iterating over ML use cases. With projects, dependency management, code repository management, build reproducibility, and artifact sharing and management become easy for organizations to set up. 
SageMaker projects are provisioned using <a href=\"https:\/\/docs.aws.amazon.com\/servicecatalog\/latest\/dg\/what-is-service-catalog.html\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Service Catalog<\/a> products. Your organization can use project templates to provision projects for each of your users.<\/p>\n<p>In this post, you use a custom SageMaker project template to incorporate CI\/CD practices with GitLab and GitLab pipelines. You automate building a model using <a href=\"https:\/\/aws.amazon.com\/sagemaker\/pipelines\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Pipelines<\/a> for data preparation, model training, and model evaluation. SageMaker projects build on Pipelines by implementing the model deployment steps and using <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/model-registry.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker Model Registry<\/a>, along with your existing CI\/CD tooling, to automatically provision a CI\/CD pipeline. In our use case, after the trained model is approved in the model registry, a GitLab pipeline runs the model deployment.<\/p>\n<h2>Prerequisites<\/h2>\n<p>For this walkthrough, you should have the following prerequisites:<\/p>\n<p>This post provides a detailed explanation of the SageMaker projects, GitLab, and GitLab pipelines integration. We review the code and discuss the components of the solution. 
To deploy the solution, reference the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-custom-project-templates\/tree\/main\/mlops-template-gitlab\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>, which provides step-by-step instructions for implementing an MLOps workflow using a SageMaker project template with GitLab and GitLab pipelines.<\/p>\n<h2>Solution overview<\/h2>\n<p>The following diagram shows the architecture we build using a custom SageMaker project template.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-arch-diagram-image001.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30833 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-arch-diagram-image001.png\" alt=\"\" width=\"1000\" height=\"508\"><\/a><\/p>\n<p>Let\u2019s review the components of this architecture to understand the end-to-end setup:<\/p>\n<ul>\n<li><strong>GitLab<\/strong> \u2013 Acts as our code repository and enables CI\/CD using GitLab pipelines. The custom SageMaker project <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-custom-project-templates\/blob\/main\/mlops-template-gitlab\/project.yml\" target=\"_blank\" rel=\"noopener noreferrer\">template<\/a> creates two repositories (model build and model deploy) in your GitLab account.\n<ul>\n<li>The first repository (model build) provides code to create a multi-step model building pipeline. This includes steps for data processing, model training, model evaluation, and conditional model registration based on accuracy. 
It trains an XGBoost regression model on the well-known <a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/abalone\" target=\"_blank\" rel=\"noopener noreferrer\">UCI Machine Learning Abalone dataset<\/a>.<\/li>\n<li>The second repository (model deploy) contains the code and configuration files for model deployment, as well as the test scripts that the deployed model must pass to meet the quality benchmark. These are code stubs that you must define for your use case.<\/li>\n<li>Each repository also has a GitLab CI pipeline. The model build pipeline runs end to end whenever a new commit is pushed to the model build repository. The model deploy pipeline is triggered when a new model version in the model registry is marked as Approved.<\/li>\n<\/ul>\n<\/li>\n<li><strong>SageMaker Pipelines<\/strong> \u2013 Contains the directed acyclic graph (DAG) that includes data preparation, model training, and model evaluation.<\/li>\n<li><strong>Amazon S3<\/strong> \u2013 An <a href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) bucket stores the output model artifacts that are generated from the pipeline.<\/li>\n<li><strong>AWS Lambda<\/strong> \u2013 Two <a href=\"http:\/\/aws.amazon.com\/lambda\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> functions are created, which we review in more detail later in this post:\n<ul>\n<li>One function seeds the code into your two GitLab repositories.<\/li>\n<li>One function triggers the model deployment pipeline after a new model version is approved in the model registry.<\/li>\n<\/ul>\n<\/li>\n<li><strong>SageMaker Model Registry<\/strong> \u2013 Tracks the model versions and respective artifacts, including the lineage and metadata. A model package group is created that contains the group of related model versions. 
The model registry also manages the approval status of the model version for downstream deployment.<\/li>\n<li><strong>Amazon EventBridge<\/strong> \u2013 <a href=\"https:\/\/aws.amazon.com\/eventbridge\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon EventBridge<\/a> receives the state-change events from the model registry. A rule invokes the Lambda function that triggers the model deploy pipeline when a model package version status changes from <code>PendingManualApproval<\/code> to <code>Approved<\/code> in the model registry.<\/li>\n<li><strong>AWS CloudFormation<\/strong> \u2013 <a href=\"https:\/\/aws.amazon.com\/cloudformation\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a> deploys the model and creates the SageMaker endpoints when the model deploy pipeline is triggered by the approval of the trained model.<\/li>\n<li><strong>SageMaker hosting<\/strong> \u2013 Creates two HTTPS real-time endpoints to perform inference. The hosting option is configurable; for example, you could use batch transform or asynchronous inference instead. The staging endpoint is created when the model deploy pipeline is triggered by the approval of the trained model. This endpoint is used to evaluate the deployed model by confirming that it generates predictions that meet our target accuracy requirements. When the model is ready for production, a production endpoint is provisioned by manually starting the corresponding job in the GitLab model deploy pipeline.<\/li>\n<\/ul>\n<h2>Use the new MLOps project template with GitLab and GitLab pipelines<\/h2>\n<p>In this section, we review the parameters required for the MLOps project template (see the following screenshot). 
This template allows you to use GitLab pipelines as your orchestrator.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-project-parameters-image002.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30846 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-project-parameters-image002-1024x700.png\" alt=\"\" width=\"1024\" height=\"700\"><\/a><\/p>\n<p>The template has the following parameters:<\/p>\n<ul>\n<li><strong>GitLab Server URL<\/strong> \u2013 The URL of the GitLab server in <code>https:\/\/<\/code> format. Your organization\u2019s GitLab accounts may use a custom server URL (domain). The Lambda functions use the server URL to connect to GitLab through the <a href=\"https:\/\/python-gitlab.readthedocs.io\/en\/stable\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">python-gitlab API<\/a>, and they use the <a href=\"https:\/\/docs.gitlab.com\/ee\/user\/profile\/personal_access_tokens.html\" target=\"_blank\" rel=\"noopener noreferrer\">personal access token<\/a> you created to push the seed code into your GitLab repositories. We discuss the Lambda function code in more detail in the next section.<\/li>\n<li><strong>Base URL for your GitLab Repositories<\/strong> \u2013 The URL of your GitLab account under which the model build and deploy repositories are created, in the format <code>https:\/\/&lt;gitlab server&gt;\/&lt;username&gt;<\/code> or <code>https:\/\/&lt;gitlab server&gt;\/&lt;group&gt;\/&lt;project&gt;<\/code>. 
You must create a personal access token under your GitLab user account in order to authenticate with the GitLab API.<\/li>\n<li><strong>Model Build Repository Name<\/strong> \u2013 The name of the repository for the model build and training seed code, in this example <code>mlops-gitlab-project-seedcode-model-build<\/code>.<\/li>\n<li><strong>Model Deploy Repository Name<\/strong> \u2013 The name of the repository for the model deploy seed code, in this example <code>mlops-gitlab-project-seedcode-model-deploy<\/code>.<\/li>\n<li><strong>GitLab Group ID<\/strong> \u2013 GitLab groups are important for managing access and permissions for projects. Enter the ID of the group in which the repositories are created. In this example, we enter None, so the repositories are created in the personal namespace of the GitLab user rather than in a group.<\/li>\n<li><strong>GitLab Secret Name (Secrets Manager)<\/strong> \u2013 The secret in <a href=\"https:\/\/aws.amazon.com\/secrets-manager\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Secrets Manager<\/a> contains the value of the GitLab personal access token that the Lambda function uses to populate the seed code in the repositories. Enter the name of the secret you created in Secrets Manager.<\/li>\n<\/ul>\n<h2>Lambda functions code overview<\/h2>\n<p>As discussed earlier, we create two Lambda functions. The first function seeds the code into your GitLab repositories. The second function triggers your model deployment. Let\u2019s review these functions in more detail.<\/p>\n<h3>Seedcodecheckin Lambda function<\/h3>\n<p>This <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-custom-project-templates\/blob\/main\/mlops-template-gitlab\/lambda_functions\/lambda-seedcode-checkin-gitlab\/lambda_function.py\" target=\"_blank\" rel=\"noopener noreferrer\">function<\/a> creates the GitLab projects (repositories) and pushes the seed code files into them. 
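<\/p>
<p>The function code in the GitHub repo is the reference implementation. As a rough illustration of the push step only, the following sketch walks an unzipped seed-code directory and builds the payload that python-gitlab sends to the GitLab Commits API through <code>project.commits.create()<\/code>, with one <code>create<\/code> action per file. The helper name <code>build_commit_payload<\/code>, the choice of base64 encoding for every file, and the branch name you pass in are assumptions, not details taken from the function.<\/p>

```python
import base64
import os


def build_commit_payload(directory, branch, message):
    # Hypothetical helper: turn every file under an unzipped seed-code
    # directory into one 'create' action for the GitLab Commits API.
    actions = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            # GitLab expects repository-relative paths with forward slashes
            rel_path = os.path.relpath(full_path, directory).replace(os.sep, '/')
            with open(full_path, 'rb') as f:
                # base64 keeps binary files (for example, images) intact
                content = base64.b64encode(f.read()).decode('ascii')
            actions.append({
                'action': 'create',
                'file_path': rel_path,
                'content': content,
                'encoding': 'base64',
            })
    # Sort by path so the payload does not depend on directory walk order
    actions.sort(key=lambda action: action['file_path'])
    return {'branch': branch, 'commit_message': message, 'actions': actions}
```

<p>For example, <code>build_project.commits.create(build_commit_payload(model_build_directory, 'main', 'Initial seed code'))<\/code> would push every file in a single commit, assuming the new repository uses <code>main<\/code> as its default branch.<\/p>
<p>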
These files are needed to set up the ML CI\/CD pipelines.<\/p>\n<p>The function retrieves the GitLab personal access token from the Secrets Manager secret. This token allows the function to communicate with GitLab to create repositories and push the seed code. The secret name and AWS Region are passed to the function as environment variables defined in the <code>project.yml<\/code> file. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">import os\nimport boto3\n\ndef get_secret():\n    '''Return the GitLab personal access token stored in Secrets Manager.'''\n    secret_name = os.environ['SecretName']\n    region_name = os.environ['Region']\n\n    session = boto3.session.Session()\n    client = session.client(\n        service_name='secretsmanager',\n        region_name=region_name\n    )\n    # Assumes the token is stored as a plain-text secret string\n    response = client.get_secret_value(SecretId=secret_name)\n    return response['SecretString']<\/code><\/pre>\n<\/p><\/div>\n<p>The Secrets Manager secret was created when you ran the <code>init.sh<\/code> file earlier as part of the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-custom-project-templates\/tree\/main\/mlops-template-gitlab\" target=\"_blank\" rel=\"noopener noreferrer\">code repo<\/a> prerequisites.<\/p>\n<p>The deployment package for the function contains several libraries, including <code>python-gitlab<\/code> and <code>cfn-response<\/code>. Because the function\u2019s source code is packaged as a .zip file rather than written inline in the CloudFormation template, the <code>cfn-response<\/code> module isn\u2019t provided automatically, so we include it in the package to report the custom resource status back to AWS CloudFormation. We use the <a href=\"https:\/\/aws.amazon.com\/sdk-for-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS SDK for Python<\/a> (Boto3) to download the seed code archives from Amazon S3, and the <code>python-gitlab<\/code> API to push their contents to our GitLab repositories. 
See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">    # Configure SDKs for GitLab and S3\n    gl = gitlab.Gitlab(gitlab_server_uri, private_token=gitlab_private_token)\n    s3 = boto3.client('s3')\n \n    model_build_filename = f'\/tmp\/{str(uuid.uuid4())}-model-build-seed-code.zip'\n    model_deploy_filename = f'\/tmp\/{str(uuid.uuid4())}-model-deploy-seed-code.zip'\n    model_build_directory = f'\/tmp\/{str(uuid.uuid4())}-model-build'\n    model_deploy_directory = f'\/tmp\/{str(uuid.uuid4())}-model-deploy'\n\n    # Get Model Build Seed Code from S3 for Gitlab Repo\n    with open(model_build_filename, 'wb') as f:\n        s3.download_fileobj(sm_seed_code_bucket, model_build_sm_seed_code_object_name, f)\n\n    # Get Model Deploy Seed Code from S3 for Gitlab Repo\n    with open(model_deploy_filename, 'wb') as f:\n        s3.download_fileobj(sm_seed_code_bucket, model_deploy_sm_seed_code_object_name, f)\n<\/code><\/pre>\n<\/p><\/div>\n<p>Two projects (repositories) are created in GitLab, and the seed code files are pushed into the repositories (model build and model deploy) using the <code>python-gitlab<\/code> API:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\"># Create the GitLab Project\n    try:\n        if group_id is None:\n            build_project = gl.projects.create({'name': gitlab_project_name_build})\n        else:\n            build_project = gl.projects.create({'name': gitlab_project_name_build, 'namespace_id': int(group_id)})\n    ....\n    try:\n        if group_id is None:\n            deploy_project = gl.projects.create({'name': gitlab_project_name_deploy})\n        else:\n            deploy_project = gl.projects.create({'name': gitlab_project_name_deploy, 'namespace_id': int(group_id)})\n    ....\n    \n    # Commit to the above created Repo all the files that were in the seed code Zip\n    try:\n        build_project.commits.create(build_data)\n    except Exception as e:\n        
logging.error(\"Code could not be pushed to the model build repo.\")\n        logging.error(e)\n        cfnresponse.send(event, context, cfnresponse.FAILED, response_data)\n        return { \n            'message' : \"GitLab seedcode checkin failed.\"\n        }\n\n    try:\n        deploy_project.commits.create(deploy_data)\n    except Exception as e:\n        logging.error(\"Code could not be pushed to the model deploy repo.\")\n        logging.error(e)\n        cfnresponse.send(event, context, cfnresponse.FAILED, response_data)\n        return { \n            'message' : \"GitLab seedcode checkin failed.\"\n        }<\/code><\/pre>\n<\/p><\/div>\n<p>The following screenshot shows the successful run of the Lambda function pushing the required seed code files into both projects in your GitLab account.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-project-seedcode-image003.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30847 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-project-seedcode-image003-1024x746.png\" alt=\"\" width=\"1024\" height=\"746\"><\/a><\/p>\n<h3>gitlab-trigger Lambda function<\/h3>\n<p>This Lambda function is triggered by EventBridge. The <code>project.yml<\/code> CloudFormation template contains an EventBridge rule that invokes the function when the model package state changes in the SageMaker model registry. 
See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-yaml\">ModelDeploySageMakerEventRule:\n    Type: AWS::Events::Rule\n    Properties:\n      # Max length allowed: 64\n      Name: !Sub sagemaker-${SageMakerProjectName}-${SageMakerProjectId}-event-rule # max: 10+33+15+5=63 chars\n      Description: \"Rule to trigger a deployment when SageMaker Model registry is updated with a new model package. For example, a new model package is registered with Registry\"\n      EventPattern:\n        source:\n          - \"aws.sagemaker\"\n        detail-type:\n          - \"SageMaker Model Package State Change\"\n        detail:\n          ModelPackageGroupName:\n            - !Sub ${SageMakerProjectName}-${SageMakerProjectId}\n      State: \"ENABLED\"\n      Targets:\n        -\n          Arn: !GetAtt GitLabPipelineTriggerLambda.Arn\n          Id: !Sub sagemaker-${SageMakerProjectName}-trigger\n<\/code><\/pre>\n<\/p><\/div>\n<p>The following screenshot contains a subset of the function code that triggers the GitLab pipeline defined in the <code>.gitlab-ci.yml<\/code> file of your model deploy repository. That pipeline deploys the SageMaker model endpoints using the CloudFormation template <code>endpoint-config-template.yml<\/code> in the same repository.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30839 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-lambda-triggers-image004-1024x530.png\" alt=\"\" width=\"1024\" height=\"530\"><\/p>\n<p>To better understand the solution, review the entire code for the functions as needed.<\/p>\n<h2>GitLab and GitLab pipelines overview<\/h2>\n<p>As described earlier, GitLab plays a key role in this solution, both as the source code repository and as the engine for the CI\/CD pipelines. 
Let\u2019s look into our GitLab account to understand the components.<\/p>\n<p>After you create the project from our custom template in SageMaker projects, following the steps in the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-custom-project-templates\/tree\/main\/mlops-template-gitlab\" target=\"_blank\" rel=\"noopener noreferrer\">code repo<\/a>, navigate to your GitLab account to see two new repositories. Each repository has an associated GitLab CI pipeline that runs as soon as the project is created.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-gitlab-model-build-image005.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30837 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-gitlab-model-build-image005-1024x518.png\" alt=\"\" width=\"1024\" height=\"518\"><\/a><\/p>\n<p>The first run of each pipeline fails because GitLab doesn\u2019t yet have AWS credentials. For each repository, navigate to <strong>Settings<\/strong>, <strong>CI\/CD<\/strong>, <strong>Variables<\/strong>. Create two new variables, <code>AWS_ACCESS_KEY_ID<\/code> and <code>AWS_SECRET_ACCESS_KEY<\/code>, set to the access key of the IAM identity that the GitLab pipelines should use to access AWS.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-gitlab-variables-image006.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30838 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-gitlab-variables-image006.png\" alt=\"\" width=\"1276\" height=\"812\"><\/a><\/p>\n<h2>Model build pipeline in GitLab<\/h2>\n<p>Let\u2019s review the GitLab pipelines, starting with the model build pipeline. 
We define the pipelines in GitLab by creating the <code>.gitlab-ci.yml<\/code> file, where we define the various stages and related jobs. As shown in the following screenshot, this pipeline has a single stage (training), and its script starts the SageMaker pipeline. (You can learn more about the SageMaker pipeline by exploring the <code>pipeline.py<\/code> file on <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-custom-project-templates\/blob\/main\/mlops-template-gitlab\/seedcode\/mlops-gitlab-project-seedcode-model-build\/pipelines\/abalone\/pipeline.py\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub<\/a>.)<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30836 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-gitlab-ci-image007-1024x784.png\" alt=\"\" width=\"1024\" height=\"784\"><\/p>\n<p>When this GitLab pipeline is triggered, it starts the Abalone SageMaker pipeline to build your model.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30843 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-pipeline-execution-image008-1024x276.png\" alt=\"\" width=\"1024\" height=\"276\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30842 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-pipeline-dag-image009-1024x679.png\" alt=\"\" width=\"1024\" height=\"679\"><\/p>\n<p>When the model build is complete, you can find the new model version in the model registry in SageMaker Studio.<\/p>\n<h3>Use this template for your custom use case<\/h3>\n<p>The model build repository contains code for preprocessing, training, and evaluating the model for the <a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/abalone\" target=\"_blank\" rel=\"noopener 
noreferrer\">UCI Abalone dataset<\/a>. You need to modify the files to address your custom use case.<\/p>\n<ol>\n<li>Navigate to the <code>pipelines<\/code> folder in your model build repository.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30844 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-pipelines-steps-image010-1024x817.png\" alt=\"\" width=\"1024\" height=\"817\"><\/p>\n<ol start=\"2\">\n<li>Upload your dataset to an S3 bucket. Replace the bucket URL in the following section of your <code>pipeline.py<\/code> file.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30834 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-dataset-image011-1024x240.png\" alt=\"\" width=\"1024\" height=\"240\"><\/p>\n<ol start=\"3\">\n<li>Navigate to <code>.gitlab-ci.yml<\/code> and update the following section with the folder and file for your use case.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30848 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-template-customization-image012-1024x600.png\" alt=\"\" width=\"1024\" height=\"600\"><\/p>\n<h2>Model deployment pipeline in GitLab<\/h2>\n<p>When the SageMaker pipeline that trains the model completes, a new model version is registered in the SageMaker model registry. 
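<\/p>
<p>The <code>ModelDeploySageMakerEventRule<\/code> rule in <code>project.yml<\/code> matches every <code>SageMaker Model Package State Change<\/code> event for the model package group of the project, not only approvals. The following sketch shows one way a trigger function could skip events other than approvals before it starts the GitLab deploy pipeline. The function name is hypothetical and this is not the code of the gitlab-trigger function; it assumes, as SageMaker model package state change events do, that the event <code>detail<\/code> includes <code>ModelPackageGroupName<\/code> and <code>ModelApprovalStatus<\/code>.<\/p>

```python
def is_approval_event(event, model_package_group_name):
    # Hypothetical filter for a SageMaker Model Package State Change event
    # delivered by EventBridge; returns True only for approved versions.
    if event.get('source') != 'aws.sagemaker':
        return False
    if event.get('detail-type') != 'SageMaker Model Package State Change':
        return False
    detail = event.get('detail', {})
    # The EventBridge rule already filters on the group name; checking it
    # again keeps the function safe if another rule reuses it.
    if detail.get('ModelPackageGroupName') != model_package_group_name:
        return False
    return detail.get('ModelApprovalStatus') == 'Approved'


# Example event trimmed to the fields the filter reads (values are made up)
sample_event = {
    'source': 'aws.sagemaker',
    'detail-type': 'SageMaker Model Package State Change',
    'detail': {
        'ModelPackageGroupName': 'my-project-p-abc123',
        'ModelApprovalStatus': 'Approved',
    },
}
print(is_approval_event(sample_event, 'my-project-p-abc123'))  # True
```

<p>Because the event reports only the current status, this check cannot confirm that the previous status was <code>PendingManualApproval<\/code>; it treats any state change that leaves a version <code>Approved<\/code> as a deployment signal.<\/p>
<p>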
If that model is approved, the GitLab pipeline in the model deploy repository starts and the model deployment process begins.<\/p>\n<p>To approve the model in the model registry, complete the following steps:<\/p>\n<ol>\n<li>Choose the <strong>Components and registries<\/strong> icon.<\/li>\n<li>Choose <strong>Model registry<\/strong>, and choose (right-click) the model version.<\/li>\n<li>Choose <strong>Update model version status<\/strong>.<\/li>\n<li>Change the status from <code>Pending<\/code> to <code>Approved<\/code>.<\/li>\n<\/ol>\n<p>This triggers the deploy pipeline.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-model-approval-image013.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30840 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-model-approval-image013-1024x339.png\" alt=\"\" width=\"1024\" height=\"339\"><\/a><\/p>\n<p>Now, let\u2019s review the <code>.gitlab-ci.yml<\/code> file in the model deploy repository. As shown in the following screenshot, this model deploy pipeline has four stages: build, staging deploy, test staging, and production deploy. 
This pipeline uses AWS CloudFormation to deploy the model and create the SageMaker endpoints.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-model-deploy-stages-image014.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30841 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-model-deploy-stages-image014-1024x724.png\" alt=\"\" width=\"1024\" height=\"724\"><\/a><\/p>\n<p>The GitLab pipeline includes a manual job that promotes the model from staging to production by creating an endpoint with the suffix <code>-prod<\/code>. When you choose <strong>manual<\/strong>, this job runs and deploys the production SageMaker endpoint.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-production-endpoint-manual-image015.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30845 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-production-endpoint-manual-image015.png\" alt=\"\" width=\"1431\" height=\"187\"><\/a><\/p>\n<p>To verify that the endpoints were created, navigate to the <strong>Endpoints<\/strong> page on the SageMaker console. 
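<\/p>
<p>If you prefer to check from a script instead of the console, a sketch like the following calls the SageMaker <code>DescribeEndpoint<\/code> API (<code>describe_endpoint<\/code> in Boto3) for the staging and production endpoints and collects their <code>EndpointStatus<\/code>. The helper name and the name prefix argument are hypothetical; the <code>-staging<\/code> and <code>-prod<\/code> suffixes follow the endpoint naming used in this post. The client is passed in so the function itself does not depend on AWS credentials.<\/p>

```python
def endpoint_statuses(sm_client, name_prefix):
    # Hypothetical helper: sm_client is expected to behave like a Boto3
    # SageMaker client, for example boto3.client('sagemaker').
    statuses = {}
    for suffix in ('staging', 'prod'):
        endpoint_name = f'{name_prefix}-{suffix}'
        try:
            response = sm_client.describe_endpoint(EndpointName=endpoint_name)
            # Typical values include Creating, InService, Updating, and Failed
            statuses[endpoint_name] = response['EndpointStatus']
        except Exception:
            # Any error, such as an endpoint that does not exist yet, is
            # recorded as None instead of stopping the check
            statuses[endpoint_name] = None
    return statuses
```

<p>For example, calling <code>endpoint_statuses(boto3.client('sagemaker'), name_prefix)<\/code> after the manual production job completes would be expected to report <code>InService<\/code> for both endpoints. A value of <code>None<\/code> means the describe call failed, for example because the production endpoint has not been created yet.<\/p>
<p>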
You should see two endpoints: <code>&lt;model_name&gt;-staging<\/code> and <code>&lt;model_name&gt;-prod<\/code>.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-endpoints-image016.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-30835 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/16\/ML-6664-endpoints-image016.png\" alt=\"\" width=\"1275\" height=\"255\"><\/a><\/p>\n<h2>GitLab implementation patterns<\/h2>\n<p>In this section, we discuss two patterns for implementing GitLab: hosting GitLab in an <a href=\"https:\/\/aws.amazon.com\/vpc\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Virtual Private Cloud<\/a> (Amazon VPC), and using GitLab with two-factor authentication enabled.<\/p>\n<h3>Hosting GitLab in an Amazon VPC<\/h3>\n<p>You may choose to deploy GitLab in an Amazon VPC to keep it on a private network with access to your AWS resources. In this scenario, the Lambda functions must also be deployed in a VPC so they can reach the GitLab API. 
We accomplish this by updating the <code>project.yml<\/code> file and the <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role <code>AmazonSageMakerServiceCatalogProductsUseRole<\/code>.<\/p>\n<p>The IAM identity that deploys the Lambda functions into the VPC requires the following permissions so that Lambda can verify the network resources:<\/p>\n<ul>\n<li><code>ec2:DescribeSecurityGroups<\/code><\/li>\n<li><code>ec2:DescribeSubnets<\/code><\/li>\n<li><code>ec2:DescribeVpcs<\/code><\/li>\n<\/ul>\n<p>The Lambda functions\u2019 execution role requires the following permissions to create and manage network interfaces:<\/p>\n<ul>\n<li><code>ec2:CreateNetworkInterface<\/code><\/li>\n<li><code>ec2:DescribeNetworkInterfaces<\/code><\/li>\n<li><code>ec2:DeleteNetworkInterface<\/code><\/li>\n<\/ul>\n<p>To add these network interface permissions, attach the <code>AWSLambdaVPCAccessExecutionRole<\/code> managed policy, which includes them:<\/p>\n<ol>\n<li>On the IAM console, search for <code>AmazonSageMakerServiceCatalogProductsUseRole<\/code>.<\/li>\n<li>Choose <strong>Attach policies<\/strong>.<\/li>\n<li>Search for the <code>AWSLambdaVPCAccessExecutionRole<\/code> managed policy.<\/li>\n<li>Choose <strong>Attach policy<\/strong>.<\/li>\n<\/ol>\n<p>Next, we update <code>project.yml<\/code> to deploy the functions in a VPC by providing the VPC security group and subnets.<\/p>\n<ol>\n<li>Add the subnet IDs and security group ID to the <code>Parameters<\/code> section, for example:\n<div class=\"hide-language\">\n<pre><code class=\"lang-yaml\">SubnetId1:\n  Type: AWS::EC2::Subnet::Id\n  Description: Subnet Id for Lambda function\n\nSubnetId2:\n  Type: AWS::EC2::Subnet::Id\n  Description: Subnet Id for Lambda function\n\nSecurityGroupId:\n  Type: AWS::EC2::SecurityGroup::Id\n  Description: Security Group Id for Lambda function to Execute\n<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Add the <code>VpcConfig<\/code> information under <code>Properties<\/code> for the <code>GitSeedCodeCheckinLambda<\/code> and <code>GitLabPipelineTriggerLambda<\/code> functions, for example:\n<div class=\"hide-language\">\n<pre><code class=\"lang-yaml\">GitSeedCodeCheckinLambda:\n  Type: 'AWS::Lambda::Function'\n  Properties:\n    Description: To trigger the codebuild project for the seedcode checkin\n    .....\n    VpcConfig:\n      SecurityGroupIds:\n        - !Ref SecurityGroupId\n      SubnetIds:\n        - !Ref SubnetId1\n        - !Ref SubnetId2\n<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<\/ol>\n<h3>Two-factor authentication enabled<\/h3>\n<p>If you enabled two-factor authentication on your GitLab account, you need to use your personal access token to clone the repositories in SageMaker Studio. The token requires the <code>read_repository<\/code> and <code>write_repository<\/code> scopes. To clone the model build and model deploy repositories, enter the following commands:<\/p>\n<div class=\"hide-language\">\n<div class=\"hide-language\">\n<div class=\"hide-language\">\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">git clone https:\/\/oauth2:PERSONAL_ACCESS_TOKEN@gitlab.com\/username\/gitlab-project-seedcode-model-build-&lt;project-id&gt;\ngit clone https:\/\/oauth2:PERSONAL_ACCESS_TOKEN@gitlab.com\/username\/gitlab-project-seedcode-model-deploy-&lt;project-id&gt;\n<\/code><\/pre>\n<p>Because you previously created a secret for your personal access token, no changes to the Lambda function code are required when two-factor authentication is enabled.<\/p>\n<h2>Conclusion<\/h2>\n<p>In this post, we walked through using a custom SageMaker MLOps project template to automatically build and configure a CI\/CD pipeline. This pipeline integrated your existing CI\/CD tooling with SageMaker features for data preparation, model training, model evaluation, and model deployment. In our use case, we focused on using GitLab and GitLab pipelines with SageMaker projects and pipelines. 
For more detailed implementation information, review the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-custom-project-templates\/tree\/main\/mlops-template-gitlab\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>. Try it out and let us know if you have any questions in the comments section!<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-27201 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/08\/12\/Kirit-Thadaka.jpg\" alt=\"\" width=\"100\" height=\"133\"><strong>Kirit Thadaka<\/strong>\u00a0is an ML Solutions Architect on the Amazon SageMaker Service SA team. Before joining AWS, Kirit worked at early-stage AI startups and then in consulting, in roles spanning AI research, MLOps, and technical leadership.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/Lauren-Mullennex-1.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-30949 size-full alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/Lauren-Mullennex-1.jpg\" alt=\"\" width=\"100\" height=\"150\"><\/a><strong>Lauren Mullennex<\/strong> is a Solutions Architect based in Denver, CO. She works with customers to help them architect solutions on AWS. In her spare time, she enjoys hiking and cooking Hawaiian cuisine.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/Indrajit.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-30948 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/Indrajit.png\" alt=\"\" width=\"100\" height=\"113\"><\/a>Indrajit Ghosalkar<\/strong> is a Sr. Solutions Architect at Amazon Web Services based in Singapore. 
He loves helping customers achieve their business outcomes through cloud adoption and realize their data analytics and ML goals by adopting DataOps and MLOps practices and solutions. In his spare time, he enjoys playing with his son, traveling, and meeting new people.<\/p>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines\/<\/p>\n","protected":false},"author":0,"featured_media":1256,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1255"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1255"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1255\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1256"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1255"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1255"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1255"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}