{"id":648,"date":"2020-12-02T17:45:57","date_gmt":"2020-12-02T17:45:57","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/12\/02\/configuring-autoscaling-inference-endpoints-in-amazon-sagemaker\/"},"modified":"2020-12-02T17:45:57","modified_gmt":"2020-12-02T17:45:57","slug":"configuring-autoscaling-inference-endpoints-in-amazon-sagemaker","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/12\/02\/configuring-autoscaling-inference-endpoints-in-amazon-sagemaker\/","title":{"rendered":"Configuring autoscaling inference endpoints in Amazon SageMaker"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> is a fully managed service that provides every developer and data scientist with the ability to quickly build, train, and deploy machine learning (ML) models at scale. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. You can one-click deploy your ML models for making low-latency inferences in real time on fully managed inference endpoints. Autoscaling is an out-of-the-box feature that monitors your workloads and dynamically adjusts the capacity to maintain steady and predictable performance at the lowest possible cost. When the workload increases, autoscaling brings more instances online. 
When the workload decreases, autoscaling removes unnecessary instances, helping you reduce your compute cost.<\/p>\n<p>The following diagram shows a sample architecture in which a model is invoked for inference using an Amazon SageMaker endpoint.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-19042\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/26\/Configuring-autoscaling-inference-1.jpg\" alt=\"\" width=\"800\" height=\"372\"><\/p>\n<p>Amazon SageMaker automatically attempts to distribute your instances across Availability Zones. We therefore strongly recommend that you deploy multiple instances for each production endpoint for high availability. If you\u2019re using a VPC, configure at least two subnets in different Availability Zones so Amazon SageMaker can distribute your instances across those Availability Zones.<\/p>\n<p>Amazon SageMaker supports four ways to implement horizontal scaling of Amazon SageMaker endpoints: target tracking scaling, step scaling, scheduled scaling, and on-demand scaling. You can configure some of these policies using the Amazon SageMaker console or the <a href=\"http:\/\/aws.amazon.com\/cli\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Command Line Interface<\/a> (AWS CLI); the advanced options require the AWS SDK\u2019s Application Auto Scaling API. In this post, we show how to configure autoscaling using the boto3 SDK for Python and outline different scaling policies and patterns.<\/p>\n<h2>Prerequisites<\/h2>\n<p>This post assumes that you have a functional Amazon SageMaker endpoint deployed. Models are hosted within an Amazon SageMaker endpoint; you can have multiple model versions being served via the same endpoint. 
Each model is referred to as a <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_ProductionVariant.html\" target=\"_blank\" rel=\"noopener noreferrer\">production variant<\/a>.<\/p>\n<p>If you\u2019re new to Amazon SageMaker and have not created an endpoint yet, complete the steps in <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/identifying-bird-species-on-the-edge-using-the-amazon-sagemaker-built-in-object-detection-algorithm-and-aws-deeplens\/\" target=\"_blank\" rel=\"noopener noreferrer\">Identifying bird species on the edge using the Amazon SageMaker built-in Object Detection algorithm and AWS DeepLens<\/a> until the section <strong>Testing the model<\/strong> to develop and host an object detection model.<\/p>\n<p>If you want to get started directly with this post, you can also fetch a model from the <a href=\"https:\/\/mxnet-bing.readthedocs.io\/en\/latest\/model_zoo\/\" target=\"_blank\" rel=\"noopener noreferrer\">MXNet model zoo<\/a>. For example, if you plan to use ResidualNet152, you need the <a href=\"http:\/\/data.dmlc.ml\/models\/imagenet\/resnet\/152-layers\/resnet-152-symbol.json\" target=\"_blank\" rel=\"noopener noreferrer\">model definition<\/a> and the <a href=\"http:\/\/data.dmlc.ml\/models\/imagenet\/resnet\/152-layers\/resnet-152-0000.params\">model weights<\/a> inside a tarball. You can also create custom models that can be hosted as an Amazon SageMaker endpoint. 
For instructions on building a tarball with Gluon and Apache MXNet, see <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/deploying-custom-models-built-with-gluon-and-apache-mxnet-on-amazon-sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Deploying custom models built with Gluon and Apache MXNet on Amazon SageMaker<\/a>.<\/p>\n<h2>Configuring autoscaling<\/h2>\n<p>The following are the high-level steps for creating a model and applying a scaling policy:<\/p>\n<ol>\n<li>Use Amazon SageMaker to create a model or bring a custom model.<\/li>\n<li>Deploy the model.<\/li>\n<\/ol>\n<p>If you use the MXNet estimator to train the model, you can call <code>deploy<\/code> to create an Amazon SageMaker endpoint:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">from sagemaker import get_execution_role\r\nfrom sagemaker.mxnet import MXNet\r\n\r\nrole = get_execution_role()  # IAM role used for training and hosting\r\n\r\n# Train my estimator\r\nmxnet_estimator = MXNet('train.py',\r\n                role=role,\r\n                framework_version='1.6.0',\r\n                py_version='py3',\r\n                instance_type='ml.p2.xlarge',\r\n                instance_count=1)\r\n\r\nmxnet_estimator.fit('s3:\/\/my_bucket\/my_training_data\/')\r\n\r\n# Deploy my estimator to an Amazon SageMaker endpoint and get a Predictor.\r\n# initial_instance_count=1 is not recommended for production use; use it only for experimentation.\r\npredictor = mxnet_estimator.deploy(instance_type='ml.m5.xlarge',\r\n                initial_instance_count=1)<\/code><\/pre>\n<\/div>\n<p>If you use a pretrained model like ResidualNet152, you can create an <code>MXNetModel<\/code> object and call <code>deploy<\/code> to create the Amazon SageMaker endpoint:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">from sagemaker.mxnet import MXNetModel\r\n\r\n# Wrap the pretrained model artifact and its inference script\r\nmxnet_model = MXNetModel(model_data='s3:\/\/my_bucket\/pretrained_model\/model.tar.gz',\r\n                         role=role,\r\n                         entry_point='inference.py',\r\n                         framework_version='1.6.0',\r\n                         py_version='py3')\r\npredictor = mxnet_model.deploy(instance_type='ml.m5.xlarge',\r\n                               initial_instance_count=1)<\/code><\/pre>\n<\/div>\n<ol start=\"3\">\n<li>Create a scaling policy and apply the scaling policy to the endpoint. The following section discusses your scaling policy options.<\/li>\n<\/ol>\n<h2>Scaling options<\/h2>\n<p>You can define the minimum, desired, and maximum number of instances per endpoint; instances are then managed dynamically based on the autoscaling configuration. 
The following diagram illustrates this architecture.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-19043\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/26\/Configuring-autoscaling-inference-2.jpg\" alt=\"\" width=\"310\" height=\"224\"><\/p>\n<p>To scale the deployed Amazon SageMaker endpoint, first fetch its details:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">import pprint\r\nimport boto3\r\nfrom sagemaker import get_execution_role\r\n\r\npp = pprint.PrettyPrinter(indent=4, depth=4)\r\nrole = get_execution_role()\r\nsagemaker_client = boto3.Session().client(service_name='sagemaker')\r\nendpoint_name = 'name-of-the-endpoint'\r\nresponse = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)\r\npp.pprint(response)\r\n\r\n# Define a client for configuring the autoscaling options\r\nclient = boto3.client('application-autoscaling') # Common class representing Application Auto Scaling for SageMaker amongst other services<\/code><\/pre>\n<\/div>\n<h3>Simple scaling or TargetTrackingScaling<\/h3>\n<p>Use this option when you want to scale based on a specific <a href=\"http:\/\/aws.amazon.com\/cloudwatch\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch<\/a> metric. You can do this by choosing a specific metric and setting a target value for it. The recommended metrics for this option are average <code>CPUUtilization<\/code> or <code>SageMakerVariantInvocationsPerInstance<\/code>.<\/p>\n<p><code>SageMakerVariantInvocationsPerInstance<\/code> is the average number of times per minute that each instance for a variant is invoked. <code>CPUUtilization<\/code> is the sum of the utilization of each individual CPU core, so on an instance with four cores it ranges from 0% to 400%.<\/p>\n<p>The following code snippets show how to scale using these metrics. You can also push custom metrics to CloudWatch or use other metrics. 
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/monitoring-cloudwatch.html\" target=\"_blank\" rel=\"noopener noreferrer\">Monitor Amazon SageMaker with Amazon CloudWatch<\/a>.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">resource_id='endpoint\/' + endpoint_name + '\/variant\/' + 'AllTraffic' # This is the format in which application autoscaling references the endpoint\r\n\r\nresponse = client.register_scalable_target(\r\n    ServiceNamespace='sagemaker',\r\n    ResourceId=resource_id,\r\n    ScalableDimension='sagemaker:variant:DesiredInstanceCount',\r\n    MinCapacity=1,\r\n    MaxCapacity=2\r\n)\r\n\r\n#Example 1 - SageMakerVariantInvocationsPerInstance Metric\r\nresponse = client.put_scaling_policy(\r\n    PolicyName='Invocations-ScalingPolicy',\r\n    ServiceNamespace='sagemaker', # The namespace of the AWS service that provides the resource.\r\n    ResourceId=resource_id, # Resource ID of the endpoint variant\r\n    ScalableDimension='sagemaker:variant:DesiredInstanceCount', # SageMaker supports only Instance Count\r\n    PolicyType='TargetTrackingScaling', # 'StepScaling'|'TargetTrackingScaling'\r\n    TargetTrackingScalingPolicyConfiguration={\r\n        'TargetValue': 10.0, # The target value for the metric; here the metric is SageMakerVariantInvocationsPerInstance\r\n        'PredefinedMetricSpecification': {\r\n            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance', # The average number of times per minute that each instance for a variant is invoked.\r\n        },\r\n        'ScaleInCooldown': 600, # The cooldown period helps prevent Application Auto Scaling from launching or terminating\r\n                                # additional instances before the effects of previous activities are visible.\r\n                                # You can configure the length of time based on your instance startup time or other application needs.\r\n                                # ScaleInCooldown - The amount of time, in seconds, after a scale in activity completes before another scale in activity can start.\r\n        'ScaleOutCooldown': 300 # ScaleOutCooldown - The amount of time, in seconds, after a scale out activity completes before another scale out activity can start.\r\n\r\n        # 'DisableScaleIn': True|False - Indicates whether scale in by the target tracking policy is disabled.\r\n                            # If the value is True, scale in is disabled and the target tracking policy won't remove capacity from the scalable resource.\r\n    }\r\n)\r\n\r\n#Example 2 - CPUUtilization metric\r\nresponse = client.put_scaling_policy(\r\n    PolicyName='CPUUtil-ScalingPolicy',\r\n    ServiceNamespace='sagemaker',\r\n    ResourceId=resource_id,\r\n    ScalableDimension='sagemaker:variant:DesiredInstanceCount',\r\n    PolicyType='TargetTrackingScaling',\r\n    TargetTrackingScalingPolicyConfiguration={\r\n        'TargetValue': 90.0,\r\n        'CustomizedMetricSpecification':\r\n        {\r\n            'MetricName': 'CPUUtilization',\r\n            'Namespace': '\/aws\/sagemaker\/Endpoints',\r\n            'Dimensions': [\r\n                {'Name': 'EndpointName', 'Value': endpoint_name },\r\n                {'Name': 'VariantName','Value': 'AllTraffic'}\r\n            ],\r\n            'Statistic': 'Average', # Possible - 'Statistic': 'Average'|'Minimum'|'Maximum'|'SampleCount'|'Sum'\r\n            'Unit': 'Percent'\r\n        },\r\n        'ScaleInCooldown': 600,\r\n        'ScaleOutCooldown': 300\r\n    }\r\n)<\/code><\/pre>\n<\/div>\n<p>With the <em>scale-in cooldown period<\/em>, the intention is to scale in conservatively to protect your application\u2019s availability, so scale-in activities are blocked until the cooldown period has expired. 
With the <em>scale-out cooldown period<\/em>, the intention is to continuously (but not excessively) scale out. After <a href=\"https:\/\/docs.aws.amazon.com\/autoscaling\/application\/userguide\/application-auto-scaling-target-tracking.html\" target=\"_blank\" rel=\"noopener noreferrer\">Application Auto Scaling<\/a> successfully scales out using a target tracking scaling policy, the cooldown period begins.<\/p>\n<h3>Step scaling<\/h3>\n<p>This is an advanced type of scaling where you define additional policies to dynamically adjust the number of instances based on the size of the alarm breach. This helps you configure a more aggressive response when demand reaches a certain level. The following code is an example of a step scaling policy intended for the <code>OverheadLatency<\/code> metric. The policy itself doesn\u2019t name a metric; it takes effect when a CloudWatch alarm on that metric lists the policy\u2019s ARN as an alarm action:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">#Example 3 - OverheadLatency metric and StepScaling Policy\r\nresponse = client.put_scaling_policy(\r\n    PolicyName='OverheadLatency-ScalingPolicy',\r\n    ServiceNamespace='sagemaker',\r\n    ResourceId=resource_id,\r\n    ScalableDimension='sagemaker:variant:DesiredInstanceCount',\r\n    PolicyType='StepScaling',\r\n    StepScalingPolicyConfiguration={\r\n        'AdjustmentType': 'ChangeInCapacity', # 'PercentChangeInCapacity'|'ExactCapacity' Specifies whether the ScalingAdjustment value in a StepAdjustment\r\n                                              # is an absolute number or a percentage of the current capacity.\r\n        'StepAdjustments': [ # A set of adjustments that enable you to scale based on the size of the alarm breach.\r\n            {\r\n                'MetricIntervalLowerBound': 0.0, # The lower bound for the difference between the alarm threshold and the CloudWatch metric.\r\n                 # 'MetricIntervalUpperBound': 100.0, # The upper bound for the difference between the alarm threshold and the CloudWatch metric.\r\n                'ScalingAdjustment': 1 # 
The amount by which to scale, based on the specified adjustment type. \r\n                                       # A positive value adds to the current capacity while a negative number removes from the current capacity.\r\n            },\r\n        ],\r\n        # 'MinAdjustmentMagnitude': 1, # The minimum number of instances to scale. - only for 'PercentChangeInCapacity'\r\n        'Cooldown': 120,\r\n        'MetricAggregationType': 'Average', # 'Minimum'|'Maximum'\r\n    }\r\n)<\/code><\/pre>\n<\/div>\n<h3>Scheduled scaling<\/h3>\n<p>You can use this option when you know that the demand follows a particular schedule in the day, week, month, or year. You can specify a one-time schedule, a recurring rate, or a cron expression, along with optional start and end times that bound when the scheduled action is in effect. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">#Example 4 - Scaling based on a certain schedule.\r\nresponse = client.put_scheduled_action(\r\n    ServiceNamespace='sagemaker',\r\n    Schedule='at(2020-10-07T06:20:00)', # yyyy-mm-ddThh:mm:ss You can use one-time schedule, cron, or rate\r\n    ScheduledActionName='ScheduledScalingTest',\r\n    ResourceId=resource_id,\r\n    ScalableDimension='sagemaker:variant:DesiredInstanceCount',\r\n    # The two optional arguments below need: from datetime import datetime\r\n    #StartTime=datetime(2020, 10, 7), #Start date and time for when the schedule should begin\r\n    #EndTime=datetime(2020, 10, 8), #End date and time for when the recurring schedule should end\r\n    ScalableTargetAction={\r\n        'MinCapacity': 2,\r\n        'MaxCapacity': 3\r\n    }\r\n)<\/code><\/pre>\n<\/div>\n<h3>On-demand scaling<\/h3>\n<p>Use this option only when you want to increase or decrease the number of instances manually. This updates the endpoint weights and capacities without defining a trigger. 
See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\"># update_endpoint_weights_and_capacities is a SageMaker API, so call it on the SageMaker client,\r\n# not on the Application Auto Scaling client.\r\nresponse = sagemaker_client.update_endpoint_weights_and_capacities(\r\n    EndpointName=endpoint_name,\r\n    DesiredWeightsAndCapacities=[\r\n        {\r\n            'VariantName': 'AllTraffic',  # name of the production variant to resize\r\n            # 'DesiredWeight': ...,  # optional: relative traffic weight for the variant\r\n            'DesiredInstanceCount': 2  # example value: target number of instances\r\n        }\r\n    ]\r\n)\r\n<\/code><\/pre>\n<\/div>\n<h3>Comparing scaling methods<\/h3>\n<p>Each of these methods, when successfully applied, results in instances being added to or removed from an already deployed Amazon SageMaker endpoint. When you make a request to update your endpoint with autoscaling configurations, the status of the endpoint moves to <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_DescribeEndpoint.html#API_DescribeEndpoint_ResponseSyntax\" target=\"_blank\" rel=\"noopener noreferrer\">Updating<\/a>. While the endpoint is in this state, other update operations on this endpoint fail. You can monitor the state by using the <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_DescribeEndpoint.html#API_DescribeEndpoint_ResponseSyntax\" target=\"_blank\" rel=\"noopener noreferrer\">DescribeEndpoint API<\/a>. There is no traffic interruption while instances are being added to or removed from an endpoint.<\/p>\n<p>When creating an endpoint, we specify <code>initial_instance_count<\/code>; this value is only used at endpoint creation time. 
That value is ignored afterward; autoscaling and on-demand scaling change <code>DesiredInstanceCount<\/code> to set the instance count behind an endpoint.<\/p>\n<p>Finally, if you use <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_UpdateEndpoint.html\" target=\"_blank\" rel=\"noopener noreferrer\">UpdateEndpoint<\/a> to deploy a new <code>EndpointConfig<\/code> to an endpoint and want to retain the current number of instances, set <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_UpdateEndpoint.html#API_UpdateEndpoint_RequestSyntax\" target=\"_blank\" rel=\"noopener noreferrer\">RetainAllVariantProperties<\/a> to true.<\/p>\n<h2>Considerations for designing an autoscaling policy to scale your ML workload<\/h2>\n<p>You should consider the following when designing an efficient autoscaling policy to minimize traffic interruptions and be cost-efficient:<\/p>\n<ul>\n<li>\n<strong>Traffic patterns and metrics \u2013<\/strong> Consider in particular the traffic patterns that invoke the inference logic, and determine which metrics those patterns affect the most. Which per-instance metric is the inference logic most sensitive to (such as <code>GPUUtilization<\/code>, <code>CPUUtilization<\/code>, <code>MemoryUtilization<\/code>, or <code>Invocations<\/code>)? Is the inference logic GPU bound, memory bound, or CPU bound?<\/li>\n<li>\n<strong>Custom metrics \u2013 <\/strong>If the problem domain calls for a custom metric, you can deploy a custom metrics collector, which also lets you fine-tune the granularity of metrics collection and publishing.<\/li>\n<li>\n<strong>Threshold <\/strong>\u2013 After we decide on our metrics, we need to decide on the threshold. 
In other words, decide how to detect an increase in load, based on the preceding metric, within a time window that leaves enough time to add an instance and for your inference logic to be ready to serve requests. This consideration also governs the length of the scale-in and scale-out cooldown periods.<\/li>\n<li>\n<strong>Autoscaling <\/strong>\u2013 Depending on how well the application logic tolerates autoscaling, strike a balance between over-provisioning and autoscaling. Depending on the workload, if you select a specialized instance such as Inferentia, the throughput gains might reduce the need to autoscale.<\/li>\n<li>\n<strong>Horizontal scaling <\/strong>\u2013 With these estimates in hand, consider which of the strategies listed in this post to deploy for horizontal scaling. Some work particularly well in certain situations. For example, we strongly recommend that you use a target tracking scaling policy to scale on a metric such as average CPU utilization or the <code>SageMakerVariantInvocationsPerInstance<\/code> metric. Still, a good guideline is to derive a suitable scaling policy empirically, based on your particular workload and the preceding factors. You can start with a simple target tracking scaling policy, and you still have the option to use step scaling as an additional policy for a more advanced configuration. 
For example, you can configure a more aggressive response when demand reaches a certain level.<\/li>\n<\/ul>\n<h2>Retrieving your scaling activity log<\/h2>\n<p>When you want to see all the scaling policies attached to your Amazon SageMaker endpoint, you can use <code>describe_scaling_policies<\/code>, which helps you understand and debug the different scaling configurations\u2019 behavior:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\"># Without a ResourceId filter, this returns the policies for every SageMaker endpoint in the\r\n# account and Region; pass ResourceId=resource_id to limit the results to one endpoint variant.\r\nresponse = client.describe_scaling_policies(\r\n    ServiceNamespace='sagemaker'\r\n)\r\n\r\n# Print each policy's name followed by its configuration\r\nfor policy in response['ScalingPolicies']:\r\n    print('')\r\n    pp.pprint(policy['PolicyName'])\r\n    print('')\r\n    if 'TargetTrackingScalingPolicyConfiguration' in policy:\r\n        pp.pprint(policy['TargetTrackingScalingPolicyConfiguration'])\r\n    else:\r\n        pp.pprint(policy['StepScalingPolicyConfiguration'])\r\n    print('')<\/code><\/pre>\n<\/div>\n<h2>Conclusion<\/h2>\n<p>For models facing unpredictable traffic, Amazon SageMaker autoscaling helps you respond to demand economically and removes the undifferentiated heavy lifting of managing the inference infrastructure. One of the best practices of model deployment is to perform load testing. Determine the appropriate thresholds for your scaling policies and choose metrics based on load testing. 
For more information about load testing, see <a href=\"https:\/\/aws.amazon.com\/ec2\/testing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon EC2 Testing Policy<\/a> and <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/load-test-and-optimize-an-amazon-sagemaker-endpoint-using-automatic-scaling\/\" target=\"_blank\" rel=\"noopener noreferrer\">Load test and optimize an Amazon SageMaker endpoint using automatic scaling<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-19047 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/26\/Chaitanya-Hazarey.jpg\" alt=\"\" width=\"100\" height=\"132\"><strong>Chaitanya Hazarey<\/strong> is a Machine Learning Solutions Architect with the Amazon SageMaker Product Management team. He focuses on helping customers design and deploy end-to-end ML pipelines in production on AWS. He has set up multiple such workflows around problems in the areas of NLP, Computer Vision, Recommender Systems, and AutoML Pipelines.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-19048 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/26\/Pavan-Kumar-Sunder.jpg\" alt=\"\" width=\"100\" height=\"150\"><strong>Pavan Kumar Sunder<\/strong> is a Senior R&amp;D Engineer with Amazon Web Services. He provides technical guidance and helps customers accelerate their ability to innovate through showing the art of the possible on AWS. 
He has built multiple prototypes around AI\/ML, IoT, and Robotics for our customers.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-18205 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/10\/Rama-Thamman.jpg\" alt=\"\" width=\"100\" height=\"127\"><strong>Rama Thamman<\/strong> is a Software Development Manager with the AI Platforms team, leading the ML Migrations team.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/configuring-autoscaling-inference-endpoints-in-amazon-sagemaker\/<\/p>\n","protected":false},"author":0,"featured_media":649,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/648"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=648"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/648\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/649"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=648"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=648"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=648"}],"curies":
[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}