{"id":1251,"date":"2021-11-23T08:29:42","date_gmt":"2021-11-23T08:29:42","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/11\/23\/bring-your-amazon-sagemaker-model-into-amazon-redshift-for-remote-inference\/"},"modified":"2021-11-23T08:29:42","modified_gmt":"2021-11-23T08:29:42","slug":"bring-your-amazon-sagemaker-model-into-amazon-redshift-for-remote-inference","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/11\/23\/bring-your-amazon-sagemaker-model-into-amazon-redshift-for-remote-inference\/","title":{"rendered":"Bring Your Amazon SageMaker model into Amazon Redshift for remote inference"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/aws.amazon.com\/redshift\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Redshift<\/a>, a fast, fully managed, widely used cloud data warehouse, natively integrates with <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> for machine learning (ML). Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Data analysts and database developers want to use this data to train ML models, which can then be used to generate insights for use cases such as forecasting revenue, predicting customer churn, and detecting anomalies.<\/p>\n<p><a href=\"https:\/\/aws.amazon.com\/redshift\/features\/redshift-ml\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Redshift ML<\/a> makes it easy for SQL users to create, train, and deploy ML models using familiar SQL commands. In a previous <a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/create-train-and-deploy-machine-learning-models-in-amazon-redshift-using-sql-with-amazon-redshift-ml\/\" target=\"_blank\" rel=\"noopener noreferrer\">post<\/a>, we covered how Amazon Redshift ML allows you to use your data in Amazon Redshift with SageMaker, a fully managed ML service, without requiring you to become an expert in ML. We also discussed how Amazon Redshift ML enables ML experts to create XGBoost or MLP models in an earlier <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/build-xgboost-models-with-amazon-redshift-ml\/\" target=\"_blank\" rel=\"noopener noreferrer\">post<\/a>. Additionally, Amazon Redshift ML allows data scientists to either import existing SageMaker models into Amazon Redshift for in-database inference or remotely invoke a SageMaker endpoint.<\/p>\n<p>This post shows how you can enable your data warehouse users to use SQL to invoke a remote SageMaker endpoint for prediction. We first train and deploy a Random Cut Forest model in SageMaker, and demonstrate how you can create a model with SQL to invoke that SageMaker predictions remotely. Then, we show how end users can invoke the model.<\/p>\n<h2>Prerequisites<\/h2>\n<p>To get started, we need an Amazon Redshift cluster with the Amazon Redshift ML feature enabled. For an introduction to Amazon Redshift ML and instructions on setting it up, see <a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/create-train-and-deploy-machine-learning-models-in-amazon-redshift-using-sql-with-amazon-redshift-ml\/\" target=\"_blank\" rel=\"noopener noreferrer\">Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML<\/a>.<\/p>\n<p>You also have to make sure that the SageMaker model is deployed and you have the endpoint. You can use the following <a href=\"http:\/\/aws.amazon.com\/cloudformation\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a> <a href=\"https:\/\/console.aws.amazon.com\/cloudformation\/home?region=us-east-1#\/stacks\/create\/template?stackName=RedshiftBYOM&amp;templateURL=https:\/\/redshift-ml-multiclass.s3.amazonaws.com\/redshift-byom-blog.yaml\" target=\"_blank\" rel=\"noopener noreferrer\">template<\/a> to provision all the required resources in your AWS accounts automatically.<\/p>\n<h2>Solution overview<\/h2>\n<p>Amazon Redshift ML supports text and CSV inference formats. For more information about various SageMaker algorithms and their inference formats, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/randomcutforest.html\" target=\"_blank\" rel=\"noopener noreferrer\">Random Cut Forest (RCF) Algorithm<\/a>.<\/p>\n<p>Amazon SageMaker Random Cut Forest (RCF) is an algorithm designed to detect anomalous data points within a dataset. Examples of anomalies that are important to detect include when website activity uncharacteristically spikes, when temperature data diverges from a periodic behavior, or when changes to public transit ridership reflect the occurrence of a special event.<\/p>\n<p>In this post, we use the SageMaker RCF algorithm to train an RCF model using the Notebook generated by the CloudFormation template on the Numenta Anomaly Benchmark (NAB) NYC Taxi dataset.<\/p>\n<p>We <a href=\"https:\/\/raw.githubusercontent.com\/numenta\/NAB\/master\/data\/realKnownCause\/nyc_taxi.csv\" target=\"_blank\" rel=\"noopener noreferrer\">downloaded<\/a> the data and stored it in an <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) bucket. The data consists of the number of New York City taxi passengers over the course of 6 months aggregated into 30-minute buckets. We naturally expect to find anomalous events occurring during the NYC marathon, Thanksgiving, Christmas, New Year\u2019s Day, and on the day of a snowstorm.<\/p>\n<p>We then use this model to predict anomalous events by generating an anomaly score for each data point.<\/p>\n<p>The following figure illustrates how we use Amazon Redshift ML to create a model using the SageMaker endpoint.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/1-5967-Model.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31166 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/1-5967-Model.jpg\" alt=\"\" width=\"800\" height=\"450\"><\/a><\/p>\n<h2>Deploy the model<\/h2>\n<p>To deploy the model, go to the SageMaker console and open the notebook that was created by the CloudFormation template.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/2-5967-Notebook-instances.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31167 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/2-5967-Notebook-instances.jpg\" alt=\"\" width=\"800\" height=\"202\"><\/a><\/p>\n<p>Then choose <code>bring-your-own-model-remote-inference.ipynb<\/code>.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/3-5967-jupyter.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31168 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/3-5967-jupyter.jpg\" alt=\"\" width=\"800\" height=\"53\"><\/a><\/p>\n<p>Set up parameters as shown in the following screenshot and then run all cells.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/4-5967-bring-your-own-model.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31169 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/4-5967-bring-your-own-model.jpg\" alt=\"\" width=\"800\" height=\"313\"><\/a><\/p>\n<h2>Get the SageMaker model endpoint<\/h2>\n<p>On the Amazon SageMaker console, under <strong>Inference <\/strong>in the navigation pane, choose <strong>Endpoints<\/strong> to find your model name. You use this when you create the remote inference model in Amazon Redshift.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/5-5967.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31170 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/5-5967.jpg\" alt=\"\" width=\"800\" height=\"283\"><\/a><\/p>\n<h2>Prepare data to create a remote inference model using Amazon Redshift ML<\/h2>\n<p>Create the schema and load the data in Amazon Redshift using the following SQL:<\/p>\n<div class=\"hide-language\">\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-sql\">DROP TABLE IF EXISTS public.rcf_taxi_data CASCADE;\nCREATE TABLE public.rcf_taxi_data\n(\nride_timestamp timestamp,\nnbr_passengers int\n);\nCOPY public.rcf_taxi_data\nFROM 's3:\/\/sagemaker-sample-files\/datasets\/tabular\/anomaly_benchmark_taxi\/NAB_nyc_taxi.csv'\niam_role 'arn:aws:iam:::&lt;accountid&gt;:role\/RedshiftML' ignoreheader 1 csv delimiter ',';<\/code><\/pre>\n<\/p><\/div>\n<\/p><\/div>\n<p>Amazon Redshift now supports attaching the default IAM role. If you have enabled the default IAM role in your cluster, you can use the default IAM role as follows.<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-sql\">COPY public.rcf_taxi_data\nFROM 's3:\/\/sagemaker-sample-files\/datasets\/tabular\/anomaly_benchmark_taxi\/NAB_nyc_taxi.csv'\niam_role default ignoreheader 1 csv delimiter ',';<\/code><\/pre>\n<\/p><\/div>\n<p>You can use the <a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/mgmt\/query-editor-v2-using.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Redshift query editor v2<\/a> to run these commands.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/6-5967-redshift-query.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31171 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/6-5967-redshift-query.jpg\" alt=\"\" width=\"800\" height=\"285\"><\/a><\/p>\n<h2>Create a model<\/h2>\n<p>Create a model in Amazon Redshift ML using the SageMaker endpoint you previously captured:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-sql\">CREATE MODEL public.remote_random_cut_forest\nFUNCTION remote_fn_rcf(int)\nRETURNS decimal(10,6)\nSAGEMAKER 'randomcutforest-xxxxxxxxx'\nIAM_ROLE 'arn:aws:iam::&lt;accountid&gt;:role\/RedshiftML';\n<\/code><\/pre>\n<\/p><\/div>\n<div id=\"1637614755.008900\" class=\"c-virtual_list__item\" role=\"listitem\" data-qa=\"virtual-list-item\">\n<div class=\"c-message_kit__background p-message_pane_message__message c-message_kit__message\" role=\"presentation\" data-qa=\"message_container\" data-qa-unprocessed=\"false\" data-qa-placeholder=\"false\">\n<div class=\"c-message_kit__hover\" role=\"document\" data-qa-hover=\"true\">\n<div class=\"c-message_kit__actions c-message_kit__actions--above\">\n<div class=\"c-message_kit__gutter\">\n<div class=\"c-message_kit__gutter__right\" role=\"presentation\" data-qa=\"message_content\">\n<div class=\"c-message_kit__blocks c-message_kit__blocks--rich_text\">\n<div class=\"c-message__message_blocks c-message__message_blocks--rich_text\">\n<div class=\"p-block_kit_renderer\" data-qa=\"block-kit-renderer\">\n<div class=\"p-block_kit_renderer__block_wrapper p-block_kit_renderer__block_wrapper--first\">\n<div class=\"p-rich_text_block\" dir=\"auto\">\n<p>\n                    You can also use the default IAM role with your CREATE MODEL command as follows:\n                  <\/p>\n<div>\n<div class=\"hide-language\">\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-sql\">CREATE MODEL public.remote_random_cut_forest\nFUNCTION remote_fn_rcf(int)\nRETURNS decimal(10,6)\nSAGEMAKER 'randomcutforest-xxxxxxxxx'\nIAM_ROLE  default;<\/code><\/pre>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<h2>Check model status<\/h2>\n<p>You can use the <code>show model<\/code> command to view the status of the model:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-code\">show model public.remote_random_cut_forest<\/code><\/pre>\n<\/p><\/div>\n<p>You get output like the following screenshot, which shows the endpoint and function name.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/7-5967.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31172 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/7-5967.jpg\" alt=\"\" width=\"800\" height=\"343\"><\/a><\/p>\n<h2>Compute anomaly scores across the entire taxi dataset<\/h2>\n<p>Now, run the inference query using the function name from the <code>create model<\/code> statement:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-sql\">select ride_timestamp, nbr_passengers, public.remote_fn_rcf(nbr_passengers) as score\nfrom public.rcf_taxi_data;\n<\/code><\/pre>\n<\/p><\/div>\n<p>The following screenshot shows our results.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/8-5967-query-editor-v2.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31173 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/8-5967-query-editor-v2.jpg\" alt=\"\" width=\"800\" height=\"324\"><\/a><\/p>\n<p>Now that we have our anomaly scores, we need to check for higher-than-normal anomalies.<\/p>\n<p>Amazon Redshift ML has batching optimizations to minimize the communication cost with SageMaker and offers high-performance remote inference.<\/p>\n<h2>Check for high anomalies<\/h2>\n<p>The following code runs a query for any data points with scores greater than three standard deviations (approximately 99.9th percentile) from the mean score:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">with score_cutoff as\n(select stddev(public.remote_fn_rcf(nbr_passengers)) as std, avg(public.remote_fn_rcf(nbr_passengers)) as mean, ( mean + 3 * std ) as score_cutoff_value\nfrom public.rcf_taxi_data)\n\nselect ride_timestamp, nbr_passengers, public.remote_fn_rcf(nbr_passengers) as score\nfrom public.rcf_taxi_data\nwhere score &gt; (select score_cutoff_value from score_cutoff)\norder by 2 desc;\n<\/code><\/pre>\n<\/p><\/div>\n<p>The data in the following screenshot shows that the biggest spike in ridership occurs on November 2, 2014, which was the annual NYC marathon. We also see spikes on Labor Day weekend, New Year\u2019s Day and the July 4th holiday weekend.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/9-5967.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31174 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/22\/9-5967.jpg\" alt=\"\" width=\"800\" height=\"351\"><\/a><\/p>\n<h2>Conclusion<\/h2>\n<p>In this post, we used SageMaker Random Cut Forest to detect anomalous data points in a taxi ridership dataset. In this data, the anomalies occurred when ridership was uncharacteristically high or low. However, the RCF algorithm is also capable of detecting when, for example, data breaks periodicity or uncharacteristically changes global behavior.<\/p>\n<p>We then used Amazon Redshift ML to demonstrate how you can make inferences on unsupervised algorithms (such as Random Cut Forest). This allows you to democratize ML by making predictions with Amazon Redshift SQL commands.<\/p>\n<p>For more information about building different models with Amazon Redshift ML see the\u00a0<a href=\"https:\/\/aws.amazon.com\/redshift\/features\/redshift-ml\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Redshift<\/a>\u00a0ML documentation.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><b data-stringify-type=\"bold\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-25269 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/06\/10\/Phil-Bates.jpg\" alt=\"\" width=\"100\" height=\"133\">Phil Bates\u00a0<\/b>is a Senior Analytics Specialist Solutions Architect at AWS with over 25 years of data warehouse experience.<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-25023 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/06\/03\/Debu-Panda.jpg\" alt=\"\" width=\"100\" height=\"135\"><\/strong><strong>Debu Panda<\/strong>, a principal product manager at AWS, is an industry leader in analytics, application platform, and database technologies and has more than 25 years of experience in the IT world.<\/p>\n<p><b data-stringify-type=\"bold\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-25270 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/06\/10\/Nikos-Koulouris.jpg\" alt=\"\" width=\"100\" height=\"135\"><\/b><strong>N<\/strong><b data-stringify-type=\"bold\"><strong>ikos Koulouris\u00a0<\/strong><\/b>is a Software Development Engineer at AWS. He received his PhD from University of California, San Diego and he has been working in the areas of databases and analytics.<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-11159 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/02\/20\/Murali-100.jpg\" alt=\"\" width=\"100\" height=\"134\">Murali Narayanaswamy\u00a0<\/strong>is a principal machine learning scientist in AWS. He received his PhD from Carnegie Mellon University and works at the intersection of ML, AI, optimization, learning and inference to combat uncertainty in real-world applications including personalization, forecasting, supply chains and large scale systems.<\/p>\n<p>       <!-- '\"` -->\n      <\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/bring-your-amazon-sagemaker-model-into-amazon-redshift-for-remote-inference\/<\/p>\n","protected":false},"author":0,"featured_media":1252,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1251"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1251"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1251\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1252"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1251"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1251"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1251"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}