{"id":1938,"date":"2022-03-09T19:56:49","date_gmt":"2022-03-09T19:56:49","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/03\/09\/amazon-sagemaker-autopilot-now-supports-time-series-data\/"},"modified":"2022-03-09T19:56:49","modified_gmt":"2022-03-09T19:56:49","slug":"amazon-sagemaker-autopilot-now-supports-time-series-data","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/03\/09\/amazon-sagemaker-autopilot-now-supports-time-series-data\/","title":{"rendered":"Amazon SageMaker Autopilot now supports time series data"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/aws.amazon.com\/sagemaker\/autopilot\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Autopilot<\/a> automatically builds, trains, and tunes the best machine learning (ML) models based on your data, while allowing you to maintain full control and visibility. We have recently announced <a href=\"https:\/\/aws.amazon.com\/about-aws\/whats-new\/2021\/10\/amazon-sagemaker-autopilot-time-series-data\/\" target=\"_blank\" rel=\"noopener noreferrer\">support for time series data in Autopilot<\/a>. You can use Autopilot to tackle regression and classification tasks on time series data, or sequence data in general. Time series data is a special type of sequence data where data points are collected at even time intervals.<\/p>\n<p>Manually preparing the data, selecting the right ML model, and optimizing its parameters is a complex task, even for an expert practitioner. Although automated approaches exist that can find the best models and their parameters, these typically can\u2019t handle data that comes as sequences, such as network traffic, electricity consumption, or household expenses recorded over time. Because this data takes the form of observations acquired at different time points, consecutive observations can\u2019t be treated as independent of each other and need to be processed as a whole. 
You can use Autopilot for a wide range of problems dealing with sequential data. For example, you can classify network traffic recorded over time to identify malicious activities, or determine if individuals qualify for a mortgage based on their credit history. You provide a dataset containing time series data and Autopilot handles the rest, processing the sequential data through specialized feature transforms and finding the best model on your behalf.<\/p>\n<p>Autopilot eliminates the heavy lifting of building ML models, and helps you automatically build, train, and tune the best ML model based on your data. Autopilot runs several algorithms on your data and tunes their hyperparameters on a fully managed compute infrastructure. In this post, we demonstrate how you can use <a class=\"c-link\" href=\"https:\/\/aws.amazon.com\/sagemaker\/autopilot\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-stringify-link=\"https:\/\/aws.amazon.com\/sagemaker\/autopilot\/\" data-sk=\"tooltip_parent\" data-remove-tab-index=\"true\">Autopilot<\/a>\u00a0to solve classification and regression problems on time series data. 
For instructions on creating and training an Autopilot model, see\u00a0<a class=\"c-link\" href=\"https:\/\/github.com\/awslabs\/amazon-sagemaker-examples\/blob\/master\/autopilot\/autopilot_customer_churn.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\" data-stringify-link=\"https:\/\/github.com\/awslabs\/amazon-sagemaker-examples\/blob\/master\/autopilot\/autopilot_customer_churn.ipynb\" data-sk=\"tooltip_parent\" data-remove-tab-index=\"true\">Customer Churn Prediction with Amazon SageMaker Autopilot<\/a>.<\/p>\n<h2>Time series data classification using Autopilot<\/h2>\n<p>As a running example, we consider a multi-class classification problem on the time series <a href=\"http:\/\/www.timeseriesclassification.com\/description.php?Dataset=UWaveGestureLibraryX\" target=\"_blank\" rel=\"noopener noreferrer\">dataset<\/a> <a href=\"https:\/\/ieeexplore.ieee.org\/document\/4912759\" target=\"_blank\" rel=\"noopener noreferrer\">UWaveGestureLibraryX<\/a>, which contains equidistant accelerometer readings recorded while a person performs one of eight predefined hand gestures. For simplicity, we consider only the X dimension of the accelerometer. The task is to build a classification model that maps the time series of sensor readings to the predefined gestures. The following figure shows the first rows of the dataset in CSV format. 
The entire table consists of 896 rows and two columns: the first column is a gesture label and the second column is a time series of sensor readings.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/01\/ML-7246-image001.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33644\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ML-7246-image001.png\" alt=\"\" width=\"986\" height=\"231\"><\/a><\/p>\n<h2>Convert data to the right format with Amazon SageMaker Data Wrangler<\/h2>\n<p>On top of accepting numerical, categorical, and standard text columns, Autopilot now also accepts a sequence input column. If your time series data doesn\u2019t follow this format, you can easily convert it through <a href=\"https:\/\/aws.amazon.com\/sagemaker\/data-wrangler\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Data Wrangler<\/a>. Data Wrangler reduces the time it takes to aggregate and prepare data for ML from weeks to minutes. With Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization from a single visual interface. For instance, consider the same dataset but in a different input format: each gesture (specified by ID) is a sequence of equidistant measurements of the accelerometer. When stored vertically, each row contains a timestamp and one value. 
The following figure compares this data in its original format and a sequence format.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/01\/ML-7246-image003.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33645\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ML-7246-image003.png\" alt=\"\" width=\"1214\" height=\"526\"><\/a><\/p>\n<p>To convert this dataset to the format described earlier using Data Wrangler, load the dataset from <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3). Then use the <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/data-wrangler-transform.html\" target=\"_blank\" rel=\"noopener noreferrer\">time series Group by transform<\/a>, as shown in the following screenshot, and export the data back to Amazon S3 in CSV format.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/01\/ML-7246-image005.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33646\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ML-7246-image005.png\" alt=\"\" width=\"568\" height=\"561\"><\/a><\/p>\n<p>When the dataset is in the sequence format, you can proceed with Autopilot. To explore the other time series transforms in Data Wrangler, refer to <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/prepare-time-series-data-with-amazon-sagemaker-data-wrangler\/\" target=\"_blank\" rel=\"noopener noreferrer\">Prepare time series data with Amazon SageMaker Data Wrangler<\/a>.<\/p>\n<h2>Launch an AutoML job<\/h2>\n<p>As with other input types supported by Autopilot, each row of the dataset is a different observation and each column is a feature. 
In this example, we have a single column containing time series data, but you can have multiple time series columns. You can also have multiple columns with different input types, such as time series, text, and numerical.<\/p>\n<p>To <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/autopilot-automate-model-development-create-experiment.html\" target=\"_blank\" rel=\"noopener noreferrer\">create an Autopilot experiment<\/a>, place the dataset in an S3 bucket and create a new experiment within <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/studio.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Studio<\/a>. As shown in the following screenshot, you must specify the name of the experiment, the S3 location of the dataset, the S3 location for the output artifacts, and the name of the column to predict.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/01\/ML-7246-image007.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33647\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ML-7246-image007.png\" alt=\"\" width=\"2414\" height=\"1183\"><\/a><\/p>\n<p>Autopilot analyzes the data, generates ML pipelines, and by default runs 250 iterations of hyperparameter optimization on this classification task. 
As shown in the following model leaderboard, Autopilot reaches 0.821 accuracy, and you can deploy the best model in just one click.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/01\/ML-7246-image009.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33648\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ML-7246-image009.png\" alt=\"\" width=\"1993\" height=\"908\"><\/a><\/p>\n<p>In addition, Autopilot generates a <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/autopilot-data-exploration-report.html\" target=\"_blank\" rel=\"noopener noreferrer\">data exploration report<\/a>, where you can visualize and explore your data.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/01\/ML-7246-image011.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33649\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ML-7246-image011.png\" alt=\"\" width=\"1910\" height=\"1054\"><\/a><\/p>\n<p>Transparency is foundational for Autopilot. You can inspect and modify generated ML pipelines within the candidate definition notebook. The following screenshot demonstrates how Autopilot recommends a range of pipelines, combining the time series transformer <code>TSFeatureExtractor<\/code> with different ML algorithms, such as gradient boosted decision trees and linear models. The <code>TSFeatureExtractor<\/code> extracts hundreds of time series features for you, which are then fed to the downstream algorithms to make predictions. 
For the full list of time series features, refer to <a href=\"https:\/\/tsfresh.readthedocs.io\/en\/latest\/text\/list_of_features.html\" target=\"_blank\" rel=\"noopener noreferrer\">Overview on extracted features<\/a>.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/01\/ML-7246-image013.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33650\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ML-7246-image013.png\" alt=\"\" width=\"2099\" height=\"1150\"><\/a><\/p>\n<h2>Conclusion<\/h2>\n<p>In this post, we demonstrated how to use SageMaker Autopilot to solve time series classification and regression problems in just a few clicks.<\/p>\n<p>For more information about Autopilot, see <a href=\"https:\/\/aws.amazon.com\/sagemaker\/autopilot\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Autopilot<\/a>. To explore related features of SageMaker, see <a href=\"https:\/\/aws.amazon.com\/sagemaker\/data-wrangler\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Data Wrangler<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/14\/Nikita-headshot-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-33054 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/14\/Nikita-headshot-1.png\" alt=\"\" width=\"100\" height=\"134\"><\/a><strong>Nikita Ivkin\u00a0<\/strong><\/strong>is an Applied Scientist, Amazon SageMaker Data Wrangler.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/Anne-Milbert.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-33659 alignleft\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/Anne-Milbert.png\" alt=\"\" width=\"100\" height=\"133\"><\/a><strong>Anne Milbert<\/strong> is a Software Development Engineer working on Amazon SageMaker Automatic Model Tuning.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignleft size-full wp-image-32490\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/26\/vperrone.png\" alt=\"\" width=\"100\" height=\"133\"><strong><a class=\"c-link\" href=\"https:\/\/www.linkedin.com\/in\/valerio-perrone-391731132\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-stringify-link=\"https:\/\/www.linkedin.com\/in\/valerio-perrone-391731132\/\" data-sk=\"tooltip_parent\" data-remove-tab-index=\"true\" aria-describedby=\"sk-tooltip-7235\">Valerio Perrone<\/a><\/strong> is an Applied Science Manager working on Amazon SageMaker Automatic Model Tuning and Autopilot.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/megsatis.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-33660 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/megsatis.png\" alt=\"\" width=\"100\" height=\"133\"><\/a><strong>Meghana Satish<\/strong> is a Software Development Engineer working on Amazon SageMaker Automatic Model Tuning.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ali-takbiri.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-33658 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/03\/02\/ali-takbiri.png\" alt=\"\" width=\"100\" height=\"133\"><\/a> <strong>Ali Takbiri<\/strong> is an AI\/ML Specialist Solutions Architect who helps customers use machine learning to solve 
their business challenges on the AWS Cloud.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/amazon-sagemaker-autopilot-now-supports-time-series-data\/<\/p>\n","protected":false},"author":0,"featured_media":1939,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1938"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1938"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1938\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1939"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1938"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1938"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1938"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}