{"id":1930,"date":"2022-03-04T16:43:14","date_gmt":"2022-03-04T16:43:14","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/03\/04\/build-a-cold-start-time-series-forecasting-engine-using-autogluon\/"},"modified":"2022-03-04T16:43:14","modified_gmt":"2022-03-04T16:43:14","slug":"build-a-cold-start-time-series-forecasting-engine-using-autogluon","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/03\/04\/build-a-cold-start-time-series-forecasting-engine-using-autogluon\/","title":{"rendered":"Build a cold start time series forecasting engine using AutoGluon"},"content":{"rendered":"<div id=\"\">\n<p>Whether you\u2019re allocating resources more efficiently for web traffic, forecasting patient demand for staffing needs, or anticipating sales of a company\u2019s products, forecasting is an essential tool across many businesses. One particular use case, known as <em>cold start forecasting<\/em>, builds forecasts for a time series that has little or no existing historical data, such as a new product that just entered the market in the retail industry. Traditional time series forecasting methods such as autoregressive integrated moving average (ARIMA) or exponential smoothing (ES) rely heavily on historical time series of each individual product, and therefore aren\u2019t effective for cold start forecasting.<\/p>\n<p>In this post, we demonstrate how to build a cold start forecasting engine using <a href=\"http:\/\/autogluon-staging.s3-website-us-west-2.amazonaws.com\/PR-1198\/55\/tutorials\/forecasting\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">AutoGluon AutoML for time series forecasting<\/a>, an open-source Python package to automate machine learning (ML) on image, text, tabular, and time series data. 
AutoGluon provides an end-to-end automated machine learning (AutoML) pipeline for users ranging from beginners to experienced ML developers, and is designed to be an accurate, easy-to-use, fully automated solution. We use the free <a href=\"https:\/\/studiolab.sagemaker.aws\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Studio Lab<\/a> service for this demonstration.<\/p>\n<h2>Introduction to AutoGluon time series<\/h2>\n<p><a href=\"https:\/\/github.com\/awslabs\/autogluon\/\" target=\"_blank\" rel=\"noopener noreferrer\">AutoGluon<\/a> is a leading open-source library for AutoML for text, image, and tabular data, allowing you to produce highly accurate models from raw data with just one line of code. Recently, the team has been working to extend these capabilities to time series data, and has developed an automated forecasting module that is publicly available on <a href=\"https:\/\/github.com\/awslabs\/autogluon\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub<\/a>. The <code>autogluon.forecasting<\/code> module automatically processes raw time series data into the appropriate format, and then trains and tunes various state-of-the-art deep learning models to produce accurate forecasts. In this post, we demonstrate how to use <code>autogluon.forecasting<\/code> and apply it to cold start forecasting tasks.<\/p>\n<h2>Solution overview<\/h2>\n<p>Because AutoGluon is an open-source Python package, you can implement this solution <a href=\"https:\/\/auto.gluon.ai\/stable\/install.html#installation-faq\" target=\"_blank\" rel=\"noopener noreferrer\">locally<\/a> on your laptop or on Amazon SageMaker Studio Lab. 
We walk through the following steps:<\/p>\n<ol>\n<li>Set up AutoGluon for Amazon SageMaker Studio Lab.<\/li>\n<li>Prepare the dataset.<\/li>\n<li>Define training parameters using AutoGluon.<\/li>\n<li>Train a cold start forecasting engine for time series forecasting.<\/li>\n<li>Visualize cold start forecasting predictions.<\/li>\n<\/ol>\n<p>The key assumption of cold start forecasting is that items with similar characteristics should have similar time series trajectories, which is what allows cold start forecasting to make predictions on items without historical data, as illustrated in the following figure.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/ML-7552-image001.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33261\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/ML-7552-image001.png\" alt=\"\" width=\"984\" height=\"714\"><\/a><\/p>\n<p>In our walkthrough, we use a synthetic dataset based on electricity consumption, which consists of the hourly time series for 370 items, each with an <code>item_id<\/code> from 0\u2013369. Within this synthetic dataset, each <code>item_id<\/code> is also associated with a static feature (a feature that doesn\u2019t change over time). We train a <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/deepar.html\" target=\"_blank\" rel=\"noopener noreferrer\">DeepAR<\/a> model using AutoGluon to learn the typical behavior of similar items, and transfer such behavior to make predictions on new items (<code>item_id<\/code> 370\u2013373) that don\u2019t have historical time series data. 
Although we\u2019re demonstrating the cold start forecasting approach with only one static feature, in practice, having informative and high-quality static features is key to a good cold start forecast.<\/p>\n<p>The following diagram provides a high-level overview of our solution. The open-source code is available on the <a href=\"https:\/\/github.com\/whosivan\/amazon-sagemaker-studio-lab-cold-start-forecasting-using-autogluon\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/ML-7552-image003.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33262\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/ML-7552-image003.png\" alt=\"\" width=\"944\" height=\"208\"><\/a><\/p>\n<h2>Prerequisites<\/h2>\n<p>For this walkthrough, you need an Amazon SageMaker Studio Lab account.<\/p>\n<p>Log in to your Amazon SageMaker Studio Lab account and set up the environment using the terminal:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">cd sagemaker-studiolab-notebooks\/\ngit clone https:\/\/github.com\/whosivan\/amazon-sagemaker-studio-lab-cold-start-forecasting-using-autogluon\nconda env create -f autogluon.yml\nconda activate autogluon\ngit clone https:\/\/github.com\/yx1215\/autogluon.git\ncd autogluon\/\ngit checkout --track origin\/add_forecasting_predictor<\/code><\/pre>\n<\/p><\/div>\n<p>These instructions should also work from your laptop if you don\u2019t have access to Amazon SageMaker Studio Lab (we recommend installing Anaconda on your laptop first).<\/p>\n<p>When you have the virtual environment fully set up, launch the notebook <code>AutoGluon-cold-start-demo.ipynb<\/code> and select the custom environment <code>.conda-autogluon:Python<\/code> kernel.<\/p>\n<h2>Prepare the target time series and item meta dataset<\/h2>\n<p>Download 
the following datasets to your notebook instance if they\u2019re not already there, and save them under the directory <code>data\/<\/code>. You can find these datasets on our <a href=\"https:\/\/github.com\/whosivan\/amazon-sagemaker-studio-lab-cold-start-forecasting-using-autogluon\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>:<\/p>\n<ul>\n<li>test.csv.gz<\/li>\n<li>coldStartTargetData.csv<\/li>\n<li>itemMetaData.csv<\/li>\n<\/ul>\n<p>Run the following snippet to load the target time series dataset into the kernel:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">import pandas as pd\n\nzipLocalFilePath = \"data\/test.csv.gz\"\nlocalFilePath = \"data\/test.csv\"\n# Decompress the archive to a plain CSV file\nutil.extract_gz(zipLocalFilePath, localFilePath)\n\n# Read every column as a string, then cast the target column to float\ntdf = pd.read_csv(localFilePath, dtype = object)\ntdf['target_value'] = tdf['target_value'].astype('float')\ntdf.head()<\/code><\/pre>\n<\/p><\/div>\n<p>AutoGluon time series requires static features to be represented in numerical format. This can be achieved by applying <code>LabelEncoder()<\/code> to our static feature <code>type<\/code>, where we encode A=0, B=1, C=2, D=3 (see the following code). By default, AutoGluon infers the static feature to be either ordinal or categorical. 
You can also override this by converting the static feature column to the object\/string data type for categorical features, or to the integer\/float data type for ordinal features.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">from sklearn.preprocessing import LabelEncoder\n\nlocalItemMetaDataFilePath = \"data\/itemMetaData.csv\"\nimdf = pd.read_csv(localItemMetaDataFilePath, dtype = object)\n\n# Encode the static feature as integers: A=0, B=1, C=2, D=3\nlabelencoder = LabelEncoder()\nimdf['type'] = labelencoder.fit_transform(imdf['type'])\n\n# Metadata for items that already have history in tdf\nimdf_without_coldstart_item = imdf[imdf.item_id.isin(tdf.item_id.tolist())].copy()\n# Cast to string so the feature is treated as categorical\nimdf_without_coldstart_item['type'] = imdf_without_coldstart_item['type'].astype(str)\nimdf_without_coldstart_item.to_csv('data\/itemMetaDatawithoutColdstart.csv', index=False)\n\n# Metadata for the cold start items (no history in tdf)\nimdf_with_coldstart_item = imdf[~imdf.item_id.isin(tdf.item_id.tolist())]\nimdf_with_coldstart_item.to_csv('data\/itemMetaDataOnlyColdstart.csv', index=False)\n<\/code><\/pre>\n<\/p><\/div>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/ML-7552-image007.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33256\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/ML-7552-image007.png\" alt=\"\" width=\"183\" height=\"230\"><\/a><\/p>\n<h2>Set up and start AutoGluon model training<\/h2>\n<p>We need to specify <code>save_path = 'autogluon-coldstart-demo'<\/code> as the model artifact folder name (see the following code). We also set our <code>eval_metric<\/code> to <a href=\"https:\/\/www.statisticshowto.com\/mean-absolute-percentage-error-mape\/\" target=\"_blank\" rel=\"noopener noreferrer\">mean absolute percentage error<\/a>, or <code>'MAPE'<\/code> for short, and set <code>prediction_length<\/code> to 24 hours. 
If <code>eval_metric<\/code> is not specified, AutoGluon by default produces probabilistic forecasts and scores them via the <a href=\"https:\/\/docs.aws.amazon.com\/forecast\/latest\/dg\/metrics.html#metrics-wQL\" target=\"_blank\" rel=\"noopener noreferrer\">weighted quantile loss<\/a>. We only look at the <a href=\"https:\/\/arxiv.org\/abs\/1704.04110\" target=\"_blank\" rel=\"noopener noreferrer\">DeepAR model<\/a> in our demo, because the DeepAR algorithm supports cold start forecasting by design. We set one of the DeepAR hyperparameters arbitrarily and pass it to the <code>ForecastingPredictor().fit()<\/code> call. Because <code>hyperparameters<\/code> names only DeepAR, AutoGluon trains only that model. For a full list of tunable hyperparameters, refer to the <a href=\"https:\/\/ts.gluon.ai\/api\/gluonts\/gluonts.model.deepar.html\" target=\"_blank\" rel=\"noopener noreferrer\">gluonts.model.deepar package<\/a>.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">save_path = 'autogluon-coldstart-demo'\neval_metric = 'MAPE'\ndeepar_params = {\n    \"scaling\": True\n}\n\nag_predictor = ForecastingPredictor(path=save_path, eval_metric=eval_metric).fit(\n    tdf,\n    static_features=imdf_without_coldstart_item,\n    prediction_length=24,  # how far out in the future we wish to forecast\n    index_column=\"item_id\",\n    target_column=\"target_value\",\n    time_column=\"timestamp\",\n    quantiles=[0.1, 0.5, 0.9],\n    hyperparameters={\"DeepAR\": deepar_params},\n)\n<\/code><\/pre>\n<\/p><\/div>\n<p>The training takes 30\u201345 minutes. 
You can get the model summary by calling the following method:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">ag_predictor.fit_summary()<\/code><\/pre>\n<\/p><\/div>\n<h2>Forecast on the cold start item<\/h2>\n<p>Now we\u2019re ready to generate forecasts for the cold start items. We recommend having at least five rows for each <code>item_id<\/code>. Therefore, for any <code>item_id<\/code> that has fewer than five observations, we fill the missing rows with NaNs. In our demo, <code>item_id<\/code> 370 and 372 have zero observations (a pure cold start problem), whereas the other two have five target values each.<\/p>\n<p>Load the cold start target time series dataset with the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">localColdStartDataFilePath = \"data\/coldStartTargetData.csv\"\ncstdf = pd.read_csv(localColdStartDataFilePath, dtype = object)\ncstdf.head(20)<\/code><\/pre>\n<\/p><\/div>\n<p>We feed the cold start target time series into our AutoGluon model, along with the item meta dataset for the cold start <code>item_id<\/code> values:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">cold_start_prediction = ag_predictor.predict(cstdf, static_features=imdf_with_coldstart_item)<\/code><\/pre>\n<\/p><\/div>\n<h2>Visualize the predictions<\/h2>\n<p>We can create a plotting function to visualize the cold start forecasts, as shown in the following graph.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/ML-7552-image009.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-33257\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/ML-7552-image009.png\" alt=\"\" width=\"913\" height=\"387\"><\/a><\/p>\n<h2>Clean up<\/h2>\n<p>To optimize resource usage, consider stopping the runtime on Amazon SageMaker Studio Lab after you have fully explored the 
notebook.<\/p>\n<h2>Conclusion<\/h2>\n<p>In this post, we showed how to build a cold start forecasting engine using AutoGluon AutoML for time series data on Amazon SageMaker Studio Lab. If you\u2019re wondering about the difference between <a href=\"https:\/\/aws.amazon.com\/forecast\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Forecast<\/a> and AutoGluon (time series): Amazon Forecast is a fully managed and supported service that uses machine learning (ML) to generate highly accurate forecasts without requiring any prior ML experience, whereas AutoGluon is an open-source, community-supported project that incorporates the latest research contributions. We walked through an end-to-end example to demonstrate what AutoGluon for time series is capable of, and provided a dataset and use case.<\/p>\n<p>AutoGluon for time series data is an open-source Python package, and we hope that this post, together with our code example, gives you a straightforward solution to tackle challenging cold start forecasting problems. You can access the entire example on our <a href=\"https:\/\/github.com\/whosivan\/amazon-sagemaker-studio-lab-cold-start-forecasting-using-autogluon\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>. Try it out, and let us know what you think!<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/Ivan-Cui.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-33259 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/Ivan-Cui.jpg\" alt=\"\" width=\"100\" height=\"110\"><\/a>Ivan Cui<\/strong> is a Data Scientist with AWS Professional Services, where he helps customers build and deploy solutions using machine learning on AWS. He has worked with customers across diverse industries, including software, finance, pharmaceuticals, and healthcare. 
In his free time, he enjoys reading, spending time with his family, and maximizing his stock portfolio.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/Jonas-Mueller.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-33260 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/Jonas-Mueller.png\" alt=\"\" width=\"100\" height=\"97\"><\/a>Jonas Mueller<\/strong> is a Senior Applied Scientist in the AI Research and Education group at AWS, where he develops new algorithms to improve deep learning and develop automated machine learning. Before joining AWS to democratize ML, he completed his PhD at the MIT Computer Science and Artificial Intelligence Lab. In his free time, he enjoys exploring mountains and the outdoors.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/Wenming-Ye.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-33258 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/18\/Wenming-Ye.jpg\" alt=\"\" width=\"100\" height=\"120\"><\/a>Wenming Ye<\/strong> is a Research Product Manager at AWS AI. He is passionate about helping researchers and enterprise customers rapidly scale their innovations through open-source and state-of-the-art machine learning technology. 
Wenming has diverse R&amp;D experience from Microsoft Research, the SQL engineering team, and successful startups.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/build-a-cold-start-time-series-forecasting-engine-using-autogluon\/<\/p>\n","protected":false},"author":0,"featured_media":1931,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1930"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1930"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1930\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1931"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1930"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1930"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1930"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}