{"id":1501,"date":"2022-01-26T18:46:20","date_gmt":"2022-01-26T18:46:20","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/01\/26\/how-logz-io-accelerates-ml-recommendations-and-anomaly-detection-solutions-with-amazon-sagemaker\/"},"modified":"2022-01-26T18:46:20","modified_gmt":"2022-01-26T18:46:20","slug":"how-logz-io-accelerates-ml-recommendations-and-anomaly-detection-solutions-with-amazon-sagemaker","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/01\/26\/how-logz-io-accelerates-ml-recommendations-and-anomaly-detection-solutions-with-amazon-sagemaker\/","title":{"rendered":"How Logz.io accelerates ML recommendations and anomaly detection solutions with Amazon SageMaker"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/logz.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Logz.io<\/a> is an AWS Partner Network (APN) Advanced Technology Partner with <a href=\"https:\/\/partners.amazonaws.com\/partners\/001E000001BvhxXIAR\/Logz.io\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Competencies in DevOps, Security, and Data &amp; Analytics<\/a>. Logz.io offers a software as a service (SaaS) observability platform based on best-in-class open-source software solutions for log, metric, and tracing analytics. Customers are sending an increasing amount of data to Logz.io from various data sources to manage the health and performance of their applications and services. It can be overwhelming for new users who are looking to navigate across the various dashboards built over time, process different alert notifications, and connect the dots when troubleshooting production issues.<\/p>\n<p>Mean time to detect (MTTD) and mean time to resolution (MTTR) are key metrics for our customers. 
They\u2019re calculated by measuring the time from when a user in our platform starts investigating an issue (such as a production service being down) to the point when they stop performing actions in the platform related to that specific investigation.<\/p>\n<p>To help customers reduce MTTD and MTTR, Logz.io is turning to machine learning (ML) to provide recommendations for relevant dashboards and queries and perform anomaly detection via self-learning. As a result, the average user is equipped with the aggregated experience of their entire company, leveraging the wisdom of many. We found that our solution can reduce MTTR by up to 20%.<\/p>\n<p>As MTTD decreases, users can identify the problem and resolve it faster. Our data semantic layer contains semantics for starting and stopping an investigation, as well as the popularity of each action the user performs with respect to a specific alert.<\/p>\n<p>In this post, we share how Logz.io used <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> to reduce the time and effort of moving our proof of concept (POC) experiments from research to production evaluation, and how we reduced our production inference cost.<\/p>\n<h2>The challenge<\/h2>\n<p>Until Logz.io used SageMaker, moving from research to POC testing and production experiments was quite lengthy, because we needed to create Spark jobs to collect, clean, and normalize the data, and reading each data source required DevOps work. DevOps and data engineering skills aren\u2019t part of our ML team, which created a high dependency between the teams.<\/p>\n<p>Another challenge was to provide an ML inference service to our products while achieving an optimal cost-to-performance ratio. Our optimal scenario is supporting as many models as possible per computing unit, while providing high concurrency for customers with many models. 
We had flexibility on our inference time, because the initial window of the data stream for the inference service is a 5-minute bucket of logs.<\/p>\n<h2>Research phase<\/h2>\n<p>Data science is an iterative process that requires an interactive development environment for research and data processing, validating the data output on every iteration. Therefore, we encourage our ML researchers to use notebooks.<\/p>\n<p>To accelerate the iteration cycle, we wanted to test our notebooks\u2019 code on real production data, while running it at scale. Furthermore, we wanted to avoid the bottleneck of DevOps and data engineering during the initial test in production, while still being able to view the outputs and estimate the code runtime.<\/p>\n<p>To implement this, we wanted to give our data science team full control and end-to-end responsibility from research to initial test on production. We needed them to easily pull data, while preserving data access management and monitoring this access. They also needed to easily deploy their custom POC notebooks into production in a scalable manner, while monitoring the runtime and expected costs.<\/p>\n<h2>Evaluation phase<\/h2>\n<p>During this phase, we evaluated a few ML platforms that could support both our training and serving requirements. We found that SageMaker is the most appropriate for our use cases because it supports both training and inference. Furthermore, it\u2019s customizable, so we can tailor it to our preferred research process.<\/p>\n<p>Initially, we started from local notebooks, testing various libraries. We ran into problems with pulling massive data from production. 
Later, we were stuck at a point in the modeling phase where runs took many hours on a local machine.<\/p>\n<p>We evaluated many solutions and finally chose the following architecture:<\/p>\n<ul>\n<li><strong>DataPlate <\/strong>\u2013 The open-source version of <a href=\"https:\/\/github.com\/Dataplate\/dataplate\" target=\"_blank\" rel=\"noopener noreferrer\">DataPlate<\/a> helped us pull and join our data easily by utilizing our Spark <a href=\"http:\/\/aws.amazon.com\/emr\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon EMR<\/a> clusters with simple SQL, while monitoring the data access<\/li>\n<li><strong>SageMaker notebook instance and processing jobs<\/strong> \u2013 This helped us with the scalability of runtime and the flexibility of machine types and ML frameworks, while collaborating on our code via a Git connection<\/li>\n<\/ul>\n<h2>Research phase solution architecture<\/h2>\n<p>The following diagram illustrates the solution architecture of the research phase, which consists of the following components:<\/p>\n<ul>\n<li><strong>SageMaker notebooks<\/strong> \u2013 Data scientists use these <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/nbi.html\" target=\"_blank\" rel=\"noopener noreferrer\">notebooks<\/a> to conduct their research.<\/li>\n<li><strong>AWS Lambda function<\/strong> \u2013 <a href=\"https:\/\/aws.amazon.com\/lambda\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> is a serverless solution that runs a processing job on demand. 
The job uses a Docker container with the notebook we want to run during our experiment, together with all the common files the notebook needs (<code>requirements.txt<\/code> and the multi-processing functions code in a separate notebook).<\/li>\n<li><strong>Amazon ECR<\/strong> \u2013 <a href=\"https:\/\/aws.amazon.com\/ecr\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Registry<\/a> (Amazon ECR) stores our Docker container.<\/li>\n<li><strong>SageMaker Processing job<\/strong> \u2013 We can run this <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/processing-job.html\" target=\"_blank\" rel=\"noopener noreferrer\">data processing job<\/a> on any ML machine, and it runs our notebook with parameters.<\/li>\n<li><strong>DataPlate<\/strong> \u2013 This service helps us use SQL and join several data sources easily. It translates the SQL to Spark code and optimizes it, while monitoring data access and helping reduce data breaches. The Xtra version provided even more capabilities.<\/li>\n<li><strong>Amazon EMR<\/strong> \u2013 This service runs our data extractions as workloads over Spark, contacting all our data resources.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image001.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32371\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image001.png\" alt=\"\" width=\"1183\" height=\"666\"><\/a><\/p>\n<p>With the SageMaker notebook instance lifecycle configuration, we can control the maximum notebook instance runtime, using the <code>autostop.py<\/code> <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-notebook-instance-lifecycle-config-samples\/blob\/master\/scripts\/auto-stop-idle\/autostop.py\" target=\"_blank\" rel=\"noopener noreferrer\">template<\/a> script.<\/p>\n<p>After testing the ML 
frameworks, we chose the SageMaker MXNet kernel for our clustering and ranking phases.<\/p>\n<p>To test the notebook code on our production data, we ran the notebook by encapsulating it via Docker in Amazon ECR and ran it as a processing job to validate the maximum runtime on different types of machines.<\/p>\n<p>The Docker container also helps us share resources among notebooks\u2019 tests. In some cases, a notebook calls other notebooks to utilize multi-processing by splitting big data frames into smaller ones, which can run simultaneously on each vCPU in a large machine type.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image004.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32373\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image004.png\" alt=\"\" width=\"936\" height=\"436\"><\/a><\/p>\n<h2>The real-time production inference solution<\/h2>\n<p>In the research phase, we used Parquet files in <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) to maintain our recommendations. These are consumed once a day by our engineering pipeline to attach the recommendations to our alerts\u2019 mechanism.<\/p>\n<p>However, our roadmap requires a higher-refresh-rate solution, and pulling once a day isn\u2019t enough in the long term, because we want to provide recommendations even during the investigation.<\/p>\n<p>To implement this solution at scale, we tested most of the SageMaker endpoint solutions in our anomaly-detection research. We tested 500 pre-built models with a single endpoint machine of various types and used concurrent multi-threaded clients to perform requests to the endpoint. 
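Such a concurrent client can be sketched as follows; the endpoint name and payload here are hypothetical placeholders, not our production values:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders -- not our production values.
ENDPOINT_NAME = "anomaly-detection-endpoint"
PAYLOAD = b'{"features": [0.1, 0.2, 0.3]}'


def invoke_once(_):
    """Send one inference request and return its latency in seconds."""
    import boto3  # imported lazily so summarize() stays usable without AWS deps

    runtime = boto3.client("sagemaker-runtime")
    start = time.perf_counter()
    runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=PAYLOAD,
    )
    return time.perf_counter() - start


def summarize(latencies):
    """Reduce raw latencies to the percentiles we tracked."""
    ordered = sorted(latencies)
    return {
        "p50": ordered[len(ordered) // 2],
        "p95": ordered[int(len(ordered) * 0.95)],
    }


def load_test(num_requests=1000, concurrency=32):
    """Fire concurrent invocations and summarize their latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return summarize(pool.map(invoke_once, range(num_requests)))
```

Keeping the percentile summary separate from the invocation makes it easy to rerun the same test against different instance types.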
We measured the response time, CPU, memory, and other metrics (for more information, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/monitoring-cloudwatch.html\" target=\"_blank\" rel=\"noopener noreferrer\">Monitor Amazon SageMaker with Amazon CloudWatch<\/a>). We found that the multi-model endpoint is a perfect fit for our use cases.<\/p>\n<p>A multi-model endpoint can reduce our costs dramatically in comparison to a single endpoint, or even a Kubernetes deployment of Flask (or other Python) web services. Our first assumption was that we must provide a single endpoint, using a 4-vCPU small machine, for each customer, and on average query four dedicated models, because each vCPU serves one model. With the multi-model endpoint, we could aggregate more customers on a single multi-endpoint machine.<\/p>\n<p>We had a model and encoding files per customer, and after doing load tests, we determined that we could serve 50 customers, each using 10 models, even on the smallest ml.t2.medium instance.<\/p>\n<p>At this stage, we considered using <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/multi-model-endpoints.html\" target=\"_blank\" rel=\"noopener noreferrer\">multi-model endpoints<\/a>. Multi-model endpoints provide a scalable and cost-effective solution to deploy a large number of models, enabling you to host multiple models with a single inference container. This reduces hosting costs by improving endpoint utilization compared to using multiple small single-model endpoints that each serve a single customer. It also reduces deployment overhead because SageMaker manages loading models in memory and scaling them based on their traffic patterns.<\/p>\n<p>Another multi-model endpoint advantage is that when specific customers have a high inference rate, the framework preserves their most recently served models in memory for better performance.<\/p>\n<p>After we estimated costs using multi-model endpoints vs. 
standard endpoints, we found that they could reduce our costs by approximately 80%.<\/p>\n<h2>The outcome<\/h2>\n<p>In this section, we review the steps and the outcome of the process.<\/p>\n<p>We use the lifecycle notebook configuration to enable running the notebooks as processing jobs, by encapsulating the notebook in a Docker container in order to validate the code faster and use the autostop mechanism:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">#!\/bin\/bash\n\n# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"). You\n# may not use this file except in compliance with the License. A copy of\n# the License is located at\n#\n#     http:\/\/aws.amazon.com\/apache2.0\/\n#\n# or in the \"license\" file accompanying this file. This file is\n# distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF\n# ANY KIND, either express or implied. See the License for the specific\n# language governing permissions and limitations under the License.\n\nset -e\n\n# OVERVIEW\n# This script installs the sagemaker_run_notebook extension package in SageMaker Notebook Instance\n#\n# There are two parameters you need to set:\n# 1. S3_LOCATION is the place in S3 where you put the extension tarball\n# 2. TARBALL is the name of the tar file that you uploaded to S3. 
You should just need to check\n#    that you have the version right.\nsudo -u ec2-user -i &lt;&lt;'EOF'\n# PARAMETERS\nVERSION=0.18.0\nEXTENSION_NAME=sagemaker_run_notebook\n# Set up the user setting and workspace directories\nmkdir -p \/home\/ec2-user\/SageMaker\/.jupyter-user\/{workspaces,user-settings}\n# Run in the conda environment that the Jupyter server uses so that our changes are picked up\nsource \/home\/ec2-user\/anaconda3\/bin\/activate JupyterSystemEnv\n# Install the extension and rebuild JupyterLab so it picks up the new UI\naws s3 cp s3:\/\/aws-emr-resources-11111111-us-east-1\/infra-sagemaker\/sagemaker_run_notebook-0.18.0-Logz-latest.tar.gz .\/sagemaker_run_notebook-0.18.0-Logz-latest.tar.gz\npip install sagemaker_run_notebook-0.18.0-Logz-latest.tar.gz\n\njupyter lab build\nsource \/home\/ec2-user\/anaconda3\/bin\/deactivate\nEOF\n\n# Install the packages in every conda environment (running as root)\n# PARAMETERS\nfor PACKAGE in pandas dataplate awswrangler==2.0.0 ipynb==0.5.1 prison==0.1.3 PyMySQL==0.10.1 requests==2.25.0 scipy==1.5.4 dtaidistance joblib sagemaker_run_notebook-0.18.0-Logz-latest.tar.gz fuzzywuzzy==0.18.0; do\n  echo $PACKAGE\n\n  # Note that \"base\" is a special environment name; include it there as well.\n  for env in base \/home\/ec2-user\/anaconda3\/envs\/*; do\n      # Skip JupyterSystemEnv; the extension was already installed there above\n      if [ $(basename \"$env\") = 'JupyterSystemEnv' ]; then\n          continue\n      fi\n      source \/home\/ec2-user\/anaconda3\/bin\/activate $(basename \"$env\")\n      pip install --upgrade \"$PACKAGE\"\n      source \/home\/ec2-user\/anaconda3\/bin\/deactivate\n  done\ndone\njupyter lab build\n\n# Tell Jupyter to use the user-settings and workspaces directory on the EBS\n# volume.\necho \"export JUPYTERLAB_SETTINGS_DIR=\/home\/ec2-user\/SageMaker\/.jupyter-user\/user-settings\" &gt;&gt; \/etc\/profile.d\/jupyter-env.sh\necho \"export JUPYTERLAB_WORKSPACES_DIR=\/home\/ec2-user\/SageMaker\/.jupyter-user\/workspaces\" &gt;&gt; \/etc\/profile.d\/jupyter-env.sh\n\n# The Jupyter server needs to be restarted to pick 
up the server part of the\n# extension. This needs to be done as root.\ninitctl restart jupyter-server --no-wait\n\n# OVERVIEW\n# This script stops a SageMaker notebook once it's idle for more than 1 hour (the IDLE_TIME set below)\n# You can change the idle time using the variable below.\n# If you want the notebook to stop only if no browsers are open, remove the --ignore-connections flag\n#\n# Note that this script will fail if either of these conditions is not met:\n#   1. Ensure the Notebook Instance has internet connectivity to fetch the example config\n#   2. Ensure the Notebook Instance execution role has permissions for SageMaker:StopNotebookInstance to stop the notebook\n#       and SageMaker:DescribeNotebookInstance to describe the notebook.\n# PARAMETERS\nIDLE_TIME=3600\n\necho \"Fetching the autostop script\"\nwget https:\/\/raw.githubusercontent.com\/aws-samples\/amazon-sagemaker-notebook-instance-lifecycle-config-samples\/master\/scripts\/auto-stop-idle\/autostop.py\n\necho \"Starting the SageMaker autostop script in cron\"\n\n(crontab -l 2&gt;\/dev\/null; echo \"*\/5 * * * * \/usr\/bin\/python $PWD\/autostop.py --time $IDLE_TIME --ignore-connections\") | crontab -<\/code><\/pre>\n<\/p><\/div>\n<p>We clone the <a href=\"https:\/\/github.com\/aws-samples\/sagemaker-run-notebook\" target=\"_blank\" rel=\"noopener noreferrer\">sagemaker-run-notebook<\/a> GitHub project, and add the following to the container:<\/p>\n<ul>\n<li>Our pip requirements<\/li>\n<li>The ability to run notebooks from within a notebook, which enables multi-processing behavior that utilizes all the\u00a0ml.m5.12xlarge\u00a0instance cores<\/li>\n<\/ul>\n<p>This enables us to run workflows that consist of many notebooks as processing jobs with a single line of code, while defining the instance type to run on.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image006.png\"><img decoding=\"async\" loading=\"lazy\" 
class=\"alignnone size-full wp-image-32374\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image006.png\" alt=\"\" width=\"699\" height=\"84\"><\/a><\/p>\n<p>Because we can add parameters to the notebook, we can scale our processing by running simultaneously at different hours, days, or months to pull and process data.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image008.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32375\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image008.png\" alt=\"\" width=\"705\" height=\"388\"><\/a><\/p>\n<p>We can also create scheduling jobs that run notebooks (and even limit the run time).<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image010.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32376\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image010.png\" alt=\"\" width=\"704\" height=\"215\"><\/a><\/p>\n<p>We also can observe the last runs and their details, such as processing time.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image012.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32377\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image012.png\" alt=\"\" width=\"1407\" height=\"321\"><\/a><\/p>\n<p>With the papermill that is used in the container, we can view the output of every run, which helps us debug in production.<\/p>\n<p><a 
href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image014.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32378\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image014.png\" alt=\"\" width=\"704\" height=\"76\"><\/a><\/p>\n<p>Our notebook output review is in the form of a standard read-only notebook.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image016.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32379\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/ML-6266-image016.png\" alt=\"\" width=\"702\" height=\"334\"><\/a><\/p>\n<p>Multi-processing utilization helps us scale on each notebook processing and utilize all its cores. We generated functions in other notebooks that can do heavy processing, such as the following:<\/p>\n<ul>\n<li>Explode JSONs<\/li>\n<li>Find relevant rows in a DataFrame while the main notebook splits the DataFrame in <code>#cpu-cores<\/code> elements<\/li>\n<li>Run clustering per alert type actions simultaneously<\/li>\n<\/ul>\n<p>We then add these functional notebooks into the container that runs the notebook as a processing job. 
See the following Docker file (notice the COPY commands):<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">ARG BASE_IMAGE=need_an_image\nFROM $BASE_IMAGE\n\nENV JUPYTER_ENABLE_LAB yes\nENV PYTHONUNBUFFERED TRUE\n\nCOPY requirements.txt \/tmp\/requirements.txt\nRUN pip install papermill jupyter nteract-scrapbook boto3 requests==2.20.1\nRUN pip install -r \/tmp\/requirements.txt\n\nENV PYTHONUNBUFFERED=TRUE\nENV PATH=\"\/opt\/program:${PATH}\"\n\n# Set up the program in the image\nCOPY multiprocessDownloadNormalizeFunctions.ipynb \/tmp\/multiprocessDownloadNormalizeFunctions.ipynb\nCOPY multiprocessFunctions.ipynb \/tmp\/multiprocessFunctions.ipynb\nCOPY run_notebook execute.py \/opt\/program\/\nENTRYPOINT [\"\/bin\/bash\"]\n\n# because there is a bug where you have to be root to access the directories\nUSER root<\/code><\/pre>\n<\/p><\/div>\n<h2>Results<\/h2>\n<p>During the research phase, we evaluated the option of running our notebooks as is to evaluate how our code performs on all our relevant data, not just a sample of data. We found that encapsulating our notebooks using processing jobs is a great fit for us, because we don\u2019t need to rewrite code, we can utilize the power of AWS\u00a0compute-optimized and memory-optimized\u00a0instances, and we can follow the status of the process easily.<\/p>\n<p>During the inference assessment, we evaluated various SageMaker endpoint solutions. We found that using a multi-model endpoint can help us serve approximately 50 customers, each having multiple (approximately 10) models, on a single instance, which can meet our low-latency constraints and save us up to 80% of the cost.<\/p>\n<p>With this solution architecture, we were able to reduce the MTTR of our customers, which is a key metric for measuring success with our platform. 
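MTTR here follows the investigation semantics described earlier: the window from a user's first action on an alert to their last. A minimal sketch of that computation, over a hypothetical event schema:

```python
from datetime import datetime
from statistics import mean


def mttr_minutes(events):
    """Average minutes between a user's first and last action per investigation."""
    windows = {}
    for alert_id, ts in events:
        lo, hi = windows.get(alert_id, (ts, ts))
        windows[alert_id] = (min(lo, ts), max(hi, ts))
    return mean((hi - lo).total_seconds() / 60 for lo, hi in windows.values())


# Hypothetical event stream: (alert_id, action timestamp).
events = [
    ("alert-1", datetime(2022, 1, 26, 10, 0)),
    ("alert-1", datetime(2022, 1, 26, 10, 45)),
    ("alert-2", datetime(2022, 1, 26, 12, 0)),
    ("alert-2", datetime(2022, 1, 26, 12, 20)),
]
```

In production, our solution records these windows per alert, which lets us compare investigations with and without ML recommendations.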
It reduces the total time from the point a user responds to our alert link, which describes an issue in their systems, to when they\u2019re done investigating the problem using our platform. During the investigation phase, we measure the users\u2019 actions with and without our ML recommendation solution. This helps us provide recommendations for the best action to resolve the specific issue faster and pinpoint anomalies to identify the actual cause of the problem.<\/p>\n<h2>Conclusion and next steps<\/h2>\n<p>In this post, we shared how Logz.io used SageMaker to improve MTTD and MTTR.<\/p>\n<p>As a next step, we\u2019re considering expanding the solution with additional features.<\/p>\n<p>We encourage you to try out <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/nbi.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker notebooks<\/a>. For more examples, check out the <a href=\"https:\/\/github.com\/aws\/amazon-sagemaker-examples\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker examples GitHub repo<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/Amit-Gross.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-32370 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/20\/Amit-Gross.jpg\" alt=\"\" width=\"100\" height=\"104\"><\/a><strong>Amit Gross<\/strong> leads the Research department of Logz.io, which is responsible for the AI solutions of all Logz.io products, from the research phase to the integration phase. Prior to Logz.io, Amit managed both data science and security research groups at Here Inc. and Cellebrite Inc. 
Amit holds an M.Sc. in computer science from Tel Aviv University.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/Yaniv-Vaknin.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-29222 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/Yaniv-Vaknin.jpg\" alt=\"\" width=\"100\" height=\"142\"><\/a>Yaniv Vaknin<\/strong>\u00a0is a Machine Learning Specialist at Amazon Web Services. Prior to AWS, Yaniv held leadership positions with AI startups and enterprises, including as co-founder and CEO of Dipsee.ai. Yaniv works with AWS customers to harness the power of Machine Learning to solve real-world tasks and derive value. In his spare time, Yaniv enjoys playing soccer with his boys.<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-20347 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/04\/23\/Eitan-Sela.jpg\" alt=\"\" width=\"100\" height=\"133\">Eitan Sela<\/strong>\u00a0is a Machine Learning Specialist Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them build and operate machine learning solutions on AWS. 
In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.<\/p>\n<p>       <!-- '\"` -->\n      <\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/how-logz-io-accelerates-ml-recommendations-and-anomaly-detection-solutions-with-amazon-sagemaker\/<\/p>\n","protected":false},"author":0,"featured_media":1502,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1501"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1501"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1501\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1502"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1501"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1501"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1501"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}