{"id":1039,"date":"2021-10-15T08:38:26","date_gmt":"2021-10-15T08:38:26","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/10\/15\/how-imperva-expedites-ml-development-and-collaboration-via-amazon-sagemaker-notebooks\/"},"modified":"2021-10-15T08:38:26","modified_gmt":"2021-10-15T08:38:26","slug":"how-imperva-expedites-ml-development-and-collaboration-via-amazon-sagemaker-notebooks","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/10\/15\/how-imperva-expedites-ml-development-and-collaboration-via-amazon-sagemaker-notebooks\/","title":{"rendered":"How Imperva expedites ML development and collaboration via Amazon SageMaker notebooks"},"content":{"rendered":"<div id=\"\">\n<p><em>This is a guest post by Imperva, a solutions provider for cybersecurity.\u00a0<\/em><\/p>\n<p><a href=\"https:\/\/www.imperva.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Imperva<\/a> is a cybersecurity leader, headquartered in California, USA, whose mission is to protect data and all paths to it. In the last few years, we\u2019ve been working on integrating machine learning (ML) into our products. This includes detecting malicious activities in databases, automatically configuring security policies, and clustering security events into meaningful stories.<\/p>\n<p>As we\u2019re pushing to advance our detection capabilities, we\u2019re investing in ML models for our solutions. For example, Imperva provides an API Security service. This service aims to protect all APIs from various attacks, including attacks that traditional WAF can\u2019t easily stop, such as those described in the <a href=\"https:\/\/owasp.org\/www-project-api-security\/\" target=\"_blank\" rel=\"noopener noreferrer\">OWASP top 10<\/a>. 
This is a significant investment area for us, so we took steps to streamline our ML development process in order to cover more ground, research API attacks efficiently, and deliver value to our customers faster.<\/p>\n<p>In this post, we share how we expedited ML development and collaboration via <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> notebooks.<\/p>\n<h2>Jupyter Notebooks: The common research ground<\/h2>\n<p>Data science research has captured the attention of big tech companies and the development community like never before. It\u2019s now easier than ever to kick off a data-driven project using managed ML services. A great example of this is the rise of citizen data scientists, which according to <a href=\"https:\/\/blogs.gartner.com\/carlie-idoine\/2018\/05\/13\/citizen-data-scientists-and-why-they-matter\/\" target=\"_blank\" rel=\"noopener noreferrer\">Gartner<\/a> are \u201cpower users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise.\u201d<\/p>\n<p>With the expected growth of ML users, sharing experiments across teams becomes a critical factor in development velocity. Among the many common steps, one of the most important for data scientists kicking off a project is to open a new Jupyter notebook and dive into the challenge ahead.<\/p>\n<p>A Jupyter notebook is a cross between an IDE and a document. It provides the researcher with an easy, interactive way to test different approaches, plot the results, and present and export them, all while using a language and interface of their choice, such as Python, R, Spark, or Bash.<\/p>\n<p>Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly. 
SageMaker includes this exact capability and more as part of its SageMaker notebooks feature.<\/p>\n<p>Anyone who has tried to use Jupyter notebooks in a team has probably reached a point where they attempted to use a notebook belonging to someone else, only to find out it\u2019s not as easy as it sounds. Often, you just don\u2019t have access to the required notebook. On other occasions, notebooks are used locally for research, so the code is littered with hardcoded paths and isn\u2019t committed to any repository. Even if the code is committed to a repository of some sort, the data it requires (hopefully) isn\u2019t. To sum things up, collaborating on Jupyter notebooks isn\u2019t easy.<\/p>\n<p>In this post, we show you how we share data science research code at Imperva, and how we use SageMaker notebooks with additional features we\u2019ve added to support our custom requirements and enhance collaboration. We also share how all these efforts have led to a significant reduction in costs and time spent on housekeeping. Although this architecture is a good fit for us, you can choose different configurations, such as complete resource isolation with a separate file system for each user.<\/p>\n<h2>How we expedited our ML development<\/h2>\n<p>Our workflow is pretty standard: we take a subset of data, load it into a Jupyter notebook, and start exploring it. After we have a decent understanding of the data, we start experimenting and combining different algorithms until we come up with a promising initial solution. When we have a good enough proof of concept (POC), we proceed to validate the results over time, experimenting and adjusting the algorithm as we go. 
Eventually, when we reach a high level of confidence, <a href=\"https:\/\/www.youtube.com\/watch?v=0dUv-jCt2aw&amp;ab_channel=AmazonWebServices\" target=\"_blank\" rel=\"noopener noreferrer\">we deliver the model<\/a> and continue to validate the results.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/13\/ML-4841-image001-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-29294 size-full aligncenter\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/13\/ML-4841-image001-1.png\" alt=\"\" width=\"400\" height=\"189\"><\/a><\/p>\n<p>At first this process made perfect sense. We had small projects that didn\u2019t require much computing power, and we had enough time to work on them solo until we reached a POC. The projects were simple enough for us to deploy, serve, and monitor the model ourselves, or in other cases, deliver the model as a Docker container. When performance and scale were important, we would pass ownership of the model to a dev team using a specification document with pseudo-code. But times are changing, and as the team and projects grew, we needed a better way to do things. We had to scale our projects when massive computing resources were required, and find a better way to pass ownership without resorting to dull, extensive specification documents.<\/p>\n<p>Furthermore, when everyone is using some remote virtual machine or <a href=\"http:\/\/aws.amazon.com\/ec2\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Compute Cloud<\/a> (Amazon EC2) instance to run their Jupyter notebooks, their projects tend to lack documentation and get messy.<\/p>\n<h2>SageMaker notebooks<\/h2>\n<p>In comes SageMaker notebooks: a managed Jupyter notebook platform hosted on AWS, where you can easily create a notebook instance\u2014an EC2 instance (a virtual machine) that runs a Jupyter notebook server. 
Besides the notebook now being in the cloud and accessible from everywhere, you can easily rescale the notebook instance, giving it as many computing resources as you require.<\/p>\n<p>Having unlimited computing resources is great, but it wasn\u2019t why we decided to start using SageMaker notebooks. We can summarize the objectives we wanted to achieve into three main points:<\/p>\n<ul>\n<li><strong>Making research easier <\/strong>\u2013 Creating an easy, user-friendly work environment that can be quickly accessed and shared within the research team.<\/li>\n<li><strong>Organizing data and code<\/strong> \u2013 Cutting the mess by making it easier to access data and creating a structured way to keep code.<\/li>\n<li><strong>Delivering projects<\/strong> \u2013 Creating a better separation between the research playground and production, and finding a better way to share our ideas with development teams without using extensive, dull documents.<\/li>\n<\/ul>\n<h2>Easier research<\/h2>\n<p>SageMaker notebooks reside in the cloud, making them inherently accessible from almost anywhere. Starting a Jupyter notebook takes just a few minutes, and all your output from the previous run is saved, making it very simple to jump right back in. However, our research requirements included a few additional aspects that needed a solution:<\/p>\n<ul>\n<li><strong>Quick views<\/strong> \u2013 Having the notebooks available at all times in order to review results of previous runs. If the instance where you keep your code is down, you have to start it just to look at the output. This can be frustrating, especially if you\u2019re using an expensive instance and you just want to look at your results. Solving this cut the time each team member spent waiting for an instance to start from 5\u201315 minutes to 0.<\/li>\n<li><strong>Shared views<\/strong> \u2013 Having the ability to explore notebooks across instances. SageMaker notebook instances are provided with dedicated storage by default. 
We wanted to break this wall and enable the team to work together.<\/li>\n<li><strong>Persistent libraries<\/strong> \u2013 Libraries are stored only temporarily in SageMaker notebook instances. We wanted to change that and eliminate the roughly 5 minutes it takes to reinstall all the required libraries after each restart.<\/li>\n<li><strong>Cost-effective service<\/strong> \u2013 Optimizing costs while minimizing researchers\u2019 involvement. By default, turning an instance on and off is done manually, which can lead to unnecessary charges caused by human error.<\/li>\n<\/ul>\n<p>To bridge the gap between the default SageMaker configuration and what we were looking for, we used just two main ingredients: <a href=\"https:\/\/aws.amazon.com\/efs\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic File System<\/a> (Amazon EFS) and <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-notebook-instance-lifecycle-config-samples\" target=\"_blank\" rel=\"noopener noreferrer\">lifecycle configurations<\/a> in SageMaker. The first, as the name implies, is a file system; the second is a script that runs when the notebook instance is first created or started.<\/p>\n<h3>Shared and quick views<\/h3>\n<p>We connected an EFS file system to all our notebook instances so that they share storage. This way we can save our code in Amazon EFS, instead of on the notebook instance\u2019s own file system, and access it from any notebook instance.<\/p>\n<p>This made things easier because we can now create a small, read-only, super cheap notebook instance (for this post, let\u2019s call it the viewer instance) that always stays on, and use it to easily access code and results without needing to start the notebook instance that ran the code. 
Furthermore, we can now easily share code with each other because it\u2019s stored in a shared location instead of being scattered across multiple notebook instances.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/13\/ML-4841-image003-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-29295 size-full aligncenter\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/13\/ML-4841-image003-1.png\" alt=\"\" width=\"400\" height=\"191\"><\/a><\/p>\n<p>So, how do you actually connect a file system to a notebook instance?<\/p>\n<p>We created a lifecycle configuration that mounts an EFS file system on a notebook instance, and attached this configuration to every notebook instance we wanted to be part of the shared environment.<\/p>\n<p>In this section, we walk you through the lifecycle configuration script we wrote (or, to be more accurate, shamelessly stole from the examples provided by AWS and mashed together).<\/p>\n<p>The following script prefix is standard boilerplate:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">#!\/bin\/bash\n\nset -e<\/code><\/pre>\n<\/div>\n<p>Next, we mount the EFS file system (make sure you substitute your own EFS instance name and Region):<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">EFS_NAME=EFS_INSTANCE_NAME.efs.REGION.amazonaws.com\nmkdir -p \/home\/ec2-user\/SageMaker\/efs\nsudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport $EFS_NAME:\/ \/home\/ec2-user\/SageMaker\/efs\nsudo chmod go+rw \/home\/ec2-user\/SageMaker\/efs<\/code><\/pre>\n<\/p><\/div>\n<h3>Persistent and cost-effective service<\/h3>\n<p>After we connected the file system, we started thinking about working with notebooks. Because AWS charges for every hour the instance is running, we decided it would be good practice to automatically shut down the SageMaker notebook if it\u2019s idle for a while. 
We started with a default value of 1 hour, but by using the instance\u2019s tags, users can set any value that suits them from the SageMaker GUI. The default 1-hour behavior can be applied as a <em>global lifecycle configuration<\/em>, and overriding it can be done with a <em>local lifecycle configuration<\/em>. This policy effectively prevents researchers from accidentally leaving unused instances running, and it reduced the cost of our SageMaker instances by 25%.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/13\/ML-4841-image005-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-29296 size-full aligncenter\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/13\/ML-4841-image005-1.png\" alt=\"\" width=\"400\" height=\"158\"><\/a><\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># get the notebook instance ARN (used to look up its tags)\nNOTEBOOK_ARN=$(jq '.ResourceArn' \/opt\/ml\/metadata\/resource-metadata.json --raw-output)\n\n# extract idle time parameter value from tags list\nIDLE_TIME=$(aws sagemaker list-tags --resource-arn $NOTEBOOK_ARN | jq '.Tags[] | select(.Key==\"idle\") | .Value')\n\n# if no idle time is specified, default to one hour (3600 sec)\n[[ -z \"$IDLE_TIME\" ]] &amp;&amp; IDLE_TIME=3600\n\n# fetch the auto stop script from AWS samples repo\nwget https:\/\/raw.githubusercontent.com\/aws-samples\/amazon-sagemaker-notebook-instance-lifecycle-config-samples\/master\/scripts\/auto-stop-idle\/autostop.py\n\n# starting the SageMaker autostop script in cron\n(crontab -l 2&gt;\/dev\/null; echo \"*\/5 * * * * \/usr\/bin\/python $PWD\/autostop.py --time $IDLE_TIME --ignore-connections\") | crontab -\n\n# run the rest of the script as ec2-user (this heredoc is closed at the end of the install script below)\nsudo -u ec2-user -i &lt;&lt;'EOF'\nunset SUDO_UID<\/code><\/pre>\n<\/p><\/div>\n<p>So now the notebook is connected to Amazon EFS and automatically shuts down when idle. 
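Under the hood, the autostop script from the AWS samples polls the Jupyter server for active sessions and compares each session's last-activity timestamp against the idle threshold. The following is a simplified sketch of that decision logic (our own illustration, not the actual autostop.py code):

```python
# Simplified sketch of the idle-shutdown decision (illustrative only,
# not the actual autostop.py from the AWS samples repo).
from datetime import datetime, timedelta

def is_idle(sessions, now, idle_seconds=3600):
    """Return True when every Jupyter session has been inactive for longer
    than idle_seconds, i.e. the notebook instance is safe to stop."""
    cutoff = now - timedelta(seconds=idle_seconds)
    return all(
        datetime.strptime(s["last_activity"], "%Y-%m-%dT%H:%M:%S") <= cutoff
        for s in sessions
    )

now = datetime(2021, 10, 15, 12, 0, 0)
# last activity 90 minutes ago with a 1-hour threshold: stop the instance
print(is_idle([{"last_activity": "2021-10-15T10:30:00"}], now))  # True
```

In the real sample, the session list comes from the local Jupyter server's REST API, and the threshold comes from the `idle` tag read in the lifecycle script.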
But this raised another issue\u2014by default, Python libraries in SageMaker notebook instances are installed on ephemeral storage, meaning they get deleted when the instance is stopped and have to be reinstalled the next time the instance is started. This means we had to reinstall libraries at least once a day, which isn\u2019t the best experience and can take anywhere from a few seconds to a few minutes per package. We decided to add a script that changes this behavior and makes all library installations persistent by moving the Python library installation path to the notebook instance\u2019s persistent storage (<a href=\"http:\/\/aws.amazon.com\/ebs\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Block Store<\/a>), effectively eliminating any time wasted on reinstalling packages.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/13\/ML-4841-image007-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-29297 size-full aligncenter\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/13\/ML-4841-image007-1.png\" alt=\"\" width=\"400\" height=\"142\"><\/a><\/p>\n<p>This script runs every time the notebook instance starts; it installs Miniconda and some basic Python libraries in the persistent storage, and activates Miniconda:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># use a path on the notebook instance\u2019s persistent file system\nWORKING_DIR=\/home\/ec2-user\/SageMaker\/custom-miniconda\n# if this is the first time the lifecycle config is running - install miniconda\nif [ ! 
-d \"$WORKING_DIR\" ]; then\n    mkdir -p \"$WORKING_DIR\"\n    # download miniconda\n    wget https:\/\/repo.anaconda.com\/miniconda\/Miniconda3-latest-Linux-x86_64.sh -O \"$WORKING_DIR\/miniconda.sh\"\n    # install miniconda\n    bash \"$WORKING_DIR\/miniconda.sh\" -b -u -p \"$WORKING_DIR\/miniconda\"\n    # delete miniconda installer\n    rm -rf \"$WORKING_DIR\/miniconda.sh\"\n    # create a custom conda environment\n    source \"$WORKING_DIR\/miniconda\/bin\/activate\"\n    KERNEL_NAME=\"custom_python\"\n    PYTHON=\"3.9\"\n    conda create --yes --name \"$KERNEL_NAME\" python=\"$PYTHON\"\n    conda activate \"$KERNEL_NAME\"\n    pip install --quiet ipykernel\n    \n    conda install --yes numpy\n    pip install --quiet boto3 pandas matplotlib sklearn dill\n    EOF\nfi\n# activate miniconda\nsource \"$WORKING_DIR\/miniconda\/bin\/activate\"\nfor env in $WORKING_DIR\/miniconda\/envs\/*; do\n    BASENAME=$(basename \"$env\")\n    source activate \"$BASENAME\"\n    python -m ipykernel install --user --name \"$BASENAME\" --display-name \"Custom ($BASENAME)\"\nDone\n\n# disable SageMaker-provided Conda functionality, leaving in only what we've installed\necho \"c.EnvironmentKernelSpecManager.use_conda_directly = False\" &gt;&gt; \/home\/ec2-user\/.jupyter\/jupyter_notebook_config.py\nrm \/home\/ec2-user\/.condarc\nEOF<\/code><\/pre>\n<\/p><\/div>\n<p>Quick restart and we\u2019re done!<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\"># restart the Jupyter server\nrestart jupyter-server<\/code><\/pre>\n<\/p><\/div>\n<h2>Data and code organization<\/h2>\n<p>Remember the EFS that we just talked about? It\u2019s here for more.<\/p>\n<p>After storing all our code in the same location, we thought it might be better to organize it a bit.<\/p>\n<p>We decided that each team member should create their own notebook instance that only they use. 
However, instead of using the instance\u2019s file system, we use Amazon EFS and implement the following hierarchy:<\/p>\n<pre>Team member\n    Project\n        code\n        resources<\/pre>\n<p>This way we can all easily access each other\u2019s code, but we still know what belongs to whom.<\/p>\n<p>But what about completed projects? We decided to add an additional branch for projects that have been fully documented and delivered:<\/p>\n<pre>Team member\n    Project\n        code\n        resources\nCompleted projects\n    Project\n        code\n        resources<\/pre>\n<p>So now that our code is organized neatly, how do we access our data?<\/p>\n<p>We keep our data in <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) and access it via <a href=\"http:\/\/aws.amazon.com\/athena\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Athena<\/a>. It was very easy to set up a role for our notebook instances with permissions to access Athena and Amazon S3. This way, with just a few lines of code and without messing around with credentials, we can query Athena and pull data to work on.<\/p>\n<p>On top of that, we created a dedicated network using <a href=\"http:\/\/aws.amazon.com\/vpc\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Virtual Private Cloud<\/a> (Amazon VPC), which gives the notebook instances access to our internal Git repository and private PyPI repository. 
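To show what "a few lines of code" means here, the sketch below runs an Athena query from a notebook with boto3, relying on the instance role for credentials. The database, table, and bucket names are hypothetical; this is an illustration of the pattern, not our production code.

```python
import time

def run_athena_query(sql, database, output_s3, client=None):
    """Start an Athena query, poll until it finishes, and return result rows."""
    if client is None:
        # The notebook instance's IAM role supplies credentials,
        # so no access keys are needed here.
        import boto3
        client = boto3.client("athena")
    qid = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]
    while True:
        state = client.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)  # Athena queries are asynchronous; poll until done
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query finished in state {state}")
    return client.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

# Example call (hypothetical names):
# rows = run_athena_query(
#     "SELECT * FROM api_events LIMIT 100",
#     database="security_research",
#     output_s3="s3://our-athena-results/notebooks/",
# )
```

Beyond data access, the dedicated VPC described above also connects the instances to our internal repositories.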
This made it easy to access useful internal code and packages. The following diagram shows how it all looks in our notebooks platform.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/ML-4841-image009.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29209\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/ML-4841-image009.png\" alt=\"\" width=\"866\" height=\"436\"><\/a><\/p>\n<h2>Delivery<\/h2>\n<p>Finally, how do we use these notebooks to deliver projects?<\/p>\n<p>One of the great things about Jupyter notebooks is that, in addition to writing code and displaying the output, you can easily add text and headlines, thereby creating an interactive document.<\/p>\n<p>In the next few lines, we describe our delivery process in two cases: when we hand over the model to a dev team, and when we deploy the model ourselves.<\/p>\n<p>On projects where scale, performance, and reliability are a high priority, we hand the model over to be rewritten by a dev team. After we reach a mature POC, we share the notebook with the developers assigned to the project using the previously mentioned read-only notebook instance.<\/p>\n<p>The developers can now read the document, see the input and output for each block of code, and gain a better understanding of how it works and why, which makes it easier for them to implement. In the past, we had to write a specification document for these types of cases, which basically meant rewriting the code as pseudo code with lots of comments and explanations. Now we simply integrate our comments and explanations into the SageMaker notebook, which saves many days of work on each project.<\/p>\n<p>On projects that don\u2019t require a dev team to rewrite the code, we repackage the code in a Docker container and deploy it to a Kubernetes cluster. 
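To make this concrete, here is a minimal sketch of what the entry point inside such a container might look like: load the model serialized during research once at startup, then score incoming records. All names here are illustrative (this is a stand-in, not our production service), and we simulate the serialized model in memory rather than reading the file baked into the image.

```python
import pickle
from io import BytesIO

class AnomalyThresholdModel:
    """Stand-in for a model object produced in the research notebook."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, score):
        # flag the record as anomalous if its score exceeds the threshold
        return int(score > self.threshold)

# In the real container, the serialized model would be baked into the
# Docker image during hand-off; here we simulate that file in memory.
buffer = BytesIO()
pickle.dump(AnomalyThresholdModel(threshold=0.8), buffer)
buffer.seek(0)
model = pickle.load(buffer)  # loaded once at service startup

def score_record(score):
    """A production wrapper would add input validation, logging, and metrics."""
    return model.predict(score)

print(score_record(0.95))  # 1 (anomalous)
print(score_record(0.10))  # 0 (benign)
```

The service layer around `score_record` (HTTP endpoint, health checks, and so on) is what the Kubernetes deployment adds on top.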
Although it might seem like a hassle to transform code from a notebook into a Dockerized, standard Python project, this process has its own benefits:<\/p>\n<ul>\n<li><strong>Explainability and visibility<\/strong> \u2013 Instead of explaining what your algorithm does by digging through a messy project, you can just use the notebook you worked on during the research phase.<\/li>\n<li><strong>Purpose separation<\/strong> \u2013 The research code lives in the notebook, and the production code lives in the Python project. You can keep researching without touching the production code, and only update it when you\u2019ve had a breakthrough.<\/li>\n<li><strong>Debuggability<\/strong> \u2013 If your model runs into trouble, you can easily debug it in the notebook.<\/li>\n<\/ul>\n<h2>What\u2019s next<\/h2>\n<p>Jupyter notebooks provide a great playground for data scientists. On a small scale, they\u2019re very convenient to use on your local machine. However, when you start working on larger projects in larger teams, there are many advantages to moving to a managed Jupyter notebook server. 
The great thing about SageMaker notebooks is that you can customize your notebook instances (instance size, code sharing, automation scripts, kernel selection, and more), which helps you save a tremendous amount of time and money.<\/p>\n<p>Simply put, we created a process that expedites ML development and collaboration while reducing the cost of SageMaker notebooks by at least 25% and cutting the overhead time researchers spend on installations and on waiting for instances to be ready.<\/p>\n<p>Our current SageMaker notebooks environment contains the following:<\/p>\n<ul>\n<li>Managed Jupyter notebook instances<\/li>\n<li>Separate, customizable computing instances for each user<\/li>\n<li>A shared file system used to organize projects and easily share code with peers<\/li>\n<li>Lifecycle configurations that reduce costs and make it easier to start working<\/li>\n<li>Connections to data sources, code repositories, and package indexes<\/li>\n<\/ul>\n<p>We plan on making this environment even better by adding a few additional features:<\/p>\n<ul>\n<li><strong>Cost monitoring<\/strong> \u2013 To monitor our budget, we\u2019ll add a special tag to each instance in order to track its cost.<\/li>\n<li><strong>Auto save state<\/strong> \u2013 We\u2019ll create a lifecycle configuration that automatically saves a notebook\u2019s state, allowing users to easily restore it even after the instance has been shut down.<\/li>\n<li><strong>Restricted permissions system<\/strong> \u2013 We want to enable users from different groups to participate in our research and explore our data by letting them create notebook instances and access our data, but under predefined restrictions. 
For example, they\u2019ll only be able to create small, inexpensive notebook instances and access only part of the data.<\/li>\n<\/ul>\n<p>As a next step, we encourage you to try out <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/nbi.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker notebooks<\/a>. For more examples, check out the <a href=\"https:\/\/github.com\/aws\/amazon-sagemaker-examples\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker examples GitHub repo<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong> <a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/Matan-Lion.jpeg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-29221 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/Matan-Lion.jpeg\" alt=\"\" width=\"100\" height=\"159\"><\/a> Matan Lion<\/strong> is the Data Science team leader at Imperva\u2019s Threat Research Group. His team is responsible for delivering data-driven solutions and cybersecurity innovation across the company\u2019s product portfolio, including the application and data security frontlines, leveraging big data and machine learning.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/Jonathan-Azaria.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-29220 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/Jonathan-Azaria.png\" alt=\"\" width=\"100\" height=\"110\"><\/a>Johnathan Azaria<\/strong> is a Data Scientist and a member of Imperva Research Labs, a premier research organization for security analysis, vulnerability discovery, and compliance expertise. Prior to the data science role, Johnathan was a security researcher specializing in network- and application-based attacks. 
Johnathan holds a B.Sc. and an M.Sc. in Bioinformatics from Bar Ilan University.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/Yaniv-Vaknin.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-29222 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/11\/Yaniv-Vaknin.jpg\" alt=\"\" width=\"100\" height=\"142\"><\/a>Yaniv Vaknin<\/strong> is a Machine Learning Specialist at Amazon Web Services. Prior to AWS, Yaniv held leadership positions with AI startups and enterprises, including co-founder and CEO of Dipsee.ai. Yaniv works with AWS customers to harness the power of machine learning to solve real-world tasks and derive value. In his spare time, Yaniv enjoys playing soccer with his boys.<\/p>\n      
<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/how-imperva-expedites-ml-development-and-collaboration-via-amazon-sagemaker-notebooks\/<\/p>\n","protected":false},"author":0,"featured_media":1040,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1039"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1039"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1039\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1040"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1039"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1039"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1039"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}