{"id":1521,"date":"2022-02-02T18:57:22","date_gmt":"2022-02-02T18:57:22","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/02\/02\/launch-processing-jobs-with-a-few-clicks-using-amazon-sagemaker-data-wrangler\/"},"modified":"2022-02-02T18:57:22","modified_gmt":"2022-02-02T18:57:22","slug":"launch-processing-jobs-with-a-few-clicks-using-amazon-sagemaker-data-wrangler","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/02\/02\/launch-processing-jobs-with-a-few-clicks-using-amazon-sagemaker-data-wrangler\/","title":{"rendered":"Launch processing jobs with a few clicks using Amazon SageMaker Data Wrangler"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/data-wrangler-getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Data Wrangler<\/a> makes it faster for data scientists and engineers to prepare data for machine learning (ML) applications by using a visual interface. Previously, when you created a Data Wrangler data flow, you could choose different export options to easily integrate that data flow into your data processing pipeline. Data Wrangler offers export options to <a href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3), <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/pipelines-sdk.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker Pipelines<\/a>, and <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/feature-store-getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker Feature Store<\/a>, or as Python code. 
The export options create a Jupyter notebook and require you to run the code to start a processing job facilitated by <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/processing-job.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker Processing<\/a>.<\/p>\n<p>We\u2019re excited to announce the general release of destination nodes and the Create Job feature in Data Wrangler. This feature gives you the ability to export all the transformations that you made to a dataset to a destination node with just a few clicks. This allows you to create data processing jobs and export to Amazon S3 purely via the visual interface without having to generate, run, or manage Jupyter notebooks, thereby enhancing the low-code experience. To demonstrate this new feature, we use the <a href=\"https:\/\/www.openml.org\/d\/40945\" target=\"_blank\" rel=\"noopener noreferrer\">Titanic dataset<\/a> and show how to export your transformations to a destination node.<\/p>\n<h2>Prerequisites<\/h2>\n<p>Before we learn how to use destination nodes with Data Wrangler, you should already understand how to <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/data-wrangler-getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">access and get started with Data Wrangler<\/a>. You also need to know what a <em>data flow<\/em> is in the context of Data Wrangler and how to create one by importing your data from the different data sources that Data Wrangler supports.<\/p>\n<h2>Solution overview<\/h2>\n<p>Consider the following data flow named <code>example-titanic.flow<\/code>:<\/p>\n<ul>\n<li>It imports the Titanic dataset three times. 
You can see these different imports as separate branches in the data flow.<\/li>\n<li>For each branch, it applies a set of transformations and visualizations.<\/li>\n<li>It joins the branches into a single node with all the transformations and visualizations.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image001.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32547\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image001.png\" alt=\"\" width=\"1276\" height=\"753\"><\/a><\/li>\n<\/ul>\n<p>With this flow, you might want to process and save parts of your data to a specific branch or location.<\/p>\n<p>In the following steps, we demonstrate how to create destination nodes, export them to Amazon S3, and create and launch a processing job.<\/p>\n<h2>Create a destination node<\/h2>\n<p>You can use the following procedure to create destination nodes and export them to an S3 bucket:<\/p>\n<ol>\n<li>Determine what parts of the flow file (transformations) you want to save.<\/li>\n<li>Choose the plus sign next to the nodes that represent the transformations that you want to export. 
(If it\u2019s a collapsed node, you must select the options icon (three dots) for the node).<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image003.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32548\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image003.png\" alt=\"\" width=\"1276\" height=\"911\"><\/a><\/li>\n<li>Hover over <strong>Add destination<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image005.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32549\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image005.png\" alt=\"\" width=\"1276\" height=\"749\"><\/a><\/li>\n<li>Choose <strong>Amazon S3<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image007.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32550\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image007.png\" alt=\"\" width=\"1276\" height=\"751\"><\/a><\/li>\n<li>Specify the fields as shown in the following screenshot.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image009.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32551\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image009.png\" alt=\"\" width=\"1430\" height=\"839\"><\/a><\/li>\n<li>For the second join node, follow the same steps to add Amazon S3 as a destination and specify the fields.<br \/><a 
href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-32546 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-.png\" alt=\"\" width=\"1276\" height=\"671\"><\/a><\/li>\n<\/ol>\n<p>You can repeat these steps as many times as you need for as many nodes you want in your data flow. Later on, you pick which destination nodes to include in your processing job.<\/p>\n<h2>Launch a processing job<\/h2>\n<p>Use the following procedure to create a processing job and choose the destination node where you want to export to:<\/p>\n<ol>\n<li>On the <strong>Data Flow<\/strong> tab, choose <strong>Create job<\/strong>.<\/li>\n<li>For <strong>Job name<\/strong>\u00b8 enter the name of the export job.<\/li>\n<li>Select the destination nodes you want to export.<\/li>\n<li>Optionally, specify the <a href=\"http:\/\/aws.amazon.com\/kms\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Key Management Service<\/a> (AWS KMS) key ARN.<\/li>\n<\/ol>\n<p>The KMS key is a cryptographic key that you can use to protect your data. For more information about KMS keys, see the <a href=\"https:\/\/docs.aws.amazon.com\/kms\/latest\/developerguide\/overview.html\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Key Developer Guide<\/a>.<\/p>\n<ol start=\"5\">\n<li>Choose <strong>Next, 2. 
Configure job<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image013.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32553\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image013.png\" alt=\"\" width=\"1431\" height=\"825\"><\/a><\/li>\n<li>Optionally, you can configure the job as per your needs by changing the instance type or count, or adding any tags to associate with the job.<\/li>\n<li>Choose <strong>Run<\/strong> to run the job.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image015.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32554\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image015.png\" alt=\"\" width=\"1276\" height=\"944\"><\/a><\/li>\n<\/ol>\n<p>A success message appears when the job is successfully created.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image017.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32555\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image017.png\" alt=\"\" width=\"1276\" height=\"913\"><\/a><\/p>\n<h2>View the final data<\/h2>\n<p>Finally, you can use the following steps to view the exported data:<\/p>\n<ol>\n<li>After you create the job, choose the provided link.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image019.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32556\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image019.png\" alt=\"\" width=\"1276\" height=\"668\"><\/a><\/li>\n<\/ol>\n<p>A new tab opens showing the processing job on the SageMaker console.<\/p>\n<ol start=\"2\">\n<li>When the job is complete, review the exported data on the Amazon S3 console.<\/li>\n<\/ol>\n<p>You should see a new folder with the job name you chose.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image021.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32557\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image021.png\" alt=\"\" width=\"1276\" height=\"432\"><\/a><\/p>\n<ol start=\"3\">\n<li>Choose the job name to view a CSV file (or multiple files) with the final data.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image023.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32558\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image023.png\" alt=\"\" width=\"1276\" height=\"401\"><\/a><\/li>\n<\/ol>\n<h2>FAQ<\/h2>\n<p>In this section, we address a few frequently asked questions about this new feature:<\/p>\n<ul>\n<li><strong>What happened to the Export tab? <\/strong>With this new feature, we removed the <strong>Export<\/strong> tab from Data Wrangler. 
You can still export from any node you created in the data flow via a Data Wrangler generated Jupyter notebook with the following steps:<\/li>\n<\/ul>\n<ol>\n<li>\n<ol>\n<li>Choose the plus sign next to the node that you want to export.<\/li>\n<li>Choose <strong>Export to<\/strong>.<\/li>\n<li>Choose <strong>Amazon S3 (via Jupyter Notebook)<\/strong>.<\/li>\n<li>Run the Jupyter notebook.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image025.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32559\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/ML-7976-image025.png\" alt=\"\" width=\"1276\" height=\"715\"><\/a><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<ul>\n<li><strong>How many destination nodes can I include in a job? <\/strong>There is a maximum of 10 destinations per processing job.<\/li>\n<li><strong>How many destination nodes can I have in a flow file? <\/strong>You can have as many destination nodes as you want.<\/li>\n<li><strong>Can I add transformations after my destination nodes? <\/strong>No. Destination nodes are terminal nodes, so no further steps can follow them.<\/li>\n<li><strong>What destinations can I use with destination nodes? <\/strong>As of this writing, Amazon S3 is the only supported destination. Support for more destination types will be added in the future. Please reach out if there is a specific one you would like to see.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this post, we demonstrated how to use the newly launched destination nodes to create processing jobs and save your transformed datasets directly to Amazon S3 via the Data Wrangler visual interface. 
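<\/p>\n<p>As a rough illustration (not part of the original walkthrough), the job that Data Wrangler launches is a standard SageMaker processing job, so you can also monitor it and locate its output programmatically with boto3 instead of the console link. The job name and bucket below are hypothetical placeholders.<\/p>

```python
# Sketch only: "titanic-export-job" and "my-example-bucket" are hypothetical
# names; the boto3 call requires AWS credentials and is therefore wrapped in
# a function rather than run at import time.

def job_output_prefix(bucket: str, job_name: str) -> str:
    """Return the S3 prefix where the job writes its results; Data Wrangler
    creates a folder named after the job, as shown in the screenshots."""
    return f"s3://{bucket}/{job_name}/"

def job_status(job_name: str) -> str:
    """Look up the processing job's status on SageMaker."""
    import boto3  # imported here so the sketch loads without boto3 installed
    sm = boto3.client("sagemaker")
    desc = sm.describe_processing_job(ProcessingJobName=job_name)
    return desc["ProcessingJobStatus"]  # e.g. InProgress, Completed, Failed

print(job_output_prefix("my-example-bucket", "titanic-export-job"))
# s3://my-example-bucket/titanic-export-job/
```

<p>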
With this additional feature, we have enhanced the tool-driven low-code experience of Data Wrangler.<\/p>\n<p>As next steps, we recommend you try the example demonstrated in this post. If you have any questions or want to learn more, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/data-wrangler-data-export.html\" target=\"_blank\" rel=\"noopener noreferrer\">Export<\/a> or leave a question in the comment section.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong> <a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/Alfonso-Austin-Rivera.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-32543 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/Alfonso-Austin-Rivera.png\" alt=\"\" width=\"100\" height=\"128\"><\/a>Alfonso Austin-Rivera<\/strong> is a Front End Engineer at Amazon SageMaker Data Wrangler. He is passionate about building intuitive user experiences that spark joy. In his spare time, you can find him fighting gravity at a rock-climbing gym or outside flying his drone.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/Parsa-Shahbodaghi.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-32560 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/Parsa-Shahbodaghi.jpg\" alt=\"\" width=\"100\" height=\"122\"><\/a>Parsa Shahbodaghi<\/strong> is a Technical Writer in AWS specializing in machine learning and artificial intelligence. He writes the technical documentation for Amazon SageMaker Data Wrangler and Amazon SageMaker Feature Store. In his free time, he enjoys meditating, listening to audiobooks, weightlifting, and watching stand-up comedy. 
He will never be a stand-up comedian, but at least his mom thinks he\u2019s funny.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/Balaji-Tummala.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-32545 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/Balaji-Tummala.png\" alt=\"\" width=\"100\" height=\"111\"><\/a>Balaji Tummala<\/strong> is a Software Development Engineer at Amazon SageMaker. He helps support Amazon SageMaker Data Wrangler and is passionate about building performant and scalable software. Outside of work, he enjoys reading fiction and playing volleyball.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/Arunprasath-Shankar.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-32544 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/01\/31\/Arunprasath-Shankar.jpg\" alt=\"\" width=\"100\" height=\"124\"><\/a>Arunprasath Shankar<\/strong> is an Artificial Intelligence and Machine Learning (AI\/ML) Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. 
In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.<\/p>\n<p>       <!-- '\"` -->\n      <\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/launch-processing-jobs-with-a-few-clicks-using-amazon-sagemaker-data-wrangler\/<\/p>\n","protected":false},"author":0,"featured_media":1522,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1521"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1521"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1521\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1522"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1521"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1521"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1521"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}