{"id":1083,"date":"2021-10-27T08:42:36","date_gmt":"2021-10-27T08:42:36","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/10\/27\/train-models-faster-with-an-automated-data-profiler-for-amazon-fraud-detector\/"},"modified":"2021-10-27T08:42:36","modified_gmt":"2021-10-27T08:42:36","slug":"train-models-faster-with-an-automated-data-profiler-for-amazon-fraud-detector","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/10\/27\/train-models-faster-with-an-automated-data-profiler-for-amazon-fraud-detector\/","title":{"rendered":"Train models faster with an automated data profiler for Amazon Fraud Detector"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/aws.amazon.com\/fraud-detector\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Fraud Detector<\/a> is a fully managed service that makes it easy to identify potentially fraudulent online activities, such as the creation of fake accounts or online payment fraud. Amazon Fraud Detector uses machine learning (ML) under the hood and is based on over 20 years of fraud detection expertise from Amazon. It automatically identifies potentially fraudulent activity in milliseconds\u2014with no ML expertise required.<\/p>\n<p>To train a model in Amazon Fraud Detector, you need to supply a historical dataset. Amazon Fraud Detector doesn\u2019t require any data science knowledge to use; however, it does have <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/online-fraud-insights.html#preparing-training-data\" target=\"_blank\" rel=\"noopener noreferrer\">certain requirements<\/a> for data quality and format that help ensure the robustness of the ML models. Simple formatting and validation issues can cause model training to fail, which costs extra time and effort to re-prepare the data and retrain the model. 
In addition, Amazon Fraud Detector requires you to define a <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/create-a-variable.html\" target=\"_blank\" rel=\"noopener noreferrer\">variable type<\/a> for each variable in the dataset during model creation. Suggested variable types based on your data statistics can make this step easier.<\/p>\n<p>In this post, we present an automated data profiler for Amazon Fraud Detector. It generates an intuitive and comprehensive report of your dataset, which includes suggested Amazon Fraud Detector variable types for each variable and data quality issues that could cause model training to fail or degrade model performance. The data profiler also provides an option to reformat and transform the dataset to meet Amazon Fraud Detector requirements, which helps you avoid common validation errors during model training. The automated data profiler is deployed as an <a href=\"http:\/\/aws.amazon.com\/cloudformation\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a> stack, which you can launch with a few clicks, and it doesn\u2019t require any data science or programming knowledge.<\/p>\n<h2>Overview of solution<\/h2>\n<p>The following diagram illustrates the architecture of the automated data profiler, which uses <a href=\"https:\/\/aws.amazon.com\/glue\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Glue<\/a>, <a href=\"http:\/\/aws.amazon.com\/lambda\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a>, <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3), and AWS CloudFormation.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image001.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29887 size-full\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image001.png\" alt=\"\" width=\"2098\" height=\"1052\"><\/a><\/p>\n<p>You can launch the data profiler with the AWS CloudFormation quick launch feature. The stack creates and invokes a Lambda function, which in turn starts an AWS Glue job. The AWS Glue job reads your CSV data file, profiles and reformats your data, and saves the HTML report and a formatted copy of the CSV file to an S3 bucket.<\/p>\n<p>The following screenshot shows a sample profiling report. You can also view the <a href=\"https:\/\/github.com\/aws-samples\/aws-fraud-detector-samples\/blob\/master\/profiler\/CloudFormationSolution\/report_BlogExample.html\" target=\"_blank\" rel=\"noopener noreferrer\">full sample report<\/a>.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image003.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29888 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image003.png\" alt=\"\" width=\"1428\" height=\"1790\"><\/a><\/p>\n<p>The sample report, synthetic dataset, and code for the automated data profiler are available on <a href=\"https:\/\/github.com\/aws-samples\/aws-fraud-detector-samples\/tree\/master\/profiler\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub<\/a>.<\/p>\n<h2>Launch the data profiler<\/h2>\n<p>Follow these steps to launch the profiler:<\/p>\n<ol>\n<li>Choose the following AWS CloudFormation <a href=\"https:\/\/us-west-2.console.aws.amazon.com\/cloudformation\/home?region=us-west-2#\/stacks\/create\/review?templateURL=https:\/\/amazon-frauddetector-cfn-templates.s3.amazonaws.com\/AFD_Data_Cleaner\/afd_data_analyzer_cfn_template.yaml\" target=\"_blank\" rel=\"noopener noreferrer\">quick launch link<\/a>.<br \/><a 
href=\"https:\/\/us-west-2.console.aws.amazon.com\/cloudformation\/home?region=us-west-2#\/stacks\/create\/review?templateURL=https:\/\/amazon-frauddetector-cfn-templates.s3.amazonaws.com\/AFD_Data_Cleaner\/afd_data_analyzer_cfn_template.yaml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15948 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/16\/2-LaunchStack.jpg\" alt=\"\" width=\"107\" height=\"20\"><\/a><\/li>\n<\/ol>\n<p>This opens an AWS CloudFormation quick launch page.<\/p>\n<ol start=\"2\">\n<li>Choose your Region to create all the resources in that Region.<\/li>\n<li>For <strong>CSVFilePath<\/strong>, enter the S3 path to your CSV file.<\/li>\n<\/ol>\n<p>The output profiling report and formatted CSV file are saved in the same bucket.<\/p>\n<ol start=\"4\">\n<li>For <strong>EventTimestampColumn<\/strong>, enter the header name of the event timestamp column.<\/li>\n<\/ol>\n<p>Amazon Fraud Detector requires this column. The data formatter converts this header name to <code>EVENT_TIMESTAMP<\/code>.<\/p>\n<ol start=\"5\">\n<li>For <strong>LabelColumn<\/strong>, enter the header name of the label column.<\/li>\n<\/ol>\n<p>Amazon Fraud Detector requires this column. The data formatter converts this header name to <code>EVENT_LABEL<\/code>.<\/p>\n<ol start=\"6\">\n<li>For <strong>FileDelimiter<\/strong>, enter the delimiter of your CSV file (by default, this is a comma).<\/li>\n<li>For <strong>FormatCSV<\/strong>, choose whether you want to convert the CSV file to the format required by Amazon Fraud Detector (by default, this is <strong>Yes<\/strong>).<\/li>\n<\/ol>\n<p>This option transforms header names, timestamps, and labels into the required formats. 
The formatted copy of your CSV data is saved in the same bucket as the input CSV.<\/p>\n<ol start=\"8\">\n<li>For <strong>DropTimestampMissingRows<\/strong>, choose whether you want to drop rows with missing timestamps from the formatted copy of the CSV.<\/li>\n<\/ol>\n<p>Amazon Fraud Detector doesn\u2019t use events with a missing timestamp, and these events may cause validation errors, so we suggest setting this to <strong>Yes<\/strong>.<\/p>\n<ol start=\"9\">\n<li>For <strong>DropLabelMissingRows<\/strong>, choose whether you want to drop rows with missing labels.<\/li>\n<li>For <strong>ProfileCSV<\/strong>, choose whether you want to profile the CSV file (by default, this is <strong>Yes<\/strong>).<\/li>\n<\/ol>\n<p>This generates a profiling report of your CSV data and saves it in the same bucket as the input CSV.<\/p>\n<ol start=\"11\">\n<li>For <strong>ReportSuffix (Optional)<\/strong>, specify a suffix for the report (the report is named <code>report_<span>&lt;ReportSuffix&gt;<\/span>.html<\/code>).<\/li>\n<li>For <strong>FeatureCorr<\/strong>, choose whether you want to show pair-wise feature correlation in the profiling report.<\/li>\n<\/ol>\n<p>For each pair of features, the correlation shows how much one feature depends on the other. Computing pair-wise feature correlation takes an additional 10\u201320 minutes, so this option is set to <strong>No<\/strong> by default.<\/p>\n<ol start=\"13\">\n<li>For <strong>FraudLabels (Optional)<\/strong>, specify which label values should be considered fraud.<\/li>\n<\/ol>\n<p>The report shows the distribution of the mapped labels, namely fraud and non-fraud. You can specify multiple label values by separating them with commas, for example, <code>suspicious,fraud<\/code>. 
If you leave this option blank, the report shows the distribution of the original label values.<\/p>\n<p>The following example plots show the label distribution with <code>FraudLabels=suspicious,fraud<\/code> (left) and with an empty <code>FraudLabels<\/code> value (right).<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image006-new.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29889 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image006-new.png\" alt=\"\" width=\"1127\" height=\"395\"><\/a><\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image007.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29783\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image007.png\" alt=\"\" width=\"1598\" height=\"1880\"><\/a><\/p>\n<p>After you choose <strong>Create stack<\/strong>, wait a few minutes for the following resources to be created:<\/p>\n<ul>\n<li><strong>DataAnalyzerGlueJob<\/strong> \u2013 The AWS Glue job that profiles and formats your data.<\/li>\n<li><strong>AWSGlueJobRole<\/strong> \u2013 The <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role for the AWS Glue job, with the <code>AWSGlueServiceRole<\/code> and <code>AWSGlueConsoleFullAccess<\/code> policies attached. 
It also has a customer managed policy with permissions to read and write files in the bucket defined in <code>CSVFilePath<\/code>.<\/li>\n<li><strong>S3CustomResource<\/strong> and <strong>AWSLambdaFunction<\/strong> \u2013 The AWS CloudFormation resource and the helper Lambda function that trigger the AWS Glue job.<\/li>\n<li><strong>AWSLambdaExecutionRole<\/strong> \u2013 The IAM role that allows the Lambda function to trigger the AWS Glue job, with the <code>AWSGlueServiceNotebookRole<\/code>, <code>AWSGlueServiceRole<\/code>, and <code>AWSLambdaExecute<\/code> policies attached.<\/li>\n<\/ul>\n<ol start=\"14\">\n<li>When the AWS Glue job is complete (typically a few minutes after stack creation), open the output S3 bucket.<\/li>\n<\/ol>\n<p>If your input file S3 path is <code>s3:\/\/my_bucket\/my_file.csv<\/code>, the output files are saved in the folder <code>s3:\/\/my_bucket\/afd_data_my_file<\/code>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image012.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29890 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image012.png\" alt=\"\" width=\"1430\" height=\"457\"><\/a><\/p>\n<h2>Examine the data profiler report<\/h2>\n<p>The data profiler generates an HTML report that lists your data statistics. 
We use a synthetic dataset to walk you through each section of the report.<\/p>\n<h3>Overview<\/h3>\n<p>This section describes the overall statistics of your data, such as record count and data range.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image014.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29891 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image014.png\" alt=\"\" width=\"1431\" height=\"448\"><\/a><\/p>\n<h3>Field summary<\/h3>\n<p>This section describes the basic statistics of each of your features. The inferred variable type, which is based on data statistics, is provided as a reference for mapping the variables in your data to <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/create-a-variable.html#variable-types\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Fraud Detector predefined variable types<\/a>. We recommend choosing variable types based on your own domain knowledge wherever possible, and referring to the suggested variable type if you\u2019re unsure.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image013.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29786\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image013.png\" alt=\"\" width=\"1319\" height=\"230\"><\/a><\/p>\n<h3>Field warnings<\/h3>\n<p>This section shows warning messages from basic Amazon Fraud Detector data validation checks, such as the number of unique values and the number of missing values. 
For suggested solutions, see <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/troubleshoot.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Fraud Detector troubleshooting<\/a>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image015.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29787\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image015.png\" alt=\"\" width=\"2640\" height=\"404\"><\/a><\/p>\n<h3>Data and label maturity<\/h3>\n<p>This section shows the fraud distribution of your data over time. The chart is interactive (see the following animation for an example): scrolling with the pointer over the plot zooms in or out, dragging the plot left or right changes the x-axis range, and toggling legend entries hides or shows the corresponding bars or curves. Choose <strong>Reset zoom<\/strong> to reset the chart.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/interactive_demo.gif\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29796\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/interactive_demo.gif\" alt=\"\" width=\"1309\" height=\"382\"><\/a><\/p>\n<p>Check that your dataset allows enough time for labels to mature. The maturity period depends on your business and can range from 2 weeks to 90 days. For example, if your label maturity period is 30 days, make sure that the latest records in your dataset are at least 30 days old.<\/p>\n<p>You should also check that the label distribution is relatively stable over time. 
Make sure that events of different label classes are from the same time period.<\/p>\n<h3>Categorical feature analysis<\/h3>\n<p>This section shows the label distribution across categories for each categorical feature. You can see the number of records of each label class within a category and the corresponding percentages. By default, it displays the top 100 categories, and you can drag the plot and scroll to see up to 500 categories in total.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/categorical_demo.gif\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29797\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/categorical_demo.gif\" alt=\"\" width=\"1321\" height=\"623\"><\/a><\/p>\n<p>You can choose the sorting option that best fits your needs:<\/p>\n<ul>\n<li><strong>Sort by most records<\/strong> \u2013 Shows the categories with the most records, which reflects the general distribution of categories.<\/li>\n<li><strong>Sort by most records of label=NON-FRAUD<\/strong> \u2013 Shows the categories with the most records of the NON-FRAUD class. These categories account for most of the legitimate population.<\/li>\n<li><strong>Sort by most records of label\u2260NON-FRAUD<\/strong> \u2013 Shows the categories with the most records of the FRAUD class. These categories account for most of the fraud population.<\/li>\n<li><strong>Sort by lowest percentage of label=NON-FRAUD<\/strong> \u2013 Shows the categories with the highest FRAUD rate, which are the riskiest categories.<\/li>\n<\/ul>\n<p>You can choose which data to plot on the <strong>Data Showing Options<\/strong> menu. Toggling legend entries also shows or hides the corresponding bars or curves.<\/p>\n<h3>Numeric feature analysis<\/h3>\n<p>This section shows the label distribution of each numeric feature. 
The numerical values are partitioned into bins, and you can see the number of records of each label class, as well as the corresponding percentage, within each bin.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image021.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29790\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image021.png\" alt=\"\" width=\"1293\" height=\"585\"><\/a><\/p>\n<h3>Feature and label correlation<\/h3>\n<p>This section shows the correlation between each feature and the label in one plot. You can combine this correlation plot with the <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/model-variable-importance.html\" target=\"_blank\" rel=\"noopener noreferrer\">model variable importance<\/a> values generated by Amazon Fraud Detector after model training to identify potential label leakage. For example, if a feature has a correlation above 0.99 with the label and significantly higher variable importance than other features, there\u2019s a risk of label leakage on that feature. Label leakage happens when a feature effectively encodes the label, so the label is almost fully determined by that one feature. As a result, the model overfits to that feature and doesn\u2019t learn the actual fraud patterns. 
Features with label leakage should be excluded from model training.<\/p>\n<p>The following plot shows an example of the correlation between features and <code>EVENT_LABEL<\/code>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image023.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29791\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image023.png\" alt=\"\" width=\"1278\" height=\"516\"><\/a><\/p>\n<p>If <code>FeatureCorr<\/code> is set to <strong>Yes<\/strong> in the CloudFormation stack configuration, the report includes a second plot showing pair-wise feature correlations. Darker colors indicate higher correlation. For highly correlated features, double-check whether that correlation is expected for your business. If two features have a correlation equal to 1, you can consider removing one of them to reduce model complexity. However, this isn\u2019t required because the Amazon Fraud Detector model is robust to feature collinearity.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image025-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29803 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image025-1.png\" alt=\"\" width=\"800\" height=\"810\"><\/a><\/p>\n<h2>Data cleaning<\/h2>\n<p>The data profiler also has an option to convert your CSV file to comply with the <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/online-fraud-insights.html#preparing-training-data\" target=\"_blank\" rel=\"noopener noreferrer\">data format requirements<\/a> of Amazon Fraud Detector:<\/p>\n<ul>\n<li><strong>Header name transformation<\/strong> \u2013 Transforms the event timestamp and label column headers to <code>EVENT_TIMESTAMP<\/code> and 
<code>EVENT_LABEL<\/code>. All other headers are converted to lowercase alphanumeric characters, with underscore (<code>_<\/code>) as the only special character. When you create an event type, make sure the variables are defined with these transformed names.<\/li>\n<li><strong>Timestamp transformation<\/strong> \u2013 Converts the values in the <code>EVENT_TIMESTAMP<\/code> column to the ISO 8601 standard in UTC.<\/li>\n<li><strong>Event label transformation<\/strong> \u2013 Converts your label values to lowercase alphanumeric characters, with underscore (<code>_<\/code>) as the only special character. When you create an event type, make sure the labels are defined with these transformed values.<\/li>\n<\/ul>\n<p>The following screenshots compare the original data to the formatted data, with <code>DropTimestampMissingRows<\/code> and <code>DropLabelMissingRows<\/code> set to <strong>Yes<\/strong>.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image029-combined.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29798\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML-6290-image029-combined.png\" alt=\"\" width=\"1099\" height=\"230\"><\/a><\/p>\n<h2>Clean up the resources<\/h2>\n<p>You can use AWS CloudFormation to clean up all the resources created for the data profiler.<\/p>\n<ol>\n<li>On the AWS CloudFormation console, choose <strong>Stacks<\/strong> in the navigation pane.<\/li>\n<li>Select the CloudFormation stack and choose <strong>Delete<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image034.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29894 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/26\/ML-6290-image034.png\" alt=\"\" width=\"1430\" height=\"239\"><\/a><\/li>\n<\/ol>\n<p>All the resources, including IAM roles, AWS Glue 
job, and Lambda function, are removed. Note that the profiling report and formatted data in your S3 bucket aren\u2019t deleted.<\/p>\n<h2>Conclusion<\/h2>\n<p>This post walked through the automated data profiler and cleaner for Amazon Fraud Detector, a convenient tool for preparing your data for Amazon Fraud Detector. As a next step, you can build an end-to-end fraud detector on the Amazon Fraud Detector console. For more information, see the <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/get-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Fraud Detector User Guide<\/a> and <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/category\/artificial-intelligence\/amazon-fraud-detector\/\" target=\"_blank\" rel=\"noopener noreferrer\">related blog posts<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong> <a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/Hao-Zhu.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-29801 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/Hao-Zhu.jpg\" alt=\"\" width=\"100\" height=\"133\"><\/a>Hao Zhou<\/strong> is a Research Scientist with Amazon Fraud Detector. He holds a PhD in electrical engineering from Northwestern University, USA. He is passionate about applying machine learning techniques to combat fraud and abuse.<\/p>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/Anqi-Cheng.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-29800 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/Anqi-Cheng.jpg\" alt=\"\" width=\"100\" height=\"133\"><\/a>Anqi Cheng<\/strong> is a Research Scientist on the Amazon Fraud Detector (AFD) team. She holds a PhD in physics and joined Amazon in 2017. 
Since the early days of AFD, she has worked on many aspects of the service, from exploring state-of-the-art machine learning algorithms and productionizing machine learning workflows to improving the robustness and explainability of machine learning models.<\/p>\n      <\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/train-models-faster-with-an-automated-data-profiler-for-amazon-fraud-detector\/<\/p>\n","protected":false},"author":0,"featured_media":1084,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1083"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1083"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1083\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1084"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1083"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1083"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1083"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}