{"id":436,"date":"2020-10-21T22:47:10","date_gmt":"2020-10-21T22:47:10","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/10\/21\/performing-batch-fraud-predictions-using-amazon-fraud-detector-amazon-s3-and-aws-lambda\/"},"modified":"2020-10-21T22:47:10","modified_gmt":"2020-10-21T22:47:10","slug":"performing-batch-fraud-predictions-using-amazon-fraud-detector-amazon-s3-and-aws-lambda","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/10\/21\/performing-batch-fraud-predictions-using-amazon-fraud-detector-amazon-s3-and-aws-lambda\/","title":{"rendered":"Performing batch fraud predictions using Amazon Fraud Detector, Amazon S3, and AWS Lambda"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/aws.amazon.com\/fraud-detector\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Fraud Detector<\/a> is a fully managed service that makes it easy to identify potentially fraudulent online activities, such as the creation of fake accounts or online payment fraud. Unlike general-purpose machine learning (ML) packages, Amazon Fraud Detector is designed specifically to detect fraud. Amazon Fraud Detector combines your data, the latest in ML science, and more than 20 years of fraud detection experience from <a href=\"http:\/\/amazon.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon.com<\/a> and AWS to build ML models tailor-made to detect fraud in your business.<\/p>\n<p>This post walks you through how to use Amazon Fraud Detector with <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) and <a href=\"http:\/\/aws.amazon.com\/lambda\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> to perform a batch of fraud predictions on event records (such as account registrations and transactions) in a CSV file. 
This architecture enables you to trigger a batch of predictions automatically upon uploading your CSV file to Amazon S3 and retrieve the fraud prediction results in a newly generated CSV also stored in Amazon S3.<\/p>\n<h2>Solution overview<\/h2>\n<p>Amazon Fraud Detector can perform low-latency fraud predictions, enabling your company to dynamically adjust the customer experience in your applications based on real-time fraud risk detection. But suppose you want to generate fraud predictions for a batch of events after the fact; perhaps you don\u2019t need a low-latency response and want to evaluate events on an hourly or daily schedule. How do you accomplish this using Amazon Fraud Detector? One approach is to use an Amazon S3 event notification to trigger a Lambda function that processes a CSV file of events stored in Amazon S3 when the file is uploaded to an input S3 bucket. The function runs each event through Amazon Fraud Detector to generate predictions using a detector (ML model and rules) and uploads the prediction results to an S3 output bucket. The following diagram illustrates this architecture.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17296 size-full\" title=\"Solution architecture\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/1-Architecture-1.jpg\" alt=\"\" width=\"900\" height=\"405\"><\/p>\n<p>To create this Lambda-based batch prediction system, you complete the following high-level steps:<\/p>\n<ol>\n<li>Create and publish a detector version containing a fraud detection model and rules, or simply a ruleset.<\/li>\n<li>Create two S3 buckets. 
The first bucket is used to land your CSV file, and the second bucket is where your Lambda function writes the prediction results to.<\/li>\n<li>Create an <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role to use as the execution role in the Lambda function.<\/li>\n<li>Create a Lambda function that reads in a CSV file from Amazon S3, calls the Amazon Fraud Detector <code>get_event_prediction<\/code> function for each record in the CSV file, and writes a CSV file to Amazon S3.<\/li>\n<li>Add an Amazon S3 event trigger to invoke your Lambda function whenever a new CSV file is uploaded to the S3 bucket.<\/li>\n<li>Create a sample CSV file of event records to test the batch prediction process.<\/li>\n<li>Test the end-to-end process by uploading your sample CSV file to your input S3 bucket and reviewing prediction results in the newly generated CSV file in your output S3 bucket.<\/li>\n<\/ol>\n<h2>Creating and publishing a detector<\/h2>\n<p>You can create and publish a detector version using the Amazon Fraud Detector console or via the APIs. For console instructions, see <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/get-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">Get started (console)<\/a> or <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/amazon-fraud-detector-is-now-generally-available\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Fraud Detector is now Generally Available<\/a>. 
After you complete this step, note the following items, which you need in later steps:<\/p>\n<ul>\n<li>AWS Region you created the detector in<\/li>\n<li>Detector name and version<\/li>\n<li>Name of the entity type and event type used by your detector<\/li>\n<li>List of variables for the entity type used in your detector<\/li>\n<\/ul>\n<p>The following screenshot shows the detail view of a detector version.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17297 size-full\" title=\"Detail view of a detector version\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/2-NewAcctScreenshot.jpg\" alt=\"\" width=\"900\" height=\"362\"><\/p>\n<p>The following screenshot shows the detail view of an event type.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17298 size-full\" title=\"Detail view of an event type\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/3-Screenshot-2.jpg\" alt=\"\" width=\"900\" height=\"496\"><\/p>\n<h2>Creating the input and output S3 buckets<\/h2>\n<p>Create the following S3 buckets on the Amazon S3 console:<\/p>\n<ul>\n<li>\n<strong>fraud-detector-input<\/strong> \u2013 Where you upload the CSV file containing events for batch predictions<\/li>\n<li>\n<strong>fraud-detector-output<\/strong> \u2013 Where the Lambda function writes the prediction results file<\/li>\n<\/ul>\n<p>Make sure you create your buckets in the same Region as your detector. 
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/user-guide\/create-bucket.html\" target=\"_blank\" rel=\"noopener noreferrer\">How do I create an S3 Bucket?<\/a><\/p>\n<h2>Creating the IAM role<\/h2>\n<p>To create the <a href=\"https:\/\/docs.aws.amazon.com\/lambda\/latest\/dg\/lambda-intro-execution-role.html\" target=\"_blank\" rel=\"noopener noreferrer\">execution role<\/a> in IAM that gives your Lambda function permission to access the AWS resources required for this solution, complete the following steps:<\/p>\n<ol>\n<li>On the IAM console, choose <strong>Roles<\/strong>.<\/li>\n<li>Choose <strong>Create role<\/strong>.<\/li>\n<li>Select <strong>Lambda<\/strong>.<\/li>\n<li>Choose <strong>Next<\/strong>.<\/li>\n<li>Attach the following policies:\n<ul>\n<li>\n<strong>AWSLambdaBasicExecutionRole<\/strong> \u2013 Provides the Lambda function with write permissions to <a href=\"http:\/\/aws.amazon.com\/cloudwatch\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch Logs<\/a>.<\/li>\n<li>\n<strong>AWSXRayDaemonWriteAccess<\/strong> \u2013 Allows the <a href=\"http:\/\/aws.amazon.com\/xray\" target=\"_blank\" rel=\"noopener noreferrer\">AWS X-Ray<\/a> daemon to relay raw trace data and retrieve sampling data to be used by X-Ray.<\/li>\n<li>\n<strong>AmazonFraudDetectorFullAccessPolicy<\/strong> \u2013 Provides permissions to create resources and generate fraud predictions in Amazon Fraud Detector.<\/li>\n<li>\n<strong>AmazonS3FullAccess<\/strong> \u2013 Provides the Lambda function permissions to read and write objects in Amazon S3. 
This policy provides broad Amazon S3 access; as a best practice, consider reducing the scope of this policy to the S3 buckets required for this example, or use an inline policy such as the following:<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-json\">{\r\n    \"Version\": \"2012-10-17\",\r\n    \"Statement\": [\r\n        {\r\n            \"Sid\": \"VisualEditor0\",\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"s3:PutObject\",\r\n                \"s3:GetObject\"\r\n            ],\r\n            \"Resource\": [\r\n                \"arn:aws:s3:::fraud-detector-input\/*\",\r\n                \"arn:aws:s3:::fraud-detector-output\/*\"\r\n            ]\r\n        }\r\n    ]\r\n}\r\n<\/code><\/pre>\n<\/div>\n<ol start=\"6\">\n<li>Choose <strong>Next<\/strong>.<\/li>\n<li>Enter a name for your role (for example, lambda-s3-role).<\/li>\n<li>Choose <strong>Create role<\/strong>.<\/li>\n<\/ol>\n<h2>Creating the Lambda function<\/h2>\n<p>Now let\u2019s create our Lambda function on the Lambda console.<\/p>\n<ol>\n<li>On the Lambda console, choose <strong>Create function<\/strong>.<\/li>\n<li>For <strong>Function<\/strong> name, enter a name (for example, afd-batch-function).<\/li>\n<li>For <strong>Runtime<\/strong>, choose <strong>Python 3.8<\/strong>.<\/li>\n<li>For <strong>Execution role<\/strong>, select <strong>Use an existing role<\/strong>.<\/li>\n<li>For <strong>Existing role<\/strong>, choose the role you created.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17299 size-full\" title=\"Creating the Lambda function\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/4-BasicInformation.jpg\" alt=\"\" width=\"900\" height=\"540\"><\/p>\n<ol start=\"6\">\n<li>Choose <strong>Create function<\/strong>\n<\/li>\n<\/ol>\n<p>Next, we walk through sections of the code used in 
the Lambda function. This code goes into the <strong>Function code<\/strong> section of your Lambda function. The full Lambda function code is available in the next section.<\/p>\n<h3>Packages<\/h3>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">import json\r\nimport csv\r\nimport boto3\r\n<\/code><\/pre>\n<\/div>\n<h3>Defaults<\/h3>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\"># -- make a connection to fraud detector -- \r\nclient = boto3.client(\"frauddetector\")\r\n# -- S3 bucket to write scored data to -- \r\nS3_BUCKET_OUT = \"fraud-detector-output\"\r\n# -- specify event, entity, and detector  -- \r\nENTITY_TYPE    = \"customer\"\r\nEVENT_TYPE     = \"new_account_registration_full_details\"\r\nDETECTOR_NAME  = \"new_account_detector\"\r\nDETECTOR_VER   = \"3\"\r\n<\/code><\/pre>\n<\/div>\n<p>We have entered the values from the detector we created and the output S3 bucket. Replace these default values with the values you used when creating your output S3 bucket and Amazon Fraud Detector resources.<\/p>\n<h3>Functions<\/h3>\n<p>We use a few helper functions along with the main <code>lambda_handler()<\/code> function:<\/p>\n<ul>\n<li>\n<strong>get_event_variables(EVENT_TYPE)<\/strong> \u2013 Returns a list of the variables for the event type. We map these to the input file positions.<\/li>\n<li>\n<strong>prep_record(record_map, event_map, line)<\/strong> \u2013 Returns a record containing just the data required by the detector.<\/li>\n<li>\n<strong>get_score(event, record)<\/strong> \u2013 Returns the fraud prediction risk scores and rule outcomes from the Amazon Fraud Detector <code>get_event_prediction<\/code> function. 
The <code>get_score<\/code> function uses two extra helper functions to format model scores (<code>prep_scores<\/code>) and rule outcomes (<code>prep_outcomes<\/code>).<\/li>\n<\/ul>\n<p>Finally, the <code>lambda_handler(event, context)<\/code> drives the whole process. See the following example code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">def get_event_variables(EVENT_TYPE):\r\n    \"\"\" return list of event variables \r\n    \"\"\"\r\n    response = client.get_event_types(name=EVENT_TYPE)\r\n    event_variables = []\r\n\r\n    for v in response['eventTypes'][0]['eventVariables']:\r\n        event_variables.append(v)\r\n    return event_variables\r\n\r\ndef prep_record(record_map, event_map, line):\r\n    \"\"\" structure the record for scoring \r\n    \"\"\"\r\n    record = {}\r\n    for key in record_map.keys():\r\n        record[key] = line[record_map[key]]\r\n        \r\n    event = {}\r\n    for key in event_map.keys():\r\n        event[key] = line[event_map[key]]\r\n    return record, event\r\n\r\ndef prep_scores(model_scores):\r\n    \"\"\" return list of models and scores\r\n    \"\"\"\r\n    detector_models = []\r\n    for m in model_scores:\r\n        detector_models.append(m['scores'])\r\n    return detector_models\r\n\r\ndef prep_outcomes(rule_results):\r\n    \"\"\" return list of rules and outcomes \r\n    \"\"\"\r\n    detector_outcomes = []\r\n    for rule in rule_results:\r\n        rule_outcomes = {}\r\n        rule_outcomes[rule['ruleId']] = rule['outcomes']\r\n        detector_outcomes.append(rule_outcomes)\r\n    return detector_outcomes\r\n\r\ndef get_score(event, record):\r\n    \"\"\" return the score to the function\r\n    \"\"\"\r\n    pred_rec = {}\r\n    \r\n    try:\r\n        pred = 
client.get_event_prediction(detectorId=DETECTOR_NAME, \r\n                                       detectorVersionId=DETECTOR_VER,\r\n                                       eventId = event['EVENT_ID'],\r\n                                       eventTypeName = EVENT_TYPE,\r\n                                       eventTimestamp = event['EVENT_TIMESTAMP'], \r\n                                       entities = [{'entityType': ENTITY_TYPE, 'entityId':event['ENTITY_ID']}],\r\n                                       eventVariables=  record) \r\n                                       \r\n        pred_rec[\"score\"]   = prep_scores(pred['modelScores'])\r\n        pred_rec[\"outcomes\"]= prep_outcomes(pred['ruleResults'])\r\n\r\n    except: \r\n        pred_rec[\"score\"]   = [-999]\r\n        pred_rec[\"outcomes\"]= [\"error\"]\r\n    \r\n    return pred_rec\r\n<\/code><\/pre>\n<\/div>\n<p>The following is the full code for the Lambda function:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">import boto3 \r\nimport csv\r\nimport json\r\n\r\n# -- make a connection to fraud detector -- \r\nclient = boto3.client(\"frauddetector\")\r\n\r\n# -- S3 bucket to write batch predictions out to -- \r\nS3_BUCKET_OUT = \"fraud-detector-output\"\r\n\r\n# -- specify event, entity, and detector  -- \r\nENTITY_TYPE    = \"customer\"\r\nEVENT_TYPE     = \"new_account_registration_full_details\"\r\nDETECTOR_NAME  = \"new_account_detector\"\r\nDETECTOR_VER   = \"3\"\r\n\r\ndef get_event_variables(EVENT_TYPE):\r\n    \"\"\" return list of event variables \r\n    \"\"\"\r\n    response = client.get_event_types(name=EVENT_TYPE)\r\n    event_variables = []\r\n\r\n    for v in response['eventTypes'][0]['eventVariables']:\r\n        event_variables.append(v)\r\n    return event_variables\r\n\r\ndef prep_record(record_map, event_map, line):\r\n    \"\"\" structure the record for scoring \r\n    \"\"\"\r\n    record = {}\r\n    for key in 
record_map.keys():\r\n        record[key] = line[record_map[key]]\r\n        \r\n    event = {}\r\n    for key in event_map.keys():\r\n        event[key] = line[event_map[key]]\r\n    return record, event\r\n\r\ndef prep_scores(model_scores):\r\n    \"\"\" return list of models and scores\r\n    \"\"\"\r\n    detector_models = []\r\n    for m in model_scores:\r\n        detector_models.append(m['scores'])\r\n    return detector_models\r\n\r\ndef prep_outcomes(rule_results):\r\n    \"\"\"return list of rules and outcomes\r\n    \"\"\"\r\n    detector_outcomes = []\r\n    for rule in rule_results:\r\n        rule_outcomes = {}\r\n        rule_outcomes[rule['ruleId']] = rule['outcomes']\r\n        detector_outcomes.append(rule_outcomes)\r\n    return detector_outcomes\r\n\r\ndef get_score(event, record):\r\n    \"\"\" return the score to the function\r\n    \"\"\"\r\n    pred_rec = {}\r\n    \r\n    try:\r\n        pred = client.get_event_prediction(detectorId=DETECTOR_NAME, \r\n                                       detectorVersionId=DETECTOR_VER,\r\n                                       eventId = event['EVENT_ID'],\r\n                                       eventTypeName = EVENT_TYPE,\r\n                                       eventTimestamp = event['EVENT_TIMESTAMP'], \r\n                                       entities = [{'entityType': ENTITY_TYPE, 'entityId':event['ENTITY_ID']}],\r\n                                       eventVariables=  record) \r\n                                       \r\n        pred_rec[\"score\"]   = prep_scores(pred['modelScores'])\r\n        pred_rec[\"outcomes\"]= prep_outcomes(pred['ruleResults'])\r\n\r\n    except: \r\n        pred_rec[\"score\"]   = [-999]\r\n        pred_rec[\"outcomes\"]= [\"error\"]\r\n    \r\n    return pred_rec\r\n\r\ndef lambda_handler(event, context):\r\n    \"\"\" the lambda event handler triggers the process. 
\r\n    \"\"\"\r\n    S3_BUCKET_IN = event['Records'][0]['s3']['bucket']['name']\r\n    S3_FILE      = event['Records'][0]['s3']['object']['key']\r\n    S3_OUT_FILE  = \"batch_{0}\".format(S3_FILE)\r\n    \r\n    \r\n    # -- open a temp file to write predictions to. \r\n    f = open(\"\/tmp\/csv_file.csv\", \"w+\")\r\n    temp_csv_file = csv.writer(f) \r\n    \r\n    # -- get the input file -- \r\n    s3    = boto3.resource('s3')\r\n    obj   = s3.Object(S3_BUCKET_IN, S3_FILE)\r\n    data  = obj.get()['Body'].read().decode('utf-8').splitlines()\r\n    lines = csv.reader(data)\r\n    \r\n    # -- get the file header -- \r\n    file_variables = next(lines)\r\n    \r\n    # -- write the file header to temporary file -- \r\n    temp_csv_file.writerow(file_variables + [\"MODEL_SCORES\", \"DETECTOR_OUTCOMES\"])\r\n    \r\n    # -- get list of event variables -- \r\n    event_variables = get_event_variables(EVENT_TYPE)\r\n    \r\n    # -- map event variables to file structure -- \r\n    record_map = {}\r\n    for var in event_variables:\r\n        record_map[var] = file_variables.index(var)\r\n    \r\n    # -- map event fields to file structure --\r\n    event_map = {}\r\n    for var in ['ENTITY_ID', 'EVENT_ID', 'EVENT_TIMESTAMP']:\r\n        event_map[var] = file_variables.index(var)\r\n    \r\n   # -- for each record in the file, prep it, score it, write it to temp. 
\r\n    for i,line in enumerate(lines):\r\n        record, event       = prep_record(record_map, event_map, line)\r\n        record_pred         = get_score(event, record)\r\n        #print(list(record_pred.values()))\r\n        temp_csv_file.writerow(line + list(record_pred.values()))\r\n    \r\n    \r\n    # -- close the temp file and upload it to your OUTPUT bucket    \r\n    f.close()\r\n    s3_client = boto3.client('s3')\r\n    s3_client.upload_file('\/tmp\/csv_file.csv', S3_BUCKET_OUT, \"batch_pred_results_\" + S3_FILE  )\r\n    \r\n    return {\r\n        'statusCode': 200,\r\n        'body': json.dumps('Batch Complete!')\r\n    }\r\n<\/code><\/pre>\n<\/div>\n<p>After you add the code to your Lambda function, choose <strong>Deploy<\/strong> to save.<\/p>\n<h2>Configuring your Lambda settings and creating the Amazon S3 trigger<\/h2>\n<p>The batch prediction processes require memory and time to process, so we need to change the Lambda function\u2019s default memory allocation and maximum run time.<\/p>\n<ol>\n<li>On the Lambda console, locate your function.<\/li>\n<li>On the function detail page, under <strong>Basic settings<\/strong>, choose <strong>Edit<\/strong>.<\/li>\n<li>For <strong>Memory<\/strong>, choose <strong>2048 MB<\/strong>.<\/li>\n<li>For <strong>Timeout<\/strong>, enter 15 <strong>min<\/strong>.<\/li>\n<li>Choose <strong>Save<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17300 size-full\" title=\"Configuring your Lambda settings\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/5-EditSettings.jpg\" alt=\"\" width=\"900\" height=\"927\"><\/p>\n<p>A 15-minute timeout allows the function to process up to roughly 4,000 predictions per batch, so you should keep this in mind as you consider your CSV file creation and upload strategy.<\/p>\n<p>You can now make it so that this Lambda function triggers when a CSV file is uploaded to your input S3 
bucket.<\/p>\n<ol start=\"6\">\n<li>At the top of the Lambda function detail page, in the <strong>Designer<\/strong> box, choose <strong>Add trigger<\/strong>.<\/li>\n<li>Choose <strong>S3<\/strong>.<\/li>\n<li>For <strong>Bucket<\/strong>, choose your input S3 bucket.<\/li>\n<li>For <strong>Suffix<\/strong>, enter <code>.csv<\/code>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17301 size-full\" title=\"Creating the Amazon S3 trigger\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/6-AddTrigger.jpg\" alt=\"\" width=\"900\" height=\"838\"><\/p>\n<p>A warning about recursive invocation appears. You don\u2019t want to trigger a read and write to the same bucket, which is why you created a second S3 bucket for the output.<\/p>\n<ol start=\"10\">\n<li>Select the check-box to acknowledge the recursive invocation warning.<\/li>\n<li>Choose <strong>Add<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17302 size-full\" title=\"Recursive invocation warning\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/7-Recursive.jpg\" alt=\"\" width=\"900\" height=\"248\"><\/p>\n<h2>Creating a sample CSV file of event records<\/h2>\n<p>We need to create a sample CSV file of event records to test the batch prediction process. In this CSV file, include a column for each variable in your event type schema. In addition, include columns for:<\/p>\n<ul>\n<li>\n<strong>EVENT_ID<\/strong> \u2013 An identifier for the event, such as a transaction number. The field values must satisfy the following regular expression pattern: <code>^[0-9a-z_-]+$<\/code>.<\/li>\n<li>\n<strong>ENTITY_ID<\/strong> \u2013 An identifier for the entity performing the event, such as an account number. 
The field values must also satisfy the following regular expression pattern: <code>^[0-9a-z_-]+$<\/code>.<\/li>\n<li>\n<strong>EVENT_TIMESTAMP<\/strong> \u2013 A timestamp, in ISO 8601 format, for when the event occurred.<\/li>\n<\/ul>\n<p>Column header names must match their corresponding Amazon Fraud Detector variable names exactly.<\/p>\n<p>In your CSV file, each row corresponds to one event that you want to generate a prediction for. The following screenshot shows an example of a test CSV file.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17303 size-full\" title=\"Test csv file example\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/8-Speadsheet.jpg\" alt=\"\" width=\"900\" height=\"176\"><\/p>\n<p>For more information about Amazon Fraud Detector variable data types and formatting, see <a href=\"https:\/\/docs.aws.amazon.com\/frauddetector\/latest\/ug\/create-a-variable.html\" target=\"_blank\" rel=\"noopener noreferrer\">Create a variable<\/a>.<\/p>\n<h2>Performing a test batch prediction<\/h2>\n<p>To test our Lambda function, we simply upload our test file to the <code>fraud-detector-input<\/code> S3 bucket via the Amazon S3 console. This triggers the Lambda function. 
We can then check the <code>fraud-detector-output<\/code> S3 bucket for the results file.<\/p>\n<p>The following screenshot shows that the test CSV file <code>20_event_test.csv<\/code> is uploaded to the <code>fraud-detector-input<\/code> S3 bucket.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-17334\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/21\/9-Screenshot-1.jpg\" alt=\"\" width=\"900\" height=\"443\"><\/p>\n<p>When batch prediction is complete, the results CSV file <code>batch_pred_results_20_event_test.csv<\/code> is uploaded to the <code>fraud-detector-output<\/code> S3 bucket (see the following screenshot).<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17305 size-full\" title=\"fraud-detector-output\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/10-Screenshot-1.jpg\" alt=\"\" width=\"900\" height=\"447\"><\/p>\n<p>The following screenshots show our results CSV file. The new file has two new columns: <code>MODEL_SCORES<\/code> and <code>DETECTOR_OUTCOMES<\/code>. <code>MODEL_SCORES<\/code> contains model names, model details, and prediction scores for any models used in the detector. 
<code>DETECTOR_OUTCOMES<\/code> contains all rule results, including any matched rules and their corresponding outcomes.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17306 size-full\" title=\"Results CSV file\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/11-Table.jpg\" alt=\"\" width=\"900\" height=\"203\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-17307 size-full\" title=\"Results CSV file\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/20\/12-Table.jpg\" alt=\"\" width=\"900\" height=\"200\"><\/p>\n<p>If the results file doesn\u2019t appear in the output S3 bucket, you can check the CloudWatch log stream to see if the Lambda function ran into any issues. To do this, go to your Lambda function on the Lambda console and choose the <strong>Monitoring<\/strong> tab, then choose <strong>View logs in CloudWatch. <\/strong>In CloudWatch, choose the log stream covering the time period you uploaded your CSV file.<\/p>\n<h2>Conclusion<\/h2>\n<p>Congrats! You have successfully performed a batch of fraud predictions. Depending on your use case, you may want to use your prediction results in other AWS services. For example, you can analyze the prediction results in <a href=\"https:\/\/aws.amazon.com\/quicksight\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon QuickSight<\/a> or send results that are high risk to <a href=\"https:\/\/aws.amazon.com\/augmented-ai\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Augmented AI<\/a> (Amazon A2I) for a human review of the prediction.<\/p>\n<p>Amazon Fraud Detector has a 2-month free trial that includes 30,000 predictions per month. After that, pricing starts at $0.005 per prediction for rules-only predictions and $0.03 for ML-based predictions. 
For more information, see <a href=\"https:\/\/aws.amazon.com\/fraud-detector\/pricing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Fraud Detector pricing<\/a>. For more information about Amazon Fraud Detector, including links to additional blog posts, sample notebooks, user guide, and API documentation, see <a href=\"https:\/\/aws.amazon.com\/fraud-detector\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Fraud Detector<\/a>.<\/p>\n<p>The next step is to start dropping files into your S3 bucket! Good luck!<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-17336 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/10\/21\/Tostenrude.jpg\" alt=\"\" width=\"101\" height=\"136\">Nick Tostenrude <\/strong>is a Senior Manager of Product in AWS, where he leads the Amazon Fraud Detector service team. Nick joined Amazon nine years ago. He has spent the past four years as part of the AWS Fraud Prevention organization. Prior to AWS, Nick spent five years in Amazon\u2019s Kindle and Devices organizations, leading product teams focused on the Kindle reading experience, accessibility, and K-12 Education.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-12239 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/05\/07\/mike-ames.jpg\" alt=\"\" width=\"100\" height=\"131\">Mike Ames<\/strong> is a Research Science Manager working on Amazon Fraud Detector. He helps companies use machine learning to combat fraud, waste and abuse. 
In his spare time, you can find him jamming to 90s metal with an electric mandolin.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/performing-batch-fraud-predictions-using-amazon-fraud-detector-amazon-s3-and-aws-lambda\/<\/p>\n","protected":false},"author":0,"featured_media":437,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/436"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=436"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/436\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/437"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=436"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=436"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=436"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}