{"id":342,"date":"2020-10-02T06:47:03","date_gmt":"2020-10-02T06:47:03","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/10\/02\/building-an-end-to-end-intelligent-document-processing-solution-using-aws\/"},"modified":"2020-10-02T06:47:03","modified_gmt":"2020-10-02T06:47:03","slug":"building-an-end-to-end-intelligent-document-processing-solution-using-aws","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/10\/02\/building-an-end-to-end-intelligent-document-processing-solution-using-aws\/","title":{"rendered":"Building an end-to-end intelligent document processing solution using AWS"},"content":{"rendered":"<div id=\"\">\n<p>As organizations grow, so does their need for better document processing. In industries such as healthcare, legal, insurance, and banking, the continuous influx of paper-based or PDF documents (like invoices, health charts, and insurance claims) has pushed businesses to consider evolving their document processing capabilities. 
In such scenarios, businesses and organizations find themselves in a race against time to deploy a sophisticated document analysis pipeline that can handle these documents in an automated and scalable fashion.<\/p>\n<p>You can use <a target=\"_blank\" href=\"https:\/\/aws.amazon.com\/textract\/\" rel=\"noopener noreferrer\">Amazon Textract<\/a> and <a target=\"_blank\" href=\"https:\/\/aws.amazon.com\/augmented-ai\/\" rel=\"noopener noreferrer\">Amazon Augmented AI<\/a> (Amazon A2I) to <a target=\"_blank\" href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/using-amazon-textract-with-amazon-augmented-ai-for-processing-critical-documents\/\" rel=\"noopener noreferrer\">process critical documents<\/a> and for your <a target=\"_blank\" href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/setting-up-human-review-of-your-nlp-based-entity-recognition-models-with-amazon-sagemaker-ground-truth-amazon-comprehend-and-amazon-a2i\/\" rel=\"noopener noreferrer\">NLP-based entity recognition models with Amazon SageMaker Ground Truth, Amazon Comprehend, and Amazon A2I<\/a>. This post introduces another way to create a retrainable end-to-end document analysis solution with Amazon Textract, <a target=\"_blank\" href=\"https:\/\/aws.amazon.com\/comprehend\/\" rel=\"noopener noreferrer\">Amazon Comprehend<\/a>, and Amazon A2I.<\/p>\n<p>This solution takes scanned images of physical documents as input and extracts the text using Amazon Textract. It sends the text to be analyzed by a custom entity recognizer trained in Amazon Comprehend. Machine learning applications such as Amazon Comprehend work well at scale; to push accuracy toward 100%, you can have human reviewers review and validate low-confidence predictions. Additionally, you can use this human input to improve your underlying machine learning models. 
This is done by sending the output from Amazon Comprehend to be reviewed by human reviewers using Amazon A2I so that you can feed it back to retrain the models and improve the quality for future iterations. You can also use Amazon A2I to provide human oversight to your machine learning models and randomly send some data for human review to sample the output quality of your custom entity recognizer. This automated pipeline can scale to millions of documents with the help of these services and allow businesses to do more detailed analysis of their documents.<\/p>\n<h2>Solution overview<\/h2>\n<p>The following diagram illustrates the solution architecture.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16132\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/1-DocumentAnalysisSolution.jpg\" alt=\"\" width=\"900\" height=\"471\"><\/p>\n<p>This solution takes images (scanned documents or screenshots or pictures of documents) as input. You can upload these files programmatically or through the <a href=\"http:\/\/aws.amazon.com\/console\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Management Console<\/a> into an <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) bucket in the <code>input<\/code> folder. This action triggers an <a href=\"https:\/\/aws.amazon.com\/lambda\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> function, <code>TextractComprehendLambda<\/code>, through <a href=\"https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/dev\/NotificationHowTo.html\" target=\"_blank\" rel=\"noopener noreferrer\">event notifications<\/a>.<\/p>\n<p>The <code>TextractComprehendLambda<\/code> function sends the image to Amazon Textract to extract the text from the image. 
When it receives the results, it collates them and sends the text to the Amazon Comprehend custom entity recognizer. The custom entity recognizer is a pre-trained model that identifies entities in the text that are valuable to your business. This post demonstrates how to do this, in detail, in the following sections.<\/p>\n<p>The custom entity recognizer stores the results in a separate bucket, which acts as temporary storage for this data. This bucket has another event notification, which triggers the <code>ComprehendA2ILambda<\/code> function. This Lambda function takes the output from the custom entity recognizer, processes it, and sends the results to Amazon A2I by creating a human loop for review and verification.<\/p>\n<p>Amazon A2I starts the human loop, providing reviewers an interface to double-check the results and add any entities that the custom entity recognition process may have missed. These reviewers submit their responses through the Amazon A2I worker console. When the human loop is complete, Amazon A2I sends an <a href=\"http:\/\/aws.amazon.com\/cloudwatch\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch<\/a> event, which triggers the <code>HumanReviewCompleted<\/code> Lambda function.<\/p>\n<p>The <code>HumanReviewCompleted<\/code> function checks if the human reviewers have added any more annotations (because they found more custom entities). If the human reviewers found something that the custom entity recognizer missed, the function creates a new file called <code>updated_entity_list.txt<\/code>. This file contains all the entities that weren\u2019t present in the previous training dataset.<\/p>\n<p>At the end of each day, a CloudWatch alarm triggers the <code>NewEntityCheck<\/code> function. This function compares the <code>entity_list.txt<\/code> file and the <code>updated_entity_list.txt<\/code> file to check if any new entities were added in the last day. 
If so, it starts a new Amazon Comprehend custom entity recognizer training job and enables the CloudWatch time-based event trigger that triggers the <code>CERTrainingCompleteCheck<\/code> function every 15 minutes.<\/p>\n<p>The <code>CERTrainingCompleteCheck<\/code> function checks if the Amazon Comprehend custom entity recognizer has finished training. If so, the function adds the entries from <code>updated_entity_list.txt<\/code> to <code>entity_list.txt<\/code> so it doesn\u2019t train the model again, unless even more entities are found by the human reviewers. It also disables its own CloudWatch time-based event trigger, because it doesn\u2019t need to check the training process until it starts again. The next invocation of the <code>TextractComprehendLambda<\/code> function uses the new custom entity recognizer, which has learned from the previous human reviews.<\/p>\n<p>All these Lambda functions use <a href=\"https:\/\/docs.aws.amazon.com\/systems-manager\/latest\/userguide\/systems-manager-parameter-store.html\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Systems Manager Parameter Store<\/a> for sharing, retaining, and updating the various variables, like which custom entity recognizer is the current one and where all the data is stored.<\/p>\n<p>We demonstrate this solution in the <code>us-east-1<\/code> Region, but you can run it in any compatible Region. 
For more information about availability of services in your Region, see the <a href=\"https:\/\/aws.amazon.com\/about-aws\/global-infrastructure\/regional-product-services\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Region Table<\/a>.<\/p>\n<h2>Prerequisites<\/h2>\n<p>This post requires that you have an <a href=\"https:\/\/signin.aws.amazon.com\/signin?redirect_uri=https%3A%2F%2Fportal.aws.amazon.com%2Fbilling%2Fsignup%2Fresume&amp;client_id=signup\" target=\"_blank\" rel=\"noopener noreferrer\">AWS account<\/a> with appropriate <a href=\"https:\/\/aws.amazon.com\/iam\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) permissions to launch the <a href=\"http:\/\/aws.amazon.com\/cloudformation\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a> template.<\/p>\n<h2>Deploying your solution<\/h2>\n<p>To deploy your solution, you complete the following high-level steps:<\/p>\n<ol>\n<li>Create an S3 bucket.<\/li>\n<li>Create a custom entity recognizer.<\/li>\n<li>Create a human review workflow.<\/li>\n<li>Deploy the CloudFormation stack.<\/li>\n<\/ol>\n<h3>Creating an S3 bucket<\/h3>\n<p>You first create the main bucket for this post. You use it to receive the input (the original scans of documents) and to store the outputs for each step of the analysis. The Lambda functions pick up the results at the end of each state and collate them for further use and record-keeping. For instructions on creating a bucket, see <a href=\"https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/user-guide\/create-bucket.html\" target=\"_blank\" rel=\"noopener noreferrer\">Create a Bucket<\/a>.<\/p>\n<p>Capture the name of the S3 bucket and save it to use later in this walkthrough. We refer to this bucket as <span><em>&lt;primary_bucket&gt;<\/em> <\/span>in this post. 
Replace this with the name of your actual bucket as you follow along.<\/p>\n<h3>Creating a custom entity recognizer<\/h3>\n<p>Amazon Comprehend allows you to bring your own training data, and train custom entity recognition models to customize the entity recognition process to your business-specific use cases. You can do this without having to write any code or have any in-house machine learning (ML) expertise. For this post, we provide a training dataset and document image, but you can use your own datasets when customizing Amazon Comprehend to suit your use case.<\/p>\n<ol>\n<li>Download the <a href=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/Document-Analysis-Solution\/comprehend-cer-resources.zip\" target=\"_blank\" rel=\"noopener noreferrer\">training dataset<\/a>.<\/li>\n<li>Locate the bucket you created on the Amazon S3 console.<\/li>\n<\/ol>\n<p>For this post, we use the bucket <code>textract-comprehend-a2i-data<\/code>, but you should use the name that you used for <span><em>&lt;primary_bucket&gt;<\/em><\/span>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16133\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/2-AmazonS3.jpg\" alt=\"\" width=\"900\" height=\"188\"><\/p>\n<ol start=\"3\">\n<li>Open the bucket and choose <strong>Create folder<\/strong>.<\/li>\n<li>For name, enter <code>comprehend_data<\/code>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16134\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/3-Textract-a2i-data.jpg\" alt=\"\" width=\"900\" height=\"524\"><\/p>\n<ol start=\"5\">\n<li>Uncompress the file you downloaded earlier and upload the files to the <code>comprehend_data<\/code> folder.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16135\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/4-Textract-comprehend.jpg\" alt=\"\" width=\"900\" height=\"449\"><\/p>\n<ol start=\"6\">\n<li>On the Amazon Comprehend console, click on\u00a0<strong>Launch Amazon Comprehend<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16537\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/30\/Update.jpg\" alt=\"\" width=\"900\" height=\"416\"><\/p>\n<ol start=\"7\">\n<li>Under <strong>Customization, <\/strong>choose <strong>Custom entity recognition<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16137\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/6-Customization.jpg\" alt=\"\" width=\"900\" height=\"422\"><\/p>\n<ol start=\"8\">\n<li>Choose <strong>Train Recognizer<\/strong> to open the entity recognizer training page.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16138\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/7-Entity-Recognition.jpg\" alt=\"\" width=\"900\" height=\"496\"><\/p>\n<ol start=\"9\">\n<li>For <strong>Recognizer name<\/strong>, enter a name.<\/li>\n<\/ol>\n<p>The name that you choose appears in the console hereafter, so something human readable and easily identifiable is ideal.<\/p>\n<ol start=\"10\">\n<li>For <strong>Custom entity types<\/strong>, enter your custom entity type (for this post, we enter <code>DEVICE<\/code>).<\/li>\n<\/ol>\n<p>At the time of this writing, you can have up to 25 entity types per custom entity recognizer in Amazon Comprehend.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16139\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/8-Train-entity.jpg\" alt=\"\" width=\"900\" height=\"607\"><\/p>\n<ol start=\"11\">\n<li>In the <strong>Training data<\/strong> section, select <strong>Using entity list and training docs<\/strong>.<\/li>\n<li>Add the paths to <code>entity_list.csv<\/code> and <code>raw_txt.csv<\/code> for your <span><em>&lt;primary_bucket&gt;<\/em><\/span>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16140\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/9-Trainingdata.jpg\" alt=\"\" width=\"900\" height=\"600\"><\/p>\n<ol start=\"13\">\n<li>In the <strong>IAM role<\/strong> section, select <strong>Create a new role<\/strong>.<\/li>\n<li>For <strong>Name suffix<\/strong>, enter a suffix you can identify later (for this post, we enter <code>TDA<\/code>).<\/li>\n<li>Leave the remaining settings as default and choose <strong>Train<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16141\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/10-IAM-role.jpg\" alt=\"\" width=\"900\" height=\"688\"><\/p>\n<ol start=\"16\">\n<li>When the training is complete, choose your recognizer and copy the ARN for your custom entity recognizer for future use.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16142\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/11-TDA-custom-entity.jpg\" alt=\"\" width=\"900\" height=\"739\"><\/p>\n<h3>Creating a human review workflow<\/h3>\n<p>To create a human review workflow, you need to have three things ready:<\/p>\n<ul>\n<li>\n<strong>Reviewing workforce<\/strong> \u2013 A <em>work team<\/em> is a group of people that you select to review your 
documents. You can create a work team from a <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sms-workforce-management.html\" target=\"_blank\" rel=\"noopener noreferrer\">workforce<\/a>, which is made up of <a href=\"https:\/\/www.mturk.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Mechanical Turk<\/a> workers, vendor-managed workers, or your own private workers that you invite to work on your tasks. Whichever workforce type you choose, Amazon A2I takes care of sending tasks to workers. For this post, you create a work team using a private workforce and add yourself to the team to preview the Amazon A2I workflow.<\/li>\n<li>\n<strong>Worker task template <\/strong>\u2013 This is a template that defines what the console looks like to the reviewers.<\/li>\n<li>\n<strong>S3 bucket <\/strong>\u2013 This is where the output of Amazon A2I is stored. You already created a bucket earlier, so this post uses the same bucket.<\/li>\n<\/ul>\n<h4>Creating a workforce<\/h4>\n<p>To create and manage your private workforce, you can use the <a href=\"https:\/\/console.aws.amazon.com\/sagemaker\/groundtruth?region=us-east-1#\/labeling-workforces\" target=\"_blank\" rel=\"noopener noreferrer\">Labeling workforces page<\/a> on the <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> console. 
When following the instructions, you can create a private workforce by entering worker emails or importing a pre-existing workforce from an <a href=\"http:\/\/aws.amazon.com\/cognito\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Cognito<\/a> user pool.<\/p>\n<p>If you already have a work team, you can use the same work team with Amazon A2I and skip to the following section.<\/p>\n<p>To create your private work team, complete the following steps:<\/p>\n<ol>\n<li>Navigate to the <a href=\"https:\/\/console.aws.amazon.com\/sagemaker\/groundtruth?region=us-east-1#\/labeling-workforces\" target=\"_blank\" rel=\"noopener noreferrer\">Labeling workforces page<\/a> on the Amazon SageMaker console.<\/li>\n<li>On the <strong>Private <\/strong>tab, choose <strong>Create private team<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16143\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/12-Privateworkforce.jpg\" alt=\"\" width=\"900\" height=\"263\"><\/p>\n<ol start=\"3\">\n<li>Choose <strong>Invite new workers by email<\/strong>.<\/li>\n<li>For this post, enter your email address to work on your document processing tasks.<\/li>\n<\/ol>\n<p>You can enter a list of up to 50 email addresses, separated by commas, into the <strong>Email addresses<\/strong> box.<\/p>\n<ol start=\"5\">\n<li>Enter an organization name and contact email.<\/li>\n<li>Choose <strong>Create private team<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16144\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/13-Create-private-team.jpg\" alt=\"\" width=\"900\" height=\"857\"><\/p>\n<ol start=\"7\">\n<li>After you create a private team, choose the team to start adding reviewers to your private workforce.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" 
class=\"alignnone size-full wp-image-16145\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/14-Private_workforce_summary.jpg\" alt=\"\" width=\"900\" height=\"535\"><\/p>\n<ol start=\"8\">\n<li>On the <strong>Workers <\/strong>tab, choose <strong>Add workers to team<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16146\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/15-Textract-Comprehend-a2i.jpg\" alt=\"\" width=\"900\" height=\"499\"><\/p>\n<ol start=\"9\">\n<li>Enter the email addresses you want to add and choose <strong>Invite new workers<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16147\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/16-invite-new-workers.jpg\" alt=\"\" width=\"900\" height=\"411\"><\/p>\n<p>After you add the workers (in this case, yourself), you get an email invitation. The following screenshot shows an example email.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16148\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/17-You-are-invited-to-work.jpg\" alt=\"\" width=\"900\" height=\"423\"><\/p>\n<p>After you choose the link and change your password, you\u2019re registered as a verified worker for this team. 
Your one-person team is now ready to review.<\/p>\n<ol start=\"10\">\n<li>Choose the link for <strong>Labeling Portal Sign-in URL<\/strong> and log in using the credentials generated in the previous step.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16149\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/18-private-workforce-summary.jpg\" alt=\"\" width=\"900\" height=\"470\"><\/p>\n<p>You should see a page similar to the following screenshot.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16150\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/19-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"267\"><\/p>\n<p>This is the Amazon A2I worker portal.<\/p>\n<h4>Creating a worker task template<\/h4>\n<p>You can use a worker template to customize the interface and instructions that your workers see when working on your tasks. To create a worker task template, complete the following steps:<\/p>\n<ol>\n<li>Navigate to the <a href=\"https:\/\/console.aws.amazon.com\/a2i\/home?region=us-east-1#\/worker-task-templates\" target=\"_blank\" rel=\"noopener noreferrer\">Worker task templates<\/a> page on the Amazon SageMaker console.<\/li>\n<\/ol>\n<p>For this post, we use Region <code>us-east-1<\/code>. 
For availability details for Amazon A2I and Amazon Comprehend in your preferred Region, see the <a href=\"https:\/\/aws.amazon.com\/about-aws\/global-infrastructure\/regional-product-services\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Region Table<\/a>.<\/p>\n<ol start=\"2\">\n<li>Choose <strong>Create template<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16151\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/20-worker-task-templates.jpg\" alt=\"\" width=\"900\" height=\"291\"><\/p>\n<ol start=\"3\">\n<li>For <strong>Template name<\/strong>, enter <code>custom-entity-review-template<\/code>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16152\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/21-create-template.jpg\" alt=\"\" width=\"900\" height=\"525\"><\/p>\n<ol start=\"4\">\n<li>In the <strong>Template editor<\/strong> field, enter the code from the following <a href=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/Document-Analysis-Solution\/task-template.html.zip\" target=\"_blank\" rel=\"noopener noreferrer\">task-template.html.zip<\/a> file:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">&lt;!-- Copyright Amazon.com, Inc. and its affiliates. All Rights Reserved.\r\nSPDX-License-Identifier: MIT\r\n\r\nLicensed under the MIT License. See the LICENSE accompanying this file\r\nfor the specific language governing permissions and limitations under\r\nthe License. 
--&gt;\r\n\r\n&lt;script src=\"https:\/\/assets.crowd.aws\/crowd-html-elements.js\"&gt;&lt;\/script&gt;\r\n\r\n&lt;crowd-entity-annotation\r\n        name=\"crowd-entity-annotation\"\r\n        header=\"Highlight parts of the text below\"\r\n        labels=\"{{ task.input.labels | to_json | escape }}\"\r\n        initial-value=\"{{ task.input.initialValue }}\"\r\n        text=\"{{ task.input.originalText }}\"\r\n&gt;\r\n    &lt;full-instructions header=\"Named entity recognition instructions\"&gt;\r\n        &lt;ol&gt;\r\n            &lt;li&gt;&lt;strong&gt;Read&lt;\/strong&gt; the text carefully.&lt;\/li&gt;\r\n            &lt;li&gt;&lt;strong&gt;Highlight&lt;\/strong&gt; words, phrases, or sections of the text.&lt;\/li&gt;\r\n            &lt;li&gt;&lt;strong&gt;Choose&lt;\/strong&gt; the label that best matches what you have highlighted.&lt;\/li&gt;\r\n            &lt;li&gt;To &lt;strong&gt;change&lt;\/strong&gt; a label, choose highlighted text and select a new label.&lt;\/li&gt;\r\n            &lt;li&gt;To &lt;strong&gt;remove&lt;\/strong&gt; a label from highlighted text, choose the X next to the abbreviated label name on the highlighted text.&lt;\/li&gt;\r\n            &lt;li&gt;You can select all of a previously highlighted text, but not a portion of it.&lt;\/li&gt;\r\n        &lt;\/ol&gt;\r\n    &lt;\/full-instructions&gt;\r\n\r\n    &lt;short-instructions&gt;\r\n        Highlight the custom entities that went missing.\r\n    &lt;\/short-instructions&gt;\r\n\r\n&lt;\/crowd-entity-annotation&gt;\r\n\r\n&lt;script&gt;\r\n    document.addEventListener('all-crowd-elements-ready', () =&gt; {\r\n        document\r\n            .querySelector('crowd-entity-annotation')\r\n            .shadowRoot\r\n            .querySelector('crowd-form')\r\n            .form;\r\n    });\r\n&lt;\/script&gt;\r\n<\/code><\/pre>\n<\/div>\n<\/div>\n<ol start=\"5\">\n<li>Choose <strong>Create<\/strong>\n<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone 
size-full wp-image-16153\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/22-template-editor.jpg\" alt=\"\" width=\"900\" height=\"716\"><\/p>\n<h4>Creating a human review workflow<\/h4>\n<p>Human review workflows allow human reviewers to audit the custom entities that are detected using Amazon Comprehend on an ongoing basis. To create a human review workflow, complete the following steps:<\/p>\n<ol>\n<li>Navigate to the <a href=\"https:\/\/console.aws.amazon.com\/a2i\/home?region=us-east-1#\/human-review-workflows\" target=\"_blank\" rel=\"noopener noreferrer\">Human review workflow<\/a> page on the Amazon SageMaker console.<\/li>\n<li>Choose <strong>Create human review workflow<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16154\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/23-human-review-workflows.jpg\" alt=\"\" width=\"900\" height=\"291\"><\/p>\n<ol start=\"3\">\n<li>In the <strong>Workflow settings<\/strong> section, for <strong>Name<\/strong>, enter a unique workflow name.<\/li>\n<li>For <strong>S3 bucket<\/strong>, enter the S3 bucket where you want to store the human review results.<\/li>\n<\/ol>\n<p>For this post, we use the same bucket that we created earlier, but add the suffix <code>\/a2i-raw-output<\/code>. For example, if you created a bucket called <code>textract-comprehend-a2i-data<\/code>, enter the path <code>s3:\/\/textract-comprehend-a2i-data\/a2i-raw-output<\/code>. This subfolder contains the edits that the reviewers make in all the human review workflow jobs that are created for Amazon Comprehend custom entity recognition. 
(Replace the bucket name with the value of <span><em>&lt;primary_bucket&gt;<\/em><\/span>.)<\/p>\n<ol start=\"5\">\n<li>For <strong>IAM role<\/strong>, choose <strong>Create a new role<\/strong> from the drop-down menu.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16155\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/24-create-human-review-workflow.jpg\" alt=\"\" width=\"900\" height=\"467\"><\/p>\n<p>Amazon A2I can create a role automatically for you.<\/p>\n<ol start=\"6\">\n<li>For <strong>S3 buckets you specify<\/strong>, select <strong>Specific S3 buckets<\/strong>.<\/li>\n<li>Enter the name of the S3 bucket you created earlier (<span><em>&lt;primary_bucket&gt;<\/em><\/span>).<\/li>\n<li>Choose <strong>Create<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16156\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/25-create-IAM-role.jpg\" alt=\"\" width=\"900\" height=\"428\"><\/p>\n<p>You see a confirmation when role creation is complete and your role is now pre-populated in the <strong>IAM role<\/strong> drop-down menu.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16157\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/26-IAM-role.jpg\" alt=\"\" width=\"900\" height=\"147\"><\/p>\n<ol start=\"9\">\n<li>For <strong>Task type<\/strong>, select <strong>Custom<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16158\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/27-Task-type.jpg\" alt=\"\" width=\"900\" height=\"497\"><\/p>\n<ol start=\"10\">\n<li>In the <strong>Worker task template<\/strong> section, for <strong>Template<\/strong>, choose 
<strong>custom-entity-review-template<\/strong>.<\/li>\n<li>For <strong>Task description<\/strong>, add a description that briefly describes the task for your workers.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16159\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/28-worker-task-template.jpg\" alt=\"\" width=\"900\" height=\"317\"><\/p>\n<ol start=\"12\">\n<li>In the <strong>Workers<\/strong> section, select<\/li>\n<li>For <strong>Private teams<\/strong>, choose <strong>textract-comprehend-a2i-review-team<\/strong>.<\/li>\n<li>Choose <strong>Create<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16160\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/29-workers.jpg\" alt=\"\" width=\"900\" height=\"347\"><\/p>\n<p>You see a confirmation when human review workflow creation is complete.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16161\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/30-human-review-workflow-has.jpg\" alt=\"\" width=\"900\" height=\"347\"><\/p>\n<p>Copy the workflow ARN and save it somewhere. You need this in the upcoming steps. 
You also need to keep the Amazon A2I Worker Portal (created earlier) open and ready after this step.<\/p>\n<h3>Deploying the CloudFormation stack<\/h3>\n<p>Launch the following CloudFormation stack to deploy the stack required for running the entire flow:<\/p>\n<p><a href=\"https:\/\/us-east-1.console.aws.amazon.com\/cloudformation\/home?region=us-east-1#\/stacks\/create\/review?templateURL=https:\/\/s3.amazonaws.com\/aws-ml-blog\/artifacts\/Document-Analysis-Solution\/Textract-Comprehend-A2I-Template.yaml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16174\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/LaunchStack.jpg\" alt=\"\" width=\"144\" height=\"27\"><\/a><\/p>\n<p>This creates the remaining elements for running your human review workflow for the custom entity recognizer. When creating the stack, enter the following values:<\/p>\n<ul>\n<li>\n<strong>CustomEntityRecognizerARN<\/strong> \u2013 The ARN for the custom entity recognizer.<\/li>\n<li>\n<strong>CustomEntityTrainingDatasetS3URI<\/strong> \u2013 The path to the training dataset that you used for creating the custom entity recognizer.<\/li>\n<li>\n<strong>CustomEntityTrainingListS3URI<\/strong> \u2013 The path to the entity list that you used for training the custom entity recognizer.<\/li>\n<li>\n<strong>FlowDefinitionARN<\/strong> \u2013 The ARN of the human review workflow.<\/li>\n<li>\n<strong>S3BucketName<\/strong> \u2013 The name of the bucket you created.<\/li>\n<li>\n<strong>S3ComprehendBucketName<\/strong> \u2013 A random name that must be unique so the template can create an empty S3 bucket to store temporary output from Amazon Comprehend in. 
You don\u2019t need to create this bucket yourself; the CloudFormation template creates it for you, so just provide a unique name here.<\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16162\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/31-Specify-stack-details.jpg\" alt=\"\" width=\"900\" height=\"678\"><\/p>\n<p>Accept the defaults in the stack deployment wizard. On the <strong>Review<\/strong> page, in the <strong>Capabilities and transforms<\/strong> section, select the three check boxes and choose <strong>Create stack<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16163\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/32-capabilities-and-transforms.jpg\" alt=\"\" width=\"900\" height=\"340\"><\/p>\n<p>To confirm that the stack deployed successfully in your account, navigate to the AWS CloudFormation console and look for the stack named <code>TDA<\/code>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16164\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/33-screenshotstacks.jpg\" alt=\"\" width=\"900\" height=\"120\"><\/p>\n<p>When the status of the stack changes to <code>CREATE_COMPLETE<\/code>, you have successfully deployed the document analysis solution to your account.<\/p>\n<h2>Testing the solution<\/h2>\n<p>You can now test the end-to-end flow of this solution. 
To test each component, you complete the following high-level steps:<\/p>\n<ol>\n<li>Upload a file.<\/li>\n<li>Verify the Amazon Comprehend job status.<\/li>\n<li>Review the worker portal.<\/li>\n<li>Verify the changes were recorded.<\/li>\n<\/ol>\n<h3>Uploading a file<\/h3>\n<p>In real-world situations, when businesses receive a physical document, they scan, photocopy, email, or upload it in an image-based format for safekeeping. The following is the <a href=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/Document-Analysis-Solution\/Sample+Doc.jpg\" target=\"_blank\" rel=\"noopener noreferrer\">sample document<\/a> we use in this post.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16165 size-large\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/34-sample-image-707x1024.jpg\" alt=\"\" width=\"707\" height=\"1024\"><\/p>\n<p>To upload the file, complete the following steps:<\/p>\n<ol>\n<li>\n<a href=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/Document-Analysis-Solution\/Sample+Doc.jpg\" target=\"_blank\" rel=\"noopener noreferrer\">Download the image<\/a>.<\/li>\n<li>On the Amazon S3 console, navigate to your <span><em>&lt;primary_bucket&gt;<\/em><\/span>.<\/li>\n<li>Choose <strong>Create folder<\/strong>.<\/li>\n<li>For <strong>Name<\/strong>, enter <code>input<\/code>.<\/li>\n<li>Choose <strong>Save<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16166\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/35-textract-comprehend-a2i.jpg\" alt=\"\" width=\"900\" height=\"616\"><\/p>\n<ol start=\"6\">\n<li>Upload the image you downloaded into this folder.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16167\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/36-Upload.jpg\" alt=\"\" width=\"900\" height=\"574\"><\/p>\n<p>This upload triggers the <code>TextractComprehendA2ILambda<\/code> function, which sends the uploaded image to Amazon Textract and passes the response it receives to Amazon Comprehend.<\/p>\n<h3>Verifying Amazon Comprehend job status<\/h3>\n<p>You can now verify that the Amazon Comprehend job is working.<\/p>\n<ol>\n<li>On the Amazon Comprehend console, choose <strong>Analysis jobs<\/strong>.<\/li>\n<li>Verify that your job has the status <code>In progress<\/code>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16168\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/37-analysis-jobs.jpg\" alt=\"\" width=\"900\" height=\"285\"><\/p>\n<p>When the status switches to <code>Completed<\/code>, you can proceed to the next step.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16169\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/38-analysis-jobs.jpg\" alt=\"\" width=\"900\" height=\"292\"><\/p>\n<h3>Reviewing the worker portal<\/h3>\n<p>You can now test the human review worker portal.<\/p>\n<ol>\n<li>Navigate to the Amazon A2I worker portal that you created.<\/li>\n<\/ol>\n<p>You should have a new job waiting to be processed.<\/p>\n<ol start=\"2\">\n<li>Select the job and choose <strong>Start working<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16170\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/39-start-working.jpg\" alt=\"\" width=\"900\" height=\"349\"><\/p>\n<p>You\u2019re redirected to the review screen.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full 
wp-image-16171\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/40-highlights-parts-of-the-text.jpg\" alt=\"\" width=\"900\" height=\"448\"><\/p>\n<ol start=\"3\">\n<li>Tag any new entities that the algorithm missed.<\/li>\n<li>When you\u2019re finished, choose <strong>Submit<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16172\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/41-highlights-parts-of-the-text2.jpg\" alt=\"\" width=\"900\" height=\"445\"><\/p>\n<h3>Verifying that the changes were recorded<\/h3>\n<p>Now that you have submitted your review in the Amazon A2I worker portal, the <code>HumanWorkflowCompleted<\/code> Lambda function adds the identified entities to the existing entity list and stores the result as a separate entity list in the S3 bucket. You can verify this by navigating to <span><em>&lt;primary_bucket&gt;<\/em><\/span> on the Amazon S3 console.<\/p>\n<p>In the folder <code>comprehend_data<\/code>, you should see a new file called <code>updated_entity_list.csv<\/code>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16173\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/42-screenshot.jpg\" alt=\"\" width=\"900\" height=\"451\"><\/p>\n<p>At the end of each day, the <code>NewEntityCheck<\/code> Lambda function compares this file against the original <code>entity_list.csv<\/code> file. If the <code>updated_entity_list.csv<\/code> file contains new entities, the model is retrained and the retrained model replaces the older custom entity recognition model.<\/p>\n<p>This allows the Amazon Comprehend custom entity recognition model to improve continuously by incorporating the feedback received from human reviewers through Amazon A2I. 
Over time, this can reduce the need for reviewers and manual intervention by analyzing documents in a more intelligent and sophisticated manner.<\/p>\n<h2>Cost<\/h2>\n<p>With this solution, you can now process scanned and physical documents at scale and do ML-powered analysis on them. The cost to run this example is less than $5.00. For more information about exact costs, see <a href=\"https:\/\/aws.amazon.com\/textract\/pricing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Textract pricing<\/a>, <a href=\"https:\/\/aws.amazon.com\/comprehend\/pricing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Comprehend pricing<\/a>, and <a href=\"https:\/\/aws.amazon.com\/augmented-ai\/pricing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon A2I pricing<\/a>.<\/p>\n<h2>Cleaning up<\/h2>\n<p>To avoid incurring future charges, delete the resources when not in use.<\/p>\n<h2>Conclusion<\/h2>\n<p>This post demonstrated how you can build an end-to-end document analysis solution for analyzing scanned images of documents using Amazon Textract, Amazon Comprehend, and Amazon A2I. This allows you to create review workflows for the critical documents you need to analyze using your own private workforce, and provides increased accuracy and context.<\/p>\n<p>This solution also demonstrated how you can improve your Amazon Comprehend custom entity recognizer over time by retraining the models on the newer entities that the reviewers identify.<\/p>\n<p>For the code used in this walkthrough, see the <a href=\"https:\/\/github.com\/aws-samples\/amazon-textract-comprehend-a2i\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>. 
For information about adding another review layer for Amazon Textract using Amazon A2I, see <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/using-amazon-textract-with-amazon-augmented-ai-for-processing-critical-documents\/\" target=\"_blank\" rel=\"noopener noreferrer\">Using Amazon Textract with Amazon Augmented AI for processing critical documents<\/a>.<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-16175 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/22\/Tripathi.jpg\" alt=\"\" width=\"100\" height=\"124\"><strong>Purnesh Tripathi<\/strong> is a Solutions Architect at Amazon Web Services. He has been a data scientist in his previous life, and is passionate about the benefits that Machine Learning and Artificial Intelligence bring to a business. He works with small and medium businesses, and startups in New Zealand to help them innovate faster using 
AWS.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/building-an-end-to-end-intelligent-document-processing-solution-using-aws\/<\/p>\n","protected":false},"author":0,"featured_media":343,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/342"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=342"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/342\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/343"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}