{"id":1541,"date":"2022-02-09T18:02:21","date_gmt":"2022-02-09T18:02:21","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/02\/09\/extract-entities-from-insurance-documents-using-amazon-comprehend-named-entity-recognition\/"},"modified":"2022-02-09T18:02:21","modified_gmt":"2022-02-09T18:02:21","slug":"extract-entities-from-insurance-documents-using-amazon-comprehend-named-entity-recognition","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/02\/09\/extract-entities-from-insurance-documents-using-amazon-comprehend-named-entity-recognition\/","title":{"rendered":"Extract entities from insurance documents using Amazon Comprehend named entity recognition"},"content":{"rendered":"<div id=\"\">\n<p>Intelligent document processing (IDP) is a common use case for customers on AWS. You can use\u00a0<a class=\"c-link\" href=\"https:\/\/aws.amazon.com\/comprehend\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-stringify-link=\"https:\/\/aws.amazon.com\/comprehend\/\" data-sk=\"tooltip_parent\" data-remove-tab-index=\"true\">Amazon Comprehend<\/a>\u00a0and\u00a0<a class=\"c-link\" href=\"https:\/\/aws.amazon.com\/textract\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-stringify-link=\"https:\/\/aws.amazon.com\/textract\/\" data-sk=\"tooltip_parent\" data-remove-tab-index=\"true\">Amazon Textract<\/a>\u00a0for a variety of use cases, ranging from document extraction and data classification to entity extraction. One specific industry that uses IDP is insurance, where it automates data extraction for common use cases such as claims intake, policy servicing, quoting, payments, and next best actions. However, in some cases, an office receives a document with complex, label-less information. This is normally difficult for optical character recognition (OCR) software to capture, and identifying relationships and key entities becomes a challenge. 
The solution often requires manual human entry to ensure high accuracy.<\/p>\n<p>In this post, we demonstrate how you can use <a href=\"https:\/\/aws.amazon.com\/about-aws\/whats-new\/2021\/09\/amazon-comprehend-extract-entities-native-format\/\" target=\"_blank\" rel=\"noopener noreferrer\">named entity recognition<\/a> (NER) for documents in their native formats in Amazon Comprehend to address these challenges.<\/p>\n<h2>Solution overview<\/h2>\n<p>In an insurance scenario, an insurer might receive a demand letter from an attorney\u2019s office. The demand letter includes information such as what law office is sending the letter, who their client is, and what actions are required to satisfy their requests, as shown in the following example:<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image001.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32722\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image001.png\" alt=\"\" width=\"923\" height=\"1000\"><\/a><\/p>\n<p>Because this information can be found in varied locations within a demand letter, these documents are often forwarded to an individual adjuster, who takes the time to read through the letter to determine all the necessary information required to proceed with a claim. The document may have multiple names, addresses, and requests that each need to be classified. If the client is mixed up with the beneficiary, or the addresses are switched, delays could add up and negative consequences could impact the company and customers. 
Because the differences between categories like addresses and names are often small, these documents are typically processed by humans rather than through an IDP approach.<\/p>\n<p>The preceding example document has many instances of overlapping entity values (entities that share similar properties but aren\u2019t related). Examples of this are the address of the law office vs. the address of the insurance company, or the names of the different individuals (attorney name, beneficiary, policy holder). Additionally, there is positional information (where the entity is positioned within the document) that a traditional text-only algorithm might miss. Therefore, traditional recognition techniques may not meet requirements.<\/p>\n<p>In this post, we use named entity recognition in Amazon Comprehend to solve these challenges. The benefit of using this method is that the custom entity recognition model uses both the natural language and positional information of the text to accurately extract custom entities that might otherwise be lost when a document is flattened to plain text, as demonstrated in our preceding example of overlapping entity values. For this post, we use an artificially created AWS dataset of legal requisition and demand letters for life insurance, but you can use this approach across any industry and document that may benefit from spatial data in custom NER training. 
The following diagram depicts the solution architecture:<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/Screen-Shot-2022-02-04-at-1.49.29-PM.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-32746 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/Screen-Shot-2022-02-04-at-1.49.29-PM.png\" alt=\"\" width=\"1079\" height=\"449\"><\/a><\/p>\n<p>We implement the solution with the following high-level steps:<\/p>\n<ol>\n<li>Clone the repository containing the sample dataset.<\/li>\n<li>Create an <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) bucket.<\/li>\n<li>Create and train your custom entity recognition model.<\/li>\n<li>Use the model by running an asynchronous batch job.<\/li>\n<\/ol>\n<h2>Prerequisites<\/h2>\n<p>You need to complete the following prerequisites to use this solution:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/pyenv\/pyenv\" target=\"_blank\" rel=\"noopener noreferrer\">Install<\/a> Python 3.8.x.<\/li>\n<li>Make sure you have <a href=\"https:\/\/pypi.org\/project\/pipenv\/\" target=\"_blank\" rel=\"noopener noreferrer\">pip installed<\/a>.<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/userguide\/install-cliv2.html\" target=\"_blank\" rel=\"noopener noreferrer\">Install and configure<\/a> the <a href=\"http:\/\/aws.amazon.com\/cli\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Command Line Interface<\/a> (AWS CLI).<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/userguide\/cli-configure-files.html\" target=\"_blank\" rel=\"noopener noreferrer\">Configure<\/a> your AWS credentials.<\/li>\n<\/ol>\n<h2>Annotate your documents<\/h2>\n<p>To train a custom entity recognition model that can be used on your PDF, Word, and plain text documents, you need to first annotate PDF 
documents using a custom <a href=\"https:\/\/aws.amazon.com\/sagemaker\/groundtruth\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Ground Truth<\/a> annotation template that is provided by Amazon Comprehend. For instructions, see <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/custom-document-annotation-for-extracting-named-entities-in-documents-using-amazon-comprehend\/\" target=\"_blank\" rel=\"noopener noreferrer\">Custom document annotation for extracting named entities in documents using Amazon Comprehend<\/a>.<\/p>\n<p>We recommend a minimum of 250 documents and 100 annotations per entity to ensure good quality predictions. With more training data, you\u2019re more likely to produce a higher-quality model.<\/p>\n<p>When you\u2019ve finished annotating, you can train a custom entity recognition model and use it to extract custom entities from PDF, Word, and plain text documents for batch (asynchronous) processing.<\/p>\n<p><strong>For this post, we have already labeled our sample dataset, and you don\u2019t have to annotate the documents provided<\/strong>. However, if you want to use your own documents or adjust the entities, you have to annotate the documents. 
For instructions, see <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/custom-document-annotation-for-extracting-named-entities-in-documents-using-amazon-comprehend\/\" target=\"_blank\" rel=\"noopener noreferrer\">Custom document annotation for extracting named entities in documents using Amazon Comprehend<\/a>.<\/p>\n<p>We extract the following entities (which are case sensitive):<\/p>\n<ul>\n<li><code>Law Firm<\/code><\/li>\n<li><code>Law Office Address<\/code><\/li>\n<li><code>Insurance Company<\/code><\/li>\n<li><code>Insurance Company Address<\/code><\/li>\n<li><code>Policy Holder Name<\/code><\/li>\n<li><code>Beneficiary Name<\/code><\/li>\n<li><code>Policy Number<\/code><\/li>\n<li><code>Payout<\/code><\/li>\n<li><code>Required Action<\/code><\/li>\n<li><code>Sender<\/code><\/li>\n<\/ul>\n<p>The dataset provided is entirely artificially generated. Any mention of names, places, and incidents are either products of the author\u2019s imagination or are used fictitiously. 
Any resemblance to actual events or locales or persons, living or dead, is entirely coincidental.<\/p>\n<h2>Clone the repository<\/h2>\n<p>Start by cloning the repository by running the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">git clone https:\/\/github.com\/aws-samples\/aws-legal-entity-extraction<\/code><\/pre>\n<\/p><\/div>\n<p>The repository contains the following files:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws-legal-entity-extraction\n\t\/source\n\t\/annotations\n\toutput.manifest\n\tsample.pdf\n\tbucketnamechange.py<\/code><\/pre>\n<\/p><\/div>\n<h2>Create an S3 bucket<\/h2>\n<p>To create an S3 bucket to use for this example, complete the following steps:<\/p>\n<ol>\n<li>On the Amazon S3 console, choose <strong>Buckets<\/strong> in the navigation pane.<\/li>\n<li>Choose <strong>Create bucket<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image005.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-32725 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image005.png\" alt=\"\" width=\"1430\" height=\"402\"><\/a><\/li>\n<li>Note the name of the bucket you just created.<\/li>\n<\/ol>\n<p>To reuse the annotations that we already made for the dataset, we have to modify the <code>output.manifest<\/code> file and reference the bucket we just created.<\/p>\n<ol start=\"4\">\n<li>Modify the file by running the following commands:\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">cd aws-legal-entity-extraction\npython3 bucketnamechange.py\nEnter the name of your bucket: <span>&lt;Enter the name of the bucket you created&gt;<\/span><\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<\/ol>\n<p>When the script is finished running, you receive the following message:<\/p>\n<div class=\"hide-language\">\n<pre><code 
class=\"lang-bash\">The manifest file is updated with the correct bucket<\/code><\/pre>\n<\/p><\/div>\n<p>We can now begin training our model.<\/p>\n<h2>Create and train the model<\/h2>\n<p>To start training your model, complete the following steps:<\/p>\n<ol>\n<li>On the Amazon S3 console, upload the <code>\/source<\/code> folder, <code>\/annotations<\/code> folder, <code>output.manifest<\/code>, and <code>sample.pdf<\/code> files.<\/li>\n<\/ol>\n<p>Your bucket should look similar to the following screenshot.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image007.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32726\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image007.png\" alt=\"\" width=\"1195\" height=\"782\"><\/a><\/p>\n<ol start=\"2\">\n<li>On the Amazon Comprehend console, under <strong>Customization<\/strong> in the navigation pane, choose <strong>Custom entity recognition<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image009.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32727\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image009.png\" alt=\"\" width=\"1844\" height=\"610\"><\/a><\/li>\n<li>Choose <strong>Create new model<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image011.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32728\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image011.png\" alt=\"\" width=\"1844\" height=\"672\"><\/a><\/li>\n<li>For <strong>Model name<\/strong>, enter a 
name.<\/li>\n<li>For <strong>Language<\/strong>, choose <strong>English<\/strong>.<\/li>\n<li>For <strong>Custom entity type<\/strong>, add the following case-sensitive entities:\n<ol type=\"a\">\n<li><code>Law Firm<\/code><\/li>\n<li><code>Law Office Address<\/code><\/li>\n<li><code>Insurance Company<\/code><\/li>\n<li><code>Insurance Company Address<\/code><\/li>\n<li><code>Policy Holder Name<\/code><\/li>\n<li><code>Beneficiary Name<\/code><\/li>\n<li><code>Policy Number<\/code><\/li>\n<li><code>Payout<\/code><\/li>\n<li><code>Required Action<\/code><\/li>\n<li><code>Sender<\/code><\/li>\n<\/ol>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image013.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32729\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image013.png\" alt=\"\" width=\"1562\" height=\"1408\"><\/a><\/p>\n<\/li>\n<li>In <strong>Data specifications<\/strong>, for <strong>Data format<\/strong>, select <strong>Augmented manifest<\/strong> to reference the manifest we created when we annotated the documents.<\/li>\n<li>For <strong>Training model type<\/strong>, select <strong>PDF, Word Documents<\/strong>.<\/li>\n<\/ol>\n<p>This specifies the type of documents you\u2019re using for training and inference.<\/p>\n<ol start=\"9\">\n<li>For <strong>SageMaker Ground Truth augmented manifest file S3 location<\/strong>, enter the location of the <code>output.manifest<\/code> file in your S3 bucket.<\/li>\n<li>For <strong>S3 prefix for Annotation data files<\/strong>, enter the path to the <code>annotations<\/code> folder.<\/li>\n<li>For <strong>S3 prefix for Source documents<\/strong>, enter the path to the <code>source<\/code> folder.<\/li>\n<li>For <strong>Attribute names<\/strong>, enter <code>legal-entity-label-job-labeling-job-20220104T172242<\/code>.<\/li>\n<\/ol>\n<p>The attribute name 
corresponds to the name of the labeling job you create for annotating the documents. For the pre-annotated documents, we use the name <code>legal-entity-label-job-labeling-job-20220104T172242<\/code>. If you choose to annotate your documents, substitute this value with the name of your annotation job.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image017.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32731\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image017.png\" alt=\"\" width=\"1602\" height=\"1080\"><\/a><\/p>\n<ol start=\"13\">\n<li>Create a new <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role and give it permissions to the bucket that contains all your data.<\/li>\n<li>Finish creating the model (select the <strong>Autosplit<\/strong> option for your data source to see similar metrics to those in the following screenshots).<\/li>\n<\/ol>\n<p>Now your recognizer model is visible on the dashboard with the model training status and metrics.<\/p>\n<p><strong>The model may take several minutes to train.<\/strong><br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image020.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32733\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image020.png\" alt=\"\" width=\"1429\" height=\"673\"><\/a><\/p>\n<p>The following screenshot shows your model metrics when the training is complete.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image022.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full 
wp-image-32734\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image022.png\" alt=\"\" width=\"1428\" height=\"773\"><\/a><\/p>\n<h2>Use the custom entity recognition model<\/h2>\n<p>To use the custom entity recognition model trained on PDF documents, we create a batch job to process documents asynchronously.<\/p>\n<ol>\n<li>On the Amazon Comprehend console, choose <strong>Analysis jobs<\/strong>.<\/li>\n<li>Choose <strong>Create job<\/strong>.<\/li>\n<li>Under <strong>Input data<\/strong>, enter the Amazon S3 location of the PDF documents to process (for this post, the <code>sample.pdf<\/code> file).<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image024.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32735\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image024.png\" alt=\"\" width=\"2350\" height=\"489\"><\/a><\/li>\n<li>For <strong>Input format<\/strong>, select <strong>One document per file<\/strong>.<\/li>\n<li>Under <strong>Output Data<\/strong>, enter the Amazon S3 location where you want the results delivered. For this post, we create a new folder called <code>analysis-output<\/code> in the S3 bucket containing all source PDF documents, annotated documents, and the manifest.<\/li>\n<li>Use an IAM role with permissions to access the folder containing <code>sample.pdf<\/code>.<\/li>\n<\/ol>\n<p>You can use the role created earlier.<\/p>\n<ol start=\"7\">\n<li>Choose <strong>Create job<\/strong>.<\/li>\n<\/ol>\n<p>This is an asynchronous job, so it may take a few minutes to complete. When the job is complete, you get a link to the output. 
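<\/p>\n<p>If you prefer to script this step rather than use the console, the same batch job can be submitted through the AWS SDK. The following is a minimal sketch using boto3; the recognizer ARN, bucket paths, role ARN, and job name are placeholder values, and the live call is left commented out so that it only runs once your AWS credentials are configured:<\/p>

```python
# Sketch: submit the Comprehend analysis job with boto3 instead of the console.
# All ARNs, bucket names, and the job name below are placeholders.

def build_detection_job_request(recognizer_arn, input_s3_uri, output_s3_uri, role_arn):
    """Build the parameters for Comprehend's StartEntitiesDetectionJob API."""
    return {
        "JobName": "legal-entity-analysis",
        "EntityRecognizerArn": recognizer_arn,
        "LanguageCode": "en",
        "InputDataConfig": {
            "S3Uri": input_s3_uri,
            # Matches the "One document per file" option on the console
            "InputFormat": "ONE_DOC_PER_FILE",
        },
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
    }

request = build_detection_job_request(
    "arn:aws:comprehend:us-east-1:111122223333:entity-recognizer/legal-entity-recognizer",
    "s3://your-bucket/sample.pdf",
    "s3://your-bucket/analysis-output/",
    "arn:aws:iam::111122223333:role/ComprehendDataAccessRole",
)

# With AWS credentials configured, submit the job and poll its status:
# import boto3
# comprehend = boto3.client("comprehend")
# job = comprehend.start_entities_detection_job(**request)
# comprehend.describe_entities_detection_job(JobId=job["JobId"])
```

<p>The request fields mirror the console choices described above: the input format, the output location, and the data access role.<\/p>\n<p>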
When you open this output, you see a series of files as follows:<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image026.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-32736\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/ML-7485-image026.png\" alt=\"\" width=\"766\" height=\"211\"><\/a><\/p>\n<p>You can open the file <code>sample.pdf.out<\/code> in your preferred text editor. If you search for the <strong>Entities Block<\/strong>, you can find the entities identified in the document. The following table shows an example.<\/p>\n<table border=\"1px\">\n<tbody>\n<tr>\n<td><span><strong>Type<\/strong><\/span><\/td>\n<td><span><strong>Text<\/strong><\/span><\/td>\n<td><span><strong>Score<\/strong><\/span><\/td>\n<\/tr>\n<tr>\n<td>Insurance Company<\/td>\n<td>Budget Mutual Insurance Company<\/td>\n<td>0.999984086<\/td>\n<\/tr>\n<tr>\n<td>Insurance Company Address<\/td>\n<td>9876 Infinity Aven Springfield, MI 65541<\/td>\n<td>0.999982051<\/td>\n<\/tr>\n<tr>\n<td>Law Firm<\/td>\n<td>Bill &amp; Carr<\/td>\n<td>0.99997298<\/td>\n<\/tr>\n<tr>\n<td>Law Office Address<\/td>\n<td>9241 13th Ave SWn Spokane, Washington (WA),99217<\/td>\n<td>0.999274625<\/td>\n<\/tr>\n<tr>\n<td>Beneficiary Name<\/td>\n<td>Laura Mcdaniel<\/td>\n<td>0.999972464<\/td>\n<\/tr>\n<tr>\n<td>Policy Holder Name<\/td>\n<td>Keith Holt<\/td>\n<td>0.999781546<\/td>\n<\/tr>\n<tr>\n<td>Policy Number<\/td>\n<td>(#892877136)<\/td>\n<td>0.999950143<\/td>\n<\/tr>\n<tr>\n<td>Payout<\/td>\n<td>$15,000<\/td>\n<td>0.999980728<\/td>\n<\/tr>\n<tr>\n<td>Sender<\/td>\n<td>Angela Berry<\/td>\n<td>0.999723455<\/td>\n<\/tr>\n<tr>\n<td>Required Action<\/td>\n<td>We are requesting that you forward the full policy amount of Please forward ann acknowledgement of our demand and please forward the umbrella policy information if one isn applicable. 
Please send my secretary any information regarding liens on his policy.<\/td>\n<td>0.999989449<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Expand the solution<\/h2>\n<p>You can choose from many possibilities for what to do with the detected entities, such as the following:<\/p>\n<ul>\n<li>Ingest them into a backend system of record<\/li>\n<li>Create a searchable index based on the extracted entities<\/li>\n<li>Enrich machine learning and analytics using extracted entity values as parameters for model training and inference<\/li>\n<li>Configure back-office flows and triggers based on detected entity values (such as specific law firms or payout amounts)<\/li>\n<\/ul>\n<p>The following diagram depicts these options:<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/legalentitydiagrams-Training1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-32750 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/legalentitydiagrams-Training1.png\" alt=\"\" width=\"1038\" height=\"491\"><\/a><\/p>\n<h2>Conclusion<\/h2>\n<p>Complex document types can often be impediments to full-scale IDP automation. In this post, we demonstrated how you can build and use custom NER models directly from PDF documents. This method is especially powerful for cases where positional information is pertinent (similar entity values and varied document formats). 
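<\/p>\n<p>As a small sketch of the ingestion and trigger options listed earlier, the entities can be pulled out of the analysis output with a few lines of Python. The JSON below is a trimmed stand-in modeled on the table of results; in practice you would load <code>sample.pdf.out<\/code> itself, whose full schema also carries offsets and document metadata:<\/p>

```python
import json

# Trimmed stand-in for the "Entities" block of sample.pdf.out,
# modeled on the results table above.
raw = """
{"Entities": [
  {"Type": "Insurance Company", "Text": "Budget Mutual Insurance Company", "Score": 0.999984},
  {"Type": "Policy Number", "Text": "(#892877136)", "Score": 0.999950},
  {"Type": "Payout", "Text": "$15,000", "Score": 0.999981}
]}
"""

doc = json.loads(raw)

# Keep only high-confidence predictions before ingesting them downstream.
entities = {e["Type"]: e["Text"] for e in doc["Entities"] if e["Score"] >= 0.99}

# Example trigger: flag large payouts for manual review.
payout = float(entities["Payout"].replace("$", "").replace(",", ""))
if payout >= 10_000:
    print(f"Route for review: payout {entities['Payout']}, policy {entities['Policy Number']}")
```

<p>The 0.99 score threshold and the $10,000 routing cutoff are illustrative values; tune both to your own tolerance for false positives.<\/p>\n<p>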
Although we demonstrated this solution by using legal requisition letters in insurance, you can extrapolate this use case across healthcare, manufacturing, retail, financial services, and many other industries.<\/p>\n<p>To learn more about Amazon Comprehend, visit the <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/what-is.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Comprehend Developer Guide<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/01\/1622773633578-1.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-30281 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/01\/1622773633578-1.jpg\" alt=\"\" width=\"100\" height=\"100\"><\/a> Raj Pathak<\/strong> is a Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Document Extraction, Contact Center Transformation and Computer Vision.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/Enzo.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-32741 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/04\/Enzo.png\" alt=\"\" width=\"100\" height=\"133\"><\/a><strong>Enzo Staton <\/strong>is a Solutions Architect with a passion for working with companies to increase their cloud knowledge. 
He works closely with customers around the country as a trusted advisor and industry specialist.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/extract-entities-from-insurance-documents-using-amazon-comprehend-named-entity-recognition\/<\/p>\n","protected":false},"author":0,"featured_media":1542,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1541"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1541"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1541\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1542"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1541"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1541"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1541"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}