{"id":628,"date":"2020-11-25T22:28:32","date_gmt":"2020-11-25T22:28:32","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/11\/25\/announcing-the-launch-of-amazon-comprehend-events\/"},"modified":"2020-11-25T22:28:32","modified_gmt":"2020-11-25T22:28:32","slug":"announcing-the-launch-of-amazon-comprehend-events","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/11\/25\/announcing-the-launch-of-amazon-comprehend-events\/","title":{"rendered":"Announcing the launch of Amazon Comprehend Events"},"content":{"rendered":"<div id=\"\">\n<p>Every day, financial organizations need to analyze news articles, SEC filings, and press releases, as well as track financial events such as bankruptcy announcements, changes in executive leadership at companies, and announcements of mergers and acquisitions. They want to accurately extract the key data points and associations among various people and organizations mentioned within an announcement to update their investment models in a timely manner. Traditional natural language processing services can extract entities such as people, organizations and locations from text, but financial analysts need more. They need to understand how these entities relate to each other in the text.<\/p>\n<p>Today, <a href=\"https:\/\/aws.amazon.com\/comprehend\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Comprehend<\/a> is launching Comprehend Events, a new API for event extraction from natural language text documents. With this launch, you can use Comprehend Events to extract granular details about real-world events and associated entities expressed in unstructured text. This new API allows you to answer who-what-when-where questions over large document sets, at scale and without prior NLP experience.<\/p>\n<p>This post gives an overview of the NLP capabilities that Comprehend Events supports, along with suggestions for processing and analyzing documents with this feature. 
We\u2019ll close with a discussion of several solutions that use Comprehend Events, such as knowledge base population, semantic search, and document triage, all of which can be developed with companion AWS services for storing, visualizing, and analyzing the predictions made by Comprehend Events.<\/p>\n<h2>Comprehend Events overview<\/h2>\n<p>The Comprehend Events API, under the hood, converts unstructured text into structured data that answers who-what-when-where-how questions. Comprehend Events lets you extract the event structure from a document, distilling pages of text down to easily processed data for consumption by your AI applications or graph visualization tools. In the following figure, an Amazon press release announcing the 2017 acquisition of Whole Foods Market, Inc. is rendered as a graph showing the core semantics of the acquisition event, as well as the status of Whole Foods\u2019 CEO post merger.<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18838\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-1.jpg\" alt=\"\" width=\"800\" height=\"580\"><\/p>\n<blockquote>\n<p>Amazon (AMZN) today announced that they will acquire Whole Foods Market (WFM) for $42 per share in an all-cash transaction valued at approximately $13.7 billion, including Whole Foods Market\u2019s net debt. Whole Foods Market will continue to operate stores under the Whole Foods Market brand and source from trusted vendors and partners around the world. 
John Mackey will remain as CEO of Whole Foods Market and Whole Foods Market\u2019s headquarters will stay in Austin, Texas.<\/p>\n<p>From: <a href=\"https:\/\/press.aboutamazon.com\/news-releases\/news-release-details\/amazon-acquire-whole-foods-market\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Press Center Release Archive<\/a><\/p>\n<\/blockquote>\n<p>The Comprehend Events API returns a variety of insights into the event semantics of a document:<\/p>\n<ul>\n<li>\n<strong>Extracted event triggers<\/strong> \u2013 Which events took place. In our example, <code>CORPORATE_ACQUISITION<\/code> and <code>EMPLOYMENT<\/code> events were detected. Although not shown in the preceding figure, the API also returns which words in the text indicate the occurrence of an event; for example, the words \u201cacquire\u201d and \u201ctransaction\u201d in the context of the document indicate that a <code>CORPORATE_ACQUISITION<\/code> took place.<\/li>\n<li>\n<strong>Extracted entity mentions<\/strong> \u2013 Which words in the text indicate which entities are involved in the event, including named entities such as \u201cWhole Foods Market\u201d and common nouns such as \u201ctoday.\u201d The API also returns the type of the entity detected, for example <code>ORGANIZATION<\/code> for \u201cWhole Foods Market.\u201d<\/li>\n<li>\n<strong>Event argument roles (also known as slot filling)<\/strong> \u2013 Which entities play which roles in which events; for example, Amazon is an <code>INVESTOR<\/code> in the acquisition event.<\/li>\n<li>\n<strong>Groups of coreferential event triggers<\/strong> \u2013 Which triggers in the document refer to the same event. For example, the API groups triggers such as \u201ctransaction\u201d and \u201cacquire\u201d around the <code>CORPORATE_ACQUISITION<\/code> event (not shown above).<\/li>\n<li>\n<strong>Groups of coreferential entity mentions<\/strong> \u2013 Which mentions in the document refer to the same entity. 
For example, the API returns the grouping of \u201cAmazon\u201d with \u201cthey\u201d as a single entity (not shown above).<\/li>\n<\/ul>\n<p>At the time of launch, Comprehend Events is available as an asynchronous API supporting extraction of a fixed set of event types in the finance domain. This domain includes a variety of event types (such as <code>CORPORATE_ACQUISITION<\/code> and <code>IPO<\/code>), both standard and novel entity types (such as <code>PER<\/code> and <code>ORG<\/code> vs. <code>STOCK_CODE<\/code> and <code>MONETARY_VALUE<\/code>), and the argument roles that can connect them (such as <code>INVESTOR<\/code>, <code>OFFERING_DATE<\/code>, or <code>EMPLOYER<\/code>). For the complete ontology, see the <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-events.html\" target=\"_blank\" rel=\"noopener noreferrer\">Detect Events API documentation<\/a>.<\/p>\n<p>To demonstrate the functionality of the feature, we\u2019ll show you how to process a small set of sample documents, using both the Amazon Comprehend console and the Python SDK.<\/p>\n<h2>Formatting documents for processing<\/h2>\n<p>The first step is to transform raw documents into a suitable format for processing. Comprehend Events imposes a few requirements on document size and composition:<\/p>\n<ul>\n<li>Individual documents must be UTF-8 encoded and no more than 10 KB in length. As a best practice, we recommend segmenting larger documents at logical boundaries (section headers) or performing sentence segmentation with existing open-source tools.<\/li>\n<li>For best performance, markup (such as HTML), tabular material, and other non-prose spans of text should be removed from documents. The service is intended to process paragraphs of unstructured text.<\/li>\n<li>A single job must not contain more than 50 MB of data. Larger datasets must be divided into smaller sets of documents for parallel processing. 
The different document format modes also impose size restrictions:\n<ul>\n<li>\n<strong>One document per file (ODPF)<\/strong> \u2013 A maximum of 5,000 files in a single <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) location.<\/li>\n<li>\n<strong>One document per line (ODPL)<\/strong> \u2013 A maximum of 5,000 lines in a single text file. Newline characters (<code>\\n<\/code>, <code>\\r<\/code>, <code>\\r\\n<\/code>) should be replaced with other whitespace characters within a given document.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>For this post, we use a set of 117 documents sampled from <a href=\"https:\/\/press.aboutamazon.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon\u2019s Press Center<\/a>: <a href=\"https:\/\/github.com\/aws-samples\/amazon-comprehend-examples\/blob\/master\/amazon_comprehend_events_tutorial\/data\/sample_finance_dataset.txt\" target=\"_blank\" rel=\"noopener noreferrer\">sample_finance_dataset.txt<\/a>. The documents are formatted as a single ODPL text file and already conform to the preceding requirements. 
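If you are preparing your own dataset, the ODPL cleanup described above can be sketched in a few lines of Python. This is a minimal sketch; the `documents` list and output filename are illustrative, not part of the sample dataset:

```python
# Sketch: prepare a one-document-per-line (ODPL) input file for Comprehend Events.
# The `documents` list below is illustrative sample data.
MAX_DOC_BYTES = 10 * 1024  # each document must be no more than 10 KB of UTF-8

documents = [
    "Amazon (AMZN) today announced that they will acquire\nWhole Foods Market (WFM).",
    "Example Corp announced a stock split.\r\nShares begin trading next week.",
]

def to_odpl_line(doc: str) -> str:
    # Replace in-document newlines with spaces so each document occupies one line.
    line = doc.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
    if len(line.encode("utf-8")) > MAX_DOC_BYTES:
        raise ValueError("Document exceeds 10 KB; segment it at logical boundaries.")
    return line

odpl_lines = [to_odpl_line(d) for d in documents]
with open("my_dataset.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(odpl_lines) + "\n")
```

Each cleaned document then occupies exactly one line of the output file, ready for upload to Amazon S3.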
To implement this solution on your own, upload the text file to an S3 bucket in your account before continuing with the following steps.<\/p>\n<h2>Job creation option 1: Using the Amazon Comprehend console<\/h2>\n<p>Creating a new Events analysis job takes only a few minutes.<\/p>\n<ol>\n<li>On the Amazon Comprehend console, choose <strong>Analysis jobs<\/strong>.<\/li>\n<li>Choose <strong>Create job<\/strong>.<\/li>\n<li>For <strong>Name<\/strong>, enter a name (for this post, we use <code>events-test-job<\/code>).<\/li>\n<li>For <strong>Analysis type<\/strong>, choose <strong>Events<\/strong>.<\/li>\n<li>For <strong>Language<\/strong>, choose <strong>English<\/strong>.<\/li>\n<li>For <strong>Target event types<\/strong>, choose your types of events (for example, <strong>Corporate acquisition<\/strong>).<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18839\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-2.jpg\" alt=\"\" width=\"800\" height=\"601\"><\/p>\n<ol start=\"7\">\n<li>In the <strong>Input data <\/strong>section, for <strong>S3 location<\/strong>, enter the location of the sample ODPL file you uploaded earlier.<\/li>\n<li>In the <strong>Output data <\/strong>section, for <strong>S3 location<\/strong>, enter a location for the event output.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18840\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-3.jpg\" alt=\"\" width=\"800\" height=\"573\"><\/p>\n<ol start=\"9\">\n<li>For <strong>IAM role<\/strong>, choose to use an existing <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role or create a new one.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" 
class=\"alignnone size-full wp-image-18841\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-4.jpg\" alt=\"\" width=\"800\" height=\"348\"><\/p>\n<ol start=\"10\">\n<li>Choose <strong>Create job<\/strong>.<\/li>\n<\/ol>\n<p>A new job appears in the <strong>Analysis jobs<\/strong> queue.<\/p>\n<h2>Job creation option 2: Using the SDK<\/h2>\n<p>Alternatively, you can perform these same steps with the <a href=\"https:\/\/aws.amazon.com\/sdk-for-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">Python SDK<\/a>. First, we specify Comprehend Events job parameters, just as we would with any other Amazon Comprehend feature. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">import uuid\r\n\r\nimport boto3\r\n\r\n# Client and session information\r\nsession = boto3.Session()\r\ncomprehend_client = session.client(service_name=\"comprehend\")\r\n\r\n# Constants for S3 bucket and input data file.\r\nbucket = \"comprehend-events-blogpost-us-east-1\"\r\nfilename = 'sample_finance_dataset.txt'\r\ninput_data_s3_path = f's3:\/\/{bucket}\/' + filename\r\noutput_data_s3_path = f's3:\/\/{bucket}\/'\r\n\r\n# IAM role with access to Comprehend and specified S3 buckets\r\njob_data_access_role = 'arn:aws:iam::xxxxxxxxxxxxx:role\/service-role\/AmazonComprehendServiceRole-test-events-role'\r\n\r\n# Other job parameters\r\ninput_data_format = 'ONE_DOC_PER_LINE'\r\njob_uuid = uuid.uuid1()\r\njob_name = f\"events-job-{job_uuid}\"\r\nevent_types = [\"BANKRUPTCY\", \"EMPLOYMENT\", \"CORPORATE_ACQUISITION\", \r\n               \"INVESTMENT_GENERAL\", \"CORPORATE_MERGER\", \"IPO\",\r\n               \"RIGHTS_ISSUE\", \"SECONDARY_OFFERING\", \"SHELF_OFFERING\",\r\n               \"TENDER_OFFERING\", \"STOCK_SPLIT\"]\r\n<\/code><\/pre>\n<\/div>\n<p>Next, we use the <code>start_events_detection_job<\/code> API endpoint to start the analysis of the input 
data file and capture the job ID, which we use later to poll and retrieve results:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\"># Begin the inference job\r\nresponse = comprehend_client.start_events_detection_job(\r\n    InputDataConfig={'S3Uri': input_data_s3_path,\r\n                     'InputFormat': input_data_format},\r\n    OutputDataConfig={'S3Uri': output_data_s3_path},\r\n    DataAccessRoleArn=job_data_access_role,\r\n    JobName=job_name,\r\n    LanguageCode='en',\r\n    TargetEventTypes=event_types\r\n)\r\n\r\n# Get the job ID\r\nevents_job_id = response['JobId']\r\n<\/code><\/pre>\n<\/div>\n<p>An asynchronous Comprehend Events job typically takes a few minutes for a small number of documents and up to several hours for lengthier inference tasks. For our sample dataset, inference should take approximately 20 minutes. It\u2019s helpful to poll the API using the <code>describe_events_detection_job<\/code> endpoint. When the job is complete, the API returns a <code>JobStatus<\/code> of <code>COMPLETED<\/code>. 
See the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">from time import sleep\r\n\r\n# Get current job status\r\njob = comprehend_client.describe_events_detection_job(JobId=events_job_id)\r\n\r\n# Loop until job is completed\r\nwaited = 0\r\ntimeout_minutes = 30\r\nwhile job['EventsDetectionJobProperties']['JobStatus'] != 'COMPLETED':\r\n    sleep(60)\r\n    waited += 60\r\n    assert waited\/\/60 &lt; timeout_minutes, \"Job timed out after %d seconds.\" % waited\r\n    job = comprehend_client.describe_events_detection_job(JobId=events_job_id)\r\n<\/code><\/pre>\n<\/div>\n<p>Finally, we collect the Events inference output from Amazon S3 and convert it to a list of dictionaries, each of which contains the predictions for a given document:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">import json\r\n\r\nimport smart_open\r\n\r\n# The output filename is the input filename + \".out\"\r\noutput_data_s3_file = job['EventsDetectionJobProperties']['OutputDataConfig']['S3Uri'] + filename + '.out'\r\n\r\n# Load the output into a result dictionary\r\nresults = []\r\nwith smart_open.open(output_data_s3_file) as fi:\r\n    results.extend([json.loads(line) for line in fi.readlines() if line])<\/code><\/pre>\n<\/div>\n<h2>The Comprehend Events API output schema<\/h2>\n<p>When complete, the output is written to Amazon S3 in JSON lines format, with each line encoding all the event extraction predictions for a single document. Our output schema includes the following information:<\/p>\n<ul>\n<li>Comprehend Events system output contains separate objects for entities and events, each organized into groups of coreferential objects.<\/li>\n<li>The API output includes the text, character offset, and type of each entity mention and trigger.<\/li>\n<li>Event argument roles are linked to entity groups by an <code>EntityIndex<\/code>.<\/li>\n<li>Confidence scores for classification tasks are given as <code>Score<\/code>. 
Confidence of entity and trigger group membership is given with <code>GroupScore<\/code>.<\/li>\n<li>Two additional fields, <code>File<\/code> and <code>Line<\/code>, are present as well, allowing you to track document provenance.<\/li>\n<\/ul>\n<p>The following Comprehend Events API output schema represents entities as lists of mentions and events as lists of triggers and arguments:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-json\">{ \r\n    \"Entities\": [\r\n        {\r\n            \"Mentions\": [\r\n                {\r\n                    \"BeginOffset\": number,\r\n                    \"EndOffset\": number,\r\n                    \"Score\": number,\r\n                    \"GroupScore\": number,\r\n                    \"Text\": \"string\",\r\n                    \"Type\": \"string\"\r\n                }, ...\r\n            ]\r\n        }, ...\r\n    ],\r\n    \"Events\": [\r\n        {\r\n            \"Type\": \"string\",\r\n            \"Arguments\": [\r\n                {\r\n                    \"EntityIndex\": number,\r\n                    \"Role\": \"string\",\r\n                    \"Score\": number\r\n                }, ...\r\n            ],\r\n            \"Triggers\": [\r\n                {\r\n                    \"BeginOffset\": number,\r\n                    \"EndOffset\": number,\r\n                    \"Score\": number,\r\n                    \"Text\": \"string\",\r\n                    \"GroupScore\": number,\r\n                    \"Type\": \"string\"\r\n                }, ...\r\n            ]\r\n        }, ...\r\n    ],\r\n    \"File\": \"string\",\r\n    \"Line\": \"string\"\r\n}\r\n<\/code><\/pre>\n<\/div>\n<h2>Analyzing Events output<\/h2>\n<p>The API output encodes all the semantic relationships necessary to immediately produce several useful visualizations of any given document. 
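As a minimal sketch of working with this schema, the hand-built `result` dictionary below (illustrative data, not real API output) is flattened into (event type, role, entity, score) rows by resolving each argument's `EntityIndex` against the entity list:

```python
# Sketch: flatten one Comprehend Events result (in the schema above) into
# (event type, argument role, canonical entity text, score) rows.
# `result` is a hand-built illustrative example, not real API output.
result = {
    "Entities": [
        {"Mentions": [{"BeginOffset": 0, "EndOffset": 6, "Score": 0.99,
                       "GroupScore": 1.0, "Text": "Amazon", "Type": "ORGANIZATION"},
                      {"BeginOffset": 33, "EndOffset": 37, "Score": 0.97,
                       "GroupScore": 0.98, "Text": "they", "Type": "ORGANIZATION"}]},
        {"Mentions": [{"BeginOffset": 43, "EndOffset": 61, "Score": 0.99,
                       "GroupScore": 1.0, "Text": "Whole Foods Market",
                       "Type": "ORGANIZATION"}]},
    ],
    "Events": [
        {"Type": "CORPORATE_ACQUISITION",
         "Arguments": [{"EntityIndex": 0, "Role": "INVESTOR", "Score": 0.99},
                       {"EntityIndex": 1, "Role": "INVESTEE", "Score": 0.99}],
         "Triggers": [{"BeginOffset": 27, "EndOffset": 34, "Score": 0.99,
                       "Text": "acquire", "GroupScore": 1.0,
                       "Type": "CORPORATE_ACQUISITION"}]},
    ],
}

def canonical_text(entity):
    # Use the highest-scoring mention as the entity group's canonical name.
    return max(entity["Mentions"], key=lambda m: m["Score"])["Text"]

rows = [
    (event["Type"], arg["Role"],
     canonical_text(result["Entities"][arg["EntityIndex"]]), arg["Score"])
    for event in result["Events"]
    for arg in event["Arguments"]
]
```

Rows in this shape feed directly into the table and graph renderings discussed next.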
We walk through a few such depictions of the data in this section, referring you to the <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> <a href=\"https:\/\/github.com\/aws-samples\/amazon-comprehend-examples\/tree\/master\/amazon_comprehend_events_tutorial\" target=\"_blank\" rel=\"noopener noreferrer\">Jupyter notebook<\/a> accompanying this post for the working Python code necessary to produce them. We use the <a href=\"https:\/\/press.aboutamazon.com\/news-releases\/news-release-details\/amazon-acquire-whole-foods-market\" target=\"_blank\" rel=\"noopener noreferrer\">press release about Amazon\u2019s acquisition of Whole Foods<\/a> mentioned earlier in this post as an example.<\/p>\n<h3>Visualizing entity and trigger spans<\/h3>\n<p>As with any sequence labeling task, one of the simplest visualizations for Comprehend Events output is highlighting triggers and entity mentions, along with their respective tags. For this post, we use <a href=\"https:\/\/spacy.io\/usage\/visualizers\" target=\"_blank\" rel=\"noopener noreferrer\">displaCy<\/a>\u2019s ability to render custom tags. In the following visualization, we see some of the usual range of entity types detected by NER systems (<code>PERSON<\/code>, <code>ORGANIZATION<\/code>), as well as finance-specific ones, such as <code>STOCK_CODE<\/code> and <code>MONETARY_VALUE<\/code>. Comprehend Events detects non-named entities (common nouns and pronouns) as well as named ones. 
In addition to entities, we also see tagged event triggers, such as \u201cmerger\u201d (<code>CORPORATE_MERGER<\/code>) and \u201cacquire\u201d (<code>CORPORATE_ACQUISITION<\/code>).<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18842\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-5.jpg\" alt=\"\" width=\"800\" height=\"376\"><\/p>\n<h3>Graphing event structures<\/h3>\n<p>Highlighting tagged spans is informative because it localizes system predictions about entity and event types in the text. However, it doesn\u2019t show the most informative thing about the output: the predicted argument role associations among events and entities. The following plot depicts the event structure of the document as a semantic graph. In the graph, vertices are entity mentions and triggers; edges are the argument roles held by the entities in relation to the triggers. For simple renderings of a small number of events, we recommend common open-source tools such as <a href=\"https:\/\/networkx.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">networkx<\/a> and <a href=\"https:\/\/pyvis.readthedocs.io\/en\/latest\/\" target=\"_blank\" rel=\"noopener noreferrer\">pyvis<\/a>, which we used to produce this visualization. 
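As a sketch of the graph-building step (the event dictionary and entity names below are illustrative), you can derive the vertex and edge lists that tools like networkx and pyvis consume; with networkx installed, the commented lines would construct the graph itself:

```python
# Sketch: derive vertex and edge lists for a semantic graph of one event.
# The `event` dict and `entity_names` list are illustrative, not real API output.
event = {
    "Type": "CORPORATE_ACQUISITION",
    "Triggers": [{"Text": "acquire", "Type": "CORPORATE_ACQUISITION"}],
    "Arguments": [{"EntityIndex": 0, "Role": "INVESTOR", "Score": 0.99},
                  {"EntityIndex": 1, "Role": "INVESTEE", "Score": 0.99}],
}
entity_names = ["Amazon", "Whole Foods Market"]  # canonical text per entity group

# Vertices are the trigger and entity mentions; edges carry argument roles.
trigger = event["Triggers"][0]["Text"]
nodes = [trigger] + entity_names
edges = [(trigger, entity_names[arg["EntityIndex"]], {"label": arg["Role"]})
         for arg in event["Arguments"]]

# With networkx installed, these feed directly into a graph:
# import networkx as nx
# G = nx.Graph()
# G.add_edges_from(edges)  # edge attributes carry the argument roles
```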
For larger graphs, and graphs of large numbers of documents, we recommend a more robust solution for graph storage, such as <a href=\"https:\/\/aws.amazon.com\/neptune\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Neptune<\/a>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18843\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-6.jpg\" alt=\"\" width=\"800\" height=\"678\"><\/p>\n<h3>Tabulating event structures<\/h3>\n<p>Lastly, you can always render the event structure produced by the API as a flat table, indicating, for example, the argument roles of the various participants in each event, as in the following table. The table demonstrates how Comprehend Events groups entity mentions and triggers into coreferential groups. You can use these textual mention groups to verify and analyze system predictions.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18844\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-7.jpg\" alt=\"\" width=\"800\" height=\"271\"><\/p>\n<h2>Setting up the Comprehend Events AWS CloudFormation stack<\/h2>\n<p>You can quickly try out this example for yourself by deploying our sample code into your own account from the provided <a href=\"http:\/\/aws.amazon.com\/cloudformation\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a> <a href=\"https:\/\/console.aws.amazon.com\/cloudformation\/home?region=us-east-1#\/stacks\/create\/review?stackName=ComprehendEventsBlog&amp;templateURL=https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/textract-comprehend-lex\/template-export.yml\" target=\"_blank\" rel=\"noopener noreferrer\">template<\/a>. 
We\u2019ve included all the necessary steps in a Jupyter notebook, so you can easily walk through creating the preceding visualizations and see how it all works. From there, you can modify it to run over other custom datasets, adapt the results, ingest them into other systems, and build upon the solution. Complete the following steps:<\/p>\n<ol>\n<li>Choose <strong>Launch Stack<\/strong>:<\/li>\n<\/ol>\n<p><a href=\"https:\/\/us-east-1.console.aws.amazon.com\/cloudformation\/home?region=us-east-1#\/stacks\/new?stackName=ComprehendEventsBlog&amp;templateURL=https:\/\/serverless-analytics.s3.amazonaws.com\/comprehend-events-blog\/eventsBlog.yml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16018\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/17\/9-LaunchStack.jpg\" alt=\"\" width=\"141\" height=\"31\"><\/a><\/p>\n<ol start=\"2\">\n<li>After the template loads in the AWS CloudFormation console, choose <strong>Next<\/strong>.<\/li>\n<li>For <strong>Stack name<\/strong>, enter a name for your deployment.<\/li>\n<li>Choose <strong>Next<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18845\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-8.jpg\" alt=\"\" width=\"800\" height=\"393\"><\/p>\n<ol start=\"5\">\n<li>Choose <strong>Next <\/strong>on the following page.<\/li>\n<li>Select the check box acknowledging this template will create IAM resources.<\/li>\n<\/ol>\n<p>This allows the SageMaker notebook instance to talk with Amazon S3 and Amazon Comprehend.<\/p>\n<ol start=\"7\">\n<li>Choose <strong>Create stack<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18846\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-9.jpg\" alt=\"\" width=\"800\" height=\"303\"><\/p>\n<ol start=\"8\">\n<li>When stack creation is complete, browse to your notebook instances on the SageMaker console.<\/li>\n<\/ol>\n<p>A new instance is already loaded with the example data and Jupyter notebook.<\/p>\n<ol start=\"9\">\n<li>Choose <strong>Open Jupyter<\/strong> for the <code>comprehend-events-blog<\/code> notebook.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18847\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-10.jpg\" alt=\"\" width=\"800\" height=\"126\"><\/p>\n<p>The data and notebook were loaded onto the instance through a SageMaker lifecycle configuration.<\/p>\n<ol start=\"10\">\n<li>Choose the <code>notebooks<\/code> folder.<\/li>\n<li>Choose the <code>comprehend_events_finance_tutorial.ipynb<\/code> notebook.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18848\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-11.jpg\" alt=\"\" width=\"612\" height=\"377\"><\/p>\n<ol start=\"12\">\n<li>Step through the notebook to try Comprehend Events out yourself.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18849\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Amazon-Comprehend-Events-12.jpg\" alt=\"\" width=\"800\" height=\"476\"><\/p>\n<h2>Applications using Comprehend Events<\/h2>\n<p>We have demonstrated applying Comprehend Events to a small set of documents and visualizing the event structures found in a sample document. 
The power of Comprehend Events, however, lies in its ability to extract and structure business-relevant facts from large collections of unstructured documents. In this section, we discuss a few potential solutions that you could build on top of the foundation provided by Comprehend Events.<\/p>\n<h3>Knowledge graph construction<\/h3>\n<p>Business and financial services analysts need to visually explore event-based relationships among corporate entities, identifying potential patterns over large collections of data. Without a tool like Comprehend Events, you have to manually identify entities of interest and events in documents and manually enter them into network visualization tools for tracking. Comprehend Events allows you to populate knowledge graphs over large collections of data. You can store and search these graphs in, for example, Neptune, and explore them with network visualization tools, all without expensive manual extraction.<\/p>\n<h3>Semantic search<\/h3>\n<p>Analysts also need to find documents in which actors of interest participate in events of interest (at places, at times). The most common approach to this task involves enterprise search: using complex Boolean queries to find co-occurring strings that typically match your desired search patterns. Natural language is rich and highly variable, however, and even the best searches often miss key details in unstructured text. Comprehend Events allows you to populate a search index with event-argument associations, enriching free text search with extracted event data. 
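As an illustrative sketch of such enrichment (the field names below are hypothetical, not a prescribed index schema), an indexable record might pair the document text with its extracted event-argument tuples:

```python
import json

# Sketch: a search-index record enriched with extracted event data.
# Field names and values are hypothetical; adapt them to your index mapping.
document_text = (
    "Amazon (AMZN) today announced that they will acquire Whole Foods Market (WFM)."
)
event_tuples = [
    {"event_type": "CORPORATE_ACQUISITION", "role": "INVESTOR",
     "entity": "Amazon"},
    {"event_type": "CORPORATE_ACQUISITION", "role": "INVESTEE",
     "entity": "Whole Foods Market"},
]

search_record = {
    "file": "sample_finance_dataset.txt",  # provenance from the File/Line fields
    "line": 0,
    "text": document_text,
    "events": event_tuples,  # enables field-based queries on event-argument tuples
}
payload = json.dumps(search_record)  # body for a search-index request
```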
You can process collections of documents with Comprehend Events, index the documents in <a href=\"https:\/\/aws.amazon.com\/elasticsearch-service\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elasticsearch Service<\/a> (Amazon ES) with the extracted event data, and enable field-based search over event-argument tuples in downstream applications.<\/p>\n<h3>Document triage<\/h3>\n<p>An additional application of Comprehend Events is simple filtering of large text collections for events of interest. This task is typically performed with a tool such as Amazon Comprehend custom classification, but that approach requires hundreds or thousands of annotated training documents to produce a custom model. Comprehend Events allows developers without such training data to process a large collection of documents and detect financial events found in the event taxonomy. You can simply process batches of documents with the asynchronous API and route documents matching pre-defined event patterns to downstream applications.<\/p>\n<h2>Conclusion<\/h2>\n<p>This post has demonstrated the application and utility of Comprehend Events for information processing in the finance domain. This new feature gives you the ability to enrich your applications with close semantic analysis of financial events from unstructured text, all without any NLP model training or tuning. 
For more information, check out our <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-events.html\" target=\"_blank\" rel=\"noopener noreferrer\">documentation<\/a>, or try the preceding walkthrough for yourself in the <a href=\"https:\/\/console.aws.amazon.com\/comprehend\/v2\/home?region=us-east-1#create-analysis-job\" target=\"_blank\" rel=\"noopener noreferrer\">Console<\/a>, in our Jupyter notebook through <a href=\"https:\/\/us-east-1.console.aws.amazon.com\/cloudformation\/home?region=us-east-1#\/stacks\/new?stackName=ComprehendEventsBlog&amp;templateURL=https:\/\/serverless-analytics.s3.amazonaws.com\/comprehend-events-blog\/eventsBlog.yml\" target=\"_blank\" rel=\"noopener noreferrer\">CloudFormation<\/a>, or on <a href=\"https:\/\/github.com\/aws-samples\/amazon-comprehend-examples\/tree\/master\/amazon_comprehend_events_tutorial\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub<\/a>. We\u2019re excited to hear your comments and questions in the comments section!<\/p>\n<p>\u00a0<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-18854 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Graham-Horwood.jpg\" alt=\"\" width=\"100\" height=\"133\"><strong>Graham Horwood<\/strong> is a data scientist at Amazon AI. 
His work focuses on natural language processing technologies for customers in the public and commercial sectors.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-18853 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Ben-Snively.jpg\" alt=\"\" width=\"100\" height=\"133\"><strong>Ben Snively<\/strong>\u00a0is an AWS Public Sector Specialist Solutions Architect.\u00a0He works with government, non-profit, and education customers on big data\/analytical and AI\/ML projects, helping them build solutions using AWS.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-18855 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/11\/24\/Sameer-Karnik.jpg\" alt=\"\" width=\"100\" height=\"130\">Sameer Karnik<\/strong> is a Sr. 
Product Manager leading product for Amazon Comprehend, AWS\u2019s natural language processing service.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/announcing-the-launch-of-amazon-comprehend-events\/<\/p>\n","protected":false},"author":0,"featured_media":629,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/628"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=628"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/628\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/629"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=628"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=628"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=628"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}