{"id":294,"date":"2020-09-26T00:29:23","date_gmt":"2020-09-26T00:29:23","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/26\/active-learning-workflow-for-amazon-comprehend-custom-classification-models-part-1-2\/"},"modified":"2020-09-26T00:29:23","modified_gmt":"2020-09-26T00:29:23","slug":"active-learning-workflow-for-amazon-comprehend-custom-classification-models-part-1-2","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/26\/active-learning-workflow-for-amazon-comprehend-custom-classification-models-part-1-2\/","title":{"rendered":"Active learning workflow for Amazon Comprehend custom classification models \u2013 Part 1"},"content":{"rendered":"<div id=\"\">\n<p>The <a href=\"https:\/\/aws.amazon.com\/comprehend\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Comprehend<\/a> Custom Classification API enables you to easily build custom text classification models using your business-specific labels without learning ML. For example, your customer support organization can use Custom Classification to automatically categorize inbound requests by problem type based on how the customer has described the issue. You can use custom classifiers to automatically label support emails with appropriate issue types, route customer phone calls to the right agents, and categorize social media posts into user segments.<\/p>\n<p>For custom classification, you start by creating a training job with a ground truth dataset comprising a collection of text and corresponding category labels. Upon completing the job, you have a classifier that can classify any new text into one or more named categories. When the custom classification model classifies a new unlabeled text document, it predicts a label based on what it has learned from the training data. Sometimes you may not have a training dataset with various language patterns, or once you deploy the model, you start seeing completely new data patterns. 
In these cases, the model may not be able to classify these new data patterns accurately. How can we ensure continuous model training to keep it up to date with new data and patterns?<\/p>\n<p>In this two part blog series, we discuss an architecture pattern that allows you to build an active learning workflow for Amazon Comprehend custom classification models. The first post will describe a workflow comprising real-time classification, feedback pipelines and human review workflows using <a href=\"https:\/\/aws.amazon.com\/augmented-ai\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Augmented AI<\/a> (Amazon A2I). The second post will cover the automated model building using the human reviewed data, selecting the best model, and automated deployment of an endpoint of the chosen model.<\/p>\n<p>Feedback loops play a pivotal role in keeping the models up to date. This feedback helps the models learn about their misclassifications and learn the right ones. This process of teaching the models continuously through feedback and deploying them is called <em>active learning<\/em>.<\/p>\n<p>For every prediction Amazon Comprehend Custom Classification makes, it also gives a confidence score associated with its prediction. This architecture proposes that you set an acceptable threshold and only accept the predictions with a confidence score that exceeds the threshold. All the predictions that have a confidence score less than the desired threshold are flagged for human review. The human decides whether to accept the model\u2019s prediction or correct it.<\/p>\n<p>In some instances, the model may be confident about its predictions, but the classification might be wrong. In these scenarios, the end-user applications that receive the model predictions can request explicit feedback from its users on the prediction quality. A human moderator reviews this explicit feedback and reclassifies instances where the feedback was negative. 
This process of generating human-verified data and using it for model retraining helps keep the models up to date, mitigate the effects of data drift, and achieve higher model accuracy.<\/p>\n<h2>Feedback workflow architecture<\/h2>\n<p>In this section, we discuss an architectural pattern for implementing an end-to-end active learning workflow for custom classification models in Amazon Comprehend using Amazon A2I. The active learning workflow comprises the following components:<\/p>\n<ol>\n<li>Real-time classification<\/li>\n<li>Feedback loops<\/li>\n<li>Human classification<\/li>\n<li>Model building<\/li>\n<li>Model selection<\/li>\n<li>Model deployment<\/li>\n<\/ol>\n<p>The following diagram illustrates the part of this architecture that covers the first three components. In the following sections, we walk you through each step in the workflow.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-16272 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/FEEDBACKLOOPS-HUMANCLASSIFICATION15.png\" alt=\"Architecture Diagram for Feedback Loops\" width=\"1599\" height=\"1149\"><\/p>\n<h3>Real-time classification<\/h3>\n<p>To use custom classification in Amazon Comprehend, you need to create a custom classification job that reads a ground truth dataset from an <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) bucket and builds a classification model. After the model builds successfully, you can create an endpoint that allows you to make real-time classifications of unlabeled text. 
This stage is represented by steps 1\u20133 in the preceding architecture:<\/p>\n<ol>\n<li>The end-user application calls an <a href=\"https:\/\/aws.amazon.com\/api-gateway\" target=\"_blank\" rel=\"noopener noreferrer\">API Gateway<\/a> endpoint with a text that needs to be classified.<\/li>\n<li>The API Gateway endpoint then calls an <a href=\"http:\/\/aws.amazon.com\/lambda\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> function configured to call an Amazon Comprehend endpoint.<\/li>\n<li>The Lambda function calls the Amazon Comprehend endpoint, which returns the unlabeled text classification and a confidence score.<\/li>\n<\/ol>\n<h3>Feedback collection<\/h3>\n<p>When the endpoint returns the classification and the confidence score during the real-time classification, you can send instances with low-confidence scores to human review. This type of feedback is called <em>implicit feedback<\/em>.<\/p>\n<ol start=\"4\">\n<li>The Lambda function sends the implicit feedback to an <a href=\"https:\/\/aws.amazon.com\/kinesis\/data-firehose\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Kinesis Data Firehose<\/a>.<\/li>\n<\/ol>\n<p>The other type of feedback is called <em>explicit feedback<\/em> and comes from the application\u2019s end-users that use the custom classification feature. This type of feedback comprises the instances of text where the user wasn\u2019t happy with the prediction. 
Explicit feedback can be sent either in real time through an API or through a batch process.<\/p>\n<ol start=\"5\">\n<li>End-users of the application submit explicit real-time feedback through an API Gateway endpoint.<\/li>\n<li>The Lambda function backing the API endpoint transforms the data into a standard feedback format and writes it to the Kinesis Data Firehose delivery stream.<\/li>\n<li>End-users of the application can also submit explicit feedback as a batch file by uploading it to an S3 bucket.<\/li>\n<li>A trigger configured on the S3 bucket invokes a Lambda function.<\/li>\n<li>The Lambda function transforms the data into a standard feedback format and writes it to the delivery stream.<\/li>\n<li>Both the implicit and explicit feedback data is sent to the delivery stream in a standard format. All this data is buffered and written to an S3 bucket.<\/li>\n<\/ol>\n<h3>Human classification<\/h3>\n<p>The human classification stage includes the following steps:<\/p>\n<ol start=\"11\">\n<li>A trigger configured on the feedback bucket in Step 10 invokes a Lambda function.<\/li>\n<li>The Lambda function creates Amazon A2I human review tasks for all the feedback data received.<\/li>\n<li>Workers assigned to the classification jobs log in to the human review portal and either approve the model\u2019s classification or classify the text with the right labels.<\/li>\n<li>After the human review, all these instances are stored in an S3 bucket and used for retraining the models. Part 2 of this series covers the retraining workflow.<\/li>\n<\/ol>\n<h2>Solution overview<\/h2>\n<p>The next few sections of the post go over how to set up this architecture in your AWS account. We classify news into four categories: World, Sports, Business, and Sci\/Tech, using the <a href=\"https:\/\/registry.opendata.aws\/fast-ai-nlp\/\" target=\"_blank\" rel=\"noopener noreferrer\">AG News dataset<\/a> for custom classification, and set up the implicit and explicit feedback loop. 
You need to complete two manual steps:<\/p>\n<ol>\n<li>Create an Amazon Comprehend custom classifier and an endpoint.<\/li>\n<li>Create an <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> private workforce, worker task template, and human review workflow.<\/li>\n<\/ol>\n<p>After this, you run the provided <a href=\"http:\/\/aws.amazon.com\/cloudformation\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a> template to set up the rest of the architecture.<\/p>\n<h2>Prerequisites<\/h2>\n<p><strong>Before you get started, download the dataset and upload it to Amazon S3.<\/strong> This dataset comprises a collection of news articles and their corresponding category labels. We have created a training dataset called train.csv from the original dataset and made it available for <a href=\"https:\/\/github.com\/aws-samples\/amazon-comprehend-active-learning-framework\/blob\/master\/data\/train.csv\" target=\"_blank\" rel=\"noopener noreferrer\">download<\/a>.<\/p>\n<p>The following screenshot shows a sample of the train.csv file.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16201 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/2-Screenshot-3.jpg\" alt=\"CSV file representing the Training data set\" width=\"900\" height=\"122\"><\/p>\n<p>After you download the train.csv file, upload it to an S3 bucket in your account for reference during training. 
For more information about uploading files, see <a href=\"https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/user-guide\/upload-objects.html\" target=\"_blank\" rel=\"noopener noreferrer\">How do I upload files and folders to an S3 bucket?<\/a><\/p>\n<h2>Creating a custom classifier and an endpoint<\/h2>\n<p>To create your classifier for classifying news, complete the following steps:<\/p>\n<ol>\n<li>On the <a href=\"https:\/\/console.aws.amazon.com\/comprehend\/v2\/home?region=us-east-1#entity-recognition\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Comprehend console<\/a>, choose <strong>Custom Classification<\/strong>.<\/li>\n<li>Choose <strong>Train classifier<\/strong>.<\/li>\n<li>For<strong> Name<\/strong>, enter <code>news-classifier-demo<\/code>.<\/li>\n<li>Select <strong>Using<\/strong> <strong>Multi-class mode<\/strong>.<\/li>\n<li>For <strong>Training data S3 location<\/strong>, enter the path for train.csv in your S3 bucket, for example, <code>s3:\/\/<em>&lt;your-bucketname&gt;<\/em>\/train.csv<\/code>.<\/li>\n<li>For <strong>Output data S3 location<\/strong>, enter the S3 bucket path where you want the output, such as <code>s3:\/\/<em>&lt;your-bucketname&gt;<\/em>\/<\/code>.<\/li>\n<li>For <strong>IAM role<\/strong>, select <strong>Create an IAM role<\/strong>.<\/li>\n<li>For <strong>Permissions to access<\/strong>, choose<strong> Input and output (if specified) S3 bucket<\/strong>.<\/li>\n<li>For<strong> Name suffix<\/strong>, enter <code>ComprehendCustom<\/code>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16202 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/3-Classifier-Mode.jpg\" alt=\"Comprehend Custom Classification Model Creation\" width=\"900\" height=\"899\"><\/p>\n<ol start=\"10\">\n<li>Scroll down and choose <strong>Train Classifier<\/strong> to start the training process.<\/li>\n<\/ol>\n<p>The training takes some time to 
complete. You can either wait for it to finish and then create an endpoint, or come back to this step later after finishing the steps in the section <strong>Creating a private workforce, worker task template, and human review workflow<\/strong>.<\/p>\n<h2>Creating a custom classifier real-time endpoint<\/h2>\n<p>To create your endpoint, complete the following steps:<\/p>\n<ol>\n<li>On the Amazon Comprehend console, choose <strong>Custom Classification<\/strong>.<\/li>\n<li>From the <strong>Classifiers<\/strong> list, select the custom model for which you want to create the endpoint, <code>news-classifier-demo<\/code>.<\/li>\n<li>From the <strong>Actions<\/strong> drop-down menu, choose <strong>Create endpoint<\/strong>.<\/li>\n<li>For <strong>Endpoint name<\/strong>, enter <code>classify-news-endpoint<\/code> and give it one inference unit.<\/li>\n<li>Choose <strong>Create endpoint<\/strong>.<\/li>\n<li>Copy the endpoint ARN as shown in the following screenshot. You use it when running the CloudFormation template in a later step.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16203 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/4-Classify-news-endpoint.jpg\" alt=\"Custom Classification Model Endpoint Page\" width=\"900\" height=\"305\"><\/p>\n<h2>Creating a private workforce, worker task template, and human review workflow<\/h2>\n<p>This section walks you through creating a private workforce in Amazon SageMaker, a worker task template, and your human review workflow.<\/p>\n<h3>Creating <strong>a labeling workforce<\/strong><br \/>\n<\/h3>\n<ol>\n<li>For this post, you create a private work team and add only one user (you) to it. 
For instructions, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sms-workforce-create-private-console.html#create-workforce-sm-console\" target=\"_blank\" rel=\"noopener noreferrer\">Create a Private Workforce (Amazon SageMaker Console)<\/a>.<\/li>\n<li>After the user accepts the invitation, add them to the work team. For instructions, see the <strong>Add a Worker to a Work Team<\/strong> section of <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sms-workforce-management-private-console.html\">Manage a Workforce (Amazon SageMaker Console)<\/a>.<\/li>\n<\/ol>\n<h3><strong>Creating a worker task template<\/strong><\/h3>\n<p>To create a worker task template, complete the following steps:<\/p>\n<ol>\n<li>On the <a href=\"https:\/\/console.aws.amazon.com\/a2i\">Amazon A2I console<\/a>, choose <strong>Worker task templates<\/strong>.<\/li>\n<li>Choose <strong>Create template<\/strong>.<\/li>\n<li>For <strong>Template name<\/strong>, enter <code>custom-classification-template<\/code>.<\/li>\n<li>For <strong>Template type<\/strong>, choose <strong>Custom<\/strong>.<\/li>\n<li>In the <strong>Template editor<\/strong>, paste the <a href=\"https:\/\/github.com\/aws-samples\/amazon-comprehend-active-learning-framework\/blob\/master\/amazon-a2i-task-uis\/custom-classification.html\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub UI template code<\/a>.<\/li>\n<li>Choose <strong>Create<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16204 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/5-template-editor.jpg\" alt=\"Worker Task Template\" width=\"900\" height=\"466\"><\/p>\n<h3><strong>Creating a human review workflow<\/strong><\/h3>\n<p>To create your human review workflow, complete the following steps:<\/p>\n<ol>\n<li>On the <a 
href=\"https:\/\/console.aws.amazon.com\/a2i\">Amazon A2I console<\/a>, choose <strong>Human review workflows<\/strong>.<\/li>\n<li>Choose <strong>Create human review workflow<\/strong>.<\/li>\n<li>For <strong>Name<\/strong>, enter <code>classify-workflow<\/code>.<\/li>\n<li>Specify an <strong>S3 bucket<\/strong> to store output: <code>s3:\/\/<em>&lt;your-bucketname&gt;<\/em>\/<\/code>.<\/li>\n<\/ol>\n<p>Use the same bucket to which you uploaded train.csv in the prerequisite step.<\/p>\n<ol start=\"5\">\n<li>For <strong>IAM role<\/strong>, select <strong>Create a new role<\/strong>.<\/li>\n<li>For <strong>Task type<\/strong>, choose <strong>Custom<\/strong>.<\/li>\n<li>Under <strong>Worker task template creation<\/strong>, select the custom classification template you created.<\/li>\n<li>For <strong>Task description<\/strong>, enter <code>Read the instructions and review the document<\/code>.<\/li>\n<li>Under <strong>Workers<\/strong>, select <strong>Private<\/strong>.<\/li>\n<li>Use the drop-down list to choose the private team that you created.<\/li>\n<li>Choose <strong>Create<\/strong>.<\/li>\n<li>Copy the workflow ARN (see the following screenshot) to use when initializing the CloudFormation parameters.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16205 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/6-How-it-works.jpg\" alt=\"Human Review Workflow Page\" width=\"900\" height=\"228\"><\/p>\n<h2>Deploying the CloudFormation template to set up active learning feedback<\/h2>\n<p>Now that you have completed the manual steps, you can run the CloudFormation template to set up this architecture\u2019s building blocks, including real-time classification, feedback collection, and human classification.<\/p>\n<p>Before deploying the CloudFormation template, make sure you have the following to pass as 
parameters:<\/p>\n<ul>\n<li>Custom classifier endpoint ARN<\/li>\n<li>Amazon A2I workflow ARN<\/li>\n<\/ul>\n<ol>\n<li>Choose <strong>Launch Stack<\/strong>:<\/li>\n<\/ol>\n<p><a href=\"https:\/\/console.aws.amazon.com\/cloudformation\/home?region=us-east-1#\/stacks\/create\/review?stackName=comprehend-active-learning&amp;templateURL=https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/comprehend-a2i-active-learning\/comprehend-active-learning-infra.yml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16216 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/LaunchStack.jpg\" alt=\"\" width=\"107\" height=\"20\"><\/a><\/p>\n<ol start=\"2\">\n<li>Enter the following parameters:\n<ol type=\"a\">\n<li>\n<strong>ComprehendEndpointARN<\/strong> \u2013 The endpoint ARN you copied.<\/li>\n<li>\n<strong>HumanReviewWorkflowARN<\/strong> \u2013 The workflow ARN you copied.<\/li>\n<li>\n<strong>ComrehendClassificationScoreThreshold<\/strong> \u2013 Enter 0.5, which means that predictions with a confidence score below 50% are treated as low confidence.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16206 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/7-Parameters.jpg\" alt=\"CloudFormation Required Parameters\" width=\"900\" height=\"481\"><\/p>\n<ol start=\"3\">\n<li>Choose <strong>Next<\/strong> until you reach the <strong>Capabilities<\/strong> section.<\/li>\n<li>Select the check box to acknowledge that AWS CloudFormation may create <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) resources and expand the template.<\/li>\n<\/ol>\n<p>For more information about these resources, see <a href=\"https:\/\/aws.amazon.com\/iam\/resources\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS IAM 
resources<\/a>.<\/p>\n<ol start=\"5\">\n<li>Choose <strong>Create stack<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16207 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/8-Capabilities.jpg\" alt=\"Acknowledgement section of the CloudFormation Page\" width=\"900\" height=\"219\"><\/p>\n<p>Wait until the status of the stack changes from <code>CREATE_IN_PROGRESS<\/code> to <code>CREATE_COMPLETE<\/code>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-16303 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/24\/image4.png\" alt=\"CloudFormation Outputs\" width=\"1868\" height=\"643\"><\/p>\n<ol start=\"6\">\n<li>On the <strong>Outputs<\/strong> tab of the stack (see the preceding screenshot), copy the values of <code>BatchUploadS3Bucket<\/code>, <code>FeedbackAPIGatewayID<\/code>, and <code>TextClassificationAPIGatewayID<\/code> to interact with the feedback loop.<\/li>\n<li>Both the TextClassificationAPI and the FeedbackAPI require an API key to interact with them. The CloudFormation output <code>ApiGWKey<\/code> refers to the name of the API key. Currently, this API key is associated with a usage plan that allows 2,000 requests per month.<\/li>\n<li>On the <a href=\"https:\/\/console.aws.amazon.com\/apigateway\">API Gateway console<\/a>, choose either the TextClassificationAPI or the FeedbackAPI. Choose <strong>API Keys<\/strong> from the left navigation. Choose your API key from step 7. 
Expand the <strong>API key<\/strong> section in the right pane and copy the value.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-16304 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/24\/image6.png\" alt=\"API Key page\" width=\"1899\" height=\"543\"><\/p>\n<ol start=\"9\">\n<li>You can manage the usage plan by following the instructions in <a href=\"https:\/\/docs.aws.amazon.com\/apigateway\/latest\/developerguide\/api-gateway-create-usage-plans-with-console.html\">Create, configure, and test usage plans with the API Gateway console<\/a>.<\/li>\n<li>You can also add fine-grained authentication and authorization to your APIs. For more information on securing your APIs, see <a href=\"https:\/\/docs.aws.amazon.com\/apigateway\/latest\/developerguide\/apigateway-control-access-to-api.html\">Controlling and managing access to a REST API in API Gateway<\/a>.<\/li>\n<\/ol>\n<h2>Testing the feedback loop<\/h2>\n<p>In this section, we walk you through testing your feedback loop, including real-time classification, implicit and explicit feedback, and human review tasks.<\/p>\n<h3>Real-time classification<\/h3>\n<p>To interact with and test these APIs, you need to download <a href=\"https:\/\/www.postman.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Postman<\/a>.<\/p>\n<p>The API Gateway endpoint receives an unlabeled text document from a client application and internally calls the custom classification endpoint, which returns the predicted label and a confidence score.<\/p>\n<ol>\n<li>Open Postman, select the POST method, and enter the <code>TextClassificationAPIGateway<\/code> URL.<\/li>\n<li>In the Headers section, configure the API key. 
<code>x-api-key: &lt;&lt; Your API key &gt;&gt;<\/code><\/li>\n<li>In the text field, enter the following JSON code (make sure you have <strong>JSON<\/strong> selected and <strong>raw<\/strong> enabled):<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{\"classifier\":\"&lt;your custom classifier name&gt;\", \"sentence\":\"MS Dhoni retires and a billion people had mixed feelings.\"}<\/code><\/pre>\n<\/div>\n<ol start=\"4\">\n<li>Choose <strong>Send<\/strong>.<\/li>\n<\/ol>\n<p>You get a response back with a confidence score and class, as seen in the following screenshot.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16209 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/10-screenshot.jpg\" alt=\"Sample JSON request to the Classify Text API endpoint.\" width=\"900\" height=\"586\"><\/p>\n<h3>Implicit feedback<\/h3>\n<p>When the endpoint returns the classification and the confidence score during the real-time classification, you can route all the instances where the confidence score doesn\u2019t meet the threshold to human review. This type of feedback is called implicit feedback. For this post, we set the threshold to 0.5 through the CloudFormation stack parameter.<\/p>\n<p>You can change this threshold when deploying the CloudFormation template based on your needs.<\/p>\n<h3>Explicit feedback<\/h3>\n<p>The explicit feedback comes from the end-users of the application that uses the custom classification feature. This type of feedback comprises the instances of text where the user wasn\u2019t happy with the prediction. 
You can send explicit feedback on the model\u2019s predicted label through the following methods:<\/p>\n<ul>\n<li>Real time through an API, which is usually triggered through a like\/dislike button on a UI.<\/li>\n<li>Batch process, where a file with a collection of misclassified utterances is put together based on a user survey conducted by the customer outreach team.<\/li>\n<\/ul>\n<h4>Invoking the explicit real-time feedback loop<\/h4>\n<p>To test the Feedback API, complete the following steps:<\/p>\n<ol>\n<li>Open Postman, select the POST method, and enter the <code>FeedbackAPIGatewayID<\/code> value from your CloudFormation stack output.<\/li>\n<li>In the Headers section, configure the API key. <code>x-api-key: &lt;&lt; Your API key &gt;&gt;<\/code>\n<\/li>\n<li>In the text field, enter the following JSON code (for <code>classifier<\/code>, enter the classifier you created, such as <code>news-classifier-demo<\/code>, and make sure you have <strong>JSON<\/strong> selected and <strong>raw<\/strong> enabled):<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{\"classifier\":\"&lt;your custom classifier name&gt;\",\"sentence\":\"Sachin is Indian Cricketer.\"}<\/code><\/pre>\n<\/div>\n<ol start=\"4\">\n<li>Choose <strong>Send<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16210 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/11-Screenshot-1.jpg\" alt=\"Sample JSON request to the Feedback API endpoint.\" width=\"900\" height=\"470\"><\/p>\n<h4>Submitting explicit feedback as a batch file<\/h4>\n<p>Download the following <a href=\"https:\/\/github.com\/aws-samples\/amazon-comprehend-active-learning-framework\/blob\/master\/data\/test-feedback(1).json\" target=\"_blank\" rel=\"noopener noreferrer\">test feedback<\/a> JSON file, populate it with your data, and upload it into the 
<code>BatchUploadS3Bucket<\/code> created when you deployed your CloudFormation template. The following code shows some sample data in the file:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-json\">{\r\n   \"classifier\":\"news-classifier-demo\",\r\n   \"sentences\":[\r\n      \"US music firms take legal action against 754 computer users alleged to illegally swap music online.\",\r\n      \"A gamer spends $26,500 on a virtual island that exists only in a PC role-playing game.\"\r\n   ]\r\n}\r\n<\/code><\/pre>\n<\/div>\n<p>Uploading the file triggers the Lambda function that starts your human review loop.<\/p>\n<h3>Human review tasks<\/h3>\n<p>All the feedback collected through the implicit and explicit methods is sent for human classification. The labeling workforce can include <a href=\"http:\/\/aws.amazon.com\/mturk\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Mechanical Turk<\/a>, private teams, or AWS Marketplace vendors. For this post, we create a private workforce. The URL to the labeling portal is located on the Amazon SageMaker console, on the <strong>Labeling workforces<\/strong> page, on the <strong>Private<\/strong> tab.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16211 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/12-AmazonSageMaker.jpg\" alt=\"Private Workforce section of the SageMaker console.\" width=\"900\" height=\"294\"><\/p>\n<p>After you log in, you can see the human review tasks assigned to you. 
Select the task to complete and choose <strong>Start working<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16212 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/13-Start-working.jpg\" alt=\"Human Review Task Page\" width=\"900\" height=\"181\"><\/p>\n<p>You see the tasks displayed based on the worker task template used when creating the human review workflow.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16213 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/14-Screenshot-1.jpg\" alt=\"Human Review Task \" width=\"900\" height=\"139\"><\/p>\n<p>After you complete the human classification and submit the tasks, the human-reviewed data is stored in the S3 bucket you configured when creating the human review workflow. To find it, go to Amazon SageMaker &gt; Human review workflows &gt; output location:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16214 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/15-Screensot.jpg\" alt=\"Human Review Task Output Location\" width=\"900\" height=\"271\"><\/p>\n<p>This human-reviewed data is used to retrain the custom classification model to learn newer patterns and improve its overall accuracy. The following screenshot shows the human-annotated output file output.json in the S3 bucket:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-16215 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/16-Screenshot.jpg\" alt=\"Human Review Task Output payload\" width=\"900\" height=\"141\"><\/p>\n<p>The process of retraining the models with human-reviewed data, selecting the best model, and automatically deploying the new endpoints completes the active learning workflow. 
We cover these remaining steps in Part 2 of this series.<\/p>\n<h2>Cleaning up<\/h2>\n<p>To remove all resources created throughout this process and prevent additional costs, complete the following steps:<\/p>\n<ol>\n<li>On the Amazon S3 console, delete the S3 bucket that contains the training dataset.<\/li>\n<li>On the Amazon Comprehend console, delete the endpoint and the classifier.<\/li>\n<li>On the Amazon A2I console, delete the human review workflow, worker template, and the private workforce.<\/li>\n<li>On the <strong>AWS CloudFormation<\/strong> console, delete the stack you created. (This removes the resources the CloudFormation template created.)<\/li>\n<\/ol>\n<h2>Conclusion<\/h2>\n<p>Amazon Comprehend helps you build scalable and accurate natural language processing capabilities without any machine learning experience. This post provides a reusable pattern and infrastructure for active learning workflows for custom classification models. The feedback pipelines and human review workflow help the custom classifier learn new data patterns continuously. The second part of this series covers the automatic model building, selection, and deployment of custom classification models.<\/p>\n<p>For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-document-classification.html\" target=\"_blank\" rel=\"noopener noreferrer\">Custom Classification<\/a>. 
You can discover other Amazon Comprehend features and get inspiration from other <a href=\"https:\/\/aws.amazon.com\/blogs\/?awsf.blog-master-artificial-intelligence=category-artificial-intelligence%23amazon-comprehend\" target=\"_blank\" rel=\"noopener noreferrer\">AWS blog posts<\/a> about how to use Amazon Comprehend beyond classification.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-16217 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/Kesharaju.jpg\" alt=\"\" width=\"101\" height=\"116\"> \u00a0Shanthan Kesharaju<\/strong> is a Senior Architect in the AWS ProServe team. He helps our customers with AI\/ML strategy and architecture, and helps them develop products with a purpose. Shanthan has an MBA in Marketing from Duke University and an MS in Management Information Systems from Oklahoma State University.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-16219 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/Mona.jpg\" alt=\"\" width=\"100\" height=\"134\"><strong>Mona Mona<\/strong> is an AI\/ML Specialist Solutions Architect based out of Arlington, VA. She works with the World Wide Public Sector team and helps customers adopt machine learning on a large scale. She is passionate about the NLP and ML explainability areas of AI\/ML.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-16218 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/23\/Lewis.jpg\" alt=\"\" width=\"101\" height=\"129\"><strong>Joyson Neville Lewis<\/strong> obtained his master\u2019s in Information Technology from Rutgers University in 2018. 
He worked as a software\/data engineer before diving into the Conversational AI domain in 2019, where he works with companies to connect the dots between business and AI using voice and chatbot solutions. Joyson joined Amazon Web Services in February of 2018 as a Big Data Consultant for the AWS Professional Services team in NYC.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/active-learning-workflow-for-amazon-comprehend-custom-classification-models-part-1\/<\/p>\n","protected":false},"author":0,"featured_media":295,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/294"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=294"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/294\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/295"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=294"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=294"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=294"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}