{"id":987,"date":"2021-10-02T08:48:41","date_gmt":"2021-10-02T08:48:41","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/10\/02\/translate-and-analyze-text-using-sql-functions-with-amazon-redshift-amazon-translate-and-amazon-comprehend\/"},"modified":"2021-10-02T08:48:41","modified_gmt":"2021-10-02T08:48:41","slug":"translate-and-analyze-text-using-sql-functions-with-amazon-redshift-amazon-translate-and-amazon-comprehend","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/10\/02\/translate-and-analyze-text-using-sql-functions-with-amazon-redshift-amazon-translate-and-amazon-comprehend\/","title":{"rendered":"Translate and analyze text using SQL functions with Amazon Redshift, Amazon Translate, and Amazon Comprehend"},"content":{"rendered":"<div id=\"\">\n<p>You may have tables in your <a href=\"http:\/\/aws.amazon.com\/redshift\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Redshift<\/a> data warehouse or in your <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) data lake full of records containing customer case notes, product reviews, and social media messages, in many languages. Your task is to identify the products that people are talking about, determine if they\u2019re expressing happy thoughts or sad thoughts, translate their comments into a single common language, and create copies of the data for your business analysts with this new information added to each record. Additionally, you need to remove any personally identifiable information (PII), such as names, addresses, and credit card numbers.<\/p>\n<p>You already know how to use Amazon Redshift to transform data using simple SQL commands and built-in functions. 
Now you can also use Amazon Redshift to translate, analyze, and redact text fields, thanks to <a href=\"https:\/\/aws.amazon.com\/translate\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Translate<\/a>, <a href=\"https:\/\/aws.amazon.com\/comprehend\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Comprehend<\/a>, and Amazon Redshift\u2019s support for <a href=\"http:\/\/aws.amazon.com\/lambda\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> user-defined functions (UDFs).<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image001.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-28645 aligncenter\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image001.png\" alt=\"\" width=\"443\" height=\"195\"><\/a><\/p>\n<p>With Amazon Redshift, you can query and combine structured and semi-structured data across your data warehouse, operational database, and data lake using standard SQL. Amazon Comprehend is a natural language processing (NLP) service that makes it easy to uncover insights from text. Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. 
In this post, I show you how you can now use them together to perform the following actions:<\/p>\n<ul>\n<li>Detect and redact PII<\/li>\n<li>Detect and redact entities (such as items, places, or quantities)<\/li>\n<li>Detect the dominant language of a text field<\/li>\n<li>Detect the prevailing sentiment expressed\u2014positive, negative, neither, or both<\/li>\n<li>Translate text from one language to another<\/li>\n<\/ul>\n<p>This post accomplishes the following goals:<\/p>\n<ul>\n<li>Show you how to quickly set up the Amazon Redshift text analytics functions in your own AWS account (it\u2019s fast and easy!)<\/li>\n<li>Briefly explain how the functions work<\/li>\n<li>Discuss performance and cost<\/li>\n<li>Provide a tutorial where we do some text analytics on Amazon product reviews<\/li>\n<li>Describe all the available functions<\/li>\n<\/ul>\n<p>We include a list of all the available functions at the end of the post; the following code shows a few example queries and results:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_sentiment('I am very happy', 'en') as sentiment\nsentiment\nPOSITIVE\n\nSELECT f_detect_pii_entities('I am Bob, I live in Herndon VA, and I love cars', 'en') as pii\npii\n[[\"NAME\",\"Bob\"],[\"ADDRESS\",\"Herndon VA\"]]\n\nSELECT f_redact_pii_entities('I am Bob, I live in Herndon VA, and I love cars', 'en', 'NAME,ADDRESS') as pii_redacted\npii_redacted\nI am [NAME], I live in [ADDRESS], and I love cars\n\nSELECT f_translate_text('It is a beautiful day in the neighborhood', 'auto', 'fr', 'null') as translated_text\ntranslated_text\nC'est une belle journ\u00e9e dans le quartier<\/code><\/pre>\n<\/p><\/div>\n<h2>Prerequisites<\/h2>\n<p>If you\u2019re new to Amazon Redshift, review the <a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/gsg\/getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">Getting Started<\/a> guide to set up your cluster and SQL client.<\/p>\n<h2>Install the text 
analytics UDF<\/h2>\n<p>An Amazon Redshift UDF uses Lambda to implement the function capability. I discuss more details later in this post, but you don\u2019t need to understand the inner workings to use the text analytics UDF, so let\u2019s get started.<\/p>\n<p>Install the prebuilt Lambda function with the following steps:<\/p>\n<ol>\n<li>Navigate to the <a href=\"https:\/\/console.aws.amazon.com\/lambda\/home?region=us-east-1#\/create\/app?applicationId=arn:aws:serverlessrepo:us-east-1:777566285978:applications\/RedshiftTextAnalyticsUDF\" target=\"_blank\" rel=\"noopener noreferrer\">RedshiftTextAnalyticsUDF<\/a> application in the AWS Serverless Application Repository.<\/li>\n<li>In the <strong>Application settings<\/strong> section, keep the settings at their defaults.<\/li>\n<li>Select <strong>I acknowledge that this app creates custom IAM roles<\/strong>.<\/li>\n<li>Choose <strong>Deploy<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image002.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28646\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image002.png\" alt=\"\" width=\"407\" height=\"422\"><\/a><\/li>\n<li>When the application has deployed, choose the application <strong>Deployments<\/strong> tab and then <strong>CloudFormation stack<\/strong>.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image003.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28647\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image003.png\" alt=\"\" width=\"455\" height=\"216\"><\/a><\/li>\n<li>Choose the stack <strong>Outputs <\/strong>tab.<br \/><a 
href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image004.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28648\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image004.png\" alt=\"\" width=\"527\" height=\"243\"><\/a><\/li>\n<li>Select the ARN that is shown as the value of the output labeled <code>RedshiftLambdaInvokeRole<\/code> and copy it to the clipboard.<\/li>\n<li>On the Amazon Redshift console, in the navigation menu, choose <strong>CLUSTERS<\/strong>, then choose the name of the cluster that you want to update.<\/li>\n<li>For <strong>Actions<\/strong>, choose <strong>Manage IAM roles<\/strong>.<\/li>\n<li>Choose <strong>Enter ARN<\/strong> and enter the ARN for the role that you copied earlier.<\/li>\n<li>Choose <strong>Associate IAM role<\/strong> to add it to the list of <strong>Attached IAM roles<\/strong>.<\/li>\n<li>Choose <strong>Save changes<\/strong> to associate the IAM role with the cluster.<\/li>\n<li>Select the SQL code that is shown as the value of the output labeled <code>SQLScriptExternalFunction<\/code> and copy it to the clipboard.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image005.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28649\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image005.png\" alt=\"\" width=\"530\" height=\"368\"><\/a><\/li>\n<li>Paste this SQL into your SQL client, and run it on your Amazon Redshift database as an admin user.<br \/><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image006.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28650\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image006.png\" alt=\"\" width=\"481\" height=\"286\"><\/a><\/li>\n<\/ol>\n<p>And that\u2019s it! Now you have a suite of new Lambda-backed text analytics functions. You\u2019re ready to try some text analytics queries in Amazon Redshift.<\/p>\n<p>If you prefer to build and deploy from the source code instead, see the directions in the <a href=\"https:\/\/github.com\/aws-samples\/aws-redshift-udfs-textanalytics\/blob\/main\/README.md#build-and-install-udf-from-source\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repository README<\/a>.<\/p>\n<h2>Run your first text analytics query<\/h2>\n<p>Enter the following query into the SQL editor:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_sentiment('I am very happy', 'en') as sentiment<\/code><\/pre>\n<\/div>\n<p>You get a simple <code>POSITIVE<\/code> result. Now try again, varying the input text\u2014try something less positive to see how the returned sentiment value changes.<\/p>\n<p>To get the sentiment along with confidence scores for each potential sentiment value, use the following query instead:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_sentiment_all('I am very happy', 'en') as sentiment<\/code><\/pre>\n<\/div>\n<p>Now you get a JSON string containing the sentiment and all the sentiment scores:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-json\">{\"sentiment\":\"POSITIVE\",\"sentimentScore\":{\"positive\":0.999519,\"negative\":7.407639E-5,\"neutral\":2.7478999E-4,\"mixed\":1.3210243E-4}}<\/code><\/pre>\n<\/div>\n<p>You can use the built-in support in Amazon Redshift for semi-structured data on this result to extract the fields for further analysis. 
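<\/p>\n<p>For example, the following quick sketch (the alias <code>positive_score<\/code> is illustrative) parses the result into a SUPER value and extracts individual scores with dot notation:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SET enable_case_sensitive_identifier TO TRUE;\n\nSELECT s.\"sentiment\" AS sentiment,\n       s.\"sentimentScore\".\"positive\" AS positive_score\nFROM (SELECT JSON_PARSE(f_detect_sentiment_all('I am very happy', 'en')) AS s) t;<\/code><\/pre>\n<\/div>\n<p>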
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/super-overview.html\" target=\"_blank\" rel=\"noopener noreferrer\">Ingesting and querying semistructured data in Amazon Redshift<\/a>. I show you examples later in this post.<\/p>\n<h2>How the UDF works<\/h2>\n<p>For more information about the Amazon Redshift UDF framework, see <a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/udf-creating-a-lambda-sql-udf.html\" target=\"_blank\" rel=\"noopener noreferrer\">Creating a scalar Lambda UDF<\/a>.<\/p>\n<p>The Java class <a href=\"https:\/\/github.com\/aws-samples\/aws-redshift-udfs-textanalytics\/blob\/main\/src\/main\/java\/com\/amazonaws\/redshift\/udf\/textanalytics\/TextAnalyticsUDFHandler.java\" target=\"_blank\" rel=\"noopener noreferrer\">TextAnalyticsUDFHandler<\/a> implements the core logic for each of our UDF Lambda function handlers. Each text analytics function has a corresponding public method in this class.<\/p>\n<p>Amazon Redshift invokes our UDF Lambda function with batches of input records. The <code>TextAnalyticsUDFHandler<\/code> subdivides these batches into smaller batches of up to 25 rows to take advantage of the Amazon Comprehend synchronous multi-document batch APIs where they are available (for example, for detecting language, entities, and sentiment). When no synchronous multi-document API is available (such as for <code>DetectPiiEntity<\/code> and <code>TranslateText<\/code>), we use the single-document API instead.<\/p>\n<p>Amazon Comprehend API <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/guidelines-and-limits.html\" target=\"_blank\" rel=\"noopener noreferrer\">service quotas<\/a> provide guardrails to limit your cost exposure from unintentional high usage (we discuss this more in the following section). By default, the multi-document batch APIs process up to 250 records per second, and the single-document APIs process up to 20 records per second. 
Our UDFs use exponential backoff and retry to throttle the request rate to stay within these limits. You can request increases to the transactions per second quota for APIs using the <a href=\"https:\/\/console.aws.amazon.com\/servicequotas\/home?region=us-east-1#!\/template\" target=\"_blank\" rel=\"noopener noreferrer\">Quota Request Template<\/a> on the <a href=\"http:\/\/aws.amazon.com\/console\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Management Console<\/a>.<\/p>\n<p>Amazon Comprehend and Amazon Translate each enforce a maximum input string length of 5,000 UTF-8 bytes. Text fields that are longer than 5,000 UTF-8 bytes are truncated to 5,000 bytes for language and sentiment detection, and split on sentence boundaries into multiple text blocks of under 5,000 bytes for translation and entity or PII detection and redaction. The results are then combined.<\/p>\n<h2>Optimize cost<\/h2>\n<p>In addition to Amazon Redshift costs, the text analytics UDFs incur usage costs from Lambda, Amazon Comprehend, and Amazon Translate. The amount you pay depends on the total number of records and characters that you process with the UDFs. For more information, see <a href=\"https:\/\/aws.amazon.com\/lambda\/pricing\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda pricing<\/a>, <a href=\"https:\/\/aws.amazon.com\/comprehend\/pricing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Comprehend pricing<\/a>, and <a href=\"https:\/\/aws.amazon.com\/translate\/pricing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Translate pricing<\/a>.<\/p>\n<p>To minimize the costs, I recommend that you avoid processing the same records multiple times. Instead, materialize the results of the text analytics UDF in a table that you can then cost-effectively query as often as needed without incurring additional UDF charges. 
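<\/p>\n<p>As an illustration (the <code>tweets<\/code> table, its columns, and the <code>tweet_sentiment<\/code> target table are hypothetical), you might materialize sentiment once, then enrich only records that haven\u2019t been processed yet:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">-- One-time materialization of the UDF results\nCREATE TABLE tweet_sentiment AS\nSELECT tweet_id, tweet_text,\n       f_detect_sentiment(tweet_text, 'en') AS sentiment\nFROM tweets;\n\n-- Incremental enrichment: analyze only records not yet in the target table\nINSERT INTO tweet_sentiment\nSELECT t.tweet_id, t.tweet_text,\n       f_detect_sentiment(t.tweet_text, 'en')\nFROM tweets t\nLEFT JOIN tweet_sentiment s ON t.tweet_id = s.tweet_id\nWHERE s.tweet_id IS NULL;<\/code><\/pre>\n<\/div>\n<p>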
Process newly arriving records incrementally using INSERT INTO\u2026SELECT queries to analyze and enrich only the new records and add them to the target table.<\/p>\n<p>Avoid calling the text analytics functions needlessly on records that you will subsequently discard. Write your queries to filter the dataset first using temporary tables, views, or nested queries, and then apply the text analytics functions to the resulting filtered records.<\/p>\n<p>Always assess the potential cost before you run text analytics queries on tables with very large numbers of records.<\/p>\n<p>In this section, we provide two example cost assessments.<\/p>\n<h3>Example 1: Analyze the language and sentiment of tweets<\/h3>\n<p>Let\u2019s assume you have 10,000 tweet records, with average length 100 characters per tweet. Your SQL query detects the dominant language and sentiment for each tweet. You\u2019re in your second year of service (the Free Tier no longer applies). The cost details are as follows:<\/p>\n<ul>\n<li>Size of each tweet = 100 characters<\/li>\n<li>Number of units (100 characters each) per record (minimum is 3 units) = 3<\/li>\n<li>Total units = 10,000 (records) x 3 (units per record) x 2 (Amazon Comprehend requests per record) = 60,000<\/li>\n<li>Price per unit = $0.0001<\/li>\n<li>Total cost for Amazon Comprehend = [number of units] x [cost per unit] = 60,000 x $0.0001 = $6.00<\/li>\n<\/ul>\n<h3>Example 2: Translate tweets<\/h3>\n<p>Let\u2019s assume that 2,000 of your tweets aren\u2019t in your local language, so you run a second SQL query to translate them. 
The cost details are as follows:<\/p>\n<ul>\n<li>Size of each tweet = 100 characters<\/li>\n<li>Total characters = 2,000 (records) x 100 (characters per record) x 1 (Translate requests per record) = 200,000<\/li>\n<li>Price per character = $0.000015<\/li>\n<li>Total cost for Amazon Translate = [number of characters] x [cost per character] = 200,000 x $0.000015 = $3.00<\/li>\n<\/ul>\n<h2>Analyze insights from customer reviews<\/h2>\n<p>It\u2019s time to put our new text analytics queries to use.<\/p>\n<p>For a tutorial on using Amazon Comprehend to get actionable insights from customer reviews, see <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/tutorial-reviews.html\" target=\"_blank\" rel=\"noopener noreferrer\">Tutorial: Analyzing Insights from Customer Reviews with Amazon Comprehend<\/a>. In this post, I provide an alternate approach using SQL queries powered by Amazon Redshift and Amazon Comprehend.<\/p>\n<p>The tutorial takes approximately 15 minutes to complete, and costs up to $1.40 for Amazon Comprehend and Amazon Translate\u2014there is no cost if you\u2019re eligible for the Free Tier.<\/p>\n<h3>Configure Amazon Redshift Spectrum and create external schema<\/h3>\n<p>In this section, you create an <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role, associate the role with your cluster, and create an external schema. 
Skip this section if you have previously configured <a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/c-getting-started-using-spectrum.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Redshift Spectrum<\/a> on your Amazon Redshift cluster.<\/p>\n<ol>\n<li>On the IAM console, in the navigation pane, choose <strong>Roles.<\/strong><\/li>\n<li>Choose <strong>Create role<\/strong>.<\/li>\n<li>Choose <strong>AWS service<\/strong>, then choose <strong>Redshift<\/strong>.<\/li>\n<li>Under <strong>Select your use case<\/strong>, choose <strong>Redshift \u2013 Customizable<\/strong>, then choose <strong>Next: Permissions<\/strong>.<\/li>\n<li>On the <strong>Attach permissions<\/strong> policy page, choose the policies <code>AmazonS3ReadOnlyAccess<\/code>, <code>AWSGlueConsoleFullAccess<\/code>, and <code>AmazonAthenaFullAccess<\/code>.<\/li>\n<li>Choose <strong>Next: Review<\/strong>.<\/li>\n<li>For <strong>Role name<\/strong>, enter a name for your role, for example <code>mySpectrumRole<\/code>.<\/li>\n<li>Review the information, then choose <strong>Create role<\/strong>.<\/li>\n<li>In the navigation pane, choose <strong>Roles<\/strong>.<\/li>\n<li>Choose the name of your new role to view the summary, then copy the <strong>Role ARN<\/strong> to your clipboard.<\/li>\n<\/ol>\n<p>This value is the Amazon Resource Name (ARN) for the role that you just created. 
You use that value when you create external tables to reference your data files on Amazon S3.<\/p>\n<ol start=\"11\">\n<li>On the Amazon Redshift console, in the navigation menu, choose <strong>CLUSTERS<\/strong>, then choose the name of the cluster that you want to update.<\/li>\n<li>For <strong>Actions<\/strong>, choose <strong>Manage IAM roles<\/strong>.<\/li>\n<\/ol>\n<p>The <strong>IAM roles<\/strong> page appears.<\/p>\n<ol start=\"13\">\n<li>Choose <strong>Enter ARN<\/strong> and enter the ARN for the role that you copied earlier.<\/li>\n<li>Choose <strong>Add IAM role<\/strong> to add it to the list of <strong>Attached IAM roles<\/strong>.<\/li>\n<li>Choose <strong>Done<\/strong> to associate the IAM role with the cluster.<\/li>\n<\/ol>\n<p>The cluster is modified to complete the change.<\/p>\n<ol start=\"16\">\n<li>To create an external schema called <code>spectrum<\/code>, replace the IAM role ARN in the following command with the role ARN you created. Then run the following SQL statement on your Amazon Redshift cluster using your SQL client:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">create external schema spectrum\nfrom data catalog\ndatabase 'spectrum'\niam_role 'arn:aws:iam::NNNNNNNNNNN:role\/mySpectrumRole'\ncreate external database if not exists;<\/code><\/pre>\n<\/div>\n<h3>Configure Redshift Spectrum access to the Amazon product reviews dataset<\/h3>\n<p>We use the <a href=\"https:\/\/s3.amazonaws.com\/amazon-reviews-pds\/readme.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Customer Reviews Dataset<\/a>, conveniently hosted for public access in Amazon S3.<\/p>\n<ol>\n<li>Create an external table by running the following SQL statement on your Amazon Redshift cluster:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">        CREATE EXTERNAL TABLE spectrum.amazon_reviews_parquet(\n          
customer_id VARCHAR, \n          review_id VARCHAR, \n          product_id VARCHAR, \n          product_parent VARCHAR, \n          product_title VARCHAR, \n          star_rating int, \n          helpful_votes int, \n          total_votes int, \n          vine VARCHAR, \n          verified_purchase VARCHAR, \n          review_headline VARCHAR(max), \n          review_body VARCHAR(max), \n          review_date bigint, \n          year int)\n        PARTITIONED BY (product_category VARCHAR)\n        ROW FORMAT SERDE \n          'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' \n        STORED AS INPUTFORMAT \n          'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' \n        OUTPUTFORMAT \n          'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\n        LOCATION\n          's3:\/\/amazon-reviews-pds\/parquet\/'\n<\/code><\/pre>\n<\/p><\/div>\n<ol start=\"2\">\n<li>Load a table partition for one <code>product_category<\/code>:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">ALTER TABLE spectrum.amazon_reviews_parquet\nADD IF NOT EXISTS PARTITION(product_category='Video_DVD')\nLOCATION 's3:\/\/amazon-reviews-pds\/parquet\/product_category=Video_DVD';<\/code><\/pre>\n<\/p><\/div>\n<ol start=\"3\">\n<li>In your Amazon Redshift SQL client, run the following query to copy video and DVD reviews from the UK in the year 2000 (628 reviews) to an Amazon Redshift internal table:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">CREATE TABLE amazon_reviews_enriched AS\nSELECT *\nFROM spectrum.amazon_reviews_parquet\nWHERE marketplace = 'UK' and year = 2000<\/code><\/pre>\n<\/p><\/div>\n<ol start=\"4\">\n<li>Run the following query to assess the average review length:<\/li>\n<\/ol>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT AVG(LENGTH(review_body)) AS average_review_length FROM amazon_reviews_enriched<\/code><\/pre>\n<\/p><\/div>\n<p>The average review 
length is around 627 characters. This equates to 7 Amazon Comprehend units per record (1 unit = 100 characters).<\/p>\n<h3>Detect the language for each review<\/h3>\n<p>To detect the language of each review, run the following query in the Amazon Redshift query editor\u2014it takes about 10 seconds to run and costs $0.44:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">ALTER TABLE amazon_reviews_enriched ADD COLUMN language VARCHAR(8);\n\nUPDATE amazon_reviews_enriched\nSET language = f_detect_dominant_language(review_body);<\/code><\/pre>\n<\/div>\n<p>The first statement creates a new column, <code>language<\/code>. The second statement populates it with the results of the new UDF, <code>f_detect_dominant_language()<\/code>.<\/p>\n<p>Cost is calculated as: 628 (records) x 7 (units per record) x 1 (requests per record) x $0.0001 (Amazon Comprehend price per unit) = $0.44.<\/p>\n<p>Run the following query to see the detected language codes, with the corresponding count of reviews for each language:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT language, count(*) AS count FROM amazon_reviews_enriched GROUP BY language ORDER BY count DESC<\/code><\/pre>\n<\/div>\n<p>Seven of the reviews have been written in German (<code>de<\/code>).<\/p>\n<h3>Translate all reviews into one language<\/h3>\n<p>Our analysis will be easier if the reviews are all normalized into a common language. Run the following SQL to create and populate a new column with the English version of all reviews. It takes around 7 seconds to run, and costs $0.07.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">ALTER TABLE amazon_reviews_enriched ADD COLUMN review_body_en VARCHAR(max);\n\nUPDATE amazon_reviews_enriched\nSET review_body_en = f_translate_text(review_body, language, 'en', 'null');<\/code><\/pre>\n<\/div>\n<p>The first statement creates a new column, <code>review_body_en<\/code>. 
The second statement populates it with the results of the new UDF, <code>f_translate_text()<\/code>.<\/p>\n<p>Cost is calculated as: 7 (non-English records) x 627 (characters per record) x 1 (requests per record) x $0.000015 (Amazon Translate price per character) = $0.07.<\/p>\n<p>Run the following query to see a few of the reviews translated from the original language to English:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT language, review_body, review_body_en FROM amazon_reviews_enriched\nWHERE language &lt;&gt; 'en'\nLIMIT 5<\/code><\/pre>\n<\/p><\/div>\n<h3><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image007.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28651\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image007.png\" alt=\"\" width=\"634\" height=\"79\"><\/a><\/h3>\n<h3>Detect sentiment and entities for each review<\/h3>\n<p>To detect sentiment, run the following SQL statements\u2014they use two text analytics functions, take around 25 seconds to run, and cost $0.88:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">ALTER TABLE amazon_reviews_enriched ADD COLUMN sentiment SUPER;\n\nALTER TABLE amazon_reviews_enriched ADD COLUMN entities SUPER;\n\nUPDATE amazon_reviews_enriched \nSET  sentiment = JSON_PARSE(f_detect_sentiment_all(review_body_en, 'en')), \n     entities = JSON_PARSE(f_detect_entities_all(review_body_en, 'en'));<\/code><\/pre>\n<\/p><\/div>\n<p>We add two additional columns, <code>sentiment<\/code> and <code>entities<\/code>, each using the Amazon Redshift semi-structured data type SUPER. 
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/super-overview.html\" target=\"_blank\" rel=\"noopener noreferrer\">Ingesting and querying semistructured data in Amazon Redshift<\/a>.<\/p>\n<p>The UPDATE query passes the English translation of each review to the new UDF functions <code>f_detect_sentiment_all()<\/code> and <code>f_detect_entities_all()<\/code>. These functions return JSON strings, which the query parses and stores in the new columns.<\/p>\n<p>Cost is calculated as: 628 (records) x 7 (units per record) x 2 (requests per record) x $0.0001 (Amazon Comprehend price per unit) = $0.88.<\/p>\n<p>Inspect some of the values for the new <code>sentiment<\/code> and <code>entities<\/code> columns:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT sentiment, entities FROM amazon_reviews_enriched LIMIT 5<\/code><\/pre>\n<\/p><\/div>\n<p>As expected, they contain nested structures and fields containing the results from Amazon Comprehend.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image008.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28652\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image008.png\" alt=\"\" width=\"637\" height=\"82\"><\/a><\/p>\n<p>Next, let\u2019s use the support in Amazon Redshift for semi-structured data to prepare these columns for analysis.<\/p>\n<h3>Prepare sentiment for analysis<\/h3>\n<p>Run the following SQL query to create a new table containing sentiment and sentiment scores expanded into separate columns:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SET enable_case_sensitive_identifier to TRUE;\n\nCREATE TABLE sentiment_results_final AS\nSELECT \n   review_date, year, product_title, star_rating, language, \n   sentiment.\"sentiment\" AS sentiment,\n   
sentiment.\"sentimentScore\".\"positive\" AS positive_score,\n   sentiment.\"sentimentScore\".\"negative\" AS negative_score,\n   sentiment.\"sentimentScore\".\"neutral\" AS neutral_score,\n   sentiment.\"sentimentScore\".\"mixed\" AS mixed_score,\n   review_headline, review_body_en\nFROM amazon_reviews_enriched<\/code><\/pre>\n<\/p><\/div>\n<p>Preview the new <code>sentiment_results_final<\/code> table. Does the sentiment generally align with the text of the <code>review_body<\/code> field? How does it correlate with the <code>star_rating<\/code>? If you spot any dubious sentiment assignments, check the confidence scores to see if the sentiment was assigned with a low confidence.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT * FROM sentiment_results_final WHERE star_rating &lt;= 2 LIMIT 10<\/code><\/pre>\n<\/p><\/div>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image009.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28653\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image009.png\" alt=\"\" width=\"636\" height=\"138\"><\/a><\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT * FROM sentiment_results_final WHERE star_rating &gt;= 4 LIMIT 10<\/code><\/pre>\n<\/p><\/div>\n<h3><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image010.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28654\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image010.png\" alt=\"\" width=\"637\" height=\"140\"><\/a><\/h3>\n<h3>Prepare entities for analysis<\/h3>\n<p>Run the following Amazon Redshift SQL query to create a new table containing detected entities unnested into separate rows, with each field 
in a separate column:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">CREATE TABLE entities_results_final AS\nSELECT \n   r.review_date, r.year, r.product_title, r.star_rating, r.language, \n   e.\"text\" AS entity,\n   e.\"type\" category,\n   e.\"score\" AS score,\n   e.\"beginOffset\" AS beginoffset,\n   e.\"endOffset\" AS endoffset,\n   r.review_headline, r.review_body_en\nFROM amazon_reviews_enriched r, r.entities e<\/code><\/pre>\n<\/p><\/div>\n<p>Preview the contents of the new table, <code>entities_results_final<\/code>:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT product_title, entity, category, score, beginoffset, endoffset, review_body_en \nFROM entities_results_final ORDER BY product_title, beginoffset \nLIMIT 20<\/code><\/pre>\n<\/p><\/div>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image011.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28655\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image011.png\" alt=\"\" width=\"637\" height=\"143\"><\/a><\/p>\n<h3>Visualize in Amazon QuickSight (optional)<\/h3>\n<p>As an optional step, you can visualize your results with <a href=\"https:\/\/aws.amazon.com\/quicksight\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon QuickSight<\/a>. For instructions, see <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/tutorial-reviews-visualize.html\" target=\"_blank\" rel=\"noopener noreferrer\">Step 5: Visualizing Amazon Comprehend Output in Amazon QuickSight<\/a>.<\/p>\n<p>You can use the new word cloud visual type for entities, instead of tree map. 
In the word cloud chart menu, select <strong>Hide \u201cother\u201d categories<\/strong>.<\/p>\n<p>You now have a dashboard with sentiment and entities visualizations that looks similar to the following screenshot.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image012.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-28656\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/09\/28\/ML-5741-image012.png\" alt=\"\" width=\"1153\" height=\"402\"><\/a><\/p>\n<h2>Troubleshooting<\/h2>\n<p>If your query fails, check the <a href=\"http:\/\/aws.amazon.com\/cloudwatch\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch<\/a> metrics and logs generated by the UDF Lambda function.<\/p>\n<ol>\n<li>On the Lambda console, find the desired <code><a href=\"https:\/\/console.aws.amazon.com\/lambda\/home?region=us-east-1#\/functions\/RedshiftTextAnalyticsUDF-DetectSentimentAll\" target=\"_blank\" rel=\"noopener noreferrer\">RedshiftTextAnalyticsUDF<\/a><\/code> function.<\/li>\n<li>Choose <strong>Monitoring<\/strong>.<\/li>\n<\/ol>\n<p>You can view the CloudWatch metrics showing how often the function ran, how long it ran for, how often it failed, and more.<\/p>\n<ol start=\"3\">\n<li>Choose <strong>View logs in CloudWatch<\/strong> to open the function log streams for additional troubleshooting insights.<\/li>\n<\/ol>\n<p>For more information about viewing CloudWatch metrics via Lambda, see <a href=\"https:\/\/docs.aws.amazon.com\/lambda\/latest\/dg\/monitoring-cloudwatchlogs.html#monitoring-cloudwatchlogs-console\" target=\"_blank\" rel=\"noopener noreferrer\">Using the Lambda console<\/a>.<\/p>\n<h2>Additional use cases<\/h2>\n<p>There are many use cases for Amazon Redshift SQL text analytics functions. 
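<\/p>\n<p>Because the UDFs are ordinary scalar SQL functions, they also compose: a single query can translate a text field and then redact PII from the result in one pass. The following sketch is illustrative only; the <code>customer_notes<\/code> table and its <code>note_id<\/code> and <code>note_text<\/code> columns are hypothetical, while the function signatures match the reference in the appendix below:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">-- customer_notes is a hypothetical source table\nCREATE TABLE customer_notes_clean AS\nSELECT\n   note_id,\n   f_redact_pii_entities(\n      f_translate_text(note_text, 'auto', 'en', 'null'),\n      'en', 'ALL') AS note_text_en_redacted\nFROM customer_notes<\/code><\/pre>\n<\/div>\n<p>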
In addition to the example shown in this post, consider the following:<\/p>\n<ul>\n<li>Prepare research-ready datasets by redacting PII from customer or patient interactions<\/li>\n<li>Simplify extract, transform, and load (ETL) pipelines by using incremental SQL queries to enrich text data with sentiment and entities, such as social media streams ingested by <a href=\"https:\/\/aws.amazon.com\/kinesis\/data-firehose\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Kinesis Data Firehose<\/a><\/li>\n<li>Use SQL queries to explore sentiment and entities in your customer support texts, emails, and support cases<\/li>\n<li>Standardize many languages to a single common language<\/li>\n<\/ul>\n<p>You may have additional use cases for these functions, or additional capabilities you want to see added, such as the following:<\/p>\n<ul>\n<li>SQL functions to call custom entity recognition and custom classification models in Amazon Comprehend<\/li>\n<li>SQL functions for de-identification\u2014extending the entity and PII redaction functions to replace entities with alternate unique identifiers<\/li>\n<\/ul>\n<p>The implementation is open source, which means that you can clone the repo, modify and extend the functions as you see fit, and (hopefully) send us pull requests so we can merge your improvements back into the project and make it better for everyone.<\/p>\n<h2>Clean up<\/h2>\n<p>After you complete this tutorial, you might want to clean up any AWS resources you no longer want to use. 
Active AWS resources can continue to incur charges in your account.<\/p>\n<ol>\n<li>On the Amazon Redshift console, run the following statements to drop the tables you created:\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">DROP TABLE entities_results_final;\nDROP TABLE sentiment_results_final;\nDROP TABLE amazon_reviews_enriched;\nDROP TABLE spectrum.amazon_reviews_parquet;<\/code><\/pre>\n<\/div>\n<\/li>\n<li>On the AWS CloudFormation console, delete the stack <code>serverlessrepo-RedshiftTextAnalyticsUDF<\/code>.<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/quicksight\/latest\/user\/closing-account.html\" target=\"_blank\" rel=\"noopener noreferrer\">Cancel your QuickSight subscription<\/a>.<\/li>\n<\/ol>\n<h2>Conclusion<\/h2>\n<p>I have shown you how to install the sample text analytics UDF Lambda function for Amazon Redshift, so that you can use simple SQL queries to translate text using Amazon Translate, generate insights from text using Amazon Comprehend, and redact sensitive information. I hope you find this useful, and share examples of how you can use it to simplify your architectures and implement new capabilities for your business.<\/p>\n<p>The SQL functions described here are also available for <a href=\"http:\/\/aws.amazon.com\/athena\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Athena<\/a>. 
For more information, see <a href=\"http:\/\/www.amazon.com\/textanalyticsudf\" target=\"_blank\" rel=\"noopener noreferrer\">Translate, redact, and analyze text using SQL functions with Amazon Athena, Amazon Translate, and Amazon Comprehend<\/a>.<\/p>\n<p>Please share your thoughts with us in the comments section, or in the issues section of the project\u2019s <a href=\"https:\/\/github.com\/aws-samples\/aws-redshift-udfs-textanalytics\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repository<\/a>.<\/p>\n<h2>Appendix: Available function reference<\/h2>\n<p>This section summarizes the functions currently provided. The <a href=\"https:\/\/serverlessrepo.aws.amazon.com\/applications\/us-east-1\/777566285978\/RedshiftTextAnalyticsUDF\" target=\"_blank\" rel=\"noopener noreferrer\">README<\/a> file provides additional details.<\/p>\n<h3>Detect language<\/h3>\n<p>This function uses the Amazon Comprehend <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-languages.html\" target=\"_blank\" rel=\"noopener noreferrer\">BatchDetectDominantLanguage<\/a> API to identify the dominant language based on the first 5,000 bytes of input text.<\/p>\n<p>The following code returns a language code, such as <code>fr<\/code> for French or <code>en<\/code> for English:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_dominant_language('il fait beau \u00e0 Orlando') as language<\/code><\/pre>\n<\/p><\/div>\n<p>The following code returns a JSON formatted array of language codes and corresponding confidence scores:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_dominant_language_all('il fait beau \u00e0 Orlando') as language_all<\/code><\/pre>\n<\/p><\/div>\n<h3>Detect sentiment<\/h3>\n<p>This function uses the Amazon Comprehend <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-sentiment.html\" target=\"_blank\" rel=\"noopener noreferrer\">BatchDetectSentiment<\/a> API to 
identify the sentiment based on the first 5,000 bytes of input text.<\/p>\n<p>The following code returns a sentiment as POSITIVE, NEGATIVE, NEUTRAL, or MIXED:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_sentiment('Joe is very happy', 'en') as sentiment<\/code><\/pre>\n<\/div>\n<p>The following code returns a JSON formatted object containing detected sentiment and confidence scores for each sentiment value:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_sentiment_all('Joe is very happy', 'en') as sentiment_all<\/code><\/pre>\n<\/div>\n<h3>Detect entities<\/h3>\n<p>This function uses the Amazon Comprehend <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-entities.html\" target=\"_blank\" rel=\"noopener noreferrer\">DetectEntities<\/a> API to identify entities. Input text longer than 5,000 bytes results in multiple Amazon Comprehend API calls.<\/p>\n<p>The following code returns a JSON formatted object containing an array of <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-entities.html\" target=\"_blank\" rel=\"noopener noreferrer\">entity types<\/a> and values:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_entities('His name is Joe, he lives in Richmond VA, he bought an Amazon Echo Show on January 5th, and he loves it', 'en') as entities<\/code><\/pre>\n<\/div>\n<p>The following code returns a JSON formatted object containing an array of <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-entities.html\" target=\"_blank\" rel=\"noopener noreferrer\">entity types<\/a>, with their values, scores, and character offsets:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_entities_all('His name is Joe, he lives in Richmond VA, he bought an Amazon Echo Show on January 5th, and he loves it', 'en') as entities_all<\/code><\/pre>\n<\/div>\n<h3>Redact 
entities<\/h3>\n<p>This function replaces entity values for the specified entity types with <code>\u201c[ENTITY_TYPE]\u201d<\/code>. Input text longer than 5,000 bytes results in multiple Amazon Comprehend API calls. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_redact_entities('His name is Joe, he lives in Richmond VA, he bought an Amazon Echo Show on January 5th, and he loves it', 'en', 'ALL') as entities_redacted<\/code><\/pre>\n<\/div>\n<p>The function returns a redacted version of the input string. Specify one or more entity types to redact by providing a comma-separated list of valid types in the <code>types<\/code> string parameter, or <code>ALL<\/code> to redact all types.<\/p>\n<h3>Detect PII<\/h3>\n<p>This function uses the <a href=\"https:\/\/docs.aws.amazon.com\/comprehend\/latest\/dg\/how-pii.html\" target=\"_blank\" rel=\"noopener noreferrer\">DetectPiiEntities<\/a> API to identify PII. Input text longer than 5,000 bytes results in multiple Amazon Comprehend API calls.<\/p>\n<p>The following code returns a JSON formatted object containing an array of PII entity types and values:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_pii_entities('His name is Joe, his username is joe123 and he lives in Richmond VA', 'en') as pii<\/code><\/pre>\n<\/div>\n<p>The following code returns a JSON formatted object containing an array of PII entity types, with their scores and character offsets:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_detect_pii_entities_all('His name is Joe, his username is joe123 and he lives in Richmond VA', 'en') as pii_all<\/code><\/pre>\n<\/div>\n<h3>Redact PII<\/h3>\n<p>This function replaces the PII values for the specified PII entity types with \u201c[PII_ENTITY_TYPE]\u201d. Input text longer than 5,000 bytes results in multiple Amazon Comprehend API calls. 
See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_redact_pii_entities('His name is Joe, his username is joe123 and he lives in Richmond VA', 'en', 'ALL') as pii_redacted<\/code><\/pre>\n<\/div>\n<p>The function returns a redacted version of the input string. Specify one or more PII entity types to redact by providing a comma-separated list of valid types in the <code>types<\/code> string parameter, or <code>ALL<\/code> to redact all types.<\/p>\n<h3>Translate text<\/h3>\n<p>This function translates text from the source language to the target language. Input text longer than 5,000 bytes results in multiple Amazon Translate API calls. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-sql\">SELECT f_translate_text('It is a beautiful day in the neighborhood', 'auto', 'fr', 'null') as translated_text<\/code><\/pre>\n<\/div>\n<p>The function returns the translated string. Optionally, auto-detect the source language (use \u2018<code>auto<\/code>\u2019 as the language code, which uses Amazon Comprehend), and optionally specify a <a href=\"https:\/\/docs.aws.amazon.com\/translate\/latest\/dg\/how-custom-terminology.html\" target=\"_blank\" rel=\"noopener noreferrer\">custom terminology<\/a> (otherwise use \u2018<code>null<\/code>\u2019 for <code>customTerminologyName<\/code>).<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/02\/10\/Bob-Strahan-p.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-21654 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/02\/10\/Bob-Strahan-p.png\" alt=\"Bob Strahan\" width=\"100\" height=\"133\"><\/a>Bob Strahan<\/strong>\u00a0is a Principal Solutions Architect in the AWS Language AI Services team.<\/p>\n<p>       <!-- '\"` -->\n      
<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/translate-and-analyze-text-using-sql-functions-with-amazon-redshift-amazon-translate-and-amazon-comprehend\/<\/p>\n","protected":false},"author":0,"featured_media":988,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/987"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=987"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/987\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/988"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=987"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=987"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=987"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}