{"id":238,"date":"2020-09-17T05:26:14","date_gmt":"2020-09-17T05:26:14","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/17\/automating-the-analysis-of-multi-speaker-audio-files-using-amazon-transcribe-and-amazon-athena\/"},"modified":"2020-09-17T05:26:14","modified_gmt":"2020-09-17T05:26:14","slug":"automating-the-analysis-of-multi-speaker-audio-files-using-amazon-transcribe-and-amazon-athena","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/17\/automating-the-analysis-of-multi-speaker-audio-files-using-amazon-transcribe-and-amazon-athena\/","title":{"rendered":"Automating the analysis of multi-speaker audio files using Amazon Transcribe and Amazon Athena"},"content":{"rendered":"<div id=\"\">\n<p>In an effort to drive customer service improvements, many companies record the phone conversations between their customers and call center representatives. These call recordings are typically stored as audio files and processed to uncover insights such as customer sentiment, product or service issues, and agent effectiveness. To provide an accurate analysis of these audio files, the transcriptions need to clearly identify who spoke what and when.<\/p>\n<p>However, given the average customer service agent handles 30\u201350 calls a day, the sheer volume of audio files to analyze quickly becomes a challenge. Companies need a robust system for transcribing audio files in large batches to improve call center quality management. Similarly, legal investigations often need to efficiently analyze case-related audio files in search of potential evidence or insight that can help win legal cases. 
Also, in the healthcare sector, there is a growing need for this solution to help transcribe and analyze virtual patient-provider interactions.<\/p>\n<p><a href=\"https:\/\/aws.amazon.com\/transcribe\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Transcribe<\/a> is an automatic speech recognition (ASR) service that makes it easy to convert audio to text. One key feature of the service is called <em>speaker identification<\/em>, which you can use to label each individual speaker when transcribing multi-speaker audio files. You can configure Amazon Transcribe to identify between 2 and 10 speakers in an audio clip. For the best results, define the correct number of speakers for the audio input.<\/p>\n<p>A contact center, which often records multi-channel audio, can also benefit from using a feature called <a href=\"https:\/\/docs.aws.amazon.com\/transcribe\/latest\/dg\/how-channel-id.html\" target=\"_blank\" rel=\"noopener noreferrer\"><em>channel identification<\/em><\/a>. The feature can separate each channel from within a single audio file and simultaneously transcribe each track. Typically, an agent and a caller are recorded on separate channels, which are merged into a single audio file. Contact center applications like <a href=\"https:\/\/aws.amazon.com\/connect\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Connect<\/a> record agent and customer conversations on different channels (for example, the agent\u2019s voice is captured in the left channel, and the customer\u2019s in the right for a two-channel stereo recording). 
Contact centers can submit the single audio file to Amazon Transcribe, which identifies the two channels and produces a coherent merged transcript with channel labels.<\/p>\n<p>In this post, we walk through a solution that analyzes audio files involving multiple speakers using Amazon Transcribe and <a href=\"https:\/\/aws.amazon.com\/athena\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Athena<\/a>, a serverless query service for big data. By combining these two services, you can set up a serverless, pay-per-use solution that processes audio files into readable text and lets you analyze the data using standard SQL.<\/p>\n<h2>Solution overview<\/h2>\n<p>The following diagram illustrates the solution architecture.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone\" title=\"Solution architecture\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/16\/1-SolutionArchitecture-1.jpg\" alt=\"\" width=\"900\" height=\"871\"><\/p>\n<p><strong>The solution contains the following steps:<\/strong><\/p>\n<ol>\n<li>You upload the audio file to the <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) bucket AudioRawBucket.<\/li>\n<li>The Amazon S3 PUT event triggers the <a href=\"http:\/\/aws.amazon.com\/lambda\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> function LambdaFunction1.<\/li>\n<li>The function invokes an asynchronous Amazon Transcribe API call on the uploaded audio file.<\/li>\n<li>The function also writes a message into <a href=\"http:\/\/aws.amazon.com\/sqs\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Queue Service<\/a> (Amazon SQS) with the transcription job information.<\/li>\n<li>The transcription job runs and writes the output in JSON format to the target S3 bucket, AudioPrcsdBucket.<\/li>\n<li>An <a href=\"http:\/\/aws.amazon.com\/cloudwatch\" 
target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch Events<\/a> rule triggers the function LambdaFunction2 every 2 minutes.<\/li>\n<li>The function LambdaFunction2 reads the SQS queue for transcription jobs, checks for job completion, converts the JSON file to CSV, and loads an Athena table with the audio text data.<\/li>\n<li>You can access the processed audio file transcription from the AudioPrcsdBucket.<\/li>\n<li>You can also query the data with Amazon Athena.<\/li>\n<\/ol>\n<h2>Prerequisites<\/h2>\n<p>To get started, you need the following:<\/p>\n<ul>\n<li>A valid AWS account with access to AWS services<\/li>\n<li>The Athena database \u201cdefault\u201d in an AWS account in us-east-1<\/li>\n<li>A multi-speaker audio file\u2014for this post, we use <a href=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/transcribe_audio_processing\/medical-diarization.wav\" target=\"_blank\" rel=\"noopener noreferrer\">medical-diarization.wav<\/a>\n<\/li>\n<\/ul>\n<p>To achieve the best results, we recommend the following:<\/p>\n<ul>\n<li>Use a lossless format, such as WAV or FLAC, with PCM 16-bit encoding<\/li>\n<li>Use a sample rate of 8000 Hz for low-fidelity audio and 16000 Hz for high-fidelity audio<\/li>\n<\/ul>\n<h2>Deploying the solution<\/h2>\n<p>You can use the provided <a href=\"http:\/\/aws.amazon.com\/cloudformation\" target=\"_blank\" rel=\"noopener noreferrer\">AWS CloudFormation<\/a> template to launch and configure all the resources for the solution.<\/p>\n<ol>\n<li>Choose <strong>Launch Stack<\/strong>:<\/li>\n<\/ol>\n<p><a href=\"https:\/\/console.aws.amazon.com\/cloudformation\/home?region=us-east-1#\/stacks\/new?stackName=Transcribe-Blog-Multi-Speaker&amp;templateURL=https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/transcribe_audio_processing\/AudioProcessing_deliverable_CF.yaml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-15948\" 
src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/16\/2-LaunchStack.jpg\" alt=\"\" width=\"107\" height=\"20\"><\/a><\/p>\n<p>This takes you to the Create stack wizard on the AWS CloudFormation console. The template is launched in the US East (N. Virginia) Region by default.<\/p>\n<p>The CloudFormation templates used in this post are designed to work only in the us-east-1 Region. These templates are also not intended for production use without modification.<\/p>\n<ol start=\"2\">\n<li>On the Select Template page, keep the default URL for the CloudFormation template, and choose Next.<\/li>\n<li>On the Specify Details page, review and provide values for the required parameters in the template.<\/li>\n<\/ol>\n<p>Dev is the environment where you want to deploy the template. AWS CloudFormation uses this value for resources in Lambda, Amazon SQS, and other services.<\/p>\n<ol start=\"4\">\n<li>After you specify the template details, choose Next.<\/li>\n<li>On the Options page, choose Next again.<\/li>\n<li>On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.<\/li>\n<li>Choose <strong>Create Stack<\/strong>.<\/li>\n<\/ol>\n<p>It takes approximately 5\u201310 minutes for the deployment to complete. 
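If you prefer to deploy programmatically rather than through the console wizard, the same template can be launched with boto3. This is a minimal sketch, not part of the original walkthrough; the stack name reuses the one from the Launch Stack link, and the parameter key `Environment` is an assumption to check against the template's actual parameters.

```python
def build_create_stack_args(stack_name, environment="Dev"):
    """Build the kwargs for cloudformation.create_stack for this solution.

    The "Environment" parameter key is an assumption; verify it against the
    template before use. CAPABILITY_NAMED_IAM mirrors the console checkbox
    acknowledging that the stack creates IAM resources with custom names.
    """
    template_url = (
        "https://aws-ml-blog.s3.amazonaws.com/artifacts/"
        "transcribe_audio_processing/AudioProcessing_deliverable_CF.yaml"
    )
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
        "Parameters": [
            {"ParameterKey": "Environment", "ParameterValue": environment}
        ],
    }


if __name__ == "__main__":
    import boto3  # AWS SDK; requires configured credentials

    # The templates only work in us-east-1, as noted above.
    cfn = boto3.client("cloudformation", region_name="us-east-1")
    cfn.create_stack(**build_create_stack_args("Transcribe-Blog-Multi-Speaker"))
```

As with the console flow, the stack takes several minutes to create; `aws cloudformation wait stack-create-complete` can block until it finishes.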
When the stack launch is complete, it returns outputs with information about the resources that were created.<\/p>\n<p>You can <a href=\"https:\/\/docs.aws.amazon.com\/AWSCloudFormation\/latest\/UserGuide\/cfn-console-view-stack-data-resources.html\" target=\"_blank\" rel=\"noopener noreferrer\">view the stack outputs<\/a> on the <a href=\"http:\/\/aws.amazon.com\/console\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Management Console<\/a> or by using the following <a href=\"http:\/\/aws.amazon.com\/cli\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Command Line Interface<\/a> (AWS CLI) command:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-shell\">aws cloudformation describe-stacks --stack-name &lt;stack-name&gt; --region us-east-1 --query Stacks[0].Outputs<\/code><\/pre>\n<\/div>\n<h3><strong>Resources created by the CloudFormation stack<\/strong><\/h3>\n<ul>\n<li>\n<strong>AudioRawBucket<\/strong> \u2013 Stores raw audio files; the PUT event triggers the Lambda function that runs Amazon Transcribe<\/li>\n<li>\n<strong>AudioPrcsdBucket<\/strong> \u2013 Stores the processed output<\/li>\n<li>\n<strong>LambdaRole1<\/strong> \u2013 The Lambda role with required permissions for S3 buckets, Amazon SQS, Amazon Transcribe, and CloudWatch<\/li>\n<li>\n<strong>LambdaFunction1<\/strong> \u2013 The initial function to run Amazon Transcribe to process the audio file, create a JSON file, and update Amazon SQS<\/li>\n<li>\n<strong>LambdaFunction2<\/strong> \u2013 The post-processing function that reads the SQS queue, converts (aggregates) the JSON to CSV format, and loads it into an Athena table<\/li>\n<li>\n<strong>TaskAudioQueue<\/strong> \u2013 The SQS queue for storing all audio processing requests<\/li>\n<li>\n<strong>ScheduledRule<\/strong> \u2013 The CloudWatch schedule for LambdaFunction2<\/li>\n<li>\n<strong>AthenaNamedQuery<\/strong> \u2013 The Athena table definition for storing processed audio file transcriptions with 
object information<\/li>\n<\/ul>\n<p>The Athena table for the audio text has the following columns:<\/p>\n<ul>\n<li>\n<strong>audio_transcribe_job<\/strong> \u2013 The job submitted to transcribe the audio<\/li>\n<li>\n<strong>time_start<\/strong> \u2013 The beginning timestamp for the speaker<\/li>\n<li>\n<strong>speaker<\/strong> \u2013 Speaker tags (for example, spk_0, spk_1, and so on)<\/li>\n<li>\n<strong>speaker_text<\/strong> \u2013 The text from the speaker audio<\/li>\n<\/ul>\n<h2>Validating the solution<\/h2>\n<p>You can now validate that the solution works.<\/p>\n<ol>\n<li>Verify that the AWS CloudFormation resources were created (see the previous section for instructions via the console or AWS CLI).<\/li>\n<li>Upload the <a href=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/transcribe_audio_processing\/medical-diarization.wav\" target=\"_blank\" rel=\"noopener noreferrer\">sample audio file<\/a> to the S3 bucket AudioRawBucket.<\/li>\n<\/ol>\n<p>The transcription process is asynchronous, so it can take a few minutes for the job to complete. You can check the job status on the Amazon Transcribe console and CloudWatch console.<\/p>\n<p>When the transcription job is complete and the Athena table <code>transcribe_data<\/code> is created, you can run Athena queries to verify the transcription output. 
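You can also check job completion programmatically with the Transcribe API before querying. The following is a minimal sketch, not part of the deployed solution (LambdaFunction2 performs this check automatically); the job name passed in `__main__` is illustrative and depends on how LambdaFunction1 names its jobs.

```python
def summarize_job(response):
    """Extract the status, and the transcript URI when finished, from a
    GetTranscriptionJob response dict."""
    job = response["TranscriptionJob"]
    # Status is one of QUEUED, IN_PROGRESS, COMPLETED, or FAILED.
    status = job["TranscriptionJobStatus"]
    uri = job["Transcript"]["TranscriptFileUri"] if status == "COMPLETED" else None
    return status, uri


if __name__ == "__main__":
    import boto3  # AWS SDK; requires configured credentials

    transcribe = boto3.client("transcribe", region_name="us-east-1")
    resp = transcribe.get_transcription_job(
        TranscriptionJobName="medical-diarization"  # illustrative job name
    )
    print(summarize_job(resp))
```

When the status reaches COMPLETED, the JSON output appears in AudioPrcsdBucket and the Athena table can be queried.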
See the following select statement:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-sql\">select * from \"default\".\"transcribe_data\" order by 1,2<\/code><\/pre>\n<\/div>\n<p><strong>The following table shows the output for the above select statement.<\/strong><\/p>\n<table border=\"1px\" width=\"0\" cellpadding=\"5px\">\n<tbody>\n<tr>\n<td width=\"156\"><strong>audio_transcribe_job<\/strong><\/td>\n<td width=\"84\"><strong>time_start<\/strong><\/td>\n<td width=\"67\"><strong>speaker<\/strong><\/td>\n<td width=\"500\"><strong>speaker_text<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"156\">medical-diarization.wav<\/td>\n<td width=\"84\">0:00:01<\/td>\n<td width=\"67\">spk_0<\/td>\n<td width=\"500\">\u00a0Hey, Jane. So what brings you into my office today?<\/td>\n<\/tr>\n<tr>\n<td width=\"156\">medical-diarization.wav<\/td>\n<td width=\"84\">0:00:03<\/td>\n<td width=\"67\">spk_1<\/td>\n<td width=\"500\">\u00a0Hey, Dr Michaels. Good to see you. I\u2019m just coming in from a routine checkup.<\/td>\n<\/tr>\n<tr>\n<td width=\"156\">medical-diarization.wav<\/td>\n<td width=\"84\">0:00:07<\/td>\n<td width=\"67\">spk_0<\/td>\n<td width=\"500\">\u00a0All right, let\u2019s see, I last saw you. About what, Like a year ago. And at that time, I think you were having some minor headaches. I don\u2019t recall prescribing anything, and we said we\u2019d maintain some observations unless things were getting worse.<\/td>\n<\/tr>\n<tr>\n<td width=\"156\">medical-diarization.wav<\/td>\n<td width=\"84\">0:00:20<\/td>\n<td width=\"67\">spk_1<\/td>\n<td width=\"500\">\u00a0That\u2019s right. Actually, the headaches have gone away. I think getting more sleep with super helpful. I\u2019ve also been more careful about my water intake throughout my work day.<\/td>\n<\/tr>\n<tr>\n<td width=\"156\">medical-diarization.wav<\/td>\n<td width=\"84\">0:00:29<\/td>\n<td width=\"67\">spk_0<\/td>\n<td width=\"500\">\u00a0Yeah, I\u2019m not surprised at all. 
Sleep deprivation and chronic dehydration or to common contributors to potential headaches. Rest is definitely vital when you become dehydrated. Also, your brain tissue loses water, causing your brain to shrink and, you know, kind of pull away from the skull. And this contributor, the pain receptors around the brain, giving you the sensation of a headache. So how much water are you roughly taking in each day<\/td>\n<\/tr>\n<tr>\n<td width=\"156\">medical-diarization.wav<\/td>\n<td width=\"84\">0:00:52<\/td>\n<td width=\"67\">spk_1<\/td>\n<td width=\"500\">\u00a0of? I\u2019ve become obsessed with drinking enough water. I have one of those fancy water bottles that have graduated markers on the side. I\u2019ve also been logging my water intake pretty regularly on average. Drink about three litres a day.<\/td>\n<\/tr>\n<tr>\n<td width=\"156\">medical-diarization.wav<\/td>\n<td width=\"84\">0:01:06<\/td>\n<td width=\"67\">spk_0<\/td>\n<td width=\"500\">\u00a0That\u2019s excellent. Before I start the routine physical exam is there anything else you like me to know? Anything you like to share? 
What else has been bothering you?<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Cleaning up<\/h2>\n<p>To avoid incurring additional charges, complete the following steps to clean up your resources when you are done with the solution:<\/p>\n<ol>\n<li>Delete the Athena table <code>transcribe_data<\/code> from the default database.<\/li>\n<li>Delete the prefixes and objects you created from the buckets <code>AudioRawBucket<\/code> and <code>AudioPrcsdBucket<\/code>.<\/li>\n<li>Delete the CloudFormation stack, which removes your additional resources.<\/li>\n<\/ol>\n<h2>Conclusion<\/h2>\n<p>In this post, we walked through the solution, reviewed a sample implementation of audio file conversion using Amazon S3, Amazon Transcribe, Amazon SQS, Lambda, and Athena, and validated the steps for processing and analyzing multi-speaker audio files.<\/p>\n<p>You can further extend this solution to perform sentiment analytics and improve your customer experience. For more information, see <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/detect-sentiment-from-customer-reviews-using-amazon-comprehend\/\">Detect sentiment from customer reviews using Amazon Comprehend<\/a>. For more information about live call and post-call analytics, see <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/aws-announces-aws-contact-center-intelligence-solutions\/\">AWS announces AWS Contact Center Intelligence solutions<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<h3><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15962 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/16\/Mahendar-Gajula.jpg\" alt=\"\" width=\"100\" height=\"132\"><\/h3>\n<p><strong>Mahendar Gajula<\/strong> is a Big Data Consultant at AWS. He works with AWS customers in their journey to the cloud with a focus on big data, data warehouse, and AI\/ML projects. 
In his spare time, he enjoys playing tennis and spending time with his family.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15963 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/16\/Rajaro-Viljjapu.jpg\" alt=\"\" width=\"100\" height=\"135\">Rajarao Vijjapu<\/strong> is a data architect with AWS.\u00a0He works with AWS customers and partners to provide guidance and technical assistance about Big Data, Analytics, AI\/ML and Security projects, helping them improve the value of their solutions when using AWS.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/automating-the-analysis-of-multi-speaker-audio-files-using-amazon-transcribe-and-amazon-athena\/<\/p>\n","protected":false},"author":0,"featured_media":239,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/238"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=238"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/238\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/239"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=238"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/mach
ine-learning\/wp-json\/wp\/v2\/categories?post=238"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=238"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}