{"id":320,"date":"2020-09-30T21:37:43","date_gmt":"2020-09-30T21:37:43","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/30\/building-custom-language-models-to-supercharge-speech-to-text-performance-for-amazon-transcribe\/"},"modified":"2020-09-30T21:37:43","modified_gmt":"2020-09-30T21:37:43","slug":"building-custom-language-models-to-supercharge-speech-to-text-performance-for-amazon-transcribe","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/30\/building-custom-language-models-to-supercharge-speech-to-text-performance-for-amazon-transcribe\/","title":{"rendered":"Building custom language models to supercharge speech-to-text performance for Amazon Transcribe"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/aws.amazon.com\/transcribe\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Transcribe<\/a> is a fully-managed automatic speech recognition service (ASR) that makes it easy to add speech-to-text capabilities to voice-enabled applications. As our service grows, so does the diversity of our customer base, which now spans domains such as insurance, finance, law, real estate, media, hospitality, and more. Naturally, customers in different market segments have asked Amazon Transcribe for more customization options to further enhance transcription performance.<\/p>\n<p>We\u2019re excited to introduce <a href=\"https:\/\/aws.amazon.com\/about-aws\/whats-new\/2020\/08\/amazon-transcribe-launches-custom-language-models\/\" target=\"_blank\" rel=\"noopener noreferrer\">Custom Language Models (CLM)<\/a>. The new feature allows you to submit a corpus of text data to train custom language models that target domain-specific use cases. 
Using CLM is easy because it capitalizes on existing data that you already possess (such as marketing assets, website content, and training manuals).<\/p>\n<p>In this post, we show you how to best use your available data to train a custom language model tailored for your <a href=\"https:\/\/aws.amazon.com\/transcribe\/\" target=\"_blank\" rel=\"noopener noreferrer\">speech-to-text<\/a> use case. Although our walkthrough uses a transcription example from the video gaming industry, you can use CLM to enhance custom speech recognition for any domain of your choosing. This post assumes that you\u2019re already familiar with how to use Amazon Transcribe, and focuses on demonstrating how to use the new CLM feature. Additional resources for using the service are available at the end.<\/p>\n<h2>Establishing context for evaluating CLM transcription performance<\/h2>\n<p>To evaluate how powerful CLM can be in terms of enhancing transcription accuracy, there are a few steps we want to take. First, we need to establish a baseline. To do that, we recorded an <a href=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/Building-Custom-Language-Models\/clm-blog-16k-audio.m4a\" target=\"_blank\" rel=\"noopener noreferrer\">audio sample<\/a> that contains speech content and lingo commonly found in video game chats. We also have a human-generated transcript that serves as the ground truth. This reference transcript is what we use to compare the generic transcription output of Amazon Transcribe against the output of CLM. The following is a partial snippet of this reference transcript.<\/p>\n<blockquote>\n<p>The 2020 holiday season is right around the corner. And with the way that the year\u2019s been going, we can all hope for a little excitement around the next-gen video game consoles coming out soon. So, what\u2019s the difference in hardware specs between the upcoming Playstation 5 and Xbox Series X? 
Well, let\u2019s take a look under the hoods of each next-gen gaming console. The PS5 features an AMD Zen 2 CPU with up to 3.5 GHz frequency. It sports an AMD Radeon GPU that touts 10.3 teraflops, running up to 2.23 GHz. Memory and storage respectively dial in at 16 GB and 825 GB. The PS5 supports both PS The PS5 supports both 4K and 8K resolution screens. The Xbox Series X also features an AMD Zen 2 CPU, but clocks in at 3.8 GHz instead. The console boasts a similar AMD custom GPU with 12 teraflops and 1.825 GHz.<\/p>\n<p>Memory is the same as that of the PS5\u2019s, coming in at 16 GB. But the default storage is where the system has an edge, bringing out a massive 1 TB hard drive. Like the PS5, the Series X also supports 4K and 8K resolution screens as well. Those of course, are just the numbers. Therefore, it remains to be seen exactly how the performance plays out in practice. It\u2019s worth noting that both systems have incorporated ray-tracing technology, something that\u2019s used to make light and shadows look better in-game. Both systems also offer 3D audio output for immersive experiences.<\/p>\n<\/blockquote>\n<p>Next, we want to run the sample audio through Amazon Transcribe using its generic speech engine and compare the text output to the ground truth transcript. We\u2019ve produced a partial snippet of the side-by-side comparison and highlighted errors for visibility. Compared to the ground truth transcript, the default Amazon Transcribe transcript showed a Word Error Rate (WER) of 31.34%. This WER shouldn\u2019t be interpreted as a full representation of the Amazon Transcribe service performance. It is just one instance for a very specific example audio. Note that accuracy is 100 minus WER, so the lower the WER, the higher the accuracy. 
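To make the metric concrete, here\u2019s a minimal sketch of a word-level WER calculation in Python (the sample phrases are made up for illustration; production evaluations typically normalize punctuation and casing first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(f"WER: {wer('the ps5 supports 4k screens', 'the ps five supports 4k screens'):.2%}")
# prints "WER: 40.00%" (one substitution plus one insertion over five reference words)
```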
For more information about calculating WER, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Word_error_rate\" target=\"_blank\" rel=\"noopener noreferrer\">Word Error Rate<\/a> on Wikipedia.<\/p>\n<p><em>*Note that this WER is not a representation of the Amazon Transcribe service performance. It is just one instance for a very specific and limited test example. All figures are for single-case and limited scope illustration purposes.<\/em><\/p>\n<p>The following text is the human-generated ground-truth reference transcript:<\/p>\n<blockquote>\n<p>The 2020 holiday season is right around the corner. And with the way that the year\u2019s been going, we can all hope for a little excitement around the next-gen video game consoles coming out soon. So, what\u2019s the difference in <span><strong>hardware specs<\/strong><\/span> between the upcoming Playstation 5 and Xbox Series X? Well, let\u2019s take a look under the hoods of each next-gen gaming <span><strong>console<\/strong><\/span>. The PS5 features <span><strong>an AMD Zen 2 CPU<\/strong> <\/span>with up to 3.5 GHz frequency. It sports <span><strong>an AMD Radeon<\/strong><\/span> GPU that <span><strong>touts<\/strong> <\/span>10.3 teraflops, running up to 2.23 GHz. Memory and storage respectively <span><strong>dial in<\/strong><\/span> at 16 GB and 825 GB. The PS5 supports both PS The PS5 supports both 4K and <span><strong>8K resolution<\/strong> <\/span>screens. Meanwhile, the Xbox Series X also features an <span><strong>AMD Zen 2 CPU<\/strong><\/span>, but clocks in at <span><strong>3.8 GHz<\/strong><\/span> instead. <span><strong>The console boasts a similar AMD custom<\/strong><\/span> GPU with 12 teraflops and 1.825 GHz. Memory is the same as that of the PS5\u2019s, <span><strong>coming in at<\/strong><\/span> 16 GB. But the default storage is where the system has an edge, bringing out a massive 1 TB hard drive. 
Like the PS5, the <span><strong>Series X<\/strong><\/span> also supports 4K and 8K resolution screens as well. Those of course, are just the numbers. Therefore, it remains to be seen exactly how the performance plays out in practice. It\u2019s worth noting that both systems have incorporated ray-tracing technology, something that\u2019s used to make light and shadows look better in-game. Both systems also offer 3D audio output for immersive experiences.<\/p>\n<\/blockquote>\n<p>The following text is the machine-generated transcript by the Amazon Transcribe generic speech engine:<\/p>\n<blockquote>\n<p>the 2020 holiday season is right around the corner. And with the way that the years been going, we can all hope for a little excitement around the next Gen video game consoles coming out soon. So what\u2019s the difference in <span><strong>heart respects<\/strong><\/span> between the upcoming PlayStation five and Xbox Series X? Well, let\u2019s take a look under the hood of each of these <span><strong>Consul\u2019s<\/strong><\/span>. The PS five features in <span><strong>a M descend to CPU<\/strong> <\/span>with up to 3.5 gigahertz frequency is sports <span><strong>and AM the radio on<\/strong><\/span> GPU that <span><strong>tells<\/strong> <\/span>10.3 teraflops running up to 2.23 gigahertz memory and storage, respectively. <span><strong>Dahlin<\/strong> <\/span>at 16 gigabytes in 825 gigabytes. The PS five supports both PS. The PS five supports both four K and <span><strong>A K resolutions<\/strong><\/span>. Meanwhile, the Xbox Series X also features an <span><strong>AM descend to CPU<\/strong><\/span>, but clocks in at <span><strong>three point take. It hurts<\/strong><\/span> instead, <span><strong>the cost. Almost a similar AMG custom<\/strong><\/span> GPU, with 12 teraflops and 1.825 gigahertz memory, is the same as out of the PS. Fives <span><strong>come in at<\/strong> <\/span>16 gigabytes, but the default storage is where the system has an edge. 
Bring out a massive one terabyte hard drive. Like the PS five. The <span><strong>Siri\u2019s X<\/strong> <\/span>also supports four K and eight K resolution screens as well. Those, of course, are just the numbers. Therefore, it remains to be seen exactly how the performance plays out. In practice. It\u2019s worth noting that both systems have incorporated Ray tracing technology, something that\u2019s used to make light and shadows look better in game. Both systems also offer three D audio output for immersive experiences.<\/p>\n<\/blockquote>\n<p>Although the Amazon Transcribe generic speech model has done a decent job of transcribing the audio, a tailored custom model is likely to yield higher transcription accuracy.<\/p>\n<h2>Solution overview<\/h2>\n<p>In this post, we walk you through how to train a custom language model and evaluate how its transcription output compares to the reference transcript. You complete the following high-level steps:<\/p>\n<ol>\n<li>Prepare your training data.<\/li>\n<li>Train your CLM.<\/li>\n<li>Use your CLM to transcribe audio.<\/li>\n<li>Evaluate the results by comparing the accuracy of the CLM transcript against that of the generic model transcript.<\/li>\n<\/ol>\n<h3>Preparing your training data<\/h3>\n<p>Before we begin, it\u2019s important to distinguish between <em>training data<\/em> and <em>tuning data<\/em>.<\/p>\n<p>Training data for CLM typically includes text data that is specific to your domain. Some examples of training data could include relevant text content from your website, training manuals, sales and marketing collateral, or other text sources.<\/p>\n<p>Meanwhile, human-annotated audio transcripts of actual phone calls or media content that are directly relevant to your use case can be used as tuning data. Ideally, both training and tuning data ought to be domain-specific, but in practice you may only have a small number of audio transcriptions available. 
In that case, transcriptions should be used as tuning data. If more transcription data is available, it can and should be used as part of the training set as well. For more information about the difference between training and tuning data, see <a href=\"https:\/\/docs.aws.amazon.com\/transcribe\/latest\/dg\/custom-language-models.html\" target=\"_blank\" rel=\"noopener noreferrer\">Improving Domain-Specific Transcription Accuracy with Custom Language Models<\/a>.<\/p>\n<p>Like many domains, video gaming has its own set of technical jargon, syntax, and speech dynamics that may make the use of a general speech recognition engine suboptimal. To build a custom model, we first need data that\u2019s representative of the domain. For this use case, we want free-form text from the video gaming industry. We use publicly available information about video gaming from a variety of sources. For convenience, we\u2019ve compiled that training and tuning set, which you can <a href=\"https:\/\/aws-ml-blog.s3.amazonaws.com\/artifacts\/Building-Custom-Language-Models\/training_tuning_data.tar.gz\">download<\/a> to follow along. Keep in mind that the nature, quality, and quantity of your training data have a dramatic impact on the resultant custom model you build. All else equal, it\u2019s better to have more data than less.<\/p>\n<p>As a general guideline, each training or tuning file should meet the following requirements:<\/p>\n<ul>\n<li>Is in plain text (it\u2019s not a file such as a Microsoft Word document, CSV file, or PDF).<\/li>\n<li>Has a single sentence per line.<\/li>\n<li>Is encoded in UTF-8.<\/li>\n<li>Doesn\u2019t contain any formatting characters, such as HTML tags.<\/li>\n<li>Is less than 2 GB in size if you intend to use the file as training data. You can provide a maximum of 2 GB of training data.<\/li>\n<li>Is less than 200 MB in size if you intend to use the file as tuning data. 
You can provide a maximum of 200 MB of optional tuning data.<\/li>\n<\/ul>\n<p>The following text is a partial snippet of our example training set:<\/p>\n<blockquote>\n<p>The PS5 will feature a custom eight-core AMD Zen 2 CPU clocked at 3.5GHz (variable frequency) and a custom GPU based on AMD\u2019s RDNA 2 architecture hardware that promises 10.28 teraflops and 36 compute units clocked at 2.23GHz (also variable frequency).<\/p>\n<p>It\u2019ll also have 16GB of GDDR6 RAM and a custom 825GB SSD that Sony has previously promised will offer super-fast loading times in gameplay, via Eurogamer.<\/p>\n<p>One of the biggest technical updates in the PS5 was already announced last year: a switch to SSD storage for the console\u2019s main hard drive, which Sony says will result in dramatically faster load times.<\/p>\n<p>A previous demo showed Spider-Man loading levels in less than a second on the PS5, compared to the roughly eight seconds it took on a PS4. PlayStation hardware lead Mark Cerny dove into some of the details about those SSD goals at the announcement.<\/p>\n<p>Where it took a PS4 around 20 seconds to load a single gigabyte of data, the goal with the PS5\u2019s SSD was to enable loading five gigabytes of data in a single second.<\/p>\n<\/blockquote>\n<h3>Training your custom language model<\/h3>\n<p>In the following steps, we show you how to train a CLM using base training data and a tuning split. Using a tuning split is entirely optional. 
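As a quick sanity check before uploading, you can verify most of the file requirements listed earlier with a short script. This is just a sketch under the stated limits; it doesn\u2019t attempt real sentence segmentation, and plain-text conversion from Word or PDF sources still has to happen upstream:

```python
import re

MAX_TRAINING_BYTES = 2 * 1024**3   # training files must be under 2 GB
MAX_TUNING_BYTES = 200 * 1024**2   # tuning files must be under 200 MB

def validate_corpus(path, tuning=False):
    """Return a list of problems found in a CLM training or tuning file."""
    problems = []
    limit = MAX_TUNING_BYTES if tuning else MAX_TRAINING_BYTES
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) >= limit:
        problems.append(f"file is not under {limit} bytes")
    try:
        text = raw.decode("utf-8")  # files must be UTF-8 encoded
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    for n, line in enumerate(text.splitlines(), 1):
        if re.search(r"<[^>]+>", line):  # formatting characters such as HTML tags
            problems.append(f"line {n}: contains HTML-like markup")
    return problems
```

An empty list means the file passed these basic checks.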
If you prefer not to train a CLM using a tuning split, you can skip step 5.<\/p>\n<ol>\n<li>\n<a href=\"https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/user-guide\/upload-objects.html\" target=\"_blank\" rel=\"noopener noreferrer\">Upload<\/a> your training data (and\/or tuning data) to their respective <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) buckets.<\/li>\n<li>On the Amazon Transcribe console, for <strong>Name<\/strong>, enter a name for your custom model so you can reference it for later use.<\/li>\n<li>For <strong>Base model<\/strong>, choose a model type that matches your use case, based on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sampling_(signal_processing)\" target=\"_blank\" rel=\"noopener noreferrer\">audio sample rate<\/a>.<\/li>\n<li>For <strong>Training data<\/strong>, enter the appropriate S3 bucket where you previously uploaded your training data.<\/li>\n<li>For <strong>Tuning data<\/strong>, enter the appropriate S3 bucket where you uploaded your tuning data. (This step is optional.)<\/li>\n<li>For <strong>Access permissions<\/strong>, designate the appropriate <a href=\"https:\/\/docs.aws.amazon.com\/transcribe\/latest\/dg\/training-data-permissions.html\" target=\"_blank\" rel=\"noopener noreferrer\">access permissions<\/a>.<\/li>\n<li>Choose <strong>Train model<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16314\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/24\/1-Train-the-model.jpg\" alt=\"\" width=\"900\" height=\"1179\"><\/p>\n<p>The Amazon Transcribe service does the heavy lifting of training a custom model for you.<\/p>\n<p>To track the progress of the model training, go to the <strong>Custom language models<\/strong> page on the Amazon Transcribe console. 
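The same console steps can also be scripted with the AWS SDK for Python (boto3). The bucket paths, role ARN, and model name below are placeholders, so treat this as a sketch rather than a drop-in script:

```python
def build_clm_request(model_name, training_s3, tuning_s3, role_arn):
    """Assemble the CreateLanguageModel request; tuning data is optional."""
    data_config = {"S3Uri": training_s3, "DataAccessRoleArn": role_arn}
    if tuning_s3:
        data_config["TuningDataS3Uri"] = tuning_s3
    return {
        "LanguageCode": "en-US",      # CLM supports US English at launch
        "BaseModelName": "WideBand",  # match your audio sample rate; "NarrowBand" for 8 kHz telephony
        "ModelName": model_name,
        "InputDataConfig": data_config,
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials with Amazon Transcribe permissions

    transcribe = boto3.client("transcribe")
    transcribe.create_language_model(**build_clm_request(
        "gaming-clm",                                         # placeholder names
        "s3://my-clm-bucket/training/",
        "s3://my-clm-bucket/tuning/",
        "arn:aws:iam::123456789012:role/TranscribeDataAccess",
    ))
    # The console's status column maps to ModelStatus here:
    model = transcribe.describe_language_model(ModelName="gaming-clm")
    print(model["LanguageModel"]["ModelStatus"])  # IN_PROGRESS, COMPLETED, or FAILED
```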
The status indicates if the training is in progress, complete, or failed.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16315\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/24\/2-Screenshot-4.jpg\" alt=\"\" width=\"900\" height=\"187\"><\/p>\n<h3>Using your custom language model to transcribe audio<\/h3>\n<p>When your custom model training is complete, you\u2019re ready to use it. Simply <a href=\"https:\/\/aws.amazon.com\/getting-started\/hands-on\/create-audio-transcript-transcribe\/\" target=\"_blank\" rel=\"noopener noreferrer\">start a typical transcription job<\/a> as you would using Amazon Transcribe. However, this time, we want to invoke the custom language model we just trained, as opposed to using the default speech engine. For this post, we assume you\u2019re already familiar with how to run a typical transcription job, so we only call out the new component: for <strong>Model type<\/strong>, select <strong>Custom language model<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-16316\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/24\/3-Specify-job-details.jpg\" alt=\"\" width=\"900\" height=\"965\"><\/p>\n<h3>Evaluating the results<\/h3>\n<p>When your transcription job is complete, it\u2019s time to see how well the CLM performed. We can evaluate the output transcript against the human-annotated reference transcript, just as we compared the generic Amazon Transcribe machine output against the human-annotated reference transcript. 
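For completeness, a CLM transcription job can also be started programmatically. The only difference from a default job is the model settings; the job, bucket, and model names below are placeholders in this sketch:

```python
def build_clm_job(job_name, media_uri, model_name):
    """Assemble a StartTranscriptionJob request that uses a custom language model."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        # The one piece that differs from a default job: point ModelSettings
        # at the trained CLM instead of the generic speech engine.
        "ModelSettings": {"LanguageModelName": model_name},
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials with Amazon Transcribe permissions

    transcribe = boto3.client("transcribe")
    transcribe.start_transcription_job(**build_clm_job(
        "gaming-clm-job",                        # placeholder names
        "s3://my-clm-bucket/audio/sample.m4a",
        "gaming-clm",
    ))
```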
We actually made two CLMs (one trained without a tuning split and one trained with a tuning split), which we summarize in the following table.<\/p>\n<table border=\"1px\" cellpadding=\"5px\">\n<tbody>\n<tr>\n<td width=\"365\"><strong>Transcription Type<\/strong><\/td>\n<td width=\"200\"><strong>WER (%)<\/strong><\/td>\n<td width=\"200\"><strong>Accuracy (100-WER)<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"365\">Amazon Transcribe generic model<\/td>\n<td width=\"200\">31.34%<\/td>\n<td width=\"200\">68.66%<\/td>\n<\/tr>\n<tr>\n<td width=\"365\">Amazon Transcribe CLM with no tuning split<\/td>\n<td width=\"200\">26.19%<\/td>\n<td width=\"200\">73.81%<\/td>\n<\/tr>\n<tr>\n<td width=\"365\">Amazon Transcribe CLM with tuning split<\/td>\n<td width=\"200\">20.23%<\/td>\n<td width=\"200\">79.77%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><em>\u00a0<\/em>A lower WER is better. These WERs aren\u2019t representative of overall Amazon Transcribe performance. All numbers are relative, to demonstrate the point of using custom models over generic models, and are specific only to this singular audio sample.<\/p>\n<p>The WER reductions are pretty significant! As you can see, although Amazon Transcribe\u2019s generic engine performed decently in transcribing the sample audio from the video gaming domain, the CLM we built using training data reduced the WER by roughly 5 percentage points. And the CLM built using training data and a tuning split reduced it by roughly 11 percentage points. These comparative results are unsurprising: the more relevant training and tuning a model undergoes, the more tailored it is to the specific domain and use case.<\/p>\n<p>For a qualitative visual comparison, we\u2019ve put a snippet of each CLM\u2019s output transcript alongside the original human-annotated reference transcript to compare the terms recognized by each model. 
We used green highlights to show the progressive accuracy improvements in each iteration.<\/p>\n<p>The following text is the human-generated ground truth reference transcript:<\/p>\n<blockquote>\n<p>The 2020 holiday season is right around the corner. And with the way that the year\u2019s been going, we can all hope for a little excitement around the next-gen video game consoles coming out soon. So, what\u2019s the difference in <span><strong>hardware specs<\/strong><\/span> between the upcoming Playstation 5 and Xbox Series X? Well, let\u2019s take a look under the hoods of each next-gen gaming <span><strong>console<\/strong><\/span>. The PS5 features an <span><strong>AMD Zen 2 CPU<\/strong> <\/span>with up to 3.5 GHz frequency. It sports<span><strong> an AMD Radeon<\/strong><\/span> GPU that <span><strong>touts<\/strong> <\/span>10.3 teraflops, running up to 2.23 GHz. Memory and storage respectively <span><strong>dial in<\/strong> <\/span>at 16 GB and 825 GB. The PS5 supports both PS The PS5 supports both 4K and <span><strong>8K resolution<\/strong> <\/span>screens. Meanwhile, the Xbox Series X also features an <span><strong>AMD Zen 2 CPU<\/strong><\/span>, but clocks in at <span><strong>3.8 GHz<\/strong> <\/span>instead. <span><strong>The console boasts a similar AMD custom<\/strong> <\/span>GPU with 12 teraflops and 1.825 GHz. Memory is the same as that of the PS5\u2019s, <span><strong>coming in at<\/strong> <\/span>16 GB. But the default storage is where the system has an edge, bringing out a massive 1 TB hard drive. Like the PS5, the <span><strong>Series X<\/strong> <\/span>also supports 4K and 8K resolution screens as well. Those of course, are just the numbers. Therefore, it remains to be seen exactly how the performance plays out in practice. It\u2019s worth noting that both systems have incorporated ray-tracing technology, something that\u2019s used to make light and shadows look better in-game. 
Both systems also offer 3D audio output for immersive experiences.<\/p>\n<\/blockquote>\n<p>The following text is the machine transcription output by Amazon Transcribe\u2019s generic speech engine:<\/p>\n<blockquote>\n<p>the 2020 holiday season is right around the corner. And with the way that the years been going, we can all hope for a little excitement around the next Gen video game consoles coming out soon. So what\u2019s the difference in <span><strong>heart respects<\/strong><\/span> between the upcoming PlayStation five and Xbox Series X? Well, let\u2019s take a look under the hood of each of these <span><strong>Consul\u2019s<\/strong><\/span>. The PS five features in <span><strong>a M descend to CPU<\/strong> <\/span>with up to 3.5 gigahertz frequency is sports <span><strong>and AM the radio on<\/strong> <\/span>GPU that <span><strong>tells<\/strong> <\/span>10.3 teraflops running up to 2.23 gigahertz memory and storage, respectively. <span><strong>Dahlin<\/strong> <\/span>at 16 gigabytes in 825 gigabytes. The PS five supports both PS. The PS five supports both four K and <span><strong>A K resolutions<\/strong><\/span>. Meanwhile, the Xbox Series X also features an <span><strong>AM descend to CPU<\/strong><\/span>, but clocks in at <span><strong>three point take. It hurts<\/strong> <\/span>instead, <span><strong>the cost. Almost a similar AMG custom<\/strong> <\/span>GPU, with 12 teraflops and 1.825 gigahertz memory, is the same as out of the PS. Fives <span><strong>come in at<\/strong> <\/span>16 gigabytes, but the default storage is where the system has an edge. Bring out a massive one terabyte hard drive. Like the PS five. The <span><strong>Siri\u2019s X<\/strong> <\/span>also supports four K and eight K resolution screens as well. Those, of course, are just the numbers. Therefore, it remains to be seen exactly how the performance plays out. In practice. 
It\u2019s worth noting that both systems have incorporated Ray tracing technology, something that\u2019s used to make light and shadows look better in game. Both systems also offer three D audio output for immersive experiences.<\/p>\n<\/blockquote>\n<p>The following text is the machine transcription output by CLM (base training, no tuning split):<\/p>\n<blockquote>\n<p>the 2020 holiday season is right around the corner. And with the way that the years been going, we can all hope for a little excitement around the next Gen videogame consoles coming out soon. So what\u2019s the difference in <span><strong>hardware specs<\/strong><\/span> between the upcoming PlayStation five and Xbox Series X? Well, let\u2019s take a look under the hood of each of these <span><strong>consoles<\/strong><\/span>. The PS five features <span><strong>an A M D Zen two CPU<\/strong> <\/span>with up to 3.5 gigahertz frequency it sports <span><strong>and am the radio<\/strong> <\/span>GPU. That <span><strong>tells<\/strong> <\/span>10.3 teraflops running up to 2.23 gigahertz memory and storage, respectively, <span><strong>Dahlin<\/strong> <\/span>at 16 gigabytes and 825 gigabytes, The PS five supports both PS. The PS five supports both four K and <span><strong>eight K resolutions<\/strong><\/span>. Meanwhile, the Xbox Series X also features an <span><strong>A M D Zen two CPU<\/strong><\/span>, but clocks in at 3.8 hertz. Instead. <span><strong>The console boasts a similar a AMD custom<\/strong> <\/span>GPU, with 12 teraflops and 1.825 hertz. Memory is the same as out of the PS five\u2019s <span><strong>come in at<\/strong> <\/span>16 gigabytes, but the default storage is where the system has an edge. Bring out a massive one terabyte hard drive. Like the PS five, the <span><strong>Series X<\/strong> <\/span>also supports four K and eight K resolution screens as well. Those, of course, are just the numbers. Therefore, it remains to be seen exactly how the performance plays out. In practice. 
It\u2019s worth noting that both systems have incorporated Ray tracing technology, something that\u2019s used to make light and shadows look better in game. Both systems also offer three D audio output for immersive experiences<\/p>\n<\/blockquote>\n<p>The following text is the machine transcription output by CLM (base training, with tuning split):<\/p>\n<blockquote>\n<p>the 2020 holiday season is right around the corner. And with the way that the year\u2019s been going, we can all hope for a little excitement around the next Gen video game consoles coming out soon. So what\u2019s the difference in <span><strong>hardware specs<\/strong> <\/span>between the upcoming PlayStation five and Xbox Series X? Well, let\u2019s take a look under the hoods of each of these <span><strong>consoles<\/strong><\/span>. The PS five features an <span><strong>AMG Zen two CPU<\/strong> <\/span>with up to 3.5 givers frequency it sports an <span><strong>AMD Radeon<\/strong><\/span> GPU that<span><strong> touts<\/strong><\/span> 10.3 teraflops running up to 2.23 gigahertz memory and storage, respectively. Dial in at 16 gigabytes and 825 gigabytes. The PS five supports both PS. The PS five supports both four K and eight K resolutions. Meanwhile, the Xbox Series X also features an <span><strong>AMG Zen two<\/strong><\/span> CPU, but clocks in at 3.8 had hurts. Instead,<span><strong> the console boasts a similar AMD custom<\/strong><\/span>. GPU, with 12 teraflops and 1.825 gigahertz memory is the same as out of the PS five\u2019s <span><strong>coming in at<\/strong><\/span> 16 gigabytes. But the default storage is where the system has an edge, bringing out a massive one terabyte hard drive. Like the PS five, the <span><strong>Series X<\/strong><\/span> also supports four K and eight K resolution screens as well. Those, of course, are just the numbers. Therefore, it remains to be seen exactly how the performance plays out. In practice. 
It\u2019s worth noting that both systems have incorporated Ray tracing technology, something that\u2019s used to make light and shadows look better in game. Both systems also offer three D audio output for immersive experiences.<\/p>\n<\/blockquote>\n<p>The difference in improvement varies according to your use case and the quality of your training data and tuning set. Experimentation is encouraged. In general, the more training and tuning that your custom model undergoes, the better the performance. CLM doesn\u2019t guarantee 100% accuracy, but it can offer significant performance improvements over generic speech recognition models.<\/p>\n<h2>Best practices<\/h2>\n<p>It\u2019s important to note that the resultant custom language model depends directly on what you use as your training dataset. All else equal, the more closely your training data represents real use cases, the more performant your custom model is. Moreover, more data is always preferred. For more information about the general guidelines, see <a href=\"https:\/\/docs.aws.amazon.com\/transcribe\/latest\/dg\/custom-language-models.html\" target=\"_blank\" rel=\"noopener noreferrer\">Improving Domain-Specific Transcription Accuracy with Custom Language Models<\/a>.<\/p>\n<p>Amazon Transcribe doesn\u2019t charge for CLM model training, so feel free to experiment. In a single AWS account, you can train up to 10 custom models to address different domains, use cases, or new training datasets. After you have your CLM, you can choose which transcription jobs use your CLM. You only incur an additional CLM charge for the transcription jobs in which you apply a custom language model.<\/p>\n<h2>Conclusion<\/h2>\n<p>CLMs can be a powerful capability when it comes to improving transcription accuracy for domain-specific use cases. 
The new feature is available in all <a href=\"https:\/\/aws.amazon.com\/about-aws\/global-infrastructure\/regions_az\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Regions<\/a> where Amazon Transcribe already operates. At the time of this writing, the feature only supports US English; additional language support will come with time. Start training your own custom models by visiting <a href=\"https:\/\/aws.amazon.com\/transcribe\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Transcribe<\/a> and checking out <a href=\"https:\/\/docs.aws.amazon.com\/transcribe\/latest\/dg\/custom-language-models.html\" target=\"_blank\" rel=\"noopener noreferrer\">Improving Domain-Specific Transcription Accuracy with Custom Language Models<\/a>.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-16317 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/24\/PaulZhao.jpg\" alt=\"\" width=\"101\" height=\"140\"> Paul Zhao<\/strong> is Lead Product Manager at AWS Machine Learning. He manages speech recognition services like Amazon Transcribe and Amazon Transcribe Medical. He was formerly a serial entrepreneur, having launched, operated, and exited two successful businesses in the areas of IoT and FinTech, respectively.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-16318 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/24\/VivekGovindan.jpg\" alt=\"\" width=\"100\" height=\"110\"><\/strong><strong>Vivek Govindan<\/strong> is Senior Software Development Engineer at AWS Machine Learning. 
Outside of work, Vivek is an ardent soccer fan.<\/p>\n<p>\u00a0<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/building-custom-language-models-to-supercharge-speech-to-text-performance-for-amazon-transcribe\/<\/p>\n","protected":false},"author":0,"featured_media":321,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/320"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=320"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/320\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/321"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=320"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=320"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=320"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}