{"id":2887,"date":"2023-02-24T19:48:30","date_gmt":"2023-02-24T19:48:30","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2023\/02\/24\/how-ai-is-transforming-genomics\/"},"modified":"2023-02-24T19:48:30","modified_gmt":"2023-02-24T19:48:30","slug":"how-ai-is-transforming-genomics","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2023\/02\/24\/how-ai-is-transforming-genomics\/","title":{"rendered":"How AI Is Transforming Genomics"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2023\/02\/24\/how-ai-is-transforming-genomics\/\" data-title=\"How AI Is Transforming Genomics\" data-hashtags=\"\">\n<p>Advancements in whole genome sequencing have ignited a revolution in digital biology.<\/p>\n<p>Genomics programs across the world are gaining momentum as the cost of high-throughput, next-generation sequencing has declined.<\/p>\n<p>Whether used for sequencing <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/02\/18\/guinness-world-record-fastest-dna-sequencing\/\" target=\"_blank\" rel=\"noopener\">critical-care patients with rare diseases<\/a> or in <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/01\/26\/uk-biobank-advances-genomics-research-clara-parabricks\/\" target=\"_blank\" rel=\"noopener\">population-scale genetics research<\/a>, whole genome sequencing is becoming a fundamental step in clinical workflows and drug discovery.<\/p>\n<p>But genome sequencing is just the first step. Analyzing genome sequencing data requires accelerated compute, data science and AI to read and understand the genome. 
With the <a href=\"https:\/\/venturebeat.com\/games\/jensen-huang-qa-why-moores-law-is-dead-and-smart-design-is-replacing-it\/\" target=\"_blank\" rel=\"noopener\">end of Moore\u2019s law<\/a>, the observation that the number of transistors in an integrated circuit doubles every two years, new computing approaches are necessary to lower the cost of data analysis, increase the throughput and accuracy of reads, and ultimately unlock the full potential of the human genome.<\/p>\n<h2><b>An Explosion in <\/b><b>Bioinformatics<\/b><b> Data<\/b><\/h2>\n<p>Sequencing an individual\u2019s whole genome generates roughly 100 gigabytes of raw data. That more than doubles after the genome is analyzed using complex algorithms and applications such as deep learning and natural language processing.<\/p>\n<p>As the cost of sequencing a human genome continues to decrease, volumes of sequencing data are increasing exponentially.<\/p>\n<p>An estimated <a href=\"https:\/\/www.genome.gov\/about-genomics\/fact-sheets\/Genomic-Data-Science\" target=\"_blank\" rel=\"noopener\">40 exabytes<\/a> will be required to store all human genome data by 2025. 
As a reference, that\u2019s 8x more storage than would be required to store every word spoken in history.<\/p>\n<p>Many genome analysis pipelines are <a href=\"https:\/\/www.nature.com\/articles\/s41576-022-00551-z\" target=\"_blank\" rel=\"noopener\">struggling to keep up<\/a> with the expansive levels of raw data being generated.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-62552\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/02\/genomics-graphic.png\" alt=\"\" width=\"512\" height=\"211\"><\/p>\n<h2><b>Accelerated <\/b><b>Genome Sequencing Analysis<\/b> <b>Workflows<\/b><\/h2>\n<p>Sequencing analysis is complicated and computationally intensive, with numerous steps required to identify genetic variants in a human genome.<\/p>\n<p>Deep learning is becoming important for base calling right within the genomic instrument using RNN- and convolutional neural network (CNN)-based models. Neural networks interpret image and signal data generated by instruments and infer the 3 billion nucleotide pairs of the human genome. This is improving the accuracy of the reads and ensuring that base calling occurs closer to real time, further hastening the entire genomics workflow, from sample to variant call format to final report.<\/p>\n<p>For secondary genomic analysis, alignment technologies use a reference genome to assist with piecing a genome back together after the sequencing of DNA fragments.<\/p>\n<p><a href=\"https:\/\/docs.nvidia.com\/clara\/parabricks\/4.0.0\/Documentation\/ToolDocs\/man_fq2bam.html#man-fq2bam\" target=\"_blank\" rel=\"noopener\">BWA-MEM<\/a>, a leading algorithm for alignment, is helping researchers rapidly map DNA sequence reads to a reference genome. 
<a href=\"https:\/\/docs.nvidia.com\/clara\/parabricks\/4.0.0\/Documentation\/ToolDocs\/man_starfusion.html#man-starfusion\" target=\"_blank\" rel=\"noopener\">STAR<\/a> is another gold-standard alignment algorithm used for RNA-seq data that delivers accurate, ultrafast alignment to better understand gene expression.<\/p>\n<p>The dynamic programming algorithm Smith-Waterman is also widely used for alignment, a step that\u2019s accelerated 35x on the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/technologies\/hopper-architecture\/\" target=\"_blank\" rel=\"noopener\">NVIDIA H100 Tensor Core GPU<\/a>, which includes a dynamic programming accelerator.<\/p>\n<h2><b>Uncovering <\/b><b>Genetic Variants<\/b><\/h2>\n<p>One of the most critical stages of sequencing projects is variant calling, where researchers identify differences between a patient\u2019s sample and the reference genome. This helps clinicians determine what genetic disease a critically ill patient might have, or helps researchers look across a population to discover new drug targets. These variants can be single-nucleotide changes, small insertions and deletions, or complex rearrangements.<\/p>\n<p>GPU-optimized and -accelerated callers such as the <a href=\"https:\/\/gatk.broadinstitute.org\/hc\/en-us\" target=\"_blank\" rel=\"noopener\">Broad Institute\u2019s GATK<\/a> \u2014 a genome analysis toolkit for germline variant calling \u2014 increase the speed of analysis. 
To help researchers remove false positives in GATK results, NVIDIA collaborated with the Broad Institute to introduce <a href=\"https:\/\/gatk.broadinstitute.org\/hc\/en-us\/articles\/10064202674971-Introducing-NVIDIA-s-NVScoreVariants-a-new-deep-learning-tool-for-filtering-variants-\" target=\"_blank\" rel=\"noopener\">NVScoreVariants<\/a>, a deep learning tool for filtering variants using CNNs.<\/p>\n<p>Deep learning-based variant callers such as Google\u2019s <a href=\"https:\/\/docs.nvidia.com\/clara\/parabricks\/4.0.0\/Documentation\/ToolDocs\/man_deepvariant.html#man-deepvariant\" target=\"_blank\" rel=\"noopener\">DeepVariant<\/a> increase accuracy of calls, without the need for a separate filtering step. DeepVariant uses a CNN architecture to call variants. It can be retrained to fine-tune for enhanced accuracy with each genomic platform\u2019s outputs.<\/p>\n<p>Secondary analysis software in the <a href=\"https:\/\/www.nvidia.com\/en-us\/clara\/genomics\/\" target=\"_blank\" rel=\"noopener\">NVIDIA Clara Parabricks<\/a> suite of tools has accelerated these variant callers <a href=\"https:\/\/www.biorxiv.org\/content\/biorxiv\/early\/2022\/07\/21\/2022.07.20.498972.full.pdf\" target=\"_blank\" rel=\"noopener\">up to 80x<\/a>. For example, germline HaplotypeCaller\u2019s runtime is reduced from 16 hours in a CPU-based environment to less than five minutes with GPU-accelerated Clara Parabricks.<\/p>\n<h2><b>Accelerating the Next Wave of Genomics<\/b><\/h2>\n<p>NVIDIA is helping to enable the next wave of genomics by powering both short- and long-read sequencing platforms with accelerated AI base calling and variant calling. 
Industry leaders and startups are working with NVIDIA to push the boundaries of whole genome sequencing.<\/p>\n<p>For example, biotech company <a href=\"https:\/\/www.pacb.com\/\" target=\"_blank\" rel=\"noopener\">PacBio<\/a> recently announced the <a href=\"https:\/\/www.pacb.com\/press_releases\/pacbio-announces-revio-a-revolutionary-new-long-read-sequencing-system-designed-to-provide-15-times-more-hifi-data-and-human-genomes-at-scale-for-under-1000\/\" target=\"_blank\" rel=\"noopener\">Revio<\/a> system, a new long-read sequencing system featuring NVIDIA Tensor Core GPUs. Enabled by a 20x increase in computing power relative to prior systems, Revio is designed to sequence human genomes with high-accuracy long reads at scale for under $1,000.<\/p>\n<p><a href=\"https:\/\/nanoporetech.com\/\" target=\"_blank\" rel=\"noopener\">Oxford Nanopore Technologies<\/a> offers the only single technology that can sequence any-length DNA or RNA fragments in real time. These features allow the rapid discovery of more genetic variation. 
Seattle Children\u2019s Hospital recently used the high-throughput nanopore sequencing instrument PromethION to understand a genetic disorder in the first few hours of a newborn\u2019s life.<\/p>\n<p><a href=\"https:\/\/www.ultimagenomics.com\/\" target=\"_blank\" rel=\"noopener\">Ultima Genomics<\/a> is offering high-throughput whole genome sequencing at just $100 per sample, and <a href=\"https:\/\/singulargenomics.com\/\" target=\"_blank\" rel=\"noopener\">Singular Genomics<\/a>\u2019 G4 is the most powerful benchtop system.<\/p>\n<h2><b>Learn More<\/b><\/h2>\n<p>At <a href=\"https:\/\/www.nvidia.com\/gtc\/\" target=\"_blank\" rel=\"noopener\">NVIDIA GTC<\/a>, a free AI conference taking place online March 20-23, speakers from PacBio, Oxford Nanopore, Genomics England, KAUST, Stanford, Argonne National Laboratory and other leading institutions will share the <a href=\"https:\/\/register.nvidia.com\/flow\/nvidia\/gtcspring2023\/attendeeportal\/page\/sessioncatalog?tab.catalogallsessionstab=16566177511100015Kus&amp;search.topic=1652219616233002cRyq\" target=\"_blank\" rel=\"noopener\">latest AI advances in genomic sequencing<\/a>, analysis and genomic <a href=\"https:\/\/blogs.nvidia.com\/blog\/2023\/01\/26\/what-are-large-language-models-used-for\/\" target=\"_blank\" rel=\"noopener\">large language models<\/a> for understanding gene expression.<\/p>\n<p>The conference features a <a href=\"https:\/\/www.nvidia.com\/gtc\/keynote\/\" target=\"_blank\" rel=\"noopener\">keynote from NVIDIA founder and CEO Jensen Huang<\/a> on Tuesday, March 21, at 8 a.m. PT.<\/p>\n<p><a href=\"https:\/\/www.nvidia.com\/en-us\/clara\/genomics\/\" target=\"_blank\" rel=\"noopener\">NVIDIA Clara Parabricks<\/a> is free for students and researchers. 
<a href=\"https:\/\/www.nvidia.com\/en-us\/clara\/genomics\/free-trial-vs-buy\/\" target=\"_blank\" rel=\"noopener\">Get started today<\/a> or <a href=\"https:\/\/www.nvidia.com\/en-us\/launchpad\/ai\/accelerated-genomic-analysis-with-clara-parabricks\/\" target=\"_blank\" rel=\"noopener\">try a free hands-on lab<\/a> to experience the toolkit in action.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2023\/02\/24\/how-ai-is-transforming-genomics\/<\/p>\n","protected":false},"author":0,"featured_media":2888,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2887"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=2887"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2887\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/2888"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=2887"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=2887"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=2887"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}