{"id":2165,"date":"2022-06-20T06:38:38","date_gmt":"2022-06-20T06:38:38","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/06\/20\/the-kings-swedish-ai-rewrites-the-book-in-scandinavia\/"},"modified":"2022-06-20T06:38:38","modified_gmt":"2022-06-20T06:38:38","slug":"the-kings-swedish-ai-rewrites-the-book-in-scandinavia","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/06\/20\/the-kings-swedish-ai-rewrites-the-book-in-scandinavia\/","title":{"rendered":"The King\u2019s Swedish: AI Rewrites the Book in Scandinavia"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2022\/06\/19\/ai-sweden-nlp\/\" data-title=\"The King\u2019s Swedish: AI Rewrites the Book in Scandinavia\" data-hashtags=\"\">\n<p>If the King of Sweden wants help drafting his annual Christmas speech this year, he could ask the same AI model that\u2019s available to his 10 million subjects.<\/p>\n<p>As a test, researchers prompted the model, called GPT-SW3, to draft one of the royal messages, and it did a pretty good job, according to Magnus Sahlgren, who heads research in natural language understanding at AI Sweden, a consortium kickstarting the country\u2019s journey into the machine learning era.<\/p>\n<p>\u201cLater, our minister of digitalization visited us and asked the model to generate arguments for political positions and it came up with some really clever ones \u2014 and he intuitively understood how to prompt the model to generate good text,\u201d Sahlgren said.<\/p>\n<p>Early successes inspired work on an even larger and more powerful version of the language model they hope will serve any citizen, company or government agency in Scandinavia.<\/p>\n<h2><b>A Multilingual Model<\/b><\/h2>\n<p>The current version packs 3.6 billion parameters and is smart enough to do a few cool things in Swedish. 
Sahlgren\u2019s team aims to train a state-of-the-art model with a whopping 175 billion parameters that can handle all sorts of language tasks in the Nordic languages of Swedish, Danish, Norwegian and, it hopes, Icelandic, too.<\/p>\n<p>For example, a startup can use it to automatically generate product descriptions for an e-commerce website given only the products\u2019 names. Government agencies can use it to quickly classify and route questions from citizens.<\/p>\n<p>Companies can ask it to rapidly summarize reports so they can react fast. Hospitals can run distilled versions of the model privately on their own systems to improve patient care.<\/p>\n<p>\u201cIt\u2019s a foundational model we will provide as a service for whatever tasks people want to solve,\u201d said Sahlgren, who\u2019s been working at the intersection of language and machine learning since he earned his Ph.D. in computational linguistics in 2006.<\/p>\n<h2><b>Permission to Speak Freely<\/b><\/h2>\n<p>It\u2019s a capability increasingly seen as a strategic asset, a keystone of digital sovereignty in a world that speaks thousands of languages across nearly 200 countries.<\/p>\n<p>Most language services today focus on Chinese or English, the world\u2019s two most-spoken tongues. They\u2019re typically created in China or the U.S., and they aren\u2019t free.<\/p>\n<p>\u201cIt\u2019s important for us to have models built in Sweden for Sweden,\u201d Sahlgren said.<\/p>\n<h2><b>Small Team, Super System<\/b><\/h2>\n<p>\u201cWe\u2019re a small country and a core team of about six people, yet we can build a state-of-the-art resource like this for people to use,\u201d he added.<\/p>\n<p>That\u2019s because Sweden has a powerful engine in <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/03\/23\/ai-supercomputer-sweden\/\">Berzelius<\/a>, a 300-petaflops AI supercomputer at Link\u00f6ping University. 
It trained the initial GPT-SW3 model using just 16 of the 60 nodes in the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-superpod\/\">NVIDIA DGX SuperPOD<\/a>.<\/p>\n<p>The next model may exercise all the system\u2019s nodes. Such super-sized jobs require super software like the <a href=\"https:\/\/developer.nvidia.com\/nvidia-nemo\">NVIDIA NeMo Megatron framework<\/a>.<\/p>\n<p>\u201cIt lets us scale our training up to the full supercomputer, and we\u2019ve been lucky enough to have access to experts in the NeMo development team \u2014 without NVIDIA it would have been so much more complicated to come this far,\u201d he said.<\/p>\n<h2><b>A Workflow for Any Language<\/b><\/h2>\n<p>NVIDIA\u2019s engineers created a recipe based on NeMo and an emerging process called p-tuning that optimizes massive models fast, and it\u2019s geared to work with any language.<\/p>\n<p>In one early test, a model nearly doubled its accuracy after NVIDIA engineers applied the techniques.<\/p>\n<figure id=\"attachment_57747\" aria-describedby=\"caption-attachment-57747\" class=\"wp-caption alignleft\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/06\/Magnus-Sahlgren.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/06\/Magnus-Sahlgren-400x306.jpg\" alt=\"Magnus Sahlgren, AI Sweden\" width=\"400\" height=\"306\"><\/a><figcaption id=\"caption-attachment-57747\" class=\"wp-caption-text\">Magnus Sahlgren<\/figcaption><\/figure>\n<p>What\u2019s more, it requires one-tenth the data, slashing the need for tens of thousands of hand-labeled records. 
That opens the door for users to fine-tune a model with the relatively small, industry-specific datasets they have at hand.<\/p>\n<p>\u201cWe hope to inspire a lot of entrepreneurship in industry, startups and the public using our technology to develop their own apps and services,\u201d said Sahlgren.<\/p>\n<h2><strong>Writing the Next Chapter<\/strong><\/h2>\n<p>Meanwhile, NVIDIA\u2019s developers are already working on ways to make the enabling software better.<\/p>\n<p>One test shows great promise for transferring capabilities learned from widely available English datasets into models designed for any language. In another effort, they\u2019re using the p-tuning techniques in inference jobs so models can learn on the fly.<\/p>\n<p>Zenodia Charpy, a senior solutions architect at NVIDIA based in Gothenburg, shares the enthusiasm of the AI Sweden team she supports. \u201cWe\u2019ve only just begun trying new and better methods to tackle these large language challenges \u2014 there\u2019s much more to come,\u201d she said.<\/p>\n<p>The GPT-SW3 model will be made available by the end of the year via an early access program. 
To apply, contact francisca.hoyer@ai.se.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2022\/06\/19\/ai-sweden-nlp\/<\/p>\n","protected":false},"author":0,"featured_media":2166,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2165"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=2165"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2165\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/2166"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=2165"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=2165"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=2165"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}