{"id":41,"date":"2020-08-17T07:53:40","date_gmt":"2020-08-17T07:53:40","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/08\/17\/expanding-scientific-portfolios-and-adapting-to-a-changing-world-with-amazon-personalize\/"},"modified":"2020-08-17T07:53:40","modified_gmt":"2020-08-17T07:53:40","slug":"expanding-scientific-portfolios-and-adapting-to-a-changing-world-with-amazon-personalize","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/08\/17\/expanding-scientific-portfolios-and-adapting-to-a-changing-world-with-amazon-personalize\/","title":{"rendered":"Expanding scientific portfolios and adapting to a changing world with Amazon Personalize"},"content":{"rendered":"<div id=\"\">\n<p><em>This is a guest blog post by David A. Smith at Thermo Fisher. In their own words, \u201cThermo Fisher Scientific is the world leader in serving science. Our Mission is to enable our customers to make the world healthier, cleaner, and safer. Whether our customers are accelerating life sciences research, solving complex analytical challenges, improving patient diagnostics and therapies, or increasing productivity in their laboratories, we are here to support them\u201d<\/em><\/p>\n<p>Researchers in Life Sciences perform increasingly complex work in an industry that\u2019s changing at an accelerated pace. With the recent focus on the COVID-19 pandemic, scientists around the world are under the microscope as they work to deliver a cure. At Thermo Fisher, our driving principle is to provide these researchers, and others like them, the tools and materials they need to study the world\u2019s most pressing problems.<\/p>\n<p>The specialized products we sell have always necessitated personalized customer experiences. We sell nearly every type of product related to scientific work, from everyday essentials like labware and chemical reagents to specialized instrumentation for genetic sequencing. 
Our goal is to let our customers know that they can get everything they need at Thermo Fisher. Traditionally, we approached this problem via dedicated commercial sales teams trained to handle specific products. In today\u2019s world, customer data comes from many different touchpoints, which makes it increasingly difficult for our sales teams to understand which products their customers need to do their research.<\/p>\n<p>Over the last three years, my team has maintained a custom portal for these sales teams where they can see data for every part of their customers\u2019 journey. This fast-moving environment presents a unique opportunity for us to use data science to deliver personalized product recommendations that target the right products for the right customers at the right time.<\/p>\n<p>In this post, we discuss how and why we decided to use <a href=\"https:\/\/aws.amazon.com\/personalize\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Personalize<\/a> and how that decision has empowered our team to deliver highly personalized, multi-channel content in an ever-developing ecosystem.<\/p>\n<h2>First-generation recommendations<\/h2>\n<p>Our team initially developed a rules-based recommendation system based on content curated by in-house scientists and run using SQL queries within our <a href=\"https:\/\/aws.amazon.com\/redshift\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Redshift<\/a> cluster.<\/p>\n<p>We had this system in place for a year, and it worked well, but as our data volume grew, our team was spending more and more time maintaining the system. We felt that our current infrastructure wasn\u2019t keeping up, and we wanted to migrate to a completely serverless infrastructure for improved scalability and fault tolerance. 
The following diagram illustrates our existing recommendations infrastructure.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-13989\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/07\/22\/thermo-fisher-personalize-4.jpg\" alt=\"\" width=\"800\" height=\"154\"><\/p>\n<p>Another risk we identified was that these recommendations relied on an internal content creation process to understand where products fit in the customer journey. Although this was a powerful tool, we struggled to provide high-quality recommendations for new or recently introduced products. This is a classic \u201ccold-start\u201d problem for recommender systems, and one of our requirements for any new system was that it could surface new items without additional maintenance.<\/p>\n<h2>Custom recommendations<\/h2>\n<p>Our team initially looked at third-party vendors to help improve our recommendations. However, we found that purchasing a solution would be costly to implement and force us to sacrifice some of the flexibility required to operate in a commercial organization. We quickly decided against buying an off-the-shelf solution.<\/p>\n<p>The consensus was that we would build a custom machine learning (ML)-based system from scratch. We explored a few different options, including hierarchical recurrent neural network (HRNN) models. Eventually, we settled on a factorization machine model as the best combination of performance, ease of implementation, and scalability.<\/p>\n<h2>Personalized recommendations<\/h2>\n<p>About 8 weeks later, we were wrapping up the initial phases of model development and validation. The new system was performing well. We had significantly improved our predictions, and we were getting good feedback from some sample recommendations we had sent out.<\/p>\n<p>We were gearing up to productionize our new solution when our team learned about Amazon Personalize. 
It was immediately apparent to us that Amazon Personalize had the ideal balance of flexibility, scalability, and measurability we were looking for when we had evaluated off-the-shelf solutions 2 months prior.<\/p>\n<p>We decided to run some initial tests with Amazon Personalize to see how it performed on real data and get a feel for how much effort would be required to implement it. It took 2 days to prepare the data, train a model, and begin generating high-quality recommendations.<\/p>\n<h2>Bringing the test together<\/h2>\n<p>As a team that had recently planned for 4\u20136 weeks spent deploying our custom model into production, this was very attractive. For me, the data scientist responsible for successfully designing, building, and evaluating a completely homegrown solution, it was less attractive. I was excited about finally deploying our custom solution, and I was proud of its performance. We eventually decided to put the two models head-to-head, with the winner determined by the best combination of model performance, scalability, and flexibility.<\/p>\n<p>Like any proud parent, I immediately set out to prove the custom model was better. I designed 32 tests for each model, and, over the next week, I ran each test on over 100 different slices of data to see which performed better on a holdout dataset. The deeper and more expressive neural network models provided by Amazon Personalize did a better job of predicting user behavior over roughly 80% of the testing criteria.<\/p>\n<p>If you\u2019re a data scientist, this story might make you cringe, but it has a happy ending. Designing this testing process forced me to examine our data even more deeply and creatively than I had while building our custom recommendation system. 
I was able to rapidly test all the different hypotheses and use the results to develop a deep understanding of each model\u2019s relative strengths and weaknesses related to the business problem we initially set out to solve.<\/p>\n<p>Our team couldn\u2019t have performed such a thorough analysis if we were also managing the infrastructure required for deep learning models. As a team, we had the choice to either spend 6\u20138 weeks deploying our custom model or 2 weeks implementing a recommender system using Amazon Personalize.<\/p>\n<h2>Serverless infrastructure<\/h2>\n<p>Scalability and fault tolerance were our main priorities when designing the infrastructure for our scientific product recommendations. We also wanted a system that would allow us to visually monitor progress and track errors.<\/p>\n<p>We opted to use <a href=\"https:\/\/aws.amazon.com\/step-functions\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Step Functions<\/a> to build the backbone of our recommendations inference pipeline with customized <a href=\"http:\/\/aws.amazon.com\/lambda\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Lambda<\/a> functions to pull data from our Amazon Redshift cluster, prepare the datasets for ingestion by Amazon Personalize, and trigger and monitor Amazon Personalize jobs. The following graph illustrates this inference pipeline.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-13990\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/07\/22\/thermo-fisher-personalize-5.gif\" alt=\"\" width=\"900\" height=\"982\"><\/p>\n<h2>Flexibility in a changing world<\/h2>\n<p>Like many companies, our customers changed their habits significantly when the COVID-19 pandemic struck and businesses around the world shifted to work-from-home policies. 
There was a new demand to increase multi-channel targeting using email advertising campaigns.<\/p>\n<p>Our team received a request to use the recommendation system we built with Amazon Personalize for targeted product email recommendations. Although we had never planned for this, it only took us a week to take our existing serverless inference pipeline and modify it to build, test, and validate an entirely new inference pipeline tuned specifically to email recommendations. Pivoting quickly is always challenging, but our commitment to building scalable and flexible infrastructure allowed us to overcome many of the challenges traditionally faced by teams when managing ML deployments and infrastructure. The following diagram illustrates the architecture of the email inference pipeline.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-13991\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/07\/22\/thermo-fisher-personalize-6.jpg\" alt=\"\" width=\"1000\" height=\"385\"><\/p>\n<p>Despite the short turnaround time, the emails we\u2019ve sent out following these recommendations have performed significantly better than previous baselines.<\/p>\n<p>Looking back, it\u2019s clear to me that we would have had significantly more difficulty meeting this request if we had opted to deploy our custom factorization machine model instead of using Amazon Personalize.<\/p>\n<h2>Conclusion<\/h2>\n<p>Thermo Fisher is constantly striving to help scientists around the world solve some of our greatest challenges. With Amazon Personalize, we\u2019ve dramatically improved our ability to understand the work our customers do and serve them personalized experiences via multiple channels. Using Amazon Personalize has allowed us to focus on solving difficult problems instead of managing ML infrastructure.<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p>David A. 
Smith is a data scientist for Thermo Fisher Scientific based out of Carlsbad, California. He works with cross-organizational teams to design, build, and deploy automated models to drive customer intelligence and create business value. His interests include NLP, serverless ML, and blockchain technology. Outside of work, you can find David rock climbing, playing tennis, or swimming with his dog.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/expanding-scientific-portfolios-and-adapting-to-a-changing-world-with-amazon-personalize\/<\/p>\n","protected":false},"author":1,"featured_media":42,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/41"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=41"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/41\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/42"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=41"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=41"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?p
ost=41"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}