{"id":420,"date":"2020-10-18T20:13:08","date_gmt":"2020-10-18T20:13:08","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/10\/18\/softmax-activation-function-with-python\/"},"modified":"2020-10-18T20:13:08","modified_gmt":"2020-10-18T20:13:08","slug":"softmax-activation-function-with-python","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/10\/18\/softmax-activation-function-with-python\/","title":{"rendered":"Softmax Activation Function with Python"},"content":{"rendered":"<div id=\"\">\n<p><strong>Softmax<\/strong> is a mathematical function that converts a vector of numbers into a vector of probabilities, where the probabilities of each value are proportional to the relative scale of each value in the vector.<\/p>\n<p>The most common use of the softmax function in applied machine learning is in its use as an activation function in a neural network model. Specifically, the network is configured to output N values, one for each class in the classification task, and the softmax function is used to normalize the outputs, converting them from weighted sum values into probabilities that sum to one. 
Each value in the output of the softmax function is interpreted as the probability of membership for each class.<\/p>\n<p>In this tutorial, you will discover the softmax activation function used in neural network models.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Linear and Sigmoid activation functions are inappropriate for multi-class classification tasks.<\/li>\n<li>Softmax can be thought of as a softened version of the argmax function that returns the index of the largest value in a list.<\/li>\n<li>How to implement the softmax function from scratch in Python and how to convert the output into a class label.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_10604\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-10604\" loading=\"lazy\" class=\"size-full wp-image-10604\" src=\"https:\/\/3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com\/wp-content\/uploads\/2020\/07\/Softmax-Activation-Function-with-Python.jpg\" alt=\"Softmax Activation Function with Python\" width=\"800\" height=\"450\"><\/p>\n<p id=\"caption-attachment-10604\" class=\"wp-caption-text\">Softmax Activation Function with Python<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/ian-arlett\/36340268755\/\">Ian D. Keating<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Predicting Probabilities With Neural Networks<\/li>\n<li>Max, Argmax, and Softmax<\/li>\n<li>Softmax Activation Function<\/li>\n<\/ol>\n<h2>Predicting Probabilities With Neural Networks<\/h2>\n<p>Neural network models can be used to model classification predictive modeling problems.<\/p>\n<p>Classification problems are those that involve predicting a class label for a given input. A standard approach to modeling classification problems is to use a model to predict the probability of class membership. 
That is, given an example, what is the probability of it belonging to each of the known class labels?<\/p>\n<ul>\n<li>For a binary classification problem, a <a href=\"https:\/\/machinelearningmastery.com\/discrete-probability-distributions-for-machine-learning\/\">Binomial probability distribution<\/a> is used. This is achieved using a network with a single node in the output layer that predicts the probability of an example belonging to class 1.<\/li>\n<li>For a multi-class classification problem, a <a href=\"https:\/\/machinelearningmastery.com\/discrete-probability-distributions-for-machine-learning\/\">Multinomial probability distribution<\/a> is used. This is achieved using a network with one node for each class in the output layer, where the predicted probabilities sum to one.<\/li>\n<\/ul>\n<p>A neural network model requires an activation function in the output layer of the model to make the prediction.<\/p>\n<p>There are different activation functions to choose from; let\u2019s look at a few.<\/p>\n<h3>Linear Activation Function<\/h3>\n<p>One approach to predicting class membership probabilities is to use a linear activation.<\/p>\n<p>A linear activation function simply passes through the weighted sum of the inputs to the node, which is the quantity that any activation function takes as its input. As such, it is often referred to as \u201c<em>no activation function<\/em>\u201d as no additional transformation is performed.<\/p>\n<p>Recall that a <a href=\"https:\/\/machinelearningmastery.com\/what-is-probability\/\">probability<\/a> is a numeric value between 0 and 1.<\/p>\n<p>Given that no transformation is performed on the weighted sum of the input, it is possible for the linear activation function to output any numeric value. 
This makes the linear activation function inappropriate for predicting probabilities for either the binomial or multinomial case.<\/p>\n<h3>Sigmoid Activation Function<\/h3>\n<p>Another approach to predicting class membership probabilities is to use a sigmoid activation function.<\/p>\n<p>This function is also called the logistic function. Regardless of the input, the function always outputs a value between 0 and 1. The function is S-shaped, with the midpoint of the \u201c<em>S<\/em>\u201d at an output of 0.5, which corresponds to an input of 0.<\/p>\n<p>This maps large positive weighted sums to values close to 1.0 and large negative weighted sums to values close to 0.0.<\/p>\n<p>The sigmoid activation is well suited to binary classification problems where the output is interpreted as a Binomial probability distribution.<\/p>\n<p>The sigmoid activation function can also be used as an activation function for multi-class classification problems where classes are not mutually exclusive. 
This is often referred to as multi-label classification rather than multi-class classification.<\/p>\n<p>The sigmoid activation function is not appropriate for multi-class classification problems with mutually exclusive classes where a multinomial probability distribution is required.<\/p>\n<p>Instead, an alternative activation function is required, called the <strong>softmax function<\/strong>.<\/p>\n<h2>Max, Argmax, and Softmax<\/h2>\n<h3>Max Function<\/h3>\n<p>The maximum, or \u201c<em>max<\/em>,\u201d mathematical function returns the largest numeric value in a list of numeric values.<\/p>\n<p>We can implement this using the <em>max()<\/em> Python function; for example:<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c59406218746\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# example of the max of a list of numbers<br \/>\n# define data<br \/>\ndata = [1, 3, 2]<br \/>\n# calculate the max of the list<br \/>\nresult = max(data)<br \/>\nprint(result)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># example of the max of a list of numbers<\/span><\/p>\n<p><span class=\"crayon-p\"># define data<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-p\"># calculate the max of the list<\/span><\/p>\n<p><span class=\"crayon-v\">result<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">max<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">result<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0002 seconds] --><\/p>\n<p>Running the example returns the largest value \u201c3\u201d from the list of numbers.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<h3>Argmax Function<\/h3>\n<p>The argmax, or \u201c<em>arg max<\/em>,\u201d mathematical function returns the index in the list that contains the largest value.<\/p>\n<p>Think of it as the meta version of max: one level of indirection above max, pointing to the position in the list that has the max value rather than the value itself.<\/p>\n<p>We can implement this using the <a href=\"https:\/\/docs.scipy.org\/doc\/numpy\/reference\/generated\/numpy.argmax.html\">argmax() NumPy function<\/a>; for example:<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c5f425684376\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# example of the argmax of a list of numbers<br 
\/>\nfrom numpy import argmax<br \/>\n# define data<br \/>\ndata = [1, 3, 2]<br \/>\n# calculate the argmax of the list<br \/>\nresult = argmax(data)<br \/>\nprint(result)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># example of the argmax of a list of numbers<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">argmax<\/span><\/p>\n<p><span class=\"crayon-p\"># define data<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-p\"># calculate the argmax of the list<\/span><\/p>\n<p><span class=\"crayon-v\">result<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">argmax<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">result<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0002 seconds] --><\/p>\n<p>Running the example returns the list index value \u201c1\u201d that points to the array index [1] that contains the largest value in the 
list \u201c3\u201d.<\/p>\n<h3>Softmax Function<\/h3>\n<p>The softmax, or \u201c<em>soft max<\/em>,\u201d mathematical function can be thought of as a probabilistic or \u201c<em>softer<\/em>\u201d version of the argmax function.<\/p>\n<blockquote>\n<p>The term softmax is used because this activation function represents a smooth version of the winner-takes-all activation model in which the unit with the largest input has output +1 while all other units have output 0.<\/p>\n<\/blockquote>\n<p>\u2014 Page 238, <a href=\"https:\/\/amzn.to\/2TQYuDo\">Neural Networks for Pattern Recognition<\/a>, 1995.<\/p>\n<p>From a probabilistic perspective, the <em>argmax()<\/em> result of 1 from the previous section can be written as the vector [0, 1, 0] for the list [1, 3, 2]: full weight is given to index 1, which holds the largest value, and no weight is given to index 0 or index 2.<\/p>\n<p>What if we were less sure and wanted to express the argmax probabilistically, with likelihoods?<\/p>\n<p>This can be achieved by scaling the values in the list and converting them into probabilities such that all values in the returned list sum to 1.0.<\/p>\n<p>Specifically, we calculate the exponential of each value in the list and divide it by the sum of the exponentials of all values.<\/p>\n<ul>\n<li>probability = exp(value) \/ sum of exp(v) for each v in the list<\/li>\n<\/ul>\n<p>For example, we can turn the first value \u201c1\u201d in the list [1, 3, 2] into a probability as follows:<\/p>\n<ul>\n<li>probability = exp(1) \/ (exp(1) + exp(3) + exp(2))<\/li>\n<li>probability = 2.718281828459045 \/ (2.718281828459045 + 20.085536923187668 + 7.38905609893065)<\/li>\n<li>probability = 2.718281828459045 \/ 30.19287485057736<\/li>\n<li>probability = 0.09003057317038046<\/li>\n<\/ul>\n<p>We can demonstrate this for each value in the list [1, 3, 2] in Python as 
follows:<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c62576655471\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# transform values into probabilities<br \/>\nfrom math import exp<br \/>\n# calculate each probability<br \/>\np1 = exp(1) \/ (exp(1) + exp(3) + exp(2))<br \/>\np2 = exp(3) \/ (exp(1) + exp(3) + exp(2))<br \/>\np3 = exp(2) \/ (exp(1) + exp(3) + exp(2))<br \/>\n# report probabilities<br \/>\nprint(p1, p2, p3)<br \/>\n# report sum of probabilities<br \/>\nprint(p1 + p2 + p3)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># transform values into probabilities<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">math <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">exp<\/span><\/p>\n<p><span class=\"crayon-p\"># calculate each probability<\/span><\/p>\n<p><span class=\"crayon-v\">p1<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">\/<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">p2<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">\/<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">p3<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">\/<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-sy\">(<\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># report probabilities<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">p1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">p2<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">p3<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># report sum of probabilities<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">p1<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">p2<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">p3<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0005 seconds] --><\/p>\n<p>Running the example converts each value in the list into a probability and reports the values, then confirms that all probabilities sum to the value 1.0.<\/p>\n<p>We can see that most weight is put on index 1 (67 percent) with less weight on index 2 (24 percent) and even 
less on index 0 (9 percent).<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c63577273288\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n0.09003057317038046 0.6652409557748219 0.24472847105479767<br \/>\n1.0<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p>0.09003057317038046 0.6652409557748219 0.24472847105479767<\/p>\n<p>1.0<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>This is the softmax function.<\/p>\n<p>We can implement it as a function that takes a list of numbers and returns the softmax or multinomial probability distribution for the list.<\/p>\n<p>The example below implements the function and demonstrates it on our small list of numbers.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c64682593461\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# example of a function for calculating softmax for a list of numbers<br \/>\nfrom numpy import exp<\/p>\n<p># calculate the softmax of a vector<br \/>\ndef softmax(vector):<br \/>\n\te = exp(vector)<br \/>\n\treturn e \/ e.sum()<\/p>\n<p># define data<br \/>\ndata = [1, 3, 
2]<br \/>\n# convert list of numbers to a list of probabilities<br \/>\nresult = softmax(data)<br \/>\n# report the probabilities<br \/>\nprint(result)<br \/>\n# report the sum of the probabilities<br \/>\nprint(sum(result))<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># example of a function for calculating softmax for a list of numbers<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">exp<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># calculate the softmax of a vector<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">softmax<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">vector<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">e<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">exp<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">vector<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">e<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">\/<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">e<\/span><span 
class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">sum<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># define data<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-p\"># convert list of numbers to a list of probabilities<\/span><\/p>\n<p><span class=\"crayon-v\">result<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">softmax<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># report the probabilities<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">result<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># report the sum of the probabilities<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-e\">sum<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">result<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0004 seconds] --><\/p>\n<p>Running the example reports roughly the same numbers with minor differences in precision.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c65712611479\" class=\"urvanov-syntax-highlighter-syntax 
crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n[0.09003057 0.66524096 0.24472847]<br \/>\n1.0<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p>[0.09003057 0.66524096 0.24472847]<\/p>\n<p>1.0<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>Finally, we can use the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.special.softmax.html\">softmax() SciPy function<\/a> from the <em>scipy.special<\/em> module to calculate the softmax for an array or list of numbers, as follows:<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c66891484152\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# example of calculating the softmax for a list of numbers<br \/>\nfrom scipy.special import softmax<br \/>\n# define data<br \/>\ndata = [1, 3, 2]<br \/>\n# calculate softmax<br \/>\nresult = softmax(data)<br \/>\n# report the probabilities<br \/>\nprint(result)<br \/>\n# report the sum of the probabilities<br \/>\nprint(sum(result))<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td 
class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># example of calculating the softmax for a list of numbers<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">scipy<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">special <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">softmax<\/span><\/p>\n<p><span class=\"crayon-p\"># define data<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-p\"># calculate softmax<\/span><\/p>\n<p><span class=\"crayon-v\">result<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">softmax<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># report the probabilities<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">result<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># report the sum of the probabilities<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-e\">sum<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">result<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0002 seconds] --><\/p>\n<p>Running the example, again, we 
get very similar results with very minor differences in precision.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c67377892542\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n[0.09003057 0.66524096 0.24472847]<br \/>\n0.9999999999999997<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p>[0.09003057 0.66524096 0.24472847]<\/p>\n<p>0.9999999999999997<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>Now that we are familiar with the softmax function, let\u2019s look at how it is used in a neural network model.<\/p>\n<h2>Softmax Activation Function<\/h2>\n<p>The softmax function is used as the activation function in the output layer of neural network models that predict a multinomial probability distribution.<\/p>\n<p>That is, softmax is used as the activation function for multi-class classification problems where class membership is required on more than two class labels.<\/p>\n<blockquote>\n<p>Any time we wish to represent a probability distribution over a discrete variable with n possible values, we may use the softmax function. 
This can be seen as a generalization of the sigmoid function which was used to represent a probability distribution over a binary variable.<\/p>\n<\/blockquote>\n<p>\u2014 Page 184, <a href=\"https:\/\/amzn.to\/33iMC06\">Deep Learning<\/a>, 2016.<\/p>\n<p>The function can be used as an activation function for a hidden layer in a neural network, although this is less common. It may be used when the model internally needs to choose or weight multiple different inputs at a bottleneck or concatenation layer.<\/p>\n<blockquote>\n<p>Softmax units naturally represent a probability distribution over a discrete variable with k possible values, so they may be used as a kind of switch.<\/p>\n<\/blockquote>\n<p>\u2014 Page 196, <a href=\"https:\/\/amzn.to\/33iMC06\">Deep Learning<\/a>, 2016.<\/p>\n<p>In the Keras deep learning library with a three-class classification task, use of softmax in the output layer may look as follows:<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f8ca1b699c68586811498\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n...<br \/>\nmodel.add(Dense(3, activation='softmax'))<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">add<\/span><span class=\"crayon-sy\">(<\/span><span 
class=\"crayon-e\">Dense<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">activation<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">'softmax'<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>By definition, the softmax activation will output one value for each node in the output layer. The output values can be interpreted as probabilities, and they sum to 1.0.<\/p>\n<p>When modeling a multi-class classification problem, the data must be prepared. The target variable containing the class labels is first label encoded, meaning that an integer from 0 to N-1 is assigned to each class label, where N is the number of class labels.<\/p>\n<p>The label encoded (or integer encoded) target variables are then one-hot encoded. This is a probabilistic representation of the class label, much like the softmax output. A vector is created with one position for each class label. All positions are set to 0 (impossible) except the position of the true class label, which is set to 1 (certain).<\/p>\n<p>For example, three class labels will be integer encoded as 0, 1, and 2. 
These are then encoded to vectors as follows:<\/p>\n<ul>\n<li>Class 0: [1, 0, 0]<\/li>\n<li>Class 1: [0, 1, 0]<\/li>\n<li>Class 2: [0, 0, 1]<\/li>\n<\/ul>\n<p>This is called a <a href=\"https:\/\/machinelearningmastery.com\/why-one-hot-encode-data-in-machine-learning\/\">one-hot encoding<\/a>.<\/p>\n<p>It represents the expected multinomial probability distribution for each example, which is used as the target to correct the model under supervised learning.<\/p>\n<p>The softmax function will output a probability of class membership for each class label and attempt to best approximate the expected target for a given input.<\/p>\n<p>For example, if the integer encoded class 1 was expected for one example, the target vector would be:<\/p>\n<ul>\n<li>[0, 1, 0]<\/li>\n<\/ul>\n<p>The softmax output might look as follows, which puts the most weight on class 1 and less weight on the other classes.<\/p>\n<ul>\n<li>[0.09003057 0.66524096 0.24472847]<\/li>\n<\/ul>\n<p>The error between the expected and predicted multinomial probability distribution is often calculated using cross-entropy, and this error is then used to update the model. This is called the cross-entropy loss function.<\/p>\n<p>For more on cross-entropy for calculating the difference between probability distributions, see the tutorial:<\/p>\n<p>We may want to convert the probabilities back into an integer encoded class label.<\/p>\n<p>This can be achieved using the <em>argmax()<\/em> function, which returns the index of the largest value in the list. 
Given that the class labels are integer encoded from 0 to N-1, the argmax of the probabilities will always be the integer encoded class label.<\/p>\n<ul>\n<li>class integer = argmax([0.09003057 0.66524096 0.24472847])<\/li>\n<li>class integer = 1<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered the softmax activation function used in neural network models.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Linear and Sigmoid activation functions are inappropriate for multi-class classification tasks.<\/li>\n<li>Softmax can be thought of as a softened version of the argmax function that returns the index of the largest value in a list.<\/li>\n<li>How to implement the softmax function from scratch in Python and how to convert the output into a class label.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>Ask your questions in the comments below and I will do my best to answer.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/machinelearningmastery.com\/softmax-activation-function-with-python\/<\/p>\n","protected":false},"author":0,"featured_media":421,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/420"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=420"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/420\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/421"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=420"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=420"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=420"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}