{"id":1453,"date":"2022-01-07T03:18:31","date_gmt":"2022-01-07T03:18:31","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/01\/07\/anomaly-detection-with-isolation-forest-and-kernel-density-estimation\/"},"modified":"2022-01-07T03:18:31","modified_gmt":"2022-01-07T03:18:31","slug":"anomaly-detection-with-isolation-forest-and-kernel-density-estimation","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/01\/07\/anomaly-detection-with-isolation-forest-and-kernel-density-estimation\/","title":{"rendered":"Anomaly Detection with Isolation Forest and Kernel Density Estimation"},"content":{"rendered":"<div id=\"\">\n<p>Anomaly detection is the task of finding data points that deviate from the norm, that is, points that do not follow expected patterns. Such unusual data points are also called outliers or exceptions. Anomaly detection is important in a variety of fields because it yields valuable and actionable insights. 
An abnormality in an MR imaging scan, for instance, might indicate a tumorous region in the brain, while an anomalous readout from a manufacturing plant sensor could indicate a broken component.<\/p>\n<p>After going through this tutorial, you will be able to:<\/p>\n<ul>\n<li>Define and understand anomaly detection.<\/li>\n<li>Implement anomaly detection algorithms and interpret their results.<\/li>\n<li>See hidden patterns in data that may lead to anomalous behavior.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_13172\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13172\" class=\"size-full wp-image-13172\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/12\/katherine-chase-BzF1XBy5xOc-unsplash.jpg\" alt=\"\" width=\"800\"><\/p>\n<p id=\"caption-attachment-13172\" class=\"wp-caption-text\">Anomaly Detection with Isolation Forest and Kernel Density Estimation<br \/>Photo by <a href=\"https:\/\/unsplash.com\/photos\/BzF1XBy5xOc\">Katherine Chase<\/a>. Some rights reserved.<\/p>\n<\/div>\n<h2><strong>What is Anomaly Detection?<\/strong><\/h2>\n<p>An outlier is simply a data point that deviates considerably from the rest of the data points in a particular dataset. Anomaly detection, in turn, is the process of identifying these outliers.<\/p>\n<p>Large datasets may contain very complex patterns that cannot be detected by simply looking at the data. 
Studying anomaly detection is therefore of great significance when implementing a critical machine learning application.<\/p>\n<h2><strong>Types of Anomalies<\/strong><\/h2>\n<p>In data science, anomalies are commonly classified into three types. Understanding them correctly may have a big impact on how you handle anomalies.<\/p>\n<ul>\n<li><strong>Point or Global Anomalies: <\/strong>These are data points that differ significantly from the rest of the data, and they are the most common form of anomaly. Global anomalies are usually found very far from the mean or median of a data distribution.<\/li>\n<li><strong>Contextual or Conditional Anomalies: <\/strong>These anomalies have values that differ dramatically from those of the other data points in the same context. An anomaly in one dataset may not be an anomaly in another.<\/li>\n<li><strong>Collective Anomalies:<\/strong> Outlier objects that are tightly clustered because they share the same outlier character are referred to as collective outliers. For example, your server is not under a cyber-attack on a daily basis, so a cyber-attack would be considered an outlier.<\/li>\n<\/ul>\n<p>While there are a number of techniques for anomaly detection, let\u2019s implement a few to understand how they can be used for various use cases.<\/p>\n<h2>Isolation Forest<\/h2>\n<p>Just like random forests, <strong>isolation forests<\/strong> are built using decision trees. They are trained in an unsupervised fashion, as there are no pre-defined labels. Isolation forests were designed with the idea that anomalies are \u201cfew and distinct\u201d data points in a dataset.<\/p>\n<p>Recall that decision trees are built using information criteria such as the Gini index or entropy. The obviously different groups are separated at the root of the tree, and the subtler distinctions are identified deeper in the branches. 
An isolation forest instead processes randomly subsampled data in a tree structure, splitting on randomly picked features. Samples that reach deeper into the tree and require more cuts to separate are very unlikely to be anomalies. Conversely, samples that end up on the shorter branches of the tree are more likely to be anomalies, since the tree found it easier to separate them from the rest of the data.<\/p>\n<p>In this section, we will implement an isolation forest in Python to understand how it detects anomalies in a dataset. The scikit-learn library provides a ready-made implementation, so we will use it to apply an isolation forest and demonstrate its effectiveness for anomaly detection.<\/p>\n<p>First off, let\u2019s load the necessary libraries and packages.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b69953482185122\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nfrom sklearn.datasets import make_blobs<br \/>\nfrom numpy import quantile, random, where<br \/>\nfrom sklearn.ensemble import IsolationForest<br \/>\nimport matplotlib.pyplot as plt<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">make_blobs<\/span><\/p>\n<p><span class=\"crayon-e\">from 
<\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">quantile<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">where<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">ensemble <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">IsolationForest<\/span><\/p>\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">matplotlib<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">pyplot <\/span><span class=\"crayon-st\">as<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">plt<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<h3><strong>Data Preparation<\/strong><\/h3>\n<p>We\u2019ll use the <code>make_blobs()<\/code> function to create a dataset with random data points.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b69959933236261\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nrandom.seed(3)<br \/>\nX, _ = make_blobs(n_samples=300, centers=1, cluster_std=.3, center_box=(20, 5))<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">seed<\/span><span 
class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">_<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">make_blobs<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_samples<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">300<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">centers<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">cluster_std<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">center_box<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">20<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>Let\u2019s plot the dataset to see how the data points are scattered in the sample space.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b6995a304094353\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nplt.scatter(X[:, 0], X[:, 1], marker=&quot;o&quot;, c=_, s=25, 
edgecolor=&quot;k&quot;)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">scatter<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">marker<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&quot;o&quot;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">c<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">_<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">s<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">25<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">edgecolor<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&quot;k&quot;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p><img 
decoding=\"async\" loading=\"lazy\" width=\"644\" height=\"410\" class=\"aligncenter size-full wp-image-13166\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/12\/anomaly-1.png\"><\/p>\n<h3><strong>Defining and Fitting the Isolation Forest Model for Prediction<\/strong><\/h3>\n<p>As mentioned, we\u2019ll use the <code>IsolationForest<\/code> class from scikit-learn to define our model. In the class arguments, we\u2019ll set the number of estimators and the contamination value, that is, the expected proportion of outliers in the data. Then we\u2019ll use the <code>fit_predict()<\/code> method to fit the model to the dataset and get a prediction for each sample: 1 for inliers and -1 for outliers.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b6995b800412033\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nIF = IsolationForest(n_estimators=100, contamination=.03)<br \/>\npredictions = IF.fit_predict(X)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-st\">IF<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">IsolationForest<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_estimators<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">100<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">contamination<\/span><span 
class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-cn\">03<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">predictions<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-st\">IF<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit_predict<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>Now, let\u2019s extract the samples predicted as -1 (the outliers) and plot the results with the anomalies highlighted in yellow.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b6995c208943553\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\noutlier_index = where(predictions==-1)<br \/>\nvalues = X[outlier_index]<\/p>\n<p>plt.scatter(X[:,0], X[:,1])<br \/>\nplt.scatter(values[:,0], values[:,1], color=&#39;y&#39;)<br \/>\nplt.show()<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-v\">outlier_index<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">where<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">predictions<\/span><span class=\"crayon-o\">==<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span 
class=\"crayon-v\">values<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-v\">outlier_index<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">scatter<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">scatter<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">values<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">values<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">color<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#39;y&#39;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">show<\/span><span 
class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"642\" height=\"420\" class=\"aligncenter size-full wp-image-13167\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/12\/anomaly-2.png\"><\/p>\n<p>Putting it all together, the complete code is as follows:<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b6995d613589404\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nfrom sklearn.datasets import make_blobs<br \/>\nfrom numpy import quantile, random, where<br \/>\nfrom sklearn.ensemble import IsolationForest<br \/>\nimport matplotlib.pyplot as plt<\/p>\n<p>random.seed(3)<br \/>\nX, _ = make_blobs(n_samples=300, centers=1, cluster_std=.3, center_box=(20, 5))<br \/>\nplt.scatter(X[:, 0], X[:, 1], marker=&quot;o&quot;, c=_, s=25, edgecolor=&quot;k&quot;)<\/p>\n<p>IF = IsolationForest(n_estimators=100, contamination=.03)<br \/>\npredictions = IF.fit_predict(X)<\/p>\n<p>outlier_index = where(predictions==-1)<br \/>\nvalues = X[outlier_index]<br \/>\nplt.scatter(X[:,0], X[:,1])<br \/>\nplt.scatter(values[:,0], values[:,1], color=&#39;y&#39;)<br \/>\nplt.show()<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td 
class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">make_blobs<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">quantile<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">where<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">ensemble <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">IsolationForest<\/span><\/p>\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">matplotlib<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">pyplot <\/span><span class=\"crayon-st\">as<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">plt<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">seed<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">_<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-e\">make_blobs<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_samples<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">300<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">centers<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">cluster_std<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">center_box<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">20<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">scatter<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">marker<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&quot;o&quot;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">c<\/span><span 
class=\"crayon-o\">=<\/span><span class=\"crayon-v\">_<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">s<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">25<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">edgecolor<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&quot;k&quot;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-st\">IF<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">IsolationForest<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_estimators<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">100<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">contamination<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-cn\">03<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">predictions<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-st\">IF<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit_predict<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">outlier_index<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">where<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">predictions<\/span><span class=\"crayon-o\">==<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">values<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-v\">outlier_index<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">scatter<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">scatter<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">values<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">values<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">color<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#39;y&#39;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">show<\/span><span class=\"crayon-sy\">(<\/span><span 
class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<h2><strong>Kernel Density Estimation<\/strong><\/h2>\n<p>If we assume that the normal data in a dataset follows a certain probability distribution, anomalies are the points that we should see only rarely, that is, with very low probability. Kernel density estimation is a technique that estimates the probability density function of the data points in a sample space. With the density function, we can detect anomalies as the points that fall in low-density regions.<\/p>\n<p>For the implementation, we\u2019ll prepare data from uniformly distributed random numbers and then apply the <code>KernelDensity<\/code> class from the scikit-learn library to detect outliers.<\/p>\n<p>To start, we\u2019ll load the necessary libraries and packages.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b6995f904127695\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nfrom sklearn.neighbors import KernelDensity<br \/>\nfrom numpy import where, random, array, quantile<br \/>\nfrom sklearn.preprocessing import scale<br \/>\nimport matplotlib.pyplot as plt<br \/>\nfrom sklearn.datasets import load_boston<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">neighbors <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">KernelDensity<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span 
class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">where<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-t\">array<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">quantile<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">preprocessing <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">scale<\/span><\/p>\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">matplotlib<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">pyplot <\/span><span class=\"crayon-st\">as<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">plt<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">load_boston<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<h3><strong>Prepare and Plot the Data<\/strong><\/h3>\n<p>Let\u2019s write a simple function to prepare the dataset. 
Randomly generated data will be used as the target dataset.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b69960566682920\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nrandom.seed(135)<br \/>\ndef prepData(n):<br \/>\n    X = []<br \/>\n    for i in range(n):<br \/>\n        A = i\/1000 + random.uniform(-4, 3)<br \/>\n        R = random.uniform(-5, 10)<br \/>\n        if(R &gt;= 8.6):<br \/>\n            R = R + 10<br \/>\n        elif(R &lt; (-4.6)):<br \/>\n            R = R +(-9)<br \/>\n        X.append([A + R])<br \/>\n    return array(X)<\/p>\n<p>n = 500<br \/>\nX = prepData(n)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">seed<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">135<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">prepData<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-st\">for<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-i\">i<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-st\">in<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">range<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">A<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">i<\/span><span class=\"crayon-o\">\/<\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">uniform<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">4<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">uniform<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-st\">if<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-o\">&gt;=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">8.6<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">10<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-e\">elif<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">&lt;<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">4.6<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">9<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">append<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">[<\/span><span 
class=\"crayon-v\">A<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\">\u00a0\u00a0 <\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-t\">array<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">n<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">500<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">prepData<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>Let\u2019s visualize the plot to check the dataset.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b69961579024670\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nx_ax = range(n)<br \/>\nplt.plot(x_ax, X)<br \/>\nplt.show()<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-v\">x_ax<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">range<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">plot<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">x_ax<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">show<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"624\" height=\"418\" class=\"aligncenter size-full wp-image-13168\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/12\/anomaly-3.png\"><\/p>\n<h3><strong>Prepare and Fit the Kernel Density Function for Prediction<\/strong><\/h3>\n<p>We\u2019ll use the scikit-learn API to prepare and fit the model, then call the <code>score_samples()<\/code> method to get the log-density score of each sample in the dataset. 
Next, we\u2019ll use NumPy\u2019s <code>quantile()<\/code> function to obtain the threshold value, taken here as the 2% quantile of the scores.<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b69962368918243\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nkern_dens = KernelDensity()<br \/>\nkern_dens.fit(X)<\/p>\n<p>scores = kern_dens.score_samples(X)<br \/>\nthreshold = quantile(scores, .02)<br \/>\nprint(threshold)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-v\">kern_dens<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">KernelDensity<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">kern_dens<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">kern_dens<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">score_samples<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">threshold<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-e\">quantile<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-cn\">02<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">threshold<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>Samples whose scores are less than or equal to the threshold are flagged as anomalies and highlighted in red on the plot:<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b69966857654774\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nidx = where(scores &lt;= threshold)<br \/>\nvalues = X[idx]<\/p>\n<p>plt.plot(x_ax, X)<br \/>\nplt.scatter(idx,values, color='r')<br \/>\nplt.show()<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-v\">idx<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">where<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">&lt;=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">threshold<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">values<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-v\">idx<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">plot<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">x_ax<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">scatter<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">idx<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-v\">values<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">color<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">'r'<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">show<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"622\" height=\"420\" class=\"aligncenter size-full wp-image-13169\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/12\/anomaly-4.png\"><\/p>\n<p>Putting it all together, the complete code is as follows:<\/p>\n<div id=\"urvanov-syntax-highlighter-61d7a80b69967312000535\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco 
urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nfrom sklearn.neighbors import KernelDensity<br \/>\nfrom numpy import where, random, array, quantile<br \/>\nfrom sklearn.preprocessing import scale<br \/>\nimport matplotlib.pyplot as plt<\/p>\n<p>random.seed(135)<br \/>\ndef prepData(n):<br \/>\n    X = []<br \/>\n    for i in range(n):<br \/>\n        A = i\/1000 + random.uniform(-4, 3)<br \/>\n        R = random.uniform(-5, 10)<br \/>\n        if(R &gt;= 8.6):<br \/>\n            R = R + 10<br \/>\n        elif(R &lt; (-4.6)):<br \/>\n            R = R +(-9)<br \/>\n        X.append([A + R])<br \/>\n    return array(X)<\/p>\n<p>n = 500<br \/>\nX = prepData(n)<\/p>\n<p>x_ax = range(n)<br \/>\nplt.plot(x_ax, X)<br \/>\nplt.show() <\/p>\n<p>kern_dens = KernelDensity()<br \/>\nkern_dens.fit(X)<\/p>\n<p>scores = kern_dens.score_samples(X)<br \/>\nthreshold = quantile(scores, .02)<br \/>\nprint(threshold)<\/p>\n<p>idx = where(scores &lt;= threshold)<br \/>\nvalues = X[idx]<br \/>\nplt.plot(x_ax, X)<br \/>\nplt.scatter(idx,values, color='r')<br \/>\nplt.show()<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div 
class=\"urvanov-syntax-highlighter-nums-content\">\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">neighbors <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">KernelDensity<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">where<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-t\">array<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">quantile<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">preprocessing <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">scale<\/span><\/p>\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">matplotlib<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">pyplot <\/span><span class=\"crayon-st\">as<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">plt<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">seed<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">135<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">prepData<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-st\">for<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">i<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-st\">in<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">range<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">A<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">i<\/span><span class=\"crayon-o\">\/<\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">uniform<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">4<\/span><span 
class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">uniform<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-st\">if<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">&gt;=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">8.6<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">10<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-e\">elif<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">&lt;<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">4.6<\/span><span 
class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">9<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">append<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-v\">A<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">+<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">R<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\">\u00a0\u00a0 <\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-t\">array<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">n<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">500<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">prepData<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span 
class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">x_ax<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">range<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">plot<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">x_ax<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">show<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">kern_dens<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">KernelDensity<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">kern_dens<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">kern_dens<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">score_samples<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">threshold<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">quantile<\/span><span 
class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-cn\">02<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">threshold<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">idx<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">where<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">&lt;=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">threshold<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">values<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-v\">idx<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">plot<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">x_ax<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">scatter<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">idx<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-v\">values<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">color<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">'r'<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span 
class=\"crayon-v\">plt<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">show<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<h2><b>Further Reading<\/b><\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3><b>APIs<\/b><\/h3>\n<h2><strong>Summary<\/strong><\/h2>\n<p>In this tutorial, you discovered how to detect anomalies in your dataset.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to define anomalies and their different types<\/li>\n<li>What is Isolation Forest and how to use it for anomaly detection<\/li>\n<li>What is Kernel Density Estimation and how to use it for anomaly detection<\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/machinelearningmastery.com\/anomaly-detection-with-isolation-forest-and-kernel-density-estimation\/<\/p>\n","protected":false},"author":0,"featured_media":1454,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1453"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1453"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1453\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1454"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1453"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1453"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}