{"id":474,"date":"2020-10-29T18:25:01","date_gmt":"2020-10-29T18:25:01","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/10\/29\/how-to-develop-a-random-subspace-ensemble-with-python\/"},"modified":"2020-10-29T18:25:01","modified_gmt":"2020-10-29T18:25:01","slug":"how-to-develop-a-random-subspace-ensemble-with-python","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/10\/29\/how-to-develop-a-random-subspace-ensemble-with-python\/","title":{"rendered":"How to Develop a Random Subspace Ensemble With Python"},"content":{"rendered":"<div id=\"\">\n<p><strong>Random Subspace Ensemble<\/strong> is a machine learning algorithm that combines the predictions from multiple decision trees trained on different subsets of columns in the training dataset.<\/p>\n<p>Randomly varying the columns used to train each contributing member of the ensemble has the effect of introducing diversity into the ensemble and, in turn, can lift performance over using a single decision tree.<\/p>\n<p>It is related to other ensembles of decision trees such as bootstrap aggregation (bagging) that creates trees using different samples of rows from the training dataset, and random forest that combines ideas from bagging and the random subspace ensemble.<\/p>\n<p>Although decision trees are often used, the general random subspace method can be used with any machine learning model whose performance varies meaningfully with the choice of input features.<\/p>\n<p>In this tutorial, you will discover how to develop random subspace ensembles for classification and regression.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Random subspace ensembles are created from decision trees fit on different samples of features (columns) in the training dataset.<\/li>\n<li>How to use the random subspace ensemble for classification and regression with scikit-learn.<\/li>\n<li>How to explore the effect of random subspace model 
hyperparameters on model performance.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_11151\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-11151\" loading=\"lazy\" class=\"size-full wp-image-11151\" src=\"https:\/\/3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com\/wp-content\/uploads\/2020\/11\/How-to-Develop-a-Random-Subspace-Ensemble-With-Python.jpg\" alt=\"How to Develop a Random Subspace Ensemble With Python\" width=\"800\" height=\"537\"><\/p>\n<p id=\"caption-attachment-11151\" class=\"wp-caption-text\">How to Develop a Random Subspace Ensemble With Python<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/mars_\/18002370619\/\">Marsel Minga<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Random Subspace Ensemble<\/li>\n<li>Random Subspace Ensemble via Bagging\n<ol>\n<li>Random Subspace Ensemble for Classification<\/li>\n<li>Random Subspace Ensemble for Regression<\/li>\n<\/ol>\n<\/li>\n<li>Random Subspace Ensemble Hyperparameters\n<ol>\n<li>Explore Number of Trees<\/li>\n<li>Explore Number of Features<\/li>\n<li>Explore Alternate Algorithm<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>Random Subspace Ensemble<\/h2>\n<p>A predictive modeling problem consists of one or more input variables and a target variable.<\/p>\n<p>A variable is a column in the data and is also often referred to as a feature. 
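<\/p>
<p>For example, each row of the training data is one example and each column is one feature; keeping only a subset of columns keeps every example but describes each one with fewer features. A minimal sketch, assuming a synthetic scikit-learn dataset and an arbitrary choice of columns 0, 3, and 7:<\/p>

```python
# Rows are examples, columns are features: each row of X is a point in a
# 20-dimensional feature space, and selecting a subset of columns projects
# every example onto a lower-dimensional subspace.
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=5)
print(X.shape)  # (1000, 20): 1,000 points in a 20-dimensional feature space

# keep only columns 0, 3 and 7: a 3-dimensional subspace of the input space
subspace = X[:, [0, 3, 7]]
print(subspace.shape)  # (1000, 3)
```

<p>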
We can consider all input features together as defining an n-dimensional vector space, where n is the number of input features and each example (input row of data) is a point in the feature space.<\/p>\n<p>This is a common conceptualization in machine learning and as input feature spaces become larger, the distance between points in the space increases, known generally as the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Curse_of_dimensionality\">curse of dimensionality<\/a>.<\/p>\n<p>A subset of input features can, therefore, be thought of as a subset of the input feature space, or a subspace.<\/p>\n<p>Selecting features is a way of defining a subspace of the input feature space. For example, <a href=\"https:\/\/machinelearningmastery.com\/feature-selection-with-real-and-categorical-data\/\">feature selection<\/a> refers to an attempt to reduce the number of dimensions of the input feature space by selecting a subset of features to keep or a subset of features to delete, often based on their relationship to the target variable.<\/p>\n<p>Alternatively, we can select random subsets of input features to define random subspaces. This can be used as the basis for an ensemble learning algorithm, where a model can be fit on each random subspace of features. This is referred to as a random subspace ensemble or the <strong>random subspace method<\/strong>.<\/p>\n<blockquote>\n<p>The training data is usually described by a set of features. Different subsets of features, or called subspaces, provide different views on the data. 
Therefore, individual learners trained from different subspaces are usually diverse.<\/p>\n<\/blockquote>\n<p>\u2014 Page 116, <a href=\"https:\/\/amzn.to\/2XZzrjG\">Ensemble Methods<\/a>, 2012.<\/p>\n<p>It was proposed by Tin Kam Ho in the 1998 paper titled \u201c<a href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/709601\/\">The Random Subspace Method For Constructing Decision Forests<\/a>\u201d, in which a decision tree is fit on each random subspace.<\/p>\n<p>More generally, it is a diversity technique for ensemble learning that belongs to a class of methods that change the training dataset for each model in an attempt to reduce the correlation between the predictions of the models in the ensemble.<\/p>\n<p>The procedure is as simple as selecting a random subset of input features (columns) for each model in the ensemble and fitting each model on all rows of the training dataset. It can be augmented with additional changes, such as also using a bootstrap or random sample of the rows in the training dataset.<\/p>\n<blockquote>\n<p>The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/709601\/\">The Random Subspace Method For Constructing Decision Forests<\/a>, 1998.<\/p>\n<p>As such, the random subspace ensemble is related to bootstrap aggregation (bagging), which introduces diversity by training each model, often a decision tree, on a different random sample of the training dataset, with replacement (e.g. the bootstrap sampling method). 
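<\/p>
<p>The procedure above can be sketched from scratch in a few lines of Python. This is an illustrative sketch only, not the exact algorithm from the paper; the choice of 10 trees with 5 features per tree is arbitrary:<\/p>

```python
# From-scratch sketch of the random subspace method: each tree is trained on
# all rows of the training data but only a random subset of the columns.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=5)

rng = np.random.default_rng(1)
n_trees, n_subspace_features = 10, 5
members = []
for _ in range(n_trees):
    # choose a random subspace: a subset of feature indices without duplicates
    features = rng.choice(X.shape[1], size=n_subspace_features, replace=False)
    tree = DecisionTreeClassifier().fit(X[:, features], y)
    members.append((features, tree))

# predict by majority vote across the ensemble members
votes = np.array([tree.predict(X[:, features]) for features, tree in members])
yhat = (votes.mean(axis=0) > 0.5).astype(int)
print('Training accuracy: %.3f' % (yhat == y).mean())
```

<p>Each member stores the column indices it was trained on, so the same projection of the feature space can be applied again at prediction time.<\/p>
<p>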
The random forest ensemble may also be considered a hybrid of both the bagging and random subspace ensemble methods.<\/p>\n<blockquote>\n<p>Algorithms that use different feature subsets are commonly referred to as random subspace methods \u2026<\/p>\n<\/blockquote>\n<p>\u2014 Page 21, <a href=\"https:\/\/amzn.to\/2C7syo5\">Ensemble Machine Learning<\/a>, 2012.<\/p>\n<p>The random subspace method can be used with any machine learning algorithm, although it is well suited to models that are sensitive to large changes to the input features, such as decision trees and k-nearest neighbors.<\/p>\n<p>It is appropriate for datasets that have a large number of input features, as it can result in good performance with good efficiency. If the dataset contains many irrelevant input features, it may be better to use feature selection as a data preparation technique, as the prevalence of irrelevant features in subspaces may hurt the performance of the ensemble.<\/p>\n<blockquote>\n<p>For data with a lot of redundant features, training a learner in a subspace will be not only effective but also efficient.<\/p>\n<\/blockquote>\n<p>\u2014 Page 116, <a href=\"https:\/\/amzn.to\/2XZzrjG\">Ensemble Methods<\/a>, 2012.<\/p>\n<p>Now that we are familiar with the random subspace ensemble, let\u2019s explore how we can implement the approach.<\/p>\n<h2>Random Subspace Ensemble via Bagging<\/h2>\n<p>We can implement the random subspace ensemble using bagging in scikit-learn.<\/p>\n<p>Bagging is provided via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.BaggingRegressor.html\">BaggingRegressor<\/a> and <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.BaggingClassifier.html\">BaggingClassifier<\/a> classes.<\/p>\n<p>We can configure bagging to be a random subspace ensemble by setting the \u201c<em>bootstrap<\/em>\u201d argument to \u201c<em>False<\/em>\u201d to turn off sampling of the training dataset rows and setting 
the maximum number of features to a given value via the \u201c<em>max_features<\/em>\u201d argument.<\/p>\n<p>The default model for bagging is a decision tree, but it can be changed to any model we like.<\/p>\n<p>We can demonstrate using bagging to implement a random subspace ensemble with decision trees for classification and regression.<\/p>\n<h3>Random Subspace Ensemble for Classification<\/h3>\n<p>In this section, we will look at developing a random subspace ensemble using bagging for a classification problem.<\/p>\n<p>First, we can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a> to create a synthetic binary classification problem with 1,000 examples and 20 input features.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre><code># test classification dataset\nfrom sklearn.datasets import make_classification\n# define dataset\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\n# summarize the dataset\nprint(X.shape, y.shape)<\/code><\/pre>\n<p>Running the example creates the dataset and summarizes the shape of the input and output components.<\/p>\n<pre><code>(1000, 20) (1000,)<\/code><\/pre>\n<p>Next, we can configure a bagging model to be a random subspace ensemble for decision trees on this dataset.<\/p>\n<p>Each model will be fit on a random subspace of 10 input features, chosen arbitrarily.<\/p>\n<pre><code>...\n# define the random subspace ensemble model\nmodel = BaggingClassifier(bootstrap=False, max_features=10)<\/code><\/pre>\n<p>We will evaluate the model using repeated stratified k-fold cross-validation, with three repeats and 10 folds. We will report the mean and standard deviation of the accuracy of the model across all repeats and folds.<\/p>\n<pre><code># evaluate random subspace ensemble via bagging for classification\nfrom numpy import mean\nfrom numpy import std\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import cross_val_score\nfrom sklearn.model_selection import RepeatedStratifiedKFold\nfrom sklearn.ensemble import BaggingClassifier\n# define dataset\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\n# define the random subspace ensemble model\nmodel = BaggingClassifier(bootstrap=False, max_features=10)\n# define the evaluation method\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\n# evaluate the model on the dataset\nn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\n# report performance\nprint('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/code><\/pre>\n<p>Running the example reports the mean and standard deviation of the accuracy of the model.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see the random subspace ensemble with default hyperparameters achieves a classification accuracy of about 85.4 percent on this test dataset.<\/p>\n<pre><code>Mean Accuracy: 0.854 (0.039)<\/code><\/pre>\n<p>We can also use the random subspace ensemble model as a final model and make predictions for classification.<\/p>\n<p>First, the ensemble is fit on all available data, then the <em>predict()<\/em> function can be called to make predictions on new data.<\/p>\n<p>The example below demonstrates this on our binary classification dataset.<\/p>\n<pre><code># make predictions using random subspace ensemble via bagging for classification\nfrom sklearn.datasets import make_classification\nfrom sklearn.ensemble import BaggingClassifier\n# define dataset\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)\n# define the model\nmodel = BaggingClassifier(bootstrap=False, max_features=10)\n# fit the model on the whole dataset\nmodel.fit(X, y)\n# make a single prediction\nrow = [[-4.7705504,-1.88685058,-0.96057964,2.53850317,-6.5843005,3.45711663,-7.46225013,2.01338213,-0.45086384,-1.89314931,-2.90675203,-0.21214568,-0.9623956,3.93862591,0.06276375,0.33964269,4.0835676,1.31423977,-2.17983117,3.1047287]]\nyhat = model.predict(row)\nprint('Predicted Class: %d' % yhat[0])<\/code><\/pre>\n<p>Running the example fits the random subspace ensemble model on the entire dataset and then uses it to make a prediction on a new row of data, as we might when using the model in an application.<\/p>\n<p>Now that we are familiar with using bagging for classification, let\u2019s look at the API for regression.<\/p>\n<h3>Random Subspace Ensemble for Regression<\/h3>\n<p>In this section, we will look at using 
bagging for a regression problem.<\/p>\n<p>First, we can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_regression.html\">make_regression() function<\/a> to create a synthetic regression problem with 1,000 examples and 20 input features.<\/p>\n<p>The complete example is listed below.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800a9377352924\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# test regression dataset<br \/>\nfrom sklearn.datasets import make_regression<br \/>\n# define dataset<br \/>\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=5)<br \/>\n# summarize the dataset<br \/>\nprint(X.shape, y.shape)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># test regression dataset<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">make<\/span><span class=\"crayon-sy\">_<\/span>regression<\/p>\n<p><span class=\"crayon-p\"># define dataset<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-e\">make_regression<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_samples<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">20<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_informative<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">15<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">noise<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0.1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># summarize the dataset<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">shape<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">shape<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0003 seconds] --><\/p>\n<p>Running the example creates the dataset and summarizes the shape of the input and output components.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>Next, we can evaluate a random subspace ensemble via bagging on this dataset.<\/p>\n<p>As before, we must configure bagging to use all rows of the training dataset and specify the number of input features to randomly select.<\/p>\n<p><!-- Urvanov Syntax 
Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800ab739598503\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n&#8230;<br \/>\n# define the model<br \/>\nmodel = BaggingRegressor(bootstrap=False, max_features=10)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># define the model<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">BaggingRegressor<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">bootstrap<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">False<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">max_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>As in the previous section, we will evaluate the model using repeated k-fold cross-validation, with three repeats and 10 folds. We will report the mean absolute error (MAE) of the model across all repeats and folds. The scikit-learn library makes the MAE negative so that it is maximized instead of minimized. 
This means that larger negative MAE values (those closer to zero) are better, and a perfect model has an MAE of 0.<\/p>\n<p>The complete example is listed below.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800ac365702849\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# evaluate random subspace ensemble via bagging for regression<br \/>\nfrom numpy import mean<br \/>\nfrom numpy import std<br \/>\nfrom sklearn.datasets import make_regression<br \/>\nfrom sklearn.model_selection import cross_val_score<br \/>\nfrom sklearn.model_selection import RepeatedKFold<br \/>\nfrom sklearn.ensemble import BaggingRegressor<br \/>\n# define dataset<br \/>\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=5)<br \/>\n# define the model<br \/>\nmodel = BaggingRegressor(bootstrap=False, max_features=10)<br \/>\n# define the evaluation procedure<br \/>\ncv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)<br \/>\n# evaluate the model<br \/>\nn_scores = cross_val_score(model, X, y, scoring=&#8217;neg_mean_absolute_error&#8217;, cv=cv, n_jobs=-1, error_score=&#8217;raise&#8217;)<br \/>\n# report performance<br \/>\nprint(&#8216;MAE: %.3f (%.3f)&#8217; % (mean(n_scores), std(n_scores)))<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div 
class=\"urvanov-syntax-highlighter-nums-content\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<p>17<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># evaluate random subspace ensemble via bagging for regression<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">mean<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">std<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">make_regression<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">cross_val_score<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">RepeatedKFold<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">ensemble <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">BaggingRegressor<\/span><\/p>\n<p><span class=\"crayon-p\"># define dataset<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">make_regression<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_samples<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">20<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_informative<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">15<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">noise<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0.1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># define the model<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">BaggingRegressor<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">bootstrap<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">False<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">max_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># define the evaluation procedure<\/span><\/p>\n<p><span class=\"crayon-v\">cv<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">RepeatedKFold<\/span><span 
class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_splits<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_repeats<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># evaluate the model<\/span><\/p>\n<p><span class=\"crayon-v\">n_scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">cross_val_score<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">scoring<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#8216;neg_mean_absolute_error&#8217;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">error_score<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#8216;raise&#8217;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># report performance<\/span><\/p>\n<p><span 
class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8216;MAE: %.3f (%.3f)&#8217;<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-e\">mean<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_scores<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">std<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_scores<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0006 seconds] --><\/p>\n<p>Running the example reports the mean and standard deviation MAE of the model.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that the random subspace ensemble achieves an MAE of about 114.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>We can also use the random subspace ensemble model as a final model and make predictions for regression.<\/p>\n<p>First, the ensemble is fit on all available data; then the predict() function can be called to make predictions on new data.<\/p>\n<p>The example below demonstrates this on our regression dataset.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800ae006124613\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# random subspace ensemble via bagging for making predictions for regression<br \/>\nfrom sklearn.datasets import make_regression<br \/>\nfrom sklearn.ensemble import BaggingRegressor<br \/>\n# define dataset<br \/>\nX, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=5)<br \/>\n# define the model<br \/>\nmodel = BaggingRegressor(bootstrap=False, max_features=10)<br \/>\n# fit the model on the whole dataset<br \/>\nmodel.fit(X, y)<br \/>\n# make a single prediction<br \/>\nrow = [[0.88950817,-0.93540416,0.08392824,0.26438806,-0.52828711,-1.21102238,-0.4499934,1.47392391,-0.19737726,-0.22252503,0.02307668,0.26953276,0.03572757,-0.51606983,-0.39937452,1.8121736,-0.00775917,-0.02514283,-0.76089365,1.58692212]]<br \/>\nyhat = model.predict(row)<br \/>\nprint(&#8216;Prediction: %d&#8217; % yhat[0])<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table 
class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># random subspace ensemble via bagging for making predictions for regression<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">make_regression<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">ensemble <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">BaggingRegressor<\/span><\/p>\n<p><span class=\"crayon-p\"># define dataset<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">make_regression<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_samples<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">20<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_informative<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">15<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-v\">noise<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0.1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># define the model<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">BaggingRegressor<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">bootstrap<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">False<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">max_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># fit the model on the whole dataset<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># make a single prediction<\/span><\/p>\n<p><span class=\"crayon-v\">row<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">0.88950817<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.93540416<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0.08392824<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0.26438806<\/span><span 
class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.52828711<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1.21102238<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.4499934<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">1.47392391<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.19737726<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.22252503<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0.02307668<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0.26953276<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">0.03572757<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.51606983<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.39937452<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">1.8121736<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.00775917<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.02514283<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">0.76089365<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">1.58692212<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-v\">yhat<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">predict<\/span><span 
class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">row<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8216;Prediction: %d&#8217;<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">yhat<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0006 seconds] --><\/p>\n<p>Running the example fits the random subspace ensemble model on the entire dataset; the fit model is then used to make a prediction on a new row of data, as we might when using the model in an application.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>Now that we are familiar with using the scikit-learn API to evaluate and use random subspace ensembles, let\u2019s look at configuring the model.<\/p>\n<h2>Random Subspace Ensemble Hyperparameters<\/h2>\n<p>In this section, we will take a closer look at some of the hyperparameters you should consider tuning for the random subspace ensemble and their effect on model performance.<\/p>\n<h3>Explore Number of Trees<\/h3>\n<p>An important hyperparameter for the random subspace method is the number of decision trees used in the ensemble. 
More trees will stabilize the variance of the model, balancing against the diversity introduced by fitting each tree on a different random subset of features.<\/p>\n<p>The number of trees can be set via the \u201c<em>n_estimators<\/em>\u201d argument and defaults to 10.<\/p>\n<p>The example below explores the effect of the number of trees with values between 10 and 5,000.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800b0028066023\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# explore random subspace ensemble number of trees effect on performance<br \/>\nfrom numpy import mean<br \/>\nfrom numpy import std<br \/>\nfrom sklearn.datasets import make_classification<br \/>\nfrom sklearn.model_selection import cross_val_score<br \/>\nfrom sklearn.model_selection import RepeatedStratifiedKFold<br \/>\nfrom sklearn.ensemble import BaggingClassifier<br \/>\nfrom matplotlib import pyplot<\/p>\n<p># get the dataset<br \/>\ndef get_dataset():<br \/>\n\tX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)<br \/>\n\treturn X, y<\/p>\n<p># get a list of models to evaluate<br \/>\ndef get_models():<br \/>\n\tmodels = dict()<br \/>\n\tn_trees = [10, 50, 100, 500, 1000, 5000]<br \/>\n\tfor n in n_trees:<br \/>\n\t\tmodels[str(n)] = BaggingClassifier(n_estimators=n, bootstrap=False, max_features=10)<br \/>\n\treturn models<\/p>\n<p># evaluate a given model using cross-validation<br \/>\ndef evaluate_model(model):<br \/>\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)<br \/>\n\tscores = cross_val_score(model, X, y, scoring=&#8217;accuracy&#8217;, cv=cv, n_jobs=-1)<br \/>\n\treturn 
scores<\/p>\n<p># define dataset<br \/>\nX, y = get_dataset()<br \/>\n# get the models to evaluate<br \/>\nmodels = get_models()<br \/>\n# evaluate the models and store results<br \/>\nresults, names = list(), list()<br \/>\nfor name, model in models.items():<br \/>\n\tscores = evaluate_model(model)<br \/>\n\tresults.append(scores)<br \/>\n\tnames.append(name)<br \/>\n\tprint(&#8216;&gt;%s %.3f (%.3f)&#8217; % (name, mean(scores), std(scores)))<br \/>\n# plot model performance for comparison<br \/>\npyplot.boxplot(results, labels=names, showmeans=True)<br \/>\npyplot.show()<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<p>17<\/p>\n<p>18<\/p>\n<p>19<\/p>\n<p>20<\/p>\n<p>21<\/p>\n<p>22<\/p>\n<p>23<\/p>\n<p>24<\/p>\n<p>25<\/p>\n<p>26<\/p>\n<p>27<\/p>\n<p>28<\/p>\n<p>29<\/p>\n<p>30<\/p>\n<p>31<\/p>\n<p>32<\/p>\n<p>33<\/p>\n<p>34<\/p>\n<p>35<\/p>\n<p>36<\/p>\n<p>37<\/p>\n<p>38<\/p>\n<p>39<\/p>\n<p>40<\/p>\n<p>41<\/p>\n<p>42<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># explore random subspace ensemble number of trees effect on performance<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">mean<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">std<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span 
class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">make_classification<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">cross_val_score<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">RepeatedStratifiedKFold<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">ensemble <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">BaggingClassifier<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">matplotlib <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">pyplot<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># get the dataset<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">get_dataset<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">make_classification<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_samples<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_features<\/span><span class=\"crayon-o\">=<\/span><span 
class=\"crayon-cn\">20<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_informative<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">15<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_redundant<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">y<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># get a list of models to evaluate<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">get_models<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">models<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">dict<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">n_trees<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">50<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">100<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-cn\">500<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">5000<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">for<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">n<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-st\">in<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_trees<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t\t<\/span><span class=\"crayon-v\">models<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-e\">str<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">BaggingClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_estimators<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">bootstrap<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">False<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">max_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">models<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># evaluate a given model using cross-validation<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">evaluate_model<\/span><span 
class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">RepeatedStratifiedKFold<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_splits<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_repeats<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">cross_val_score<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">scoring<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">'accuracy'<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-o\">-<\/span><span
class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">scores<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># define dataset<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">get_dataset<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># get the models to evaluate<\/span><\/p>\n<p><span class=\"crayon-v\">models<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">get_models<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># evaluate the models and store results<\/span><\/p>\n<p><span class=\"crayon-v\">results<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">names<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">list<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">list<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-st\">for<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">name<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">model <\/span><span class=\"crayon-st\">in<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">models<\/span><span class=\"crayon-sy\">.<\/span><span 
class=\"crayon-e\">items<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">evaluate_model<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">results<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">append<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">names<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">append<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">name<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">'&gt;%s %.3f (%.3f)'<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">name<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">mean<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">std<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># plot model performance for comparison<\/span><\/p>\n<p><span
class=\"crayon-v\">pyplot<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">boxplot<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">results<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">labels<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">names<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">showmeans<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">True<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">pyplot<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">show<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0013 seconds] --><\/p>\n<p>Running the example first reports the mean accuracy for each configured number of decision trees.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that performance appears to continue to improve as the number of ensemble members is increased to 5,000.<\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800bb926754285\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n&gt;10 0.853 (0.030)<br \/>\n&gt;50 0.885 (0.038)<br \/>\n&gt;100 0.891 (0.034)<br \/>\n&gt;500 0.894 (0.036)<br \/>\n&gt;1000 0.894 (0.034)<br \/>\n&gt;5000 0.896 (0.033)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p>&gt;10 0.853 (0.030)<\/p>\n<p>&gt;50 0.885 (0.038)<\/p>\n<p>&gt;100 0.891 (0.034)<\/p>\n<p>&gt;500 0.894 (0.036)<\/p>\n<p>&gt;1000 0.894 (0.034)<\/p>\n<p>&gt;5000 0.896 (0.033)<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p>A box and whisker plot is created for the distribution of accuracy scores for each configured number of trees.<\/p>\n<p>We can see the general trend of further improvement with the number of decision trees used in the ensemble.<\/p>\n<div id=\"attachment_11149\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-11149\" loading=\"lazy\" class=\"size-full wp-image-11149\" src=\"https:\/\/3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Size-vs.-Classification-Accuracy.png\"
alt=\"Box Plot of Random Subspace Ensemble Size vs. Classification Accuracy\" width=\"1280\" height=\"960\"><\/p>\n<p id=\"caption-attachment-11149\" class=\"wp-caption-text\">Box Plot of Random Subspace Ensemble Size vs. Classification Accuracy<\/p>\n<\/div>\n<h3>Explore Number of Features<\/h3>\n<p>The number of features selected for each random subspace controls the diversity of the ensemble.<\/p>\n<p>Fewer features mean more diversity, whereas more features mean less diversity. More diversity may require more trees to reduce the variance of predictions made by the model.<\/p>\n<p>We can vary the diversity of the ensemble by varying the number of random features selected by setting the \u201c<em>max_features<\/em>\u201d argument.<\/p>\n<p>The example below varies the value from 1 to 20 with a fixed number of trees in the ensemble.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800bc586838297\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# explore random subspace ensemble number of features effect on performance<br \/>\nfrom numpy import mean<br \/>\nfrom numpy import std<br \/>\nfrom sklearn.datasets import make_classification<br \/>\nfrom sklearn.model_selection import cross_val_score<br \/>\nfrom sklearn.model_selection import RepeatedStratifiedKFold<br \/>\nfrom sklearn.ensemble import BaggingClassifier<br \/>\nfrom matplotlib import pyplot<\/p>\n<p># get the dataset<br \/>\ndef get_dataset():<br \/>\n\tX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)<br \/>\n\treturn X, y<\/p>\n<p># get a list of models to evaluate<br \/>\ndef get_models():<br \/>\n\tmodels = dict()<br 
\/>\n\tfor n in range(1,21):<br \/>\n\t\tmodels[str(n)] = BaggingClassifier(n_estimators=100, bootstrap=False, max_features=n)<br \/>\n\treturn models<\/p>\n<p># evaluate a given model using cross-validation<br \/>\ndef evaluate_model(model):<br \/>\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)<br \/>\n\tscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)<br \/>\n\treturn scores<\/p>\n<p># define dataset<br \/>\nX, y = get_dataset()<br \/>\n# get the models to evaluate<br \/>\nmodels = get_models()<br \/>\n# evaluate the models and store results<br \/>\nresults, names = list(), list()<br \/>\nfor name, model in models.items():<br \/>\n\tscores = evaluate_model(model)<br \/>\n\tresults.append(scores)<br \/>\n\tnames.append(name)<br \/>\n\tprint('&gt;%s %.3f (%.3f)' % (name, mean(scores), std(scores)))<br \/>\n# plot model performance for comparison<br \/>\npyplot.boxplot(results, labels=names, showmeans=True)<br \/>\npyplot.show()<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># explore random subspace ensemble number of features effect on performance<\/span><\/p>\n<p><span
class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">mean<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">std<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">make_classification<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">cross_val_score<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">RepeatedStratifiedKFold<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">ensemble <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">BaggingClassifier<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">matplotlib <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">pyplot<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># get the dataset<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">get_dataset<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">make_classification<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_samples<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">20<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_informative<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">15<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_redundant<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">y<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># get a list of models to evaluate<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">get_models<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">models<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">dict<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">for<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-i\">n<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-st\">in<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">range<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-cn\">21<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t\t<\/span><span class=\"crayon-v\">models<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-e\">str<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">BaggingClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_estimators<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">100<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">bootstrap<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">False<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">max_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">n<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">models<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># evaluate a given model using cross-validation<\/span><\/p>\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">evaluate_model<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">cv<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">RepeatedStratifiedKFold<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_splits<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_repeats<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">cross_val_score<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">scoring<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">'accuracy'<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span
class=\"crayon-i\">scores<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># define dataset<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">get_dataset<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># get the models to evaluate<\/span><\/p>\n<p><span class=\"crayon-v\">models<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">get_models<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># evaluate the models and store results<\/span><\/p>\n<p><span class=\"crayon-v\">results<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">names<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">list<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">list<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-st\">for<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">name<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">model <\/span><span class=\"crayon-st\">in<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">models<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">items<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span 
class=\"crayon-v\">scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">evaluate_model<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">results<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">append<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-v\">names<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">append<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">name<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\t<\/span><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">'&gt;%s %.3f (%.3f)'<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">name<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">mean<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">std<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">scores<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># plot model performance for comparison<\/span><\/p>\n<p><span class=\"crayon-v\">pyplot<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">boxplot<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">results<\/span><span class=\"crayon-sy\">,<\/span><span
class=\"crayon-h\"> <\/span><span class=\"crayon-v\">labels<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">names<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">showmeans<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">True<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">pyplot<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">show<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p>Running the example first reports the mean accuracy for each number of features.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that using 8 to 11 features in the random subspaces might be appropriate on this dataset when using 100 decision trees.
This might suggest increasing the number of trees to a large value first, then tuning the number of features selected in each subset.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800bf652539170\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n&gt;1 0.607 (0.036)<br \/>\n&gt;2 0.771 (0.042)<br \/>\n&gt;3 0.837 (0.036)<br \/>\n&gt;4 0.858 (0.037)<br \/>\n&gt;5 0.869 (0.034)<br \/>\n&gt;6 0.883 (0.033)<br \/>\n&gt;7 0.887 (0.038)<br \/>\n&gt;8 0.894 (0.035)<br \/>\n&gt;9 0.893 (0.035)<br \/>\n&gt;10 0.885 (0.038)<br \/>\n&gt;11 0.892 (0.034)<br \/>\n&gt;12 0.883 (0.036)<br \/>\n&gt;13 0.881 (0.044)<br \/>\n&gt;14 0.875 (0.038)<br \/>\n&gt;15 0.869 (0.041)<br \/>\n&gt;16 0.861 (0.044)<br \/>\n&gt;17 0.851 (0.041)<br \/>\n&gt;18 0.831 (0.046)<br \/>\n&gt;19 0.815 (0.046)<br \/>\n&gt;20 0.801 (0.049)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<p>17<\/p>\n<p>18<\/p>\n<p>19<\/p>\n<p>20<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p>&gt;1 0.607 (0.036)<\/p>\n<p>&gt;2 0.771 (0.042)<\/p>\n<p>&gt;3 0.837 (0.036)<\/p>\n<p>&gt;4 0.858 (0.037)<\/p>\n<p>&gt;5 0.869 (0.034)<\/p>\n<p>&gt;6 0.883 (0.033)<\/p>\n<p>&gt;7 0.887 (0.038)<\/p>\n<p>&gt;8 0.894 (0.035)<\/p>\n<p>&gt;9 0.893 (0.035)<\/p>\n<p>&gt;10 0.885 
(0.038)<\/p>\n<p>&gt;11 0.892 (0.034)<\/p>\n<p>&gt;12 0.883 (0.036)<\/p>\n<p>&gt;13 0.881 (0.044)<\/p>\n<p>&gt;14 0.875 (0.038)<\/p>\n<p>&gt;15 0.869 (0.041)<\/p>\n<p>&gt;16 0.861 (0.044)<\/p>\n<p>&gt;17 0.851 (0.041)<\/p>\n<p>&gt;18 0.831 (0.046)<\/p>\n<p>&gt;19 0.815 (0.046)<\/p>\n<p>&gt;20 0.801 (0.049)<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>A box and whisker plot is created for the distribution of accuracy scores for each number of random subset features.<\/p>\n<p>We can see a general trend of increasing accuracy to a point and a steady decrease in performance after 11 features.<\/p>\n<div id=\"attachment_11150\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-11150\" loading=\"lazy\" class=\"size-full wp-image-11150\" src=\"https:\/\/3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com\/wp-content\/uploads\/2020\/11\/Box-Plot-of-Random-Subspace-Ensemble-Features-vs.-Classification-Accuracy.png\" alt=\"Box Plot of Random Subspace Ensemble Features vs. Classification Accuracy\" width=\"1280\" height=\"960\"><\/p>\n<p id=\"caption-attachment-11150\" class=\"wp-caption-text\">Box Plot of Random Subspace Ensemble Features vs. Classification Accuracy<\/p>\n<\/div>\n<h3>Explore Alternate Algorithm<\/h3>\n<p>Decision trees are the most common algorithm used in a random subspace ensemble.<\/p>\n<p>The reason for this is that they are easy to configure and work well on most problems.<\/p>\n<p>Other algorithms can be used to construct random subspaces and must be configured to have a modestly high variance. 
One example is the k-nearest neighbors algorithm, where <em>k<\/em> can be set to a small value.<\/p>\n<p>The algorithm used in the ensemble is specified via the \u201c<em>base_estimator<\/em>\u201d argument, which must be set to a configured instance of the algorithm to use.<\/p>\n<p>The example below demonstrates using a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.neighbors.KNeighborsClassifier.html\">KNeighborsClassifier<\/a> as the base algorithm in the random subspace ensemble via the bagging class. Here, the algorithm is used with default hyperparameters, where <em>k<\/em> is set to 5.<\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800c0159908102\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n...<br \/>\n# define the model<br \/>\nmodel = BaggingClassifier(base_estimator=KNeighborsClassifier(), bootstrap=False, max_features=10)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># define the model<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">BaggingClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">base_estimator<\/span><span class=\"crayon-o\">=<\/span><span
class=\"crayon-e\">KNeighborsClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">bootstrap<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">False<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">max_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>The complete example is listed below.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.14 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f9b0855800c9175574247\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# evaluate random subspace ensemble with knn algorithm for classification<br \/>\nfrom numpy import mean<br \/>\nfrom numpy import std<br \/>\nfrom sklearn.datasets import make_classification<br \/>\nfrom sklearn.model_selection import cross_val_score<br \/>\nfrom sklearn.model_selection import RepeatedStratifiedKFold<br \/>\nfrom sklearn.neighbors import KNeighborsClassifier<br \/>\nfrom sklearn.ensemble import BaggingClassifier<br \/>\n# define dataset<br \/>\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)<br \/>\n# define the model<br \/>\nmodel = BaggingClassifier(base_estimator=KNeighborsClassifier(), bootstrap=False, max_features=10)<br \/>\n# define the evaluation procedure<br \/>\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)<br \/>\n# evaluate the model<br \/>\nn_scores = 
cross_val_score(model, X, y, scoring=&#8217;accuracy&#8217;, cv=cv, n_jobs=-1, error_score=&#8217;raise&#8217;)<br \/>\n# report performance<br \/>\nprint(&#8216;Accuracy: %.3f (%.3f)&#8217; % (mean(n_scores), std(n_scores)))<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<p>17<\/p>\n<p>18<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># evaluate random subspace ensemble with knn algorithm for classification<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">mean<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">numpy <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">std<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">datasets <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">make_classification<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">cross_val_score<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span 
class=\"crayon-e\">RepeatedStratifiedKFold<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">neighbors <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">KNeighborsClassifier<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">ensemble <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">BaggingClassifier<\/span><\/p>\n<p><span class=\"crayon-p\"># define dataset<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">make_classification<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_samples<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1000<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">20<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_informative<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">15<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_redundant<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># define the model<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">BaggingClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">base_estimator<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-e\">KNeighborsClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">bootstrap<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-t\">False<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">max_features<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># define the evaluation procedure<\/span><\/p>\n<p><span class=\"crayon-v\">cv<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">RepeatedStratifiedKFold<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_splits<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_repeats<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># evaluate the model<\/span><\/p>\n<p><span class=\"crayon-v\">n_scores<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">cross_val_score<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">scoring<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#8216;accuracy&#8217;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">cv<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">error_score<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#8216;raise&#8217;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># report performance<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8216;Accuracy: %.3f (%.3f)&#8217;<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-e\">mean<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_scores<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">std<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">n_scores<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0007 seconds] --><\/p>\n<p>Running the example reports the mean and standard deviation accuracy of the model.<\/p>\n<p><strong>Note<\/strong>: Your <a 
href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that the random subspace ensemble with KNN and default hyperparameters achieves a classification accuracy of about 90 percent on this test dataset.<\/p>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop random subspace ensembles for classification and regression.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Random subspace ensembles are created from decision trees fit on different samples of features (columns) in the training dataset.<\/li>\n<li>How to use the random subspace ensemble for classification and regression with scikit-learn.<\/li>\n<li>How to explore the effect of random subspace model hyperparameters on model performance.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>Ask your questions in the comments below and I will do my best to 
answer.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/machinelearningmastery.com\/random-subspace-ensemble-with-python\/<\/p>\n","protected":false},"author":0,"featured_media":475,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/474"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=474"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/474\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/475"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=474"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=474"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=474"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}