{"id":185,"date":"2020-09-06T23:22:14","date_gmt":"2020-09-06T23:22:14","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/06\/auto-sklearn-for-automated-machine-learning-in-python\/"},"modified":"2020-09-06T23:22:14","modified_gmt":"2020-09-06T23:22:14","slug":"auto-sklearn-for-automated-machine-learning-in-python","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/06\/auto-sklearn-for-automated-machine-learning-in-python\/","title":{"rendered":"Auto-Sklearn for Automated Machine Learning in Python"},"content":{"rendered":"<div id=\"\">\n<p id=\"last-modified-info\">Last Updated on September 7, 2020<\/p>\n<p>Automated Machine Learning (AutoML) refers to techniques for automatically discovering well-performing models for predictive modeling tasks with very little user involvement.<\/p>\n<p><strong>Auto-Sklearn<\/strong> is an open-source library for performing AutoML in Python. It makes use of the popular Scikit-Learn machine learning library for data transforms and machine learning algorithms and uses a Bayesian Optimization search procedure to efficiently discover a top-performing model pipeline for a given dataset.<\/p>\n<p>In this tutorial, you will discover how to use Auto-Sklearn for AutoML with Scikit-Learn machine learning algorithms in Python.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Auto-Sklearn is an open-source library for AutoML with scikit-learn data preparation and machine learning models.<\/li>\n<li>How to use Auto-Sklearn to automatically discover top-performing models for classification tasks.<\/li>\n<li>How to use Auto-Sklearn to automatically discover top-performing models for regression tasks.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_10472\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-10472\" loading=\"lazy\" class=\"size-full wp-image-10472\" 
src=\"https:\/\/3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com\/wp-content\/uploads\/2020\/06\/Auto-Sklearn-for-Automated-Machine-Learning-in-Python.jpg\" alt=\"Auto-Sklearn for Automated Machine Learning in Python\" width=\"800\" height=\"534\"><\/p>\n<p id=\"caption-attachment-10472\" class=\"wp-caption-text\">Auto-Sklearn for Automated Machine Learning in Python<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/89654772@N05\/25558362110\/\">Richard<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into four parts; they are:<\/p>\n<ol>\n<li>AutoML With Auto-Sklearn<\/li>\n<li>Installing and Using Auto-Sklearn<\/li>\n<li>Auto-Sklearn for Classification<\/li>\n<li>Auto-Sklearn for Regression<\/li>\n<\/ol>\n<h2>AutoML With Auto-Sklearn<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Automated_machine_learning\">Automated Machine Learning<\/a>, or AutoML for short, is the process of discovering the best-performing pipeline of data transforms, model, and model configuration for a dataset.<\/p>\n<p>AutoML often involves the use of sophisticated optimization algorithms, such as <a href=\"https:\/\/machinelearningmastery.com\/what-is-bayesian-optimization\/\">Bayesian Optimization<\/a>, to efficiently navigate the space of possible models and model configurations and quickly discover what works well for a given predictive modeling task. It allows non-expert machine learning practitioners to quickly and easily discover what works well, or even best, for a given dataset with little technical background or direct input.<\/p>\n<p><a href=\"https:\/\/automl.github.io\/auto-sklearn\/master\/\">Auto-Sklearn<\/a> is an open-source Python library for AutoML using machine learning models from the scikit-learn machine learning library.<\/p>\n<p>It was developed by <a href=\"https:\/\/ml.informatik.uni-freiburg.de\/people\/feurer\/index.html\">Matthias Feurer<\/a>, et al. 
and described in their 2015 paper titled \u201c<a href=\"https:\/\/papers.nips.cc\/paper\/5872-efficient-and-robust-automated-machine-learning\">Efficient and Robust Automated Machine Learning<\/a>.\u201d<\/p>\n<blockquote>\n<p>\u2026 we introduce a robust new AutoML system based on scikit-learn (using 15 classifiers, 14 feature preprocessing methods, and 4 data preprocessing methods, giving rise to a structured hypothesis space with 110 hyperparameters).<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/papers.nips.cc\/paper\/5872-efficient-and-robust-automated-machine-learning\">Efficient and Robust Automated Machine Learning<\/a>, 2015.<\/p>\n<p>The benefit of Auto-Sklearn is that, in addition to discovering the data preparation and model that perform well for a dataset, it can also learn from models that performed well on similar datasets and automatically create an ensemble of the top-performing models discovered during the optimization process.<\/p>\n<blockquote>\n<p>This system, which we dub AUTO-SKLEARN, improves on existing AutoML methods by automatically taking into account past performance on similar datasets, and by constructing ensembles from the models evaluated during the optimization.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/papers.nips.cc\/paper\/5872-efficient-and-robust-automated-machine-learning\">Efficient and Robust Automated Machine Learning<\/a>, 2015.<\/p>\n<p>The authors provide a useful overview of their system in the paper, reproduced below.<\/p>\n<div id=\"attachment_10471\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-10471\" loading=\"lazy\" class=\"size-full wp-image-10471\" src=\"https:\/\/3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com\/wp-content\/uploads\/2020\/03\/Overview-of-the-Auto-Sklearn-System.png\" alt=\"Overview of the Auto-Sklearn System\" width=\"800\" height=\"147\"><\/p>\n<p id=\"caption-attachment-10471\" 
class=\"wp-caption-text\">Overview of the Auto-Sklearn System.<br \/>Taken from: Efficient and Robust Automated Machine Learning, 2015.<\/p>\n<\/div>\n<h2>Installing and Using Auto-Sklearn<\/h2>\n<p>The first step is to install the Auto-Sklearn library, which can be achieved using pip, as follows:<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10c7505286621\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nsudo pip install auto-sklearn<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p>sudo pip install auto-sklearn<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>Once installed, we can import the library and print the version number to confirm it was installed successfully:<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10cd855275392\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# print autosklearn version<br \/>\nimport autosklearn<br \/>\nprint('autosklearn: %s' % autosklearn.__version__)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr 
class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># print autosklearn version<\/span><\/p>\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">autosklearn<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">'autosklearn: %s'<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">autosklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">__version__<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>Running the example prints the version number.<\/p>\n<p>Your version number should be the same or higher.<\/p>\n<p>Using Auto-Sklearn is straightforward.<\/p>\n<p>Depending on whether your prediction task is classification or regression, you create and configure an instance of the <a href=\"https:\/\/automl.github.io\/auto-sklearn\/master\/api.html#classification\">AutoSklearnClassifier<\/a> or <a href=\"https:\/\/automl.github.io\/auto-sklearn\/master\/api.html#regression\">AutoSklearnRegressor<\/a> class, fit it on your dataset, and that\u2019s it. 
The resulting model can then be used to make predictions directly or saved to file (using pickle) for later use.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10d1527104574\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n&#8230;<br \/>\n# define search<br \/>\nmodel = AutoSklearnClassifier()<br \/>\n# perform the search<br \/>\nmodel.fit(X_train, y_train)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># define search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">AutoSklearnClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># perform the search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>There are a ton of configuration options provided as arguments to the 
AutoSklearnClassifier and AutoSklearnRegressor classes.<\/p>\n<p>By default, the search evaluates candidate models on a train-test split of your dataset, and this default is recommended for both speed and simplicity.<\/p>\n<p>Importantly, you should set the \u201c<em>n_jobs<\/em>\u201d argument to the number of cores in your system, e.g. 8 if you have 8 cores.<\/p>\n<p>The optimization process will run for as long as you allow. By default, it will run for one hour.<\/p>\n<p>I recommend setting the \u201c<em>time_left_for_this_task<\/em>\u201d argument to the number of seconds you want the process to run. For example, 5 to 10 minutes is probably plenty for many small predictive modeling tasks (fewer than 1,000 rows).<\/p>\n<p>We will use 5 minutes (300 seconds) for the examples in this tutorial. We will also limit the time allocated to each model evaluation to 30 seconds via the \u201c<em>per_run_time_limit<\/em>\u201d argument. For example:<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10d2596171716\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n...<br \/>\n# define search<br \/>\nmodel = AutoSklearnClassifier(time_left_for_this_task=5*60, per_run_time_limit=30, n_jobs=8)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># define search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">AutoSklearnClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">time_left_for_this_task<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-o\">*<\/span><span class=\"crayon-cn\">60<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">per_run_time_limit<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">30<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">8<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>You can also restrict which algorithms and data transforms the search considers.<\/p>\n<p>By default, Auto-Sklearn combines the top-performing models discovered during the search into an ensemble. 
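<\/p>\n<p>Auto-Sklearn builds this ensemble with ensemble selection (Caruana et al., 2004), the greedy procedure used in the 2015 paper: starting from an empty ensemble, it repeatedly adds, with replacement, whichever evaluated model most improves the validation score of the averaged prediction. The sketch below is a minimal, standard-library-only illustration of that idea, not the library\u2019s own code: the two probability tables are hypothetical toy data, and scoring candidates by accuracy is a simplification chosen for this example.<\/p>\n
```python
# Minimal sketch of greedy ensemble selection with replacement
# (Caruana et al., 2004). Not Auto-Sklearn's implementation; the
# probability tables below are hypothetical toy data.


def accuracy(probs, y_true):
    # Fraction of rows whose highest-probability class equals the label.
    correct = 0
    for row, label in zip(probs, y_true):
        pred = max(range(len(row)), key=lambda k: row[k])
        correct += int(pred == label)
    return correct / len(y_true)


def average(prob_sets):
    # Element-wise mean of several (n_rows x n_classes) probability tables.
    n = len(prob_sets)
    first = prob_sets[0]
    return [
        [sum(ps[i][k] for ps in prob_sets) / n for k in range(len(first[i]))]
        for i in range(len(first))
    ]


def ensemble_selection(model_probs, y_true, ensemble_size):
    # Each round, add (with replacement) the model that gives the best
    # validation accuracy for the averaged ensemble prediction.
    chosen = []
    for _ in range(ensemble_size):
        best_idx, best_score = None, -1.0
        for idx, probs in enumerate(model_probs):
            candidate = [model_probs[j] for j in chosen] + [probs]
            score = accuracy(average(candidate), y_true)
            if score > best_score:  # ties keep the earlier model
                best_idx, best_score = idx, score
        chosen.append(best_idx)
    # A model's weight is the fraction of rounds in which it was picked.
    return [chosen.count(i) / ensemble_size for i in range(len(model_probs))]


# Toy validation data: each model alone gets 3 of 4 rows right.
y_val = [0, 1, 1, 0]
model_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.8, 0.2]]
model_b = [[0.4, 0.6], [0.1, 0.9], [0.3, 0.7], [0.7, 0.3]]

weights = ensemble_selection([model_a, model_b], y_val, ensemble_size=2)
print(weights)  # [0.5, 0.5]
```
<p>Because models are drawn with replacement, a model picked k times in ensemble_size rounds receives a weight of k divided by ensemble_size. In the toy data, each model alone misclassifies one row, but their average classifies all four rows correctly, so the second round adds the other model and the two weights come out equal.<\/p>\n<p>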
Sometimes, the ensemble can overfit. Ensembling can be effectively disabled by setting the \u201c<em>ensemble_size<\/em>\u201d argument to 1, so that the final model is the single best model found; setting \u201c<em>initial_configurations_via_metalearning<\/em>\u201d to 0 also turns off the meta-learning warm start, so the search no longer begins from configurations that worked well on similar datasets.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10d4447101287\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n...<br \/>\n# define search<br \/>\nmodel = AutoSklearnClassifier(ensemble_size=1, initial_configurations_via_metalearning=0)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># define search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">AutoSklearnClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">ensemble_size<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">initial_configurations_via_metalearning<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>At the end of a run, the list of models can be accessed, as well as other 
details.<\/p>\n<p>Perhaps the most useful feature is the <em>sprint_statistics()<\/em> method, which summarizes the search and the performance of the final model.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10d5342116217\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n...<br \/>\n# summarize performance<br \/>\nprint(model.sprint_statistics())<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># summarize performance<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">sprint_statistics<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>Now that we are familiar with the Auto-Sklearn library, let\u2019s look at some worked examples.<\/p>\n<h2>Auto-Sklearn for Classification<\/h2>\n<p>In this section, we will use Auto-Sklearn to discover a model for the sonar dataset.<\/p>\n<p>The sonar dataset is a standard machine learning dataset consisting of 208 rows of data with 60 numerical input variables and a target variable with two class values, i.e. 
binary classification.<\/p>\n<p>Using a test harness of repeated stratified 10-fold cross-validation with three repeats, a naive model can achieve an accuracy of about 53 percent. A top-performing model can achieve an accuracy of about 88 percent on the same test harness. This provides the bounds of expected performance on this dataset.<\/p>\n<p>The dataset involves predicting whether sonar returns indicate a rock or a simulated mine.<\/p>\n<p>No need to download the dataset; we will download it automatically as part of our worked examples.<\/p>\n<p>The example below downloads the dataset and summarizes its shape.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10d7229333059\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# summarize the sonar dataset<br \/>\nfrom pandas import read_csv<br \/>\n# load dataset<br \/>\nurl = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv'<br \/>\ndataframe = read_csv(url, header=None)<br \/>\n# split into input and output elements<br \/>\ndata = dataframe.values<br \/>\nX, y = data[:, :-1], data[:, -1]<br \/>\nprint(X.shape, y.shape)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># summarize the sonar dataset<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">pandas <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">read<\/span><span 
class=\"crayon-sy\">_<\/span>csv<\/p>\n<p><span class=\"crayon-p\"># load dataset<\/span><\/p>\n<p><span class=\"crayon-v\">url<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv'<\/span><\/p>\n<p><span class=\"crayon-v\">dataframe<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">read_csv<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">url<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">header<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">None<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># split into input and output elements<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">dataframe<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-i\">values<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> 
<\/span><span class=\"crayon-o\">-<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">shape<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">shape<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0004 seconds] --><\/p>\n<p>Running the example downloads the dataset and splits it into input and output elements. As expected, we can see that there are 208 rows of data with 60 input variables.<\/p>\n<p>We will use Auto-Sklearn to find a good model for the sonar dataset.<\/p>\n<p>First, we will split the dataset into train and test sets, allow the process to find a good model on the training set, and then evaluate the performance of that model on the holdout test set.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10dc083265640\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n...<br \/>\n# split into train and test sets<br \/>\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td 
class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># split into train and test sets<\/span><\/p>\n<p><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X_test<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_test<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">train_test_split<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">test_size<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0.33<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0002 seconds] --><\/p>\n<p>The <em>AutoSklearnClassifier<\/em> is configured to run for 5 minutes with 8 cores and limit each model evaluation to 30 seconds.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10dd509039999\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" 
data-settings=\"dblclick\" readonly><br \/>\n&#8230;<br \/>\n# define search<br \/>\nmodel = AutoSklearnClassifier(time_left_for_this_task=5*60, per_run_time_limit=30, n_jobs=8)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># define search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">AutoSklearnClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">time_left_for_this_task<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-o\">*<\/span><span class=\"crayon-cn\">60<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">per_run_time_limit<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">30<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">8<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>The search is then performed on the training dataset.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10df230915741\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea 
class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n&#8230;<br \/>\n# perform the search<br \/>\nmodel.fit(X_train, y_train)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># perform the search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>Afterward, a summary of the search and best-performing model is reported.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10e0285756784\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n&#8230;<br \/>\n# summarize<br \/>\nprint(model.sprint_statistics())<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span 
class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># summarize<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">sprint_statistics<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>Finally, we evaluate the performance of the resulting model on the holdout test dataset.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10e2866282197\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n...<br \/>\n# evaluate best model<br \/>\ny_hat = model.predict(X_test)<br \/>\nacc = accuracy_score(y_test, y_hat)<br \/>\nprint(\"Accuracy: %.3f\" % acc)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># evaluate best model<\/span><\/p>\n<p><span class=\"crayon-v\">y_hat<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">predict<\/span><span 
class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_test<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">acc<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">accuracy_score<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">y_test<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_hat<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8220;Accuracy: %.3f&#8221;<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">acc<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0002 seconds] --><\/p>\n<p>Tying this together, the complete example is listed below.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10e3393984378\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# example of auto-sklearn for the sonar classification dataset<br \/>\nfrom pandas import read_csv<br \/>\nfrom sklearn.model_selection import train_test_split<br \/>\nfrom sklearn.preprocessing import LabelEncoder<br \/>\nfrom sklearn.metrics import accuracy_score<br \/>\nfrom autosklearn.classification import AutoSklearnClassifier<br \/>\n# load dataset<br \/>\nurl = &#8216;https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv&#8217;<br \/>\ndataframe = read_csv(url, header=None)<br \/>\n# print(dataframe.head())<br \/>\n# split 
into input and output elements<br \/>\ndata = dataframe.values<br \/>\nX, y = data[:, :-1], data[:, -1]<br \/>\n# minimally prepare dataset<br \/>\nX = X.astype(&#8216;float32&#8217;)<br \/>\ny = LabelEncoder().fit_transform(y.astype(&#8216;str&#8217;))<br \/>\n# split into train and test sets<br \/>\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)<br \/>\n# define search<br \/>\nmodel = AutoSklearnClassifier(time_left_for_this_task=5*60, per_run_time_limit=30, n_jobs=8)<br \/>\n# perform the search<br \/>\nmodel.fit(X_train, y_train)<br \/>\n# summarize<br \/>\nprint(model.sprint_statistics())<br \/>\n# evaluate best model<br \/>\ny_hat = model.predict(X_test)<br \/>\nacc = accuracy_score(y_test, y_hat)<br \/>\nprint(&#8220;Accuracy: %.3f&#8221; % acc)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<p>17<\/p>\n<p>18<\/p>\n<p>19<\/p>\n<p>20<\/p>\n<p>21<\/p>\n<p>22<\/p>\n<p>23<\/p>\n<p>24<\/p>\n<p>25<\/p>\n<p>26<\/p>\n<p>27<\/p>\n<p>28<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># example of auto-sklearn for the sonar classification dataset<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">pandas <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">read_csv<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span 
class=\"crayon-e\">train_test_split<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">preprocessing <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">LabelEncoder<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">metrics <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">accuracy_score<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">autosklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">classification <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-i\">AutoSklearnClassifier<\/span><\/p>\n<p><span class=\"crayon-p\"># load dataset<\/span><\/p>\n<p><span class=\"crayon-v\">url<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8216;https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv&#8217;<\/span><\/p>\n<p><span class=\"crayon-v\">dataframe<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">read_csv<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">url<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">header<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">None<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># print(dataframe.head())<\/span><\/p>\n<p><span class=\"crayon-p\"># split into input and output elements<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">dataframe<\/span><span 
class=\"crayon-sy\">.<\/span><span class=\"crayon-i\">values<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-p\"># minimally prepare dataset<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">astype<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8216;float32&#8217;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">LabelEncoder<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit_transform<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">astype<\/span><span class=\"crayon-sy\">(<\/span><span 
class=\"crayon-s\">&#8216;str&#8217;<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># split into train and test sets<\/span><\/p>\n<p><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X_test<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_test<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">train_test_split<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">test_size<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0.33<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># define search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">AutoSklearnClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">time_left_for_this_task<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-o\">*<\/span><span class=\"crayon-cn\">60<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">per_run_time_limit<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">30<\/span><span 
class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">8<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># perform the search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># summarize<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">sprint_statistics<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># evaluate best model<\/span><\/p>\n<p><span class=\"crayon-v\">y_hat<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">predict<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_test<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">acc<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">accuracy_score<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">y_test<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_hat<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8220;Accuracy: %.3f&#8221;<\/span><span class=\"crayon-h\"> 
<\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">acc<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0008 seconds] --><\/p>\n<p>Running the example will take about five minutes, given the hard limit we imposed on the run.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>At the end of the run, a summary is printed showing that 1,054 models were evaluated and the estimated performance of the final model was 91 percent.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10e5642638484\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nauto-sklearn results:<br \/>\nDataset name: f4c282bd4b56d4db7e5f7fe1a6a8edeb<br \/>\nMetric: accuracy<br \/>\nBest validation score: 0.913043<br \/>\nNumber of target algorithm runs: 1054<br \/>\nNumber of successful target algorithm runs: 952<br \/>\nNumber of crashed target algorithm runs: 94<br \/>\nNumber of target algorithms that exceeded the time limit: 8<br \/>\nNumber of target algorithms that exceeded the memory limit: 0<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div 
class=\"crayon-pre\">\n<p>auto-sklearn results:<\/p>\n<p>Dataset name: f4c282bd4b56d4db7e5f7fe1a6a8edeb<\/p>\n<p>Metric: accuracy<\/p>\n<p>Best validation score: 0.913043<\/p>\n<p>Number of target algorithm runs: 1054<\/p>\n<p>Number of successful target algorithm runs: 952<\/p>\n<p>Number of crashed target algorithm runs: 94<\/p>\n<p>Number of target algorithms that exceeded the time limit: 8<\/p>\n<p>Number of target algorithms that exceeded the memory limit: 0<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>We then evaluate the model on the holdout test set and see that it achieves a classification accuracy of 81.2 percent, which is reasonably skillful.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<h2>Auto-Sklearn for Regression<\/h2>\n<p>In this section, we will use Auto-Sklearn to discover a model for the auto insurance dataset.<\/p>\n<p>The auto insurance dataset is a standard machine learning dataset consisting of 63 rows of data with one numerical input variable and a numerical target variable.<\/p>\n<p>Using a test harness of repeated 10-fold cross-validation with three repeats, a naive model can achieve a mean absolute error (MAE) of about 66. A top-performing model can achieve an MAE of about 28 on this same test harness. 
This provides the bounds of expected performance on this dataset.<\/p>\n<p>The dataset involves predicting the total amount in claims (thousands of Swedish Kronor) given the number of claims for different geographical regions.<\/p>\n<p>No need to download the dataset; we will download it automatically as part of our worked examples.<\/p>\n<p>The example below downloads the dataset and summarizes its shape.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10e8921636813\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# summarize the auto insurance dataset<br \/>\nfrom pandas import read_csv<br \/>\n# load dataset<br \/>\nurl = &#8216;https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/auto-insurance.csv&#8217;<br \/>\ndataframe = read_csv(url, header=None)<br \/>\n# split into input and output elements<br \/>\ndata = dataframe.values<br \/>\nX, y = data[:, :-1], data[:, -1]<br \/>\nprint(X.shape, y.shape)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># summarize the auto insurance dataset<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">pandas <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">read<\/span><span class=\"crayon-sy\">_<\/span>csv<\/p>\n<p><span class=\"crayon-p\"># load dataset<\/span><\/p>\n<p><span class=\"crayon-v\">url<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8216;https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/auto-insurance.csv&#8217;<\/span><\/p>\n<p><span class=\"crayon-v\">dataframe<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">read_csv<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">url<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">header<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">None<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># split into input and output elements<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">dataframe<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-i\">values<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span 
class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">shape<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">shape<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0003 seconds] --><\/p>\n<p>Running the example downloads the dataset and splits it into input and output elements. As expected, we can see that there are 63 rows of data with one input variable.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p>We will use Auto-Sklearn to find a good model for the auto insurance dataset.<\/p>\n<p>We can use the same process as was used in the previous section, although we will use the <em>AutoSklearnRegressor<\/em> class instead of the <em>AutoSklearnClassifier<\/em>.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10eb451275479\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n&#8230;<br \/>\n# define search<br \/>\nmodel = AutoSklearnRegressor(time_left_for_this_task=5*60, per_run_time_limit=30, n_jobs=8)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># define search<\/span><\/p>\n<p><span 
class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">AutoSklearnRegressor<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">time_left_for_this_task<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-o\">*<\/span><span class=\"crayon-cn\">60<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">per_run_time_limit<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">30<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">8<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>By default, the regressor will optimize the R<sup>2<\/sup> metric.<\/p>\n<p>In this case, we are interested in the mean absolute error, or MAE, which we can specify via the \u201c<em>metric<\/em>\u201d argument when calling the <em>fit()<\/em> function. More recent releases of Auto-Sklearn expect this argument in the constructor instead (for example, <em>AutoSklearnRegressor(metric=auto_mean_absolute_error)<\/em>), so check the documentation for the version you have installed.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10ed211653341\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n&#8230;<br \/>\n# perform the search<br \/>\nmodel.fit(X_train, y_train, metric=auto_mean_absolute_error)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div 
class=\"crayon-pre\">\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-p\"># perform the search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">metric<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">auto_mean_absolute_error<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0001 seconds] --><\/p>\n<p>The complete example is listed below.<\/p>\n<p><!-- Urvanov Syntax Highlighter v2.8.13 --><\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10ee895757146\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\n# example of auto-sklearn for the insurance regression dataset<br \/>\nfrom pandas import read_csv<br \/>\nfrom sklearn.model_selection import train_test_split<br \/>\nfrom sklearn.metrics import mean_absolute_error<br \/>\nfrom autosklearn.regression import AutoSklearnRegressor<br \/>\nfrom autosklearn.metrics import mean_absolute_error as auto_mean_absolute_error<br \/>\n# load dataset<br \/>\nurl = &#8216;https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/auto-insurance.csv&#8217;<br \/>\ndataframe = read_csv(url, header=None)<br \/>\n# split into input and output elements<br \/>\ndata = dataframe.values<br \/>\ndata = 
data.astype(&#8216;float32&#8217;)<br \/>\nX, y = data[:, :-1], data[:, -1]<br \/>\n# split into train and test sets<br \/>\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)<br \/>\n# define search<br \/>\nmodel = AutoSklearnRegressor(time_left_for_this_task=5*60, per_run_time_limit=30, n_jobs=8)<br \/>\n# perform the search<br \/>\nmodel.fit(X_train, y_train, metric=auto_mean_absolute_error)<br \/>\n# summarize<br \/>\nprint(model.sprint_statistics())<br \/>\n# evaluate best model<br \/>\ny_hat = model.predict(X_test)<br \/>\nmae = mean_absolute_error(y_test, y_hat)<br \/>\nprint(&#8220;MAE: %.3f&#8221; % mae)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<p>17<\/p>\n<p>18<\/p>\n<p>19<\/p>\n<p>20<\/p>\n<p>21<\/p>\n<p>22<\/p>\n<p>23<\/p>\n<p>24<\/p>\n<p>25<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p><span class=\"crayon-p\"># example of auto-sklearn for the insurance regression dataset<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">pandas <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">read_csv<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">train_test_split<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">metrics <\/span><span 
class=\"crayon-e\">import <\/span><span class=\"crayon-e\">mean_absolute_error<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">autosklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">regression <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">AutoSklearnRegressor<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">autosklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">metrics <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">mean_absolute_error <\/span><span class=\"crayon-st\">as<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">auto_mean_absolute<\/span><span class=\"crayon-sy\">_<\/span>error<\/p>\n<p><span class=\"crayon-p\"># load dataset<\/span><\/p>\n<p><span class=\"crayon-v\">url<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8216;https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/auto-insurance.csv&#8217;<\/span><\/p>\n<p><span class=\"crayon-v\">dataframe<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">read_csv<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">url<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">header<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">None<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># split into input and output elements<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">dataframe<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">values<\/span><\/p>\n<p><span class=\"crayon-v\">data<\/span><span 
class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">astype<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8216;float32&#8217;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-p\"># split into train and test sets<\/span><\/p>\n<p><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X_test<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_test<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">train_test_split<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span 
class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">test_size<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0.33<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># define search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">AutoSklearnRegressor<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">time_left_for_this_task<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-o\">*<\/span><span class=\"crayon-cn\">60<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">per_run_time_limit<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">30<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">n_jobs<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">8<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># perform the search<\/span><\/p>\n<p><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">metric<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">auto_mean_absolute_error<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># summarize<\/span><\/p>\n<p><span 
class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">sprint_statistics<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-p\"># evaluate best model<\/span><\/p>\n<p><span class=\"crayon-v\">y_hat<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">predict<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_test<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">mae<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">mean_absolute_error<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">y_test<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_hat<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&quot;MAE: %.3f&quot;<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">%<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">mae<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p>Running the example takes about five minutes, because the whole search is capped by the 5*60 second limit passed as the time_left_for_this_task argument.<\/p>\n<p>You may see warning messages during the run, such as the one below. Each one indicates that a single candidate pipeline failed (here, by returning NaN or infinity as its score) and was recorded as a crashed run; the search continues with other candidates, so these warnings can safely be ignored:<\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10f0393613426\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco 
urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nTarget Algorithm returned NaN or inf as quality. Algorithm run is treated as CRASHED, cost is set to 1.0 for quality scenarios. (Change value through &#8220;cost_for_crash&#8221;-option.)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p>Target Algorithm returned NaN or inf as quality. Algorithm run is treated as CRASHED, cost is set to 1.0 for quality scenarios. (Change value through &#8220;cost_for_crash&#8221;-option.)<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p><!-- [Format Time: 0.0000 seconds] --><\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and comparing the average outcome.<\/p>\n<p>At the end of the run, a summary is printed showing that 1,759 candidate models were evaluated (1,362 successfully and 394 crashed) and that the best model achieved a validation MAE of about 29.9.<\/p>\n<div id=\"urvanov-syntax-highlighter-5f556d84b10f2286024825\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-mac print-yes notranslate\" data-settings=\" minimize scroll-mouseover\">\n<p><textarea class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly><br \/>\nauto-sklearn results:<br \/>\nDataset name: ff51291d93f33237099d48c48ee0f9ad<br \/>\nMetric: mean_absolute_error<br \/>\nBest validation score: 29.911203<br \/>\nNumber of target algorithm runs: 1759<br \/>\nNumber of successful target algorithm runs: 1362<br \/>\nNumber of crashed target algorithm runs: 394<br \/>\nNumber of target algorithms that exceeded the time limit: 3<br \/>\nNumber of target algorithms that exceeded the memory limit: 0<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\">\n<p>auto-sklearn results:<\/p>\n<p>Dataset name: ff51291d93f33237099d48c48ee0f9ad<\/p>\n<p>Metric: mean_absolute_error<\/p>\n<p>Best validation score: 29.911203<\/p>\n<p>Number of target algorithm runs: 1759<\/p>\n<p>Number of successful target algorithm runs: 1362<\/p>\n<p>Number of crashed target algorithm runs: 394<\/p>\n<p>Number of target algorithms that exceeded the time limit: 3<\/p>\n<p>Number of target algorithms that exceeded the memory limit: 0<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<\/div>\n<p>We then evaluate the model 
on the holdout dataset and see that an MAE of about 26 was achieved, slightly better than the best validation score of about 29.9, which suggests that the validation estimate was not overly optimistic.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to use Auto-Sklearn for AutoML with Scikit-Learn machine learning algorithms in Python.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Auto-Sklearn is an open-source library for AutoML with scikit-learn data preparation and machine learning models.<\/li>\n<li>How to use Auto-Sklearn to automatically discover top-performing models for classification tasks.<\/li>\n<li>How to use Auto-Sklearn to automatically discover top-performing models for regression tasks.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>Ask your questions in the comments below and I will do my best to answer.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/machinelearningmastery.com\/auto-sklearn-for-automated-machine-learning-in-python\/<\/p>\n","protected":false},"author":0,"featured_media":186,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/185"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=185"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/185\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/186"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}