{"id":55,"date":"2020-08-17T07:53:57","date_gmt":"2020-08-17T07:53:57","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/08\/17\/monotonicity-constraints-in-machine-learning\/"},"modified":"2020-08-17T07:53:57","modified_gmt":"2020-08-17T07:53:57","slug":"monotonicity-constraints-in-machine-learning","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/08\/17\/monotonicity-constraints-in-machine-learning\/","title":{"rendered":"Monotonicity constraints in machine learning"},"content":{"rendered":"<div readability=\"140.75048355899\">\n<p>In practical machine learning and data science tasks, an ML model is often used to quantify a global, semantically meaningful relationship between two or more values. For example, a hotel chain might want to use ML to optimize its pricing strategy and use a model to estimate the likelihood of a room being booked at a given price and day of the week. For a relationship like this, the assumption is that, all other things being equal, users prefer a cheaper price, so demand is higher at a lower price. However, after building the model, the data scientist may easily discover that it behaves unexpectedly: for example, the model predicts that on Tuesdays clients would rather pay $110 than $100 for a room! The reason is that while there is an expected monotonic relationship between price and the likelihood of booking, the model is unable to (fully) capture it because of noise and confounding factors in the data.<\/p>\n<p>Too often, such constraints are ignored by practitioners, especially when non-linear models such as random forests, gradient boosted trees or neural networks are used. 
And while monotonicity constraints have been a topic of academic research for a long time (see a <a href=\"http:\/\/www.kdd.org\/exploration_files\/potharst.pdf\">survey paper<\/a> on monotonicity constraints for tree-based methods), library support has been lacking, making the problem hard for practitioners to tackle.<\/p>\n<p>Luckily, in recent years many ML libraries have added support for setting monotonicity constraints on models, including <a href=\"https:\/\/github.com\/Microsoft\/LightGBM\">LightGBM<\/a> and <a href=\"https:\/\/github.com\/dmlc\/xgboost\">XGBoost<\/a>, two of the most popular libraries for gradient boosted trees. Monotonicity constraints have also been built into <a href=\"https:\/\/github.com\/tensorflow\/lattice\">TensorFlow Lattice<\/a>, a library that implements a novel method for creating interpolated lookup tables.<\/p>\n<h2>Monotonicity constraints in LightGBM and XGBoost<\/h2>\n<p>For tree-based methods (decision trees, random forests, gradient boosted trees), monotonicity can be enforced during the model learning phase by not creating splits on monotonic features that would break the monotonicity constraint. 
<\/p>\n<p>In the following example, let\u2019s train two models using LightGBM on a toy dataset where we know the relationship between X and Y to be monotonic (but noisy) and compare the default and monotonic models.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport numpy as np\r\n\r\n# Toy data: a quadratic trend plus uniform noise of width 20 centered on zero\r\nsize = 100\r\nx = np.linspace(0, 10, size)\r\ny = x**2 + 10 - (20 * np.random.random(size))\r\n<\/pre>\n<p><a href=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/data_plot.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-4729\" src=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/data_plot.png\" alt=\"\" width=\"393\" height=\"261\" srcset=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/data_plot.png 393w, http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/data_plot-300x199.png 300w\" sizes=\"(max-width: 393px) 100vw, 393px\"><\/a><br \/>Let\u2019s fit a gradient boosted model on this data, setting <code>min_child_samples<\/code> to 5. 
<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport lightgbm as lgb\r\n\r\n# A small min_child_samples allows tiny leaves, so the model can chase noise\r\noverfit_model = lgb.LGBMRegressor(min_child_samples=5)\r\noverfit_model.fit(x.reshape(-1,1), y)\r\n\r\n# Predicted output of the model on the same input it was trained on\r\nprediction = overfit_model.predict(x.reshape(-1,1))\r\n<\/pre>\n<p>The model will slightly overfit (due to the small <code>min_child_samples<\/code>), which we can see by plotting the values of X against the predicted values of Y: the red line is not monotonic as we\u2019d like it to be.<\/p>\n<p><a href=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/model_fit.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-4730\" src=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/model_fit.png\" alt=\"\" width=\"380\" height=\"274\" srcset=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/model_fit.png 380w, http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/model_fit-300x216.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\"><\/a><\/p>\n<p>Since we know that the relationship between X and Y should be monotonic, we can set this constraint when specifying the model.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\n# Constrain the prediction to be non-decreasing in the first (and only) feature\r\nmonotone_model = lgb.LGBMRegressor(min_child_samples=5,\r\n                                   monotone_constraints=\"1\")\r\nmonotone_model.fit(x.reshape(-1,1), y)\r\n<\/pre>\n<p>The parameter <code>monotone_constraints=\"1\"<\/code> states that the output should be monotonically increasing with respect to the first feature (which in our case happens to be the only feature). 
After training the monotone model, we can see that the fitted relationship is now monotone (non-decreasing).<br \/><a href=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/monotone_model_fit.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-4731\" src=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/monotone_model_fit.png\" alt=\"\" width=\"401\" height=\"256\" srcset=\"http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/monotone_model_fit.png 401w, http:\/\/blog.datadive.net\/wp-content\/uploads\/2018\/09\/monotone_model_fit-300x192.png 300w\" sizes=\"(max-width: 401px) 100vw, 401px\"><\/a><\/p>\n<p>And if we check the model performance, we can see that not only does the monotonicity constraint provide a more natural fit, but the model generalizes better as well (as expected). Measuring the mean squared error on new test data, we see that the error is smaller for the monotone model.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nfrom sklearn.metrics import mean_squared_error as mse\r\n\r\n# Fresh test set: the same quadratic trend with the same noise distribution\r\nsize = 1000000\r\nx_test = np.linspace(0, 10, size)\r\ny_test = x_test**2 - 10 + (20 * np.random.random(size))\r\n\r\nprint(\"Default model mse\", mse(y_test, overfit_model.predict(x_test.reshape(-1,1))))\r\nprint(\"Monotone model mse\", mse(y_test, monotone_model.predict(x_test.reshape(-1,1))))\r\n<\/pre>\n<p><code><br \/>Default model mse 37.61501106522855<br \/>Monotone model mse 32.283051723268265<br \/><\/code><\/p>\n<h2>Other methods for enforcing monotonicity<\/h2>\n<p>Tree-based methods are not the only option for enforcing monotonicity constraints. One recent development in the field is <a href=\"https:\/\/github.com\/tensorflow\/lattice\">TensorFlow Lattice<\/a>, which implements lattice-based models: essentially interpolated look-up tables that can approximate arbitrary input-output relationships in the data and can optionally be constrained to be monotonic. 
There is a thorough <a href=\"https:\/\/github.com\/tensorflow\/lattice\/blob\/master\/g3doc\/tutorial\/index.md\">tutorial<\/a> on it in the TensorFlow Lattice GitHub repository.<\/p>\n<p>If a curve is already given, a monotonic spline can be fit to the data, for example using R\u2019s <a href=\"https:\/\/stat.ethz.ch\/R-manual\/R-devel\/library\/stats\/html\/splinefun.html\">splinefun<\/a> function.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>http:\/\/blog.datadive.net\/monotonicity-constraints-in-machine-learning\/<\/p>\n","protected":false},"author":1,"featured_media":56,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/55"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=55"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/55\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/56"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=55"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=55"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=55"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}