{"id":1339,"date":"2021-12-09T15:39:49","date_gmt":"2021-12-09T15:39:49","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/12\/09\/automating-machine-learning-with-images\/"},"modified":"2021-12-09T15:39:49","modified_gmt":"2021-12-09T15:39:49","slug":"automating-machine-learning-with-images","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/12\/09\/automating-machine-learning-with-images\/","title":{"rendered":"Automating Machine Learning with Images"},"content":{"rendered":"<div>\n<p>Following our previous posts on <strong><a href=\"https:\/\/bigml.com\/image-processing\/\" target=\"_blank\" rel=\"noreferrer noopener\">Image processing in BigML<\/a><\/strong>, it is time to discuss automation for datasets with images. As BigMLers will already know, BigML offers automation that can be used on the server side thanks to the <a rel=\"noreferrer noopener\" href=\"https:\/\/bigml.com\/whizzml\" target=\"_blank\"><strong>WhizzML<\/strong><\/a> language, which has been designed especially for Machine Learning tasks, but it also offers <strong><a rel=\"noreferrer noopener\" href=\"https:\/\/bigml.com\/tools\/bindings\" target=\"_blank\">client-side bindings<\/a><\/strong> for many programming languages. 
In this post, we\u2019ll review the <strong><a rel=\"noreferrer noopener\" href=\"https:\/\/bigml.readthedocs.io\/en\/latest\/\" target=\"_blank\">Python bindings<\/a><\/strong> approach.<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" data-attachment-id=\"30251\" data-permalink=\"https:\/\/blog.bigml.com\/bigml_python_images_rrss\/\" data-orig-file=\"https:\/\/littleml.files.wordpress.com\/2021\/12\/bigml_python_images_rrss.jpg\" data-orig-size=\"1200,630\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"bigml_python_images_rrss\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/littleml.files.wordpress.com\/2021\/12\/bigml_python_images_rrss.jpg?w=300\" data-large-file=\"https:\/\/littleml.files.wordpress.com\/2021\/12\/bigml_python_images_rrss.jpg?w=810\" src=\"https:\/\/littleml.files.wordpress.com\/2021\/12\/bigml_python_images_rrss.jpg?w=1024\" alt=\"\" class=\"wp-image-30251\"><\/figure>\n<h2 id=\"the-file-field-duality-of-images\">The File-Field Duality of Images<\/h2>\n<p>From the Machine Learning point of view, each image is a source for many fields, like the light levels in some regions, its color information, or the shapes it contains. In that sense, we need to think of an image as a composed field, just as we could think about other composed fields like text or date-time fields. At the same time, all this information is packed in an image file, and, as you probably know, uploading any file to BigML generates a <strong>Source<\/strong> resource. This will also happen in this case for any uploaded image. But then, how do we manage to work with images as with any other field? Well, simply by allowing <strong>Sources<\/strong> to contain other sources as components. 
That\u2019s why BigML now offers <strong>Composite Sources<\/strong>.<\/p>\n<h2 id=\"creating-composite-sources\">Creating Composite Sources<\/h2>\n<p>The simplest way to create a <strong>Composite Source<\/strong> from a list of images is to pack them in a compressed file and upload it as you would upload any file to BigML. Of course, if the file is located on your computer, you will need to use some kind of client-side binding to help you automate that. After <a href=\"https:\/\/bigml.readthedocs.io\/en\/latest\/#installation\">installing<\/a> the Python bindings and <a href=\"https:\/\/bigml.readthedocs.io\/en\/latest\/#authentication\">setting your credentials<\/a> as environment variables, the code to upload data to BigML will be:<\/p>\n<pre class=\"wp-block-preformatted\"><span>from<\/span> <span>bigml.api<\/span> <span>import<\/span> <span>BigML<br><\/span><span>api<\/span> <span>=<\/span> <span>BigML<\/span><span>()<br><\/span><span>composite_source<\/span> <span>=<\/span> <span>api<\/span><span>.<\/span><span>create_source<\/span><span>(<\/span><span>\"my_images.zip\"<\/span><span>)<\/span>\n<\/pre>\n<p>BigML opens the compressed file and creates one <strong>Source<\/strong> per image file plus a <strong>Composite Source<\/strong> that contains one image per row. In addition to that, if the images in the compressed file are organized in folders, a new <strong>label<\/strong> field is added to each row to store the name of the <strong>folder<\/strong> the image was in. Thus, if you upload a compressed file containing two folders, <em>cats<\/em> and <em>dogs<\/em>, with the corresponding images, the <strong>Composite Source<\/strong> will automatically handle labeling for you.<\/p>\n<p>In order to apply some Machine Learning to the image information, the next step is deciding what kind of features we will extract from the image that might be relevant to our problem. 
The <strong>Composite Source<\/strong> is the right place to configure the <em>parsing<\/em> that will be used to extract the image information. BigML offers different <a href=\"https:\/\/bigml.com\/api\/sources\">options for image analysis<\/a>, and they can be applied at the source level to determine the interpretation of your image. Of course, you can choose to let BigML decide for you and go with defaults, but in case you want to change the type of analysis to use, just update your <strong>Composite Source<\/strong>.<\/p>\n<pre class=\"wp-block-preformatted\"><span>from<\/span> <span>bigml.api<\/span> <span>import<\/span> <span>BigML<br><\/span><span>api<\/span> <span>=<\/span> <span>BigML<\/span><span>()<br><\/span><span>composite_source<\/span> <span>=<\/span> <span>api<\/span><span>.<\/span><span>update_source<\/span><span>(<br><\/span><span>    composite_source<\/span><span>,<\/span>\n<span>    {<\/span><span>\"image_analysis\"<\/span><span>:<\/span> <span>{<\/span>\n<span>     \"enabled\"<\/span><span>:<\/span> <span>True<\/span><span>,<\/span>\n<span>     \"extracted_features\"<\/span><span>:<\/span> <span>[<\/span><span>\"average_pixels\"<\/span><span>]}})<\/span>\n<\/pre>\n<p>That configuration will cause every image in your <strong>Dataset<\/strong> to be represented by the average intensity of pixels in different regions (see the <a href=\"https:\/\/bigml.com\/api\/sources\">API documents<\/a> to learn about the available options and their derived features). Of course, you can apply more than one type of extraction to your images. 
The <em>extracted_features<\/em> attribute will accept a list of the options that you want to use, and features will be generated when a <strong>Dataset<\/strong> is built from your <strong>Composite Source<\/strong>.<\/p>\n<h2 id=\"annotating-your-images\">Annotating your Images<\/h2>\n<p>In order to be able to use images as one more field in your <strong>Datasets<\/strong>, you need to be able to assign other additional information to each image you upload to BigML. Let\u2019s start with the simple case of image classification.<\/p>\n<p>In order to train an image classifier, you\u2019ll need to provide the images and the label that you want to associate with each of them. The simplest way to do that is to upload a compressed file that contains not only the images but also a <strong>CSV<\/strong> or <strong>JSON<\/strong> file with the annotations. The annotations file should have at least a field that contains the path to the image file, as found in the compressed file, and another field where the label is stored.<\/p>\n<pre class=\"wp-block-preformatted\"><span>from<\/span> <span>bigml.api<\/span> <span>import<\/span> <span>BigML<\/span>\n<span># my_annotated_images.zip contains a list of images plus an annotations file<\/span>\n<span># e.g.<\/span>\n<span># image1.jpg<\/span>\n<span># image2.jpg<\/span>\n<span># annotations.csv<\/span>\n\n<span>api<\/span> <span>=<\/span> <span>BigML<\/span><span>()<\/span>\n<span>composite_source<\/span> <span>=<\/span> <span>api<\/span><span>.<\/span><span>create_source<\/span><span>(<\/span><span>\"my_annotated_images.zip\"<\/span><span>)<\/span>\n \n<\/pre>\n<p>The result will be a <em>table+image<\/em> <strong>Composite Source<\/strong>, where each row will contain both the associated label and the Id of the <strong>Source<\/strong> created when uploading the corresponding image. 
This automated link between the contents of your CSV and the image <strong>Sources<\/strong> will happen when BigML is able to find a field that contains the names of the image files. Keep in mind that the CSV can contain any number of additional fields too, regardless of their type.<\/p>\n<h2 id=\"editing-annotations\">Editing Annotations<\/h2>\n<p>You\u2019ve probably experienced the need to change or add new labels to an existing set of images. There are plenty of reasons for that: you did not finish your labeling yet, or you redefined your labels in order to improve your results. In that case, you should be able to change the contents of the labels associated with the images in your <strong>Composite Source<\/strong>. The <em>table+image<\/em> format will not allow you to alter its contents out of the box, but there\u2019s a different way of creating a <strong>Composite Source<\/strong> for annotated images that will:<\/p>\n<pre class=\"wp-block-preformatted\"><span>from<\/span> <span>bigml.api<\/span> <span>import<\/span> <span>BigML<br><\/span><span>api<\/span> <span>=<\/span> <span>BigML<\/span><span>()<br><\/span><span>composite_source<\/span> <span>=<\/span> <span>api<\/span><span>.<\/span><span>create_annotated_source<\/span><span>(<\/span><span>\"annotated_images.json\"<\/span><span>)<\/span><\/pre>\n<p>The <em>annotated_images.json<\/em> file should point to two files: the zip file, containing exclusively the images, and the annotations file that contains the labels. 
The structure of <em>annotated_images.json<\/em> would be:<\/p>\n<pre class=\"wp-block-preformatted\">{\"<strong>description<\/strong>\": \"Fruit images to test colour distributions\",<br>\"<strong>images_file<\/strong>\": \".\/fruits_hist.zip\",<br>\"<strong>new_fields<\/strong>\": [{\"name\": \"new_label\", \"optype\": \"categorical\"}],<br>\"<strong>source_id<\/strong>\": null,<br>\"<strong>annotations<\/strong>\": \".\/annotations_detail.json\"}<\/pre>\n<p>You can see that the <em>images_file<\/em> attribute points to the images zip file. The <em>source_id<\/em> attribute can be set to the <strong>Composite Source<\/strong> Id once it has been created. If used, images will not be uploaded again. A <em>new_fields<\/em> attribute declares the <em>new_label<\/em> field that will be added to the Source. The <em>annotations<\/em> attribute points to the annotations file, which would be similar to this one:<\/p>\n<pre class=\"wp-block-preformatted\">[{\"<strong>file<\/strong>\": \"apples\/fruits1f.png\", \"<strong>new_label<\/strong>\": \"Green\"},<br>{\"<strong>file<\/strong>\": \"apples\/fruits1.png\", \"<strong>new_label<\/strong>\": \"Red\"},<br>{\"<strong>file<\/strong>\": \"berries\/fruits2e.png\", \"<strong>new_label<\/strong>\": \"Blue\"}]<\/pre>\n<p>By using this method, an editable <em>image<\/em> Composite Source is created, and its labels can be modified <strong>until a Dataset is built<\/strong> from it (or you decide to <strong>close<\/strong> the Source for editing).<\/p>\n<p>But why should we ever <em>close<\/em> the <strong>Source<\/strong> and avoid any further editing?<\/p>\n<h2 id=\"from-immutability-to-interpretability\">From Immutability to Interpretability<\/h2>\n<p>On top of good performance, today\u2019s Machine Learning solutions need to provide accountability, interpretability and reproducibility. The fact that resources in BigML are immutable has always ensured those qualities. 
However, as shown in the previous section, <strong>Composite Sources<\/strong> allow some significant editing, like labeling our images. In order to maintain traceability of the Machine Learning models, we need to block that editing the minute we use the contents of the <strong>Source<\/strong> to produce other Machine Learning resources from it, that is, when building a <strong>Dataset<\/strong>. Once a Dataset is built from it, the Source is automatically closed (its <code>closed<\/code> attribute is set to <code>True<\/code>) for editing, and accepts no more modifications. If you need to modify a closed <strong>Source<\/strong>, you\u2019ll need to clone it.<\/p>\n<pre class=\"wp-block-preformatted\"><span>from<\/span> <span>bigml.api<\/span> <span>import<\/span> <span>BigML<\/span>\n<span>api<\/span> <span>=<\/span> <span>BigML<\/span><span>()<\/span>\n<span>closed_source<\/span> <span>=<\/span> <span>\"source\/61786d54520f9017b5000005\"<\/span> <span># this source is closed<\/span>\n<span>open_cloned_source<\/span> <span>=<\/span> <span>api<\/span><span>.<\/span><span>clone_source<\/span><span>(<\/span><span>closed_source<\/span><span>) <\/span><span><span># this one is open<\/span><\/span> <\/pre>\n<p>The resulting <strong>Source<\/strong> is an open clone of the original one, ready to be changed again.<\/p>\n<p>As you\u2019ve seen, adding images to BigML\u2019s models is an easy task that starts by uploading them as <strong>Composite Sources<\/strong>. 
From then on, images can be analyzed not only by <strong>Deepnets<\/strong> and their Convolutional Neural Networks but also by all kinds of <strong>Supervised<\/strong> and <strong>Unsupervised<\/strong> models, like <strong>Clusters<\/strong> or <strong>Anomaly Detectors<\/strong>, thanks to BigML\u2019s automated feature extraction.<\/p>\n<h2 id=\"want-to-know-more-about-image-processing\">Want to know more about Image Processing?<\/h2>\n<p>We still have things to show about the BigML release and how building Image-based Machine Learning models will be a piece of cake from now on. Please\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/bigml.com\/releases\/image-processing\" target=\"_blank\"><strong>visit the dedicated release page<\/strong><\/a>\u00a0and<strong>\u00a0join the FREE live webinar on Wednesday, December 15\u00a0<\/strong>at 8:30 AM PST \/ 10:30 AM CST \/ 5:30 PM CET to learn more.\u00a0<strong><a rel=\"noreferrer noopener\" href=\"https:\/\/attendee.gotowebinar.com\/register\/3316692637331486991\" target=\"_blank\">Register today,<\/a><\/strong>\u00a0<strong>space is limited!<\/strong><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blog.bigml.com\/2021\/12\/09\/automating-machine-learning-with-images\/<\/p>\n","protected":false},"author":0,"featured_media":1340,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1339"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1339"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1339\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1340"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1339"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1339"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1339"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}