{"id":1361,"date":"2021-12-14T18:03:07","date_gmt":"2021-12-14T18:03:07","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/12\/14\/train-and-deploy-a-fairmot-model-with-amazon-sagemaker\/"},"modified":"2021-12-14T18:03:07","modified_gmt":"2021-12-14T18:03:07","slug":"train-and-deploy-a-fairmot-model-with-amazon-sagemaker","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/12\/14\/train-and-deploy-a-fairmot-model-with-amazon-sagemaker\/","title":{"rendered":"Train and deploy a FairMOT model with Amazon SageMaker"},"content":{"rendered":"<div id=\"\">\n<p>Multi-object tracking (MOT) in video analysis is increasingly in demand in many industries, such as live sports, manufacturing, surveillance, and traffic monitoring. For example, in live sports, MOT can track soccer players in real time to analyze physical performance such as real-time speed and moving distance.<\/p>\n<p>Previously, most methods were designed to separate MOT into two tasks: object detection and association. The object detection task detects objects first. The association task extracts re-identification (re-ID) features from image regions for each detected object, and links each detected object through re-ID features to existing tracks or creates a new track. It\u2019s challenging to do real-time inference in an environment with a large number of objects. This is because two tasks extract features respectively and the association task needs to run re-ID feature extraction for each object. Some proposed one-shot MOT methods add a re-ID branch to the object detection network to conduct object detection and association simultaneously. 
This reduces the inference time, but sacrifices the tracking performance.<\/p>\n<p><a href=\"https:\/\/arxiv.org\/pdf\/2004.01888.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">FairMOT<\/a> is a one-shot tracking method with two homogeneous branches for detecting objects and extracting re-ID features. FairMOT has higher performance than the two-step methods\u2014it reaches a speed of about 30 FPS on the <a href=\"https:\/\/motchallenge.net\" target=\"_blank\" rel=\"noopener noreferrer\">MOT challenge datasets<\/a>. This improvement helps MOT find its way in many industrial scenarios.<\/p>\n<p><a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> is a fully managed service that provides every developer and data scientist with the ability to prepare, build, train, and deploy machine learning (ML) models quickly. SageMaker provides several built-in algorithms and container images that you can use to accelerate training and deployment of ML models. 
Additionally, custom algorithms such as FairMOT can also be supported via custom-built Docker container images.<\/p>\n<p>This post demonstrates how to train and deploy a FairMOT model with SageMaker, optimize it using hyperparameter tuning, and make predictions in real time as well as batch mode.<\/p>\n<h2>Overview of the solution<\/h2>\n<p>Our solution consists of the following high-level steps:<\/p>\n<ol>\n<li>Set up your resources.<\/li>\n<li>Use SageMaker to train a FairMOT model and tune hyperparameters on the <a href=\"https:\/\/motchallenge.net\" target=\"_blank\" rel=\"noopener noreferrer\">MOT challenge dataset<\/a>.<\/li>\n<li>Run real-time inference.<\/li>\n<li>Run batch inference.<\/li>\n<\/ol>\n<h2>Prerequisites<\/h2>\n<p>Before getting started, complete the following prerequisites:<\/p>\n<ol>\n<li><a href=\"https:\/\/aws.amazon.com\/premiumsupport\/knowledge-center\/create-and-activate-aws-account\/\" target=\"_blank\" rel=\"noopener noreferrer\">Create an AWS account<\/a> or use an existing AWS account.<\/li>\n<li>Make sure that you have a minimum of one ml.p3.16xlarge instance for the training job.<\/li>\n<li>Make sure that you have a minimum of one ml.p3.2xlarge instance for inference endpoint.<\/li>\n<li>Make sure that you have a minimum of one ml.p3.2xlarge instance for processing jobs.<\/li>\n<\/ol>\n<p>If this is your first time training a model, deploying a model, or running a processing job on the previously mentioned instance sizes, you must <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/regions-quotas.html\" target=\"_blank\" rel=\"noopener noreferrer\">request a service quota increase for SageMaker training job<\/a>.<\/p>\n<h2>Set up your resources<\/h2>\n<p>After you complete all the prerequisites, you\u2019re ready to deploy the necessary resources.<\/p>\n<ol>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/howitworks-create-ws.html\" target=\"_blank\" rel=\"noopener noreferrer\">Create a SageMaker 
notebook instance<\/a>. For this task, we recommend the ml.t3.medium instance type. The default volume size is 5 GB; you must increase the volume size to 100 GB. For your <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role, choose an existing role or create a new role, and attach the <code>AmazonSageMakerFullAccess<\/code> and <code>AmazonElasticContainerRegistryPublicFullAccess<\/code> policies to the role.<\/li>\n<li>Clone the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-multiple-object-tracking\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a> to the notebook you created.<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/userguide\/creating-bucket.html\" target=\"_blank\" rel=\"noopener noreferrer\">Create<\/a> a new <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) bucket or use an existing bucket.<\/li>\n<\/ol>\n<h2>Train a FairMOT model<\/h2>\n<p>To train your FairMOT model, we use the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-multiple-object-tracking\/blob\/main\/fairmot-training.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">fairmot-training.ipynb<\/a> notebook. 
The following diagram outlines the logical flow implemented in this code.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image001.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-30988\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image001.jpg\" alt=\"\" width=\"608\" height=\"491\"><\/a><\/p>\n<p>In the Initialize SageMaker section, we define the S3 bucket location and dataset name, and choose either to train on the entire dataset (by setting the <code>half_val<\/code> parameter to 0) or to split it into training and validation sets (<code>half_val<\/code> set to 1). We use the latter mode for hyperparameter tuning.<\/p>\n<p>Next, the <code>prepare-s3-bucket.sh<\/code> script downloads the dataset from <a href=\"https:\/\/motchallenge.net\/\" target=\"_blank\" rel=\"noopener noreferrer\">MOT challenge<\/a>, converts it, and uploads it to the S3 bucket. We tested training the model using the <a href=\"https:\/\/motchallenge.net\/data\/MOT17\/\" target=\"_blank\" rel=\"noopener noreferrer\">MOT17<\/a> and <a href=\"https:\/\/motchallenge.net\/data\/MOT20\/\" target=\"_blank\" rel=\"noopener noreferrer\">MOT20<\/a> datasets, but you can try training with other MOT datasets as well.<\/p>\n<p>In the Build and push SageMaker training image section, we create a custom container image with the FairMOT training algorithm. You can find the definition of the Docker image in the <code>container-dp<\/code> folder. Because this container image occupies about 13.5 GB of disk space, the <code>prepare-docker.sh<\/code> script changes the directory where Docker stores its local image data, in order to avoid a \u201cno space\u201d error. 
The <code>build_and_push.sh<\/code> command does just that: it builds the container and pushes it to <a href=\"http:\/\/aws.amazon.com\/ecr\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Registry<\/a> (Amazon ECR). You should be able to validate the result on the Amazon ECR console.<\/p>\n<p>Finally, the Define a training job section initiates the model training. You can observe the model training on the SageMaker console on the <strong>Training Jobs<\/strong> page. The training job shows an <code>In progress<\/code> status first and changes to <code>Completed<\/code> in about 3 hours (if you\u2019re running the notebook as is). You can access the corresponding training metrics on the training job details page, as shown in the following screenshot.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image003.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-30989\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image003.png\" alt=\"\" width=\"1181\" height=\"1017\"><\/a><\/p>\n<h2>Training metrics<\/h2>\n<p>The FairMOT model is based on a backbone network with object detection and re-ID branches on top. The object detection branch has three parallel heads to estimate heatmaps, object center offsets, and bounding box sizes. During the training phase, each head has a corresponding loss value: <code>hm_loss<\/code> for heatmaps, <code>offset_loss<\/code> for center offsets, and <code>wh_loss<\/code> for bounding box sizes. The re-ID branch has an <code>id_loss<\/code> for re-ID feature learning. Based on these four loss values, a total loss named <code>loss<\/code> is calculated for the entire network. We monitor all loss values on both the training and validation datasets. 
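SageMaker surfaces loss values like these as training metrics by matching regular expressions against the job's console output (the `metric_definitions` of an estimator). The following pure-Python sketch illustrates that mechanism. The log line format and the exact regexes are assumptions made for illustration and are not taken from the sample repository; only the loss names come from the description above.

```python
import re

# Hypothetical metric definitions in the style of a SageMaker Estimator's
# metric_definitions argument. The loss names match the FairMOT training
# description; the log format and regexes are illustrative assumptions.
metric_definitions = [
    {"Name": "hm_loss", "Regex": r"hm_loss ([0-9.]+)"},
    {"Name": "wh_loss", "Regex": r"wh_loss ([0-9.]+)"},
    {"Name": "offset_loss", "Regex": r"off_loss ([0-9.]+)"},
    {"Name": "id_loss", "Regex": r"id_loss ([0-9.]+)"},
    {"Name": "loss", "Regex": r"\bloss ([0-9.]+)"},
]

# One hypothetical line of training log output.
sample_log = "epoch 3 | loss 2.41 | hm_loss 0.93 | wh_loss 0.61 | off_loss 0.21 | id_loss 0.66"

def extract_metrics(log_line, definitions):
    """Mimic SageMaker's regex-based metric extraction on one log line."""
    found = {}
    for d in definitions:
        m = re.search(d["Regex"], log_line)
        if m:
            found[d["Name"]] = float(m.group(1))
    return found

print(extract_metrics(sample_log, metric_definitions))
# → {'hm_loss': 0.93, 'wh_loss': 0.61, 'offset_loss': 0.21, 'id_loss': 0.66, 'loss': 2.41}
```

An estimator configured with metric definitions in this form charts the matched values on the training job details page, which is how the loss curves in the screenshot above are produced.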
During hyperparameter tuning, we rely on <code>ObjectiveMetric<\/code> to select the best-performing model.<\/p>\n<p>When the training job is complete, note the URI of your model in the <strong>Output<\/strong> section of the job details page.<\/p>\n<p>Finally, the last section of the notebook demonstrates SageMaker hyperparameter optimization (HPO). The right combination of hyperparameters can improve the performance of ML models; however, finding such a combination manually is time-consuming. <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/automatic-model-tuning.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker hyperparameter tuning<\/a> helps automate the process. We simply define the range for each tuning hyperparameter and the objective metric, and HPO does the rest.<\/p>\n<p>To accelerate the process, SageMaker HPO can run multiple training jobs in parallel. In the end, the best training job yields the optimal hyperparameters for the model, which you can then use for training on the entire dataset.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image005.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-30990\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image005.png\" alt=\"\" width=\"1212\" height=\"775\"><\/a><\/p>\n<h2>Perform real-time inference<\/h2>\n<p>In this section, we use the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-multiple-object-tracking\/blob\/main\/fairmot-inference.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">fairmot-inference.ipynb<\/a> notebook. Similar to the training notebook, we begin by initializing SageMaker parameters and building a custom container image. The inference container is then deployed with the model we built earlier. 
The model is referenced via the <code>s3_model_uri<\/code> variable; double-check that it points to the correct URI and adjust it manually if necessary.<\/p>\n<p>The following diagram illustrates the inference flow.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/18\/fairmot-blog-inference.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-31017 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/18\/fairmot-blog-inference.png\" alt=\"\" width=\"561\" height=\"441\"><\/a><\/p>\n<p>After our custom container is deployed on a SageMaker inference endpoint, we\u2019re ready to test. First, we download a test video from <a href=\"https:\/\/raw.githubusercontent.com\/ifzhang\/FairMOT\/master\/videos\/MOT16-03.mp4\" target=\"_blank\" rel=\"noopener noreferrer\">MOT16-03<\/a>. Next, in our inference loop, we use OpenCV to split the video into individual frames, convert them to base64, and make predictions by calling the deployed inference endpoint.<\/p>\n<p>The following code demonstrates this logic using the SageMaker runtime client (imports and client setup are added here for completeness):<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">import base64\nimport json\nimport os\n\nimport boto3\n\nclient = boto3.client(\"sagemaker-runtime\")\n\nframe_path = ...  # the path of a frame\nwith open(frame_path, \"rb\") as image_file:\n    img_data = base64.b64encode(image_file.read())\n\ndata = {\"frame_id\": frame_id}\ndata[\"frame_data\"] = img_data.decode(\"utf-8\")\nif frame_id == 0:\n    # Video-level metadata is sent with the first frame only\n    data[\"frame_w\"] = frame_w\n    data[\"frame_h\"] = frame_h\n    data[\"batch_size\"] = 1\nbody = json.dumps(data).encode(\"utf-8\")\n\nos.remove(frame_path)\nresponse = client.invoke_endpoint(\n    EndpointName=endpoint_name,\n    ContentType=\"application\/json\",\n    Accept=\"application\/json\",\n    Body=body,\n)\n\nbody = response[\"Body\"].read()<\/code><\/pre>\n<\/div>\n<p>The resulting video is 
stored in <code>{root_directory}\/datasets\/test.mp4<\/code>. The following is a sample frame. In consecutive frames, the same person is enclosed in a bounding box labeled with a unique ID.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image009.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-30992\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image009.png\" alt=\"\" width=\"1920\" height=\"1084\"><\/a><\/p>\n<h2>Perform batch inference<\/h2>\n<p>Now that we have implemented and validated the FairMOT model using a frame-by-frame inference endpoint, we build a container that can process an entire video as a whole. This allows us to use FairMOT as a step in more complex video processing pipelines. We use a SageMaker processing job to achieve this goal, as demonstrated in the <a href=\"https:\/\/github.com\/aws-samples\/amazon-sagemaker-multiple-object-tracking\/blob\/main\/fairmot-batch-inference.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">fairmot-batch-inference.ipynb<\/a> notebook.<\/p>\n<p>Once again, we begin with SageMaker initialization and building a custom container image. This time we encapsulate the frame-by-frame inference loop into the container itself (the <code>predict.py<\/code> script). Our test data is <a href=\"https:\/\/raw.githubusercontent.com\/ifzhang\/FairMOT\/master\/videos\/MOT16-03.mp4\" target=\"_blank\" rel=\"noopener noreferrer\">MOT16-03<\/a>, pre-staged in the S3 bucket. As in the previous steps, make sure that the <code>s3_model_uri<\/code> variable refers to the correct model URI.<\/p>\n<p>SageMaker processing jobs rely on Amazon S3 for input and output data placement. 
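Concretely, a processing job copies its S3 input into the container and uploads whatever the script writes to a designated output directory back to S3. The following minimal sketch shows that path contract; the bucket name, object keys, and output prefix are hypothetical, while the `/opt/ml/processing/input` and `/opt/ml/processing/output` locations follow SageMaker's conventional processing container layout.

```python
from pathlib import PurePosixPath

def processing_io(bucket, video_key, output_prefix):
    """Return the S3 <-> container path mapping a processing job run would use.

    Bucket, keys, and prefixes are hypothetical illustration values; the
    /opt/ml/processing/... paths are SageMaker's conventional layout.
    """
    local_in = PurePosixPath("/opt/ml/processing/input")
    local_out = PurePosixPath("/opt/ml/processing/output")
    return {
        # ProcessingInput: the S3 object is copied into the container here
        "input_source": f"s3://{bucket}/{video_key}",
        "input_destination": str(local_in),
        # Path the containerized predict.py would read
        "local_video": str(local_in / PurePosixPath(video_key).name),
        # ProcessingOutput: files written here are uploaded back to S3
        "output_source": str(local_out),
        "output_destination": f"s3://{bucket}/{output_prefix}",
    }

io_map = processing_io("my-mot-bucket", "videos/MOT16-03.mp4", "batch-output")
print(io_map["local_video"])
# → /opt/ml/processing/input/MOT16-03.mp4
```

The `output_destination` entry corresponds to the `s3_output` variable mentioned below: after the job finishes, the annotated video appears under that prefix.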
The following diagram demonstrates our workflow.<\/p>\n<p><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image011.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-30993\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/ML-5167-image011.jpg\" alt=\"\" width=\"578\" height=\"403\"><\/a><\/p>\n<p>In the Run batch inference section, we create an instance of <code>ScriptProcessor<\/code> and define the path for input and output data, as well as the target model. We then run the processor, and the resulting video is placed into the location defined in the <code>s3_output<\/code> variable. It looks the same as the resulting video generated in the previous section.<\/p>\n<h2>Clean up<\/h2>\n<p>To avoid unnecessary costs, delete the resources you created as part of this solution, including the inference endpoint.<\/p>\n<h2>Conclusion<\/h2>\n<p>This post demonstrated how to use SageMaker to train and deploy an object tracking model based on FairMOT. You can use a similar approach to implement other custom algorithms. Although we used public datasets in this example, you can certainly accomplish the same with your own dataset. 
<a href=\"https:\/\/aws.amazon.com\/sagemaker\/groundtruth\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Ground Truth<\/a> can help you with the labeling, and SageMaker <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/docker-containers.html\" target=\"_blank\" rel=\"noopener noreferrer\">custom containers<\/a> simplify implementation.<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/Gordon-Wang.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-30987 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/11\/17\/Gordon-Wang.jpg\" alt=\"\" width=\"100\" height=\"133\"><\/a>Gordon Wang<\/strong> is a Data Scientist on the Professional Services team at Amazon Web Services. He supports customers in many industries, including media, manufacturing, energy, and healthcare. He is passionate about computer vision, deep learning, and MLOps. 
In his spare time, he loves running and hiking.<\/p>\n<p>       <!-- '\"` -->\n      <\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/train-and-deploy-a-fairmot-model-with-amazon-sagemaker\/<\/p>\n","protected":false},"author":0,"featured_media":1362,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1361"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1361"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1361\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1362"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1361"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1361"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1361"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}