{"id":1155,"date":"2021-11-06T08:40:00","date_gmt":"2021-11-06T08:40:00","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/11\/06\/run-alphafold-v2-0-on-amazon-ec2\/"},"modified":"2021-11-06T08:40:00","modified_gmt":"2021-11-06T08:40:00","slug":"run-alphafold-v2-0-on-amazon-ec2","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/11\/06\/run-alphafold-v2-0-on-amazon-ec2\/","title":{"rendered":"Run AlphaFold v2.0 on Amazon EC2"},"content":{"rendered":"<div id=\"\">\n<p>After the <a href=\"https:\/\/www.nature.com\/articles\/s41586-021-03819-2\" target=\"_blank\" rel=\"noopener noreferrer\">article in <em>Nature<\/em><\/a> about the open-source release of <a href=\"https:\/\/deepmind.com\/research\/case-studies\/alphafold\" target=\"_blank\" rel=\"noopener noreferrer\">AlphaFold v2.0<\/a> on <a href=\"https:\/\/github.com\/deepmind\/alphafold\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub<\/a> by <a href=\"https:\/\/deepmind.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">DeepMind<\/a>, many in the scientific and research community have wanted to try out DeepMind\u2019s AlphaFold implementation firsthand. With compute resources through <a href=\"http:\/\/aws.amazon.com\/ec2\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Compute Cloud<\/a> (Amazon EC2) with an Nvidia GPU, you can quickly get AlphaFold running and try it out yourself.<\/p>\n<p>In this post, I provide you with step-by-step instructions on how to install AlphaFold on an EC2 instance with an Nvidia GPU.<\/p>\n<h2>Overview of solution<\/h2>\n<p>The process starts with a Deep Learning Amazon Machine Image (DLAMI). After installation, we run predictions using the AlphaFold model with CASP14 samples on the instance. 
I also show how to create an <a href=\"http:\/\/aws.amazon.com\/ebs\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Block Store<\/a> (Amazon EBS) snapshot for future use to reduce the effort of setting it up again and save costs.<\/p>\n<p>To run AlphaFold without setting up a new EC2 instance from scratch, go to the \u201cRecreate the EC2 instance with a snapshot\u201d section of this post. You can create a new EC2 instance with the provided public EBS snapshots in a short time.<\/p>\n<p>The total cost for the AWS resources used in this post is less than $100 if you finish all the steps and shut down all resources within 24 hours. If you create an EBS snapshot and store it inside your AWS account, the EBS snapshot storage cost is about $150 per month.<\/p>\n<h2>Launch an EC2 instance with a DLAMI<\/h2>\n<p>In this section, I demonstrate how to set up an EC2 instance using a DLAMI from AWS. It already has lots of AlphaFold\u2019s dependencies preinstalled and saves time on the setup.<\/p>\n<ol>\n<li>On the Amazon EC2 console, choose your preferred AWS Region.<\/li>\n<li>Launch a new EC2 instance with a DLAMI by searching for Deep Learning AMI. I use DLAMI version 48.0 based on Ubuntu 18.04. This is the latest version at the time of this writing.<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29721 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_ec2ami.png\" alt=\"EC2 AMI\" width=\"800\" height=\"328\"><\/li>\n<li>Select a p3.2xlarge instance with one GPU as the instance type. 
If you don\u2019t have enough quota on a p3.2xlarge instance, you can increase the Amazon EC2 quota on your AWS account.<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29724 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_p32xlarge.png\" alt=\"P3 2xlarge\" width=\"800\" height=\"183\"><\/li>\n<li>Configure the proper <a href=\"http:\/\/aws.amazon.com\/vpc\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Virtual Private Cloud<\/a> (Amazon VPC) setting based on your AWS environment requirements. If this is your first time configuring your Amazon VPC, consider using the default Amazon VPC and review <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">Get started with Amazon VPC<\/a>.<\/li>\n<li>Set the system volume to 200 GiB, and add one new data volume of 3 TB (3072 GiB) in size.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29725 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_ebsvolume.png\" alt=\"EBS Volume\" width=\"800\" height=\"163\"><\/li>\n<li>Make sure that the <a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/authorizing-access-to-an-instance.html\" target=\"_blank\" rel=\"noopener noreferrer\">security group settings<\/a> allow you to access the EC2 instance with SSH, and the EC2 instance can reach the internet to install AlphaFold and other packages.<\/li>\n<li>Launch the EC2 instance.<\/li>\n<li>Wait for the EC2 instance to become ready and use SSH to access the Amazon EC2 terminal.<\/li>\n<li>Optionally, if you have other required software for the new EC2 instance, install it now.<\/li>\n<\/ol>\n<h2>Install AlphaFold<\/h2>\n<p>You\u2019re now ready to install AlphaFold.<\/p>\n<ol>\n<li>After you use SSH to access the Amazon EC2 terminal, first 
update all packages with <code>sudo apt update<\/code>, the same command used in the restore steps later in this post.\n          <\/li>\n<li>Mount the data volume to the folder <code>\/data<\/code>. For more details, refer to <a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/ebs-using-volumes.html\" target=\"_blank\" rel=\"noopener noreferrer\">Make an Amazon EBS volume available for use on Linux<\/a>.<\/li>\n<li>Use the <code>lsblk<\/code> command to view your available disk devices and their mount points (if applicable) to help you determine the correct device name to use:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">lsblk<\/code><\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29726 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_lsblk.png\" alt=\"lsblk command\" width=\"800\" height=\"247\"><\/p>\n<\/p><\/div>\n<\/li>\n<li>Determine whether there is a file system on the volume. New volumes are raw block devices, and you must create a file system on them before you can mount and use them. 
If the output shows only <code>data<\/code>, the device is an empty volume, as is the case here.\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo file -s \/dev\/xvdb<\/code><\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29727 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_sudofile.png\" alt=\"sudo file\" width=\"652\" height=\"51\"><\/p>\n<\/p><\/div>\n<\/li>\n<li>Create a file system on the volume and mount the volume to the <code>\/data<\/code> folder:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo mkfs.xfs \/dev\/xvdb\nsudo mkdir \/data\nsudo mount \/dev\/xvdb \/data\nsudo chown ubuntu:ubuntu -R \/data\ndf -h \/data<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Install the AlphaFold dependencies and any other required tools:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo apt install aria2 rsync git vim wget tmux tree -y<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Create working folders, and clone the AlphaFold code from the GitHub repo:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">cd \/data\nmkdir -p \/data\/af_download_data\nmkdir -p \/data\/output\/alphafold\nmkdir -p \/data\/input\ngit clone https:\/\/github.com\/deepmind\/alphafold.git<\/code><\/pre>\n<\/p><\/div>\n<p>You use the new volume exclusively, so the snapshot you create later has all the necessary data.<\/p>\n<\/li>\n<li>Download the data using the provided scripts in the background. AlphaFold needs multiple genetic (sequence) databases and model parameters.\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">nohup \/data\/alphafold\/scripts\/download_all_data.sh \/data\/af_download_data &amp;<\/code><\/pre>\n<\/p><\/div>\n<p>The whole download process could take over 10 hours, so wait for it to finish. 
You can use the following command to monitor the download and unzip process:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">du -sh \/data\/af_download_data\/*<\/code><\/pre>\n<\/p><\/div>\n<p>When the download process is complete, you should have the following files in your <code>\/data\/af_download_data<\/code> folder:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">$DOWNLOAD_DIR\/                             # Total: ~ 2.2 TB (download: 438 GB)\n    bfd\/                                   # ~ 1.7 TB (download: 271.6 GB)\n        # 6 files.\n    mgnify\/                                # ~ 64 GB (download: 32.9 GB)\n        mgy_clusters_2018_12.fa\n    params\/                                # ~ 3.5 GB (download: 3.5 GB)\n        # 5 CASP14 models,\n        # 5 pTM models,\n        # LICENSE,\n        # = 11 files.\n    pdb70\/                                 # ~ 56 GB (download: 19.5 GB)\n        # 9 files.\n    pdb_mmcif\/                             # ~ 206 GB (download: 46 GB)\n        mmcif_files\/\n            # About 180,000 .cif files.\n        obsolete.dat\n    small_bfd\/                             # ~ 17 GB (download: 9.6 GB)\n        bfd-first_non_consensus_sequences.fasta\n    uniclust30\/                            # ~ 86 GB (download: 24.9 GB)\n        uniclust30_2018_08\/\n            # 13 files.\n    uniref90\/                              # ~ 58 GB (download: 29.7 GB)\n        uniref90.fasta<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Update <code>\/data\/alphafold\/docker\/run_docker.py<\/code> to make the configuration match the local paths:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">vim \/data\/alphafold\/docker\/run_docker.py<\/code><\/pre>\n<\/p><\/div>\n<p>With the folders you\u2019ve created, the configuration looks like the following. 
If you set up a different folder structure in your EC2 instance, set it accordingly.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">#### USER CONFIGURATION ####\n\n# Set to target of scripts\/download_all_databases.sh\nDOWNLOAD_DIR = '\/data\/af_download_data'\n\n# Name of the AlphaFold Docker image.\ndocker_image_name = 'alphafold'\n\n# Path to a directory that will store the results.\noutput_dir = '\/data\/output\/alphafold'<\/code><\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29728 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_config.png\" alt=\"User Config\" width=\"800\" height=\"197\"><\/p>\n<\/p><\/div>\n<\/li>\n<li>Confirm the NVidia container kit is installed:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo docker run --rm --gpus all nvidia\/cuda:11.0-base nvidia-smi<\/code><\/pre>\n<\/p><\/div>\n<p>You should see similar output to the following screenshot.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29729 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_nvidia.png\" alt=\"NVidia docker\" width=\"800\" height=\"386\"><\/p>\n<\/li>\n<li>Build the AlphaFold Docker image. 
Make sure that the local path is <code>\/data\/alphafold<\/code> because a <code>.dockerignore<\/code> file is under that folder.\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">cd \/data\/alphafold\ndocker build -f docker\/Dockerfile -t alphafold .\ndocker images<\/code><\/pre>\n<\/p><\/div>\n<p>You should see the new Docker image after the build is complete.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29730 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_dockerimage.png\" alt=\"alpha folder docker image\" width=\"800\" height=\"93\"><\/p>\n<\/li>\n<li>Use pip to install all Python dependencies required by AlphaFold:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">pip3 install -r \/data\/alphafold\/docker\/requirements.txt<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Go to the <a href=\"https:\/\/www.predictioncenter.org\/casp14\/targetlist.cgi\" target=\"_blank\" rel=\"noopener noreferrer\">CASP14 target list<\/a> and copy the sequence from the <a href=\"https:\/\/www.predictioncenter.org\/casp14\/target.cgi?target=T1050&amp;view=sequence\" target=\"_blank\" rel=\"noopener noreferrer\">plaintext link for T1050<\/a>.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29731 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_casp14.png\" alt=\"CASP14 T1050\" width=\"800\" height=\"610\"><\/li>\n<li>Copy the content into a new <code>T1050.fasta<\/code> file and save it under the <code>\/data\/input<\/code> folder.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29733 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_fasta.png\" alt=\"T1050 fasta file\" width=\"800\" height=\"72\"><\/li>\n<li>You can use this same process to create a few more 
<code>.fasta<\/code> files for testing under the <code>\/data\/input<\/code> folder.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29734 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_fastafiles.png\" alt=\"fasta files\" width=\"800\" height=\"157\"><\/li>\n<\/ol>\n<h2>Install CloudWatch monitoring for GPU (Optional)<\/h2>\n<p>Optionally, you can install <a href=\"http:\/\/aws.amazon.com\/cloudwatch\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch<\/a> monitoring for GPU. This requires an <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role.<\/p>\n<ol>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/AmazonCloudWatch\/latest\/monitoring\/create-iam-roles-for-cloudwatch-agent.html\" target=\"_blank\" rel=\"noopener noreferrer\">Create an Amazon EC2 IAM role<\/a> for CloudWatch and attach it to the EC2 instance.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29735 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_iamrole.png\" alt=\"IAM Role\" width=\"800\" height=\"158\"><\/li>\n<li>Change the Region in <code>gpumon.py<\/code> if your instance is in another Region, and provide a new namespace like <code>AlphaFold<\/code> as the CloudWatch namespace:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">vim ~\/tools\/GPUCloudWatchMonitor\/gpumon.py<\/code><\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29736 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_gpumon.png\" alt=\"gpumon config\" width=\"800\" height=\"275\"><\/p>\n<\/p><\/div>\n<\/li>\n<li>Launch <code>gpumon<\/code> and start sending GPU metrics to CloudWatch:\n<div 
class=\"hide-language\">\n<pre><code class=\"lang-bash\">source activate python3\npython ~\/tools\/GPUCloudWatchMonitor\/gpumon.py &amp;<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<\/ol>\n<h2>Use AlphaFold for prediction<\/h2>\n<p>We\u2019re now ready to run predictions with AlphaFold.<\/p>\n<ol>\n<li>Use the following command to run a prediction of a protein sequence from <code>\/data\/input\/T1050.fasta<\/code>:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">nohup python3 \/data\/alphafold\/docker\/run_docker.py --fasta_paths=\/data\/input\/T1050.fasta --max_template_date=2020-05-14 &amp;<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Use the <code>tail<\/code> command to monitor the prediction progress, for example <code>tail -f nohup.out<\/code> in the directory where you started the run (by default, <code>nohup<\/code> writes the command output to <code>nohup.out<\/code> there).\n<p>The whole prediction takes a few hours to finish. When the prediction is complete, you should see the following in the output folder. In this case, <code>&lt;target_name&gt;<\/code> is <code>T1050<\/code>.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">&lt;target_name&gt;\/\n    features.pkl\n    ranked_{0,1,2,3,4}.pdb\n    ranking_debug.json\n    relaxed_model_{1,2,3,4,5}.pdb\n    result_model_{1,2,3,4,5}.pkl\n    timings.json\n    unrelaxed_model_{1,2,3,4,5}.pdb\n    msas\/\n        bfd_uniclust_hits.a3m\n        mgnify_hits.sto\n        pdb70_hits.hhr\n        uniref90_hits.sto\n<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Change the owner of the output folder from <code>root<\/code> to <code>ubuntu<\/code> so you can copy the results:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo chown ubuntu:ubuntu \/data\/output\/alphafold\/ -R<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Use <code>scp<\/code> to copy the output from the prediction output folder to your local folder:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">scp -i &lt;ec2-key-path&gt;.pem -r ubuntu@&lt;ec2-ip&gt;:\/data\/output\/alphafold\/T1050 ~\/Downloads\/<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Use this protein 3D <a 
href=\"https:\/\/www.ncbi.nlm.nih.gov\/Structure\/icn3d\/full.html\" target=\"_blank\" rel=\"noopener noreferrer\">viewer<\/a> from NIH to view the predicted 3D structure from your result folder.<\/li>\n<li>Select <code>ranked_0.pdb<\/code>, which contains the prediction with the highest confidence.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-29748\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_openpdb-1.png\" alt=\"Open PDB file\" width=\"800\" height=\"272\"><br \/>The following is a 3D view of the predicted structure for <code>T1050<\/code> by AlphaFold.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29738 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_3Dview.png\" alt=\"T1050 3D view\" width=\"800\" height=\"403\"><\/li>\n<\/ol>\n<h2>Create a snapshot from the data volume<\/h2>\n<p>It takes time to install AlphaFold on an EC2 instance. However, the P3 instance and the EBS volume can become expensive if you keep them running all the time. You may want an EC2 instance ready quickly without spending time rebuilding the environment every time you need it. 
An EBS snapshot helps you save both time and cost.<\/p>\n<ol>\n<li>On the Amazon EC2 console, choose <strong>Volumes<\/strong> in the navigation pane under <strong>Elastic Block Store<\/strong>.<\/li>\n<li>Filter by the EC2 instance ID. Two volumes should be listed.<\/li>\n<li>Select the data volume with 3072 GiB in size.<\/li>\n<li>On the <strong>Actions<\/strong> menu, choose <strong>Create snapshot<\/strong>.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29739 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_snapshot.png\" alt=\"Create snapshot\" width=\"800\" height=\"153\"><br \/>The snapshot takes a few hours to finish.<\/li>\n<li>When the snapshot is complete, choose <strong>Snapshots<\/strong>, and your new snapshot should be in the list.<\/li>\n<\/ol>\n<p>You can safely shut down your EC2 instance now. At this point, all the data in the data volume is safely stored in the snapshot for future use.<\/p>\n<h2>Recreate the EC2 instance with a snapshot<\/h2>\n<p>To recreate a new EC2 instance with AlphaFold, the first steps are similar to what you did earlier when creating an EC2 instance from scratch. But instead of creating the data volume from scratch, you attach a new volume restored from the Amazon EBS snapshot.<\/p>\n<ol>\n<li>Open the Amazon EC2 console in the AWS Region of your choice, and launch a new EC2 instance with a DLAMI by searching for <code>Deep Learning AMI<\/code>.<\/li>\n<li>Choose the DLAMI based on Ubuntu 18.04.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29721 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_ec2ami.png\" alt=\"EC2 AMI\" width=\"800\" height=\"328\"><\/li>\n<li>Select p3.2xlarge with one GPU as the instance type. 
If you don\u2019t have enough quota on a p3.2xlarge instance, you can <a href=\"https:\/\/aws.amazon.com\/premiumsupport\/knowledge-center\/ec2-instance-limit\/\" target=\"_blank\" rel=\"noopener noreferrer\">increase the Amazon EC2 quota<\/a> on your AWS account.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29724 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_p32xlarge.png\" alt=\"P3 2xlarge\" width=\"800\" height=\"183\"><\/li>\n<li>Configure the proper Amazon VPC setting based on your AWS environment requirements. If this is your first time configuring your Amazon VPC, consider using the default Amazon VPC and review <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">Get started with Amazon VPC<\/a>.<\/li>\n<li>Set the system volume to 200 GiB, but don\u2019t add a new data volume.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29740 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_ebsvolume2.png\" alt=\"EBS volume\" width=\"800\" height=\"152\"><\/li>\n<li>Make sure that the <a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/authorizing-access-to-an-instance.html\" target=\"_blank\" rel=\"noopener noreferrer\">security group settings<\/a> allow you to access the EC2 instance and the EC2 instance can reach the internet to install Python and Docker packages.<\/li>\n<li>Launch the EC2 instance.<\/li>\n<li>Take note of which Availability Zone the instance is in and the instance ID, because you use them in a later step.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29741 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_restoreec2.png\" alt=\"restored ec2 AZ\" 
width=\"800\" height=\"97\"><\/li>\n<li>On the Amazon EC2 console, choose <strong>Snapshots<\/strong>.<\/li>\n<li>Select the snapshot you created earlier or use the public snapshot provided.<\/li>\n<li>On the <strong>Actions<\/strong> menu, choose <strong>Create Volume<\/strong>.<br \/>For this post, we provide public snapshots in Regions <code>us-east-1<\/code>, <code>us-west-2<\/code>, and <code>eu-west-1<\/code>. You can search public snapshots by snapshot ID: <code>snap-0d736c6e22d0110d0<\/code> in <code>us-east-1<\/code>, <code>snap-080e5bbdfe190ee7e<\/code> in <code>us-west-2<\/code>, and <code>snap-08d06a7c7c3295567<\/code> in <code>eu-west-1<\/code>.<\/li>\n<li>Set up the new data volume settings accordingly. Make sure that the Availability Zone is the same as that of the newly created EC2 instance. Otherwise, you can\u2019t attach the volume to the new EC2 instance.<\/li>\n<li>Choose <strong>Create Volume<\/strong> to create the new data volume.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29742 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_restoresnapshot.png\" alt=\"restore snapshot\" width=\"800\" height=\"311\"><\/li>\n<li>Choose <strong>Volumes<\/strong>, and you should see the newly created data volume. Its state should be <code>available<\/code>.<\/li>\n<li>Select the volume, and on the <strong>Actions<\/strong> menu, choose <strong>Attach volume<\/strong>.<\/li>\n<li>Choose the newly created EC2 instance and attach the volume.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29743 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_attachevolume.png\" alt=\"Attached volume\" width=\"800\" height=\"222\"><\/li>\n<li>Use SSH to access the Amazon EC2 terminal and run <code>lsblk<\/code>. You should see that the new data volume is unmounted. 
In this case, it is <code>\/dev\/xvdf<\/code>.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29744 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_lsblk2.png\" alt=\"lsblk\" width=\"800\" height=\"276\"><\/li>\n<li>Determine whether there is a file system on the volume. The data volume created from the snapshot has an XFS file system on it already.\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo file -s \/dev\/xvdf<\/code><\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29745 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_sudo2.png\" alt=\"sudo file\" width=\"800\" height=\"47\"><\/p>\n<\/p><\/div>\n<\/li>\n<li>Mount the new data volume to the <code>\/data<\/code> folder:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo mkdir \/data\nsudo mount \/dev\/xvdf \/data\nsudo chown ubuntu:ubuntu -R \/data<\/code><\/pre>\n<\/p><\/div>\n<\/li>\n<li>Update all packages on the system and install the dependencies. 
You do need to rebuild the AlphaFold Docker image.\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo apt update\nsudo apt install aria2 rsync git vim wget tmux tree -y\npip3 install -r \/data\/alphafold\/docker\/requirements.txt\n\ncd \/data\/alphafold\ndocker build -f docker\/Dockerfile -t alphafold .\ndocker images<\/code><\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29730 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_dockerimage.png\" alt=\"alpha folder docker image\" width=\"800\" height=\"93\"><\/p>\n<\/p><\/div>\n<\/li>\n<li>Confirm that the Nvidia container kit is installed:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">sudo docker run --rm --gpus all nvidia\/cuda:11.0-base nvidia-smi<\/code><\/pre>\n<\/p><\/div>\n<p>You should see output like the following screenshot.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-29729 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/ML_5671_nvidia.png\" alt=\"NVidia docker\" width=\"800\" height=\"386\"><\/p>\n<\/li>\n<li>Use the following command to run a prediction of a protein sequence from <code>\/data\/input\/T1024.fasta<\/code>:\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">cd \/data\nnohup python3 \/data\/alphafold\/docker\/run_docker.py --fasta_paths=\/data\/input\/T1024.fasta --max_template_date=2020-05-14 &amp;<\/code><\/pre>\n<\/p><\/div>\n<p>I use a different protein sequence because the snapshot contains the result from <code>T1050<\/code> already. 
If you want to run the prediction for <code>T1050<\/code> again, delete or rename the existing <code>T1050<\/code> result folder before running the new prediction.<\/p>\n<\/li>\n<li>Use the <code>tail<\/code> command to monitor the prediction progress, for example <code>tail -f \/data\/nohup.out<\/code>, because <code>nohup<\/code> writes the command output to <code>nohup.out<\/code> in the directory where you started the run.\n<p>The whole prediction takes a few hours to finish.<\/p>\n<\/li>\n<\/ol>\n<h2>Clean up<\/h2>\n<p>When you finish all your predictions and copy your results locally, clean up the AWS resources to save cost. You can safely shut down the EC2 instance and delete the EBS data volume if it wasn\u2019t deleted when the EC2 instance was shut down. When you need to use AlphaFold again, you can follow the same process to spin up a new EC2 instance and run new predictions in a matter of minutes. And you don\u2019t incur any additional cost other than the EBS snapshot storage cost.<\/p>\n<h2>Conclusion<\/h2>\n<p>With an Amazon EC2 instance with an Nvidia GPU and the Deep Learning AMI, you can install the new AlphaFold implementation from DeepMind and run predictions over CASP14 samples. Because you back up the data on the data volumes to point-in-time snapshots, you avoid paying for EC2 instances and EBS volumes when you don\u2019t need them. Creating an EBS volume based on the previous snapshot greatly shortens the time needed to recreate the EC2 instance with AlphaFold. Therefore, you can start running your predictions in a short amount of time.<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><strong><a href=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/qwanga-1.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-29772 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2021\/10\/25\/qwanga-1.jpg\" alt=\"\" width=\"100\" height=\"111\"><\/a>Qi Wang<\/strong> is a Sr. Solutions Architect on the Global Healthcare and Life Science team at AWS. 
He has over 10 years of experience working in the healthcare and life science vertical in business innovation and digital transformation. At AWS, he works closely with life science customers transforming drug discovery, clinical trials, and drug commercialization.<\/p>\n<p>       <!-- '\"` -->\n      <\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/run-alphafold-v2-0-on-amazon-ec2\/<\/p>\n","protected":false},"author":0,"featured_media":1156,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1155"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1155"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1155\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1156"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1155"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1155"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1155"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}