{"id":169,"date":"2020-09-02T01:40:07","date_gmt":"2020-09-02T01:40:07","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/02\/using-amazon-textract-with-aws-privatelink\/"},"modified":"2020-09-02T01:40:07","modified_gmt":"2020-09-02T01:40:07","slug":"using-amazon-textract-with-aws-privatelink","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/02\/using-amazon-textract-with-aws-privatelink\/","title":{"rendered":"Using Amazon Textract with AWS PrivateLink"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"http:\/\/www.aws.amazon.com\/textract\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Textract <\/a>now supports <a href=\"http:\/\/aws.amazon.com\/vpc\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Virtual Private Cloud<\/a> (Amazon VPC) endpoints via <a href=\"https:\/\/aws.amazon.com\/privatelink\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS PrivateLink<\/a> so you can securely initiate API calls to Amazon Textract from within your VPC and avoid using the public internet.<\/p>\n<p>In this post, we show you how to access Amazon Textract APIs from within your VPC without traversing the public internet, and how to use <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-endpoints-access.html#vpc-endpoint-policies\" target=\"_blank\" rel=\"noopener noreferrer\">VPC endpoint policies<\/a> to restrict access to Amazon Textract.<\/p>\n<p>Amazon Textract is a fully managed machine learning (ML) service that automatically extracts text and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.<\/p>\n<p>You can use AWS PrivateLink to access Amazon Textract securely by keeping your network traffic within the AWS network, while simplifying your internal network architecture. 
It enables you to privately access Amazon Textract APIs from your VPC in a scalable manner by using interface VPC endpoints. A VPC endpoint is an elastic network interface in your subnet with a private IP address that serves as the entry point for all Amazon Textract API calls. A VPC endpoint enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by AWS PrivateLink without requiring an internet gateway, NAT device, VPN connection, or <a href=\"http:\/\/aws.amazon.com\/directconnect\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Direct Connect<\/a> connection. Instances in your VPC don\u2019t require public IP addresses to communicate with resources in the service. Traffic between your VPC and the other service doesn\u2019t leave the AWS network.<\/p>\n<p>The following diagram illustrates the solution architecture.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-15384\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/31\/textract-privatelink.jpg\" alt=\"\" width=\"1000\" height=\"485\"><\/p>\n<h2>Prerequisites<\/h2>\n<p>To get started, you need to have a VPC set up in the AWS Region of your choice. For instructions, see <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Amazon VPC<\/a>. In this post, we use the us-east-2 Region. 
You should also have an <a href=\"https:\/\/signin.aws.amazon.com\/signin?redirect_uri=https%3A%2F%2Fportal.aws.amazon.com%2Fbilling%2Fsignup%2Fresume&amp;client_id=signup\" target=\"_blank\" rel=\"noopener noreferrer\">AWS account<\/a> with sufficient access to create resources in the following services:<\/p>\n<ul>\n<li>Amazon Textract<\/li>\n<li>AWS PrivateLink<\/li>\n<\/ul>\n<h2>Solution overview<\/h2>\n<p>The walkthrough includes the following high-level steps:<\/p>\n<ol>\n<li>Create VPC endpoints.<\/li>\n<li>Use Amazon Textract via AWS PrivateLink.<\/li>\n<\/ol>\n<h2>Creating VPC endpoints<\/h2>\n<p>To create a VPC endpoint, complete the following steps. We use the <code>us-east-2<\/code> Region in this post, so the console and URLs may differ depending on the Region you choose.<\/p>\n<ol>\n<li>On the Amazon VPC console, choose <strong>Endpoints<\/strong>.<\/li>\n<li>Choose <strong>Create Endpoint<\/strong>.<\/li>\n<li>For <strong>Service category<\/strong>, select <strong>AWS services<\/strong>.<\/li>\n<li>For <strong>Service Name<\/strong>, choose <strong>com.amazonaws.us-east-2.textract<\/strong> or <strong>com.amazonaws.us-east-2.textract-fips<\/strong>.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15506 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/01\/privatelink-2-2.jpg\" alt=\"\" width=\"949\" height=\"493\">\n<\/li>\n<li>For <strong>VPC<\/strong>, choose the VPC you want to use.<\/li>\n<li>For <strong>Availability Zone<\/strong>, select your preferred Availability Zones.<\/li>\n<li>For <strong>Enable DNS name<\/strong>, select <strong>Enable for this endpoint<\/strong>.<\/li>\n<\/ol>\n<p>This creates a <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-dns.html#vpc-private-hosted-zones\" target=\"_blank\" rel=\"noopener noreferrer\">private hosted zone<\/a> that enables you to access the resources in your VPC using custom DNS domain names, such 
as <code>example.com<\/code>, instead of using private IPv4 addresses or private DNS hostnames provided by AWS. The Amazon Textract DNS hostname that the <a href=\"http:\/\/aws.amazon.com\/cli\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Command Line Interface<\/a> (AWS CLI) and Amazon Textract SDKs use by default (https:\/\/textract.<span><em>Region<\/em><\/span>.amazonaws.com) resolves to your VPC endpoint.<\/p>\n<ol start=\"8\">\n<li>For <strong>Security group<\/strong>, choose the security group to associate with the endpoint network interface.<\/li>\n<\/ol>\n<p>If you don\u2019t specify a security group, the default security group for your VPC is associated.<\/p>\n<ol start=\"9\">\n<li>Choose <strong>Create Endpoint<\/strong>.<\/li>\n<\/ol>\n<p>When the <strong>Status<\/strong> changes to <code>available<\/code>, your VPC endpoint is ready for use.<\/p>\n<ol start=\"10\">\n<li>Choose the <strong>Policy<\/strong> tab to apply more restrictive access control to the VPC endpoint.<\/li>\n<\/ol>\n<p>The following example policy limits VPC endpoint access to only the <code>DetectDocumentText<\/code> API. An IAM principal, even with access to all Textract APIs, can still only access the specific API in the following policy using this VPC endpoint. This is an additional layer of access control applied at the VPC endpoint. You should apply the principle of least privilege when defining your own policy. 
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/vpc\/latest\/userguide\/vpc-endpoints-access.html\" target=\"_blank\" rel=\"noopener noreferrer\">Controlling access to services with VPC endpoints<\/a>.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-json\">{\r\n    \"Version\": \"2012-10-17\",\r\n    \"Statement\": [\r\n        {\r\n            \"Action\": [\r\n                \"textract:DetectDocumentText\"\r\n            ],\r\n            \"Resource\": [\r\n                \"*\"\r\n            ],\r\n            \"Effect\": \"Allow\",\r\n            \"Principal\": \"*\"\r\n        }\r\n    ]\r\n}\r\n<\/code><\/pre>\n<\/div>\n<p>Now that you have set up your VPC endpoint, the following section shows you how to access Amazon Textract APIs from within that VPC using AWS PrivateLink.<\/p>\n<h2>Accessing Amazon Textract APIs via AWS PrivateLink<\/h2>\n<p>After you set up the relevant VPC endpoint policies, you have two options for accessing Amazon Textract APIs:<\/p>\n<ul>\n<li>You can use the default Amazon Textract DNS hostname (https:\/\/textract.<span><em>Region<\/em><\/span>.amazonaws.com). Because private DNS is enabled for the endpoint, this hostname resolves to your VPC endpoint.<\/li>\n<\/ul>\n<p>The following code is an example AWS CLI command to run from within the VPC:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws textract detect-document-text --document '{\"S3Object\":{\"Bucket\":\"textract-test-bucket\",\"Name\":\"example-doc.jpg\"}}' --region us-east-2<\/code><\/pre>\n<\/div>\n<ul>\n<li>You can also use the DNS name that was generated when creating the VPC endpoint. These DNS names are in the form of *.textract.us-east-2.vpce.amazonaws.com or *.textract-fips.us-east-2.vpce.amazonaws.com. 
For example: <code>vpce-0f1aa01f0ce676709-il663k5n.textract.us-east-2.vpce.amazonaws.com<\/code>.<\/li>\n<\/ul>\n<p>The following code is an example AWS CLI command to run from within the VPC:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-bash\">aws textract detect-document-text --document '{\"S3Object\":{\"Bucket\":\"textract-test-bucket\",\"Name\":\"example-doc.jpg\"}}' --region us-east-2 --endpoint-url https:\/\/vpce-05e9d346575f9cb38-1wdh6mi2.textract.us-east-2.vpce.amazonaws.com<\/code><\/pre>\n<\/div>\n<h2>Conclusion<\/h2>\n<p>You have now successfully configured a VPC endpoint for Amazon Textract in your AWS account. Traffic to Amazon Textract APIs through that VPC endpoint stays within the AWS network. The VPC endpoint policy you configured also lets you restrict which Amazon Textract APIs are accessible from within that VPC.<\/p>\n<hr>\n<h3>About the Authors<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignleft size-full wp-image-14737\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/11\/raj-copparapu-100.jpg\" alt=\"\" width=\"100\" height=\"135\">Raj Copparapu is a Product Manager focused on putting machine learning in the hands of every developer.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15391 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/31\/thomas-loockx-100.jpg\" alt=\"\" width=\"100\" height=\"138\">Thomas joined Amazon Web Services in 2016, initially working on Application Auto Scaling before moving into his current role at Textract. Before joining AWS, he worked in engineering roles in the domains of computer graphics and networking. 
Thomas holds a master\u2019s degree in engineering from the University of Leuven in Belgium.<\/p>\n<p>\u00a0<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/using-amazon-textract-with-aws-privatelink\/<\/p>\n","protected":false},"author":0,"featured_media":170,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/169"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=169"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/169\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/170"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=169"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=169"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=169"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}