{"id":159,"date":"2020-09-01T09:09:02","date_gmt":"2020-09-01T09:09:02","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/01\/getting-started-with-the-amazon-kendra-sharepoint-online-connector\/"},"modified":"2020-09-01T09:09:02","modified_gmt":"2020-09-01T09:09:02","slug":"getting-started-with-the-amazon-kendra-sharepoint-online-connector","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/01\/getting-started-with-the-amazon-kendra-sharepoint-online-connector\/","title":{"rendered":"Getting started with the Amazon Kendra SharePoint Online connector"},"content":{"rendered":"<div id=\"\">\n<p><a href=\"https:\/\/aws.amazon.com\/kendra\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Kendra<\/a> is a highly accurate and easy-to-use enterprise search service powered by machine learning (ML). To help you get started, Amazon Kendra offers data source connectors that make it easy to ingest and index your documents.<\/p>\n<p>This post describes how to use Amazon Kendra\u2019s SharePoint Online connector. To allow the connector to access your SharePoint Online site, you only need to provide the site URL and the credentials of a user with owner rights. These access credentials are stored securely in <a href=\"https:\/\/aws.amazon.com\/secrets-manager\/\">AWS Secrets Manager<\/a>.<\/p>\n<p>Currently, Amazon Kendra has two provisioning editions: the Amazon Kendra Developer Edition for building proofs of concept (POCs) and the Amazon Kendra Enterprise Edition. 
Amazon Kendra connectors work with both editions.<\/p>\n<h2>Prerequisites<\/h2>\n<p>To get started, you need the following:<\/p>\n<ul>\n<li>A SharePoint Online site<\/li>\n<li>A SharePoint Online user with <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/data-source-sharepoint.html\" target=\"_blank\" rel=\"noopener noreferrer\">owner rights<\/a>\n<\/li>\n<\/ul>\n<p>Owner rights are the minimum admin rights needed for the connector to access and ingest documents from your SharePoint site. This follows the AWS principle of granting <a href=\"https:\/\/docs.aws.amazon.com\/IAM\/latest\/UserGuide\/best-practices.html#grant-least-privilege\" target=\"_blank\" rel=\"noopener noreferrer\">least privilege<\/a> access.<\/p>\n<p>The metadata in your SharePoint Online documents must be specifically mapped to Amazon Kendra attributes. This mapping is done in the <strong>Attributes and field mappings <\/strong>section in this post. The SharePoint document title is mapped to the Amazon Kendra system attribute <code>_document_title<\/code>. If you skip the field mapping step, you need to create a new data connector to the SharePoint Online site.<\/p>\n<p>The <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) role for the SharePoint Online data source is not the same as the Amazon Kendra index IAM role. Please read the section <strong>Defining targets: Site URL and data source IAM role <\/strong>carefully. 
It\u2019s important to pay particular attention to the interplay between the SharePoint Online data source\u2019s IAM role and the Secrets Manager secret that contains your SharePoint Online credentials.<\/p>\n<p>For this post, we assume that you already have a SharePoint Online site deployed.<\/p>\n<h2>Setting up a SharePoint Online connector for Amazon Kendra from the console<\/h2>\n<p>The following section describes the process of deploying an Amazon Kendra index and configuring a SharePoint Online connector. If you already have an index, you can skip to the <strong>Configuring the SharePoint Online connector<\/strong> section.<\/p>\n<p>For this use case, our SharePoint Online site contains a collection of AWS whitepapers with custom columns, such as <code>Topics<\/code>. <img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15220 size-full\" title=\"AWS Whitepapers with custom columns\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/1-Topics-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"393\"><\/p>\n<h3>Creating an Amazon Kendra index<\/h3>\n<p>In an Amazon Kendra setup workflow, the first step is to create an index, where you define an <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/security_iam_service-with-iam.html\" target=\"_blank\" rel=\"noopener noreferrer\">IAM role<\/a> and the method you want Amazon Kendra to use for <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/encryption-at-rest.html\">data encryption<\/a>. For this use case, we create a new role.<\/p>\n<p>If you use an existing role, check that it has permission to write to an <a href=\"http:\/\/aws.amazon.com\/cloudwatch\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch<\/a> log. For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/iam-roles.html#iam-roles-index\" target=\"_blank\" rel=\"noopener noreferrer\">IAM roles for indexes<\/a>. 
<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15221 size-full\" title=\"IAM Index Details\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/2-Index-details-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"672\"><\/p>\n<p>Next, you select which provisioning edition to use. For this post, I select the <strong>Developer edition<\/strong>. If you\u2019re new to Amazon Kendra, we recommend creating an Amazon Kendra Developer Edition index because it\u2019s a more cost-efficient way to explore Amazon Kendra. For production environments, we highly recommend using the Enterprise Edition because it allows for more storage capacity and queries per day, and is designed for high availability.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15222 size-full\" title=\"Selecting provisioning edition\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/3-Provisioning-editions.jpg\" alt=\"\" width=\"900\" height=\"465\"><\/p>\n<h3>Configuring the SharePoint Online connector<\/h3>\n<p>After you create your index, you set up the data sources. 
One of the advantages of implementing Amazon Kendra is that you can use a set of prebuilt connectors for data sources such as <a href=\"http:\/\/aws.amazon.com\/s3\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3), <a href=\"http:\/\/aws.amazon.com\/rds\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Relational Database Service<\/a> (Amazon RDS), SharePoint Online, and Salesforce.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15223 size-full\" title=\"Pre-built connectors for data sources\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/4-Select-connector-type.jpg\" alt=\"\" width=\"900\" height=\"354\"><\/p>\n<p>For this use case, we choose <strong>SharePoint Online<\/strong>.<\/p>\n<h4>Assigning a name to the data source<\/h4>\n<p>In the <strong>Define attributes<\/strong> section, you enter a name for the data source and, optionally, a description and tags.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15224 size-full\" title=\"Defining attributes of your data source\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/5-Define-attributes.jpg\" alt=\"\" width=\"900\" height=\"342\"><\/p>\n<h4>Defining targets: Site URL and data source IAM role<\/h4>\n<p>In the <strong>Define targets <\/strong>section, you define the SharePoint Online site URLs where your documents reside and the IAM role that the connector uses to operate. It\u2019s important to remember that this IAM role is different from the one used to create the index. 
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/iam-roles.html#iam-roles-ds\" target=\"_blank\" rel=\"noopener noreferrer\">IAM roles for data sources<\/a>.<\/p>\n<p>If you don\u2019t have an IAM role for this task, you can easily create one by choosing <strong>Create New Role<\/strong>. For this use case, I use a previously created role.<\/p>\n<p>Under the URL text box, you can select <strong>Use change log<\/strong>, which enables the connector to use the SharePoint change log to determine the documents that need to be updated in the index. If your SharePoint change log is too large, your sync process may take longer.<\/p>\n<p>You can also select <strong>Crawl attachments<\/strong>, which allows the crawler to include the attachments associated with items stored in your site.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15225 size-full\" title=\"Selecting crawl attachments\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/6-Define-targets.jpg\" alt=\"\" width=\"900\" height=\"583\"><\/p>\n<p>You can also include or exclude documents by using regular expressions. You can define patterns that Amazon Kendra either uses to exclude certain documents from indexing or include only documents with that pattern. 
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/API_SharePointConfiguration.html\" target=\"_blank\" rel=\"noopener noreferrer\">SharePointConfiguration<\/a>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15226 size-full\" title=\"Using regex to include or exclude folders\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/7-Additional-configuration-optional.jpg\" alt=\"\" width=\"900\" height=\"367\"><\/p>\n<h4>Providing SharePoint Online credentials<\/h4>\n<p>In the <strong>Configure settings<\/strong> section, you enter the credentials of your SharePoint Online user (if you don\u2019t have one, you can create a new user). The credentials you enter are stored in Secrets Manager.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15227 size-full\" title=\"Entering credentials in Configure settings\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/8-Configure-settings.jpg\" alt=\"\" width=\"900\" height=\"654\"><\/p>\n<p>Save the authentication information and set up the sync run schedule, which determines how often Amazon Kendra checks your SharePoint Online site URLs for changes. For this use case, I choose <strong>Run on demand<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15228 size-full\" title=\"Selecting Run on demand\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/9-Set-sync-run-schedule.jpg\" alt=\"\" width=\"900\" height=\"254\"><\/p>\n<h4>Attributes and field mappings<\/h4>\n<p>In this next step, you can create field mappings. Even though this is an optional step, it\u2019s a good idea to add this extra layer of metadata to your documents from SharePoint Online. 
Metadata enables you to improve accuracy through <a href=\"https:\/\/docs.aws.amazon.com\/en_us\/kendra\/latest\/dg\/manual-tuning.html\" target=\"_blank\" rel=\"noopener noreferrer\">manual tuning<\/a>, <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/filtering.html\" target=\"_blank\" rel=\"noopener noreferrer\">filtering<\/a>, and faceting. You can\u2019t add metadata to already ingested documents, so if you want to add metadata later, you need to delete this data source, recreate it with the field mappings, and re-ingest your documents.<\/p>\n<p>The default SharePoint Online metadata fields are <strong>Title<\/strong>, <strong>Created<\/strong>, and <strong>Modified<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15229 size-full\" title=\"Default SharePoint Online metadata fields\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/10-Amazon-Kendra-default-field-mapping.jpg\" alt=\"\" width=\"900\" height=\"281\"><\/p>\n<p>One powerful feature is the ability to create custom field mappings. For example, on my SharePoint Online site, I created a column named <code>Category<\/code>. 
By importing this extra piece of information, we can create filters based on category names.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15261 size-full\" title=\"Creating custom field mappings\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/10A-Filters-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"183\"><\/p>\n<p>To import that extra information, you create a custom field mapping by choosing <strong>Add a new field mapping<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15230 size-full\" title=\"Importing additional information via Add a new field mapping button\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/11-Custom-field-mapping.jpg\" alt=\"\" width=\"900\" height=\"172\"><\/p>\n<p>If you\u2019re combining multiple data sources, you can map this new field to an existing field. For this use case, I have other documents that have the attribute <code>Category<\/code>, so I choose <strong>Option A<\/strong> to map the field to an existing document attribute in my Amazon Kendra index. For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/custom-attributes.html\" target=\"_blank\" rel=\"noopener noreferrer\">Creating custom document attributes<\/a>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15231 size-full\" title=\"Option A - Combining multiple data sources\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/12-Add-new-field-mapping.jpg\" alt=\"\" width=\"900\" height=\"448\"><\/p>\n<p>Also, on my SharePoint site, I have an additional field called <code>Topic<\/code>. 
Because I don\u2019t have that field on my index yet, I select <strong>Option B<\/strong> and enter the data source field name and select the data type (for this use case, <strong>String<\/strong>).<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15232 size-full\" title=\"Option B - Enter data source field name\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/13-Option-B.jpg\" alt=\"\" width=\"900\" height=\"391\"><\/p>\n<p>Field names are case-sensitive, so we need to make sure we match them. Additionally, when a data field on SharePoint is renamed, only the display name changes. This means that if you want to import a data field, you need to refer to the original name. A way to find it is to sort by that column and check the name as listed on the address bar.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15265 size-full\" title=\"Sorting by column and checking the name\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/13A-Name-Category-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"41\"><\/p>\n<p>Let\u2019s check what field is used for sorting:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15294 size-full\" title=\"Checking the URL for which field is used for sorting\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/sortFieldTopic-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"129\"><\/p>\n<h4>Reviewing settings and creating a SharePoint Online data source<\/h4>\n<p>As a last step, you review the settings and create the data source. The <strong>Domain(s) and role<\/strong> section provides additional configuration information.<\/p>\n<p>After you create your SharePoint Online data source, a banner similar to the following screenshot will appear at the top of your screen. 
To start the syncing and document ingestion process, choose <strong>Sync now<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15233 size-full\" title=\"Selecting Sync now\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/14-Success-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"39\"><\/p>\n<p>You see a banner indicating the progress of the data source sync job. After the sync job is finished, you can test your index.<\/p>\n<h3>Testing<\/h3>\n<p>You can test your new index on the Amazon Kendra search console. See the following screenshot.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15234 size-full\" title=\"Testing the index on Amazon Kendra's search console\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/15-AmazonS3-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"454\"><\/p>\n<p>Also, if you configured extra fields as facetable, you can filter your documents by those facets. See the following screenshot.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15236 size-full\" title=\"Filtering your documents \" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/16-What-are-AWS-ML-Services-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"297\"><\/p>\n<h2>Creating an Amazon Kendra index with a SharePoint Online connector with Python<\/h2>\n<p>In addition to the console, you can create a new Amazon Kendra index and SharePoint Online connector, and sync it, by using the <a href=\"https:\/\/aws.amazon.com\/sdk-for-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS SDK for Python (Boto3)<\/a>. 
Boto3 makes it easy to integrate your Python application, library, or script with AWS services, including Amazon Kendra.<\/p>\n<p>My personal preference for testing my Python scripts is to spin up an <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> notebook instance, a fully managed ML <a href=\"http:\/\/aws.amazon.com\/ec2\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Compute Cloud<\/a> (Amazon EC2) instance that runs the Jupyter Notebook app. For instructions, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/gs-setup-working-env.html\" target=\"_blank\" rel=\"noopener noreferrer\">Create an Amazon SageMaker Notebook Instance<\/a>.<\/p>\n<h3>IAM roles requirements and overview<\/h3>\n<p>To create an index using the AWS SDK, you need to have the policy <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/security_iam_id-based-policy-examples.html#security_iam_id-predefined-policies\" target=\"_blank\" rel=\"noopener noreferrer\">AmazonKendraFullAccess<\/a> attached to the role you are using.<\/p>\n<p>At a high level, these are the different roles Amazon Kendra requires:<\/p>\n<ul>\n<li>\n<strong>IAM roles for indexes<\/strong> \u2013 Needed to write to CloudWatch Logs.<\/li>\n<li>\n<strong>IAM roles for data sources<\/strong> \u2013 Needed when you use the <code>CreateDataSource<\/code> method. These roles require a specific set of permissions depending on the connector you use. 
For our use case, it needs permissions to access the following: <\/p>\n<ul>\n<li>Secrets Manager, where the SharePoint Online credentials are stored.<\/li>\n<li>The <a href=\"http:\/\/aws.amazon.com\/kms\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Key Management Service<\/a> (AWS KMS) customer master key (CMK) that Secrets Manager uses to decrypt the credentials.<\/li>\n<li>The <code>BatchPutDocument<\/code> and <code>BatchDeleteDocument<\/code> operations to update the index.<\/li>\n<li>The Amazon S3 bucket that contains the SSL certificate used to communicate with the SharePoint site (we use SSL for this use case).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/iam-roles.html\" target=\"_blank\" rel=\"noopener noreferrer\">IAM access roles for Amazon Kendra<\/a>.<\/p>\n<p>For this method, you need:<\/p>\n<ul>\n<li>An Amazon SageMaker notebook role with permission to create an Amazon Kendra index where you\u2019re using the notebook<\/li>\n<li>An Amazon Kendra IAM role for CloudWatch<\/li>\n<li>An Amazon Kendra IAM role for the SharePoint Online connector<\/li>\n<li>SharePoint Online credentials stored in Secrets Manager<\/li>\n<\/ul>\n<h3>Creating an Amazon Kendra index<\/h3>\n<p>To create an index, you use the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">import boto3\r\nfrom botocore.exceptions import ClientError\r\nimport pprint\r\nimport time\r\n \r\nkendra = boto3.client(\"kendra\")\r\n \r\nprint(\"Creating an index\")\r\n \r\n# Replace the placeholders below with your own values\r\ndescription = \"&lt;YOUR INDEX DESCRIPTION&gt;\"\r\nindex_name = \"&lt;YOUR NEW INDEX NAME&gt;\"\r\nrole_arn = \"&lt;ARN OF YOUR KENDRA ROLE WITH CLOUDWATCH PERMISSIONS&gt;\"\r\n \r\ntry:\r\n    index_response = kendra.create_index(\r\n        Description = description,\r\n        Name = index_name,\r\n        RoleArn = role_arn,\r\n        Edition = \"DEVELOPER_EDITION\",\r\n        Tags=[\r\n        {\r\n      
      'Key': 'Project',\r\n            'Value': 'SharePoint Test'\r\n        } \r\n        ]\r\n    )\r\n \r\n    pprint.pprint(index_response)\r\n \r\n    index_id = index_response['Id']\r\n \r\n    print(\"Wait for Kendra to create the index.\")\r\n \r\n    while True:\r\n        # Get index description\r\n        index_description = kendra.describe_index(\r\n            Id = index_id\r\n        )\r\n        # If status is not CREATING quit\r\n        status = index_description[\"Status\"]\r\n        print(\"    Creating index. Status: \"+status)\r\n        if status != \"CREATING\":\r\n            break\r\n        time.sleep(60)\r\n \r\nexcept ClientError as e:\r\n        print(\"%s\" % e)\r\n \r\nprint(\"Done creating index.\")\r\n<\/code><\/pre>\n<\/div>\n<p>While your index is being created, you get regular updates (every 60 seconds, as set by the <code>time.sleep(60)<\/code> call) until the process is complete. See the following output:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">Creating an index\r\n{'Id': '3311b507-bfef-4e2b-bde9-7c297b1fd13b',\r\n 'ResponseMetadata': {'HTTPHeaders': {'content-length': '45',\r\n                                      'content-type': 'application\/x-amz-json-1.1',\r\n                                      'date': 'Mon, 20 Jul 2020 19:58:19 GMT',\r\n                                      'x-amzn-requestid': 'a148a4fc-7549-467e-b6ec-6f49512c1602'},\r\n                      'HTTPStatusCode': 200,\r\n                      'RequestId': 'a148a4fc-7549-467e-b6ec-6f49512c1602',\r\n                      'RetryAttempts': 2}}\r\nWait for Kendra to create the index.\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. 
Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: CREATING\r\n    Creating index. Status: ACTIVE\r\nDone creating index.\r\n<\/code><\/pre>\n<\/div>\n<p>When your index is ready, the response includes its ID (here, <code>3311b507-bfef-4e2b-bde9-7c297b1fd13b<\/code>). Your index ID will differ from the one in this post.<\/p>\n<h3>Adding attributes to the Amazon Kendra index<\/h3>\n<p>If you have metadata attributes associated with your SharePoint Online documents, you should do the following:<\/p>\n<ol>\n<li>Determine the Amazon Kendra attribute name you want for each of your SharePoint Online metadata attributes. By default, Amazon Kendra has six <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/field-mapping.html\" target=\"_blank\" rel=\"noopener noreferrer\">reserved fields<\/a> (<code>_category<\/code>, <code>_created_at<\/code>, <code>_file_type<\/code>, <code>_last_updated_at<\/code>, <code>_source_uri<\/code>, and <code>_view_count<\/code>).<\/li>\n<li>Update the Amazon Kendra index with the Amazon Kendra attribute names.<\/li>\n<li>Map each SharePoint Online metadata attribute to its corresponding Amazon Kendra attribute.<\/li>\n<\/ol>\n<p>If you have the metadata attribute <code>Topic<\/code> associated with your SharePoint Online document, and you want to use the same attribute name in the Amazon Kendra index, the following code adds the attribute <code>Topic<\/code> to your Amazon Kendra index:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">try:\r\n    update_response = kendra.update_index(\r\n        Id='3311b507-bfef-4e2b-bde9-7c297b1fd13b',\r\n        RoleArn='arn:aws:iam::&lt;YOUR ACCOUNT 
NUMBER&gt;:role\/service-role\/AmazonKendra-us-east-1-KendraRole',\r\n        DocumentMetadataConfigurationUpdates=[\r\n        {\r\n            'Name': 'Topic',\r\n            'Type': 'STRING_VALUE',\r\n            'Search': {\r\n                'Facetable': True,\r\n                'Searchable': True,\r\n                'Displayable': True\r\n            }\r\n        }\r\n    ]\r\n    )\r\n    # Print the response only if the update call succeeded\r\n    pprint.pprint(update_response)\r\nexcept ClientError as e:\r\n    print('%s' % e)\r\n<\/code><\/pre>\n<\/div>\n<p>If everything goes well, we receive a 200 response:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',\r\n                                      'content-type': 'application\/x-amz-json-1.1',\r\n                                      'date': 'Mon, 20 Jul 2020 20:17:07 GMT',\r\n                                      'x-amzn-requestid': '3eba66c9-972b-4757-8d92-37be17c8f8a2'},\r\n                      'HTTPStatusCode': 200,\r\n                      'RequestId': '3eba66c9-972b-4757-8d92-37be17c8f8a2',\r\n                      'RetryAttempts': 0}}\r\n<\/code><\/pre>\n<\/div>\n<h3>Providing the SharePoint Online credentials<\/h3>\n<p>You also need <code>GetSecretValue<\/code> permission for your secret stored in Secrets Manager.<\/p>\n<p>If you need to create a new secret in Secrets Manager to store the SharePoint Online credentials, make sure the role you use has permissions to create and tag a secret. 
See the following policy code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{\r\n    \"Version\": \"2012-10-17\",\r\n    \"Statement\": [\r\n        {\r\n            \"Sid\": \"SecretsManagerWritePolicy\",\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"secretsmanager:UntagResource\",\r\n                \"secretsmanager:CreateSecret\",\r\n                \"secretsmanager:TagResource\"\r\n            ],\r\n            \"Resource\": \"*\"\r\n        }\r\n    ]\r\n}\r\n<\/code><\/pre>\n<\/div>\n<p>To create a secret in Secrets Manager, enter the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">secretsmanager = boto3.client('secretsmanager')\r\n\r\nSecretName = \"&lt;YOUR SECRETNAME&gt;\"\r\n# The secret string must be valid JSON, so keys and values use double quotes\r\nSharePointCredentials = '{\"username\": \"&lt;YOUR SHAREPOINT SITE USERNAME&gt;\", \"password\": \"&lt;YOUR SHAREPOINT SITE PASSWORD&gt;\"}'\r\n\r\ntry:\r\n    create_secret_response = secretsmanager.create_secret(\r\n        Name=SecretName,\r\n        Description='Secret for a Sharepoint data source connector',\r\n        SecretString=SharePointCredentials,\r\n        Tags=[\r\n            {\r\n                'Key': 'Project',\r\n                'Value': 'SharePoint Test'\r\n            }\r\n        ]\r\n    )\r\n    # Print the response only if the secret was created\r\n    pprint.pprint(create_secret_response)\r\nexcept ClientError as e:\r\n    print('%s' % e)\r\n<\/code><\/pre>\n<\/div>\n<p>If everything went well, you get a response with your secret\u2019s ARN:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{'ARN': &lt;YOUR SECRETS ARN&gt;,\r\n 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',\r\n                                      'content-length': '159',\r\n                                      'content-type': 'application\/x-amz-json-1.1',\r\n                                      'date': 'Wed, 22 Jul 2020 16:05:32 GMT',\r\n                                      'x-amzn-requestid': 
'3d0ac6ff-bd32-4d2e-8107-13e49f070de5'},\r\n                      'HTTPStatusCode': 200,\r\n                      'RequestId': '3d0ac6ff-bd32-4d2e-8107-13e49f070de5',\r\n                      'RetryAttempts': 0},\r\n 'VersionId': '7f7633ce-7f6c-4b10-b5b2-2943dd3fd6ee'}\r\n<\/code><\/pre>\n<\/div>\n<h3>Creating the SharePoint Online data source<\/h3>\n<p>Your Amazon Kendra index is up and running, and you have established the index attributes that you want to map to your SharePoint Online documents\u2019 attributes.<\/p>\n<p>You now need an IAM role with <code>kendra:BatchPutDocument<\/code> and <code>kendra:BatchDeleteDocument<\/code> permissions. For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/iam-roles.html#iam-roles-ds-spo\" target=\"_blank\" rel=\"noopener noreferrer\">IAM roles for Microsoft SharePoint Online data sources<\/a>. We use the ARN for this IAM role when invoking the <a href=\"https:\/\/boto3.amazonaws.com\/v1\/documentation\/api\/latest\/reference\/services\/kendra.html#kendra.Client.create_data_source\" target=\"_blank\" rel=\"noopener noreferrer\">CreateDataSource<\/a> API.<\/p>\n<p>Make sure the role you use for your data source connector has a trust relationship with Amazon Kendra. 
See the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{\r\n  \"Version\": \"2012-10-17\",\r\n  \"Statement\": [\r\n    {\r\n      \"Effect\": \"Allow\",\r\n      \"Principal\": {\r\n        \"Service\": \"kendra.amazonaws.com\"\r\n      },\r\n      \"Action\": \"sts:AssumeRole\"\r\n    }\r\n  ]\r\n}\r\n<\/code><\/pre>\n<\/div>\n<p>The following code is the policy structure used:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{\r\n    \"Version\": \"2012-10-17\",\r\n    \"Statement\": [\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"secretsmanager:GetSecretValue\"\r\n            ],\r\n            \"Resource\": [\r\n                \"arn:aws:secretsmanager:region:account ID:secret:secret ID\"\r\n            ]\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"kms:Decrypt\"\r\n            ],\r\n            \"Resource\": [\r\n                \"arn:aws:kms:region:account ID:key\/key ID\"\r\n            ]\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"kendra:BatchPutDocument\",\r\n                \"kendra:BatchDeleteDocument\"\r\n            ],\r\n            \"Resource\": [\r\n                \"arn:aws:kendra:region:account ID:index\/index ID\"\r\n            ],\r\n            \"Condition\": {\r\n                \"StringLike\": {\r\n                    \"kms:ViaService\": [\r\n                        \"kendra.amazonaws.com\"\r\n                    ]\r\n                }\r\n            }\r\n        },\r\n        {\r\n            \"Effect\": \"Allow\",\r\n            \"Action\": [\r\n                \"s3:GetObject\"\r\n            ],\r\n            \"Resource\": [\r\n                \"arn:aws:s3:::bucket name\/*\"\r\n            ]\r\n        }\r\n    
]\r\n}\r\n<\/code><\/pre>\n<\/div>\n<p>The following code is my role\u2019s ARN:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">arn:aws:iam::&lt;YOUR ACCOUNT NUMBER&gt;:role\/Kendra-Datasource<\/code><\/pre>\n<\/div>\n<p>Following the least privilege principle, we only allow our role to put and delete documents in our index and read the secrets to connect to our SharePoint Online site.<\/p>\n<p>When creating a data source, you can specify the sync schedule, which indicates how often your index syncs with the data source we create. This schedule is defined on the <code>Schedule<\/code> key of our request. You can use <a href=\"https:\/\/docs.aws.amazon.com\/AmazonCloudWatch\/latest\/events\/ScheduledEvents.html\" target=\"_blank\" rel=\"noopener noreferrer\">schedule expressions for rules<\/a> to define how often you want to sync your data source. For this use case, the <code>ScheduleExpression<\/code> is <code>'cron(0 11 * * ? *)'<\/code>, which sets the data source to sync every day at 11:00 AM.<\/p>\n<p>I use the following code. Make sure you match your <code>SiteURL<\/code> and <code>SecretARN<\/code>, as well as your <code>IndexID<\/code>. Additionally, <code>FieldMappings<\/code> is where you map between the SharePoint Online attribute name and the Amazon Kendra index attribute name. I use the same attribute name in both, but you can name the Amazon Kendra attribute whatever you\u2019d like.<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">print('Create a data source')\r\n \r\nSecretArn= &lt;YOUR SHAREPOINT ONLINE USER AND PASSWORD SECRETS ARN&gt;\r\nSiteUrl = &lt;YOUR SHAREPOINT SITE URL&gt;\r\nDSName= &lt;YOUR NEW DATA SOURCE NAME&gt;\r\nIndexId= &lt;YOUR INDEX ID&gt;\r\nDSRoleArn= &lt;YOUR DATA SOURCE ROLE&gt;\r\nScheduleExpression='cron(0 11 * * ? 
*)'\r\n\r\ntry:\r\n    datasource_response = kendra.create_data_source(\r\n    Name=DSName,\r\n    IndexId=IndexId,        \r\n    Type='SHAREPOINT',\r\n    Configuration={\r\n        'SharePointConfiguration': {\r\n            'SharePointVersion': 'SHAREPOINT_ONLINE',\r\n            'Urls': [\r\n                SiteUrl\r\n            ],\r\n            'SecretArn': SecretArn,\r\n            'CrawlAttachments': True,\r\n            'UseChangeLog': True,\r\n            'FieldMappings': [\r\n                {\r\n                    'DataSourceFieldName': 'Topic',\r\n                    'IndexFieldName': 'Topic'\r\n                },\r\n            ],\r\n            'DocumentTitleFieldName': 'Title'\r\n        },\r\n    },\r\n    Description='My SharePointOnline Datasource',\r\n    RoleArn=DSRoleArn,\r\n    Schedule=ScheduleExpression,\r\n    Tags=[\r\n        {\r\n            'Key': 'Project',\r\n            'Value': 'SharePoint Test'\r\n        }\r\n    ]\r\n    )\r\n    pprint.pprint(datasource_response)\r\n    print('Waiting for Kendra to create the DataSource.')\r\n    datasource_id = datasource_response['Id']\r\n    while True:\r\n        # Get data source description\r\n        datasource_description = kendra.describe_data_source(\r\n            Id=datasource_id,\r\n            IndexId=IndexId\r\n        )\r\n        # If status is not CREATING quit\r\n        status = datasource_description[\"Status\"]\r\n        print(\"    Creating data source. 
Status: \"+status)\r\n        if status != \"CREATING\":\r\n            break\r\n        time.sleep(60)    \r\n\r\nexcept ClientError as e:\r\n    print('%s' % e)\r\n<\/code><\/pre>\n<\/div>\n<p>At this point, you should receive a 200 response:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">Create a data source\r\n{'Id': '527ac6f7-5f3c-46ec-b2cd-43980c714bf7',\r\n 'ResponseMetadata': {'HTTPHeaders': {'content-length': '45',\r\n                                      'content-type': 'application\/x-amz-json-1.1',\r\n                                      'date': 'Mon, 20 Jul 2020 15:26:13 GMT',\r\n                                      'x-amzn-requestid': '30480044-0a86-446c-aadc-f64acb4b3a86'},\r\n                      'HTTPStatusCode': 200,\r\n                      'RequestId': '30480044-0a86-446c-aadc-f64acb4b3a86',\r\n                      'RetryAttempts': 0}}\r\n<\/code><\/pre>\n<\/div>\n<h3>Syncing the data source<\/h3>\n<p>Even though you defined a schedule for syncing the data source, you can sync on demand by using <a href=\"https:\/\/boto3.amazonaws.com\/v1\/documentation\/api\/latest\/reference\/services\/kendra.html#kendra.Client.start_data_source_sync_job\" target=\"_blank\" rel=\"noopener noreferrer\">start_data_source_sync_job<\/a>:<\/p>\n<pre><code class=\"lang-python\">DSId=&lt;YOUR DATA SOURCE ID&gt;\r\nIndexId=&lt;YOUR INDEX ID&gt;\r\n \r\ntry:\r\n    ds_sync_response = kendra.start_data_source_sync_job(\r\n        Id=DSId,\r\n        IndexId=IndexId\r\n    )\r\n    # Print the response only if the call succeeded\r\n    pprint.pprint(ds_sync_response)\r\nexcept ClientError as e:\r\n    print('%s' % e)\r\n<\/code><\/pre>\n<p>The response should look like the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{'ExecutionId': '6574acd6-e66f-4797-85cf-278dce9256b4',\r\n 'ResponseMetadata': {'HTTPHeaders': {'content-length': '54',\r\n                                      
'content-type': 'application\/x-amz-json-1.1',\r\n                                      'date': 'Mon, 20 Jul 2020 15:54:24 GMT',\r\n                                      'x-amzn-requestid': '415547b2-d095-4501-b6ad-eba4b731d109'},\r\n                      'HTTPStatusCode': 200,\r\n                      'RequestId': '415547b2-d095-4501-b6ad-eba4b731d109',\r\n                      'RetryAttempts': 0}}\r\n<\/code><\/pre>\n<\/div>\n<h3>Testing<\/h3>\n<p>Finally, you can query your index. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">response = kendra.query(\r\nIndexId='3311b507-bfef-4e2b-bde9-7c297b1fd13b',\r\nQueryText='Is there a service that has 11 9s of durability?')\r\nif response['TotalNumberOfResults'] &gt; 0:\r\n    print(response['ResultItems'][0]['DocumentExcerpt']['Text'])\r\n    print(\"More information: \"+response['ResultItems'][0]['DocumentURI'])\r\nelse:\r\n    print('No results found, please try a different search term.')\r\n<\/code><\/pre>\n<\/div>\n<p>You will get a result like the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">Amazon S3 has a data durability of 11 nines. \r\nFor transactional data storage, customers have the option to take advantage of the fully \r\nmanaged Amazon Relational Database Service (Amazon RDS) that supports Amazon \r\nAurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server with high \r\nMore information: https:\/\/juansdomain.sharepoint.com\/sites\/AWSWhitePapers\/Shared%20Documents\/real-time_communication_aws.pdf<\/code><\/pre>\n<\/div>\n<h2>Common errors<\/h2>\n<p>Each of the errors noted in this section can occur if you\u2019re using the Amazon Kendra console or the Amazon Kendra API.<\/p>\n<p>You should look at the CloudWatch logs and error messages returned on the Amazon Kendra console or via the Amazon Kendra API. 
The CloudWatch logs help you determine the reason for a particular error, whether you are experiencing it using the console or programmatically.<\/p>\n<p>Common errors when trying to access SharePoint Online as a data source are:<\/p>\n<ul>\n<li>Secrets Manager errors<\/li>\n<li>SharePoint credential errors<\/li>\n<li>IAM role errors<\/li>\n<li>URL errors<\/li>\n<\/ul>\n<p>In the following sections, we provide more details on how to address each error.<\/p>\n<h3>Secrets Manager errors<\/h3>\n<p>You might get an error message from Secrets Manager stating that your role doesn\u2019t have permission to retrieve the secret value. This can occur when you create a new secret and don\u2019t add read permissions to the <a href=\"https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/iam-roles.html#iam-roles-ds-spo\" target=\"_blank\" rel=\"noopener noreferrer\">data source role<\/a>.<\/p>\n<p>Here\u2019s an example of the error message:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">Create a DataSource\r\n('An error occurred (ValidationException) when calling the CreateDataSource '\r\n 'operation: Secrets Manager throws the exception: User: '\r\n 'arn:aws:sts::&lt;YOUR ACCOUNT NUMBER&gt;:assumed-role\/Kendra-Datasource\/DataSourceConfigurationValidator '\r\n 'is not authorized to perform: secretsmanager:GetSecretValue on resource: '\r\n &lt;YOUR SECRET ARN&gt; '(Service: AWSSecretsManager; Status Code: 400; Error Code: '\r\n 'AccessDeniedException; Request ID: 886ff6ac-f8f3-46b0-94dc-8286fd1682c1; '\r\n 'Proxy: null)')<\/code><\/pre>\n<\/div>\n<p>To address this, make sure that your data source role has a policy attached that grants the <code>secretsmanager:GetSecretValue<\/code> permission on the secret.<\/p>\n<p>If you\u2019re troubleshooting on the console, complete the following steps:<\/p>\n<ol>\n<li>On the Secrets Manager console, copy the secret <a 
href=\"https:\/\/docs.aws.amazon.com\/general\/latest\/gr\/aws-arns-and-namespaces.html\" target=\"_blank\" rel=\"noopener noreferrer\">ARN<\/a>.<\/li>\n<\/ol>\n<p>The secret ARN is listed in the <strong>Secret details<\/strong> section. See the following screenshot.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15237 size-full\" title=\"Secret ARN list in the Secret details section\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/17-Sharepoint-Documents-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"359\"><\/p>\n<ol start=\"2\">\n<li>On the IAM console, choose <strong>Roles<\/strong>.<\/li>\n<li>Search for the role associated with Amazon Kendra.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15238 size-full\" title=\"Searching for the role associated with Kendra\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/18-Kendra-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"178\"><\/p>\n<ol start=\"4\">\n<li>Choose the role that you assigned to the data source.<\/li>\n<li>Choose <strong>Add inline policy<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15235 size-full\" title=\"Adding inline policy\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/16-Add-inline-policy.jpg\" alt=\"\" width=\"900\" height=\"152\"><\/p>\n<ol start=\"6\">\n<li>For <strong>Select Service<\/strong>, choose <strong>Secrets Manager<\/strong>.<\/li>\n<li>On the visual editor, on the <strong>Access Level<\/strong>, choose <strong>Read<\/strong>.<\/li>\n<li>Choose <strong>GetSecretValue<\/strong>.<\/li>\n<li>Under <strong>Resources<\/strong>, select <strong>Specific<\/strong>.<\/li>\n<li>Choose <strong>Add ARN<\/strong>.<\/li>\n<li>For <strong>Specify ARN for secret<\/strong>, enter the secret ARN you 
copied.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15239 size-full\" title=\"Entering the secret ARN previously copied\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/19-Add-ARNs-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"663\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15240 size-full\" title=\"Secrets manager\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/20-Secrets-Manager.jpg\" alt=\"\" width=\"900\" height=\"278\"><\/p>\n<ol start=\"12\">\n<li>Review and choose <strong>Create Policy<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15241 size-full\" title=\"Completing the setup process in Kendra's data source setup\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/21-Review-policy-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"271\"><\/p>\n<p>You can now go back to your Amazon Kendra data source setup and finish the process.<\/p>\n<h3>SharePoint credential errors<\/h3>\n<p>Another common issue can be caused by a failure to crawl the site. On the sync details, the error message may say something about invalid URLs. 
To dive deeper into the issue, select the error message.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15242 size-full\" title=\"Selecting the error message\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/22-SharePoint-credential-errors-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"73\"><\/p>\n<p>This takes you to the CloudWatch console, where you can enter a query on the latest logs and choose <strong>Run Query<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15243 size-full\" title=\"Running a query on the latest logs\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/23-Run-Query-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"130\"><\/p>\n<p>The results appear on the <strong>Logs<\/strong> tab.<\/p>\n<p>You can see three records matching the <code>logStream<\/code> generated by the data source sync job.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15244 size-full\" title=\"Query records\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/24-LogStream.jpg\" alt=\"\" width=\"900\" height=\"88\"><\/p>\n<p>For the first document, the error message is \u201cThe URLs specified in the data source configuration aren\u2019t valid. The URLs should be either a SharePoint site or list. Check the URLs and try the request again.\u201d<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15245 size-full\" title=\"Checking the URLs\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/25-Check-the-URLs-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"139\"><\/p>\n<p>However, it\u2019s interesting to notice that this is the last generated message. 
Let\u2019s see what Document #2 shows us:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15246 size-full\" title=\"Reviewing Document #2\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/26-Document2-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"178\"><\/p>\n<p>The error about an invalid URL in the data source configuration can actually be triggered by an underlying authentication problem.<\/p>\n<p>The easiest way to address this issue is to generate new credentials for the Amazon Kendra crawler.<\/p>\n<ol>\n<li>To set up a user for the crawler to run as, log in to your SharePoint Online account and open the Microsoft 365 Admin page.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15247 size-full\" title=\"Microsoft 365 Admin page\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/27-Microsoft365-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"956\"><\/p>\n<ol start=\"2\">\n<li>In the <strong>User management<\/strong> section, choose <strong>Add user<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15248 size-full\" title=\"Adding user in User management page\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/28-User-Management.jpg\" alt=\"\" width=\"900\" height=\"788\"><\/p>\n<ol start=\"3\">\n<li>Fill in the form with the details for the crawler.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15249 size-full\" title=\"Filling in details for the crawler\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/29-Set-up-the-Basics-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"990\"><\/p>\n<p>For this use case, you don\u2019t need to assign a license for this user.<\/p>\n<p><img 
decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15250 size-full\" title=\"Assigning product licenses\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/30-Assign-product-licenses-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"766\"><\/p>\n<ol start=\"4\">\n<li>Set it up as a user without admin center access.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15251 size-full\" title=\"Setting up a user without admin center access\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/31-Optional-Settings.jpg\" alt=\"\" width=\"900\" height=\"609\"><\/p>\n<ol start=\"5\">\n<li>After you create the user, record the generated password because you need to modify it later.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15252 size-full\" title=\"Recording the generated password for future reference\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/32-Kendra-Crawler-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"517\"><\/p>\n<ol start=\"6\">\n<li>We can now go back to our site and choose the members icon on the top right of the screen.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15275 size-full\" title=\"Members icon\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/33-Not-following.jpg\" alt=\"\" width=\"167\" height=\"80\"><\/p>\n<ol start=\"7\">\n<li>To add a member, choose <strong>Add members<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15276 size-full\" title=\"Adding members\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/34-Group-membership.jpg\" alt=\"\" width=\"236\" height=\"113\"><\/p>\n<ol start=\"8\">\n<li>Add 
the new user you just created and choose <strong>Save<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15277 size-full\" title=\"Adding the new user\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/35-Add-membership-1.jpg\" alt=\"\" width=\"328\" height=\"225\"><\/p>\n<ol start=\"9\">\n<li>From the drop-down menu under the new user\u2019s name, choose <strong>Owner<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15278 size-full\" title=\"Selecting owner\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/36-Kendra-Crawler-Member-1.jpg\" alt=\"\" width=\"284\" height=\"174\"><\/p>\n<h3>IAM role issues<\/h3>\n<p>Another common issue is caused by lack of permissions for the IAM role used to crawl your data source.<\/p>\n<p>You can identify this issue on the CloudWatch logs. See the following code:<\/p>\n<div class=\"hide-language\">\n<pre class=\"unlimited-height-code\"><code class=\"lang-python\">{\r\n    \"CrawlStatus\": \"ERROR\",\r\n    \"ErrorCode\": \"InvalidRequest\",\r\n    \"ErrorMessage\": \"Amazon Kendra can't run the BatchDeleteDocument action with the \r\n                     specified role. 
Make sure that the role grants \r\n                     the kendra:BatchDeleteDocument permission.\"\r\n}\r\n<\/code><\/pre>\n<\/div>\n<p>The permissions needed for this task are <code>BatchPutDocument<\/code> and <code>BatchDeleteDocument<\/code>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15257 size-full\" title=\"Checking permissions \" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/37-Specify-the-actions.jpg\" alt=\"\" width=\"900\" height=\"432\"><\/p>\n<p>Make sure that the resource matches your index ID (you can find your index ID on the index details page on the console).<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15258 size-full\" title=\"Checking that resources match your index ID\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/38-Add-ARNs-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"667\"><\/p>\n<h3>Wrong SharePoint site URL<\/h3>\n<p>You may experience an error stating you need to provide a sharepoint.com URL. Make sure your site URL is under sharepoint.com.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15259 size-full\" title=\"Ensuring that the site is under sharepoint.com\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/39-Invalid-SharePoint.jpg\" alt=\"\" width=\"900\" height=\"118\"><\/p>\n<h3>Conclusion<\/h3>\n<p>You have now learned how to ingest the documents from your SharePoint Online site into your Amazon Kendra Index, either through the console or programmatically. In this example case, you have loaded some AWS Whitepapers into your index. 
You are now able to run some queries such as \u201cWhat AWS service has 11 nines of durability?\u201d<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15260 size-full\" title=\"Testing the Kendra setup\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/40-AmazonS3-Screenshot.jpg\" alt=\"\" width=\"900\" height=\"435\"><\/p>\n<p>Finally, don\u2019t forget to check the <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/category\/artificial-intelligence\/amazon-kendra\/\" target=\"_blank\" rel=\"noopener noreferrer\">other blog posts about Amazon Kendra<\/a>!<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15308 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/Juan-Pablo-Bustos.jpg\" alt=\"\" width=\"101\" height=\"149\">Juan Pablo Bustos is an AI Services Specialist Solutions Architect at Amazon Web Services, based in Dallas, TX. Outside of work, he loves spending time writing and playing music as well as trying random restaurants with his family.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15317 alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/David-Shute.jpg\" alt=\"\" width=\"101\" height=\"136\"><\/p>\n<p>David Shute is a Senior ML GTM Specialist at Amazon Web Services focused on Amazon Kendra. 
When not working, he enjoys hiking and walking on a beach.<\/p>\n<p>\u00a0<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/getting-started-with-the-amazon-kendra-sharepoint-online-connector\/<\/p>\n","protected":false},"author":0,"featured_media":160,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/159"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=159"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/159\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/160"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}