Getting started with the Amazon Kendra Box connector
Amazon Kendra is a highly accurate and easy-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.
For many organizations, Box Content Cloud is a core part of their content storage and lifecycle management strategy. An enterprise Box account often contains a treasure trove of assets, such as documents, presentations, knowledge articles, and more. Now, with the new Amazon Kendra data source connector for Box, these assets and any associated tasks or comments can be indexed by Amazon Kendra’s intelligent search service to reveal content and unlock answers in response to users’ queries.
In this post, we show you how to set up the new Amazon Kendra Box connector to selectively index content from your Box Enterprise repository.
Solution overview
The solution consists of the following high-level steps:
- Create a Box app for Amazon Kendra via the Box Developer Console.
- Add sample documents to your Box account.
- Create a Box data source via the Amazon Kendra console.
- Index the sample documents from the Box account.
Prerequisites
To try out the Amazon Kendra connector for Box, you need the following:
- An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies.
- Basic knowledge of AWS and working knowledge of Box Enterprise administration.
- Admin access to a Box Enterprise workspace.
Create a Box app for Amazon Kendra
Before you configure an Amazon Kendra Box data source connector, you must first create a Box app.
- Log in to the Box Enterprise Developer Console.
- Choose Create New App.
- Choose Custom App.
- Choose Server Authentication (with JWT).
- Enter a name for your app, for example, KendraConnector.
- Choose Create App.
- In your created app in My Apps, choose the Configuration tab.
- In the App Access Level section, choose App + Enterprise Access.
- In the Application Scopes section, check that the following permissions are enabled:
- In the Advanced Features section, select Make API calls using the as-user header.
- In the Add and Manage Public Keys section, choose Generate a Public/Private Keypair.
This requires two-step verification. A JSON text file is downloaded to your computer.
- Choose OK to accept this download.
- Choose Save Changes.
- On the Authorization tab, choose Review and Submit.
- Select Submit app within this enterprise and choose Submit.
Your Box Enterprise owner needs to approve the app before you can use it.
Go to the downloads directory on your computer to review the downloaded JSON file. It contains the client ID, client secret, public key ID, private key, pass phrase, and enterprise ID. You need these values to create the Box data source in a later step.
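If you prefer to read those values with a script rather than by hand, the following is a minimal Python sketch. It assumes the downloaded file uses the layout Box typically generates for JWT apps (a boxAppSettings object with a nested appAuth object, plus a top-level enterpriseID); check this against your own file before relying on it. The function names are hypothetical and are not part of any Box or AWS library.

```python
import json


def extract_box_credentials(config: dict) -> dict:
    """Flatten a Box JWT app config into the six values listed above.

    Assumption: clientID and clientSecret sit under "boxAppSettings",
    and the key material sits under "boxAppSettings" -> "appAuth".
    """
    app = config["boxAppSettings"]
    auth = app["appAuth"]
    return {
        "clientID": app["clientID"],
        "clientSecret": app["clientSecret"],
        "publicKeyID": auth["publicKeyID"],
        "privateKey": auth["privateKey"],
        "passphrase": auth["passphrase"],
        "enterpriseID": config["enterpriseID"],
    }


def load_box_credentials(path: str) -> dict:
    """Read the downloaded JSON file from disk and flatten it."""
    with open(path, encoding="utf-8") as f:
        return extract_box_credentials(json.load(f))
```

A KeyError from either function suggests that your file is laid out differently from the assumption above. Treat the file as a secret, because it contains the private key and pass phrase.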
Add sample documents to your Box account
In this step, you upload sample documents to your Box account. Later, we use the Amazon Kendra Box data source to crawl and index these documents.
- Download AWS_Whitepapers.zip to your computer.
- Extract the files to a folder called AWS_Whitepapers.
- Upload the AWS_Whitepapers folder to your Box account.
Create a Box data source
To add a data source to your Amazon Kendra index using the Box connector, you can use an existing Amazon Kendra index, or create a new Amazon Kendra index. Then complete the following steps to create a Box data source:
- On the Amazon Kendra console, choose Indexes in the navigation pane.
- From the list of indexes, choose the index that you want to add the data source to.
- Choose Add data sources.
- From the list of data source connectors, choose Add connector under Box.
- On the Specify data source details page, enter a data source name and optional description.
- Choose Next.
- Open the JSON file you downloaded from the Box Developer Console.
It contains values for clientID, clientSecret, publicKeyID, privateKey, passphrase, and enterpriseID.
- On the Define access and security page, in the Source section, for Box enterprise ID, enter the value of the enterpriseID field.
- In the Authentication section, under AWS Secrets Manager secret, choose Create and add a new secret.
- For Secret name, enter a name for the secret, for example, boxsecret1.
- For the remaining fields, enter the corresponding values from the downloaded JSON file.
- Choose Save and add secret.
- In the IAM role section, choose Create a new role (Recommended) and enter a role name, for example, box-role.
For more information on the required permissions to include in the IAM role, see IAM roles for data sources.
- Choose Next.
- On the Configure sync settings page, in the Sync scope section, you can include Box web links, comments, and tasks in your index, in addition to file contents. Use the default setting (unchecked) for this post.
- For Additional configuration (change log) – optional, use the default setting (unchecked).
- For Additional configuration (regex patterns) – optional, choose Include patterns.
- For Type, choose Path.
- For Path – optional, enter AWS_Whitepapers/, the path to the sample documents you uploaded earlier.
- Choose Add.
- In the Sync run schedule section, choose Run on demand.
- Choose Next.
- On the Set fields mapping page, you can define how the data source maps attributes from Box objects to your index. Use the default settings for this post.
- Choose Next.
- On the Review and create page, review the details of your Box data source.
- To make changes, choose the Edit button next to the item that you want to change.
- When you’re done, choose Add data source to add your Box data source.
After you choose Add data source, Amazon Kendra starts creating the data source. It can take several minutes for the data source to be created. When it’s complete, the status of the data source changes from Creating to Active.
Index sample documents from the Box account
You configured the data source sync run schedule to run on demand, so you need to start it manually.
The current sync state changes to Syncing – crawling, then to Syncing – indexing.
After about 10 minutes, the current sync state changes to idle, the last sync status changes to Successful, and the Sync run history panel shows more details, including the number of documents added.
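If you would rather start and monitor the sync from a script, the following Python sketch shows one way to do it with the Amazon Kendra StartDataSourceSyncJob and ListDataSourceSyncJobs APIs. The kendra argument is assumed to be a boto3 Kendra client (or any object with the same two methods). The function name, polling interval, and retry limit are hypothetical choices for illustration.

```python
import time

# Job statuses that mean the sync is still running.
IN_PROGRESS = {"SYNCING", "SYNCING_INDEXING", "STOPPING"}


def run_sync_and_wait(kendra, index_id, data_source_id,
                      poll_seconds=30, max_polls=60):
    """Start an on-demand sync and poll until it leaves the in-progress states.

    Returns the matching entry from the sync job history. Raises
    TimeoutError if the job is still running after max_polls checks.
    """
    execution_id = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]

    for _ in range(max_polls):
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )["History"]
        # Find the job we started; it may not appear in the history right away.
        job = next(
            (j for j in history if j.get("ExecutionId") == execution_id), None
        )
        if job is not None and job["Status"] not in IN_PROGRESS:
            return job
        time.sleep(poll_seconds)

    raise TimeoutError(f"Sync job {execution_id} did not finish in time")
```

A returned status of SUCCEEDED corresponds to the Successful status in the console. The returned entry may also carry a Metrics field with document counts similar to those in the Sync run history panel.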
Test the solution
Now that you have ingested the AWS whitepapers from your Box account into your Amazon Kendra index, you can test some queries.
- On the Amazon Kendra console, choose Search indexed content in the navigation pane.
- In the query field, enter a test query, such as: What databases are offered by AWS?
You can try your own queries too.
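The same test can be run outside the console with the Amazon Kendra Query API. The sketch below assumes kendra is a boto3 Kendra client and summarizes the first few results. The function name and the shape of the returned dictionaries are hypothetical, chosen only for readability.

```python
def search_index(kendra, index_id, query_text, max_results=3):
    """Run a query and return a compact summary of the top results."""
    response = kendra.query(IndexId=index_id, QueryText=query_text)
    results = []
    for item in response.get("ResultItems", [])[:max_results]:
        results.append({
            # Result type as reported by the API, for example ANSWER or DOCUMENT.
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return results
```

For example, calling search_index with your index ID and the query "What databases are offered by AWS?" returns up to three summarized results, which you can compare with what the console shows.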
Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Box account.
Clean up
To avoid incurring future costs, clean up the resources you created as part of this solution.
- If you created a new Amazon Kendra index while testing this solution, delete it.
- If you added a new data source using the Amazon Kendra connector for Box, delete that data source.
- Delete the AWS_Whitepapers folder and its contents from your Box account.
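The two AWS cleanup steps can be sketched in Python as follows, assuming kendra is a boto3 Kendra client. The function name and its arguments are hypothetical. The sketch deletes either the whole index or only the data source, mirroring the two cases above, and it does not touch your Box account.

```python
def clean_up(kendra, index_id, data_source_id=None, delete_index=False):
    """Delete the index, or only the Box data source, and report what was removed.

    Set delete_index=True only if you created the index for this walkthrough.
    Assumption: deleting the index makes a separate data source delete unnecessary.
    """
    deleted = []
    if delete_index:
        kendra.delete_index(Id=index_id)
        deleted.append(("index", index_id))
    elif data_source_id is not None:
        kendra.delete_data_source(Id=data_source_id, IndexId=index_id)
        deleted.append(("data_source", data_source_id))
    return deleted
```

Both deletions are irreversible, so double-check the IDs before you run this against an index that other teams use. If you created a secret or an IAM role for the connector, you may want to remove those separately as well.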
Conclusion
With the Amazon Kendra Box connector, organizations can make invaluable information trapped in their Box accounts available to their users securely using intelligent search powered by Amazon Kendra.
In this post, we introduced you to the basics, but there are many additional features that we didn’t cover. For example:
- You can enable user-based access control for your Amazon Kendra index, and restrict access to Box documents based on the access controls you have already configured in Box.
- You can index additional Box object types, such as tasks, comments, and web links.
- You can map Box object attributes to Amazon Kendra index attributes, and enable them for faceting, search, and display in the search results.
- You can integrate the Box data source with the Custom Document Enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion.
To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide.
About the Authors
Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.