The concept behind Kendra, the enterprise search service that Amazon Web Services made generally available today, is pretty simple.
Hook up connectors to all the data sources relevant to your organisation. Add a FAQ for good measure. Then leave it to the Kendra engine to do the indexing and add some machine learning, AI and natural language processing magic so that when staff ask a question like "Who is our biggest customer in Switzerland?" or, "Where is meeting room 3 in building 32?", they might get something helpful in response.
However, Kendra currently appears to have a limited number of connectors. While the firm noted this morning that the "Amazon Kendra preview offered SharePoint Online, Amazon S3, and databases," it added that "starting today, Kendra will offer connectors for other popular data sources like, Salesforce, Servicenow and OneDrive, with more coming up later this year."
A look at the documentation shows supported data sources include S3 buckets, Amazon RDS MySQL or PostgreSQL, Microsoft OneDrive, Microsoft SharePoint Online, Salesforce sites, and ServiceNow instances.
It is easy to spot big gaps in that list, which is way short even of what was promised at Re:Invent. Where are Microsoft Exchange, Box, Dropbox, Jira, Confluence, Google Drive and so on? Where are NoSQL databases, or databases which are not on AWS? Coming soon, we hope.
As you would expect from AWS, you can also write your own custom connectors, so if the service wins any sort of traction, plenty more should be on the way from third parties. Still, note that Google's equivalent, called Cloud Search, has many more connectors. So does Microsoft Search.
Permission to search and enter
Security and confidentiality are an issue. Let us say you have documents in Microsoft SharePoint Online. In order to add these to Kendra, you need to set up credentials to access SharePoint. However, the authentication to SharePoint described in the docs uses a single SharePoint user. "Enter the username and password for your SharePoint Online account, and then choose Save authentication to save the new secret," the docs explain.
What if you have different permissions for different libraries or folders on SharePoint? How can you ensure that search results are limited to information the current user is allowed to see? A quick look at the docs does not show an answer to this tricky question, though it is easy to control access to Kendra itself using AWS IAM (Identity and Access Management) policies.
We have asked AWS whether we missed something. Google's tool can do this properly by mapping from external identities and limiting results based on user access. It is a key issue, since confidential information can otherwise leak, or the index is less useful because it only includes what everyone is allowed to see.
We did further research into the way Kendra handles SharePoint permissions. It turns out that the search engine does record the Azure AD (or Microsoft AD) permissions on SharePoint content, known as ACLs (Access Control Lists). However, it is down to the developer to build a Kendra-driven search application so that it retrieves the user's Azure AD or Microsoft AD identity and group membership and passes them to the Kendra API, which can then filter out documents with ACLs that don't match the user and groups.
This also means that the account configured for Kendra to use when retrieving documents from SharePoint requires admin privileges. It is all a bit manual for our taste, and we understand that AWS is considering deeper integration with Azure AD, AD and other identity providers in order to make this easier to implement. Note that the AWS IAM-based permissions are only for administering Kendra, not for authenticating users.
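To make the filtering step concrete, here is a minimal sketch of what the application side might look like in Python. It assumes, based on our reading of the AWS documentation, that the Kendra query call accepts an `AttributeFilter` keyed on the reserved `_user_id` and `_group_ids` attributes. The function name and the sample identity values are hypothetical, and fetching the identity and groups from Azure AD is left out entirely.

```python
from typing import Any, Dict, List


def build_user_context_filter(user_id: str, groups: List[str]) -> Dict[str, Any]:
    """Build a filter matching documents whose ACL names the user or one of their groups.

    Assumption: `_user_id` and `_group_ids` are the attribute keys Kendra
    uses for ACL data; check the current AWS docs before relying on this.
    """
    clauses: List[Dict[str, Any]] = [
        {"EqualsTo": {"Key": "_user_id", "Value": {"StringValue": user_id}}}
    ]
    if groups:
        clauses.append(
            {"EqualsTo": {"Key": "_group_ids", "Value": {"StringListValue": groups}}}
        )
    # A document is returned if any one of the clauses matches.
    return {"OrAllFilters": clauses}


if __name__ == "__main__":
    # Hypothetical identity, as the application would obtain it from its
    # identity provider after the user signs in.
    acl_filter = build_user_context_filter("alice@example.com", ["Sales", "Zurich"])
    print(acl_filter)

    # With boto3 installed and AWS credentials configured, the filter would
    # be passed along with the query, roughly like this:
    #
    #   import boto3
    #   kendra = boto3.client("kendra")
    #   kendra.query(
    #       IndexId="<your-index-id>",
    #       QueryText="Who is our biggest customer in Switzerland?",
    #       AttributeFilter=acl_filter,
    #   )
```

The point of the sketch is that nothing in this flow authenticates the user to Kendra: the application is trusted to pass in the right identity, which is why the work falls to the developer.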
Domain optimisation is a Kendra feature in which its AI learns the jargon of specific sectors. According to Amazon's post this morning, there were six domains on offer during preview (IT, Finance, Pharmaceuticals, Insurance, Energy, and Chemicals), to which eight more are now added (Health, HR, Legal, Telecom, Media & Entertainment, Travel & Leisure, Automotive, and News).
Setting up and pricing
You get started with Kendra by creating an index. You are asked to choose between two product editions. The Developer edition supports up to 10,000 documents, 4,000 queries a day, and one availability zone. The Enterprise edition supports 500,000 documents, up to 40,000 queries a day, and runs in three availability zones. It can be further expanded. Cost is $2.50 per hour for the Developer edition (with 750 hours free in the first 30 days), or $7.00 per hour for the Enterprise edition. Over a 30-day month that works out to $1,800 for Developer or $5,040 for Enterprise, plus additional charges for connector usage. We get the high cost for Enterprise, but the pricing seems expensive for developers testing out the service.
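The monthly figures above follow from the hourly rates if you assume an index running around the clock for a 30-day month (720 hours). A quick sketch of the arithmetic; the 30-day assumption and the function name are ours, not AWS's, and connector charges are excluded:

```python
# Hourly rates quoted in the article (US dollars).
HOURLY_RATE = {"developer": 2.50, "enterprise": 7.00}

# Assumption: a 30-day month with the index running 24 hours a day.
HOURS_PER_MONTH = 24 * 30  # 720

# Free Developer edition hours in the first 30 days, per the article.
FREE_DEVELOPER_HOURS = 750


def monthly_cost(edition: str, free_hours: int = 0) -> float:
    """Cost of one index for a 30-day month, before connector charges."""
    billable_hours = max(HOURS_PER_MONTH - free_hours, 0)
    return HOURLY_RATE[edition] * billable_hours


if __name__ == "__main__":
    for edition in HOURLY_RATE:
        print(f"{edition}: ${monthly_cost(edition):,.2f} per month")
    first_month = monthly_cost("developer", FREE_DEVELOPER_HOURS)
    print(f"developer, first month with free hours: ${first_month:,.2f}")
```

On these figures the 750 free hours more than cover a single Developer index for its first 30 days, so the $1,800 bill only starts in the second month.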
AWS has promised that "your Kendra search results get better as your end users use the service. Kendra actively retrains deep learning models built for your data set and employee usage patterns to improve search accuracy."
We await user reports on how well that works, though this is bread and butter for AI so should be of some value.
When AWS unveiled the Kendra preview at the Re:Invent conference in December 2019, CEO Andy Jassy implied that all rival enterprise search products were purely keyword-based and returned "gobbledygook." This is not the case: rivals such as Lucidworks and Coveo also make great play of their AI capabilities, as do public cloud rivals Microsoft and Google.
Enterprise search is expensive but worth getting right, since there is a significant productivity benefit from fast, accurate and relevant results. Although Kendra is out of preview, it looks like AWS is only just getting started on this. ®