Google Cloud's Natural Language API has become a bit more, er, insightful: it can now sort content into 700 different categories, such as Health, Hobbies & Leisure and Law & Government.
This categorisation allows businesses to tailor the info they deliver to customers to match those customers' preferences. Google also added the ability to determine whether a specific entity in text is spoken about in a positive or negative light (previously the API could only find the sentiment in sentences or blocks of text).
While Google says content classification will help publishers such as Hearst "understand what their audience is reading and how their content is being used", it might not be all sunny skies. Researchers warn that it raises privacy concerns.
"If I know the tweets and news and other texts you consume, and I cluster them, then I can very quickly determine your set of interests / sentiments / whatever clustering regime is applied," Eduard Hovy, a natural language processing expert at Carnegie Mellon University in Pittsburgh, Pennsylvania, told The Register.
Tailoring content to preferences could help customers quickly find relevant info, he said, but "on the down side, this creates confirmation bubbles that lead you to believe that the whole world thinks like you do, and that any opposite views you might encounter are just random crazies out there, not the majority."
More generally, he added, a lot can be deduced by clustering user preferences, and that information could be used in unsavoury ways by businesses and governments, or held to ransom by hackers.
And although businesses might argue they're targeting "classes" of people, not individuals, if enough clusters intersect and more data is extracted, "you can sometimes pinpoint an exact home," he said.
"And further on the down side, [collating such data] means a hacker or a government that gets hold of it might know more about you than you’d like, potentially for blackmail, insurance denial, etc.
"The question is: do you trust Google? Or Amazon? Or whoever it is who knows what text/images/video you consume?" he said.
Google has not responded to a request for comment. ®