AI is changing search, for better or for worse
Bing hasn't much benefited from its ML infusion but Google's rivals sense an opening
Feature
Ask Google's Bard chatbot about the future of search and you'll get a summary of trends that suggest there's more to search than finding keywords in an index of documents.
It will mention the advantages of conversational and multimodal input, of personalization and its role in prediction, and of integrations with other services. It will even touch on ethical considerations like privacy, bias, inaccuracy, and disinformation.
But Bard doesn't address economics, a crucial consideration. Google's Q3 2023 revenue for its search business reached $44 billion and rivals would be happy to capture some of that cash.
If AI shakes up the search advertising business, as many believe it is already doing, there will be consequences not just for Google and competing firms, but for all the publishers participating in that ecosystem. News sites have no financial incentive to allow AI-powered search services to crawl and summarize their work if internet users see a summary page and never visit the publishing site or generate ad impressions.
Adding AI to search services will have economic consequences not just for Google and publishers, but for rivals that have to bear the cost of developing and implementing machine learning models. Microsoft's GitHub Copilot is reportedly losing as much as $80 per month per user. Both Microsoft and Google plan to charge a premium of $30 per user per month for AI features in Office 365 and Google Workspace. And developers implementing OpenAI's API have to pay for the privilege.
This is not entirely surprising given that Alphabet chairman John Hennessy reportedly told Reuters that "having an exchange with AI known as a large language model likely cost 10 times more than a standard keyword search, though fine-tuning will help reduce the expense quickly." And this is echoed in a paper in the Cell Press journal Joule titled "The growing energy footprint of artificial intelligence." The paper estimates that a standard Google Search uses 0.3 Wh of electricity and an AI-powered Google Search consumes 3.0 Wh. That 10x difference matters at scale.
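To see how that gap compounds, here is a rough back-of-the-envelope sketch built on the paper's two per-query estimates. The daily query volume is a hypothetical placeholder chosen purely for illustration; it is not a figure from the paper or from Google.

```python
# Per-query energy estimates quoted above, in watt-hours.
STANDARD_SEARCH_WH = 0.3
AI_SEARCH_WH = 3.0

WH_PER_MWH = 1_000_000  # 1 megawatt-hour = 1,000,000 watt-hours


def daily_energy_mwh(queries_per_day: int, wh_per_query: float) -> float:
    """Total daily energy in MWh for a given query volume and per-query cost."""
    return queries_per_day * wh_per_query / WH_PER_MWH


def extra_energy_mwh(queries_per_day: int) -> float:
    """Additional daily MWh if every query took the AI-powered path."""
    ai = daily_energy_mwh(queries_per_day, AI_SEARCH_WH)
    standard = daily_energy_mwh(queries_per_day, STANDARD_SEARCH_WH)
    return ai - standard


if __name__ == "__main__":
    # HYPOTHETICAL volume, for illustration only.
    hypothetical_queries = 1_000_000_000
    standard = daily_energy_mwh(hypothetical_queries, STANDARD_SEARCH_WH)
    ai = daily_energy_mwh(hypothetical_queries, AI_SEARCH_WH)
    print(f"standard search: {standard:.0f} MWh/day")
    print(f"AI search:       {ai:.0f} MWh/day")
    print(f"difference:      {extra_energy_mwh(hypothetical_queries):.0f} MWh/day")
```

Whatever volume you plug in, the ratio stays at 10x; only the absolute difference grows with the number of queries.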
In its enumeration of ethical quandaries, Bard also overlooks the elephant in the room: the unfairness of capturing content without payment or consent and selling it back to people while commoditizing their work. But in that respect, Bard is like all of us who look the other way, who find AI too useful to repudiate as a product of moral compromise.
The hunt for ... better search
Much of the excitement around AI-assisted search has to do with the tech industry's (and the media's) focus on what comes next. Google has been the dominant search power for decades and there's hunger for change, particularly given persistent concerns about declining search quality in recent years – a trend now ironically exacerbated by the proliferation of generative AI content.
In February 2023, Microsoft said it was "reinventing search with a new AI-powered Microsoft Bing and Edge." The announcement set off a flurry of speculation about AI as the successor to Google Search, or at least the catalyst for a regime change.
A year on, AI hasn't helped Bing take market share from Google Search.
Nonetheless, smaller rivals in the search and browser businesses are betting on AI for something – whether that's breaking the dominance of Google Search, securing more of the table scraps not devoured by Google, adding brand-distinguishing features that complement search, or impressing investors.
The Browser Company recently launched Arc Search. It's an iOS app that incorporates a browser, with built-in ad blocking, for searching via the device's default search engine (probably Google) or for turning the query over to its AI model so it can create a summary web page with hopefully relevant details.
When The Register tested it, the Browse for Me summarization option was slow, taking several seconds to build what amounts to a pleasantly readable, ad-free web page with a summary of salient factoids culled from several source web pages. The summary page does include links to these pages but there's no visible way to correlate which cited data point comes from which page.
Brave Software in November introduced a privacy-preserving AI assistant for its Brave browser called Leo (free or $15/month), which was recently integrated with an open source large language model called Mixtral 8x7B.
Brave uses AI for generating a Summarizer section on its search results page, which is sourced from (mostly) its own search index and is supported either by ads or a $3/month Premium subscription fee. It also uses AI for Featured Snippets and descriptions of search results. And it has incorporated an LLM for code-focused queries.
Josep M. Pujol, Brave's chief of search, told The Register in an email that LLMs are not a replacement for search.
"LLM-based search is predicated on having an underlying search engine, either owned-and-operated, or third-party via API," Pujol said. "But you cannot have search without an index (or access to one). LLMs, and the new developments on AI, will have a profound impact on how people interact with search and how results are presented, but there is no replacement, there is composition.
"In other words, LLM models are on top of search, not instead of search."
Pujol insists that running an LLM is nowhere near as costly as search infrastructure.
"We can assure you that running a real search engine is much more expensive than running an AI-model, even at scale," he said. "Proof of that is that there are quite a few companies that use LLMs on top of search (Perplexity, Arc, You, Kagi, etc.). Note that those companies might claim otherwise, but they rely on the search results from third parties.
"There are not many companies with a full-fledged general purpose search engine, namely Microsoft, Google, and Brave."
While timeliness is often cited as a problem for LLMs – which have data from the time they were trained but no later – Pujol contends that's a manageable issue.
"Of course, not all information needs to be encoded inside the models," said Pujol. "LLMs have the ability to incorporate context, which is typically provided by the fresh real-time results coming from a search engine (or any other source of fresh up-to-date data, be it search results, stock quotes, or live-sports events)."
"LLMs cannot be trained or fine-tuned on the fly, but they can incorporate support data at query (inference) time," Pujol added.
Asked what Brave has learned from implementing AI, he added, "There is a thirst for quality data from the whole industry right now for training AI models, and having an independent search engine is a key way to supply this data to third parties. Up to now, Bing was the only game in town (Google does not offer an API, at least not on public access), but it's expensive and changes its API access rules on a whim.
"With our new Brave Search API, we can provide LLMs, developers, and tech companies the data they seek for their AI applications. Brave's aim is to provide an alternative to big-tech; typically we always had the people in mind, but with the release of our search API we can also serve businesses and institutions."
Jan Standal, VP at Opera, told The Register in an email that Opera is planning an iOS version of its browser based on Blink and Chromium, now that Apple's WebKit requirement is to be dropped.
Currently, he said, Opera relies on its own AI backend called Composer that is LLM-agnostic and allows different models, like OpenAI's, to be plugged in.
"Opera was the first browser company to integrate AI in both PC and mobile browsers with its free Aria browser AI service," said Standal. "It features both real-time search capabilities, generative text, and has integrations with the browser through browser prompts. In its current iteration, it's best thought of as an augmentation of users' capabilities when browsing the web.
"In our opinion, Aria doesn't compete with traditional search – it's a complementary service that allows people to ask the browser AI more complex questions. In the future, we will develop it further into a specialized browser AI, providing users with the ability to improve their browsing experiences."
"Our main concern when implementing AI features in search has been to make users more productive while making it clear that AI is a tool meant to enhance human performance, not replace it," said founder Vladimir Prelovac in an email to The Register.
"This is why all such features in Kagi are currently activated on-demand. For example you can ask Kagi to summarize any page in search results, or even all results for that matter. Or you can ask questions about any document appearing in search results."
The Register asked Prelovac how Kagi measures the effectiveness and utilization of its AI answers and he replied that the company doesn't have that data. "Kagi is a privacy respecting search engine and we do not track any user actions, including queries," he said.
Kagi does publish general usage statistics: It has 20,515 paying members who, in the past day, made more than 347,000 queries and started more than 1,200 Kagi Assistant threads using AI.
Asked whether AI will change the search business, Prelovac said it's clear to him that it will.
"AI has made a whole new space of queries that did not exist before, possible," he said. "A simple sounding query like 'Which city has more population Berlin or Rome?' was not a query you could enter into a search engine before, and is now possible, and it is good enough to give a nuanced answer."
"In the (very near) future you will be able to even ask Kagi 'Draw me a chart of civil aviation casualties for each year since 1946'," he said. "This is really making the original Google's mission of 'organizing the world's information' possible and it may so happen that it will not be Google that gets to deliver on it."
Prelovac said that in order for publishers to participate in this AI-oriented search, search engines should always cite original sources, as Kagi does, and should provide a share of search engine profits proportional to the appearance of publisher links. "This would align all incentives and create a positive feedback loop," he said.
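Prelovac did not spell out a mechanism, but a proportional split of the kind he describes might look something like the sketch below. It assumes every link appearance counts equally and that a fixed pool of money is being divided; the function name and inputs are hypothetical.

```python
from collections import Counter


def share_revenue(link_appearances: list[str], revenue_pool: float) -> dict[str, float]:
    """Split a revenue pool among publishers in proportion to link appearances.

    `link_appearances` holds one publisher name per cited link, so a publisher
    cited twice appears twice. ASSUMPTION: every appearance carries equal weight.
    """
    counts = Counter(link_appearances)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing was cited, so there is nothing to distribute
    return {
        publisher: revenue_pool * appearances / total
        for publisher, appearances in counts.items()
    }


if __name__ == "__main__":
    # Made-up publishers and pool size, for illustration only.
    cited = ["publisher_a", "publisher_a", "publisher_b", "publisher_c"]
    for publisher, amount in share_revenue(cited, 100.0).items():
        print(f"{publisher}: {amount:.2f}")
```

A real scheme would have to decide what counts as an appearance (a citation shown, a click, a position on the page), which is where most of the difficulty would lie.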
Prelovac said there are contexts in which search works better than AI and vice versa, and thus expects both to play a role for a long time.
"Most queries are still not well suited for or too slow for AI," he said. "For example 'Starbucks near me' or looking up a movie. Or a common situation: I want to visit The Register and I am not sure what the domain is, I would enter 'the register' to have 'www.theregister.com' be my first result instantly. Comparing this to waiting five seconds for AI to output a wall of text, which may or not contain the link I need, is obviously suboptimal."
But tools like these tend to focus on general consumer usage. In more specialized contexts, the shortcomings of AI models that get mentioned in disclaimers, like inaccuracy and lack of sourcing details, can't be ignored as easily.
In a recent paper titled "Search Still Matters: Information Retrieval in the Era of Generative AI," Dr William Hersh, a professor in the department of medical informatics and clinical epidemiology in the School of Medicine at Oregon Health & Science University, argues that while AI can help information retrieval (search), it's not a replacement.
The paper, he said, has been accepted to the Journal of the American Medical Informatics Association and should appear there any day now.
In an email, Hersh told us, "It is often important when we search to know the source of the information, and what backs up what is claimed in a source, such as a clinical trial."
Some medical questions, he said, "can be answered just fine by AI, but the stakes for getting it right are often higher in medicine and academia."
"When I am looking at information, whether for teaching or clinical application, it is important for me to know who wrote that information and what evidence backs it up," Hersh explained. "As there may be many studies and/or papers on a given topic, I want to see the original sources so I can do my own synthesis and appraisal of what those studies and papers say. It may be helpful to get an AI overview of a body of literature, but there are many instances when we want to have the source information presented so we can make our own determination."
Assistive AI may be helpful for formulating ideas and for assisting in the interpretation of information, said Hersh. But for people making important decisions in clinical or educational settings that are based on specific information, "seeing where the information came from is just as important as some AI synthesis of it."
Hersh's paper observes that since the early days of the web, there have been concerns about information quality. Initially, Google Search helped by ranking pages for relevance, which became a proxy for quality. "Nonetheless, the information quality war has probably been lost, especially with the emergence of social media as well as methods for manipulating the retrieval of disinformation," the paper says.
Asked whether he believes that part of the interest in alternatives to search (Google) has to do with the growing difficulty of search relevance in a polluted information environment, Hersh said, "Yes, very much so; the Internet has been flooded with disinformation, and it can be challenging to partition good from the bad with Google and other search engines. Another reason why source information is so key. The medical literature and its search via PubMed is much better, though it still has some imperfections."
AI is changing the nature of the search business, but it's on us to improve the quality of the information used to build LLMs and search indexes. And doing so will probably require avoiding AI-generated content. ®