Grafana Labs updates observability line-up with query-less visualization
CTO Tom Wilkie gives an optimistic take on AI without climbing on the bandwagon
Grafana Labs showed off a new release of its eponymous visualization platform and an updated version of Loki, and introduced Alloy, its distribution of the OpenTelemetry Collector, at its GrafanaCON event in Amsterdam this week. We spoke to the company's CTO, Tom Wilkie, about the updates and where the tech industry darling of the moment – AI – fits into everything.
Wilkie joined Grafana Labs as part of the Kausal acquisition in 2018. He became CTO in 2023 and was present when the company infamously shifted its license from Apache 2.0 to the Affero General Public License (AGPL) v3.
According to Wilkie, Grafana Labs has not been impacted by the change. He says: "We track very carefully the size of the communities, the growth of our community, the engagement of our community.
"And that's why when we say we haven't seen the impact, we haven't. The community is still growing and engaging in the same way it was before the change."
The company unveiled Grafana 11 at GrafanaCON, replete with improved visualizations, new data sources including PagerDuty and Sumo Logic, integration with Tempo for tracing, and simpler alerting.
However, the most notable improvement is the introduction of Explore Metrics, which assists in wading through the reams of data spat out by the observability platform. Where knowledge of PromQL – a functional query language for the Prometheus monitoring system – has been a prerequisite for finding the proverbial needle in the data haystack, Grafana Labs reckons that Explore Metrics (and the similar Explore Logs) will make things query-less. Almost.
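For those who have never wrangled it, the snippet below is a minimal sketch of the sort of thing a dashboard author has traditionally had to write by hand: a PromQL expression asking for the per-second request rate broken out by status, fired at Prometheus' standard /api/v1/query HTTP endpoint. The metric name, job label, and localhost address are illustrative assumptions rather than anything from Grafana's own setup.

```go
// A minimal sketch (not from Grafana's codebase) of the kind of PromQL a user
// has traditionally needed to know, issued against Prometheus' HTTP query API.
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Example PromQL a dashboard author would typically hand-write:
	// per-second request rate over 5m, summed by status label.
	promQL := `sum by (status) (rate(http_requests_total{job="api"}[5m]))`

	// Prometheus exposes instant queries at /api/v1/query.
	endpoint := "http://localhost:9090/api/v1/query?query=" + url.QueryEscape(promQL)

	resp, err := http.Get(endpoint)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON result vector, one sample per status label
}
```

Explore Metrics aims to let users reach the same kind of answer by clicking through metrics and labels rather than composing expressions like this.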
Grafana already has a query builder for PromQL and announced generative AI features at ObservabilityCON 2023 to simplify query writing, but as Wilkie says, this only goes so far.
Wilkie says: "Its utility is minimal. If I'm honest, if you don't know what the query is supposed to do, it just generates rubbish queries."
To be fair to Grafana, that has been our experience with most natural language query tools that use generative AI. By the time a query has been specified precisely enough that garbage won't be spat out, the user might just as well have written it themselves.
With Explore Metrics, the intention is to move away from the traditional query builder model. Wilkie explained how Grafana had wrestled with the challenge of creating something that could dig into a sea of metrics without requiring technical knowledge of PromQL.
He says: "I remember the Ah Ha! moment for me was when we purchased Pyroscope, the profiling company. We had an infusion of fresh blood, and they were showing us some of their UX paradigms for navigating profiles, and that was a lot of the inspiration for this."
Certainly, Explore Metrics is a refreshing change from the query builders of old, although Wilkie expects the service to follow the 80-20 rule: it'll deal with 80 percent of use cases, with the remaining 20 percent still requiring a bit more technical skill from the user.
Alongside Grafana 11, Loki – the log aggregation system – has hit version 3.0 and introduced Bloom filters as an answer to a criticism often leveled at the system: there is not a lot in the way of indexing, which has made searching and querying tricky for non-developers.
Wilkie says: "Loki was always designed to be: 'Home in on the thing you want, and then show me the logs' ... I think that whole experience really resonated with developers.
"But, at the end of the day – and I think this was a really hard-learned lesson for us – that use case won the hearts and minds of developers, but log aggregation systems are used by more than just developers in large organizations.
"We have a very large retail customer in Europe using Loki for all of their centralized log aggregation. And it turns out that their last line of support team is basically going into Loki and searching over petabytes and petabytes of data. And they don't know how to narrow that search space down.
"And so they're like: 'Tell me every logline with this Order ID over the last 30 days,' over 30 petabytes or more of data. That's kind of a pathological use case in Loki. That's slow, right?"
Bloom filters – probabilistic data structures that are far less resource-hungry than traditional indexing – speed things along considerably, according to Wilkie: "Queries that used to take minutes now take seconds."
There is a cost to the addition. Wilkie admits the change slightly waters down Loki's no-indexing philosophy, but says it adds only around one percent to the cost of log ingestion.
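To give a flavor of why that helps: a Bloom filter can cheaply answer "this chunk of data definitely does not contain that Order ID," so most chunks never need to be scanned at all, at the price of an occasional false positive that triggers an unnecessary scan. The sketch below is a generic Go illustration of the structure, not Loki's implementation; the bit-array size, the number of hash functions, and the FNV-based double hashing are arbitrary assumptions.

```go
// A minimal, generic Bloom filter sketch: no false negatives, rare false positives.
package main

import (
	"fmt"
	"hash/fnv"
)

type bloom struct {
	bits []bool
	k    int // number of hash functions, simulated via double hashing
}

func newBloom(m, k int) *bloom { return &bloom{bits: make([]bool, m), k: k} }

// two base hashes derived from FNV-1a and FNV-1
func (b *bloom) hashes(s string) (uint64, uint64) {
	h1 := fnv.New64a()
	h1.Write([]byte(s))
	h2 := fnv.New64()
	h2.Write([]byte(s))
	return h1.Sum64(), h2.Sum64()
}

// Add sets k bit positions for the given key.
func (b *bloom) Add(s string) {
	x, y := b.hashes(s)
	for i := 0; i < b.k; i++ {
		b.bits[(x+uint64(i)*y)%uint64(len(b.bits))] = true
	}
}

// MightContain reports false only when the key was definitely never added.
func (b *bloom) MightContain(s string) bool {
	x, y := b.hashes(s)
	for i := 0; i < b.k; i++ {
		if !b.bits[(x+uint64(i)*y)%uint64(len(b.bits))] {
			return false // definitely not present
		}
	}
	return true // possibly present (small chance of a false positive)
}

func main() {
	f := newBloom(1<<16, 4)
	f.Add("order-12345")
	fmt.Println(f.MightContain("order-12345")) // true
	fmt.Println(f.MightContain("order-99999")) // almost certainly false
}
```

Building filters like this at ingest time is presumably where the roughly one percent overhead Wilkie mentions comes from; a false positive simply means a chunk gets scanned unnecessarily, so correctness is never at risk.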
And then there's the thorny issue of AI. Wilkie is starting from the assumption that LLMs "aren't super-applicable to observability data," although he is very open to having that assumption tested.
While several vendors are leaping aboard the AI bandwagon, and some of the algorithms lurking in Grafana Labs' products might meet the criteria for AI, Wilkie remains cautious. Sure, there are any number of neat-looking demos out there summarizing incidents and helping with natural language querying using LLMs, but according to Wilkie: "None of them are that crucial breakthrough feature..."
Wilkie goes on: "Outside of observability, a lot of the AI use cases currently being talked about are: 'This is going to replace a human. You will need fewer engineers now we've got AI.'
"I, personally, don't think that's going to be the case.
"What will happen – or I hope will happen – is an optimistic version of the future ... those junior engineers who maybe don't have the experience and a model in their head of what's going on will have a virtual coding person to help them. I hope that means that more early career engineers can be more productive sooner."
Wilkie reckons that AI isn't about to replace SMEs, but instead takes the optimistic view that, if done right, some processes could be hugely accelerated. ®