Enterprises are getting stuck in AI pilot hell, say Chatterbox Labs execs
Security, not model performance, is what's stalling adoption
The Register: Why have large companies not done more with AI?
Battersby: At a general level, there is a huge amount of excitement about AI, understandably, right? The promise and the vision for where people want to get to is good, particularly when you think about the agentic world and what that can bring.
They get stuck in this kind of pilot hell
I think the challenge that organizations have at the moment, particularly enterprise organizations, is that there are a lot of pilots, and there are a lot of proofs of concept [POCs] going on.
But what you see is that once they've gone through their POC phase, their demonstration phase, it's quite hard for them to get from the pilot phase into production, to actually do the stuff that they want to do at scale.
And that's not because of any problem with the advanced nature of the AI models.
The challenge is that they haven't gone through the normal technology risk process that all other technology goes through to be adopted into an enterprise organization.
So, they get stuck in this kind of pilot hell. And that's why I think you see the disconnect between the promise and early results that people get. The reality is that they're actually not scaling into production because they haven't got over that hurdle yet.
The Register: What are some of the issues customers raise? Is it security? Is it compliance? Is it cost?
Battersby: Cost is always a factor in all of this stuff. But there are lots of advancements out there at the moment that make these things more efficient. Small language models are really helping with that. And the development of chips specific to inferencing – think Cerebras, think Groq – are bringing that price point down.
The main thing is … your security folks, you know, the CISO organization and down that now have security concerns about these things. Particularly in an agentic world. It's also your technology risk and governance and compliance departments that have to validate any technology that goes into an organization.
Coleman: I think I'd add something to what Stuart said there as well, because I think there's actually a merging of two industries that are overlapping, [traditional cybersecurity and AI]. And I don't think many people have got a grasp of that.
If you think of typical cybersecurity where you might look at risks and things around network traffic, identity management, all this clever stuff that … I don't think they're aware of the kind of attack vectors that AI is going to bring.
If you look at the landscape and what's been done, a couple of pioneers [like Cisco] have addressed that. For example, they acquired Robust Intelligence. Or if you look at Palo Alto Networks, which acquired Protect AI. There are some market leaders.
But the problem [for a lot of security organizations] is they don't come from AI backgrounds. So they really don't understand what the risks are.
One of the biggest things I see is if you're sitting in a large organization, you're going to have a chief risk officer and then you're going to have a CISO. How do you appease both of them?
That's one of the big issues today to actually get things into production. Because you're talking about huge risk in even regulated environments.
Battersby: And to tag on to that, if you think about that cybersecurity environment, these companies do a great job at cybersecurity. They've got cloud security posture management, data security posture management.
But if you take something like CSPM – cloud security posture management – and try to translate that into an AI world, you generally look at things about the deployment environment around AI. It kind of ignores the fact that it is AI. So you want to make the environment secure. Absolutely. You want to make sure that endpoints aren't exposed to the public, that the right users are authenticated into the AI. Of course, you need that stuff. But it ignores the AI in the middle of it.
[You need to think about both traditional security and AI security.] Because even authorized users into an AI system can then get the AI to do something that it should not do. That's the point of treating it like an AI model, not just an application.
So once those authorized users are in the system, they can manipulate it. They can attack the AI. They can make it give away confidential information. And that's where you need not just traditional cybersecurity, but you need [to think about the AI]. Like Cisco and Robust. They did a good job there.
Coleman: One of the things we discussed before ... is inferencing [which is running AI models once they're trained]. When we talk about inferencing, primarily the way that we see it is that there are the market leaders. It's all out there. Who's buying the most chips? Who's built out the infrastructure? It's all done.
But then there's a software layer to that infrastructure. I think that everybody's trying to be that sort of be-all and end-to-end.
You've got people in open source, people like Docker, people like Red Hat, people like SUSE. They want to build all the constituent components within that. But they're not mature enough. They're not ready yet.
Then you've got like inferencing companies that are new, that are bare metal, people like Together AI, like Lambda Labs, all these really, really good companies, right?
And in infrastructure, there are big monoliths, big companies that have been around for years that are trying to level up, trying to modernize. So think VMware, right? Think HPE private cloud, think the Dells of this world.
And then on top of that, you've got the real AI players in the cloud – Google, Microsoft, AWS – trying to offer that one-stop solution. You know, what we call "closed shop" now, right? So you've got this meeting of open source and closed shop all trying to come at it, but none of them have extended out the software layer. All the hardware is built, all the infrastructure is built.
But [what every senior executive is] talking about is that the orchestration layer, the governance, and the security has not been nailed. It's not been addressed.
And I think they're huge things. So when Stuart and I were talking earlier about a chief risk officer or a CISO, it's all to do with that governance and that security software layer that will protect you, whatever imprint you choose, whichever partner you want to work with. So we think that's the next big move that people are going to play.
The Register: Are the major internet companies making things worse by constantly raising the alarm about AI? Just recently, OpenAI CEO Sam Altman announced that internet use has been enabled in Codex while warning about the risks of AI internet use.
You're not going to come into my enterprise environment until you tell me and show me and validate that this is safe
Coleman: There's almost a storm brewing is the way I describe it. Good luck to them all, whether it's Anthropic, OpenAI, whatever, they're all working for market dominance. They're all pushing products daily, weekly, monthly out there hoping one sticks, which one's going to get the mass user base. But I think fundamentally, they're viewing life very differently from an enterprise.
I'll come back to the CISOs and the chief risk officers because these people have been around in their businesses for maybe 20, 30 years. They have very clearly defined processes, governance protocols, security protocols. [They'll say] all right, you keep pushing this stuff, but you're not going to come into my enterprise environment until you tell me and show me and validate that this is safe.
So I think there is that bit about let's throw it out there. Let's see what sticks. And that's a model for them. But enterprise adoption is only like 10 percent today.
McKinsey is saying it's a four-trillion-dollar market. How are you actually ever going to move that along if you keep releasing things that people don't know are safe to use or they don't even know not just the enterprise impact, but the societal impact?
People in the enterprise, they're not quite ready for that technology without it being governed and secure.
Battersby: What that means when they implement this stuff is that, outside of security for a moment, if you think about how an enterprise runs their AI, they need to be agnostic. They need to be able to have some kind of runtime – and there are plenty out there that allow them to kind of switch in and switch out, you know, because this stuff moves so fast, both politically and technologically. Putting all your eggs in on one major closed shop model vendor means it's kind of locked there.
But if you have a runtime that enables you to [switch vendors, that] gives them the flexibility to adapt to these things themselves and [maintain control].
And it also means that they can measure this stuff. So, you know, you have dimensions of cost, of performance, of security and safety, and they can kind of pull those threads and decide over time how they want to switch in and switch out. So it means that then you're not tied to these monoliths that might change direction.
The Register: Among the companies you've seen that are actually using AI successfully, do you have a sense of what they're using it for? Is it delivering meaningful business process improvement or is it still largely experimental?
Coleman: I think we've seen some really good examples, like for example, BMO … It's a customer of ours. Some of the stuff they're doing is around putting AI into production, talking about agentic workflows. But really for them, it's all around risk. And they're doing a lot of work around KPIs, risk across their business, and rolling it out that way, because fundamentally they've got key risk indicators just running through their entire business.
They want to understand before they put any of this into production, what are the risk levels, what are the risk appetites?
A bank, for example, is highly regulated, right? Risk exposure is a big part of a bank, but we're seeing lots of banks [using AI]. Insurance companies are another one, right? So insurance firms are running lots of use cases around things like HR, around claims, around underwriting.
I wouldn't agree with the market that says the use cases are not in production and they're not live. They are. There are lots of them. But what Stuart and I are saying is some of them are very much early market movers and quantifying the investment versus what the outcome is, the long-term outcome, I'm not sure we're there yet.
- AI kept 15-year-old zombie vuln alive, but its time is drawing near
- Cellebrite buys Corellium to help cops bust phone encryption
- AI can spew code, but kids should still suffer like we did, says Raspberry Pi
- Meta pauses mobile port tracking tech on Android after researchers cry foul
Battersby: The ones that have moved from the use cases that actually generate ROI for them are the ones that have managed to follow a process to get this stuff into production.
The little chatbot use case on internal assistance, that's very low risk, but they push them through. The ones that are actually where you're going to make money, with AI helping in core business processes or getting to market quicker or that kind of thing – they're the ones that actually have to go through the vigorous testing process to get there.
The Register: What does testing models for security look like?
Battersby: So, you know, when people talk about security and safety, they need to think about the actual use case that they are addressing, because when something is secure and safe, that very definition is specific to that use case.
What is secure for an internal research tool in a travel company is very different to a patient-facing use case in a healthcare company, right? These things can fundamentally do different things.
So the testing process, first of all, has to be specific to the use case, so that you're testing for the right thing. When people talk about safety, they typically talk about things like self-harm, hate speech, things that generally are motivated by content safety policies, and they're important. You've got to address them. But if you only address them, then you don't address the nuances of your actual use case.
So the first thing is to think about what is safe and secure for your use case. And then what you have to do is not trust the rhetoric of either the model vendor or the guardrail vendor, because everyone will tell you it's super safe and secure.
You actually have to test it and do an iterative process. Because once you know what you're testing for, then you're going to actually run the tests and see, well, actually, OK, some models are safer than others, but there are holes in all of them.
We can fix them and we can get them to a safe position. But you have to go around that iterative process of getting the factual metrics on the model's security and safety to then iterate and get into production.
[In the case of a global banking customer, we want to test for] the standard sort of content safety motivated categories. We don't want these models [giving harmful advice or advocating illegal activities or presenting explicit material].
But actually, for this bank, you know, they [also care about other things such as] suspicious activity reporting, counterfeit bank statements, and fake loan repayment schedules. They have to create the right test cases, the nefarious test cases and do that automatically.
So, what you don't want to do when you do that is sit down and write them. That's like a manual red teaming approach. It's never going to be fast enough, and you're never going to have the people with the [skills] available to do that.
So, you automatically generate the nefarious test cases, the things you want to test your system with. And then to the point about the safety of basic systems, well, these guys may have tested Anthropic in AWS Bedrock, Bedrock just being the runtime here, right?
And actually, when you start trying to jailbreak it and start looking at the security of the system, you see there are a lot of problems with it. [Battersby showed a chart similar to these presented on the Chatterbox Labs website.]
So, then they iterated, right? And what they did is they said, OK, well, we're going to put an external guardrail system on the front of this model to add to the security of it.
We're going to add, in this case, Bedrock guardrails just with their default configuration on this system, to any of the guardrail vendors.
It gets a bit better. You know, those guardrails external to the model do help a bit, but they're no panacea, right? The CISO is still not going to approve this.
When they then go through and update those, they change the configuration to specify them with the knowledge of the testing process.
Now they're at a security level that is acceptable to get into production. So, when we're talking about these ideas of thinking about what is safe and secure to your use case, that's specific there, and then not quite going with the rhetoric, but actually getting the factual metrics to test these things, understand their security profile, iterating, you know, like in normal software development or security testing to get you into a safe state.
What's not in [the slide deck being shown] and what's really important is that that doesn't stop, right?
So, what they've made at the moment is at the point where they go live, they can say, "OK, well, at this point in time, this thing is secure and safe by what we've just defined this system to be." But outside of their control are these models. They're running, in this case, out in the cloud.
They're non-deterministic, there are multiple components, there's a model, there's then another external system, the guardrails, running in a cloud environment with content safety filters that all change outside of their control.
Content safety filters, guardrails are not good enough. And it's not going to change anytime soon
So, what they do is they keep checking this stuff, right? They keep maintaining that because it's an ongoing process to keep checking that, and that gives them the confidence to keep it in production and keep it secure.
Coleman: What we're trying to get across to you is that content safety filters, guardrails are not good enough. And it's not going to change anytime soon. It needs to be so much more layered. If you go back 30 years, people used to test the ass out of everything. That's not happening in AI. That's a big, big market, the AI security market.
The Register: Is there a way to generalize about the real cost of running AI with this kind of security model? Is it, say, 50 percent again the initial investment to run things securely?
Battersby: Well, it's a really interesting point because what you see sometimes is it gets cheaper. Because, you know, so when you test this stuff, yeah, you're going to be hitting the endpoint a bit more. It's normally like 50 bucks. It's not an expensive thing.
But when people are doing this sensibly, what they're doing is looking across a whole portfolio of models.
Coleman: One bank's got 800, for example, to work with.
Battersby: And so when they're choosing which model to use, maybe it's not actually the biggest model that is the most secure model. People in the past, when they're operating blind without having done the security and safety testing, well, they go for the big one because the assumption is bigger is better.
But actually, if you measure this stuff, then you might see that actually the big, medium, and small language models all have the same security and safety level.
So you could choose the smaller one. Or sometimes the smaller ones are actually more safe because they don't have the nefarious abilities of the larger language models.
So the point I'm making is that once you illuminate this stuff so that you can actually see what's going on, you can then evaluate that security dimension against something like cost and actually sometimes lower your runtime cost by using a smaller model that is as safe or safer than the big ones.
The Register: So instead of "measure twice, cut once," as carpenters will say, it's "measure security all the time"?
Coleman: All the time. Absolutely. So to your carpentry example, measure all the time, and you can be flexible here and switch these things in and out when you're armed with the right knowledge.
[Pointing to enterprise concerns about sharing data with companies like OpenAI, he said] we're seeing a bit of a shift away from some big language models to small models that are being housed on-prem because they can control what new products and services they build.
And when we talk about moats, as we know, the big LLM providers, they don't have a moat. It's just mass marketing. It's a name. It's a brand. The token price is shrinking. What we're seeing is some people building smaller models, taking them on-prem, getting them into production quickly, driving new products and services. Very smart. Because it's starting to collapse. The ecosystem is starting to open up. ®