Guess how much stored data is ever used or accessed
Not a lot, says NetApp's Matt Watts as he talks file classification, wastage, and power consumption
Interview NetApp's Chief Technology Evangelist, Matt Watts, is worried about sustainability and data wastage, even as his employer withdraws third-party support from BlueXP classification.
In 2023, just before the IT world became obsessed with AI, Watts wrote the foreword to a report [PDF] that made clear just how bad the data wastage situation was getting. It noted that 41 percent of data currently stored by UK organizations was unused and unwanted.
But "40 percent is low, very low," Watts tells The Reg. "What we have seen coming back is that it can be between 70 to 80 percent in some cases."
That's a lot of redundant data sitting on powered servers.
Until recently, NetApp's BlueXP classification tool was heterogeneous. "It doesn't just look at NetApp storage," says Watts, "it looks across any storage. It even looks into the cloud providers, S3 buckets, all those sorts of things.
"It then gives us back an entire suite of metadata to tell us who owns it, what the file is, permissions, and it also tells us the last access times as well."
Watts describes it as "a deeper dive into actually pulling out what the reality of the situation is, not the gut feel, but what actually does it look like."
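To make the idea of classifying data by owner, permissions, and last access time concrete, here is a minimal Python sketch that walks a directory tree and estimates what share of stored bytes has gone untouched for a given period. It is not how BlueXP classification works internally; the names `FileRecord`, `scan`, and `stale_fraction`, and the 365-day default threshold, are assumptions made purely for illustration.

```python
import os
import stat
import time
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class FileRecord:
    """Hypothetical record of the metadata a classification scan might collect."""
    path: str
    size_bytes: int
    owner_uid: int      # numeric owner ID (0 on platforms without POSIX owners)
    mode: int           # permission bits, e.g. 0o644
    last_access: float  # last access time, seconds since the epoch


def scan(root: str) -> Iterator[FileRecord]:
    """Yield one FileRecord per regular file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            yield FileRecord(path, st.st_size, st.st_uid,
                             stat.S_IMODE(st.st_mode), st.st_atime)


def stale_fraction(root: str, max_idle_days: int = 365,
                   now: Optional[float] = None) -> float:
    """Return the fraction of bytes not accessed within max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    total = stale = 0
    for rec in scan(root):
        total += rec.size_bytes
        if rec.last_access < cutoff:
            stale += rec.size_bytes
    return stale / total if total else 0.0
```

Calling `stale_fraction("/srv/share")` on a hypothetical file share would return a value between 0 and 1. One caveat: access times are only as trustworthy as the filesystem that records them, and volumes mounted with options such as `noatime` or `relatime` may under-report reads, so a result like this is a starting point for a conversation about ownership rather than a verdict.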
Watts estimates that storage accounts for 15 to 20 percent of the power consumed by datacenters, with fluctuations depending on what the hard drives are doing. According to the UK's National Grid, the country's 400 to 600 commercial datacenters account for approximately 2.5 percent of the UK's electricity consumption, a figure expected to rise to around six percent by 2030.
"The biggest challenge with managing data more effectively," says Watts, "is ownership. And we have had this problem since I came into the IT industry 30-plus years ago, which is who actually owns the data.
"You can give people as much knowledge as you possibly can about what the data is. But you go out into an organization and say to a group, 'Well, it's your data because you created it,' and they'll say 'No no no no – it's IT's. They own the data.'
"And then you'll go and see IT … and IT will say 'no no no no – we're just the custodians of the data.'"
It's a challenge that many administrators will recognize, and one that makes NetApp's decision to focus BlueXP classification (formerly Cloud Data Sense) on its own storage systems all the more vexing for data estates not running on the company's technology.
"Initially the product was heterogeneous," explains Watts. "So we added in the capability for it to support S3, it'll scan any NFS mount point, SMB, all of those sorts of things. And it was a chargeable piece of software.
"What we saw was a lot of partners were looking to try and build services around it … So we took a decision, as part of the May launch, that we would use it as a way of better differentiating ourselves."
That differentiation has meant yanking support for the likes of Google Cloud Storage, Amazon S3, OneDrive, and so on. BlueXP classification is now available as a core capability within BlueXP at no extra charge but without many legacy features.
Considering the increasing concern around data wastage and dark data squatting in datacenters, would the heterogeneous functionality make a comeback? Watts is noncommittal: "I don't think we would rule anything out. We are going to continue to allow our partners to request to be able to use it in a heterogeneous nature.
"But going forward, the initial plan is no cost and included for all customers … longer term, we could change that. But right now, we think that that creates maximum value for people who are thinking of using NetApp or currently use NetApp." ®