Comment Elastic Storage, IBM's rebranded GPFS, is a parallel filesystem offering high-speed access to shedloads of data.
That's a long way from accessing data in a tape archive. Yet Elastic Storage can do this too: build and access a tape archive.
It's done by hooking up Elastic Storage to the Linear Tape File System (LTFS), which provides file-level access to tape storage. We wrote about that in July 2014.
The system stores documents in a single global name space and, for each individual file, selects the medium it judges most appropriate for that file's access profile: cold files are automatically migrated from disk to tape, the cheapest and least power-hungry medium, for example. The policies for assigning files to particular storage systems can be changed on the fly.
In all, this means a complete storage system can grow to hundreds of petabytes without having to shove everything into flash or onto spinning disk. Users see all their files under one name space regardless of storage location, though tape-stored documents take longer to access, of course.
An Elastic Storage System (ESS) uses GPFS v4.1, which has nine noteworthy features:
- Transparent storage tiering on flash, disk and tape
- Native de-clustered RAID, meaning no RAID controller hardware is needed and rebuilds have no single bottleneck
- Clustered NFS service exporting a GPFS file system
- File system snapshot and quota support
- Global file sharing with multi-cluster and Active File Management (AFM)
- Multisite disaster protection and high availability
- Integrated backup and restore
- Network performance monitoring
- File encryption and secure erase
Files are stored in logical pools, which are assigned performance, locality and reliability characteristics that map them to media types. Pools can be used to create storage tiers, with rules governing migration between them.
If the disk pools as a whole reach 90 per cent full, for example, selection rules can pick files for migration to tape, such as files that have not been accessed for 30 days. Migration can be stopped when overall disk pool utilisation drops back to, say, 70 per cent.
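Tiering of this sort is expressed in GPFS's SQL-like policy language. A migration rule along those lines might look something like this sketch; the pool names are illustrative, and the 'ltfs' external pool is assumed to be defined separately by an EXTERNAL POOL rule:

```
/* Move cold files from disk to tape once the disk pool hits 90 per cent
   utilisation, stopping once it falls to 70 per cent; files with the
   oldest access times are weighted to go first.
   Pool names here are illustrative, not IBM defaults. */
RULE 'cold_to_tape' MIGRATE
  FROM POOL 'disk'
  THRESHOLD(90,70)
  WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
  TO POOL 'ltfs'
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
```

The THRESHOLD(high,low) pair is what lets migration kick in at one utilisation level and stop at another, as described above.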
The tape handling is done with LTFS Enterprise Edition (EE); tapes in a library provide an external pool. Several LTFS EE nodes can share access to a tape library, and also share tape inventory and status information.
Files migrated to tape have stubs left on disk. If the tapes are exported from the system, the file stubs are deleted. LTFS tapes can be imported as needed.
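Stubs of this kind typically behave like sparse files: they report their full logical size but occupy almost no disk blocks. As a rough illustration only (a generic sparse-file heuristic, not an LTFS EE API or IBM's actual mechanism), a migrated file can be spotted like this:

```python
import os

def looks_like_stub(path, ratio=0.5):
    """Heuristic check for a migrated file: a stub reports its full
    logical size (st_size) but occupies far fewer allocated blocks.
    This is a generic sparse-file test, not an LTFS EE call."""
    st = os.stat(path)
    allocated = st.st_blocks * 512  # st_blocks counts 512-byte units
    return st.st_size > 0 and allocated < st.st_size * ratio
```

On a real deployment you would ask the HSM layer for a file's migration state rather than guess from block counts, but the block-count gap is why stubs cost so little disk space.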
Elastic Storage archive concept
There is also a tape housekeeping function: files on under-utilised, partially empty tapes can be copied to other tapes, freeing those tapes for reuse.
An ESS scheme for archiving could use two GPFS domain servers forming a GPFS cluster. One server, GS1 in the diagram above, handles file system metadata, while the second handles file system data. They support an NFS domain: Elastic Storage users access files via NFS.
Additional LTFS EE nodes carry out the archiving process, front-ending the tape library infrastructure.
A second way for file data to enter the archive is via a Content Management Server in a logically separate archive domain. It can link to an email server and a document scanner.
A content management application can gather data from various sources, index it and store it in the file system managed by the ESS. From there it can be migrated to the archive using migration policies.
Find out more from a 25-page IBM White Paper on Elastic Storage Archiving by Nils Haustein.
By the way, a second IBM white paper explains how GPFS can be used to store Hadoop data. It's getting on a bit (the GPFS extension was written in 2009), but if you're interested, read about it here. ®