From server to end user: What's coming up for NFS?
Big changes: Hole-punching and application data blocks
NFS (Network File System) is one of the two most successful file protocols in the history of computing. From NFSv2 in the 1980s, through the widely deployed NFSv3 of the 1990s, to today’s NFSv4.1 standard – and if you don’t know about NFSv4.1 and pNFS (parallel NFS), you should – the protocol has been developed to keep pace with user requirements and the changing nature of data access and processing.
Storage growth has exploded. In 2010, according to IDC, nearly an exabyte (1,000 petabytes) of open systems storage was sold worldwide, and more storage was sold in a single quarter of 2011 than in the whole of 2007. That is astonishing growth in the amount of data, so what’s driving it? One of the biggest drivers of storage consumption in recent years has certainly been the rise of virtualisation and cloud computing, both of which have made it much simpler to manage and process large quantities of data.
These demands are driving the next set of proposed changes to the NFS standard: NFSv4.2. NFSv4.1 and pNFS laid the foundations for better security, scalability and manageability than NFSv3 offered. The NFSv4.2 proposals go further, adding features that end users have been requesting beyond those in NFSv4.1. These features make NFS relevant not only as an “everyday” protocol but as a preferred distributed file protocol in and beyond the virtualised data centre.
Server-side copy
Virtualisation means compute mobility. No longer tied to physical hardware in specific locations, operating systems and applications can move seamlessly from one server to another. But each virtual machine is backed by data, and may refer to ancillary data too, and all of that data needs moving. Today, such a copy involves a costly and unnecessary detour. The client must read the data from the source data server only to write it back out to the target data server, so three machines are involved and every byte crosses the network twice.
Server-Side Copy (SSC) removes one leg of such a copy operation entirely. Instead of reading whole files, or even directories of files, from one data server out to the requesting client, and then having the client write them back to another data server, SSC lets the source and target servers communicate directly. The client manages the copy but takes no part in moving the data. Because data moves directly between data servers, SSC removes the need for costly, high-bandwidth server-to-client-to-server paths and reduces the potential for congestion during copy operations.
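From an application’s point of view, a server-side copy can look like an ordinary file copy. The following is a minimal sketch, assuming a Linux client and Python 3.8 or later. `os.copy_file_range()` asks the kernel to copy bytes between two open files without passing them through user space. On an NFSv4.2 mount, the Linux NFS client can offload that request to the server as an NFSv4.2 COPY operation, so the data need not travel through the client. Whether the offload actually happens depends on kernel version and server support, so treat it as an assumption. The helper name `server_side_copy` and its read/write fallback are illustrative, not part of any NFS API.

```python
import os
import shutil


def server_side_copy(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path, letting the kernel offload the copy if it can.

    Hypothetical helper: on an NFSv4.2 mount, a Linux client may turn
    os.copy_file_range() into a server-side COPY, so the file's bytes do not
    cross the client's network link. Returns the number of bytes copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        copied = 0
        if hasattr(os, "copy_file_range"):  # Linux-only system call
            try:
                while copied < size:
                    # The kernel (or the NFS server) moves the data; user space
                    # only issues the request and tracks progress.
                    n = os.copy_file_range(src.fileno(), dst.fileno(), size - copied)
                    if n == 0:  # source shrank underneath us
                        break
                    copied += n
            except OSError:
                pass  # e.g. EXDEV or EOPNOTSUPP: finish the copy the slow way
        # Fallback: an ordinary client-side read-and-write copy of whatever
        # copy_file_range did not handle (nothing, if it finished the job).
        src.seek(copied)
        dst.seek(copied)
        shutil.copyfileobj(src, dst)
        dst.flush()
        return dst.tell()
```

The fallback keeps the sketch usable on filesystems or kernels that reject `copy_file_range`, at the cost of routing the data through the client exactly as described above.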
Guaranteed Space Reservation
There’s a limit to what can be achieved by simply piling on more disks to meet potential data needs. Every disk costs money to buy, to run and to manage. Many storage administrators are keenly aware that users tend to overestimate their storage requirements, sometimes by orders of magnitude. Over the years, various efficiency techniques have been used to present a large virtual pool of storage on much smaller real storage systems.
One of those techniques, thin provisioning, gives the appearance of large amounts of available space. Although now commonplace, it can be hard to manage in fast-growing environments. For example, if two users each request more than 50 per cent of the available free space, they can’t both have it.
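The over-commitment problem can be made concrete with a toy model. The `ThinPool` and `PoolFullError` classes below are purely hypothetical and are not part of NFS or any storage product. The pool hands out logical size freely, as thin provisioning does, but consumes real capacity only when data is written. As a result, the second of two users who were each promised more than half the pool runs out of space partway through.

```python
class PoolFullError(Exception):
    """Raised when the physical pool cannot back a write."""


class ThinPool:
    """Toy thin-provisioned pool: logical sizes are promises, not reservations."""

    def __init__(self, physical_gb: int) -> None:
        self.physical_gb = physical_gb  # real capacity behind the pool
        self.used_gb = 0                # real capacity consumed by writes
        self.logical = {}               # file name -> promised logical size (GB)
        self.written = {}               # file name -> data actually written (GB)

    def create(self, name: str, logical_gb: int) -> None:
        # Thin provisioning: creation always succeeds and sets nothing aside.
        self.logical[name] = logical_gb
        self.written[name] = 0

    def write(self, name: str, gb: int) -> None:
        # Real space is consumed only now, when data actually lands.
        if self.written[name] + gb > self.logical[name]:
            raise ValueError(f"{name}: write beyond logical size")
        if self.used_gb + gb > self.physical_gb:
            raise PoolFullError(f"{name}: pool exhausted")
        self.written[name] += gb
        self.used_gb += gb


pool = ThinPool(physical_gb=100)
pool.create("alice", 60)  # more than 50 per cent of the free space...
pool.create("bob", 60)    # ...and so is this; both creations succeed
pool.write("alice", 60)
try:
    pool.write("bob", 60)
except PoolFullError as err:
    print(err)  # bob: pool exhausted
```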
A guaranteed space reservation feature in NFSv4.2 will ensure that, regardless of the thin provisioning policies, individual files will always have space available for their maximum extent.
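On the client, a reservation is normally requested through the ordinary POSIX call rather than anything NFS-specific. The following is a minimal sketch, assuming a Linux client and Python’s `os.posix_fallocate()`. The Linux NFS client can translate a plain `fallocate` request into the NFSv4.2 ALLOCATE operation when the server supports it, so a successful return is intended to mean the range is backed by real space rather than merely promised. The helper `reserve_file` is hypothetical.

```python
import os


def reserve_file(path: str, size_bytes: int) -> None:
    """Create path and ask the filesystem to back size_bytes of it up front.

    Hypothetical helper. os.posix_fallocate raises OSError (for example
    ENOSPC) if the space cannot be reserved, so the caller learns about a
    shortage now rather than midway through a later write.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Extends the file to size_bytes and allocates blocks for the range.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
```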
Hole-punching
Such guarantees are desirable for some kinds of data, and reassuring for users who need the space to be there, but they can defeat storage administrators’ best efforts to use disk efficiently.
For example, when a hypervisor creates a virtual disk file, it often tries to pre-allocate the space for the file so that there are no allocation-related errors later while the virtual machine is running. Applications like this typically write zeros to the entire file, which wastes both I/O and storage.
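The cost of zero-filling is easy to see on a local disk. This sketch assumes a Linux filesystem that supports sparse files (ext4, XFS, Btrfs and tmpfs all do). It creates two files of the same logical size. The first is written full of zeros, as described above. The second is simply extended, so the filesystem records a hole. `st_blocks`, which counts 512-byte units on Linux, shows how much real storage each one consumes. The helper names are illustrative.

```python
import os

SIZE = 8 * 1024 * 1024  # 8 MiB logical size for both files


def allocated_bytes(path: str) -> int:
    # st_blocks counts 512-byte units actually allocated on disk.
    return os.stat(path).st_blocks * 512


def zero_filled(path: str, size: int) -> None:
    # What a pre-allocating hypervisor typically does: write every byte.
    chunk = b"\0" * (1024 * 1024)
    with open(path, "wb") as f:
        written = 0
        while written < size:
            n = min(len(chunk), size - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())


def sparse(path: str, size: int) -> None:
    # Extend the file without writing data: the filesystem records a hole.
    with open(path, "wb") as f:
        f.truncate(size)
```

On the filesystems listed, `allocated_bytes()` typically reports close to the full 8 MiB for the zero-filled file and almost nothing for the sparse one, even though both report the same size. Filesystems that compress or deduplicate zero blocks may narrow the gap.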
To improve storage efficiency, NFSv4.2 adds support for sparse files. With “hole-punching”, as it is commonly called, deleted and unused parts of a file are returned to the storage system’s free space pool (see figure 1).
Figure 1: Thin provisioned and hole-punched data
Thin provisioning removes the need to reserve real storage for growth that may never happen, so the real free space can be shared among many users. NFSv4.2’s hole-punching goes one step further by recognising that files themselves often contain regions that occupy space but hold no useful data. The client’s view is unchanged: NFSv4.2 provides hole-punching transparently.
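Applications can also punch holes explicitly. On Linux, they make the request with `fallocate(2)` and the `FALLOC_FL_PUNCH_HOLE` flag, which must be combined with `FALLOC_FL_KEEP_SIZE`. On an NFSv4.2 mount, the Linux client can pass this request to the server as a DEALLOCATE operation. Python’s standard library does not wrap `fallocate`, so this sketch calls it through `ctypes`. It assumes 64-bit Linux with glibc or musl and a filesystem that supports hole-punching. The names `punch_hole` and `demo` are hypothetical. The demo illustrates the transparency point: the file size is unchanged and the punched range reads back as zeros, but fewer blocks are allocated.

```python
import ctypes
import ctypes.util
import os

FALLOC_FL_KEEP_SIZE = 0x01   # from <linux/falloc.h>
FALLOC_FL_PUNCH_HOLE = 0x02  # must be OR-ed with FALLOC_FL_KEEP_SIZE

# Assumes 64-bit Linux, where fallocate() takes 64-bit off_t arguments.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd: int, offset: int, length: int) -> None:
    """Deallocate [offset, offset + length) without changing the file size."""
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def demo(path: str) -> tuple:
    """Write 4 MiB of data, punch out the middle 2 MiB, report before/after."""
    mib = 1024 * 1024
    with open(path, "wb") as f:
        f.write(b"x" * (4 * mib))
        f.flush()
        os.fsync(f.fileno())
    before = os.stat(path)
    with open(path, "r+b") as f:
        punch_hole(f.fileno(), mib, 2 * mib)  # free the middle 2 MiB
        f.seek(mib)
        middle = f.read(2 * mib)  # a hole reads back as zeros
    after = os.stat(path)
    return before, after, middle
```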