French upstart Rozo: Magic beans will help us become storage giant
Mojette transform erasure coding-powered filers
A French startup reckons it has the best software technology for scale-out filers because of its bean counter-style erasure coding.
Rozo Systems is developing RozoFS, a software-defined, scale-out NAS product aimed at multi-petabyte file stores. It intends to compete with Isilon and Qumulo, as well as with object storage suppliers that offer NAS heads.
RozoFS runs on x86 servers under Linux. At its core is the Mojette transform, an erasure coding algorithm said to have unmatched performance. The company claims it removes the slowness, poor resiliency, and extra costs typically associated with distributed storage systems holding massive volumes of data.
Mojette transform erasure coding
The Mojette mathematical transform has been worked on in a research laboratory (IRCCyN) in Nantes, France, since 1994. The word "Mojette" refers to a type of haricot bean which was used in the past to teach children to count, so much so that we still use the term bean-counter today.
Moving on from matters legumatical, we understand that Mojette transform* erasure coding starts from the concept of a grid of numbers. To show the idea, let's imagine a 4x4 grid. We can draw straight lines along the rows, up and down the columns, and diagonally through the grid cells to the left and right. Figure 1 in the diagram below shows this.
Mojette grid and projections idea.
The lines are extended beyond the grid. For each line, the values in the grid cells it crosses can be added or subtracted, and the result is written at the end of the line. In Figure 1, the value b19 is the sum of the values in cells p1, p6, p11 and p16. The line of values from b22 to b16 is a kind of projection of the source grid along a particular direction (diagonally from lower right to top left).
The grid values can be viewed as being transformed into the projected values.
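To make the projection idea concrete, here is a minimal Python sketch. It assumes a common Mojette convention in which a direction is a pair of integers (p, q) and each cell (x, y) falls into bin b = p*y - q*x, so every cell on the same line shares a bin. The function name projection and the sample grid holding 1 to 16 are illustrative only, and the cell and bin labels from Figure 1 (p1, b19 and so on) are not reproduced.

```python
def projection(grid, p, q):
    """Return the Mojette-style projection of grid along direction (p, q).

    grid[y][x] is the cell in column x and row y, with (0, 0) taken as the
    bottom-left cell. Cells (x, y) and (x + p, y + q) fall on the same line,
    because both give the same bin index b = p*y - q*x.
    """
    height, width = len(grid), len(grid[0])
    index = {(x, y): p * y - q * x for y in range(height) for x in range(width)}
    lowest = min(index.values())
    bins = [0] * (max(index.values()) - lowest + 1)
    for (x, y), b in index.items():
        bins[b - lowest] += grid[y][x]  # encoding needs only additions
    return bins


# A 4x4 grid holding 1..16; grid[0] is the bottom row.
grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

rows = projection(grid, 1, 0)           # [10, 26, 42, 58], one bin per row
columns = projection(grid, 0, 1)        # [40, 36, 32, 28], one bin per column
diagonal = projection(grid, 1, 1)       # seven bins along one diagonal
anti_diagonal = projection(grid, -1, 1) # seven bins along the other
print(rows, columns, diagonal, anti_diagonal)
```

Every projection adds up to the same grid total (136 here), and each diagonal direction needs seven bins against four for rows or columns.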
Figure 2 shows four such projections, with the grid cells identified by Cartesian co-ordinates; cell (0,0), for example, is the bottom-left cell. Figure 3 shows the projection directions colour-coded in blue, red, green and black.
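How long each projection is depends on its direction. In the usual Mojette formulation, with a direction written as (p, q) where p and q share no common factor, a projection of a grid P cells wide and Q cells high has B bins:

```latex
B(p, q) = (P - 1)\,|q| + (Q - 1)\,|p| + 1
```

For the 4x4 grid, rows and columns give 4 bins each, while either diagonal, (1, 1) or (-1, 1), gives 3 + 3 + 1 = 7 bins, which matches the seven values b16 to b22 in Figure 1. Steeper directions produce longer projections.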
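If data in the source grid is lost when it is read, the projections can be used to reconstruct it. A missing cell lies on several projection lines, and its value can be recomputed from any of those lines that crosses no other missing cell: subtract the line's known cell values from its projected total. Having two or more projections through the grid means that, when several cells are lost, each recovered value can in turn free up further lines for the next missing cell.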
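The recovery step can be sketched in Python too. This is a simplified illustration, not RozoFS's decoder: it assumes the damaged grid is still mostly readable (lost cells marked None) and that its projections were stored separately, which follows the description above rather than a full Mojette decoder working from projections alone. The helper names bin_index, projection and recover are hypothetical. The loop repeatedly looks for a projection line crossing exactly one lost cell and solves for that cell by subtraction.

```python
def bin_index(p, q, x, y):
    """Bin of cell (x, y) for direction (p, q); constant along each line."""
    return p * y - q * x


def projection(grid, p, q):
    """Sum the cells on each line of direction (p, q), keyed by bin index."""
    bins = {}
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            b = bin_index(p, q, x, y)
            bins[b] = bins.get(b, 0) + value
    return bins


def recover(damaged, projections):
    """Fill in lost cells (None) using stored projections.

    Each pass looks for a line that crosses exactly one lost cell; that
    cell's value is the line's total minus the known cells on the line.
    Returns a new grid; any cell that cannot be solved stays None.
    """
    grid = [row[:] for row in damaged]
    cells = [(x, y) for y in range(len(grid)) for x in range(len(grid[0]))]
    progress = True
    while progress:
        progress = False
        for (p, q), bins in projections.items():
            for b, total in bins.items():
                line = [(x, y) for x, y in cells if bin_index(p, q, x, y) == b]
                lost = [(x, y) for x, y in line if grid[y][x] is None]
                if len(lost) == 1:
                    known = sum(grid[y][x] for x, y in line if grid[y][x] is not None)
                    x, y = lost[0]
                    grid[y][x] = total - known  # subtraction recovers the value
                    progress = True
    return grid


original = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]
directions = [(1, 0), (0, 1), (1, 1)]
stored = {d: projection(original, *d) for d in directions}

# Lose the whole second row (y = 1).
damaged = [row[:] for row in original]
damaged[1] = [None, None, None, None]
print(recover(damaged, stored) == original)
```

Here the row projection cannot help, since the lost row's bin covers four unknowns, but each column bin covers only one lost cell, so the whole row comes back.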
We're told that Mojette transform erasure coding is quicker than other forms of erasure coding because it uses only addition and subtraction, rather than the Galois-field multiplication that Reed-Solomon codes depend on. It is also more economical, in that less storage space is needed for the codes than with other forms of erasure coding.
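For contrast, Reed-Solomon codes do their arithmetic in a Galois field, typically GF(2^8), where multiplying two bytes is not an ordinary integer multiply. The sketch below shows one textbook way to compute it: a shift-and-XOR loop reduced by the polynomial 0x11D, a common choice assumed here. Production libraries such as Intel's ISA-L use lookup tables and SIMD instructions instead, so this only illustrates the extra work involved per byte, not real-world speed.

```python
def gf256_mul(a, b):
    """Multiply two bytes (0..255) in GF(2^8), reducing by x^8+x^4+x^3+x^2+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in this field is XOR
        a <<= 1
        if a & 0x100:          # a degree-8 term appeared: reduce by the polynomial
            a ^= 0x11D
        b >>= 1
    return result


print(gf256_mul(3, 3), gf256_mul(2, 0x80))  # 5 29
```

A Mojette bin update, by comparison, is a single ordinary addition.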
Mojette is said to be more efficient than other erasure coding schemes and implementations, such as Reed-Solomon, Cauchy "Good" and Intel's ISA-L library.
Mojette coding compared to Intel ISA-L. **
Mojette decoding compared to Intel ISA-L and memcpy.
The basic RozoFS scheme presents a scale-out NAS to servers and applications, which access it over the NFS and SMB protocols. Behind this sit a pool of metadata servers and a separate pool of data servers, all of them nodes in a cluster. Server nodes typically have 8-core CPUs, 32GB or more of memory, and up to 80TB of local data storage. The system has multiple NAS heads and is true software-defined storage in that there is no hardware lock-in.
Basic RozoFS system layout
With RozoFS there is no data replication; the erasure coding provides data integrity checking and self-healing. The system scales to hundreds of petabytes simply by adding servers to a Rozo cluster, with no disruption to service or quality of service.
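A rough sizing sketch shows why avoiding replication matters at this scale. It uses the 80TB-per-node figure quoted above; the 100PB target and the 4-of-6 layout (any 4 of 6 fragments can rebuild the data) are hypothetical, not RozoFS parameters. The function name nodes_needed is illustrative, and it ignores metadata servers, per-fragment overhead and spare capacity.

```python
import math
from fractions import Fraction


def nodes_needed(usable_pb, k, n, tb_per_node=80):
    """Data nodes needed to store usable_pb petabytes of user data.

    Each piece of data is stored as n fragments, any k of which can rebuild
    it, so raw capacity is usable * n / k. Decimal units (1PB = 1,000TB);
    metadata servers, fragment overhead and spare capacity are ignored.
    """
    raw_tb = Fraction(usable_pb * 1000 * n, k)  # exact, no float rounding
    return math.ceil(raw_tb / tb_per_node)


print(nodes_needed(100, k=1, n=3))  # three-way replication: 3750
print(nodes_needed(100, k=4, n=6))  # hypothetical 4-of-6 code: 1875
```

Both layouts survive the loss of any two of the nodes holding a given piece of data, but the erasure-coded one needs half as many data nodes.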
The software is POSIX-compliant, and the company sees no need to support parallel NFS (pNFS).