OpenIO, blind nano-nodes and coffee cup detection
What you don’t know, you can’t look for - obviously
Interview In a story about ARM-powered, Ethernet-addressed, object storing disk drives, I said such drives couldn’t carry out image searches at a drive-level because they would be operating "blindfolded". OpenIO says “rubbish” to that in a blog it wrote (I exaggerate.)
So I spoke to OpenIO’s co-founder and COO Marie Ponseel, suggesting this situation:
El Reg: Imagine an object, for instance a TIFF image, is stored in its entirety on a nano-node with some metadata describing it. Software running in the nano-node could then act on that object. If, instead, objects are broken into chunks and striped across the equivalent nano-nodes, those nodes won't know what the data stripes they store refer to.
The key point I see here is that "an object is stored in its entirety on a nano-node" as, if it is not, I don’t see how nano-node server-level SW can work on it.
Could you confirm that please?
OpenIO: We can store objects in several ways from the physical point of view (erasure coding, multiple replicas, and they can also be compressed if we need it). And yes, objects can be chunked into smaller pieces.
From the logical point of view we maintain full visibility on the object. It can be searched through metadata and be part of more complex processes as well. In fact, the object can trigger an event (for example when it is written) and Grid for Apps, our server-less framework, can run an app/task using that particular object as a data set... no matter whether the SDS cluster runs on x86, ARM or both at the same time.
One use case that our customers are finding very compelling is in the Media and Entertainment industry. One of our customers ingests the video and when the video lands in the system it triggers an event and it is automatically encoded in several different formats (creating as many new objects as necessary, one for each single format and with its new metadata). This particular customer is on x86 now, but he could add ARM-based SLS4U-96 any moment now.
El Reg: If the object is stored in its entirety then the nano-node CPU, knowing it is, say, a TIFF image, can look inside the object and find, say, a blue coffee cup, using some image search algorithm. If objects are broken into parts and encoded and stored across several nano-nodes then each nano-node only stores a part of the object and won't know or be able to know what is inside it. It cannot necessarily know that, amongst the chunks of data it stores, one or more is part of a single TIFF image which, in its entirety, contains a blue coffee cup. The chunks of data it stores could be parity data rather than object data. How will it know?
How will 10 nano-nodes, say, be able to somehow combine their component TIFF image data chunks into a single image and be able to search inside it for a blue coffee cup?
So in this erasure coded or chunked case each nano-node will be operating blindfold.
Or will it? Does OpenIO have technology to solve this problem?
OpenIO: The nano-node can definitely do that.
The object is physically chunked (and also compressed if you don’t need high performance) and then spread on several nodes.
SDS maintains a full map (redirect tables) of the chunks for each single object. It means that applications and nano-nodes always access and serve full objects (this allows us to always have the best load balancing - no matter the number, size of objects or nodes in the cluster).
Each single nano-node can search metadata, retrieve the object and operate on it.
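The redirect-table idea can be sketched in a few lines of Python. This is an illustration only, not OpenIO's code: the Node class, the store and retrieve functions, the round-robin placement and the 4-byte chunk size are all assumptions made for the example, and erasure coding and compression are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A storage node holding opaque chunks keyed by chunk id (hypothetical)."""
    name: str
    chunks: dict = field(default_factory=dict)


def store(name, data, nodes, chunk_map, chunk_size=4):
    """Split data into chunks, spread them round-robin, record where each went."""
    locations = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        node = nodes[index % len(nodes)]
        chunk_id = f"{name}#{index}"
        node.chunks[chunk_id] = data[offset:offset + chunk_size]
        locations.append((node.name, chunk_id))
    chunk_map[name] = locations  # ordered list: this is the "redirect table"


def retrieve(name, nodes, chunk_map):
    """Follow the map in order and join the chunks back into the full object."""
    by_name = {node.name: node for node in nodes}
    return b"".join(by_name[n].chunks[c] for n, c in chunk_map[name])


nodes = [Node("nano-0"), Node("nano-1"), Node("nano-2")]
chunk_map = {}
image = b"TIFF....blue coffee cup pixels"
store("cup.tiff", image, nodes, chunk_map)

# One node on its own holds only fragments of the object...
fragment = b"".join(nodes[0].chunks.values())
# ...but anything that can read the map gets the whole object back.
restored = retrieve("cup.tiff", nodes, chunk_map)
```

The point of the sketch is that no single node's local chunks amount to the image, yet any process that can consult the map can rebuild it in full.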
Let me explain the process through an example:
- You store the image with a blue coffee cup.
- The image is stored on SDS (no matter if it is our ARM-based SLS or an x86 cluster… or even a mix of them.)
- The image is stored (data+metadata). Metadata is accessible cluster-wide (as it happens for any other object store). Data is chunked and dispersed on various nodes in the cluster (we use a set of algorithms to decide how and where)… this also means that a variable number of nano-nodes (or x86-nodes) actually store the chunks.
- The object is still accessible by any node of the cluster (as it would happen for any object storage system in the market), in fact our system is fully load balanced and any node of the cluster could respond to an API request (i.e. an object retrieve, metadata search, etc.). It also means that each single node has visibility of the entire domain (we don’t need external load balancers, or master/front-end nodes).
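The claim that any node can answer a metadata search can be pictured with a toy model. In this sketch a shared in-process dictionary stands in for the cluster-wide metadata service; the Cluster and Node classes and their put_metadata and search methods are hypothetical names chosen for the example, not OpenIO's API.

```python
class Cluster:
    """Shared state every node can reach: metadata for all objects (assumed model)."""

    def __init__(self):
        self.metadata = {}  # object name -> dict of metadata


class Node:
    """A node with no special role: each one sees the same metadata index."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster

    def put_metadata(self, obj, **meta):
        self.cluster.metadata[obj] = dict(meta)

    def search(self, **criteria):
        """Return the names of objects whose metadata matches every criterion."""
        return sorted(
            obj
            for obj, meta in self.cluster.metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )


cluster = Cluster()
nodes = [Node(f"nano-{i}", cluster) for i in range(3)]

# Writes arrive at different nodes...
nodes[0].put_metadata("cup.tiff", content_type="image/tiff")
nodes[1].put_metadata("talk.mp4", content_type="video/mp4")

# ...and every node gives the same answer to the same search.
answers = [node.search(content_type="image/tiff") for node in nodes]
```

A real cluster would distribute and replicate that index rather than share one dictionary, but the behaviour seen by a client is the same: no front-end or master node is needed to route the request.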
Now, let me go back to the blue cup:
- You can run additional applications on the nano-nodes. These applications are triggered by events. This application framework is called Grid for Apps.
- When you save the image, an event is created and passed to Grid for Apps.
- Grid for Apps, thanks to Conscience technology (a set of algorithms which continuously monitor the cluster and oversee all balancing/orchestration operations), picks an available nano-node in the cluster and runs the application on top of it.
- At this point the nano-node retrieves the object and does what the app requests.
- It can (now) analyse the object and its metadata,
- discover the cup,
- add additional metadata to the object (with cup description for example),
- create other objects from the first one (for example different file sizes)... and so on.
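The event-driven steps above can be sketched as follows. None of this is the Grid for Apps or Conscience API: pick_node is a stand-in that simply chooses the least busy node, looks_blue is a toy check on RGB tuples rather than real image search, and StoredObject, NanoNode, on_object_written and put are hypothetical names for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    pixels: list  # toy image: a list of (r, g, b) tuples
    metadata: dict = field(default_factory=dict)


@dataclass
class NanoNode:
    name: str
    busy_tasks: int = 0


def pick_node(nodes):
    """Stand-in for the scheduling step: choose the least busy node."""
    return min(nodes, key=lambda node: node.busy_tasks)


def looks_blue(pixels, threshold=0.5):
    """Toy detector: true when at least `threshold` of pixels are blue-dominant."""
    blue = sum(1 for r, g, b in pixels if b > r and b > g)
    return blue / len(pixels) >= threshold


def on_object_written(name, store, nodes):
    """Event handler: one node fetches the full object and runs the app on it."""
    node = pick_node(nodes)
    node.busy_tasks += 1
    try:
        obj = store[name]  # the node works on the whole object, not a chunk
        if looks_blue(obj.pixels):
            obj.metadata["contains"] = "blue coffee cup"  # enrich metadata
        # Create a derived object: every second pixel as a crude smaller copy.
        store[name + ".small"] = StoredObject(obj.pixels[::2], {"derived_from": name})
    finally:
        node.busy_tasks -= 1
    return node.name


store = {}
nodes = [NanoNode("nano-0", busy_tasks=2), NanoNode("nano-1"), NanoNode("nano-2", busy_tasks=1)]


def put(name, pixels):
    """Saving an object raises the event that triggers the app."""
    store[name] = StoredObject(pixels, {"content_type": "image/tiff"})
    return on_object_written(name, store, nodes)


ran_on = put("cup.tiff", [(10, 20, 200), (15, 30, 220), (240, 240, 240), (5, 5, 180)])
```

Here the write itself is the trigger, the least loaded node (nano-1) runs the task, and the results come back as new metadata on the original object plus a new derived object.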
Today, as I already told you, the nano-nodes we use in SLS4U-96 are not powerful enough (only two ARM cores) to run applications… we decided to keep costs down for the best $/GB, but next models will have much more CPU power (we are also investigating other possibilities, like adding a GPU…).
At the end of the day we are embracing the server-less concept and building the ultimate hyper-converged infrastructure where applications can work on data without needing hypervisors, VMs, containers or operating systems to manage.
The reconstruction of a chunked image is the key for me; that enables a nano-node to carry out an image contents search, or to carry out any other stored file activity where content-level knowledge and access is key. Clever stuff. ®