
Big Data, OpenStack and object storage: Size matters, people

Consider your needs before rushing out and investing in new storage tech

Data lakes or data ponds?

In some cases the quantity of data is not huge. In others, organisational issues make a data lake very hard to build, while collecting many smaller sets of data is quite easy. This means you probably have different projects within the company, with different types of data under management, stored on different platforms. The result is smaller clusters (or cloud services) and lower overall efficiency, and it becomes impossible to build a single, huge infrastructure without consolidating first.

Lack of ease of use and appliances

Until now, another big obstacle to the adoption of these technologies, especially in medium-sized enterprises, has been the lack of pre-packaged appliances and general ease of use. Fortunately, this is now changing quickly: vendors are finally presenting pre-packaged – and in some cases hyper-converged – solutions, putting the different components together in a pre-assembled fashion.

Part of the benefit comes from simplicity, but fast provisioning and automation also play an important role. With this approach, the IT department can now give freedom and flexibility to all business/organisational units, and can allow them to choose the right product for each one of their projects.

For example, if you look at solutions such as the recently launched HDS HSP – a hyper-converged appliance based on KVM, OpenStack and a proprietary distributed file system highly optimised for Big Data workloads – you’ll find that you can easily build a data lake and leverage it through different data analytics tools in a cloud-ish way.

It’s like having a specialised Big Data-as-a-Service-in-a-box! It doesn’t come cheap (you pay for the integration, support and industrialisation of the product), and the minimum configuration is five nodes – not much more than the three- or four-node configuration of most Hadoop clusters out there. In return you get a lot of flexibility: support for many different Hadoop distributions, NoSQL databases and whatever else you need, while enabling the creation of a data lake.

Closing the circle

On-prem Big Data, OpenStack (private clouds) and object storage are not for everyone. If you don’t have the problem, you don’t need them. It’s just common sense, isn’t it?

In fact, only surveyors and some analysts seem unaware of it. If that is your situation, leveraging external services is the best choice.

If you are experiencing exponential growth of data and infrastructure then, sooner or later, you are going to need them. In that case, it’s time to start building the two-tier strategy I have mentioned many times in my blog: a strategy where the secondary tier is the data lake, perhaps based on object storage.

Consolidation of different storage/data islands will become an important part of this process, but at the same time flexibility remains the pillar of keeping data and resources simple and usable.

This is why I’m sure we will soon see more data-centric, hyper-converged systems in the market, especially for mid-sized organisations that don’t have the resources to build this kind of infrastructure from scratch. ®
