Intel is eyeing off the world of Big Data with the latest round of updates to its Parallel Studio Suite.
In the latest update, Chipzilla has added a Data Analytics Acceleration Library (DAAL) to its venerable Math Kernel Library (MKL).
As Intel explains here, DAAL's aim is to speed up data analysis platforms like Hadoop, Spark, Matlab, and R.
Among other things, DAAL is designed to overcome a limitation in the MKL. As James Reinders writes in the blog post, “most of Intel MKL was designed for when all the data to operate upon fits in memory at once. Intel DAAL can handle situations when data is too big to fit in memory all at once, which can be referred to as having an ‘out of core’ algorithm”.
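For readers who haven't met the "out of core" idea, here is a minimal sketch in plain Python of how a statistic can be computed without ever holding the whole dataset in memory. It is not DAAL's API: the `RunningMoments` class and the `chunks` helper are hypothetical names invented for illustration, and the update rule is Welford's well-known streaming algorithm rather than anything Intel has documented.

```python
import math
from typing import Iterable, Iterator, List


class RunningMoments:
    """Accumulates min, max, mean and variance one chunk at a time."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, chunk: Iterable[float]) -> None:
        # Welford's update: each value is seen once, then can be discarded.
        for x in chunk:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)
            self.minimum = min(self.minimum, x)
            self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        # Sample variance (divides by n - 1).
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0

    @property
    def std_dev(self) -> float:
        return math.sqrt(self.variance)


def chunks(values: List[float], size: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for reading blocks from disk or a stream."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
moments = RunningMoments()
for block in chunks(data, 3):
    moments.update(block)  # only one block is "in memory" at a time

print(moments.count, moments.minimum, moments.maximum, moments.mean)
```

In a real out-of-core job the blocks would come from files or a network stream rather than a list, but the point stands: the accumulator's state is a handful of numbers regardless of how large the input grows.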
Over at The Register's HPC sister publication The Platform, Nicole Hemsoth quotes Reinders as saying “no matter what the function or algorithm is, DAAL combines the data handling intelligence with the computational algorithms to manage both the data handling and the number crunching.”
Another challenge in using big data in HPC environments is that “the data is in many formats and comes from many different streams and oftentimes, these were not collected and prepared with complex algorithms or computations in mind”.
DAAL is designed to help here as well: finding the data, preparing it for computation, and maximising computational efficiency.
Algorithms in DAAL include:
- Low-order moments – statistical calculations on a dataset like min, max, mean, standard deviation and variance;
- Quantiles, correlation matrices, cosine distance matrices;
- Matrix decomposition using Cholesky, QR and SVD algorithms;
- Outlier detection, association rules mining, linear regression, classification and clustering.
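To give a flavour of what one of the listed decompositions does, the following is a textbook Cholesky factorisation written in plain Python. It is a sketch for illustration only, assuming a small, dense, symmetric positive-definite matrix held as nested lists. The `cholesky` function is a hypothetical helper, not a DAAL or MKL call, and a tuned library would use blocked, vectorised code instead.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L such that L times its transpose equals a.

    Assumes a is symmetric and positive definite.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


matrix = [
    [4.0, 12.0, -16.0],
    [12.0, 37.0, -43.0],
    [-16.0, -43.0, 98.0],
]
factor = cholesky(matrix)
for row in factor:
    print(row)
```

For this input the factor works out to rows (2, 0, 0), (6, 1, 0) and (-8, 5, 3), which can be checked by multiplying it by its own transpose.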
The Platform notes that alongside DAAL, the MKL 11.3 release will get new tooling for the visual effects industry, batch GEMM function capabilities, and more MPI wrappers.
More at The Platform, here. ®