Intel stretches HPC dev tools across chubby clusters

Cluster Studio XE ready for MICs, not for GPUs

SC11 Supercomputing hardware and software vendors are getting impatient for the SC11 supercomputing conference in Seattle, which kicks off next week. More than a few have jumped the gun with product announcements this week, including chipmaker Intel.

No, Intel is not going to launch its "Sandy Bridge-EP" Xeon E5 processors, which are expected early next year. But the new Cluster Studio XE toolset for HPC customers will help those lucky few HPC and cloud shops that have been able to get systems this year to squeeze more performance out of their Xeon E5 clusters.

The Cluster Studio XE stack includes a slew of Intel tools for creating, tuning, and monitoring parallel applications running on x86-based parallel clusters. Intel had already been selling a set of application tools called Cluster Studio, which bundled up the chip giant's C, C++, and Fortran compilers, its rendition of the message passing interface (MPI) messaging protocol that allows server nodes to share work, and various math and multithreading libraries to goose the performance of applications.

With the XE (Extended Edition) of the HPC cluster tools, Intel is goosing the performance of the MPI library, and claims its MPI 4.0.3 stack is anywhere from 3.3 to 6.5 times as fast as the OpenMPI 1.5.4 and MVAPICH2 1.6 MPI stacks from the open source community. Benchmark tests were done on a 64-node system running 768 processes and linked by InfiniBand switches.

Intel also tested the Platform Computing MPI 8.1.1 stack against the three MPI stacks listed above, only this time on an eight-node system; in this case the performance differences between Intel and Platform (which is now owned by IBM) were not huge. With the Microsoft MPI 3.2 stack on the same iron, the Intel MPI stack running on Windows servers was anywhere from 2.17 to 2.74 times faster than the Microsoft MPI.

The updated Intel MPI stack can scale to over 90,000 MPI cores, and also has hooks into the open source SLURM job scheduler that was created by Lawrence Livermore National Laboratory because of its frustration with closed-source job schedulers and the state of the open source ones.

With the Cluster Studio XE roll-up, the Inspector and Debugger modules now have cluster-level data gathering and reporting, instead of just seeing things at a node level. What this means, in plain American, is that these add-ons to the compilers can look for memory leaks and threading errors across a cluster of machines without sending the HPC application programmer on a wild goose chase to locate performance issues or crashes on an individual node. (With 90,000 cores, which is 5,625 two-socket nodes using the future eight-core Xeon E5 processors, you can't look for these issues manually.)
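The node count in that parenthetical can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes two-socket nodes, the configuration that makes the 5,625 figure work out, and the function name is purely illustrative.

```python
def nodes_needed(total_cores: int, cores_per_socket: int, sockets_per_node: int) -> int:
    """Return how many nodes are needed to supply total_cores, rounding up."""
    cores_per_node = cores_per_socket * sockets_per_node
    return -(-total_cores // cores_per_node)  # ceiling division without floats


# 90,000 cores on eight-core chips, assuming two sockets per node.
print(nodes_needed(90_000, cores_per_socket=8, sockets_per_node=2))  # 5625
```

With single-socket nodes the same call would return 11,250, which is why the socket count matters to the figure quoted above.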

The Trace Analyzer and Collector module can now look at MPI performance across the nodes in a cluster and evaluate how well MPI is load balancing across the nodes. The VTune Amplifier, which is a tool that Intel uses to visualize the threading behavior in a single node, can now show threading issues across the cluster.
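To make "how well MPI is load balancing" a little more concrete, here is a minimal sketch of one common imbalance measure: the busiest rank's time divided by the mean across ranks. It is not a description of how Trace Analyzer and Collector works internally, and the per-rank timings are invented for illustration.

```python
from statistics import mean


def imbalance_factor(rank_seconds):
    """Max-over-mean ratio: 1.0 is perfectly balanced, larger is worse."""
    if not rank_seconds:
        raise ValueError("need at least one rank")
    return max(rank_seconds) / mean(rank_seconds)


# Hypothetical compute time per MPI rank, in seconds.
balanced = [10.0, 10.0, 10.0, 10.0]
skewed = [10.0, 10.0, 10.0, 30.0]

print(imbalance_factor(balanced))  # 1.0
print(imbalance_factor(skewed))    # 2.0
```

A factor of 2.0 means the slowest rank takes twice the average time, so the other ranks spend much of the run waiting on it.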

The Cluster Studio XE bundle includes the Intel v12.1 compilers that were launched in September, which offered between 22 and 27 per cent better performance on Fortran benchmarks and from 6 to 11 per cent on C/C++ integer performance compared to the v12.0 releases running on Linux and Windows machines. C/C++ floating point performance improvements were a few points. Intel claims it has a considerable performance advantage over other compilers – anywhere from 21 to 47 per cent faster code execution on C, C++, and Fortran tests. And that performance is not just tied to Intel's own Xeon processors.

Perhaps more significantly, on Fortran, Intel now believes it has the performance edge over Portland Group 11.4 and Absoft 11.1 on either Windows or Linux machines. The performance jump is particularly acute on Windows machines running C++.

"We believe that we have the best performance, regardless of the type of x86 chip," James Reinders, evangelist for Intel's software division, tells El Reg.

The v12.1 compilers are tuned up for the forthcoming Xeon E5 processors, and even though Intel has not been able to get its hands on machines using AMD's impending "Interlagos" Opteron 6200 processors to tune and test them, Reinders says that he is confident that the compilers and the Cluster Studio XE tools will wring more flops out of these AMD chips than the alternatives.

The interesting twist in all this is that the Cluster Studio compilers and tuning and visualization tools cannot peer into GPU coprocessors, and Reinders says he is not even sure how Intel would go about doing that. But because the future "Knights" x86-based coprocessors are based on the same architecture as Intel and AMD chips, Cluster Studio XE tools will be able to see into these MIC coprocessors and help coders tweak and tune their apps for them.

The normal Cluster Studio stack, which includes the Intel compilers as well as the math and clustering libraries, costs $1,849 per developer on a Linux workstation and $1,499 per developer on a Windows workstation. There is no runtime or royalty charge for having the tools run on a parallel x86 cluster. If you want to go all the way to the Cluster Studio XE stack, then you pay $2,849 per developer on Linux and $2,499 on Windows. Yes, the Windows versions are cheaper. ®

