Some HPC kids just can't help piling on the weight
Does huge cluster equal huge reward?
HPC Blog We've saved the biggest for last in our round-up of the ASC16 Student Cluster Competition contenders. All of the teams below are driving clusters of eight nodes or more, most with several GPU accelerators. They're definitely power-hungry beasts, which will require the teams to apply heavy throttling in order to stay under the 3,000 watt power cap. Let's take a look at the teams....
EAFIT University (Colombia): Another multiple-time cluster competition entrant is EAFIT from Colombia. Our interviewee is having his 20th birthday on the first day of the cluster competition, so he gets a fair amount of abuse from me and his team over this.
The team has decided to run a traditional cluster, composed of eight two-processor nodes. They were mulling over using Intel Xeon Phi co-processors, but felt that the power draw of the accelerators was too high compared to the extra oomph they’d deliver on this particular set of applications. Will their old-school cluster have enough power to push them into the victory circle? We’ll soon find out.
National Tsing Hua University (Taiwan): At the time of this interview, things are going well for Team Taiwan. They’re running a cluster of nine nodes with eight NVIDIA K40 GPU accelerators. Their hardware is working fine and they're pushing to optimize the benchmarks and apps they'll be running over the next two days.
The language barrier and the high noise level in the room play a role in this interview, even with the help of our translator Wallis. So it’s a quick check-in, a few jokes on my part that fall completely flat, and we’re off to the next team.
Sun Yat-Sen University: This is the fourth time Sun Yat-Sen has competed in the ASC competition. They snagged the LINPACK award in 2014, but haven’t made the trip to the big dance at ISC yet.
Right now, they’re planning to run eight or ten nodes with GPUs, but it’s going to be a tall order to run that much gear and keep it under the 3,000 watt power cap.
The team has managed to make MASNUM_WAM scale better than the original code, which is certainly a feather in their collective caps and will help them when the competition begins tomorrow.
In the video, I managed to throw off my #1 interpreter Wallis with the phrase “…a brutal initiation into the world of student clustering…” This was about the only time I managed to catch her on camera; she was pretty shy.
She got her revenge on me later in the day by serving me an extra-hot glass of Tang (those of you who have been to China will probably know what I’m talking about here).
Tsinghua University: The team from Tsinghua University certainly needs no introduction to cluster competition aficionados. This is the team that has won ASC and ISC in the past, and was also the first team to win all three major international competitions (ASC, ISC, SC) in the same calendar year (2015).
But the team captain for Tsinghua says it’s becoming more and more difficult to stay champion, as the other teams grow much more skilled over time. Plus these teams are snagging better equipment as well.
But the team is doing well at this point in the competition, aside from a few problems with Fortran. They feel confident that their ten node, five GPU cluster and their tuning skills will give them the edge over the other competitors. If I were a betting man, I’d put my money on Tsinghua to either win or place when the scores are totaled up.
Ural Federal University: When we catch up with Team Russia, they’re working on the DNN application. This is where they train a deep neural network on Tianhe-2, which is quite the complicated task.
Team Russia has moved up from a tiny cluster system in 2014 (only three nodes) to one of the largest systems in the competition with a ten node, four Phi monster. From a video standpoint, the team was the best arrayed of the bunch, all sitting or standing behind their table – it made for a great video shoot.
Beihang University of Aeronautics and Astronautics: We talked to the advisor of the Beihang University of Aeronautics and Astronautics (Team Beihang) about how the team is shaping up so far in the competition.
According to him, this competition is a completely new experience for his team – learning cluster basics from the ground up. Then, given the demands of the competition, they have to learn about the intricacies of the various benchmarks and applications – which is a LOT to learn in the three months they’ve been working together. With the team’s inexperience in mind, they’re going with a traditional CPU-only cluster, but a big one, with 11 nodes.
National University of Defense Technology: NUDT is a name that will resonate with student cluster fans around the world. They’ve participated in each ASC competition, plus the SC and ISC international competitions.
They first made their bones as LINPACK and GPU jockeys, owning a couple of LINPACK records in their early days. This year, they’re taking a different tack with a 12 node system (the largest in the competition) that’s sporting dual NVIDIA K80 accelerators.
One note of concern for Team NUDT is an admitted lack of preparation due to the rigorous physical training they have to undergo every day. So while their cardio is definitely up to spec, their clustering is suffering a bit. A case in point is their inexperience with InfiniBand, which is keeping them from getting RDMA working properly.
With this blog, we've met all 16 teams in the ASC16 cluster competition. Now it's time to get to the meat of the competition - the results. First up, LINPACK. Stay tuned....