Intel and Argonne National Lab on “Exascale” and their new Aurora supercomputer – TechCrunch
The scale of supercomputing has grown almost too large to comprehend, with millions of compute units performing calculations at rates that for the first time require the exa prefix – meaning quintillions per second. How was this accomplished? With careful planning … and lots of wires, say two people close to the project.
After hearing the news earlier this year that Intel and Argonne National Lab were planning to debut a new exascale computer called Aurora (one of several being built in the US), I recently had the opportunity to talk with Trish Damkroger, director of Intel’s Extreme Computing Organization, and Rick Stevens, Argonne’s associate lab director for computing, environment and life sciences.
The two discussed the technical details of the system at the Supercomputing conference in Denver, which is probably where most of the people who can truly say they understand this type of work already were. While you can read about the nuts and bolts of the system in trade publications and the press release, including Intel’s new Xe architecture and the Ponte Vecchio general-purpose compute chip, I tried to get more of the big picture from the two.
It should come as no surprise to anyone that this project has been in the works for a long time – but you might not guess exactly how long: more than a decade. Part of the challenge, then, was to plan for computing hardware that was far beyond what was possible at the time.
“Exascale was first launched in 2007. At that point, we hadn’t even hit the petascale target, so we were planning three to four orders of magnitude out,” said Stevens. “If we had built exascale at that time, it would have required a gigawatt of power, which of course is not realistic. So a big part of achieving exascale was reducing power consumption.”
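Stevens’s gigawatt figure can be turned into a rough efficiency number. The sketch below is back-of-the-envelope arithmetic, not anything Intel or Argonne published: it assumes “exascale” means 10^18 operations per second and “petascale” 10^15, and the 40-megawatt budget is a purely hypothetical value chosen for illustration.

```python
import math

EXA_OPS = 1e18   # operations per second at exascale
PETA_OPS = 1e15  # operations per second at petascale


def orders_of_magnitude(target: float, baseline: float) -> float:
    """Return how many powers of ten separate target from baseline."""
    return math.log10(target / baseline)


def ops_per_watt(ops_per_second: float, watts: float) -> float:
    """Return the efficiency needed to reach a rate within a power budget."""
    return ops_per_second / watts


# Efficiency implied by Stevens's figure: exascale at one gigawatt.
efficiency_at_gigawatt = ops_per_watt(EXA_OPS, 1e9)

# HYPOTHETICAL budget for illustration only; the article gives no power figure.
hypothetical_budget_watts = 40e6
efficiency_needed = ops_per_watt(EXA_OPS, hypothetical_budget_watts)

gap = orders_of_magnitude(EXA_OPS, PETA_OPS)
print(f"Petascale to exascale: {gap:.0f} orders of magnitude")
print(f"At 1 GW: {efficiency_at_gigawatt:.1e} ops per watt")
print(f"At 40 MW (assumed): {efficiency_needed:.1e} ops per watt")
print(f"Required improvement: {efficiency_needed / efficiency_at_gigawatt:.0f}x")
```

Whatever the real budget is, the point stands: every factor cut from the power draw has to be made up in operations per watt.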
Intel’s Xe architecture, which is focused on supercomputing, is built on a 7-nanometer process and thus pushes against the limits of Newtonian physics – go much smaller and quantum effects come into play. But the smaller the gates, the less power they use, and microscopic savings add up quickly when you are talking about billions and trillions of them.
But that only exposes another problem: If you increase the performance of a processor by 1,000 times, you run into a memory bottleneck. The system may be able to think quickly, but if it cannot access and store data just as quickly, the extra speed goes to waste.
“With exascale-level computing but not exabyte-level bandwidth, you have a very one-sided system,” said Stevens.
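One common way to picture this imbalance is the roofline model, a standard textbook approximation and not something Stevens or Intel described. Under that model, attainable throughput is the smaller of the processor’s peak rate and the memory bandwidth multiplied by the number of operations the code performs per byte moved. All figures below are made up for illustration.

```python
def attainable_ops(peak_ops: float, bandwidth_bytes: float, ops_per_byte: float) -> float:
    """Roofline estimate: throughput is capped by compute or by memory traffic."""
    return min(peak_ops, bandwidth_bytes * ops_per_byte)


# HYPOTHETICAL machine and workload figures, chosen only to show the effect.
bandwidth = 1e12            # bytes per second the memory system can deliver
intensity = 10.0            # operations performed per byte fetched
old_peak = 1e13             # operations per second before the upgrade
new_peak = old_peak * 1000  # processor made 1,000 times faster, memory unchanged

before = attainable_ops(old_peak, bandwidth, intensity)
after = attainable_ops(new_peak, bandwidth, intensity)

print(f"Before: {before:.1e} ops/s")
print(f"After:  {after:.1e} ops/s")
print(f"Speedup actually seen: {after / before:.1f}x")
```

With these assumed numbers the workload was already sitting at the memory limit, so the thousandfold faster processor delivers no visible gain until bandwidth grows with it.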
And once you’ve cleared those two obstacles, you run into a third: what is known as concurrency. High-performance computing is as much about synchronizing a task across a large number of processing units as it is about making those units as powerful as possible. The machine works as a whole, so every part has to communicate with every other part – which becomes a problem as the system scales up.
“These systems have many thousands of nodes and the nodes have hundreds of cores and the cores have thousands of computational units, so there is billion-way parallelism,” said Stevens. “Dealing with that is the core of the architecture.”
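Stevens’s count is a straightforward product. The figures below are hypothetical round numbers standing in for “many thousands,” “hundreds” and “thousands”; they are not Aurora’s actual specifications. The pair count assumes, naively, that every node might need a direct conversation with every other node.

```python
# HYPOTHETICAL round numbers; the article gives only rough magnitudes.
nodes = 10_000          # "many thousands of nodes"
cores_per_node = 100    # "hundreds of cores"
units_per_core = 1_000  # "thousands of computational units"

# Total units that could be working at the same moment.
total_units = nodes * cores_per_node * units_per_core

# Naive all-to-all assumption: distinct pairs of nodes that may need to talk.
node_pairs = nodes * (nodes - 1) // 2

print(f"Units working in parallel: {total_units:,}")
print(f"Distinct node pairs: {node_pairs:,}")
```

The number of units grows with the product of the three figures, while the number of potential conversations between nodes grows roughly with the square of the node count, which is why the interconnect matters so much.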
How they did it, I wouldn’t even try to explain, since I am completely unfamiliar with the vagaries of designing high-performance computer architecture. But they seem to have managed it, as these exascale systems are coming online. The solution, I dare say, is essentially a major advance on the networking side. The level of bandwidth sustained between all of these nodes and units is staggering.
Making exascale accessible
While it was possible to predict even in 2007 that we would one day achieve such power-saving processes and improved memory bandwidth, other trends would hardly have been foreseeable – for example the exploding demand for AI and machine learning. It wasn’t even a consideration then, and now it would be foolish to develop a high-performance computing system that was not at least partially optimized for machine learning problems.
“By 2023, we expect AI workloads to make up a third of the total HPC server market,” said Damkroger. “This AI-HPC convergence brings these two workloads together to solve problems faster and provide better insights.”
To that end, the Aurora system’s architecture is designed to be flexible while retaining the ability to speed up certain common operations, such as the type of matrix computations that make up a large part of certain machine learning tasks.
“But it’s not just about performance, it’s also about programmability,” she continued. “One of the big challenges with an exascale machine is being able to write software for that machine. oneAPI will be a unified programming model – it is based on an open standard of Open Parallel C++, and that is key to promoting its use in the community.”
Summit, currently the most powerful single computing system in the world, is very different from many of the systems developers are used to working on. If the creators of a new supercomputer want it to have broad appeal, they have to make it as close as possible to a “normal” computer.
“Getting x86-based packages onto Summit is a challenge,” noted Stevens. “The big advantage for us is that this thing will basically run any software in existence since we have x86 nodes and Intel GPUs. It will run standard software, Linux software, and literally millions of apps.”
I asked about the costs involved, since with a system like this it is something of a mystery how a half-billion-dollar budget gets divided up. Really, I just thought it would be interesting to know how much of it went to, say, RAM versus processing cores, or how many kilometers of cable they had to run. Although both Stevens and Damkroger declined to comment, the former noted that “the backlink bandwidth on this machine is many times the total of the entire Internet, and it costs something.” Make of that what you will.
Unlike its cousin El Capitan at Lawrence Livermore National Lab, Aurora will not be used for weapons development.
“Argonne is a science lab, and it’s open, not classified, science,” said Stevens. “Our machine is a national user resource; we have people from all over the country using it. A large amount of time is allocated through a process that is peer-reviewed and prioritized to accommodate the most interesting projects. About two-thirds is that, and the other third is Department of Energy work, but still on unclassified problems.”
Initial work will be in the areas of climate science, chemistry, and data science, with 15 teams signed up for major projects on Aurora – details will be announced shortly.