When Intel announced earlier this year that its 7nm process technology would be delayed, the delay also affected Aurora, the first Intel-based exascale supercomputer. It was unclear back in July what this meant for the system, but an official from the US Department of Energy (DoE) confirmed this week that it will be delayed.
As reported by HPCwire, the DoE does not see this as a major problem as the Argonne National Laboratory, Aurora’s operator, has a contingency plan in place.
Aurora supercomputer delayed
“Yes, we have indications that the Aurora system will be delayed,” said Barb Helland, associate director for Advanced Scientific Computing Research (ASCR) at the DoE Office of Science. Helland added that Argonne is working with Intel to “mitigate the consequences not only for Argonne, but also for the Exascale Computing Project and the country’s high-performance computing users”.
“It is not unexpected that there will be delays when contracting for the world’s most advanced supercomputers 4 to 5 years before they go live,” said Helland. “For this reason, we incorporate both cost and time schedules into our project budgets.”
The Aurora supercomputer is based on Intel’s next-generation Xeon processor, codenamed Sapphire Rapids, which uses the Golden Cove microarchitecture, and the company’s first GPU based on the Xe-HPC (Xe High Performance Computing) architecture, codenamed Ponte Vecchio.
Sapphire Rapids is manufactured using Intel’s 10nm Enhanced SuperFin process technology, which is expected to be on track for mass production in 2021. The Intel Xe-HPC Ponte Vecchio GPU is a multi-tile chiplet design that combines a base tile made with Intel’s 10nm SuperFin manufacturing technology, an Xe-Link I/O tile made by a foundry, a Rambo cache tile made using the 10nm Enhanced SuperFin process, and a compute tile that was supposed to use Intel’s 7nm node, which was delayed by about six months. Last month, Intel announced that the compute tile can be manufactured both in-house and at an external foundry.
Intel has always envisioned Ponte Vecchio as a multi-chiplet product with tiles from different sources. Making a key tile in an off-site foundry is not a problem in and of itself, but adjusting the design’s thermals, voltages, and packaging to work with the other parts will take some time. Ponte Vecchio will also be used outside of Aurora, so it makes sense for Intel to manufacture the main compute tile in its own factories at some point. However, this means that there will be two versions of the Xe-HPC Ponte Vecchio GPU.
Each Aurora blade has two Intel Xeon Scalable “Sapphire Rapids” processors and six Intel Xe-HPC “Ponte Vecchio” GPUs. That means mass production of Intel’s data center graphics chips is critical to Aurora.
The first exascale supercomputers
So far, the US DoE has announced three exascale-class supercomputers. Argonne National Laboratory’s Aurora was the first, announced in March 2019, and is expected to deliver more than 1 ExaFLOPS of performance.
Oak Ridge National Laboratory’s Frontier supercomputer, powered by AMD’s Epyc “Milan” processors and Radeon Instinct MI200 GPUs, was unveiled in May 2019 and is expected to deliver 1.5 ExaFLOPS of performance in 2021.
In March of this year, the DoE announced the El Capitan system for Lawrence Livermore National Laboratory, which is expected to reach 2 ExaFLOPS in 2023 using AMD’s Epyc “Genoa” CPUs and AMD CDNA GPUs.
All three systems use Hewlett Packard Enterprise’s Cray EX architecture, so they have a lot in common. Aurora is the only Intel-based supercomputer among the three.
We don’t know when Aurora will arrive, however, and this is not the supercomputer’s first major setback. The system was first announced in 2015 and described as a 180 PetaFLOPS supercomputer powered by Intel’s Xeon Phi “Knights Hill” processors and due in 2018. When Intel canceled Knights Hill in 2017, the original Aurora project was pushed back as well.