Aurora’s troubles put Frontier in exascale pole position

Intel’s 7nm node delay has raised questions about the status of the Aurora supercomputer, which is slated for deployment at Argonne National Laboratory next year. Aurora had been in line to be the U.S.’s first exascale supercomputer, even though its schedule coincided with that of Oak Ridge National Lab’s Frontier supercomputer (both systems were scheduled to ship in 2021).

With a one-year delay to Intel’s 7nm node, the process planned for Aurora’s GPU engine (the Intel Xe-based Ponte Vecchio), would Intel use an outside foundry to make the GPU chip? And how would that affect Aurora’s performance and delivery schedule?

We don’t have all the answers yet, but we have received general confirmation of the disruption from the DOE Office of Science.

There are indications that Aurora will indeed be delayed, but Frontier at Oak Ridge National Laboratory is on track, as is the Exascale Computing Project, reported Barb Helland, associate director of the Office of Science for Advanced Scientific Computing Research (ASCR), during a meeting of the Advanced Scientific Computing Advisory Committee (ASCAC) held last week (September 24-25).

Aurora node design as presented by Intel’s Raja Koduri at SC19

“It is not unexpected that there will be delays in the timing of contracts for the world’s most advanced supercomputers four to five years before they go live,” said Helland. “For this reason, we are building both cost and schedule contingency into our project budgets.”

The DOE Office of Science was not ready to provide further details at this point, but stated that it was working closely with Intel.

“Yes, we have indications that the Aurora system is delayed. Currently, however, Argonne is working with Intel to mitigate the consequences not only for Argonne, but also for the Exascale Computing Project and the country’s high-performance computing users.”

Helland seemed to downplay the setback, while reiterating that Oak Ridge’s Frontier machine is on track to ship in calendar year 2021 and that the ECP is also on track to be completed on time (by the fourth quarter of fiscal year 2024 at the latest).

“I am confident that we can do this in a way that solves this problem for the benefit of the country and the program,” said Chris Fall, director of the Office of Science. “We’re still having talks and we’re figuring out the details, but I’m very comfortable. I think we will get where we need to be.”

It is understandable that systems pushing the boundaries of scope and scale run into unforeseen circumstances that move the goalposts. However, Aurora has already been significantly redefined after previous delays and cancellations in Intel’s roadmaps. Originally conceived as a pre-exascale supercomputer to be deployed at Argonne in 2018, Aurora was recast in 2017 as the country’s first exascale machine, with a 2021 target.

It appears that Oak Ridge National Lab’s Frontier supercomputer is now set to be the country’s first exascale system. The DOE is working with Oak Ridge, HPE and AMD to hit the 1.5 exaflops (minimum peak) mark by the end of 2021. Lawrence Livermore Lab’s El Capitan system (slated to deliver 2 exaflops peak using HPE and AMD technology) is due about a year later, shipping in early 2023. The question is: where will Aurora fit in the timeline?

HPE’s Cray EX supercomputing platform is the basis of all three planned exascale systems. HPE is the prime contractor for Frontier and El Capitan, while Intel is the prime contractor for Aurora.

In a statement to HPCwire, Intel said it “remains committed to shipping the Aurora supercomputer to Argonne National Laboratory and enabling exascale leadership in the US Department of Energy.”

