AMD-Powered Frontier Supercomputer Cracks the Exascale Barrier, Now Fastest in the World

The AMD-powered Frontier supercomputer is the first officially recognized exascale supercomputer globally, topping 1.102 ExaFlop/s during a sustained Linpack run.

Frontier ranks first on the newly released Top500 list of the world’s fastest supercomputers, and the number of AMD-powered systems on the list has grown significantly this year.

Frontier not only surpasses the previous leader, Japan’s Fugaku, but blows it out of the water: Frontier is faster than the next seven supercomputers on the list combined. Notably, while Frontier hit 1.1 ExaFlops during a sustained Linpack FP64 benchmark, the system provides up to 1.69 ExaFlops of peak performance and has headroom to hit 2 ExaFlops after more tuning. For perspective, one ExaFlop equals one quintillion floating point operations per second.

Frontier also now ranks as the fastest AI system on the planet, delivering 6.88 ExaFlops of mixed-precision performance in the HPL-AI benchmark. That works out to roughly 80 million operations per second for each of the 86 billion neurons in the human brain, highlighting the sheer computational horsepower. Frontier will compete for the AI leadership role with newly revealed AI-focused supercomputers powered by Nvidia’s Arm-based Grace CPU Superchips.

Further, the Frontier Test and Development (Crusher) system also placed first on the Green500, meaning that Frontier’s architecture is now the most power-efficient supercomputing architecture globally (the primary Frontier system ranks second on the Green500). The full Frontier system delivered 52.23 GFlops per watt during the qualifying benchmark run while consuming 21.1 MW (megawatts) of power. At peak utilization, Frontier consumes 29 MW.
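As a quick sanity check, the efficiency figure can be reproduced from the two numbers quoted above. This is a minimal sketch that assumes the GFlops-per-watt rating is simply the sustained Linpack result divided by the power drawn during the run; the variable names are illustrative.

```python
# Figures quoted in the article for the full Frontier system.
sustained_flops = 1.102e18   # 1.102 ExaFlop/s sustained in Linpack
power_watts = 21.1e6         # 21.1 MW drawn during the benchmark run

# Assumption: efficiency is a plain ratio of performance to power.
gflops_per_watt = sustained_flops / power_watts / 1e9

print(f"{gflops_per_watt:.2f} GFlops per watt")
```

The result rounds to the 52.23 GFlops per watt reported for the qualifying run.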

The Frontier supercomputer’s sheer scale is breathtaking. Still, it is just one of many accomplishments for AMD in this year’s Top500 list: AMD EPYC-powered systems now comprise five of the top ten supercomputers in the world and ten of the top twenty. AMD’s EPYC is currently in 94 of the Top500 supercomputers globally, marking a steady increase over the 73 systems in November 2021 and the 49 in June 2021. AMD also appeared in more than half of the new systems this year. Intel CPUs still populate most systems on the Top500, while Nvidia GPUs remain the dominant accelerator.

However, in terms of power efficiency, AMD reigns supreme in the latest Green500 list: the company powers the four most efficient systems globally and holds eight of the top ten and 17 of the top 20 spots.

The Frontier supercomputer was built by HPE and is installed at the Department of Energy’s (DOE) Oak Ridge National Laboratory (ORNL) in Tennessee. The system features 9,408 compute nodes, each with a 64-core AMD “Trento” CPU paired with 512 GB of DDR4 memory and four AMD Radeon Instinct MI250X GPUs. Those nodes are distributed among 74 HPE Cray EX cabinets, each weighing 8,000 pounds. In total, the system contains 602,112 CPU cores tied to 4.6 petabytes of DDR4 memory.
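The system-wide CPU totals follow directly from the per-node figures. The sketch below reproduces them; it assumes the 4.6-petabyte figure is expressed in binary units (1 PB taken as 1024 × 1024 GB), which is an inference from the numbers rather than something the article states.

```python
# Per-node figures quoted in the article.
nodes = 9_408
cores_per_cpu = 64        # one 64-core "Trento" CPU per node
ddr4_gb_per_node = 512

total_cores = nodes * cores_per_cpu
total_ddr4_gb = nodes * ddr4_gb_per_node

# Assumption: the article's petabyte total uses binary units.
total_ddr4_pb_binary = total_ddr4_gb / 1024**2
# For comparison, the same total in decimal units (1 PB = 1,000,000 GB).
total_ddr4_pb_decimal = total_ddr4_gb / 1000**2

print(f"CPU cores: {total_cores:,}")
print(f"DDR4: {total_ddr4_pb_binary:.2f} PB (binary), "
      f"{total_ddr4_pb_decimal:.2f} PB (decimal)")
```

The core count matches the quoted 602,112 exactly, and the binary-unit memory total rounds to the quoted 4.6 petabytes.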

Additionally, the 37,888 AMD MI250X GPUs feature 8,138,240 cores and contain 4.6 petabytes of HBM memory (128GB per GPU). The CPUs and GPUs are connected using the Ethernet-based HPE Cray Slingshot-11 networking fabric. The entire system uses direct water-cooling to rein in heat, with 6,000 gallons of water moved through the system by 350-horsepower pumps; these pumps could fill an Olympic-sized swimming pool in 30 minutes. The water in the system runs at a balmy 85 degrees Fahrenheit, which helps power efficiency as the system doesn’t use chillers to reduce the water temperature.

The entire system is connected to an insanely performant storage subsystem with 700 petabytes of capacity, 75 TB/s of throughput, and 15 billion IOPS of performance. A metadata tier is spread out over 480 NVMe SSDs that provide 10PB of the overall capacity, while 5,400 NVMe SSDs provide 11.5PB of capacity for the primary high-speed storage tier. Meanwhile, 47,700 PMR hard drives provide 679PB of capacity.
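The per-tier capacities can be checked against the 700-petabyte headline figure. A small sketch using only the numbers quoted above; the tier labels are illustrative names, not official ones.

```python
# Capacity of each storage tier in petabytes, as quoted in the article.
tiers_pb = {
    "metadata (480 NVMe SSDs)": 10.0,
    "high-speed (5,400 NVMe SSDs)": 11.5,
    "bulk (47,700 PMR hard drives)": 679.0,
}

total_pb = sum(tiers_pb.values())

# Show each tier's share of the overall capacity.
for name, capacity in tiers_pb.items():
    print(f"{name}: {capacity} PB ({capacity / total_pb:.1%} of total)")
print(f"Total: {total_pb} PB")
```

The tiers sum to 700.5 PB, consistent with the rounded 700 PB total, and the hard-drive tier accounts for the overwhelming majority of the capacity.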

Assembling Frontier was a challenge, as ORNL had to source 60 million parts with 685 different part numbers to build the system. The chip shortage hit during construction, impacting 167 of those part numbers, so ORNL found itself short two million parts. AMD also ran into issues as 15 part numbers for its MI200 GPUs encountered shortages. To help work around the shortages, ORNL worked with the ASCR to get Defense Priorities and Allocation System (DPAS) ratings for those parts, meaning the US government invoked the Defense Production Act to procure the parts due to Frontier’s importance to national defense.

Even though the system currently peaks at 29 MW of power, Frontier’s mechanical plant can cool up to 40 MW of computational power, equivalent to 30,000 US homes. In addition, the plant can be expanded up to 70 MW, leaving room for future growth.

While Frontier gets the nod as the first officially recognized exascale supercomputer globally, China is widely thought to have two exascale supercomputers, the Tianhe-3 and OceanLight, that broke the barrier a year ago. Unfortunately, those systems haven’t been submitted to the Top500 committee due to political tensions between the US and China. However, the lack of official submissions to the Top500 (a Gordon Bell submission was tendered as a proxy) has led to some doubt that these are true exascale systems, at least as measured with an FP64 workload.

For now, Frontier is officially the fastest supercomputer in the world and the first to formally break the exascale barrier. The nearly-mythical, oft-delayed Intel-powered Aurora is expected to come online later this year, or early next year, with up to 2 ExaFlops of performance, rivaling Frontier for the top spot in the supercomputing rankings.