MLCommons releases latest MLPerf Training benchmark results

Ryan Daws is a senior editor at TechForge Media, with a seasoned background spanning over a decade in tech journalism. His expertise lies in identifying the latest technological trends, dissecting complex topics, and weaving compelling narratives around the most cutting-edge developments. His articles and interviews with leading industry figures have gained him recognition as a key influencer by organisations such as Onalytica. Publications under his stewardship have since gained recognition from leading analyst houses like Forrester for their performance. Find him on X (@gadget_ry).

Open engineering consortium MLCommons has released its latest MLPerf Training community benchmark results.

MLPerf Training is a full system benchmark that tests machine learning models, software, and hardware.

The results are split into two divisions: closed and open. Closed submissions are better for comparing like-for-like performance as they use the same reference model to ensure a level playing field. Open submissions, meanwhile, allow participants to submit a variety of models.

In the image classification benchmark, Google is the winner with its preview tpu-v4-6912 system that uses an incredible 1728 AMD Rome processors and 3456 TPU accelerators. Google’s system completed the benchmark in just 23 seconds.

“We showcased the record-setting performance and scalability of our fourth-generation Tensor Processing Units (TPU v4), along with the versatility of our machine learning frameworks and accompanying software stack. Best of all, these capabilities will soon be available to our cloud customers,” Google said.

“We achieved a roughly 1.7x improvement in our top-line submissions compared to last year’s results using new, large-scale TPU v4 Pods with 4,096 TPU v4 chips each. Using 3,456 TPU v4 chips in a single TPU v4 Pod slice, many models that once trained in days or weeks now train in a few seconds.”

Of the systems that are available on-premises, NVIDIA’s dgxa100_n310_ngc21.05_mxnet system came out on top, with its 620 AMD EPYC 7742 processors and 2480 NVIDIA A100-SXM4-80GB (400W) accelerators completing the benchmark in 40 seconds.

“In the last 2.5 years since the first MLPerf training benchmark launched, NVIDIA performance has increased by up to 6.5x per GPU, increasing by up to 2.1x with A100 from the last round,” said NVIDIA.

“We demonstrated scaling to 4096 GPUs which enabled us to train all benchmarks in less than 16 minutes and 4 out of 8 in less than a minute. The NVIDIA platform excels in both performance and usability, offering a single leadership platform from data centre to edge to cloud.”

Across the board, MLCommons says that benchmark results have improved by up to 2.1x compared to the last submission round. This shows the incredible advancements that are being made in hardware, software, and system scale.
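To make the "up to 2.1x" figure concrete: MLPerf Training measures the time taken to train a model to a target quality, so a round-over-round improvement can be read as the ratio of the earlier time to the later one. The sketch below is only an illustration; the `speedup` helper and the two timings are hypothetical and are not taken from the published results.

```python
def speedup(previous_seconds: float, current_seconds: float) -> float:
    """Return how many times faster the current run is than the previous one."""
    if previous_seconds <= 0 or current_seconds <= 0:
        raise ValueError("training times must be positive")
    return previous_seconds / current_seconds


# Hypothetical time-to-train values for the same benchmark in two rounds.
previous_round_seconds = 84.0
current_round_seconds = 40.0

ratio = speedup(previous_round_seconds, current_round_seconds)
print(f"{ratio:.1f}x")  # prints "2.1x"
```

A ratio above 1 means the newer submission trained faster; a ratio of exactly 1 would mean no change.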

Victor Bittorf, Co-Chair of the MLPerf Training Working Group, said:

“We’re thrilled to see the continued growth and enthusiasm from the MLPerf community, especially as we’re able to measure significant improvement across the industry with the MLPerf Training benchmark suite.

Congratulations to all of our submitters in this v1.0 round – we’re excited to continue our work together, bringing transparency across machine learning system capabilities.”

For this round, MLCommons added two new benchmarks that measure performance for speech-to-text and 3D medical imaging. These new benchmarks leverage the following reference models:

  • Speech-to-Text with RNN-T: Recurrent Neural Network Transducer (RNN-T) is an automatic speech recognition (ASR) model that is trained on a subset of LibriSpeech. Given a sequence of speech input, it predicts the corresponding text. RNN-T is MLCommons’ reference model and is commonly used in production speech-to-text systems.
  • 3D Medical Imaging with 3D U-Net: The 3D U-Net architecture is trained on the KiTS19 dataset to find and segment cancerous cells in the kidneys. The model identifies whether each voxel within a CT scan belongs to healthy tissue or a tumour, and is representative of many medical imaging tasks.
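The voxel-by-voxel labelling described for 3D U-Net can be illustrated with a small overlap calculation. The sketch below uses the Dice coefficient, a common way of scoring how well a predicted segmentation matches a reference one. Treating it as the scoring method here is an assumption made for illustration, and the `dice` helper and the tiny voxel sets are hypothetical.

```python
def dice(predicted: set, reference: set) -> float:
    """Dice overlap between two sets of (z, y, x) voxels labelled as tumour."""
    if not predicted and not reference:
        return 1.0  # both empty: treat as perfect agreement
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


# Hypothetical toy data: four reference tumour voxels, four predicted ones,
# three of which coincide.
reference_voxels = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
predicted_voxels = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}

print(dice(predicted_voxels, reference_voxels))  # prints 0.75
```

A score of 1.0 means the predicted tumour voxels match the reference exactly, while 0.0 means they do not overlap at all.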

“The training benchmark suite is at the centre of MLCommons’ mission to push machine learning innovation forward for everyone, and we’re incredibly pleased with the engagement from this round’s submissions,” commented John Tran, Co-Chair of the MLPerf Training Working Group.

The full MLPerf Training benchmark results can be explored here.

