High Bandwidth Memory (HBM): Why GPUs Need It for Machine Learning

September 18, 2025

Introduction

High Bandwidth Memory (HBM) is the primary memory technology used in modern data center GPUs, including NVIDIA’s A100 and H100, as well as AMD’s MI250. Unlike traditional DDR or GDDR memory, HBM is built from vertically stacked DRAM dies placed directly next to the GPU die and connected through a silicon interposer. This design delivers both high capacity and extremely high bandwidth — key requirements for training today’s large-scale machine learning models.

Figure 1: Components of the NVIDIA H100 GPU, including the GPU die, compute cores, and the HBM stacks.

Why HBM Matters for Machine Learning

Training deep neural networks involves moving massive amounts of data: model weights, activations, gradients, optimizer state, and batches of training examples all have to be read from and written back to memory on every step.

If the memory system cannot supply this data quickly enough, the GPU’s compute units stall, leaving much of the available FLOPS unused. HBM supplies data at terabyte-per-second rates (3.352 TB/s, as shown in Figure 1), keeping CUDA and Tensor cores busy and enabling efficient scaling to very large models.
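
To make the stall argument concrete, the sketch below compares a kernel’s arithmetic intensity (FLOPs per byte of HBM traffic) against the GPU’s machine balance. The bandwidth figure comes from Figure 1; the ~989 TFLOPS dense BF16 tensor-core peak is an assumed spec-sheet value, and the matmul sizes are arbitrary examples.

```python
# Back-of-the-envelope roofline check: is a kernel compute-bound or
# memory-bound on an H100-class GPU? The bandwidth comes from Figure 1;
# the dense BF16 tensor-core peak is an assumed spec-sheet value.

PEAK_BW_BYTES_S = 3.352e12    # HBM3 bandwidth, 3.352 TB/s (Figure 1)
PEAK_BF16_FLOPS = 989e12      # assumed ~989 TFLOPS dense BF16 peak

# Machine balance: FLOPs the GPU can execute per byte moved through HBM.
balance = PEAK_BF16_FLOPS / PEAK_BW_BYTES_S
print(f"machine balance ~ {balance:.0f} FLOPs per byte")

def matmul_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte of HBM traffic for a dense (m x k) @ (k x n) matmul."""
    flops = 2 * m * n * k                                  # multiply-adds
    traffic = bytes_per_elem * (m * k + k * n + m * n)     # read A, B; write C
    return flops / traffic

for size in (128, 1024, 8192):
    ai = matmul_arithmetic_intensity(size, size, size)
    verdict = "compute-bound" if ai > balance else "memory-bound (HBM limits it)"
    print(f"{size}^3 matmul: ~{ai:.0f} FLOPs/byte -> {verdict}")
```

Small matrix multiplies (and most element-wise ops) fall below the balance point, so their runtime is set by how fast HBM can deliver bytes rather than by the tensor cores; only large, dense operations keep the compute units fully busy.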

Technical Background: How HBM Works

Performance Metrics

When looking at GPU specifications, two numbers describe HBM performance: capacity (how many gigabytes of model state fit on the device) and bandwidth (how many bytes per second can be read from or written to memory).
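
As a rough sketch of how those two numbers constrain a training run, the snippet below checks whether a hypothetical model’s training state fits in capacity and what bandwidth implies for a full pass over the weights. The 80 GB capacity, the 7B-parameter model, and the 16-bytes-per-parameter rule of thumb are illustrative assumptions.

```python
# Rough sanity check using only the two spec-sheet numbers. The 80 GB
# capacity, the 7B-parameter model, and the 16-bytes-per-parameter rule
# of thumb for mixed-precision Adam training (bf16 weights and gradients
# plus fp32 master weights and two optimizer moments) are all assumptions
# for illustration.

HBM_CAPACITY_GB = 80           # assumed capacity of one H100 SXM
HBM_BANDWIDTH_GB_S = 3352      # 3.352 TB/s, from Figure 1

params = 7e9                                 # hypothetical 7B-parameter model
train_state_gb = params * 16 / 1e9           # weights + grads + optimizer state
weights_gb = params * 2 / 1e9                # bf16 weights alone

fits = "fits" if train_state_gb <= HBM_CAPACITY_GB else "does not fit"
print(f"training state ~{train_state_gb:.0f} GB -> {fits} in {HBM_CAPACITY_GB} GB of HBM")

# Bandwidth sets a hard lower bound on any step that must touch every weight:
# it can never finish faster than size / bandwidth.
min_pass_ms = weights_gb / HBM_BANDWIDTH_GB_S * 1e3
print(f"one full read of the bf16 weights takes >= {min_pass_ms:.1f} ms")
```

Capacity decides whether the model (and its optimizer state) fits on one device at all; bandwidth decides how quickly that state can be streamed through the compute units once it does.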

Real-World Implications for ML

Key Takeaways
