Nvidia's must-have H100 AI chip has made it a multitrillion-dollar company, one worth more than Alphabet and Amazon, and competitors have been fighting to catch up. But perhaps Nvidia is about to extend its lead with the new Blackwell B200 GPU and GB200 “superchip.”
Nvidia says the new B200 GPU offers up to 20 petaflops of FP4 horsepower from its 208 billion transistors, and that a GB200 superchip combining two of those GPUs with a single Grace CPU can offer 30x the performance for LLM inference workloads, while also potentially being considerably more efficient. It reduces cost and energy consumption by up to 25 times over an H100, Nvidia says.
On a GPT-3 LLM benchmark with 175 billion parameters, Nvidia says the GB200's performance is a somewhat more modest seven times that of an H100, and Nvidia says it offers 4x the training speed.
Nvidia told reporters that one of the key improvements is a second-generation transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight (thus, the 20 petaflops of FP4 I mentioned earlier). The other key difference only comes when you link up huge numbers of these GPUs in a server: a next-gen NVLink switch that lets 576 GPUs talk to one another, with up to 1.8 terabytes per second of bidirectional bandwidth.
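To make the arithmetic behind that doubling concrete, here's a back-of-the-envelope sketch. The 20-petaflop FP4 figure is Nvidia's; the implied FP8 rate and the memory capacity below are illustrative assumptions, not official specs:

```python
# Halving the bits per value roughly doubles how many values fit through
# the same memory bandwidth and datapath per cycle.
B200_FP4_PETAFLOPS = 20                 # Nvidia's stated FP4 peak per B200
BITS_FP8, BITS_FP4 = 8, 4

implied_fp8_petaflops = B200_FP4_PETAFLOPS * BITS_FP4 / BITS_FP8
print(f"Implied FP8 peak: {implied_fp8_petaflops:.0f} petaflops")   # -> 10

# Same idea for model size: at a fixed memory budget, 4-bit weights hold
# twice as many parameters as 8-bit weights.
memory_gb = 192                          # assumed HBM capacity, for illustration
params_fp8 = memory_gb * 1e9 / (BITS_FP8 / 8)   # bytes / bytes-per-parameter
params_fp4 = memory_gb * 1e9 / (BITS_FP4 / 8)
print(f"{params_fp8 / 1e9:.0f}B parameters at FP8 vs {params_fp4 / 1e9:.0f}B at FP4")
```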
That required Nvidia to build an entirely new network switch chip, one with 50 billion transistors and some of its own onboard compute: 3.6 teraflops of FP8, Nvidia says.
Previously, Nvidia says, a cluster of just 16 GPUs would spend 60 percent of its time communicating with one another and only 40 percent actually computing.
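That 60/40 split translates directly into lost throughput. A simple utilization sketch, where the 60/40 split is Nvidia's figure but the per-GPU peak just reuses the B200's FP4 number for scale:

```python
# A cluster that spends 60 percent of wall-clock time communicating
# delivers only 40 percent of its theoretical peak compute.
gpus = 16                      # the cluster size Nvidia cited
peak_pf_per_gpu = 20           # reusing the B200 FP4 figure, for illustration
compute_fraction = 0.40        # Nvidia's figure: 40% computing, 60% communicating

peak = gpus * peak_pf_per_gpu
delivered = peak * compute_fraction
print(f"Delivered: {delivered:.0f} of {peak} peak petaflops")   # -> 128 of 320
```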
Of course, Nvidia is counting on companies buying these GPUs in huge quantities and packing them into larger, supercomputer-ready designs like the GB200 NVL72, which plugs 36 CPUs and 72 GPUs into a single liquid-cooled rack for a total of 720 petaflops of AI training performance or 1,440 petaflops (aka 1.4 exaflops) of inference.
Each tray in the rack contains either two GB200 chips or two NVLink switches, with 18 of the former and nine of the latter per rack. In total, Nvidia says one of these racks can support a 27-trillion-parameter model. GPT-4 is rumored to be around a 1.7-trillion-parameter model.
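Those rack-level numbers check out against the per-chip figures if you count each GB200 as one Grace CPU plus two B200 GPUs. A quick sanity check, where the 10-petaflop FP8 training rate is inferred as half the FP4 rate rather than something Nvidia spells out here:

```python
# Sanity-check the GB200 NVL72's headline numbers from its composition.
compute_trays = 18                       # trays holding GB200s (vs. 9 switch trays)
superchips = compute_trays * 2           # two GB200s per tray -> 36

cpus = superchips                        # one Grace CPU per GB200
gpus = superchips * 2                    # two B200 GPUs per GB200
print(cpus, gpus)                        # -> 36 72

fp4_pf_per_gpu = 20                      # Nvidia's FP4 peak per B200
fp8_pf_per_gpu = fp4_pf_per_gpu / 2      # assumed half-rate training precision

print(gpus * fp4_pf_per_gpu)             # -> 1440 PF of inference (1.44 exaflops)
print(gpus * fp8_pf_per_gpu)             # -> 720.0 PF of training
```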
The company says Amazon, Google, Microsoft, and Oracle are all already planning to offer the NVL72 racks in their cloud service offerings, though it’s not clear how many they’re buying.
And of course, Nvidia is happy to offer companies the rest of the solution as well. Here’s the DGX SuperPod for DGX GB200, which combines eight of those systems into one for a total of 288 CPUs, 576 GPUs, 240TB of memory, and 11.5 exaflops of FP4 computing.
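Those SuperPod totals are just eight NVL72 racks’ worth of parts added together, with the same caveat as above that the per-rack figures come from the numbers Nvidia gave earlier:

```python
# The DGX SuperPod as eight NVL72 racks combined.
racks = 8
print(racks * 36)                 # -> 288 Grace CPUs
print(racks * 72)                 # -> 576 B200 GPUs
print(racks * 30)                 # -> 240 TB of memory (implies 30 TB per rack)
print(racks * 1440 / 1000)        # -> 11.52 exaflops of FP4, rounded to 11.5
```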
Nvidia says its systems can scale to tens of thousands of GB200 superchips, connected together with 800Gbps networking via its new Quantum-X800 InfiniBand (for up to 144 connections) or Spectrum-X800 Ethernet (for up to 64 connections).