The Bottom Line:
- Nvidia introduces the Blackwell platform, marking a significant technological advancement in computing with high-performance AI capabilities.
- The platform includes CPUs, GPUs, NVLink, NICs, and switches, all integrated into a complete system for AI supercomputing.
- Future iterations like Blackwell Ultra and the upcoming Rubin platform promise even greater performance improvements on a yearly cycle.
- Innovations in networking, such as Spectrum X Ethernet architecture, address the unique demands of AI data centers, ensuring efficient and fast communication between GPUs.
- Nvidia’s advancements in reducing energy consumption and increasing computational power make previously impossible tasks like large language model training feasible and cost-effective.
Introducing the Blackwell Platform: A Leap in AI Computing
Advancements in AI Computing
In the last 60 years we saw several tectonic shifts in computing where everything changed, and we're about to see that happen again. The further we drive performance up, the greater the cost decline. The Hopper platform was, of course, probably the most successful data-center processor in history. But Blackwell is here, and every single platform, as you'll notice, comprises several things: you've got the CPU, you have the GPU, you have NVLink, you have the NIC, and you have the switch. Every single generation, as you'll see, is not just a GPU but an entire platform. We build the entire platform, we integrate the entire platform into an AI factory supercomputer, and then we disaggregate it and offer it to the world. Our basic philosophy is very simple: build at the entire data-center scale, disaggregate it, and sell it to you in parts, on a one-year rhythm, and we push everything to the limits of technology. Whatever TSMC process technology, we push it to the absolute limits; whatever packaging technology, push it to the absolute limits; whatever memory technology, push it to the absolute limits; SerDes technology, optics technology, everything is pushed to the limit.

While Blackwell is here, next year is Blackwell Ultra. Just as we had the H100 and H200, you'll probably see some pretty exciting new generations from us for Blackwell Ultra. This is the very first time, and I'm not sure yet whether I'm going to regret this or not. We have code names in our company, and we try to keep them very secret; most of the employees don't even know. But our next-generation platform is called Rubin, so we have the Rubin platform, and one year later the Rubin Ultra platform. All of these chips that I'm showing you here are in full development, 100% of them. The rhythm is one year, at the limits of technology, all 100% architecturally compatible. This is basically what Nvidia is building, with all the riches of software on top of it. In a lot of ways, the company has transformed tremendously over the last 12 years, and I want to thank all of our partners here for supporting us every step along the way.

This is the Nvidia Blackwell platform. Ladies and gentlemen, this is Blackwell. This is our production board, the most complex, highest-performance computer the world has ever made. This is the Grace CPU, and you can see the Blackwell dies, two of them connected together. You see that it is the largest die, the largest chip the world makes, and we connect two of them together with a 10-terabyte-per-second link. The performance is incredible: the AI FLOPS have increased by 1,000 times in eight years, generation over generation. Just compare that with even Moore's law at its best of times; the amount of computation is incredible. And whenever we drive the computation up, the cost goes down, and the amount of energy used has gone down by 350 times. Pascal would have taken 1,000 gigawatt-hours. 1,000 gigawatt-hours means it would take a one-gigawatt data center, and the world doesn't have a one-gigawatt data center, but if you had one, it would take about a month. If you had a 100-megawatt data center, it would take about a year. That's the reason these large language models, ChatGPT, weren't possible only eight years ago. With Blackwell, what used to be 1,000 gigawatt-hours comes down to three, an incredible advance, and our token-generation performance has made it possible for us to drive the energy down by 45,000 times.
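A quick sanity check of that gigawatt arithmetic (the keynote rounds both durations):

$$
\frac{1000\ \text{GWh}}{1\ \text{GW}} = 1000\ \text{h} \approx 42\ \text{days} \approx \text{one month}, \qquad \frac{1000\ \text{GWh}}{100\ \text{MW}} = 10{,}000\ \text{h} \approx 417\ \text{days} \approx \text{one year}.
$$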
So Blackwell is just an enormous leap. Even so, it's not big enough, and so we have to build even larger machines. The way we build them is called DGX. This is a DGX Blackwell: it is air-cooled and has eight of these GPUs inside. Look at the size of the heat sinks on these GPUs: about 15 kilowatts, 15,000 watts, and completely air-cooled. This version supports x86, and it goes into the infrastructure we've been shipping Hoppers into. However, if you would like liquid cooling, we have a new system based on this board, and we call it MGX, for modular. This one node has four Blackwell chips, and these switches connect every single one of these Blackwells to each other, so that we have one giant 72-GPU Blackwell. This now looks like one GPU. This one GPU has 72 GPUs versus the last generation's eight, so we increased it by nine times; the amount of bandwidth we've increased by 18 times; the AI FLOPS we've increased by 45 times; and yet the amount of power is only 10 times: this is 100 kilowatts and that is 10 kilowatts. And that's for one; you can always connect more of these together, and I'll show you how to do that in a second.

There's this confusion about what Nvidia does. How is it possible that Nvidia became so big building GPUs? There's an impression that this is what a GPU looks like. Now, this is a GPU, one of the most advanced GPUs in the world, but it's a gaming GPU. You and I know that this is what a GPU looks like. This is one GPU, ladies and gentlemen: a DGX GPU. The back of this GPU is the NVLink spine, and it's right here. This is an NVLink spine, and it connects 72 GPUs to each other. It is an electromechanical miracle. The transceivers make it possible for us to drive the entire length in copper, and as a result this NVLink switch, driving the NVLink spine in copper, makes it possible for us to save 20 kilowatts in one rack. 20 kilowatts can now be used for processing, just an incredible achievement.

Even this is not big enough for an AI factory, so we have to connect it all together with very high-speed networking. We have two types of networking. We have InfiniBand, which has been used in supercomputing and AI factories all over the world and is growing incredibly fast for us. However, not every data center can handle InfiniBand, because they've already invested their ecosystem in Ethernet for too long. So what we've done is bring the capabilities of InfiniBand to the Ethernet architecture, which is incredibly hard. Ethernet was designed for high average throughput, because every single node, every single computer, is connected to a different person on the internet, and most of the communication is the data center talking with somebody on the other side of the internet. However, in deep learning and AI factories, the GPUs are not communicating with people on the internet; they're communicating with each other, collecting partial products that have to be reduced and then redistributed: chunks of partial products, reduction, redistribution. That traffic is incredibly bursty, and it is not the average throughput that matters; it's the last arrival that matters, whoever gives me the answer last. Ethernet has no provision for that, so there are several things we had to create. We created an end-to-end architecture so that the NIC and the switch can communicate, and we applied four different technologies to make this possible.
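The "chunks of partial products, reduction, redistribution" pattern is, in standard data-parallel training, an all-reduce. Below is a minimal Python sketch of the classic ring variant to make that traffic concrete; it is illustrative only, since real stacks run collectives such as NCCL over NVLink, InfiniBand, or Ethernet rather than anything resembling this loop.

```python
# Minimal sketch of a ring all-reduce: the collective behind the
# "partial products, reduction, redistribution" traffic pattern.
# Illustrative only; shows the data movement, not any production API.

def ring_all_reduce(grads: list[list[float]]) -> list[list[float]]:
    """grads[i] is worker i's gradient vector; length divisible by len(grads)."""
    n = len(grads)
    c = len(grads[0]) // n                  # chunk size owned per worker

    def exchange(step: int, offset: int, reduce: bool) -> None:
        # Snapshot outgoing chunks so all sends in a step are simultaneous.
        msgs = []
        for i in range(n):
            k = (i - step + offset) % n     # chunk worker i forwards this step
            msgs.append((i, k, grads[i][k * c:(k + 1) * c]))
        for i, k, payload in msgs:
            dst = (i + 1) % n               # ring neighbor
            for j, v in enumerate(payload):
                if reduce:
                    grads[dst][k * c + j] += v   # accumulate partial sums
                else:
                    grads[dst][k * c + j] = v    # forward finished chunks

    # Phase 1: reduce-scatter. After n-1 steps, chunk (i+1) % n is
    # fully summed on worker i.
    for step in range(n - 1):
        exchange(step, offset=0, reduce=True)
    # Phase 2: all-gather. Circulate the finished chunks so every worker
    # ends up with the complete summed vector.
    for step in range(n - 1):
        exchange(step, offset=1, reduce=False)
    return grads

print(ring_all_reduce([[1.0, 2.0], [3.0, 4.0]]))  # -> [[4.0, 6.0], [4.0, 6.0]]
```

Every step moves a chunk to a neighbor, so for a moment the whole ring is transmitting at once: exactly the bursty, synchronized traffic the keynote describes, where the step completes only when the last chunk lands.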
Number one: Nvidia has the world's most advanced RDMA, so we now have the ability to do network-level RDMA for Ethernet, which is incredibly great. Number two: we have congestion control. The switch does telemetry at all times, incredibly fast, and whenever the NICs are sending too much information, we tell them to back off so that they don't create hot spots. Number three: adaptive routing. Ethernet needs to transmit and receive in order, but when we see congestion, or we see ports that are not currently being used, we send the packets to the available ports irrespective of ordering, and BlueField on the other end reorders them so that they come back in order. That adaptive routing is incredibly powerful. And lastly, noise isolation: there's more than one model being trained in the data center, and if something causes the last arrival to end up too late, it really slows down the training. Overall, remember: you have built a $5 billion data center and you're using it for training. If the training time were 20% longer, the $5 billion data center would effectively be a $6 billion data center, so the cost impact is quite high. Ethernet with Spectrum X basically allows us to improve the performance so much that the network is basically free, and that is really quite an achievement.

We have a whole pipeline of Ethernet products ahead of us. This is Spectrum X800: it is 51.2 terabits per second. The next one, coming one year from now, has a 512 radix and is called Spectrum X800 Ultra, and the one after that is X1600. But the important idea is this: the X800 is designed for tens of thousands of GPUs, the X800 Ultra for hundreds of thousands of GPUs, and the X1600 for millions of GPUs. The days of millions-of-GPU data centers are coming, and the reason is very simple. Of course we want to train much larger models, but very importantly, in the future almost every interaction you have with the internet or with a computer will likely have a generative AI running in the cloud somewhere. That generative AI is working with you, interacting with you, generating videos or images or text or maybe a digital human. So you're interacting with your computer almost all the time, and there's always a generative AI connected to it. Some of it is on-prem, some of it is on your device, and a lot of it could be in the cloud. These generative AIs will also have a lot of reasoning capability: instead of just one-shot answers, they might iterate on an answer to improve its quality before they give it to you. So the amount of generation we're going to do in the future is going to be extraordinary.
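Two of the numbers above are easy to verify in a few lines. The first print reproduces the keynote's $5B-to-$6B amortization arithmetic; the second is a toy straggler model (the 72-flow count, 2% slow-flow probability, and 2x latency penalty are invented for illustration) showing why the last arrival, not the average, sets the step time.

```python
import random

# Keynote arithmetic: a fixed-cost facility that trains 20% slower is
# effectively 20% more expensive per unit of work.
capex = 5e9                        # $5B data center (the keynote's figure)
slowdown = 0.20                    # 20% longer training time
print(f"effective cost: ${capex * (1 + slowdown) / 1e9:.1f}B")  # -> $6.0B

# Toy straggler model: a synchronous training step finishes only when the
# LAST gradient message arrives. Flow count and latencies are invented.
random.seed(0)

def step_time(flows: int, p_slow: float = 0.02) -> float:
    # Each flow usually takes 1.0; occasionally 2.0 (a congested port).
    return max(2.0 if random.random() < p_slow else 1.0 for _ in range(flows))

n_steps = 10_000
mean = sum(step_time(flows=72) for _ in range(n_steps)) / n_steps
print(f"mean step time across 72 flows: {mean:.2f}")  # ~1.77 despite 2% slow flows
```

With 72 parallel flows, the chance that at least one is slow is about 76%, so the average step runs near the worst case even though any individual flow is almost always fast; this is the tail effect the four Spectrum X technologies attack.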
Comprehensive AI Supercomputing with Integrated CPUs, GPUs, and More
Advancements in AI Supercomputing
The Blackwell platform represents a significant advancement in AI supercomputing technology. Unlike traditional platforms that focus solely on GPUs, Blackwell integrates CPUs, GPUs, NVLink, NICs, and switches into a comprehensive platform. This integration allows for enhanced performance and efficiency in AI computing.
Platform Scalability and Innovation
The Blackwell platform is designed to scale efficiently, with a year-on-year release cycle that pushes the boundaries of technology. Each successor, including Blackwell Ultra and the Rubin platform, is in active development, ensuring architectural compatibility and cutting-edge advancements. This iterative approach keeps Nvidia at the forefront of AI computing technology.
Network Infrastructure and Future Developments
In addition to hardware innovations, Nvidia has focused on enhancing network capabilities for AI factories. By integrating advanced RDMA, congestion control, adaptive routing, and noise isolation technologies into the Ethernet architecture, they have optimized data transfer for AI workloads. The Spectrum X800 series further underscores Nvidia's commitment to high-performance networking for large-scale GPU data centers.
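To picture the congestion-control piece of that list (switch telemetry telling NICs to back off), here is a minimal AIMD-style sketch in Python; the thresholds, rates, and halving rule are invented for illustration and are not Nvidia's algorithm.

```python
# Toy AIMD congestion-control loop: the switch reports queue depth
# (telemetry) and NICs back off multiplicatively when told, otherwise
# probing upward additively. All constants are invented for illustration.

THRESHOLD = 60.0                  # queue depth at which telemetry says "back off"

def switch_telemetry(queue_depth: float) -> bool:
    """True when the switch asks senders to slow down."""
    return queue_depth > THRESHOLD

def simulate(nics: int = 8, steps: int = 50) -> None:
    rates = [1.0] * nics          # per-NIC send rate (arbitrary units/step)
    drain = 10.0                  # what the switch can forward per step
    queue = 0.0
    for t in range(steps):
        queue = max(0.0, queue + sum(rates) - drain)
        back_off = switch_telemetry(queue)
        rates = [r * 0.5 if back_off else r + 0.1 for r in rates]
        if t % 10 == 0:
            print(f"t={t:2d}  queue={queue:6.1f}  total_rate={sum(rates):5.1f}")

simulate()  # total rate oscillates around the drain capacity; no hot spot forms
```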
Future Platforms: Blackwell Ultra and Rubin Promise Annual Performance Gains
Nvidia has committed to a one-year product rhythm at the limits of technology. Blackwell is shipping now, and Blackwell Ultra follows next year, much as the H200 followed the H100. Beyond that, the next-generation platform is code-named Rubin, with a Rubin Ultra platform arriving one year after it. According to Nvidia, all of these chips are already in full development, and all are 100% architecturally compatible, so software built for today's platform carries forward to each successive generation while every component, from process and packaging to memory, SerDes, and optics, is pushed to its limits.
Spectrum X Ethernet Architecture: Revolutionizing AI Data Center Networking
The Spectrum X Ethernet Architecture represents a significant advancement in AI data center networking technology. Nvidia has integrated advanced RDMA, congestion control, adaptive routing, and noise isolation technologies into Ethernet architecture to optimize data transfer for AI workloads.
This innovation allows for efficient communication between GPUs within AI factories and enables high-performance networking solutions for large-scale GPU data centers. The development of the Spectrum X800 series showcases Nvidia's commitment to enhancing network capabilities and advancing the field of AI supercomputing.
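A way to picture the adaptive-routing half of this: the sender sprays packets of a single flow across whichever ports are free, and the receiving side (BlueField, in Nvidia's description) restores sequence order before delivery. A minimal sketch, with the port model invented for illustration:

```python
import random

# Toy model of adaptive routing: packets from one flow are sprayed across
# whichever switch port is least loaded, arrive out of order, and are put
# back in sequence at the receiver. Port count and service times are
# invented for illustration; this is not Nvidia's implementation.
random.seed(1)
PORTS = 4
port_busy_until = [0.0] * PORTS            # when each port next frees up

def send(seq: int, now: float) -> tuple[float, int]:
    """Spray packet `seq` onto the least-loaded port; return (arrival, seq)."""
    port = min(range(PORTS), key=lambda p: port_busy_until[p])
    service = random.uniform(0.5, 1.5)     # variable per-port delay
    port_busy_until[port] = max(now, port_busy_until[port]) + service
    return (port_busy_until[port], seq)

arrivals = [send(seq, now=0.0) for seq in range(12)]
print([seq for _, seq in sorted(arrivals)])                 # wire order: scrambled
print([seq for _, seq in sorted(arrivals, key=lambda a: a[1])])  # delivered: 0..11
```

The design trade is that out-of-order spraying keeps every port busy (high utilization under bursty load) at the cost of a reorder buffer at the edge, which is why the reordering lives in the NIC/DPU rather than the switch.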
Enhancing AI Efficiency: Energy Reduction and Computational Power Boosts
The latest advancements in AI computing technology are showcased in the Blackwell platform, which integrates CPUs, GPUs, NVLink, NICs, and switches to enhance performance and efficiency. The platform is designed for scalability and innovation, with ongoing development of generations like Blackwell Ultra and Rubin to ensure compatibility and cutting-edge progress. In addition to hardware improvements, Nvidia has focused on optimizing network capabilities for AI factories by integrating advanced technologies like RDMA, congestion control, adaptive routing, and noise isolation into the Ethernet architecture. These enhancements facilitate efficient data transfer for AI workloads and demonstrate Nvidia's commitment to advancing AI supercomputing infrastructure.