Nvidia Builds a Monster GPU for AI

Nvidia officially launched its next-generation Pascal GPU, dubbed P100, at its own GPU Technology Conference (GTC) this past week. It’s definitely a monster: 15 billion transistors and 600mm² despite being manufactured on a 16nm FinFET process. I’m not going to dive into the details of the GPU, since the speeds and feeds can be found elsewhere. What’s interesting to me are the implications of Pascal for the future of graphics cards.

Nvidia positions the new GPU as ideal for high-performance computing (HPC) applications, deep learning AI applications in particular. CEO Jen-Hsun Huang waxed lyrical about the potential of Pascal to improve performance in deep learning applications. Dedicated FP16 support helps, as do the massive number of shader ALUs (what Nvidia calls CUDA cores) and the higher clock frequency compared to Maxwell. Pascal also builds in full double-precision floating point, something that the company largely left out of its recent Maxwell GPUs.

Neither Nvidia nor its manufacturing partner TSMC has discussed yields, but it’s noteworthy that the version of Pascal shipping late this year enables only 56 of the 60 SMs (streaming multiprocessors) laid out on the die. Nvidia has shipped initial deliveries of previous-generation GPUs with units disabled, so that’s not unprecedented. Still, Pascal is arguably among the most ambitious semiconductors ever built.

Pascal's SM includes fewer shader ALUs per SM, but keeps most other resources constant.

Notably absent at GTC: any discussion of Pascal in graphics cards. Any time anyone asked an Nvidia rep about graphics cards, they got stonewalled with the “we don’t discuss unannounced products” mantra. Discussion of graphics features, such as ROPs, texture units, and other rendering hardware, was MIA, though the block diagram of Pascal’s SM clearly shows texture units. So let’s do a little speculating, shall we?

In the past, Nvidia would initially ship versions of its high-end GPU with features disabled in consumer graphics cards. This allowed the company to repurpose chips with flaws in, for example, the double-precision floating-point units for consumer cards. As Nvidia refined the existing design and as manufacturing improved, disabling a feature became as much a marketing call as a technical one.

Maxwell’s launch took a different direction. The first Maxwell GPU, the GM107 used in the GTX 750, was a relatively low-end GPU, and lacked some of the features which appeared in later iterations. The second-generation GM204 GPU shipped in late 2014, and included 5.2 billion transistors at a 398mm² die size. Pascal is a different beast entirely, and marks a return to Nvidia launching at the ultra-high-end first.

It’s certainly possible Nvidia could take lower-quality P100 parts and turn them into consumer cards. If P100 yields are low, this makes some sense. However, Pascal’s architecture continues the company’s design trend of shrinking the base building block, the SM, which enables more granular modularity. Given Pascal’s long gestation, Nvidia could very well have cut-down versions already in the works. Scaling die area linearly with SM count (roughly 10mm² per SM), a 48-SM version might be roughly 480mm², while a 42-SM chip would be about 420mm². Both are larger than the GM204’s roughly 400mm², but smaller than the GM200 chip, which is also about 600mm² (though at 28nm) with a mere 8 billion transistors.
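For anyone who wants to play with the numbers, here is a minimal Python sketch of that die-size estimate. It assumes area scales linearly with SM count, which ignores fixed blocks such as memory controllers and I/O, so treat the output as rough. The function and constant names are mine, not anything from Nvidia.

```python
# Back-of-the-envelope die-area estimate for hypothetical cut-down Pascal chips.
# Assumption (not from Nvidia): area scales linearly with SM count, ignoring
# fixed blocks such as memory controllers and I/O.

P100_AREA_MM2 = 600   # approximate P100 die size quoted in the article
P100_SM_COUNT = 60    # SMs laid out on the full die


def estimate_die_area(sm_count: int) -> float:
    """Scale the P100 die area linearly by SM count (rough estimate only)."""
    area_per_sm = P100_AREA_MM2 / P100_SM_COUNT
    return sm_count * area_per_sm


if __name__ == "__main__":
    for sms in (60, 48, 42):
        print(f"{sms} SMs -> ~{estimate_die_area(sms):.0f} mm^2")
```

Swapping in a different SM count shows how quickly the die shrinks under this simple linear assumption; a real cut-down part would also carry non-SM logic that this sketch leaves out.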

What may be more of an issue is that 15 billion transistor number. Minimizing defects in semiconductors has improved radically over the years, but 15 billion is still a damn big number. The die-size estimates above are all back-of-the-envelope numbers based on the existing design. If Nvidia leaves out actual features (double-precision FPUs, for example), a Pascal GPU could be even smaller.
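To put the 15 billion figure in context, the sketch below computes transistor density from the approximate numbers quoted in this article and scales the transistor count for hypothetical cut-down chips. The linear scaling by SM count is my assumption, and the helper names are made up for illustration.

```python
# Transistor density and scaled transistor counts, using the article's
# approximate figures. Linear scaling by SM count is an assumption.

# name: (transistor count, die area in mm^2)
CHIPS = {
    "P100 (16nm)": (15.0e9, 600),
    "GM200 (28nm)": (8.0e9, 600),
    "GM204 (28nm)": (5.2e9, 398),
}


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


def scaled_transistors(sm_count: int,
                       full_transistors: float = 15.0e9,
                       full_sms: int = 60) -> float:
    """Estimate transistors in a cut-down chip (assumes linear scaling)."""
    return full_transistors * sm_count / full_sms


if __name__ == "__main__":
    for name, (transistors, area) in CHIPS.items():
        density = density_millions_per_mm2(transistors, area)
        print(f"{name}: ~{density:.1f}M transistors per mm^2")
    for sms in (48, 42):
        billions = scaled_transistors(sms) / 1e9
        print(f"{sms} SMs -> ~{billions:.1f} billion transistors")
```

By these figures P100 packs about 25 million transistors per mm², a bit less than double the roughly 13 million per mm² of the 28nm Maxwell chips, and a hypothetical 48-SM part would still carry around 12 billion transistors.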

I’ve also seen speculation that Nvidia’s consumer Pascal would use standard GDDR5 or the newer GDDR5X instead of HBM2 memory, but I’m not entirely convinced they’d do this in a high-end gaming card. After all, archrival AMD’s been shipping HBM-enabled graphics cards for months now. AMD’s already showing working Polaris-based graphics cards, suggesting that AMD’s next-gen cards aren’t far from shipping.

So it’s my guess that we may see a Pascal-based graphics card before the end of the year, but it won’t be based on P100. Nvidia has the resources to co-develop a separate chip alongside P100 for the volume market, which the company depends upon for cash flow.

I also could be completely wrong, however, and it wouldn’t be the first time. But the math seems to make sense.

