TSMC recently demonstrated a technology that could very well turn out to be the harbinger of exponential performance improvement for both Radeon and GeForce GPUs. Wafer-on-Wafer (WoW) tech solves one of the bigger problems with building MCMs right now: part of the difficulty in creating an MCM-based GPU (or CPU, for that matter) is the latency introduced between the different GPU clusters.
TSMC just made building large MCM based GPUs much easier
Allow me to elaborate a bit. Currently, we can build an MCM using an interposer and a custom interconnect. The different dies are placed laterally alongside each other and are connected by an interconnect that facilitates communication between them. This introduces latency and means that the MCM will only ever be as fast as the interconnect allows. TSMC has proposed a far better solution, however: using TSVs (through-silicon vias) to stack dies on top of each other (as NAND and DRAM already do), allowing for much, much faster interfacing as well as potentially much higher core-count GPUs.
We have only ever seen single-die GPUs (unless you count the CrossFire/SLI dual-GPU cards), and part of the reason for that was that MCM was extremely difficult and the interposer itself had horrible yields. TSMC's solution could allow both AMD and NVIDIA to cheaply stack dies on top of each other, providing an exponential increase in core count with no additional technological progress required. In fact, the only issue we can see with this technique is that the TDP envelope will almost certainly shrink at a corresponding rate as well – meaning any GPUs created using this method will be clocked lower and have lower voltage thresholds.
Of course, this doesn’t solve the architectural problem: the underlying architecture has to be capable of handling twice the number of cores, and while this is possible in cases like Pascal, it might not be in cases like GCN (which we know only scales to a maximum of 4096 stream processors). At this point in time the technology is not ready, but the mere fact that TSMC has announced it means that at some point it will likely start offering the technique to its customers (probably on the mature 16nm FinFET process), and both NVIDIA and AMD will have the option of creating much more powerful GPUs by stacking two lower-tier ones.
Exploring the multi-chip module die philosophy for GPUs
Here’s the thing, however: AMD has proven itself to be exceptionally good at creating MCM-based products. The Threadripper series (the 1920X and 1950X at any rate) was absolutely disruptive to the HEDT market space. It single-handedly turned what was usually a 6-core, very expensive affair into an affordable 16-core combo. The power of servers and Xeons was finally in the hands of average consumers – so why can’t the same philosophy work for GPUs as well?
Well, theoretically speaking, it should work even better for GPUs, which are parallel devices, than for CPUs, which are largely serial devices. Not only that, but you are looking at massive yield gains from just shifting to an MCM-based approach instead of a monolithic die. A single huge die has abysmal yields, is expensive to produce and usually involves high wastage. Multiple chips totaling the same die size would offer yield increases right off the bat.
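To see why a huge die yields so badly, consider a simple first-order Poisson defect model, Y = e^(−D0·A), where D0 is the defect density and A the die area. This is textbook semiconductor math, not anything AMD or NVIDIA publish, and the defect density below is an assumed illustrative value:

```python
import math

def poisson_yield(d0_per_cm2, die_area_cm2):
    """First-order Poisson defect-yield model: Y = exp(-D0 * A)."""
    return math.exp(-d0_per_cm2 * die_area_cm2)

D0 = 0.1  # assumed defect density (defects/cm^2); real values are process-dependent

big_die = poisson_yield(D0, 4.84)  # one 484mm2 (4.84cm2) monolithic die
chiplet = poisson_yield(D0, 1.21)  # one 121mm2 (1.21cm2) quarter-size die

print(round(big_die, 3), round(chiplet, 3))  # roughly 0.616 vs 0.886
```

Under these assumed numbers, about 62% of the big dies come out defect-free versus about 89% of the quarter-size dies – a random defect kills a whole monolithic GPU but only one small chiplet.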
I took the liberty of doing some rough approximations using the lovely Silicon Edge tool and was not surprised to see instant yield gains. The Vega 64 has a die measuring 484mm², which equates to a die measuring 22mm by 22mm. Splitting this monolithic die into four 11mm by 11mm dies gives you the same net surface area (484mm²) and will also result in yield gains. How much? Let’s see. According to the approximation, a 200mm wafer should be able to produce 45 monolithic dies (22×22) or 202 smaller dies (11×11). Since we need 4 smaller dies to equal 1 monolithic part, we end up with 50 484mm²-equivalent MCM GPUs. That’s a yield gain of 11% right there.
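If you want to reproduce the arithmetic yourself, here is a rough sketch using a textbook gross-die-per-wafer approximation (usable wafer area minus an edge-loss term). It is not the same model Silicon Edge uses, so the absolute counts come out slightly different from the figures above, but the conclusion – more GPU-equivalents per wafer from small dies – is the same:

```python
import math

def gross_dies(wafer_diameter_mm, die_area_mm2):
    """Textbook gross-die approximation: wafer area over die area,
    minus an edge-loss correction term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

mono = gross_dies(200, 484)   # 22mm x 22mm Vega-class monolithic dies
small = gross_dies(200, 121)  # 11mm x 11mm chiplets
mcm_gpus = small // 4         # four chiplets make one 484mm2-equivalent GPU

print(mono, small, mcm_gpus)  # 44 monolithic GPUs vs 54 MCM GPUs per wafer
```

This formula is deliberately crude – it ignores scribe lines, edge exclusion rings and aspect ratios – but it captures why squeezing small squares onto a round wafer wastes less silicon at the edges.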
The yield gains are even larger for bigger chips. The upper limit of lithographic techniques (with reasonable yields) is roughly 625mm². On a single 200mm wafer, we can get about 33 of these (25×25) or 154 smaller dies (12.5×12.5). That gives us a total of 38 MCM-based GPUs for a yield increase of 15%. Now, full disclosure: this is a very rough approximation and does not take into account several factors such as packaging yields, complicated high-level design, etc., but the basic idea holds up well. At the same time, it also does not account for the gains from reduced wastage – a faulty 625mm² monolithic die is much more wasteful than a single faulty 156mm² one!
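Putting packing and defects together makes the wastage point concrete. Taking the candidate counts quoted above (33 monolithic 25×25 dies versus 154 smaller 12.5×12.5 dies per 200mm wafer) and applying the same illustrative Poisson yield model with an assumed D0 of 0.1 defects/cm², the gap widens further, because a single defect kills a whole 625mm² die but only one quarter of an MCM part:

```python
import math

def good_gpus_per_wafer(candidates, die_area_cm2, d0, dies_per_gpu=1):
    """Candidates surviving a Poisson defect model, grouped into GPUs."""
    good_dies = int(candidates * math.exp(-d0 * die_area_cm2))
    return good_dies // dies_per_gpu

D0 = 0.1  # assumed defect density (defects/cm^2), illustrative only

mono = good_gpus_per_wafer(33, 6.25, D0)       # 625mm2 monolithic dies
mcm = good_gpus_per_wafer(154, 1.5625, D0, 4)  # 156.25mm2 chiplets, 4 per GPU

print(mono, mcm)  # 17 good monolithic GPUs vs 32 good MCM GPUs per wafer
```

Under these assumptions the MCM approach nearly doubles the number of sellable GPUs per wafer – again, a sketch under an assumed defect density, not a real foundry yield figure.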
Long story short, AMD is perfectly capable of creating an MCM-based GPU and would even reap some serious yield benefits if it chooses to run with this approach for Navi. Considering the 7nm node is very much at the bleeding-edge stage, yields can’t be too good even by mid-2018 for very large high-performance ASICs. Switching to smaller dies in an MCM-based approach would solve that problem and even allow AMD to surpass the roughly 625mm² surface-area limitation of monolithic dies. NVIDIA is also actively pursuing this path for the same reasons.