20140910 - Forward vs Deferred From the Perspective of GPU Limits

EDIT: In this post, Mb/ms means megabytes (not megabits) per millisecond.

Starting with specs from my notebook's GPU,
2930 Mflop/ms : 160 Mb/ms : 122 Mtex/ms : 30 Mpix/ms

Normalized for one pixel written to one render target,
97 flop/pix : 5.3 byte/pix : 4 tex/pix : 1 pix
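
As a minimal sketch of that normalization (the dictionary and function names here are my own, purely illustrative), each peak rate is divided by the pixel write rate. The post's figures appear to be rounded down, so the unrounded values come out slightly higher.

```python
# Peak rates quoted in the post, per millisecond (M = 10^6).
PEAK_PER_MS = {
    "flop": 2930e6,
    "byte": 160e6,
    "tex": 122e6,
    "pix": 30e6,
}


def per_pixel_budget(peak_per_ms):
    """Divide every peak rate by the pixel write rate (hypothetical helper)."""
    pix_rate = peak_per_ms["pix"]
    return {unit: rate / pix_rate for unit, rate in peak_per_ms.items()}


budget = per_pixel_budget(PEAK_PER_MS)
for unit, value in budget.items():
    # Prints the unrounded per-pixel capacity for each unit.
    print(f"{value:.2f} {unit} per pixel written")
```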

Now looking at the GPU capacity available during the time it takes to write five 32-bit targets in a G-buffer,
488 flop/pix (244 op/pix) : 26 byte/pix : 20 tex/pix : 5 pix

G-buffer export eats 4*5=20 bytes of bandwidth, leaving about 6 bytes (not counting Z and/or stencil, and assuming color is fully uncompressed). This hints at why it is important to source from compressed textures when filling the G-buffer. It is also important to note that G-buffer fill suffers from exactly the same invocation occupancy problem (quad packing) for small triangles as clustered forward shading.
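
The bandwidth arithmetic can be sketched as follows. This assumes, as the text does, that color is uncompressed and that Z/stencil traffic is ignored. The function name is hypothetical. Without the intermediate rounding, the leftover comes out closer to 6.7 bytes than 6.

```python
# Bandwidth available per pixel-write slot, from the post's specs (rates per ms).
BYTES_PER_PIXEL_SLOT = 160e6 / 30e6  # about 5.33 bytes

NUM_TARGETS = 5       # render targets in the example G-buffer
BYTES_PER_TARGET = 4  # one 32-bit target, assumed fully uncompressed


def gbuffer_fill_bandwidth(num_targets, bytes_per_target, bytes_per_slot):
    """Return (available, export, leftover) bytes per G-buffer pixel.

    Hypothetical helper: ignores Z/stencil and any color compression.
    """
    available = num_targets * bytes_per_slot  # bandwidth elapsed during the writes
    export = num_targets * bytes_per_target   # bytes the G-buffer write itself costs
    return available, export, available - export


available, export, leftover = gbuffer_fill_bandwidth(
    NUM_TARGETS, BYTES_PER_TARGET, BYTES_PER_PIXEL_SLOT
)
print(f"available {available:.1f} B, export {export} B, leftover {leftover:.1f} B")
```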

Next, the GPU capacity available during the time it takes to read the G-buffer back for lighting (again approximating by skipping the bandwidth used to fetch Z),
366 flop/pix (183 op/pix) : 20 byte/pix : 15 tex/pix

Adding G-buffer fill and G-buffer readback for lighting gives a quick estimate of the overhead floor for deferred shading in this example,
854 flop/pix (427 op/pix) : 46 byte/pix : 35 tex/pix
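
A rough sketch of how the readback and total figures fall out, under two assumptions inferred from the numbers rather than stated in the post: fill is limited by the five pixel writes, and readback is limited by the bandwidth needed to fetch 20 bytes, which works out to 3.75 pixel-write slots. The names below are illustrative only.

```python
# Per-pixel-write budget from the post's specs (rates per ms).
FLOP_PER_SLOT = 2930e6 / 30e6
BYTE_PER_SLOT = 160e6 / 30e6
TEX_PER_SLOT = 122e6 / 30e6

GBUFFER_BYTES = 5 * 4  # five 32-bit targets


def capacity(slots):
    """GPU capacity that elapses during `slots` pixel-write slots (hypothetical helper)."""
    return {
        "flop": FLOP_PER_SLOT * slots,
        "byte": BYTE_PER_SLOT * slots,
        "tex": TEX_PER_SLOT * slots,
    }


# Assumption: fill time is set by exporting 5 targets (5 slots).
fill = capacity(5)

# Assumption: readback time is set by bandwidth, 20 bytes at ~5.33 bytes per slot.
readback_slots = GBUFFER_BYTES / BYTE_PER_SLOT
readback = capacity(readback_slots)

# Overhead floor: capacity elapsed during fill plus readback.
total = {unit: fill[unit] + readback[unit] for unit in fill}

for name, row in (("fill", fill), ("readback", readback), ("total", total)):
    print(name, {unit: int(value) for unit, value in row.items()})
```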

That is a massive amount of GPU capacity sitting in the shadow of G-buffer overhead. I believe this is one of the primary reasons some developers really like clustered forward shading (or other modern variants): the ability to choose simple shaders and cut minimum shader cost by a factor of 4 or so.

Defining Feature of the PC Platform?
A core strength of the PC as a platform is ultra-high resolution, high framerate, super-sampling, or all of the above at the same time. Forward shading had a lot to do with building this legacy: low-overhead shading with an easy SGSSAA driver override, and the ability to play games with great pixel quality. This is often what I personally miss most with the current trend of high pixel overhead games.