1985-2017 Industry Retrospective
The graph below tracks major industry trends, all normalized to 2017 values:
pix = peak pixels on the highest end display
freq = peak clock frequency shipped (CPU dominated)
trans = peak transistors/chip (CPU dominated)
mm2 = peak chip size
nm = process
ops = operations/second (GPU dominated)
os = operating system install size
bw = peak bandwidth (GPU dominated)
vram = amount of GPU memory
tdp = single GPU power usage
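The normalization can be sketched as below: each series is divided by its own 2017 sample so that every line ends at 1.0. This is only a minimal sketch. The function name `normalize_to_year` and the sample numbers are hypothetical placeholders and are not values read from the graph.

```python
def normalize_to_year(series, ref_year=2017):
    """Divide every sample by the value at ref_year, so ref_year maps to 1.0."""
    ref = series[ref_year]
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return {year: value / ref for year, value in series.items()}


# Placeholder samples for illustration only, NOT data from the graph.
example = {1985: 1.0, 2001: 50.0, 2017: 200.0}
normalized = normalize_to_year(example)

for year, value in sorted(normalized.items()):
    print(year, value)
```

Normalizing each series independently is what lets quantities with unrelated units (pixels, Hz, Watts) share one axis.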
Peak Chip Size
Single-chip area roughly hit the practical limit around 2010.
Single GPU Power Consumption
Also mostly leveled off around 2010.
The later bump in the graph is the Fury X, which broke the 250 Watt convention and went to 275.
Process scaling is quickly reaching the asymptotic trend line.
In the future there is another bump with the switch to EUV.
The last bump in the frequency line is the 5 GHz AMD FX-9590 CPU.
Frequency exploded and hit the practical limit in the CPU space with the Pentium 4 in the early 2000s.
Once this single thread frequency wall was hit,
it took around 5 years for parallel machines in the form of GPUs to start to take off.
GPU frequency scaling is not represented in the graph,
and there has been a near term bump in GPU frequency.
The transistor line is CPU dominated.
Interesting to see the massive local increase after 2005,
which happened before GPU parallel performance started to really take off.
Near-term GPU performance has been roughly scaling with transistor count.
While many physical limits are being reached,
peak spec performance has continued to grow almost linearly in the past 7 years.
This hints that the industry is still quite limited by the time available to optimize within the product release cycle,
and that there is a lot of slop left to improve upon even after physical limits are reached.
Another trend not captured in these graphs is actual GPU application scaling,
which has not always kept up with this trend line
(i.e. a GPU which is 2x as fast in raw specs does not get 2x the frame rate at the same resolution when not CPU bound).
The pixels/op exploded early on CRTs for workstation usage.
The 24" Sony GDM-W900 had capacity for 1920x1200 in 1996!
However, the vast majority of non-workstation real-time rendering content was done at much lower than peak workstation display resolution,
and CRTs worked great for variable resolutions.
Around 8-9 years back, GPUs caught up to peak resolution trends
and both display OEMs and GPU IHVs have been keeping roughly in sync in terms of ops/pixel.
This trend has resulted in the current pixel quality stagnation.
OS Install Size
The start of the complexity explosion (major compounding of technical debt) happened between Windows 95 and Windows XP.
This trend is growing almost exponentially now.
IMO this trend line also tracks the frustration developers and users feel when using computers.
The 1980s and early 1990s were a lot of fun in comparison with the complexity gridlock which is everywhere now.
Future GPU Scaling
There is still known process scaling ahead,
which means that the status quo does not change much in the near term.
As process scaling slows down, there will ultimately be an interesting push to do more with fewer transistors,
without breaking API-level backwards compatibility.
Interesting to look at some numbers for where GPUs are now in terms of transistors per unit of operation capacity of the machine.
Using the Fury Nano because I've got the ratios memorized:
4096 bytes/clock (through vector L1$)
If the chip is divided into 4096 pieces, each piece has around 2.17 million transistors.
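The per-piece figure can be reproduced with a quick calculation. The sketch below assumes a total of roughly 8.9 billion transistors for the Fiji die in the Fury Nano. That total is the publicly quoted figure and is not stated in this text, so treat it as an assumption.

```python
# Assumption: ~8.9 billion transistors for the Fiji die (publicly quoted
# figure, not stated in the text above).
FIJI_TRANSISTORS = 8.9e9

# From the text: the chip is divided into 4096 pieces.
PIECES = 4096

per_piece = FIJI_TRANSISTORS / PIECES
print(f"{per_piece / 1e6:.2f} million transistors/piece")
```

Under that assumption the result rounds to the 2.17 million transistors/piece used above.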
To place complexity (i.e. transistors/operation) into context, according to Wikipedia:
1.180 million transistors - 80486 (1989, has tiny L1 cache and floating point unit)
0.275 million transistors - 80386 (1985)
0.025 million transistors - ARM 1 (1985)
So roughly 2 orders of magnitude difference from ARM 1 to the amortized structures required for one lane of computation on the GPU.
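As a rough check of that comparison, the sketch below divides the 2.17 million transistors/piece figure by each of the CPU transistor counts listed above and converts the ratio to orders of magnitude. The variable names are illustrative only.

```python
import math

# From the text: amortized transistors for one piece of the GPU, in millions.
GPU_PIECE_MTRANSISTORS = 2.17

# From the text: whole-chip transistor counts, in millions.
cpus = {
    "80486": 1.180,
    "80386": 0.275,
    "ARM 1": 0.025,
}

orders = {}
for name, mtransistors in cpus.items():
    ratio = GPU_PIECE_MTRANSISTORS / mtransistors
    orders[name] = math.log10(ratio)
    print(f"{name}: {ratio:.1f}x, {orders[name]:.2f} orders of magnitude")
```

The ARM 1 ratio comes out just under 100x, which is where the "roughly 2 orders of magnitude" comes from. One GPU piece is within a factor of two of an entire 80486.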
Just as single-core CPUs became constrained by physical limits,
GPUs will also likely become constrained in the not too distant future after approaching the limit of design optimization,
and I believe this will ultimately force the next disruptive revolution in computing.
Disruptive in that it enables a non-backward-compatible new method to solve problems to establish itself.
GPUs are effectively relatively-small-banked hierarchical-cache based parallel SIMD machines with external memories.
All GPU shaders are designed around this paradigm.
Mass simplification and merging of ALU and MEM might enable over an order of magnitude increase in ALU capacity given a fixed process,
but would ultimately never be something which could efficiently implement a current graphics API...