20180323 - GDC RT Analysis
To be clear, this post is my personal thoughts and analysis of GDC ray-tracing news.
I'm trying to separate reality from marketing hype, and understand what this actually means to me as a cross-platform graphics developer ...
Numbers From Remedy's GDC Slides
Slides Published Here
The above slides, "An example of using NVIDIA RTX ray tracing tech in Northlight",
state explicitly that this demo uses NVIDIA's RTX rather than DXR.
So this section will continue under the assumption that the numbers are representative of Volta-only NVIDIA RTX,
and not necessarily what could be expected from DXR.
Slide 20: "AMBIENT OCCLUSION"
Slide 20: "Single ray per pixel on 1080p is roughly 5ms"
Slide 20: "Maximum ray distance of four meters"
Meaning only a tiny scene.
Estimating a practical upper bound on view distance via a real-world example:
I've watched the sun set over the Chicago skyline from the beach at Indiana Dunes,
which can be estimated at a 32 km view distance.
The difference between 4 m and 32 km is roughly 13 doublings.
In my prior sphere tracing talk
I used scenes which scaled to the limit of 32-bit float precision,
approaching 24 doublings.
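These doubling counts are just base-2 logarithms of distance ratios; a quick check of the arithmetic:

```python
import math

# Doublings between Remedy's 4 m max AO ray distance and an
# estimated 32 km real-world view distance (Chicago skyline).
print(round(math.log2(32_000 / 4)))  # ~13 doublings
```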
Slide 28: "SHADOWS"
Slide 28: "Single ray per pixel on 1080p is under 4ms"
Translates to ~0.5 billion rays per second for non-shaded coherent shadow rays in the tiny scene.
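A rough back-of-envelope for that rays/sec figure, assuming exactly one ray per 1920x1080 pixel traced in the full 4 ms:

```python
# One shadow ray per pixel at 1080p, in under 4 ms (Remedy, slide 28).
rays_per_frame = 1920 * 1080              # 2,073,600 rays
print(rays_per_frame / 0.004 / 1e9)       # ~0.52 -> "~0.5 billion rays/sec"
```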
Numbers from Wikipedia
on the capacity of the Titan V:
- 658.8 GB/s memory bandwidth
- 14899.2 Gflop/s FP32
- 465.6 Gtex/s texture rate
Now amortizing those numbers across ~0.5 billion rays/sec:
- ~1317 bytes/ray
- ~29798 flops/ray
- ~931 tex/ray
Just looking at machine capacity per ray paints a very clear picture:
tracing even coherent shadow rays without any shading is amazingly expensive using RTX.
For comparison, I typically budget only upwards of say 32 tex/pixel for a "super expensive" filtering operation.
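The per-ray figures follow directly by dividing the Titan V's peak capacities by the ray rate:

```python
# Titan V peak capacities (Wikipedia figures quoted above).
bandwidth = 658.8e9   # bytes/sec
flops = 14899.2e9     # flop/sec
texels = 465.6e9      # tex/sec

rays = 0.5e9          # ~0.5 billion coherent shadow rays/sec (Remedy)

print(bandwidth / rays)  # ~1317 bytes/ray
print(flops / rays)      # ~29798 flops/ray
print(texels / rays)     # ~931 tex/ray
```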
These numbers tell me that I'm personally better off focusing my effort somewhere other than ray-tracing.
"only Volta and newer architectures have the specific hardware features required for hardware acceleration of DXR/RTX"
Published measured numbers from a pure-software Radeon Rays implementation
show a range of ~0.15 to ~0.83 billion shadow rays/sec on a FirePro W9100.
Comparing the 2nd-generation GCN W9100 (March 2014) to the Volta Titan V (December 2017):
- 2.0x bandwidth for Titan V
- 2.8x flops for Titan V
- 2.8x texture fetch (cache hit) for Titan V
- Three year hardware advantage and "ray tracing engine" for Titan V
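Those ratios check out roughly against the W9100's published peak specs, which I'm quoting from memory here (~320 GB/s, ~5.24 Tflop/s FP32, ~166 Gtex/s), so treat them as approximate:

```python
# Titan V peaks (figures from the post) vs FirePro W9100 peaks
# (approximate, quoted from memory: ~320 GB/s, ~5.24 Tflop/s, ~166 Gtex/s).
titan = {"bandwidth": 658.8, "flops": 14899.2, "tex": 465.6}
w9100 = {"bandwidth": 320.0, "flops": 5240.0, "tex": 166.0}

for key in titan:
    print(key, round(titan[key] / w9100[key], 1))  # ~2x, ~2.8x, ~2.8x

# Scaling the W9100's measured software Radeon Rays range by ~2.8x:
print(round(0.15 * 2.8, 2), round(0.83 * 2.8, 2))  # 0.42 to 2.32 billion rays/sec
```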
Working out an estimated range of scaling if the W9100 had the base texture-hit and ALU capacity of the Titan V:
- ~0.15 * 2.8x = ~0.4 billion shadow rays/sec (lower bound)
- ~0.83 * 2.8x = ~2.3 billion shadow rays/sec (upper bound)
Those numbers and the capacity/ray from the prior section suggest that there is no fundamental hardware reason to limit RTX to Volta.
Also, NVIDIA's OptiX has been doing ray tracing on pre-Volta hardware for around a decade already.
Reality of Incoherent Rays
I haven't seen the slides posted yet, otherwise I'd quote specifics, but if memory serves,
the GDC 2018 talk on Frostbite's usage of RTX for interactive light-map baking
showed numbers dipping under 100 million rays/sec for secondary rays on Volta.
This shows the familiar trend:
incoherent rays can be an order of magnitude slower than coherent shadow rays.
There is no changing the physical limits of random access to memory.
The Myth That RT Enables a No-Hack Realtime Render
A GPU like a Titan V is in the class of GPUs expected to easily drive over 4K at 60 Hz on typical content.
Note 4K at 60 Hz is around ~0.5 billion pixels/sec.
Remedy's numbers for non-shadow rays show under ~0.5 billion rays/sec.
Meaning, quite literally, an average of under one ray per pixel for something like 4K at 60 Hz.
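Comparing pixel throughput at 4K 60 Hz against Remedy's AO ray throughput (using the 5 ms, one-ray-per-1080p-pixel figure):

```python
pixels_per_sec = 3840 * 2160 * 60     # ~0.50 billion pixels/sec at 4K 60 Hz
rays_per_sec = 1920 * 1080 / 0.005    # ~0.41 billion AO rays/sec (5 ms, 1080p)
print(round(rays_per_sec / pixels_per_sec, 2))  # ~0.83: under one ray/pixel
```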
"Hack-Free" tracing requires over an order of magnitude more rays/second than is available.
Hacks, or what can more accurately be described as "special-purpose optimizations", are here to stay indefinitely.
Details from SEED's GDC Presentation
- Hybrid Rendering Pipeline - No one wants to burn the full ray budget just doing primary rays at 4K, so raster hacks remain.
- Reflections - 960x540 resolution for tracing, so up-sampling and noise-reduction hacks remain.
- GI - Low-frequency surfels, so hacks to avoid per-pixel-frequency tracing remain.
- Shadows - One light source with temporal noise reduction, so hacks remain.
Comments Like "GPU Real-Time Ray-Tracing Will Soon Be Practical"
IMO we need at least a 10x increase in rays/second from where the Titan V is,
in combination with a 30x reduction in GPU cost: roughly a 300x change in rays/second per dollar.
Unfortunately I'll be long dead before that could happen.
- GPUs hit the TDP power wall in 2010 (meaning at practical board power limits, no more scaling there)
- GPUs had approached the practical economic die-size limit by 2010 (meaning no easy scaling from area, without insane costs)
- The remaining scaling, process shrinks, is getting more expensive and leveling off in perf improvements
- Off-chip bandwidth scaling is hitting an asymptote
- Compression for bandwidth reduction is already being leveraged
- Titan V is 15x to 30x more expensive than a typical mainstream GPU
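The 300x figure is just the product of the two gaps, taking the upper end of the 15x-30x price spread:

```python
perf_gap = 10   # needed increase in rays/sec for "hack-free" budgets
cost_gap = 30   # Titan V price vs a mainstream GPU (upper of 15x-30x)
print(perf_gap * cost_gap)  # 300x change in rays/sec per dollar
```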
The next step IMO in understanding the real costs of RT would involve
building out a scene that is typical in games like say Uncharted:
long view distances with a forest moving in the wind, a crowd of animated characters, lots of particle effects,
and a player who can dynamically interact with the world.
To Quick Sort or Radix Sort?
Ray-tracing is and will always be the "quick sort" of rendering: log-time searching.
For real-time I'd personally rather invest in the "radix sort" of rendering: do better than log-time.
My ask as a developer is simple: expose the actual hardware,
without wrapping it into a giant software emulation layer for ray-tracing,
as there are likely even better non-log-time (aka actually real-time) use cases.
Lastly, how should I be interpreting a single-hardware-vendor GDC launch of a new DX API that runs,
as Remedy's GDC slides make clear, on a single-vendor, closed-to-modification GameWorks RTX API,
yet is labeled as DXR, and is wrapped with the marketing of a benchmark vendor as one of the few early adopters?