20180323 - GDC RT Analysis


To be clear, this post is my personal thoughts and analysis of GDC ray-tracing news. I'm trying to separate reality from marketing hype, and understand what this actually means to me as a cross-platform graphics developer ...


Numbers From Remedy's GDC Slides
Slides Published Here
The above slide deck, "An example of using NVIDIA RTX ray tracing tech in Northlight", states explicitly that this demo uses NVIDIA's RTX rather than DXR. So this section continues under the assumption that the numbers are representative of Volta-only NVIDIA RTX, and not necessarily of what could be expected from DXR.

Slide 20: "AMBIENT OCCLUSION"
Slide 20: "Single ray per pixel on 1080p is roughly 5ms"
Slide 20: "Maximum ray distance of four meters"

Meaning only a tiny scene. Estimating a peak practical upper bound on view distance via a real-world example: I've watched the sun set over the Chicago skyline from the beach at Indiana Dunes, which works out to roughly a 32 km view distance. The difference between 4 m and 32 km is roughly 13 doublings. In my prior sphere tracing talk I used scenes which scaled to the limit of 32-bit float precision, approaching 24 doublings.
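
Back-of-envelope version of that doubling math (a Python sketch; the 32 km figure is my own rough estimate):

    import math

    max_ray_distance_m = 4.0          # Remedy slide 20: maximum ray distance
    view_distance_m = 32.0 * 1000.0   # rough Chicago-from-Indiana-Dunes estimate

    doublings = math.log2(view_distance_m / max_ray_distance_m)
    print(f"{doublings:.1f} doublings from 4 m to 32 km")   # ~13.0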

Slide 28: "SHADOWS"
Slide 28: "Single ray per pixel on 1080p is under 4ms"

Translates to roughly 0.5 billion rays per second for non-shaded coherent shadow rays in this tiny scene.
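
The arithmetic behind that estimate (a sketch; assumes exactly one 1920x1080 ray per pixel and the full 4 ms):

    rays_per_frame = 1920 * 1080      # one shadow ray per pixel at 1080p
    frame_time_s = 4.0 / 1000.0       # Remedy slide 28: "under 4ms"

    rays_per_sec = rays_per_frame / frame_time_s
    print(f"{rays_per_sec / 1e9:.2f} billion rays/sec")   # ~0.52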

Numbers from Wikipedia on capacity of Titan-V:


Now amortizing those numbers across ~0.5 billion rays:


Just looking at machine capacity per ray paints a very clear picture: tracing even coherent shadow rays without any shading is amazingly expensive using RTX. For comparison, I typically budget only up to say 32 tex/pixel for a "super expensive" filtering operation.
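
For transparency, this is the kind of per-ray amortization I'm doing, using ballpark Titan V peak rates from memory (treat the exact figures as assumptions rather than the Wikipedia values):

    rays_per_sec = 0.5e9     # coherent shadow rays/sec from the Remedy numbers above

    # Ballpark Titan V peak rates (approximate, from memory):
    flops_per_sec = 14e12    # ~14 TFLOPS FP32
    tex_per_sec = 400e9      # ~400 Gtex/s texture fetch rate
    bytes_per_sec = 650e9    # ~650 GB/s memory bandwidth

    print(f"{flops_per_sec / rays_per_sec:,.0f} flops of capacity per ray")    # ~28,000
    print(f"{tex_per_sec / rays_per_sec:,.0f} texture fetches per ray")        # ~800
    print(f"{bytes_per_sec / rays_per_sec:,.0f} bytes of bandwidth per ray")   # ~1,300

Roughly 800 texture fetches worth of machine time per ray, against a 32 tex/pixel budget, makes that gap concrete.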

The numbers say to me that I'm personally better off focusing on something better than ray-tracing.



Hardware Features?
From Anandtech: "only Volta and newer architectures have the specific hardware features required for hardware acceleration of DXR/RTX"

Published measured numbers from the pure-software Radeon Rays library show a range of ~0.15 to ~0.8 billion shadow rays/sec on a FirePro W9100.

Comparing the 2nd-generation GCN W9100 (March 2014) to the Volta Titan V (December 2017):


Working out an estimated range of scaling if the W9100 had the texture fetch and ALU capacity of the Titan V:


Those numbers and the capacity/ray from the prior section suggest that there is no fundamental hardware reason to limit RTX to Volta. Also, NVIDIA's OptiX has been doing ray tracing on pre-Volta hardware for around a decade already.
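
A rough version of that scaling estimate, again with ballpark peak rates from memory (assumed figures, not the actual comparison table):

    # Published pure-software Radeon Rays range on the W9100 (shadow rays/sec):
    w9100_low, w9100_high = 0.15e9, 0.8e9

    # Ballpark peak rates (approximate, from memory):
    w9100_flops, w9100_tex = 5.2e12, 160e9      # ~5.2 TFLOPS FP32, ~160 Gtex/s
    titanv_flops, titanv_tex = 14e12, 400e9     # ~14 TFLOPS FP32, ~400 Gtex/s

    scale = min(titanv_flops / w9100_flops, titanv_tex / w9100_tex)   # ~2.5x, conservative
    print(f"scaled range: {w9100_low * scale / 1e9:.2f} to "
          f"{w9100_high * scale / 1e9:.2f} billion rays/sec")         # ~0.4 to ~2.0

Which lands the pure-software path in the same ballpark as the RTX shadow-ray number above.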


Reality of Incoherent Rays
I haven't seen the slides posted yet, otherwise I'd quote specifics, but if memory serves, the GDC 2018 talk on Frostbite's usage of RTX for interactive light map baking showed numbers dipping under 100 million rays/sec for secondary rays on Volta. Radeon Rays shows a similar trend.

Follow the Cash Cache

Incoherent rays can be an order of magnitude slower than coherent shadow rays. There is no changing the physical limits of random access to memory.
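
The trend is easy to reproduce even on a CPU. A hypothetical NumPy micro-benchmark (illustrative only, not a GPU measurement):

    import time
    import numpy as np

    n = 1 << 25                          # 32M floats, far larger than last-level cache
    data = np.random.rand(n).astype(np.float32)
    order = np.random.permutation(n)     # incoherent: a random gather order

    t0 = time.perf_counter(); coherent = data.sum();          t1 = time.perf_counter()
    t2 = time.perf_counter(); incoherent = data[order].sum(); t3 = time.perf_counter()

    print(f"coherent streaming read:  {t1 - t0:.3f} s")
    print(f"incoherent random gather: {t3 - t2:.3f} s")   # typically many times slower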


The Myth That RT Enables a No-Hack Realtime Render
A GPU like the Titan V is in the class of GPUs expected to easily do over 4K at 60 Hz on typical content. Note that 4K at 60 Hz is roughly 0.5 billion pixels per second. Remedy's numbers for non-shadow rays show under 0.5 billion rays/sec. Meaning, quite literally, an average of under one ray per pixel for something like 4K at 60 Hz.
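
The 4K arithmetic, for reference:

    pixels_per_frame = 3840 * 2160
    pixels_per_sec = pixels_per_frame * 60
    print(f"{pixels_per_sec / 1e9:.2f} billion pixels/sec at 4K 60 Hz")   # ~0.50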

Details from SEED's GDC Presentation:


"Hack-Free" tracing requires over an order of magnitude more rays/second than is available. Hacks, or what can be more accurately described as "special-purpose optimizations" are here to stay indefinitely.


Comments Like "GPU Real-Time Ray-Tracing Will Soon Be Practical"
How so?


IMO we need at least a 10x increase in rays/second from where the Titan V is, in combination with a 30x reduction in GPU cost, or roughly a 300x (10 x 30) change in rays/second per dollar. Unfortunately I'll be long dead before that could happen.

The next step, IMO, in understanding the real costs of RT would involve building out a scene that is typical of games like, say, Uncharted: long view distances with a forest moving in the wind, a crowd of animated characters, lots of particle effects, and a player who can dynamically interact with the world.


To Quick Sort or Radix Sort?
Ray-tracing is and will always be the "quick sort" of rendering: log-time searching.

For real-time I'd personally rather invest in the "radix sort" of rendering: do better than log-time.

My ask as a developer is simple: expose the actual hardware, without wrapping it in a giant software emulation layer for ray-tracing, as there are likely even better non-log-time (aka actually real-time) use cases.


Messaging?
Lastly, how should I interpret a single-hardware-vendor GDC launch of a new DX API which, as Remedy's GDC slides make clear, runs on a single-vendor, closed-to-modification GameWorks RTX API labeled as DXR, wrapped in marketing with a benchmark vendor as one of the few early adopters?