20090709 - Stochastic Visibility Update
I haven't been posting what I've been working on in graphics for a while because I've been too busy with other non-graphics stuff, however vacation has given me a bunch of time to think up new ideas (of which I will ultimately only have time to try a select few).
I'm still fascinated by the problem of solving visibility for fully dynamic geometry which would require very expensive traversals if ray cast or traced, but without ray casting (or tracing), instead using what I've termed "stochastic visibility" and blogged about before. The idea is to keep a consistent tree structure of the scene as required by visibility (and shading), then each frame expand and contract the tree only enough to service rendering at the quality level required for human frame-to-frame perception. Stochastic visibility collides one point per scene tree node in an image-space mapping, which both helps compute tree updates and solves for visibility. It is stochastic because each point is randomly positioned inside the bounding volume of its node. Collisions help prune out nodes which yield non-visible geometry, and they also directly solve the tree node memory allocation problem (there is no memory allocation).
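To make the collision idea concrete, here is a minimal CPU-side sketch of one frame of point collision. The node layout (screen-space bounds plus depth), the use of depth as collision priority, and the dictionary standing in for an image-space buffer are all my assumptions for illustration, not the actual GPU implementation:

```python
import random

def stochastic_collide(nodes, width, height):
    """Splat one random point per node into an image-space grid; per cell,
    only the nearest point survives. Nodes whose point never wins a cell
    are candidates for pruning. Hypothetical sketch of the technique."""
    grid = {}  # (x, y) cell -> (depth, node_id)
    for node_id, (bmin, bmax, depth) in nodes.items():
        # random position inside the node's (screen-space) bounding volume
        x = int(random.uniform(bmin[0], bmax[0]))
        y = int(random.uniform(bmin[1], bmax[1]))
        if not (0 <= x < width and 0 <= y < height):
            continue
        cell = (x, y)
        if cell not in grid or depth < grid[cell][0]:
            grid[cell] = (depth, node_id)
    # survivors are the nodes which won their collision this frame
    return {node_id for depth, node_id in grid.values()}

# tiny demo: B sits exactly behind A (same cell, greater depth), C is elsewhere
nodes = {
    "A": ((5.0, 5.0), (5.0, 5.0), 1.0),    # near node
    "B": ((5.0, 5.0), (5.0, 5.0), 5.0),    # occluded, collides with A and loses
    "C": ((20.0, 20.0), (20.0, 20.0), 3.0),
}
survivors = stochastic_collide(nodes, 64, 64)
```

Because the survivor set is bounded by the number of grid cells, the collision pass itself doubles as the memory allocator: there is a fixed slot per cell and nothing else to allocate.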
I might just have to force myself to get the per frame JPEG output working again so that I can post some videos of the older stochastic visibility working on the GTX275 prior to Siggraph 2009.
Vacation has provided some new ideas to improve upon "stochastic visibility" which I really want to get to (putting them here so that I don't forget):
(1.) IMPROVED TEMPORAL CONSISTENCY VS HISTORY BUFFER. Collisions would sometimes prune out part of the required scene tree branch, and keeping persistent nodes was far too costly in OpenGL3 with GPGPU methods on an 8600-level graphics card. One trivial solution to this problem would be to run a pass which checks the source data for collisions and does a reduction of the source nodes, keeping the highest-priority node which had a collision. With a set of reducing-resolution history buffers, I could likely ensure good temporal consistency (solving the random pruning problem).
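A sketch of that reduction, assuming the full-resolution collision buffer holds `(priority, node_id)` entries (the buffer layout and the "higher number wins" priority ordering are my assumptions): each 2x2 reduction step keeps the highest-priority survivor, so a node randomly pruned at full resolution persists in a coarser history buffer and its branch can be recovered next frame.

```python
def reduce_history(buffer):
    """One 2x2 reduction step over a square collision buffer.
    Each coarse cell keeps the highest-priority surviving node from
    its four fine cells. Minimal sketch of the idea, not GPU code."""
    n = len(buffer) // 2
    out = [[None] * n for _ in range(n)]
    for cy in range(n):
        for cx in range(n):
            best = None
            for dy in (0, 1):
                for dx in (0, 1):
                    entry = buffer[2 * cy + dy][2 * cx + dx]  # (priority, id) or None
                    if entry is not None and (best is None or entry[0] > best[0]):
                        best = entry
            out[cy][cx] = best
    return out

# demo: build the reducing-resolution chain from a sparse 4x4 buffer
E = None
buf = [
    [(3, "X"), E, E, E],
    [E, (5, "Z"), E, E],
    [E, E, E, (7, "Y")],
    [E, E, E, E],
]
mip1 = reduce_history(buf)   # 2x2 history buffer
mip2 = reduce_history(mip1)  # 1x1 history buffer
```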
(2.) ADD PARENT LINKS. My scene tree nodes didn't have links to their parents. This was a problem for animated L-systems and particle fluid dynamics, because I wanted a physical constraint which pulled nodes back to a static position in the parent (as defined by a possibly animated L-system rule). It turns out the history buffer pass enables me to correct parent links in child nodes, even though parents move to different memory locations each frame. If this works, this GPU data structure truly becomes awesome, because it solves the 1M-node memory allocation problem per frame with fully dynamic trees, gives automatic "defragmentation" of memory, automatic regrouping for good data/branch locality and cache performance, and now nodes maintaining parent links even though all memory locations change per frame.
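The link-correction step itself is simple once you have it: if the per-frame compaction pass can emit an old-slot-to-new-slot map (a hypothetical output of the history buffer pass; the node and map formats below are illustrative), every child just rewrites its stale parent index through that map:

```python
def fix_parent_links(nodes, old_to_new):
    """Rewrite each child's stale parent index through the old->new
    relocation map produced by this frame's compaction pass.
    Sketch only; node format and map are illustrative assumptions."""
    for node in nodes:
        if node["parent"] is not None:
            node["parent"] = old_to_new[node["parent"]]
    return nodes

# demo: after compaction the root landed in new slot 0, but the child
# still holds the root's old slot index (3) from last frame
nodes = [
    {"id": "root",  "parent": None},
    {"id": "child", "parent": 3},  # stale link into last frame's layout
]
old_to_new = {3: 0, 7: 1}  # old slot -> new slot from the history pass
fix_parent_links(nodes, old_to_new)
```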
(3.) MULTI-LEVEL TREE UPDATE PER FRAME. With LOD transparent blend-in and a triangle-based scene, previous results showed that I can add/prune nodes at only one LOD level per frame and still effectively service visibility, given a conservative amount of overlap. However, when the scene tree goes down to the pixel or near-pixel level, the tree must be able to expand by more than one level per frame to fill visibility gaps in dynamic geometry. I'm planning a new method where I first re-project the current node set, then do a hierarchical image-space reduction to choose the highest-priority nodes for a multi-level tree update of varying depth per frame.
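One way the selection half of that could look, sketched on the CPU: bucket the re-projected nodes into coarse screen tiles and keep only the top-k priority candidates per tile, which are then allowed to expand by multiple levels. The node tuple format, the tile size, and the per-tile budget `k` are all stand-in assumptions for the actual hierarchical GPU reduction:

```python
import heapq

def select_expansion_nodes(nodes, tile, k):
    """Bucket re-projected nodes (x, y, priority, id) into coarse screen
    tiles and keep the k highest-priority candidates per tile.
    Illustrative stand-in for a hierarchical image-space reduction."""
    tiles = {}
    for x, y, priority, node_id in nodes:
        key = (int(x) // tile, int(y) // tile)
        tiles.setdefault(key, []).append((priority, node_id))
    chosen = {}
    for key, entries in tiles.items():
        chosen[key] = [nid for _, nid in heapq.nlargest(k, entries)]
    return chosen

# demo: two nodes contend for tile (0,0); only the higher priority expands
nodes = [(3, 3, 0.9, "a"), (5, 5, 0.2, "b"), (40, 40, 0.5, "c")]
chosen = select_expansion_nodes(nodes, tile=32, k=1)
```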
If all goes well, I'll be fixing the problems I had with my engine designed with only mid-range OpenGL3 GPU requirements. Sure, this is distracting me from my Compute 2 CUDA work, but the market possibility of making it work on a majority of GPUs, vs only DX11 and some DX10-level GPUs, is just too great to ignore...
Random Thoughts on Temporal and Spatial Locality
For real-time rendering, in many scenes a large part of the low-frequency lighting terms don't change much per frame, and could probably be amortized across many frames. Could splitting lighting into different solutions for high- and low-frequency contributions work well? (We game devs kind of already do this; think of Gears2 SSAO.) Low frequency typically requires more samples (such as diffuse GI or AO), which could be amortized over more frames. High frequency often requires fewer (such as mirror reflections or specular highlights), which cannot be amortized over frames (except perhaps when faked with cubemaps). Perhaps this can be pushed further.
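The simplest form of that amortization is an exponential moving average on the low-frequency term while the high-frequency term stays per-frame. This is only a scalar sketch of the split (the blend weight `alpha` and the additive high/low combination are my illustrative assumptions, not a claim about any specific engine):

```python
def shade(high_freq, low_freq_sample, history, alpha=0.1):
    """Per-frame high-frequency term plus a low-frequency term amortized
    across frames with an exponential moving average; one new noisy
    GI/AO sample per frame is enough because history carries the rest."""
    history = history + alpha * (low_freq_sample - history)
    return high_freq + history, history

# demo: with a constant low-frequency signal of 1.0 and a 0.2 specular
# term, the amortized history converges toward the true value over frames
history = 0.0
color = 0.0
for _ in range(50):
    color, history = shade(0.2, 1.0, history)
```

The catch, as with any temporal accumulation, is invalidating the history when the scene or camera changes faster than the blend can track.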
At some point triangles become micro-polys, or perhaps are point rendered, or voxel rendered. When we reach that point, does it make sense to group these tiny things for shading not just by spatial locality, but also by normal (or better yet, by the sample directions required to service the BRDF)? The idea is that you'd have a group of fragments, all of which likely need a very similar subset of fetched samples or raycasts (and in the low-frequency diffuse case, could share lots of raycasts or samples). Note this grouping could be static in the geometry structure. Or perhaps this just means the low-frequency lighting path has built-in separate temporal reprojection with depth/normal-aware low-pass filtering.
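A toy version of that grouping key: bucket fragments by a coarse spatial cell AND a quantized normal, so everything in one bucket can plausibly share its low-frequency samples or raycasts. The fragment format, the cell size, and the crude 6-direction normal quantization are all assumptions made for the sketch:

```python
def quantize_normal(n):
    # crude 6-bucket quantization: dominant axis plus its sign
    ax = max(range(3), key=lambda i: abs(n[i]))
    return (ax, n[ax] >= 0.0)

def group_fragments(fragments, cell=8.0):
    """Bucket (position, normal) fragments by coarse spatial cell and
    quantized normal direction; each bucket is a candidate group for
    sharing fetched samples/raycasts. Illustrative sketch only."""
    groups = {}
    for pos, normal in fragments:
        key = (tuple(int(c // cell) for c in pos), quantize_normal(normal))
        groups.setdefault(key, []).append((pos, normal))
    return groups

# demo: two nearby up-facing fragments share a group; the down-facing
# one lands in its own group despite being in the same spatial cell
frags = [
    ((1, 1, 0), (0, 0, 1)),
    ((2, 3, 0), (0, 0, 1)),
    ((1, 2, 0), (0, 0, -1)),
]
groups = group_fragments(frags)
```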
I've been thinking deferred shading via DX11 CS5 would be quite fun, but it looks as if dynamic thread SIMD re-grouping overhead is just a little too expensive for many fun things (and insane amounts of fast stencil just aren't going to cut it). DX11 doesn't expose any horizontal SIMD scan instruction, so likely not for this generation (except via non-portable means on certain hardware)? Still, CS opens up a huge number of deferred techniques to efficient implementation. Those who shine with DX11 CS will be finding really good ways to have lots of dynamic shadows...