20090714 - Hybrid Cached Raycast and Reproject Raster
I've been suggesting this idea and permutations of this idea partly because I'd like to see someone try it, and I'm busy "traversing" a different solution to a similar problem (just geared towards dynamic rather than more static geometry). Undoubtedly there are those who have thought of this, just I haven't seen a paper on it yet.
The rough idea is to amortize the ray traversal cost over many frames and to make use of both spacial and temporal coherence. This shares a lot of similarities to video compression, in regards to reprojection being like motion prediction in B/P-frames, and the tile cache being like bits of I-frames.
The idea uses a traversal tile cache. A tile being a warp or vector (SIMD) friendly packet of the result of raycasts into an object in the scene. For example for CUDA, a tile might be a 8x4 grid of raycasts. An object might be the entire static world, or a character, or a dynamic object. Tiles are associated with an object ID so that the tile can be correctly reprojected as the object and view moves.
Tracing into a skinned character is a separate problem for which I don't have a good direct solution. However, given a solution to this problem, one could use the per tile object ID to "re-skin" the tile during reprojection. Actually one would want to store the tile in the base pose coordinate space, and simply apply the bones to the fragments in the tile to do the reprojection.
Tiles in the cache become less valid as objects move or the view changes. Each frame the cache gets updated to fill areas of the view(s) (note, there might be multiple views such as shadowmaps, etc) which have sparse coverage. Tile cache update rate would scale to match the capacity of the GPU.
Drawing the frame would be done via a composite pass which re-projects, lights, and blends all the cached tiles (and fills in any holes). Part of lighting could very well be done in tile space, and at a frequency lower than the screen refresh for low frequency lighting, and perhaps in image space at the screen refresh rate for high frequency lighting like specular and hard shadows.
Ultimately the success and failure of this type of system depends on if the composite/reprojection cost is lower than the full traversal.
Compared to What I'm Doing
With classic ray traversal the structure is geared towards answering the question "if a ray intersects the object, what does it hit?". With my structure the traversal is geared towards answering the question, "given a lower quality version of the object in this 3D region, what does the next higher quality level of detail look like". In my case the traversal could be of any tree, even if the tree self intersects. In the case of a skinned object, classic ray traversal is a complex problem, in my case it is trivial.
I go directly down the tree without divergence, because my traversal doesn't go by ray intersection. However I depend on a reprojection scatter pass (which can be redone as a coarse binned scatter, local gather, or whatever form is fastest on the hardware). This reprojection scatter pass is used to prune out traversals of the tree which are not adding to visible nodes.
My scene is that of a forest of l-systems (64K roots, 2M nodes). Each traversal step is amazingly expensive (decompress rule, apply rule) compared to a triangle or voxel raycast or raytrace, and in my case the composite/reprojection step isn't as costly as traversal, so the hybrid traverse/raster works.
Not sure if the hybrid cached trace/raster approach will work well for classic raycasting, but I'd bet one of you out there will be answering this question in the next year or so!