20130821 - Modern Racing the Beam is a Great Idea
First estimate the head location for the next frame. Given a best prediction, render the stereo view pairs. This provides 2 frames of data from 2 different views. These views can be rendered at a lower resolution, even on a still frame, by ensuring enough vertical and horizontal offset to be able to reconstruct a high resolution view (like rendering the stereo pair on different 2xMSAA samples aligned for the infinite background). Total pixel shader cost remains under control.
To render the actual view, render quads which are 1 to N ROP tiles tall by frame width. Order quads from top to bottom. The vertex shader spins doing uncached global memory reads, waiting on the GPU to update a frame counter on v-sync. Then it fetches the latest head position from a pinned CPU memory queue. Without PCIe atomic operations to ensure write order, just read both the matrix data and a hash of the matrix data, and keep retrying until the hash check passes.
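The hash-and-retry read can be sketched on the CPU in plain C. This is only an illustration of the idea, not shader code: the PoseSlot layout, the function names, and the choice of FNV-1a as the hash are all assumptions made for the example, and a real version would run the reader loop in the vertex shader against pinned memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one slot in the pinned CPU memory queue. */
typedef struct {
    float    matrix[16]; /* head rotation and translation */
    uint32_t hash;       /* hash of matrix[], written after the matrix */
} PoseSlot;

/* FNV-1a over the raw matrix bytes (an assumed choice; any cheap hash works). */
uint32_t pose_hash(const float matrix[16]) {
    const unsigned char *bytes = (const unsigned char *)matrix;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < 16 * sizeof(float); ++i) {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}

/* Writer (CPU tracking thread): store the matrix, then its hash. */
void pose_publish(volatile PoseSlot *slot, const float matrix[16]) {
    assert(slot != NULL);
    for (int i = 0; i < 16; ++i) slot->matrix[i] = matrix[i];
    slot->hash = pose_hash(matrix);
}

/* Reader: one attempt. Copies the matrix and the stored hash, then
 * returns true only if the copied matrix hashes to the copied value. */
bool pose_try_read(const volatile PoseSlot *slot, float out[16]) {
    assert(slot != NULL);
    for (int i = 0; i < 16; ++i) out[i] = slot->matrix[i];
    uint32_t seen = slot->hash;
    return pose_hash(out) == seen;
}

/* Reader: keep retrying until a consistent snapshot is observed. */
void pose_read(const volatile PoseSlot *slot, float out[16]) {
    while (!pose_try_read(slot, out)) {
        /* torn or half-written data; read again */
    }
}
```

The point of the design is that no write ordering is required: a half-updated matrix simply fails the check and gets read again. The cost is a small chance of a hash collision accepting torn data, which a wider hash would reduce.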
The pixel shader uses the just-in-time rotation and translation data, passed in as attributes from the vertex shader, to reconstruct the correct view for the tracked head position. It does this with a quick image-space search, blending between the set of 2 images for the current frame and the 2 from the prior frame. This pixel shader writes to the front buffer and does whatever warp is required for the VR display unit.
This is the modern version of the tricks some of us used to do years ago, like reprogramming the VGA palette registers each scan line to simulate true color on a 256-color video mode.
At least for GL, vendors just need to allow some API to get a front buffer surface, and something which increments a 32-bit frame counter in a GPU-side buffer on a v-sync interrupt.
Not sure if modern displays have reduced blanking times, or if blanking time after the interrupt is too little or too much. If the blanking time after the interrupt is too much, then use clock() instruction access (which GL doesn't currently provide) in the vertex shader to spin more?
Combine this with 120 Hz displays set to refresh with a strobing backlight. Scan-out is around 8 ms, so in theory 10 ms of latency might be possible on current hardware.
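The 8 ms figure is just the refresh period. A rough sketch of the arithmetic in C, assuming blanking intervals are ignored and the frame is split into equal-height strips (the function names and the `strips` parameter are made up for this example):

```c
#include <assert.h>

/* Refresh period in milliseconds, ignoring blanking (an assumption). */
double refresh_period_ms(double refresh_hz) {
    assert(refresh_hz > 0.0);
    return 1000.0 / refresh_hz;
}

/* Time the beam spends inside one strip when the frame is divided into
 * `strips` equal-height strips drawn top to bottom. */
double strip_scanout_ms(double refresh_hz, int strips) {
    assert(strips > 0);
    return refresh_period_ms(refresh_hz) / strips;
}
```

At 120 Hz, refresh_period_ms gives about 8.33 ms for the full scan-out, and splitting the frame into 8 strips would leave roughly 1.04 ms of beam time per strip.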