20120110 - Single Pass SRAA

Here is a better description of a possible single-geometry-pass SRAA. I'm using OpenGL in this example because that is what I work in most often.

Use "layout(early_fragment_tests)". Described in GL_EXT_shader_image_load_store:

"When early per-fragment operations are enabled, the depth bounds test, stencil test, depth buffer test, and occlusion query sample counting operations are performed prior to fragment shader execution, and the stencil buffer, depth buffer, and occlusion query sample counts will be updated accordingly. When early per-fragment operations are enabled, these operations will not be performed again after fragment shader execution."

The render pass will write Z as standard and write the lower 8 bits of gl_PrimitiveID into an MSAA render target.

The render pass will fetch the coverage mask, gl_SampleMaskIn[], and check if the shaded fragment covers the sample associated with the non-MSAA g-buffer. If the sample is not covered, the shader exits here.
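The coverage check above comes down to a single bit test. Here is a minimal CPU-side sketch in C++ that models it; this is not shader code. The names coversSample, shouldWriteGBuffer, and kGBufferSample are made up for illustration, and picking sample 0 as the one tied to the non-MSAA g-buffer is purely an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: sample 0 is the sample associated with
// the non-MSAA g-buffer texel. The post does not say which sample is used.
constexpr unsigned kGBufferSample = 0;

// Returns true when bit `sample` is set in the coverage mask.
// `coverageMask` stands in for gl_SampleMaskIn[0], where bit n set
// means sample n is covered by the fragment.
bool coversSample(std::uint32_t coverageMask, unsigned sample) {
    assert(sample < 32);  // shifting a 32-bit mask by 32 or more is undefined
    return ((coverageMask >> sample) & 1u) != 0;
}

// Models the early exit: only a fragment that covers the g-buffer's
// sample continues on to write g-buffer data.
bool shouldWriteGBuffer(std::uint32_t coverageMask) {
    return coversSample(coverageMask, kGBufferSample);
}
```

With 4xMSAA, a mask of 0xE (samples 1 to 3 covered, sample 0 not) would take the early exit under this assumption, while a mask of 0x1 would go on to write.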

I do not remember offhand whether gl_SampleMaskIn[] holds pre- or post-z-test coverage when "early_fragment_tests" is enabled (this is not strictly defined in the GL specs). If coverage is post-z-test, then all is good. If it is pre-z-test, then the fragment shader may be executed even if the associated "non-MSAA sample" is occluded, but at least in this state one of the samples in the pixel is non-occluded. Can attempt to correct for this case later by also storing the associated lower 8 bits of gl_PrimitiveID in the g-buffer.
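One way to read the correction idea, sketched as a CPU-side C++ model: a later pass compares the 8-bit ID stored in the g-buffer against the 8-bit ID that the MSAA target holds for the same sample, and treats a mismatch as a g-buffer texel written by an occluded fragment. This is my interpretation and not a tested method. The function names lowPrimitiveId and gbufferMatchesVisibleSample are hypothetical.

```cpp
#include <cstdint>

// Keep only the lower 8 bits of a primitive ID, as the render pass
// would write them to the MSAA target (and, for the correction, to
// the g-buffer as well).
std::uint8_t lowPrimitiveId(std::uint32_t primitiveId) {
    return static_cast<std::uint8_t>(primitiveId & 0xFFu);
}

// Hypothetical later check: trust the g-buffer texel only if its stored
// ID matches the ID the z-tested MSAA target holds for the g-buffer's
// sample. IDs that differ by a multiple of 256 alias and will also
// compare equal, so this can only be a best-effort test.
bool gbufferMatchesVisibleSample(std::uint8_t gbufferId,
                                 std::uint8_t msaaSampleId) {
    return gbufferId == msaaSampleId;
}
```

The aliasing is the cost of keeping only 8 bits: primitives 3 and 259 look identical to this check.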

If the fragment shader does not early exit, then it writes g-buffer output using image stores to a set of non-MSAA images (not using ROP).

In theory there is a possible ordering problem here, in that there is no ordering on image stores, but in practice I'm guessing this isn't a problem on current hardware. Would have to test this to be sure. If it is a problem there are multiple workarounds...