20110403 - TSSAA (Temporal Super-Sampling AA)


Here is a prototype of Temporal Super-Sampling Anti-Aliasing (TSSAA) which addresses the ghosting and reprojection problems of prior temporal methods, while supporting quality around 8x SSAA.


Performance

The pixel shader below makes 10 32-bit texture fetches per pixel (9 from the current frame, 1 from the previous TSSAA output). Estimated cost is around 1.7ms for a 1280x720 frame on Xbox360.


Screen Shots

Images are ripped from a simple GL test app which uses a modified Unigine screenshot as a texture. Images are presented at 2x size. The second shot shows the extreme motion case (where TSSAA slightly blurs the image).

TSSAA without motion (16x sub-pixel jitter from below)


TSSAA with fast motion (16x sub-pixel jitter from below)


No AA


How it Works

Yakiimo3D's DX11 Perspective Matrix Jittering Temporal AA is a good place to get some background on the Halo/Crysis method for temporal 2xSSAA, which is a similar but slightly different temporal AA method (combining samples instead of accumulating samples).

This TSSAA algorithm reprojects the result from the prior TSSAA output frame, then limits this reprojection by a smooth minimum and maximum color computed from the neighborhood of the pixel in the current frame. This limiter removes ghosting and other temporal artifacts caused by incorrect reprojection. The current pixel is then accumulated with the limited reprojection using a simple falloff.
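The limit-then-accumulate step can be sketched for a single color channel as follows. This is a minimal CPU-side illustration, not the shader itself: `resolve_pixel`, `lo`, and `hi` are hypothetical names, and the neighborhood limits are passed in rather than computed.

```python
FALLOFF = 1.0 / 8.0  # same value as TSSAA_FALLOFF in the source below

def resolve_pixel(current, history, lo, hi, falloff=FALLOFF):
    """Blend the current sample with the reprojected history for one channel.

    lo/hi stand in for the smooth neighborhood minimum and maximum
    (hypothetical inputs here; the shader derives them from the current frame).
    """
    limited = min(hi, max(lo, history))  # pull history back into the local range
    return current * falloff + limited * (1.0 - falloff)

# History agrees with the neighborhood: it passes through the limiter untouched.
stable = resolve_pixel(current=0.50, history=0.52, lo=0.40, hi=0.60)

# History is a stale bright value (a would-be ghost): it is limited to hi first.
ghost = resolve_pixel(current=0.50, history=1.00, lo=0.40, hi=0.60)
```

In the second call the stale history contributes as if it were 0.60 rather than 1.00, which is how the limiter keeps a bad reprojection from leaving a trail.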


Artifacts

(1.) Motion will tend to reduce sharpness a little.
(2.) Fast exposure change exhibits light image erosion/dilation.
(3.) Very thin sub-pixel details will still exhibit some temporal dithering.


Advantages Besides Super-Sampling

Haven't had the chance to test in a real game, so would really like to get some feedback if anyone tries this!

TSSAA should remove shader jitter and noise (so film grain effects should likely be applied after TSSAA, perhaps during the write to the back-buffer). In theory things like jittered shadowing, dithered DOF, dithered motion blur, and SSAO without blur will be partly cleaned up by TSSAA, but only if the noise is single pixel noise. Lower frequency noise likely will not get cleaned up as well.
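As a rough way to reason about how much single pixel noise the accumulation could remove, treat the noise as independent from frame to frame and ignore the limiter and any reprojection error (both are assumptions; the limiter in particular lets more noise through). The accumulation is then a plain exponential moving average, and its steady-state noise gain follows from the sum of squared frame weights. `ema_noise_gain` is a hypothetical helper, not part of the shader.

```python
import math

def ema_noise_gain(falloff, frames=10_000):
    """Steady-state standard-deviation gain of
    out = current*falloff + out_prev*(1 - falloff)
    for noise that is independent from frame to frame.

    A frame k frames old carries weight falloff*(1 - falloff)**k, so the output
    variance is the input variance times the sum of those weights squared.
    """
    variance_gain = sum((falloff * (1.0 - falloff) ** k) ** 2 for k in range(frames))
    return math.sqrt(variance_gain)

gain = ema_noise_gain(1.0 / 8.0)

# Closed form of the same geometric series: sqrt(a / (2 - a)).
closed_form = math.sqrt((1.0 / 8.0) / (2.0 - 1.0 / 8.0))
```

With a falloff of 1/8 this gives roughly a quarter of the input noise amplitude under those assumptions. Noise that is correlated across frames or spans several pixels does not fit this model.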

In theory TSSAA should work on transparent objects even if the object's motion vectors are not rendered into the motion vector field and used for reprojection offset. Might even work better when the typically sharper background motion vectors are used.


Temporal Sub-pixel Shift

When using this method, frames need to be drawn with a sub-pixel shift on the projection matrix (different offset for each frame). This is the same type of shift one would use when accumulating frames for a super-sampled AA "press-release" screen shot. Not sure what the best pattern is right now (haven't had the time to try all options). I have been using the following pattern for 4x,

{-0.75,-0.25}, {0.75,0.25}, {0.25,-0.75}, {-0.25,0.75}

Then for 16x, adding in a different offset every 4 frames,

{-0.375,-0.125}, {0.375,0.125}, {0.125,-0.375}, {-0.125,0.375}
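One way to read the two lists above as a 16-frame sequence is sketched below: cycle the 4x pattern every frame and add one of the 16x offsets, advancing that offset every 4 frames. The function name `jitter_for_frame` and the exact ordering are assumptions, and the 4x list is written here as a 4-fold rotation of {0.75, 0.25}.

```python
PATTERN_4X = [(-0.75, -0.25), (0.75, 0.25), (0.25, -0.75), (-0.25, 0.75)]
OFFSETS_16X = [(-0.375, -0.125), (0.375, 0.125), (0.125, -0.375), (-0.125, 0.375)]

def jitter_for_frame(frame_index):
    """Return the (x, y) sub-pixel shift for a frame, repeating every 16 frames."""
    base = PATTERN_4X[frame_index % 4]            # changes every frame
    extra = OFFSETS_16X[(frame_index // 4) % 4]   # changes every 4 frames
    return (base[0] + extra[0], base[1] + extra[1])

# The 16 shifts of one full cycle, in frame order.
sequence = [jitter_for_frame(i) for i in range(16)]
```

Every group of 4 consecutive frames still covers the 4x pattern, so a short history already sees a spread of positions before the full 16 are reached.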

Likely best to use a rotated grid pattern (like hardware AA) but distribute the samples across frames so that the pixel is filled evenly over time.
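Applying the shift to the projection matrix can be sketched as below, assuming an OpenGL-style column-vector convention (clip = P * v, NDC spanning -1 to 1) and an offset already expressed in pixels. The units of the patterns above are not stated, so the conversion to pixels is left to the caller. `jitter_projection` and `project_to_ndc` are hypothetical helpers.

```python
def jitter_projection(proj, offset_px, width, height):
    """Return a copy of a 4x4 projection matrix (indexed proj[row][col]) whose
    output is shifted by offset_px pixels in screen space.

    Shifting NDC by d is the same as adding d * clip.w to clip.x / clip.y,
    which amounts to adding d times the w row to the x and y rows.
    """
    dx = 2.0 * offset_px[0] / width    # one pixel is 2/width in NDC
    dy = 2.0 * offset_px[1] / height
    out = [row[:] for row in proj]     # leave the caller's matrix untouched
    for c in range(4):
        out[0][c] += dx * proj[3][c]
        out[1][c] += dy * proj[3][c]
    return out

def project_to_ndc(proj, v):
    """Multiply a 4x4 matrix by a column vector and apply the perspective divide."""
    clip = [sum(proj[r][c] * v[c] for c in range(4)) for r in range(4)]
    return (clip[0] / clip[3], clip[1] / clip[3])

# Simple symmetric perspective matrix (90 degree FOV, aspect 1, near 1, far 3).
P = [[1.0, 0.0,  0.0,  0.0],
     [0.0, 1.0,  0.0,  0.0],
     [0.0, 0.0, -2.0, -3.0],
     [0.0, 0.0, -1.0,  0.0]]

jittered = jitter_projection(P, (0.25, -0.5), 1280, 720)
```

Because the shift is scaled by clip.w, every point moves by the same screen-space amount regardless of depth. Whether +y is up or down on screen depends on the API and is not handled here.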


When integrating with AFR Rendering

Have not tried this with AFR, so this is only a best guess at how to integrate!

For AFR rendering with two GPUs, integration will need to be adjusted. Instead of depending on the previous N-1 frame output, use the previous N-2 frame output. Then for computing the previous position, offset by the N frame motion vector * 2.0. The algorithm should work fine even if the reprojection vector isn't fully correct.
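The AFR adjustment to the reprojection position might look like the following. The sign convention (the motion vector points from the previous frame's position to the current one, in the same texture-coordinate units as the position) and the name `previous_position` are assumptions; adapt them to the engine's motion vector format.

```python
def previous_position(pos, motion, frames_back=1):
    """Estimate where a pixel was frames_back frames ago.

    pos and motion are (x, y) in texture coordinates; motion is the current
    frame's per-frame motion vector. frames_back=1 is the single-GPU case and
    frames_back=2 is the two-GPU AFR case, where the history is the N-2 output.
    """
    return (pos[0] - motion[0] * frames_back,
            pos[1] - motion[1] * frames_back)

single_gpu = previous_position((0.5, 0.5), (0.25, -0.125))
afr_two_gpu = previous_position((0.5, 0.5), (0.25, -0.125), frames_back=2)
```

Doubling the frame N vector is only exact for constant velocity, which is why the tolerance to imperfect reprojection matters here.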


Source

Note, I've only tested the TSSAA_GLSL_130 path, everything else should work however...

This source is designed to work with the DX10/DX11/GL pixel position instead of the half offset DX9 position. Setting TSSAA_FALLOFF at 1/8 seems to have the best trade-off between sharpness during motion and reduction of temporal dithering. Setting TSSAA_SMOOTHER to zero will make the shader sharper during motion, but the algorithm will be less effective on edge AA on the borders of objects in motion on a non-moving background.
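To get a feel for what a given TSSAA_FALLOFF means in frames, the sketch below computes how much of the output comes from the most recent n frames when the limiter is not active (an idealization; `recent_weight` is a hypothetical helper).

```python
def recent_weight(falloff, n):
    """Fraction of the accumulated result contributed by the latest n frames,
    for out = current*falloff + history*(1 - falloff) with no limiting.

    The frame k frames old has weight falloff*(1 - falloff)**k; summing
    k = 0 .. n-1 gives 1 - (1 - falloff)**n.
    """
    return 1.0 - (1.0 - falloff) ** n

w1 = recent_weight(1.0 / 8.0, 1)    # the current frame alone
w16 = recent_weight(1.0 / 8.0, 16)  # one full pass of a 16-frame jitter cycle
```

With 1/8, a bit under 90% of the result comes from the last 16 frames. A larger falloff shortens that window and a smaller one lengthens it, which is the sharpness versus temporal dithering trade-off described above.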

Note this should be applied using non-HDR and non-sRGB inputs (towards the end of the post processing pipeline)!

/*============================================================================

TSSAA PROTOTYPE by TIMOTHY LOTTES @ NVIDIA

============================================================================*/

/*============================================================================
API PORTING
============================================================================*/
#ifndef TSSAA_GLSL_120
#define TSSAA_GLSL_120 0
#endif
#ifndef TSSAA_GLSL_130
#define TSSAA_GLSL_130 0
#endif
#ifndef TSSAA_HLSL_3
#define TSSAA_HLSL_3 0
#endif
#ifndef TSSAA_HLSL_4
#define TSSAA_HLSL_4 0
#endif
/*--------------------------------------------------------------------------*/
#if TSSAA_GLSL_120
// Requires,
// #version 120
// #extension GL_EXT_gpu_shader4 : enable
#define int2 ivec2
#define float2 vec2
#define float3 vec3
#define float4 vec4
#define TssaaInt2 ivec2
#define TssaaFloat2 vec2
#define TssaaTex sampler2D
#define TssaaTexLod0(t, p) texture2DLod(t, p, 0.0)
#define TssaaTexOff(t, p, o, r) texture2DLodOffset(t, p, 0.0, o)
#endif
/*--------------------------------------------------------------------------*/
#if TSSAA_GLSL_130
// Requires "#version 130" or better
#define int2 ivec2
#define float2 vec2
#define float3 vec3
#define float4 vec4
#define TssaaInt2 ivec2
#define TssaaFloat2 vec2
#define TssaaTex sampler2D
#define TssaaTexLod0(t, p) textureLod(t, p, 0.0)
#define TssaaTexOff(t, p, o, r) textureLodOffset(t, p, 0.0, o)
#endif
/*--------------------------------------------------------------------------*/
#if TSSAA_HLSL_3
#define int2 float2
#define TssaaInt2 float2
#define TssaaFloat2 float2
#define TssaaTex sampler2D
#define TssaaTexLod0(t, p) tex2Dlod(t, float4(p, 0.0, 0.0))
#define TssaaTexOff(t, p, o, r) tex2Dlod(t, float4(p + (o * r), 0, 0))
#endif
/*--------------------------------------------------------------------------*/
#if TSSAA_HLSL_4
#define TssaaInt2 int2
#define TssaaFloat2 float2
struct TssaaTex { SamplerState smpl; Texture2D tex; };
#define TssaaTexLod0(t, p) t.tex.SampleLevel(t.smpl, p, 0.0)
#define TssaaTexOff(t, p, o, r) t.tex.SampleLevel(t.smpl, p, 0.0, o)
#endif

/*============================================================================

VERTEX SHADER

============================================================================*/
float4 TssaaVertexShader(
float2 pos, // Both x and y range {-1.0 to 1.0 across screen}.
float2 rcpFrame) { // {1.0/frameWidth, 1.0/frameHeight}
/*--------------------------------------------------------------------------*/
float4 posPos;
posPos.xy = (pos.xy * 0.5) + 0.5;
posPos.zw = posPos.xy - (rcpFrame * 0.5);
return posPos; }

/*============================================================================

PIXEL SHADER

============================================================================*/
float3 TssaaPixelShader(
float4 posPos, // TssaaVertexShader() output interpolated across screen.
TssaaTex texNext, // Current frame before TSSAA.
TssaaTex texPrev, // Previous TSSAA output.
float2 posPrev, // Position of pixel in previous TSSAA output frame.
float2 rcpFrame) { // Constant {1.0/frameWidth, 1.0/frameHeight}.
/*--------------------------------------------------------------------------*/
#define TSSAA_FALLOFF (1.0/8.0)
#define TSSAA_SMOOTHER 1
/*--------------------------------------------------------------------------*/
// Corner fetches use posPos.zw, which is offset by half a pixel from the pixel center.
// Assuming texNext is sampled with bilinear filtering, each fetch returns a 2x2 box average.
#if TSSAA_SMOOTHER
float3 nextNW = TssaaTexOff(texNext, posPos.zw, TssaaInt2(-1,-1), rcpFrame).xyz;
float3 nextNE = TssaaTexOff(texNext, posPos.zw, TssaaInt2( 2,-1), rcpFrame).xyz;
float3 nextSW = TssaaTexOff(texNext, posPos.zw, TssaaInt2(-1, 2), rcpFrame).xyz;
float3 nextSE = TssaaTexOff(texNext, posPos.zw, TssaaInt2( 2, 2), rcpFrame).xyz;
#else
float3 nextNW = TssaaTexOff(texNext, posPos.zw, TssaaInt2( 0, 0), rcpFrame).xyz;
float3 nextNE = TssaaTexOff(texNext, posPos.zw, TssaaInt2( 1, 0), rcpFrame).xyz;
float3 nextSW = TssaaTexOff(texNext, posPos.zw, TssaaInt2( 0, 1), rcpFrame).xyz;
float3 nextSE = TssaaTexOff(texNext, posPos.zw, TssaaInt2( 1, 1), rcpFrame).xyz;
#endif
/*--------------------------------------------------------------------------*/
// Cross-shaped neighborhood of the current frame, fetched at pixel centers.
float3 nextN = TssaaTexOff(texNext, posPos.xy, TssaaInt2( 0,-1), rcpFrame).xyz;
float3 nextW = TssaaTexOff(texNext, posPos.xy, TssaaInt2(-1, 0), rcpFrame).xyz;
float3 nextM = TssaaTexLod0(texNext, posPos.xy).xyz;
float3 nextE = TssaaTexOff(texNext, posPos.xy, TssaaInt2( 1, 0), rcpFrame).xyz;
float3 nextS = TssaaTexOff(texNext, posPos.xy, TssaaInt2( 0, 1), rcpFrame).xyz;
/*--------------------------------------------------------------------------*/
float3 prevM = TssaaTexLod0(texPrev, posPrev.xy).xyz;
/*--------------------------------------------------------------------------*/
// Smooth min/max from the corner fetches: pairwise sums, halved after the min/max.
float3 nextNN = nextNW + nextNE;
float3 nextSS = nextSW + nextSE;
float3 nextMinH = min(nextNN, nextSS);
float3 nextMaxH = max(nextNN, nextSS);
float3 nextWW = nextNW + nextSW;
float3 nextEE = nextNE + nextSE;
float3 nextMinV = min(nextWW, nextEE);
float3 nextMaxV = max(nextWW, nextEE);
float3 nextMinHV = min(nextMinH, nextMinV) * 0.5;
float3 nextMaxHV = max(nextMaxH, nextMaxV) * 0.5;
/*--------------------------------------------------------------------------*/
float3 nextMinHVN = min(nextMinHV, nextN);
float3 nextMaxHVN = max(nextMaxHV, nextN);
float3 nextMinWM = min(nextW, nextM);
float3 nextMaxWM = max(nextW, nextM);
float3 nextMFalloff = nextM * TSSAA_FALLOFF;
float3 nextMinSE = min(nextE, nextS);
float3 nextMaxSE = max(nextE, nextS);
float3 nextMinHVNWM = min(nextMinHVN, nextMinWM);
float3 nextMaxHVNWM = max(nextMaxHVN, nextMaxWM);
float3 nextMin = min(nextMinSE, nextMinHVNWM);
float3 nextMax = max(nextMaxSE, nextMaxHVNWM);
/*--------------------------------------------------------------------------*/
// Limit the reprojected color to [nextMin, nextMax], then accumulate with the current pixel.
return nextMFalloff + (min(nextMax, max(nextMin, prevM)) * (1.0 - TSSAA_FALLOFF)); }
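
For checking a port, the neighborhood limits computed by the pixel shader can be mirrored on the CPU. The sketch below is a single-channel reference under stated assumptions: bilinear filtering on texNext (so each posPos.zw fetch is an equal-weight 2x2 average), clamp-to-edge addressing, and TSSAA_SMOOTHER set to 1. `neighborhood_limits` and its helpers are hypothetical names.

```python
def texel(img, x, y):
    """Read img[y][x] with clamp-to-edge addressing (an assumption)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def box2x2(img, x, y):
    """Average of the 2x2 block whose top-left texel is (x, y); stands in for
    one bilinear fetch placed exactly on a texel corner."""
    return 0.25 * (texel(img, x, y) + texel(img, x + 1, y) +
                   texel(img, x, y + 1) + texel(img, x + 1, y + 1))

def neighborhood_limits(img, x, y):
    """Return (lo, hi) for pixel (x, y), following the shader's min/max tree."""
    # Four corner boxes, matching offsets (-1,-1), (2,-1), (-1,2), (2,2) from posPos.zw.
    nw = box2x2(img, x - 2, y - 2)
    ne = box2x2(img, x + 1, y - 2)
    sw = box2x2(img, x - 2, y + 1)
    se = box2x2(img, x + 1, y + 1)
    # Pairwise sums halved: averages of the north, south, west and east box pairs.
    pairs = [(nw + ne) * 0.5, (sw + se) * 0.5, (nw + sw) * 0.5, (ne + se) * 0.5]
    # Cross-shaped neighborhood read at texel centers.
    cross = [texel(img, x, y - 1), texel(img, x - 1, y), texel(img, x, y),
             texel(img, x + 1, y), texel(img, x, y + 1)]
    return min(pairs + cross), max(pairs + cross)
```

On a flat image the two limits coincide, so any reprojected value is forced to the current color; a single bright texel two pixels away diagonally only raises the upper limit by an eighth of its value, which is the "smooth" part of the min/max.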