20161018 - Fixed Point Rounding


Posting a few more notes while in the background I continue to work towards the next design try.

=====================
  DSP FUNCTIONALITY
=====================
Some of the Xilinx 7 series DSP functionality and base pipeline (where p as input is from clk-1),

  clk-2  clk-1     clk
  =====  ========  ===========
         c=        p{op}=p
         c=        p{op}=(p>>17)
         c=        p{op}=c
         c=        p=c
  a=,b=  m=a*b     p=m 
  a=,b=  m=a*b     p+=m 
  a=,b=  m=a*b     p-=m 
  a=,b=  m=a*b     p=(p>>17)+m 
  a=,b=  m=a*b     p=(p>>17)-m 
  a=,b=  m=a*b,c=  p=c+m 
  a=,b=  m=a*b,c=  p=c-m
  a=,b=  t=a:b     p{op}=t
  a=,b=  t=a:b     p=(p>>17){op}t
  a=,b=  t=a:b,c=  p=c{op}t

For an 18-bit word accumulator based machine, it is possible to fast min/max
Uses the internal DSP result forwarding paths

  // signed min and max in 3 cycles
  p=min(p,c); -> p-=c; p= (p>>17)&p; p+=c;
  p=max(p,c); -> p-=c; p=~(p>>17)&p; p+=c;

Forwarding is going to be useful for going to one thread and mitigating pipelining

  // branch free, where x and y are immediates in 2 cycles
  p=(p< 0)?x:y; -> p= (p>>17)&(x-y); p+=y;
  p=(p>=0)?x:y; -> p=~(p>>17)&(x-y); p+=y;


====================
  DSP AND ROUNDING
====================
Didn't see how to efficiently implement "round half to even"
However "round half away from zero" seems easy to implement
Logic for rounding (hopefully I got that right),

  if(x>=0) x++; // cin in second example
  x+=(1<<(n-1))-1; // c input in second example
  x>>=n;

DSP supports CIN (carry in) support
Which can take either the inverted sign of P 
(dsp output feed back in the next cycle),
or the inverted sign of a*b
DSP can do the following {multiply, round, shift} 
at a throughput of 0.5 clocks,

  p=a*b+cin+c; // throughput of one clock
  p=c+(p>>17); // gets dedicated forwarding path

Common usage case,

  // divide by a constant via multiply by reciprocal, 
  // or multiply by {0 to 131071} representing {0.0 to nearly 1.0}
  a=number; // up to 25-bit signed number on 7 series DSP
  b=fraction; // second argument is 18-bits signed
  p=a*b+cin+65535;
  p=p>>17;

For reference
Full table for 3-bit signed numbers a*b multiply 
with round away from zero before shift right by 2,

a    b       float   int   binary      a*b+cin   +round     output
------------------------   ------------------------------------------
-4 * -1.00 =  4.00 ->  4   100 * 100 = 010001 -> 0010010 -> 0100 =  4 (overflows) 
-4 * -0.75 =  3.00 ->  3   100 * 101 = 001101 -> 0001110 -> 0011 =  3
-4 * -0.50 =  2.00 ->  2   100 * 110 = 001001 -> 0001010 -> 0010 =  2
-4 * -0.25 =  1.00 ->  1   100 * 111 = 000101 -> 0000110 -> 0001 =  1
-4 *  0.00 = -0.00 ->  0   100 * 000 = 000001 -> 0000010 -> 0000 =  0
-4 *  0.25 = -1.00 -> -1   100 * 001 = 111100 -> 1111101 -> 1111 = -1
-4 *  0.50 = -2.00 -> -2   100 * 010 = 111000 -> 1111001 -> 1110 = -2
-4 *  0.75 = -3.00 -> -3   100 * 011 = 110100 -> 1110101 -> 1101 = -3
------------------------   ------------------------------------------
-3 * -1.00 =  3.00 ->  3   101 * 100 = 001101 -> 0001110 -> 0011 =  3
-3 * -0.75 =  2.25 ->  2   101 * 101 = 001010 -> 0001011 -> 0010 =  2
-3 * -0.50 =  1.50 ->  2   101 * 110 = 000111 -> 0001000 -> 0010 =  2
-3 * -0.25 =  0.75 ->  1   101 * 111 = 000100 -> 0000101 -> 0001 =  1
-3 *  0.00 = -0.00 ->  0   101 * 000 = 000001 -> 0000010 -> 0000 =  0
-3 *  0.25 = -0.75 -> -1   101 * 001 = 111101 -> 1111110 -> 1111 = -1
-3 *  0.50 = -1.50 -> -2   101 * 010 = 111010 -> 1111011 -> 1110 = -2
-3 *  0.75 = -2.25 -> -2   101 * 011 = 110111 -> 1111000 -> 1110 = -2
------------------------   ------------------------------------------
-2 * -1.00 =  2.00 ->  2   110 * 100 = 001001 -> 0001010 -> 0010 =  2
-2 * -0.75 =  1.50 ->  2   110 * 101 = 000111 -> 0001000 -> 0010 =  2
-2 * -0.50 =  1.00 ->  1   110 * 110 = 000101 -> 0000110 -> 0001 =  1
-2 * -0.25 =  0.50 ->  1   110 * 111 = 000011 -> 0000100 -> 0001 =  1
-2 *  0.00 = -0.00 ->  0   110 * 000 = 000001 -> 0000010 -> 0000 =  0
-2 *  0.25 = -0.50 -> -1   110 * 001 = 111110 -> 1111111 -> 1111 = -1
-2 *  0.50 = -1.00 -> -1   110 * 010 = 111100 -> 1111101 -> 1111 = -1
-2 *  0.75 = -1.50 -> -2   110 * 011 = 111010 -> 1111011 -> 1110 = -2
------------------------   ------------------------------------------
-1 * -1.00 =  1.00 ->  1   111 * 100 = 000101 -> 0000110 -> 0001 =  1
-1 * -0.75 =  0.75 ->  1   111 * 101 = 000100 -> 0000101 -> 0001 =  1
-1 * -0.50 =  0.50 ->  1   111 * 110 = 000011 -> 0000100 -> 0001 =  1
-1 * -0.25 =  0.25 ->  0   111 * 111 = 000010 -> 0000011 -> 0000 =  0
-1 *  0.00 = -0.00 ->  0   111 * 000 = 000001 -> 0000010 -> 0000 =  0
-1 *  0.25 = -0.25 ->  0   111 * 001 = 111111 -> 0000000 -> 0000 =  0
-1 *  0.50 = -0.50 -> -1   111 * 010 = 111110 -> 1111111 -> 1111 = -1
-1 *  0.75 = -0.75 -> -1   111 * 011 = 111101 -> 1111110 -> 1111 = -1
------------------------   ------------------------------------------
 0 * -1.00 = -0.00 ->  0   000 * 100 = 000001 -> 0000010 -> 0000 =  0
 0 * -0.75 = -0.00 ->  0   000 * 101 = 000001 -> 0000010 -> 0000 =  0
 0 * -0.50 = -0.00 ->  0   000 * 110 = 000001 -> 0000010 -> 0000 =  0
 0 * -0.25 = -0.00 ->  0   000 * 111 = 000001 -> 0000010 -> 0000 =  0
 0 *  0.00 =  0.00 ->  0   000 * 000 = 000001 -> 0000010 -> 0000 =  0
 0 *  0.25 =  0.00 ->  0   000 * 001 = 000001 -> 0000010 -> 0000 =  0
 0 *  0.50 =  0.00 ->  0   000 * 010 = 000001 -> 0000010 -> 0000 =  0
 0 *  0.75 =  0.00 ->  0   000 * 011 = 000001 -> 0000010 -> 0000 =  0
------------------------   ------------------------------------------
 1 * -1.00 = -1.00 -> -1   001 * 100 = 111100 -> 1111101 -> 1111 = -1
 1 * -0.75 = -0.75 -> -1   001 * 101 = 111101 -> 1111110 -> 1111 = -1
 1 * -0.50 = -0.50 -> -1   001 * 110 = 111110 -> 1111111 -> 1111 = -1
 1 * -0.25 = -0.25 ->  0   001 * 111 = 111111 -> 0000000 -> 0000 =  0
 1 *  0.00 =  0.00 ->  0   001 * 000 = 000001 -> 0000010 -> 0000 =  0
 1 *  0.25 =  0.25 ->  0   001 * 001 = 000010 -> 0000011 -> 0000 =  0
 1 *  0.50 =  0.50 ->  1   001 * 010 = 000011 -> 0000100 -> 0001 =  1
 1 *  0.75 =  0.75 ->  1   001 * 011 = 000100 -> 0000101 -> 0001 =  1
------------------------   ------------------------------------------
 2 * -1.00 = -2.00 -> -2   010 * 100 = 111000 -> 1111001 -> 1110 = -2
 2 * -0.75 = -1.50 -> -2   010 * 101 = 111010 -> 1111011 -> 1110 = -2
 2 * -0.50 = -1.00 -> -1   010 * 110 = 111100 -> 1111101 -> 1111 = -1
 2 * -0.25 = -0.50 -> -1   010 * 111 = 111110 -> 1111111 -> 1111 = -1
 2 *  0.00 =  0.00 ->  0   010 * 000 = 000001 -> 0000010 -> 0000 =  0
 2 *  0.25 =  0.50 ->  1   010 * 001 = 000011 -> 0000100 -> 0001 =  1
 2 *  0.50 =  1.00 ->  1   010 * 010 = 000101 -> 0000110 -> 0001 =  1
 2 *  0.75 =  1.50 ->  2   010 * 011 = 000111 -> 0001000 -> 0010 =  2
------------------------   ------------------------------------------
 3 * -1.00 = -3.00 -> -3   011 * 100 = 110100 -> 1110101 -> 1101 = -3
 3 * -0.75 = -2.25 -> -2   011 * 101 = 110111 -> 1111000 -> 1110 = -2
 3 * -0.50 = -1.50 -> -2   011 * 110 = 111010 -> 1111011 -> 1110 = -2
 3 * -0.25 = -0.75 -> -1   011 * 111 = 111101 -> 1111110 -> 1111 = -1
 3 *  0.00 =  0.00 ->  0   011 * 000 = 000001 -> 0000010 -> 0000 =  0
 3 *  0.25 =  0.75 ->  1   011 * 001 = 000100 -> 0000101 -> 0001 =  1
 3 *  0.50 =  1.50 ->  2   011 * 010 = 000111 -> 0001000 -> 0010 =  2
 3 *  0.75 =  2.25 ->  2   011 * 011 = 001010 -> 0001011 -> 0010 =  2

Keeping the prior rough draft as well..


Round Half Away From Zero
if(x>=0) x++; // cin in second example
x+=(1<<(n-1))-1; // c input in second example
x>>=n;

Above is what I believe is the correct round half away from zero logic when shifting right by 'n' for signed integers. This can be implemented in the FPGA DSP rather easily. The CIN (carry in) can take either the inverted sign of P (dsp output feed back in the next cycle), or the inverted sign of a*b. DSP can do the following {multiply, round, shift} at a throughput of 0.5 clocks.

p=a*b+cin+c; // throughput of one clock
p=p>>17; // gets dedicated forwarding path, can optionally add or subtract in same cycle

Full table for 3-bit signed numbers a*b multiply with round away from zero before shift right by 2,

a    b       float   int   binary      a*b+cin   +round     output
------------------------   ------------------------------------------
-4 * -1.00 =  4.00 ->  4   100 * 100 = 010001 -> 0010010 -> 0100 =  4 (overflows) 
-4 * -0.75 =  3.00 ->  3   100 * 101 = 001101 -> 0001110 -> 0011 =  3
-4 * -0.50 =  2.00 ->  2   100 * 110 = 001001 -> 0001010 -> 0010 =  2
-4 * -0.25 =  1.00 ->  1   100 * 111 = 000101 -> 0000110 -> 0001 =  1
-4 *  0.00 = -0.00 ->  0   100 * 000 = 000001 -> 0000010 -> 0000 =  0
-4 *  0.25 = -1.00 -> -1   100 * 001 = 111100 -> 1111101 -> 1111 = -1
-4 *  0.50 = -2.00 -> -2   100 * 010 = 111000 -> 1111001 -> 1110 = -2
-4 *  0.75 = -3.00 -> -3   100 * 011 = 110100 -> 1110101 -> 1101 = -3
------------------------   ------------------------------------------
-3 * -1.00 =  3.00 ->  3   101 * 100 = 001101 -> 0001110 -> 0011 =  3
-3 * -0.75 =  2.25 ->  2   101 * 101 = 001010 -> 0001011 -> 0010 =  2
-3 * -0.50 =  1.50 ->  2   101 * 110 = 000111 -> 0001000 -> 0010 =  2
-3 * -0.25 =  0.75 ->  1   101 * 111 = 000100 -> 0000101 -> 0001 =  1
-3 *  0.00 = -0.00 ->  0   101 * 000 = 000001 -> 0000010 -> 0000 =  0
-3 *  0.25 = -0.75 -> -1   101 * 001 = 111101 -> 1111110 -> 1111 = -1
-3 *  0.50 = -1.50 -> -2   101 * 010 = 111010 -> 1111011 -> 1110 = -2
-3 *  0.75 = -2.25 -> -2   101 * 011 = 110111 -> 1111000 -> 1110 = -2
------------------------   ------------------------------------------
-2 * -1.00 =  2.00 ->  2   110 * 100 = 001001 -> 0001010 -> 0010 =  2
-2 * -0.75 =  1.50 ->  2   110 * 101 = 000111 -> 0001000 -> 0010 =  2
-2 * -0.50 =  1.00 ->  1   110 * 110 = 000101 -> 0000110 -> 0001 =  1
-2 * -0.25 =  0.50 ->  1   110 * 111 = 000011 -> 0000100 -> 0001 =  1
-2 *  0.00 = -0.00 ->  0   110 * 000 = 000001 -> 0000010 -> 0000 =  0
-2 *  0.25 = -0.50 -> -1   110 * 001 = 111110 -> 1111111 -> 1111 = -1
-2 *  0.50 = -1.00 -> -1   110 * 010 = 111100 -> 1111101 -> 1111 = -1
-2 *  0.75 = -1.50 -> -2   110 * 011 = 111010 -> 1111011 -> 1110 = -2
------------------------   ------------------------------------------
-1 * -1.00 =  1.00 ->  1   111 * 100 = 000101 -> 0000110 -> 0001 =  1
-1 * -0.75 =  0.75 ->  1   111 * 101 = 000100 -> 0000101 -> 0001 =  1
-1 * -0.50 =  0.50 ->  1   111 * 110 = 000011 -> 0000100 -> 0001 =  1
-1 * -0.25 =  0.25 ->  0   111 * 111 = 000010 -> 0000011 -> 0000 =  0
-1 *  0.00 = -0.00 ->  0   111 * 000 = 000001 -> 0000010 -> 0000 =  0
-1 *  0.25 = -0.25 ->  0   111 * 001 = 111111 -> 0000000 -> 0000 =  0
-1 *  0.50 = -0.50 -> -1   111 * 010 = 111110 -> 1111111 -> 1111 = -1
-1 *  0.75 = -0.75 -> -1   111 * 011 = 111101 -> 1111110 -> 1111 = -1
------------------------   ------------------------------------------
 0 * -1.00 = -0.00 ->  0   000 * 100 = 000001 -> 0000010 -> 0000 =  0
 0 * -0.75 = -0.00 ->  0   000 * 101 = 000001 -> 0000010 -> 0000 =  0
 0 * -0.50 = -0.00 ->  0   000 * 110 = 000001 -> 0000010 -> 0000 =  0
 0 * -0.25 = -0.00 ->  0   000 * 111 = 000001 -> 0000010 -> 0000 =  0
 0 *  0.00 =  0.00 ->  0   000 * 000 = 000001 -> 0000010 -> 0000 =  0
 0 *  0.25 =  0.00 ->  0   000 * 001 = 000001 -> 0000010 -> 0000 =  0
 0 *  0.50 =  0.00 ->  0   000 * 010 = 000001 -> 0000010 -> 0000 =  0
 0 *  0.75 =  0.00 ->  0   000 * 011 = 000001 -> 0000010 -> 0000 =  0
------------------------   ------------------------------------------
 1 * -1.00 = -1.00 -> -1   001 * 100 = 111100 -> 1111101 -> 1111 = -1
 1 * -0.75 = -0.75 -> -1   001 * 101 = 111101 -> 1111110 -> 1111 = -1
 1 * -0.50 = -0.50 -> -1   001 * 110 = 111110 -> 1111111 -> 1111 = -1
 1 * -0.25 = -0.25 ->  0   001 * 111 = 111111 -> 0000000 -> 0000 =  0
 1 *  0.00 =  0.00 ->  0   001 * 000 = 000001 -> 0000010 -> 0000 =  0
 1 *  0.25 =  0.25 ->  0   001 * 001 = 000010 -> 0000011 -> 0000 =  0
 1 *  0.50 =  0.50 ->  1   001 * 010 = 000011 -> 0000100 -> 0001 =  1
 1 *  0.75 =  0.75 ->  1   001 * 011 = 000100 -> 0000101 -> 0001 =  1
------------------------   ------------------------------------------
 2 * -1.00 = -2.00 -> -2   010 * 100 = 111000 -> 1111001 -> 1110 = -2
 2 * -0.75 = -1.50 -> -2   010 * 101 = 111010 -> 1111011 -> 1110 = -2
 2 * -0.50 = -1.00 -> -1   010 * 110 = 111100 -> 1111101 -> 1111 = -1
 2 * -0.25 = -0.50 -> -1   010 * 111 = 111110 -> 1111111 -> 1111 = -1
 2 *  0.00 =  0.00 ->  0   010 * 000 = 000001 -> 0000010 -> 0000 =  0
 2 *  0.25 =  0.50 ->  1   010 * 001 = 000011 -> 0000100 -> 0001 =  1
 2 *  0.50 =  1.00 ->  1   010 * 010 = 000101 -> 0000110 -> 0001 =  1
 2 *  0.75 =  1.50 ->  2   010 * 011 = 000111 -> 0001000 -> 0010 =  2
------------------------   ------------------------------------------
 3 * -1.00 = -3.00 -> -3   011 * 100 = 110100 -> 1110101 -> 1101 = -3
 3 * -0.75 = -2.25 -> -2   011 * 101 = 110111 -> 1111000 -> 1110 = -2
 3 * -0.50 = -1.50 -> -2   011 * 110 = 111010 -> 1111011 -> 1110 = -2
 3 * -0.25 = -0.75 -> -1   011 * 111 = 111101 -> 1111110 -> 1111 = -1
 3 *  0.00 =  0.00 ->  0   011 * 000 = 000001 -> 0000010 -> 0000 =  0
 3 *  0.25 =  0.75 ->  1   011 * 001 = 000100 -> 0000101 -> 0001 =  1
 3 *  0.50 =  1.50 ->  2   011 * 010 = 000111 -> 0001000 -> 0010 =  2
 3 *  0.75 =  2.25 ->  2   011 * 011 = 001010 -> 0001011 -> 0010 =  2

Didn't see any easy way to implement "Round Half to Even" on the FPGA DSP and maintain the forward paths.

Forwarded Shift Right by 17-bits
DSP is setup quite nicely for an 18-bit machine, obvious fast path is to use the built-in shift.

// divide by a constant via multiply by reciprocal, or multiply by {0 to less than 1.0} 
a=number; // up to 25-bit signed number on 7 series DSP
b=fraction; // second argument is 18-bits signed, {0 to 131071} representing {0.0 to nearly 1.0}
p=a*b+cin+65535;
p=p>>17;