https://orlp.net/blog/subtraction-is-functionally-complete/

  * orlp.net
  * Blog
  * Tags
  * Github

Subtraction Is Functionally Complete

2023-09-28

To be precise, IEEE-754 floating point subtraction is functionally
complete. That means you can construct any binary circuit using
nothing but floating point subtraction.

To see how, we must start at the bottom. I quote the IEEE 754-2019
standard, section 6.3:

    6.3 The sign bit

    [...] When neither the inputs nor result are NaN, [...]; the sign of
    a sum, or of a difference $x-y$ regarded as a sum $x+(-y)$,
    differs from at most one of the addends' signs; [...]. These rules
    shall apply even when operands or results are zero or infinite.

    When the sum of two operands with opposite signs (or the
    difference of two operands with like signs) is exactly zero, the
    sign of that sum (or difference) shall be $+0$ under all
    rounding-direction attributes except roundTowardNegative; under
    that attribute, the sign of an exact zero sum (or difference)
    shall be $-0$.

Let's dissect that.

 1. A subtraction $x - y$ is considered a sum $x + (-y)$.
 2. Zero can have a sign, $-0$ and $0$ are distinct entities
    (although they compare equal when testing with ==).
 3. If both of the addends have the same sign, the output must have
    that sign. However, for $x - y$ that means if $x$ and $y$ have
    different signs the output must have the sign of $x$.
 4. If $x$ and $y$ have the same sign, and $x - y$ is zero, the
    output will be $+0$ for all rounding modes except
    roundTowardNegative, then it will be $-0$.

Now since the default rounding mode in virtually every context is
roundTiesToEven, we shall assume that from now on. However,
everything works analogously even for roundTowardNegative.

A truth table

So, what does that give us when subtracting zeroes?

-0 - -0 = +0  # Same sign, must be +0.
-0 - +0 = -0  # Different signs, sign from first argument.
+0 - -0 = +0  # Different signs, sign from first argument.
+0 - +0 = +0  # Same sign, must be +0.

Interesting... What if we say that $-0$ is false and $+0$ is true?

A B | O
----+--
0 0 | 1
0 1 | 0
1 0 | 1
1 1 | 1

Our resulting truth table is equivalent to ${A \vee \neg B}$, or ${B
\to A}$ (also known as the IMPLY gate, albeit with the arguments
swapped). It turns out this truth table is functionally complete,
which means we can make arbitrary circuits using only this gate.
Technically speaking, it is only functionally complete if given
access to the constant false. This is necessary to produce a NOT
gate, and NOT + IMPLY is a functionally complete set. I don't know a
better term for 'functionally complete if given access to some
constant value', however.

NAND and NOR are truly functionally complete by themselves, even
without access to any particular constant value. This is very
valuable when constructing microchips as you only need to be able to
produce a single kind of component, and do not need to worry about
routing a consistent low signal anywhere to produce a NOT gate.

Subtraction circuits

Let's build a demo in Python. First we'll define our constants and
allow us to print them nicely. Note that even though they are
distinct entities, $+0$ and $-0$ compare equal in IEEE 754 floating
point, so we must first extract the sign before comparing to 0 to
distinguish.

import math

f_false = -0.0
f_true = 0.0
f_repr = lambda x: True if math.copysign(1, x) > 0 else False

We can now make a NOT gate by using the fact that $-0 - x$ flips the
sign of zero $x$:

f_not = lambda x: f_false - x

Let's test that:

>>> f_repr(f_not(f_false))
True
>>> f_repr(f_not(f_true))
False

Great! We can also build an OR gate by noticing that if we flip the
sign of the second argument before subtracting, we always get $+0$
(true) unless both arguments are $-0$ (false):

f_or = lambda a, b: a - f_not(b)

Let's test it out:

>>> f_repr(f_or(f_false, f_false))
False
>>> f_repr(f_or(f_true,  f_false))
True
>>> f_repr(f_or(f_false, f_true))
True
>>> f_repr(f_or(f_true, f_true))
True

Now that we have OR and NOT, we can make all other gates, e.g.:

f_and = lambda a, b: f_not(f_or(f_not(a), f_not(b)))
f_xor = lambda a, b: f_or(f_and(f_not(a), b), f_and(a, f_not(b)))

>>> f_repr(f_and(f_false, f_false))
False
>>> f_repr(f_and(f_true,  f_false))
False
>>> f_repr(f_and(f_false, f_true))
False
>>> f_repr(f_and(f_true, f_true))
True

>>> f_repr(f_xor(f_false, f_false))
False
>>> f_repr(f_xor(f_true,  f_false))
True
>>> f_repr(f_xor(f_false, f_true))
True
>>> f_repr(f_xor(f_true, f_true))
False

Software integers

You may have heard of soft-float, software implementations of
floating point using integers. Let's turn that on its head: an
integer implementation done in software, using only floating point
ops. Let's do it in Rust so we can look at the final assembly output
to see how [DEL:horrifically slow:DEL] awesome it is.

type Bit = f32;
const ZERO: Bit = -0.0;
const ONE: Bit = 0.0;

fn not(x: Bit) -> Bit { ZERO - x }
fn or(a: Bit, b: Bit) -> Bit { a - not(b) }
fn and(a: Bit, b: Bit) -> Bit { not(or(not(a), not(b))) }
fn xor(a: Bit, b: Bit) -> Bit { or(and(not(a), b), and(a, not(b))) }
fn adder(a: Bit, b: Bit, c: Bit) -> (Bit, Bit) {
    let s = xor(xor(a, b), c);
    let c = or(and(xor(a, b), c), and(a, b));
    (s, c)
}

type SoftU8 = [Bit; 8];

pub fn softu8_add(a: SoftU8, b: SoftU8) -> SoftU8 {
    let (s0, c) = adder(a[0], b[0], ZERO);
    let (s1, c) = adder(a[1], b[1], c);
    let (s2, c) = adder(a[2], b[2], c);
    let (s3, c) = adder(a[3], b[3], c);
    let (s4, c) = adder(a[4], b[4], c);
    let (s5, c) = adder(a[5], b[5], c);
    let (s6, c) = adder(a[6], b[6], c);
    let (s7, _) = adder(a[7], b[7], c);
    [s0, s1, s2, s3, s4, s5, s6, s7]
}

// Hmm? u8? What's that? Shhhh....
pub fn to_softu8(x: u8) -> SoftU8 {
    std::array::from_fn(|i| if (x >> i) & 1 == 1 { ONE } else { ZERO })
}

pub fn from_softu8(x: SoftU8) -> u8 {
    (0..8).filter(|i| x[*i].signum() > 0.0).map(|i| 1 << i).sum()
}

fn main() {
    let a = to_softu8(23);
    let b = to_softu8(19);
    println!("{}", from_softu8(softu8_add(a, b)));
}

It's horrible, but it works, it dutifully prints 42. And it only took
$\approx 120$ floating point instructions to add two 8-bit integers:

On x86-64 there isn't actually a floating point negation instruction,
instead the compiler simply emits a XOR with a mask that toggles the
top bit, which is the sign bit of a IEEE-754 floating point number.

example::softu8_add:
        mov     rax, rdi
        movups  xmm2, xmmword ptr [rsi]
        movups  xmm0, xmmword ptr [rdx]
        movaps  xmm3, xmm2
        subps   xmm3, xmm0
        movaps  xmm4, xmm0
        subps   xmm4, xmm2
        movaps  xmm1, xmmword ptr [rip + .LCPI0_0]
        xorps   xmm4, xmm1
        subps   xmm4, xmm3
        xorps   xmm3, xmm3
        subss   xmm3, xmm4
        movaps  xmm6, xmm0
        xorps   xmm6, xmm1
        subss   xmm6, xmm2
        xorps   xmm6, xmm1
        subss   xmm6, xmm3
        movaps  xmm3, xmm4
        shufps  xmm3, xmm4, 85
        movaps  xmm5, xmm6
        subss   xmm5, xmm3
        xorps   xmm5, xmm1
        movaps  xmm10, xmm6
        xorps   xmm10, xmm1
        subss   xmm10, xmm3
        movaps  xmm7, xmm0
        shufps  xmm7, xmm0, 85
        xorps   xmm7, xmm1
        movaps  xmm3, xmm2
        shufps  xmm3, xmm2, 85
        subss   xmm7, xmm3
        xorps   xmm7, xmm1
        movaps  xmm8, xmm4
        unpckhpd        xmm8, xmm4
        movaps  xmm3, xmm0
        unpckhpd        xmm3, xmm0
        xorps   xmm3, xmm1
        movaps  xmm9, xmm2
        unpckhpd        xmm9, xmm2
        subss   xmm3, xmm9
        xorps   xmm3, xmm1
        xorps   xmm11, xmm11
        movaps  xmm9, xmm4
        shufps  xmm9, xmm4, 255
        subss   xmm7, xmm10
        movaps  xmm10, xmm7
        xorps   xmm10, xmm1
        subss   xmm10, xmm8
        subss   xmm3, xmm10
        unpcklps        xmm7, xmm3
        shufps  xmm6, xmm7, 64
        addps   xmm11, xmm4
        movlhps xmm5, xmm4
        subps   xmm4, xmm6
        movss   xmm4, xmm11
        subps   xmm7, xmm8
        xorps   xmm7, xmm1
        shufps  xmm5, xmm7, 66
        subps   xmm5, xmm4
        xorps   xmm3, xmm1
        subss   xmm3, xmm9
        shufps  xmm0, xmm0, 255
        xorps   xmm0, xmm1
        shufps  xmm2, xmm2, 255
        subss   xmm0, xmm2
        xorps   xmm0, xmm1
        movups  xmmword ptr [rdi], xmm5
        movups  xmm2, xmmword ptr [rdx + 16]
        movaps  xmm5, xmm2
        xorps   xmm5, xmm1
        movups  xmm7, xmmword ptr [rsi + 16]
        subss   xmm5, xmm7
        xorps   xmm5, xmm1
        movaps  xmm4, xmm2
        shufps  xmm4, xmm2, 85
        xorps   xmm4, xmm1
        movaps  xmm6, xmm2
        movaps  xmm8, xmm7
        movaps  xmm9, xmm7
        subps   xmm9, xmm2
        subps   xmm2, xmm7
        shufps  xmm7, xmm7, 85
        subss   xmm4, xmm7
        xorps   xmm4, xmm1
        movhlps xmm6, xmm6
        xorps   xmm6, xmm1
        movhlps xmm8, xmm8
        subss   xmm6, xmm8
        xorps   xmm6, xmm1
        xorps   xmm2, xmm1
        subps   xmm2, xmm9
        subss   xmm0, xmm3
        movaps  xmm3, xmm0
        xorps   xmm3, xmm1
        subss   xmm3, xmm2
        subss   xmm5, xmm3
        unpcklps        xmm0, xmm5
        xorps   xmm5, xmm1
        movaps  xmm3, xmm2
        shufps  xmm3, xmm2, 85
        subss   xmm5, xmm3
        subss   xmm4, xmm5
        movaps  xmm3, xmm4
        xorps   xmm3, xmm1
        movaps  xmm5, xmm2
        unpckhpd        xmm5, xmm2
        subss   xmm3, xmm5
        subss   xmm6, xmm3
        unpcklps        xmm4, xmm6
        movlhps xmm0, xmm4
        movaps  xmm3, xmm2
        subps   xmm3, xmm0
        subps   xmm0, xmm2
        xorps   xmm0, xmm1
        subps   xmm0, xmm3
        movups  xmmword ptr [rdi + 16], xmm0
        ret

  * &longleftarrow; Prev post
  * Archive
  * Next post &longrightarrow;