https://orlp.net/blog/subtraction-is-functionally-complete/ * orlp.net * Blog * Tags * Github Subtraction Is Functionally Complete 2023-09-28 To be precise, IEEE-754 floating point subtraction is functionally complete. That means you can construct any binary circuit using nothing but floating point subtraction. To see how, we must start at the bottom. I quote the IEEE 754-2019 standard, section 6.3: 6.3 The sign bit [...] When neither the inputs nor result are NaN, [...]; the sign of a sum, or of a difference $x-y$ regarded as a sum $x+(-y)$, differs from at most one of the addends' signs; [...]. These rules shall apply even when operands or results are zero or infinite. When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be $+0$ under all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be $-0$. Let's dissect that. 1. A subtraction $x - y$ is considered a sum $x + (-y)$. 2. Zero can have a sign, $-0$ and $0$ are distinct entities (although they compare equal when testing with ==). 3. If both of the addends have the same sign, the output must have that sign. However, for $x - y$ that means if $x$ and $y$ have different signs the output must have the sign of $x$. 4. If $x$ and $y$ have the same sign, and $x - y$ is zero, the output will be $+0$ for all rounding modes except roundTowardNegative, then it will be $-0$. Now since the default rounding mode in virtually every context is roundTiesToEven, we shall assume that from now on. However, everything works analogously even for roundTowardNegative. A truth table So, what does that give us when subtracting zeroes? -0 - -0 = +0 # Same sign, must be +0. -0 - +0 = -0 # Different signs, sign from first argument. +0 - -0 = +0 # Different signs, sign from first argument. +0 - +0 = +0 # Same sign, must be +0. Interesting... What if we say that $-0$ is false and $+0$ is true? A B | O ----+-- 0 0 | 1 0 1 | 0 1 0 | 1 1 1 | 1 Our resulting truth table is equivalent to ${A \vee \neg B}$, or ${B \to A}$ (also known as the IMPLY gate, albeit with the arguments swapped). It turns out this truth table is functionally complete, which means we can make arbitrary circuits using only this gate. Technically speaking, it is only functionally complete if given access to the constant false. This is necessary to produce a NOT gate, and NOT + IMPLY is a functionally complete set. I don't know a better term for 'functionally complete if given access to some constant value', however. NAND and NOR are truly functionally complete by themselves, even without access to any particular constant value. This is very valuable when constructing microchips as you only need to be able to produce a single kind of component, and do not need to worry about routing a consistent low signal anywhere to produce a NOT gate. Subtraction circuits Let's build a demo in Python. First we'll define our constants and allow us to print them nicely. Note that even though they are distinct entities, $+0$ and $-0$ compare equal in IEEE 754 floating point, so we must first extract the sign before comparing to 0 to distinguish. import math f_false = -0.0 f_true = 0.0 f_repr = lambda x: True if math.copysign(1, x) > 0 else False We can now make a NOT gate by using the fact that $-0 - x$ flips the sign of zero $x$: f_not = lambda x: f_false - x Let's test that: >>> f_repr(f_not(f_false)) True >>> f_repr(f_not(f_true)) False Great! We can also build an OR gate by noticing that if we flip the sign of the second argument before subtracting, we always get $+0$ (true) unless both arguments are $-0$ (false): f_or = lambda a, b: a - f_not(b) Let's test it out: >>> f_repr(f_or(f_false, f_false)) False >>> f_repr(f_or(f_true, f_false)) True >>> f_repr(f_or(f_false, f_true)) True >>> f_repr(f_or(f_true, f_true)) True Now that we have OR and NOT, we can make all other gates, e.g.: f_and = lambda a, b: f_not(f_or(f_not(a), f_not(b))) f_xor = lambda a, b: f_or(f_and(f_not(a), b), f_and(a, f_not(b))) >>> f_repr(f_and(f_false, f_false)) False >>> f_repr(f_and(f_true, f_false)) False >>> f_repr(f_and(f_false, f_true)) False >>> f_repr(f_and(f_true, f_true)) True >>> f_repr(f_xor(f_false, f_false)) False >>> f_repr(f_xor(f_true, f_false)) True >>> f_repr(f_xor(f_false, f_true)) True >>> f_repr(f_xor(f_true, f_true)) False Software integers You may have heard of soft-float, software implementations of floating point using integers. Let's turn that on its head: an integer implementation done in software, using only floating point ops. Let's do it in Rust so we can look at the final assembly output to see how [DEL:horrifically slow:DEL] awesome it is. type Bit = f32; const ZERO: Bit = -0.0; const ONE: Bit = 0.0; fn not(x: Bit) -> Bit { ZERO - x } fn or(a: Bit, b: Bit) -> Bit { a - not(b) } fn and(a: Bit, b: Bit) -> Bit { not(or(not(a), not(b))) } fn xor(a: Bit, b: Bit) -> Bit { or(and(not(a), b), and(a, not(b))) } fn adder(a: Bit, b: Bit, c: Bit) -> (Bit, Bit) { let s = xor(xor(a, b), c); let c = or(and(xor(a, b), c), and(a, b)); (s, c) } type SoftU8 = [Bit; 8]; pub fn softu8_add(a: SoftU8, b: SoftU8) -> SoftU8 { let (s0, c) = adder(a[0], b[0], ZERO); let (s1, c) = adder(a[1], b[1], c); let (s2, c) = adder(a[2], b[2], c); let (s3, c) = adder(a[3], b[3], c); let (s4, c) = adder(a[4], b[4], c); let (s5, c) = adder(a[5], b[5], c); let (s6, c) = adder(a[6], b[6], c); let (s7, _) = adder(a[7], b[7], c); [s0, s1, s2, s3, s4, s5, s6, s7] } // Hmm? u8? What's that? Shhhh.... pub fn to_softu8(x: u8) -> SoftU8 { std::array::from_fn(|i| if (x >> i) & 1 == 1 { ONE } else { ZERO }) } pub fn from_softu8(x: SoftU8) -> u8 { (0..8).filter(|i| x[*i].signum() > 0.0).map(|i| 1 << i).sum() } fn main() { let a = to_softu8(23); let b = to_softu8(19); println!("{}", from_softu8(softu8_add(a, b))); } It's horrible, but it works, it dutifully prints 42. And it only took $\approx 120$ floating point instructions to add two 8-bit integers: On x86-64 there isn't actually a floating point negation instruction, instead the compiler simply emits a XOR with a mask that toggles the top bit, which is the sign bit of a IEEE-754 floating point number. example::softu8_add: mov rax, rdi movups xmm2, xmmword ptr [rsi] movups xmm0, xmmword ptr [rdx] movaps xmm3, xmm2 subps xmm3, xmm0 movaps xmm4, xmm0 subps xmm4, xmm2 movaps xmm1, xmmword ptr [rip + .LCPI0_0] xorps xmm4, xmm1 subps xmm4, xmm3 xorps xmm3, xmm3 subss xmm3, xmm4 movaps xmm6, xmm0 xorps xmm6, xmm1 subss xmm6, xmm2 xorps xmm6, xmm1 subss xmm6, xmm3 movaps xmm3, xmm4 shufps xmm3, xmm4, 85 movaps xmm5, xmm6 subss xmm5, xmm3 xorps xmm5, xmm1 movaps xmm10, xmm6 xorps xmm10, xmm1 subss xmm10, xmm3 movaps xmm7, xmm0 shufps xmm7, xmm0, 85 xorps xmm7, xmm1 movaps xmm3, xmm2 shufps xmm3, xmm2, 85 subss xmm7, xmm3 xorps xmm7, xmm1 movaps xmm8, xmm4 unpckhpd xmm8, xmm4 movaps xmm3, xmm0 unpckhpd xmm3, xmm0 xorps xmm3, xmm1 movaps xmm9, xmm2 unpckhpd xmm9, xmm2 subss xmm3, xmm9 xorps xmm3, xmm1 xorps xmm11, xmm11 movaps xmm9, xmm4 shufps xmm9, xmm4, 255 subss xmm7, xmm10 movaps xmm10, xmm7 xorps xmm10, xmm1 subss xmm10, xmm8 subss xmm3, xmm10 unpcklps xmm7, xmm3 shufps xmm6, xmm7, 64 addps xmm11, xmm4 movlhps xmm5, xmm4 subps xmm4, xmm6 movss xmm4, xmm11 subps xmm7, xmm8 xorps xmm7, xmm1 shufps xmm5, xmm7, 66 subps xmm5, xmm4 xorps xmm3, xmm1 subss xmm3, xmm9 shufps xmm0, xmm0, 255 xorps xmm0, xmm1 shufps xmm2, xmm2, 255 subss xmm0, xmm2 xorps xmm0, xmm1 movups xmmword ptr [rdi], xmm5 movups xmm2, xmmword ptr [rdx + 16] movaps xmm5, xmm2 xorps xmm5, xmm1 movups xmm7, xmmword ptr [rsi + 16] subss xmm5, xmm7 xorps xmm5, xmm1 movaps xmm4, xmm2 shufps xmm4, xmm2, 85 xorps xmm4, xmm1 movaps xmm6, xmm2 movaps xmm8, xmm7 movaps xmm9, xmm7 subps xmm9, xmm2 subps xmm2, xmm7 shufps xmm7, xmm7, 85 subss xmm4, xmm7 xorps xmm4, xmm1 movhlps xmm6, xmm6 xorps xmm6, xmm1 movhlps xmm8, xmm8 subss xmm6, xmm8 xorps xmm6, xmm1 xorps xmm2, xmm1 subps xmm2, xmm9 subss xmm0, xmm3 movaps xmm3, xmm0 xorps xmm3, xmm1 subss xmm3, xmm2 subss xmm5, xmm3 unpcklps xmm0, xmm5 xorps xmm5, xmm1 movaps xmm3, xmm2 shufps xmm3, xmm2, 85 subss xmm5, xmm3 subss xmm4, xmm5 movaps xmm3, xmm4 xorps xmm3, xmm1 movaps xmm5, xmm2 unpckhpd xmm5, xmm2 subss xmm3, xmm5 subss xmm6, xmm3 unpcklps xmm4, xmm6 movlhps xmm0, xmm4 movaps xmm3, xmm2 subps xmm3, xmm0 subps xmm0, xmm2 xorps xmm0, xmm1 subps xmm0, xmm3 movups xmmword ptr [rdi + 16], xmm0 ret * ⟵ Prev post * Archive * Next post ⟶