hngopher.com

       [HN Gopher] Inverting a binary tree using x64 assembly
       ___________________________________________________________________
        
       Inverting a binary tree using x64 assembly
        
       Author : teknas
       Score  : 120 points
       Date   : 2022-11-24 15:05 UTC (7 hours ago)
        
 (HTM) web link (sanket.tech)
 (TXT) w3m dump (sanket.tech)
        
       | aftbit wrote:
       | What does it mean to "invert" a binary tree?
        
         | tom_ wrote:
         | "Flip" or "mirror" is probably a better term. It seems the goal
         | is to swap left and right:
         | https://leetcode.com/problems/invert-binary-tree/
        
         | dekhn wrote:
         | Inverting a binary tree is easy; express the tree as a matrix
         | (laplacian), invert the matrix, then convert that back to a
         | tree. What the canonical question is asking is not inversion.
         | 
         | Since too many people were memorizing inversion, I switched to
         | asking how to evert a binary tree. This leads naturally into a
         | discussion of the 1:1 relationship between complex numbers and
         | cohomology sets, I figure if somebody can get that right, they
         | can be a junior programmer on the team.
        
       | raxxorraxor wrote:
       | x32 converter for GNU/Linux:                 sed -i 's/r//g'
       | invert_tree.asm > invet_tee.asm
        
         | junon wrote:
         | x16, you mean :D
        
         | teknas wrote:
         | It's not that easy, you also have to account for the pointer
         | sizes (4 bytes instead of 8). Also alignment requirements are
         | different on x32.
        
           | badrabbit wrote:
           | It's not only that either. OP used rdi because of calling
           | convention, in x32 you would need to arguments on the stack
           | instead and reference them +ebp instead of using registers if
           | you want non-asm code to call your function.
        
       | ummonk wrote:
       | This doesn't actually invert a Merkle tree though, since you have
       | to recompute the hashes (except the leaf hashes) when you invert
       | a Merkle tree. Gonna be a no-hire evaluation from me dawg.
        
         | hinkley wrote:
         | The tweet this is based on is a joke. To invert a Merkle tree
         | would mean to invert cause and effect. I'm pretty sure the
         | tweet author is implying they want you to find a hash key
         | collision for each node. Hope you have a couple spare universes
         | in your pockets because this is gonna take a while.
        
       | [deleted]
        
       | danbruc wrote:
       | Assume RAX points to the root node and nodes just contain two
       | child pointers and everything is aligned and whatever.
       | :invert       cmp rax, 0       jnz swap:       ret       :swap
       | push rax       mov rax, [rax]       call invert       mov
       | rbx,[rsp]       xchg rax, [rbx+8]       mov [rbx], rax       call
       | invert       pop rax       ret
       | 
       | The last time I used an assembler was before x86-64 was invented,
       | I am not even sure I ever used one in protected mode. But that
       | seems a totally reasonable whiteboard interview question. Written
       | in notepad, might not assemble. Might even be totally incorrect
       | and I am posting it so that the internet generates the warning
       | and error messages.
       | 
       | EDIT: After reading the article now, that seems rather
       | inefficient to me, to use local variables on the stack for
       | everything. And why is the function returning a node if it is
       | mutating the tree in place?
        
         | pyinstallwoes wrote:
         | Did you write Forth at all?
        
           | danbruc wrote:
           | No. Why do you ask?
           | 
           | EDIT: As Wikipedia describes it as a stack-oriented language,
           | because of my comment about putting everything onto the
           | stack?
        
         | adrian_b wrote:
         | The variables on the stack are the most efficient after
         | registers, so you are right that a local variable should be
         | kept into a register if possible, otherwise in the stack, and
         | only then in other places (e.g. if it is too large for the
         | stack).
         | 
         | However, when writing in assembly one must pay attention that
         | at least RBX, RBP and R12 through R15 must be preserved by any
         | function (on Windows also RDI and RSI must be preserved).
         | 
         | So in your code you should not use RBX, but a volatile
         | register, e.g. RDX or RCX. If you would insist on using RBX, it
         | would have to be saved and restored.
        
           | danbruc wrote:
           | I know none of the calling conventions in any detail anymore
           | and just used the registers in alphabetical order. Totally
           | expected that this would violate something.
        
             | PixelOfDeath wrote:
             | mov cr0, ...
        
       | fear91 wrote:
       | You can use the 32bit xor to reset the register. Also TEST
       | REG,REG might be better for checking if it's zero.
        
         | badrabbit wrote:
         | Is 32 bit cheaper? Still 1 cycle.
        
           | fear91 wrote:
           | Yes, it's 1 cycle but it's longer to decode and occupies more
           | of l1i cache. It's not all about execution cycles.
        
           | nixpulvis wrote:
           | I was thinking of this from the perspective of CPU pipeline
           | pressure, but in reality it seems prosessors are indeed smart
           | enough to avoid burdoning the ALUs execution with these kinds
           | of special cases.
           | 
           | Read more here https://stackoverflow.com/questions/17981447/m
           | icroarchitectu...
           | 
           | > [...] these zeroing instructions extremely efficient, with
           | a throughput of four zeroing instructons per clock cycle.
           | 
           | Also, the xor instruction takes up the smallest amount of
           | .text space (right?).
        
             | scheme271 wrote:
             | The xor reg, reg is also special cased because it's a quick
             | way for compilers to reinitialize a register and indicate
             | that future uses of that register don't depend on any
             | previous operations. It helps the cpu to parallelize any
             | future instructions that use that register since the cpu
             | knows that those instructions don't depend on anything that
             | happens before the the xor.
        
               | Someone wrote:
               | It's not special cased _because_ it 's a quick way for
               | compilers to reinitialize a register and indicate that
               | future uses of that register don't depend on any previous
               | operations, it's special cased _to give_ compilers a
               | quick way to reinitialize a register and indicate that
               | future uses of that register don 't depend on any
               | previous operations.
               | 
               | https://randomascii.wordpress.com/2012/12/29/the-
               | surprising-... is almost ten years old and thus likely
               | dated, but still may be educational.
        
       | bugfix-66 wrote:
       | Being able to read and understand x86-64 assembly (or PTX/SASS
       | for an Nvidia GPU) is much more important than being able to
       | write it. In practice, even when you're writing assembly, you're
       | looking at reference assembly generated by a compiler from C code
       | you wrote.
       | 
       | Similarly, the reality is that as a professional programmer you
       | spend no time doing work like leetcode.
       | 
       | Instead, you spend a lot of time understanding and slightly
       | modifying (fixing or enhancing/extending) code.
       | 
       | With the rise of language model code completion systems (e.g.,
       | Microsoft Copilot) even more time will be spent inspecting and
       | understanding code to find problems.
       | 
       | With these facts in mind, I have been building a new form of
       | leetcode:
       | 
       | https://BUGFIX-66.com
       | 
       | Most puzzles are interesting algorithms that you will learn
       | useful techniques from, so it's never a waste of time to think
       | about them. And even though the bugs are all quite trivial, I can
       | see it's very challenging for many people.
       | 
       | It's about half-way ready to launch, needing 30 more puzzles. I
       | am working my way through Knuth's The Art of Programming Volume
       | 4B and today I'll see if Algorithm X (Dancing Links Exact Cover
       | Backtracking) can be made to fit for Bugs 38 and 39 (or whether
       | it's too complicated).
        
       | love2read wrote:
       | I love seeing people solve leetcode challenges in asm, are there
       | any more blogposts like this?
        
         | chrisseaton wrote:
         | The hard bit of solving them is usually the algorithm though -
         | when you know that you can code it in anything.
        
           | kjeetgill wrote:
           | I'm not well versed in assembly, so learning assembly first
           | would be the hard part!
           | 
           | I'm with GP, it's fun seeing how solutions differ between
           | languages as a way to peek into other language communities I
           | don't spend as much time in.
        
       | devit wrote:
       | That solution is terrible, with a bad algorithm that requires
       | O(tree_height) space (the optimal one involves temporarily using
       | left/right pointers as a parent pointer so that you only need
       | constant space) and lacking any sort of assembly optimization,
       | being worse than what a compiler would produce (e.g. it's a real
       | mystery how the author managed to decide that local_right should
       | be spilled on the stack).
       | 
       | Definitely not what you want to submit to someone testing your
       | programming skills.
        
         | Kranar wrote:
         | You're alluding to using the Morris traversal algorithm which
         | can traverse a binary tree in O(1) space, but Morris traversal
         | is actually much much slower than using a stack, especially as
         | is used by this algorithm. Doing a Morris traversal requires at
         | a minimum twice the number of operations as using a stack, and
         | due to its cache unfriendly nature will in practice be closer
         | to 4x slower.
         | 
         | You typically only use Morris traversal on exceptionally large
         | trees, and by large I mean when working with data that lives on
         | a disk. It's definitely the exception, not the norm.
        
         | [deleted]
        
       | TheRealPomax wrote:
       | This is pretty damn cool, but unfortunately you failed the
       | interview as you accepted the challenge to use x86 assembler, but
       | solved the problem using a different programming language from
       | the one we asked you to use. We'll keep your resume on file, and
       | if there are any openings in the future we encourage you to apply
       | for those.
        
         | penguin_booze wrote:
         | The feedback we received indicates that you used the backspace
         | key twice during coding. We expect higher precision than that.
         | 
         | Also, we'd like to remind you that you can only re-apply after
         | our cool-off period, which is 25 years.
        
       | wheelerof4te wrote:
       | Oooh. So that's why people invented C.
        
       | cgh wrote:
       | > I will be using x64 assembly with the AT&T syntax as it is
       | objectively superior than the Intel syntax.
       | 
       | This made me laugh because it must be a reference to this:
       | https://news.ycombinator.com/item?id=33652023
       | 
       | > I contend that the AT&T syntax is harmful and bad, and should
       | never be used, for any reason, under any circumstances, by
       | anyone.
        
         | tux3 wrote:
         | >I will be using x64 assembly with the AT&T syntax as it is
         | objectively superior than the Intel syntax.
         | 
         | Them's fighting words.
        
           | EarlKing wrote:
           | I immediately stopped reading the minute I read that in the
           | text. I can't take anything they say seriously after reading
           | that.
        
         | badrabbit wrote:
         | AT&T really is annoying but it feels like the big vs little
         | endian debate. Fairly easy to convert between the two as well.
        
           | efficax wrote:
           | except now that everything depends on the internet, and words
           | that go over networks are big endian, it seems insane to
           | throw away millions and millions of cpu cycles every year
           | converting them to little endian to be processed by our
           | little endian cpus. sure, it's a single cpu instruction, but
           | between every computer in the world, almost all of them being
           | little endian arm or intel, that's billions and billions and
           | billions of instructions wasted.
        
           | sedatk wrote:
           | For those who don't know, that's why big and little endian
           | were called that, because the debate was so frivolous. It's a
           | reference to the book Gulliver's Travels by Jonathan Swift in
           | which an island folk was split about from which end you
           | should crack a boiled egg. (I'm a big endian for example).
        
             | classified wrote:
             | I'm firmly in the little-endian camp for eggs, but big-
             | endian for CPUs.
        
               | monocasa wrote:
               | Watch as every pitchfork gets pointed at you when you
               | talk about middle endian.
               | 
               | Which is a real thing. There are systems that would
               | store, say, a 32-bit word as two 16-bit words that were
               | big endian relative to each other, but little endian
               | within the 16-bit word.
        
               | sedatk wrote:
               | Found my archnemesis.
        
               | nurettin wrote:
               | Do you even care about CPUs anymore?
        
               | classified wrote:
               | Well, boiled eggs definitely taste better.
        
             | dekhn wrote:
             | Wait, hasn't everybody learned about inside-out endian? The
             | highest order bit is in the middle, then proceeds along a
             | hilbert path to the edges
        
             | quelsolaar wrote:
             | I used to be big endian. Then i changed my mind. Have a
             | look at the following:
             | 
             | -123 -12 -1
             | 
             | Here are three numbers. If we read them from left to right,
             | they all begin with 1, but we cant tell what the 1 means,
             | in one case it means 100, in one it means 10 and in one it
             | means 1. We need to parse the entire number to know what
             | the first number means. If we parse from right to left,
             | each number always means the same thing. the first is how
             | many once, the second how many 10s and so on.
             | 
             | So it makes sense to store the smallest first. In a big
             | endian architecture, if i want to read an 8, 16, 32 or 64
             | bit word each byte will always mean the same, if we pad out
             | the type with zeros. So little endian is right, and Arabic
             | numerals are wrong.
        
               | rags2riches wrote:
               | Arabic numerals are little endian when writing right-to-
               | left. Like Arabic is written.
        
               | quietbritishjim wrote:
               | It looks like the comment you're replying to was talking
               | about eggs. I'm little endian by the way.
        
       ___________________________________________________________________
       (page generated 2022-11-24 23:00 UTC)