[HN Gopher] Inverting a binary tree using x64 assembly ___________________________________________________________________ Inverting a binary tree using x64 assembly Author : teknas Score : 120 points Date : 2022-11-24 15:05 UTC (7 hours ago) (HTM) web link (sanket.tech) (TXT) w3m dump (sanket.tech) | aftbit wrote: | What does it mean to "invert" a binary tree? | tom_ wrote: | "Flip" or "mirror" is probably a better term. It seems the goal | is to swap left and right: | https://leetcode.com/problems/invert-binary-tree/ | dekhn wrote: | Inverting a binary tree is easy; express the tree as a matrix | (laplacian), invert the matrix, then convert that back to a | tree. What the canonical question is asking is not inversion. | | Since too many people were memorizing inversion, I switched to | asking how to evert a binary tree. This leads naturally into a | discussion of the 1:1 relationship between complex numbers and | cohomology sets, I figure if somebody can get that right, they | can be a junior programmer on the team. | raxxorraxor wrote: | x32 converter for GNU/Linux: sed -i 's/r//g' | invert_tree.asm > invet_tee.asm | junon wrote: | x16, you mean :D | teknas wrote: | It's not that easy, you also have to account for the pointer | sizes (4 bytes instead of 8). Also alignment requirements are | different on x32. | badrabbit wrote: | It's not only that either. OP used rdi because of calling | convention, in x32 you would need to arguments on the stack | instead and reference them +ebp instead of using registers if | you want non-asm code to call your function. | ummonk wrote: | This doesn't actually invert a Merkle tree though, since you have | to recompute the hashes (except the leaf hashes) when you invert | a Merkle tree. Gonna be a no-hire evaluation from me dawg. | hinkley wrote: | The tweet this is based on is a joke. To invert a Merkle tree | would mean to invert cause and effect. I'm pretty sure the | tweet author is implying they want you to find a hash key | collision for each node. Hope you have a couple spare universes | in your pockets because this is gonna take a while. | [deleted] | danbruc wrote: | Assume RAX points to the root node and nodes just contain two | child pointers and everything is aligned and whatever. | :invert cmp rax, 0 jnz swap: ret :swap | push rax mov rax, [rax] call invert mov | rbx,[rsp] xchg rax, [rbx+8] mov [rbx], rax call | invert pop rax ret | | The last time I used an assembler was before x86-64 was invented, | I am not even sure I ever used one in protected mode. But that | seems a totally reasonable whiteboard interview question. Written | in notepad, might not assemble. Might even be totally incorrect | and I am posting it so that the internet generates the warning | and error messages. | | EDIT: After reading the article now, that seems rather | inefficient to me, to use local variables on the stack for | everything. And why is the function returning a node if it is | mutating the tree in place? | pyinstallwoes wrote: | Did you write Forth at all? | danbruc wrote: | No. Why do you ask? | | EDIT: As Wikipedia describes it as a stack-oriented language, | because of my comment about putting everything onto the | stack? | adrian_b wrote: | The variables on the stack are the most efficient after | registers, so you are right that a local variable should be | kept into a register if possible, otherwise in the stack, and | only then in other places (e.g. if it is too large for the | stack). | | However, when writing in assembly one must pay attention that | at least RBX, RBP and R12 through R15 must be preserved by any | function (on Windows also RDI and RSI must be preserved). | | So in your code you should not use RBX, but a volatile | register, e.g. RDX or RCX. If you would insist on using RBX, it | would have to be saved and restored. | danbruc wrote: | I know none of the calling conventions in any detail anymore | and just used the registers in alphabetical order. Totally | expected that this would violate something. | PixelOfDeath wrote: | mov cr0, ... | fear91 wrote: | You can use the 32bit xor to reset the register. Also TEST | REG,REG might be better for checking if it's zero. | badrabbit wrote: | Is 32 bit cheaper? Still 1 cycle. | fear91 wrote: | Yes, it's 1 cycle but it's longer to decode and occupies more | of l1i cache. It's not all about execution cycles. | nixpulvis wrote: | I was thinking of this from the perspective of CPU pipeline | pressure, but in reality it seems prosessors are indeed smart | enough to avoid burdoning the ALUs execution with these kinds | of special cases. | | Read more here https://stackoverflow.com/questions/17981447/m | icroarchitectu... | | > [...] these zeroing instructions extremely efficient, with | a throughput of four zeroing instructons per clock cycle. | | Also, the xor instruction takes up the smallest amount of | .text space (right?). | scheme271 wrote: | The xor reg, reg is also special cased because it's a quick | way for compilers to reinitialize a register and indicate | that future uses of that register don't depend on any | previous operations. It helps the cpu to parallelize any | future instructions that use that register since the cpu | knows that those instructions don't depend on anything that | happens before the the xor. | Someone wrote: | It's not special cased _because_ it 's a quick way for | compilers to reinitialize a register and indicate that | future uses of that register don't depend on any previous | operations, it's special cased _to give_ compilers a | quick way to reinitialize a register and indicate that | future uses of that register don 't depend on any | previous operations. | | https://randomascii.wordpress.com/2012/12/29/the- | surprising-... is almost ten years old and thus likely | dated, but still may be educational. | bugfix-66 wrote: | Being able to read and understand x86-64 assembly (or PTX/SASS | for an Nvidia GPU) is much more important than being able to | write it. In practice, even when you're writing assembly, you're | looking at reference assembly generated by a compiler from C code | you wrote. | | Similarly, the reality is that as a professional programmer you | spend no time doing work like leetcode. | | Instead, you spend a lot of time understanding and slightly | modifying (fixing or enhancing/extending) code. | | With the rise of language model code completion systems (e.g., | Microsoft Copilot) even more time will be spent inspecting and | understanding code to find problems. | | With these facts in mind, I have been building a new form of | leetcode: | | https://BUGFIX-66.com | | Most puzzles are interesting algorithms that you will learn | useful techniques from, so it's never a waste of time to think | about them. And even though the bugs are all quite trivial, I can | see it's very challenging for many people. | | It's about half-way ready to launch, needing 30 more puzzles. I | am working my way through Knuth's The Art of Programming Volume | 4B and today I'll see if Algorithm X (Dancing Links Exact Cover | Backtracking) can be made to fit for Bugs 38 and 39 (or whether | it's too complicated). | love2read wrote: | I love seeing people solve leetcode challenges in asm, are there | any more blogposts like this? | chrisseaton wrote: | The hard bit of solving them is usually the algorithm though - | when you know that you can code it in anything. | kjeetgill wrote: | I'm not well versed in assembly, so learning assembly first | would be the hard part! | | I'm with GP, it's fun seeing how solutions differ between | languages as a way to peek into other language communities I | don't spend as much time in. | devit wrote: | That solution is terrible, with a bad algorithm that requires | O(tree_height) space (the optimal one involves temporarily using | left/right pointers as a parent pointer so that you only need | constant space) and lacking any sort of assembly optimization, | being worse than what a compiler would produce (e.g. it's a real | mystery how the author managed to decide that local_right should | be spilled on the stack). | | Definitely not what you want to submit to someone testing your | programming skills. | Kranar wrote: | You're alluding to using the Morris traversal algorithm which | can traverse a binary tree in O(1) space, but Morris traversal | is actually much much slower than using a stack, especially as | is used by this algorithm. Doing a Morris traversal requires at | a minimum twice the number of operations as using a stack, and | due to its cache unfriendly nature will in practice be closer | to 4x slower. | | You typically only use Morris traversal on exceptionally large | trees, and by large I mean when working with data that lives on | a disk. It's definitely the exception, not the norm. | [deleted] | TheRealPomax wrote: | This is pretty damn cool, but unfortunately you failed the | interview as you accepted the challenge to use x86 assembler, but | solved the problem using a different programming language from | the one we asked you to use. We'll keep your resume on file, and | if there are any openings in the future we encourage you to apply | for those. | penguin_booze wrote: | The feedback we received indicates that you used the backspace | key twice during coding. We expect higher precision than that. | | Also, we'd like to remind you that you can only re-apply after | our cool-off period, which is 25 years. | wheelerof4te wrote: | Oooh. So that's why people invented C. | cgh wrote: | > I will be using x64 assembly with the AT&T syntax as it is | objectively superior than the Intel syntax. | | This made me laugh because it must be a reference to this: | https://news.ycombinator.com/item?id=33652023 | | > I contend that the AT&T syntax is harmful and bad, and should | never be used, for any reason, under any circumstances, by | anyone. | tux3 wrote: | >I will be using x64 assembly with the AT&T syntax as it is | objectively superior than the Intel syntax. | | Them's fighting words. | EarlKing wrote: | I immediately stopped reading the minute I read that in the | text. I can't take anything they say seriously after reading | that. | badrabbit wrote: | AT&T really is annoying but it feels like the big vs little | endian debate. Fairly easy to convert between the two as well. | efficax wrote: | except now that everything depends on the internet, and words | that go over networks are big endian, it seems insane to | throw away millions and millions of cpu cycles every year | converting them to little endian to be processed by our | little endian cpus. sure, it's a single cpu instruction, but | between every computer in the world, almost all of them being | little endian arm or intel, that's billions and billions and | billions of instructions wasted. | sedatk wrote: | For those who don't know, that's why big and little endian | were called that, because the debate was so frivolous. It's a | reference to the book Gulliver's Travels by Jonathan Swift in | which an island folk was split about from which end you | should crack a boiled egg. (I'm a big endian for example). | classified wrote: | I'm firmly in the little-endian camp for eggs, but big- | endian for CPUs. | monocasa wrote: | Watch as every pitchfork gets pointed at you when you | talk about middle endian. | | Which is a real thing. There are systems that would | store, say, a 32-bit word as two 16-bit words that were | big endian relative to each other, but little endian | within the 16-bit word. | sedatk wrote: | Found my archnemesis. | nurettin wrote: | Do you even care about CPUs anymore? | classified wrote: | Well, boiled eggs definitely taste better. | dekhn wrote: | Wait, hasn't everybody learned about inside-out endian? The | highest order bit is in the middle, then proceeds along a | hilbert path to the edges | quelsolaar wrote: | I used to be big endian. Then i changed my mind. Have a | look at the following: | | -123 -12 -1 | | Here are three numbers. If we read them from left to right, | they all begin with 1, but we cant tell what the 1 means, | in one case it means 100, in one it means 10 and in one it | means 1. We need to parse the entire number to know what | the first number means. If we parse from right to left, | each number always means the same thing. the first is how | many once, the second how many 10s and so on. | | So it makes sense to store the smallest first. In a big | endian architecture, if i want to read an 8, 16, 32 or 64 | bit word each byte will always mean the same, if we pad out | the type with zeros. So little endian is right, and Arabic | numerals are wrong. | rags2riches wrote: | Arabic numerals are little endian when writing right-to- | left. Like Arabic is written. | quietbritishjim wrote: | It looks like the comment you're replying to was talking | about eggs. I'm little endian by the way. ___________________________________________________________________ (page generated 2022-11-24 23:00 UTC)