[HN Gopher] Binary Trees are optimal except when they're not
 ___________________________________________________________________
  Binary Trees are optimal except when they're not
  Author : jjgreen
  Score  : 20 points
  Date   : 2021-07-21 07:57 UTC (15 hours ago)
  (HTM) web link (hbfs.wordpress.com)
  (TXT) w3m dump (hbfs.wordpress.com)
| cakoose wrote:
| Yup, ignoring the costs of memory/storage access (cache, main
| memory, disk) can hurt an algorithm's real-world performance.
| Laying your data out in chunks that match the memory/storage
| block sizes (cache line size, disk block size) can yield much
| better performance.
|
| And while having the algorithm be aware of the chunk sizes in
| every layer of your memory hierarchy is optimal, it can be
| inconvenient to get all that set up. But what's cool is that you
| can often lay your data out in a fractal-like pattern and get
| pretty good performance no matter what the various chunk sizes
| are: https://en.wikipedia.org/wiki/Cache-oblivious_algorithm
| riggsdk wrote:
| Basically, algorithmic complexity analysis often ignores the
| cost of accessing data because the underlying storage is
| sometimes unknown - be it in memory, on a spinning disk, or on
| magnetic tape.
|
| Unfortunately, that gives suboptimal algorithms, because that
| basic practicality is ignored. You then end up with a bunch of
| algorithms that look worse "on paper" but still outperform the
| theoretically optimal ones.
|
| Often the number of items is also ignored. If you want to
| implement a set-like feature but you'll be storing fewer than X
| elements, you can often get better performance from a linear
| search through an array than from a full-fledged hash set.
| jsmith45 wrote:
| Agreed. There are plenty of cases where an asymptotically
| "worse" algorithm performs better than a "better" one for all
| practical sizes.
|
| It is entirely possible for the asymptotically "better"
| algorithms to have coefficients so high that the break-even
| point requires a week or more of computation. There are not
| many programs whose users will tolerate feeding in so much data
| that operations take that long.
| valbaca wrote:
| Author seems to have just backed into rediscovering B-trees:
|
| https://en.wikipedia.org/wiki/B-tree
|
| > To conclude, binary trees are optimal when we ignore the cost
| of accessing a node, but they aren't when it becomes costly to
| access a node. When we access the nodes on disk, at high cost,
| it becomes interesting to bundle many keys into a node, and we
| gave the optimal solution. However, the problem is often solved
| more directly: we just fit as many keys as possible into an I/O
| block. Disks operate on small blocks of (typically) 512 bytes,
| but file systems operate on somewhat larger, fixed-size blocks
| of something like 4 or 8 KB. A good strategy is therefore to
| fit as many keys as possible into such a block, since even if
| the number of comparisons per block is large, the comparisons
| are still cheap next to the cost of bringing the block into
| main memory.
| bob1029 wrote:
| Skiplists are my favorite data structure. There are a few use
| cases that they can't satisfy, but for the typical memory-bound
| key-value store application, they're a very simple & effective
| approach.
|
| If you have to go to disk, a combination of splay tree,
| batching, and an append-only log structure is your best bet.
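  A concrete taste of cakoose's layout point, sketched in C++ with
  the Eytzinger (implicit BFS) layout - a simpler cache-friendly
  cousin of the cache-oblivious van Emde Boas layout he links to,
  not the layout from the article itself. The key type and sizes
  are illustrative, and __builtin_ffs is a GCC/Clang builtin:

      #include <cstddef>
      #include <cstdio>
      #include <vector>

      // Fill eyt[1..n] by an in-order walk of the implicit tree in
      // which the children of node k are 2k and 2k+1.  The hottest
      // nodes (the top levels) end up packed together in memory.
      void build(const std::vector<int>& sorted, std::vector<int>& eyt,
                 std::size_t& i, std::size_t k) {
          if (k < eyt.size()) {
              build(sorted, eyt, i, 2 * k);
              eyt[k] = sorted[i++];
              build(sorted, eyt, i, 2 * k + 1);
          }
      }

      // Smallest key >= x, or fallback if every key is smaller.
      int lowerBound(const std::vector<int>& eyt, int x, int fallback) {
          std::size_t k = 1;
          while (k < eyt.size())
              k = 2 * k + (eyt[k] < x);        // branchless descent
          // Undo the trailing right turns; the int cast assumes the
          // tree height fits in 31 bits.
          k >>= __builtin_ffs(~static_cast<int>(k));
          return k ? eyt[k] : fallback;
      }

      int main() {
          std::vector<int> sorted = {10, 20, 30, 40, 50, 60, 70};
          std::vector<int> eyt(sorted.size() + 1);
          std::size_t i = 0;
          build(sorted, eyt, i, 1);
          std::printf("%d\n", lowerBound(eyt, 35, -1));  // prints 40
      }

  Because node k's children always sit at indices 2k and 2k+1, no
  child pointers are stored at all, which doubles how many keys fit
  in each cache line.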
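  riggsdk's small-N point can be made concrete. A minimal sketch,
  assuming int keys; the crossover size at which a hash set starts
  to win is workload- and hardware-dependent and has to be
  measured, not guessed:

      #include <algorithm>
      #include <cstdio>
      #include <vector>

      // For a handful of elements the whole vector fits in a cache
      // line or two, so the O(n) scan often beats the hashing and
      // pointer chasing of std::unordered_set.
      class SmallSet {
          std::vector<int> items_;
      public:
          bool contains(int x) const {
              return std::find(items_.begin(), items_.end(), x)
                     != items_.end();
          }
          void insert(int x) {
              if (!contains(x)) items_.push_back(x);
          }
      };

      int main() {
          SmallSet s;
          s.insert(3); s.insert(7); s.insert(3);  // duplicate ignored
          std::printf("%d %d\n", s.contains(7), s.contains(4));  // 1 0
      }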
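  The block-packing strategy quoted in valbaca's comment reduces to
  simple arithmetic. The numbers below (8-byte keys, 8-byte child
  pointers, a 4 KiB block, one billion keys) are illustrative
  assumptions, not figures from the article:

      #include <cmath>
      #include <cstdio>

      int main() {
          const double block  = 4096;           // bytes per I/O block
          const double entry  = 8 + 8;          // key + child pointer
          const double fanout = block / entry;  // ~256 keys per node
          const double n      = 1e9;            // keys in the tree

          // A lookup touches ~log_fanout(n) blocks in a B-tree,
          // versus ~log2(n) nodes in a pointer-per-node binary tree.
          std::printf("fanout             ~ %.0f\n", fanout);
          std::printf("B-tree lookup      ~ %.1f block reads\n",
                      std::log(n) / std::log(fanout));
          std::printf("binary tree lookup ~ %.1f node reads\n",
                      std::log(n) / std::log(2.0));
      }

  With these numbers a disk-resident B-tree touches about 4 blocks
  per lookup where a binary tree would touch about 30 nodes - the
  article's whole argument in two printf lines.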
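  Finally, a minimal skiplist in the spirit of bob1029's comment,
  assuming int keys; deletion, stored values, and memory cleanup
  are omitted, and the height cap and coin-flip height distribution
  are conventional choices rather than anything bob1029 specified:

      #include <climits>
      #include <cstdio>
      #include <cstdlib>
      #include <vector>

      struct Node {
          int key;
          std::vector<Node*> next;  // next[i] = successor on level i
          Node(int k, int h) : key(k), next(h, nullptr) {}
      };

      struct SkipList {
          static const int kMaxLevel = 16;  // arbitrary height cap
          Node head{INT_MIN, kMaxLevel};    // sentinel, full height

          static int randomHeight() {       // p = 1/2 per extra level
              int h = 1;
              while (h < kMaxLevel && (std::rand() & 1)) ++h;
              return h;
          }

          bool contains(int key) const {
              const Node* x = &head;
              for (int i = kMaxLevel - 1; i >= 0; --i)  // top-down
                  while (x->next[i] && x->next[i]->key < key)
                      x = x->next[i];
              return x->next[0] && x->next[0]->key == key;
          }

          void insert(int key) {
              Node* update[kMaxLevel];  // per level: rightmost node
              Node* x = &head;          // with key < new key
              for (int i = kMaxLevel - 1; i >= 0; --i) {
                  while (x->next[i] && x->next[i]->key < key)
                      x = x->next[i];
                  update[i] = x;
              }
              // Nodes are never freed in this sketch.
              Node* n = new Node(key, randomHeight());
              for (int i = 0; i < (int)n->next.size(); ++i) {
                  n->next[i] = update[i]->next[i];
                  update[i]->next[i] = n;
              }
          }
      };

      int main() {
          SkipList s;
          const int keys[] = {5, 1, 9, 3};
          for (int k : keys) s.insert(k);
          std::printf("%d %d\n", s.contains(3), s.contains(4));  // 1 0
      }
 ___________________________________________________________________
(page generated 2021-07-21 23:00 UTC)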