[HN Gopher] Fluxsort: A stable adaptive partitioning comparison ... ___________________________________________________________________ Fluxsort: A stable adaptive partitioning comparison sort Author : signa11 Score : 58 points Date : 2021-07-25 12:03 UTC (10 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | mettamage wrote: | Perhaps others think differently about this but after all the | interview prep I have done, I can appreciate posts like this a | lot. | | It seems I have gained a new appreciation of a topic! Which is | something I am grateful for :) | gigatexal wrote: | Hmm I kinda want to create a Python package for this. | Blikkentrekker wrote: | > _Fluxsort starts out with an analyzer that detects fully sorted | arrays and sorts reverse order arrays using n comparisons. It | also obtains a measure of presortedness and switches to quadsort | if the array is less than 25% random._ | | This makes me wonder? is it worth it to do a neural a.i. sort by | simply training it? | | I assume it would be hard to prove that the resulting a.i. is | actually infallible and always produces a sorted output, but | perhaps it could for practical measures sort faster than most | current algorithms. | somethingAlex wrote: | There's no neural network you could train using supervised | learning to solve sorting in the general case. | | To train a neural network, the training set and test set need | to come from a common probability distribution but the inputs | of sorting algorithms can be random. There's no relation | between what you could possibly train your network on and what | you'd ask it to do. | | Put another way - the input and output of your model would have | zero correlation. There's nothing to train the model to pick up | on. | | Interesting related work: Graves[0] trained a "neural Turing | machine" to encode instructions for a basic sorting algorithm. | | [0] https://arxiv.org/abs/1410.5401 | stingraycharles wrote: | You mean using a neural network for sorting? I think it will be | very expensive, as sorting algorithms usually take low level | performance optimizations such as cache locality into | consideration, and neural networks are a poor fit for this type | of thing. | HelloNurse wrote: | The currently en vogue deep and large neural networks are not | only designed to have an affordable but significant | evaluation cost (probably not competitive with really cheap | techniques, like the partitioning and medians of this | Fluxsort) but also to justify this cost with the recognition | of fairly subtle patterns: wasted on rough and general worst- | case guarantees, positively dangerous with arbitrarily | crafted inputs. | stingraycharles wrote: | Do you have an example of a sorting algorithm that uses a | neural network that's on par with other sorting algorithms | in terms of performance? | | It just seems so counter-intuitive to me that neural | networks are an efficient way to handle this, but I would | be happy if proven wrong. | mlochbaum wrote: | This is BlockQuicksort with stable partitioning, pumped up with | selective benchmarks (if I sound irritated about this, it's | because I've just called out the author[0] on this exact issue). | I understand that worse performance should be expected for a | stable sort, but what possible reason is there to leave out | benchmarks against the most closely related algorithm? | | Worse, a stable version of BlockQuicksort is pointless, which is | to say that it is actually impossible for there to be a practical | use case for Fluxsort. The branchless partitioning technique is | relevant only if comparison doesn't depend on branching, which | means the input array needs to consist machine-native types, and | in fact Fluxsort limits itself to integers and long double. On | these inputs stability isn't an observable property! The result | of Fluxsort will be indistinguishable from that of any other | quicksort (okay, I guess it will keep negative and positive zeros | in the original order for floats, but an integer-based comparison | could also accomplish this). So all you get is a slower sort that | uses more memory. There _are_ use cases for stable partitioning, | such as sort-by or arg-sort, but as written Fluxsort can 't do | these. | | The statistical method to decide whether to use quicksort or | Quadsort--checking the ratio of comparisons between adjacent | pairs of elements that are smaller versus greater--is really | interesting. It applies to the mergesort versus quicksort problem | in general. | | The use of the ptx pointer is definitely not novel and I think | it's a stretch to even describe it as a brain-twister. It's the | obvious way to do branchless stable partitioning. | | This is good work! It's well-written and well-explained, and | clearly points in the direction of practical applications. | However, it's presented in a way that suggests that it's a good | drop-in sorting algorithm, when instead it's more of a research | effort. Beware! | | [0] https://news.ycombinator.com/item?id=27793636 | mlochbaum wrote: | Okay, with scandum's prodding I've looked into this more deeply | and need to add much more context. Firstly, I was very much | wrong and apologies are in order. What I wrote was based on the | assumption that Fluxsort trades speed for stability. It | doesn't, apparently: the partitioning seems to be faster and | _also_ stable. Possibly more so than the benchmarks show. It 's | a crude measure, but I ran perf top during benchmarks, and | pdqsort spent significantly more time in its partition than | Flux did in its own. So the only real drawback is O(n) rather | than O(log(n)) memory usage. Memory's cheap, although I guess | some users will decide this isn't acceptable. | | A fast partition that _happens_ to be stable is great to have. | The in-place scheme used by pdqsort has one advantage, which | that it naturally puts reverse-sorted data in order. Other than | that it tends to mangle everything. Stable partitioning doesn | 't do this, and might even be able to draw out order from | interleaved sequences that a merge sort (or insertion sort) | could later take advantage of. | | BlockQuicksort uses a fast index generation method to produce | indices for elements that are then swapped. Fluxsort skips all | this and moves an element to _both_ possible positions | immediately; it _then_ uses the comparison result to increment | one partition index allowing the other position to be | overwritten. It 's the simpler organization of stable | partitioning that allows this to be fast. There's also some | cleverness in not immediately moving the second partition to | put it after the first, but instead partitioning it from where | it sits. I've looked into stable partitioning, and I know | branchless algorithms well enough that I could have written | this. But I didn't! | scandum wrote: | I get the impression you wrote your response rather hastily. | | Here's a bench of fluxsort vs pdqsort on strings: | | Name | Items | Type | Best | Average | Compares | | Samples | Distribution | | --------- | -------- | | ---- | -------- | -------- | --------- | ------- | | ---------------- | | fluxsort | 100000 | 64 | | 0.011804 | 0.012044 | 2003293 | 100 | random string | | | pdqsort | 100000 | 64 | 0.012423 | 0.012512 | 1859192 | | 100 | random string | | | If you can sort strings you can sort tables, where stability | matters. | | Here's a graph of relative performance on 100K 32 bit integers. | | https://media.discordapp.net/attachments/737457171377160203/... | | As for the ptx pointer's use, I'm not aware of a prior | implementation. | | As for selective benchmarking, that's a selective accusation. | mlochbaum wrote: | Gah, I did misunderstand some things. The picture is now | worse than I realized! EDIT: and yet also better; see | https://news.ycombinator.com/item?id=27951017 . It's hard to | get it working right but this _is_ a really fast algorithm. | | fluxsort takes a function pointer and _has no way to inline | it_. The fluxsort.c is included from fluxsort.h (this is not | standard style by the way; template files should have a .h | suffix), so the compiler _might_ choose to inline it, and the | benchmarks use noinline on the comparison functions to avoid | this. | | BlockQuicksort makes no sense without inlined comparisons. | Yes, pdqsort allows a comparison function in order to be a | general-purpose sort, but this isn't the case that it | targets. | | Are these benchmarks run with inline comparisons? Seems | pretty hard to get bench.c to inline anything since test_sort | takes a comparison function as input (though I'm working on | it). | scandum wrote: | You can uncomment //#define cmp(a,b) | (*(a) > *(b)) | | in quadsort.h for primitive inline comparisons. | acmj wrote: | My biggest complaint about all scandum's sorting algorithms | is that it doesn't support guaranteed inlined comparisons. | Without this feature, the result depends on the compiler | behavior, making the results hard to interpret. This point | has been raised many times at reddit/hacker news but the | dev is strong minded. | mlochbaum wrote: | I think I mistook the README's comment about ptx as being | about the branchless partition logic when it's talking about | the overall recursive structure (i.e. which arguments are | passed to flux_partition calls). That could be new although I | kind of doubt it. The way it uses the two buffers is fairly | similar to what I've seen called a "ping-pong merge". | mlochbaum wrote: | Got the benchmarks working with a hardcoded 4-byte int | pdqsort. pdqsort is of course faster, but this doesn't tell | us anything about how good fluxsort really is because it | doesn't inline the comparison (I've removed __attribute__ | ((noinline)) and comparisons++ as bench.c suggests, which | improves the speed but doesn't inline it). | | Name | Items | Type | Best | Average | | Loops | Samples | Distribution | | --------- | | -------- | ---- | -------- | -------- | --------- | ------- | | ---------------- | | qsort | 100000 | 32 | | 0.013530 | 0.013901 | 1 | 10 | random order | | | fluxsort | 100000 | 32 | 0.004837 | 0.005009 | | 1 | 10 | random order | | | pdqsort | 100000 | 32 | 0.003661 | 0.003745 | 1 | | 10 | random order | | | Your graph doesn't show how fast pdqsort orders 4-byte | integers. It shows how fast it orders them when required to | use a comparison function, which is an arbitrary restriction | and not what pdqsort is designed for. | scandum wrote: | The graph is with //#define cmp(a,b) ( _(a) > _(b)) in | quadsort.h uncommented. | | Looking forward to your next spin. | mlochbaum wrote: | You're right, with that setting fluxsort beats pdqsort | slightly for either 1e5 or 1e6 random 4-byte ints. I'm | really sorry; I shouldn't have assumed I knew what your | benchmark looked like. Although I guess now I'm even more | confused about why there are no pdqsort benchmarks in the | repository? Beating it is really impressive! | | I'll be running my own timings, but do you know where the | improvement comes from? Is the base case faster than | insertion sort, is the partition faster, or both? I | wasn't expecting a stable partition to beat an unstable | one because it does twice as much data movement, but it | wouldn't be the first time an algorithm using more memory | beats one using less. | scandum wrote: | Main reasons for not benching against pdqsort: | | 1. stability 2. worse performance on long doubles, and I | don't know why 3. A variety of hard to explain | performance differences. 4. pdqsort does better on | generic data, which can be very important. | | So it's tricky to present a fair benchmark when two sorts | behave very differently. | | As to performance advantages of fluxsort: | | 1. It has a faster insertion sort. 2. Branchless | pseudomedian of 15 gives an advantage. 3. Partial loop | unrolling with: while (ptx + 8 < pte) 4. Data movement | should be nearly identical, if not better, with the | recursive calls through the ptx pointer. In the optimal | case the memcpy only triggers when the partition shrinks | below 24 elements, and in half of those cases the | partition will already be in main memory. | | So on random you could expect n / 2 extra data movements | on top of ~ n log n moves. ___________________________________________________________________ (page generated 2021-07-25 23:01 UTC)