[HN Gopher] Timsort (2019) ___________________________________________________________________ Timsort (2019) Author : akbarnama Score : 121 points Date : 2022-07-29 16:00 UTC (7 hours ago) (HTM) web link (skerritt.blog) (TXT) w3m dump (skerritt.blog) | pizlonator wrote: | "you've never heard of" is a bit presumptuous. I've heard of it a | lot, nothing but good things. | hangonhn wrote: | I get annoyed with titles like that. "Never heard of", "Didn't | know.", etc. I know people want clicks but it sometimes feel | normal person to person courtesy aren't valued when the medium | is the Internet. | | It's especially "dangerous" in this case because the people who | would even care about sorting algorithms are going to contain a | subset of people who knows a lot about sorting. Those people | are likely to know about Timsort. It's like going to a heart | surgery conference and having a presentation about "This one | heart surgery method you never knew about." Good chance at | least a few people in the audience would know and may have | invented it. | Beldin wrote: | Then I have an interesting tidbit for you: its sorting wasn't | 100% correct. Should be fixed now though. | | http://www.envisage-project.eu/proving-android-java-and-pyth... | pizlonator wrote: | I had heard of that! I just don't count a bug as a bad thing. | Maybe the opposite. Software that has no bugs usually | achieves this feat by not having any users. Timsort has tons | of users, hence tons of attention, hence someone found a bug. | That's great! | [deleted] | lupire wrote: | dktalks wrote: | This blog steals content from other sites, original is at | https://hackernoon.com/timsort-the-fastest-sorting-algorithm.... | | Unfortunately I cannot downvote. | | Please don't link plagiarized content. This guy also linked to | his own "Big O Notation" in his page, where he says O(n) is | polynomial | | >>>In our shopping list example, in the worst-case of our | algorithm it prints out every item in the list sequentially. | Since there are n items in the list, it takes O(n) polynomial | time to complete the algorithm. | TameAntelope wrote: | There are no downvotes on submissions, FWIW. | rat9988 wrote: | O(n) is polynomial and it's the same author. | caslon wrote: | It's the same author; compare the URL to your link's author's | last name. | thebooktocome wrote: | Well, linear functions are still technically polynomials... | haha! | [deleted] | indigo945 wrote: | This may be the same author ... Anyway f(n)=n is definitely a | polynomial and O(n) is a class of algorithms that run in | polynomial time (i.e. are in P). | Beldin wrote: | Interestingly, Timsort used to be broken [1]. They found that by | trying to prove correctness. I recall Stijn de Gouw remarking | that they wanted to have their proof artefact evaluated, but | you'd need a seriously beefy machine to do so. | | [1] http://www.envisage-project.eu/proving-android-java-and- | pyth... | masklinn wrote: | Timsort was not broken, the implementations were, which is a | slightly (but crucially) different issue: the implementations | don't / didn't uphold the algorithm's invariants at the upper | edges. | | Not only that, but the cpython's implementation's misbehaviour | effectively couldn't be triggered because you couldn't create | an array large enough to reach it. | _old_dude_ wrote: | There is a follow up, the fix used in Java was not good enough | [2]. | | [2] https://bugs.openjdk.org/browse/JDK-8203864 | gavinray wrote: | I thought it was widely known, IIRC a number of languages have | adopted it as the stdlib default sorting algorithm for either | unstable/stable sort (can't remember which is it?) | | I'm not big into algorithms but I believe that pdqsort and | timsort were supposed to be the best two general sorts. | [deleted] | [deleted] | masklinn wrote: | > either unstable/stable sort (can't remember which is it?) | | Stable. It's a hybrid mergesort. | | Some of the langages which adopted it after Python are Java, | Rust, or Swift. | rufus_foreman wrote: | Java uses it in Arrays.sort: * <p>The | implementation was adapted from Tim Peters's list sort for | Python * (<a href="http://svn.python.org/projects/pyth | on/trunk/Objects/listsort.txt"> * TimSort</a>). It | uses techniques from Peter McIlroy's "Optimistic * | Sorting and Information Theoretic Complexity", in Proceedings | of the * Fourth Annual ACM-SIAM Symposium on Discrete | Algorithms, pp 467-474, * January 1993. | valbaca wrote: | Java and Python IIRC | linsomniac wrote: | Among the previous discussions of timsort: | | https://news.ycombinator.com/item?id=21196555 | | https://news.ycombinator.com/item?id=17436591 | | https://news.ycombinator.com/item?id=17436591 | g42gregory wrote: | Apparently a lot of people heard of it. :-) I just love these | headlines. | linsomniac wrote: | Heard of it? I've bought tim an Icelandic hamburger! | eunoia wrote: | Story time, please :) | dang wrote: | Thanks! Macroexpanded: | | _Beating TimSort at Merging_ - | https://news.ycombinator.com/item?id=27823180 - July 2021 (69 | comments - including | https://news.ycombinator.com/threads?id=tim-peters - maybe luck | will strike again...) | | _Timsort, the Python sorting algorithm_ - | https://news.ycombinator.com/item?id=21196555 - Oct 2019 (131 | comments) | | _On the Worst-Case Complexity of TimSort_ - | https://news.ycombinator.com/item?id=17883461 - Aug 2018 (74 | comments) | | _Timsort is a sorting algorithm that is efficient for real- | world data_ - https://news.ycombinator.com/item?id=17436591 - | July 2018 (77 comments) | | _Functional verification with mechanical proofs of TimSort | [pdf]_ - https://news.ycombinator.com/item?id=9778243 - June | 2015 (1 comment) | | _Timsort_ - https://news.ycombinator.com/item?id=3214527 - Nov | 2011 (27 comments) | | _Visualising Sorting Algorithms: Python 's timsort_ - | https://news.ycombinator.com/item?id=2092594 - Jan 2011 (3 | comments) | | _Java has switched from Mergesort to TimSort_ - | https://news.ycombinator.com/item?id=752677 - Aug 2009 (13 | comments) | | _Visualizing Sorting Algorithms_ - | https://news.ycombinator.com/item?id=750858 - Aug 2009 (3 | comments) | CJefferson wrote: | Closely related is pattern defeating quicksort ( | https://github.com/orlp/pdqsort ), which adapts quicksort to take | advantage of sorted runs. I've adapted a few quicksorts to | pdqsort and seen good speedups (as people were often sorting | partially sorted data) | | Basically: Timsort is to mergesort as pdqsort is to quicksort | masklinn wrote: | > Timsort is to mergesort as pdqsort is to quicksort | | It's the other way around, since timsort is a decade older than | pdqsort. | dragonwriter wrote: | Age has nothing to do with the accuracy of the analogy, which | says nothing about the temporal relationship between timsort | and pdqsort. | BizarroLand wrote: | My favorite oddball sorting algorithm is StalinSort. It runs in | (O)1 time, making it probably the fastest sorting algorithm in | the world. | | It iterates through the list, and deletes every item in the | list that isn't ordered. The remaining list is therefore | automatically ordered. | magicalhippo wrote: | Hadn't heard of that one, and looking it up I stumbled upon | this[1] delightful collection of esoteric sorting algorithms. | Perfect for a Friday evening... | | [1]: https://towardsdatascience.com/esoteric-sort-algorithms- | and-... | jsnk wrote: | I was hoping to see some benchmark numbers | varenc wrote: | Creating a single benchmark seems tricky since Timsort's key | strength is it performs particularly well on real world data | where subsequences of the input are often already sorted | (runs). With random data it should have O(n log n) time like | any other comparison sort. | | So with benchmarking the challenge is coming up with a set of | real world-like test data that feels representative but isn't | just excessively playing to Timsort's strengths. Not everyone's | "real world" data is the same. | hinkley wrote: | I'm trying to recall what the expected level of clustering is | even for properly random inputs. | | A few years before Timsort there was someone randomly | shuffling the input in order to avoid the reverse-order | corner case that kills so many sort algorithms. | | For truly random inputs, about half of the entries should be | pairs that are partially ordered, and 25% runs of 3, no? And | if memory serves Timsort also has a fast path for strictly | decrementing runs as well, where it amortizes some of the | comparisons done in the scan phase to do a blind reverse() | call. | sshine wrote: | This kind of almost sorted data is easily synthesised with | any degree of already-sorted-ness. | varenc wrote: | Certainly! But then how do you decide what degree of | already-sorted-ness fairly represents real world data? | not2b wrote: | By collecting a lot of real world data? | lifthrasiir wrote: | CPython source code contains a separate documentation [1] for a | detailed description and discussion for the algorithm. TimSort | optimizes for the number of comparisons (which are expensive in | Python since each comparison can call a Python function) so all | numbers are given as such. | | [1] | https://github.com/python/cpython/blob/main/Objects/listsort... | hinkley wrote: | Number of comparisons is a particular sore point for me. | Nobody is sorting integers. This is the goddamned 21st | century, not 1980's era video games. Just stop. | | Show me a sort algorithm that is still efficient with five | pointer indirections in compare() and I'm satisfied. Show me | primitive data types and I start to wonder what you're hiding | (read: lying about). | elcomet wrote: | What are you talking about? I sort integers every day | Rapzid wrote: | Needs a 2005 in the title. | tpoacher wrote: | > built for the real world -- not constructed in academia. | | Why would lacking in evidence and rigor be considered a plus? | | Does Timsort pride itself of being the homeopathy of sorting | algorithms? | | What a bizzare brag to make. | masklinn wrote: | > Why would lacking in evidence and rigor be considered a plus? | | The point it's making is about heuristics rather than | theoretical complexity. Tim Peters provided plenty of evidence | and rigorous work when he proposed it. | | That you interpret it as "lack of evidence and rigor" is really | about you. | jamestimmins wrote: | I actually found this valuable, without being a "brag". The | best type of sort for real-world problems is build with an | understanding of real-world datasets. The types of questions | and research that interest academia, while often valid, are | distinct from those that concern practical engineers. | scrozier wrote: | I had the same thought at first. But I think the point is that | this sort of heuristically cobbled together sort wouldn't | typically be the stuff of academic research, brilliant and | practical though it is. | threatofrain wrote: | Also interesting is pdqsort, which will become Go's standard | sort. | | https://news.ycombinator.com/item?id=31106157 | | https://news.ycombinator.com/item?id=31098822 | 2143 wrote: | It's well known. | | Nevertheless, the article was good. | | If the author is reading this, consider dropping the "you've | never heard of" from the title; you don't want to assume things | about your reader. | [deleted] | lifthrasiir wrote: | "lesser-known" is probably more appropriate choice, given that | it is not a kind of algorithm you'd heard from CS curricula. | scrozier wrote: | Sometimes a little colorful language makes dry topics a bit | more fun. I don't think the title was meant to be statistically | analyzed. :-) | pclmulqdq wrote: | I've done this too with blog posts, and it works okay, although | many readers don't like it. I wanted a snappier way to say "you | didn't learn about in school," and I think this author did too. | earthicus wrote: | Here [1] is a great (short, readable) paper from 2015 that | explains Timsort and similar stack-merge sorting algorithms. It | also gives a runtime analysis, and a simplified version of the | algorithm that they call 'alpha-sort'. | | [1] https://hal-upec-upem.archives- | ouvertes.fr/hal-01212839/file... ___________________________________________________________________ (page generated 2022-07-29 23:00 UTC)