[HN Gopher] Timsort (2019)
       ___________________________________________________________________
        
       Timsort (2019)
        
       Author : akbarnama
       Score  : 121 points
       Date   : 2022-07-29 16:00 UTC (7 hours ago)
        
 (HTM) web link (skerritt.blog)
 (TXT) w3m dump (skerritt.blog)
        
       | pizlonator wrote:
       | "you've never heard of" is a bit presumptuous. I've heard of it a
       | lot, nothing but good things.
        
         | hangonhn wrote:
         | I get annoyed with titles like that. "Never heard of", "Didn't
         | know.", etc. I know people want clicks but it sometimes feel
         | normal person to person courtesy aren't valued when the medium
         | is the Internet.
         | 
         | It's especially "dangerous" in this case because the people who
         | would even care about sorting algorithms are going to contain a
         | subset of people who knows a lot about sorting. Those people
         | are likely to know about Timsort. It's like going to a heart
         | surgery conference and having a presentation about "This one
         | heart surgery method you never knew about." Good chance at
         | least a few people in the audience would know and may have
         | invented it.
        
         | Beldin wrote:
         | Then I have an interesting tidbit for you: its sorting wasn't
         | 100% correct. Should be fixed now though.
         | 
         | http://www.envisage-project.eu/proving-android-java-and-pyth...
        
           | pizlonator wrote:
           | I had heard of that! I just don't count a bug as a bad thing.
           | Maybe the opposite. Software that has no bugs usually
           | achieves this feat by not having any users. Timsort has tons
           | of users, hence tons of attention, hence someone found a bug.
           | That's great!
        
         | [deleted]
        
       | lupire wrote:
        
       | dktalks wrote:
       | This blog steals content from other sites, original is at
       | https://hackernoon.com/timsort-the-fastest-sorting-algorithm....
       | 
       | Unfortunately I cannot downvote.
       | 
       | Please don't link plagiarized content. This guy also linked to
       | his own "Big O Notation" in his page, where he says O(n) is
       | polynomial
       | 
       | >>>In our shopping list example, in the worst-case of our
       | algorithm it prints out every item in the list sequentially.
       | Since there are n items in the list, it takes O(n) polynomial
       | time to complete the algorithm.
        
         | TameAntelope wrote:
         | There are no downvotes on submissions, FWIW.
        
         | rat9988 wrote:
         | O(n) is polynomial and it's the same author.
        
         | caslon wrote:
         | It's the same author; compare the URL to your link's author's
         | last name.
        
         | thebooktocome wrote:
         | Well, linear functions are still technically polynomials...
         | haha!
        
         | [deleted]
        
         | indigo945 wrote:
         | This may be the same author ... Anyway f(n)=n is definitely a
         | polynomial and O(n) is a class of algorithms that run in
         | polynomial time (i.e. are in P).
        
       | Beldin wrote:
       | Interestingly, Timsort used to be broken [1]. They found that by
       | trying to prove correctness. I recall Stijn de Gouw remarking
       | that they wanted to have their proof artefact evaluated, but
       | you'd need a seriously beefy machine to do so.
       | 
       | [1] http://www.envisage-project.eu/proving-android-java-and-
       | pyth...
        
         | masklinn wrote:
         | Timsort was not broken, the implementations were, which is a
         | slightly (but crucially) different issue: the implementations
         | don't / didn't uphold the algorithm's invariants at the upper
         | edges.
         | 
         | Not only that, but the cpython's implementation's misbehaviour
         | effectively couldn't be triggered because you couldn't create
         | an array large enough to reach it.
        
         | _old_dude_ wrote:
         | There is a follow up, the fix used in Java was not good enough
         | [2].
         | 
         | [2] https://bugs.openjdk.org/browse/JDK-8203864
        
       | gavinray wrote:
       | I thought it was widely known, IIRC a number of languages have
       | adopted it as the stdlib default sorting algorithm for either
       | unstable/stable sort (can't remember which is it?)
       | 
       | I'm not big into algorithms but I believe that pdqsort and
       | timsort were supposed to be the best two general sorts.
        
         | [deleted]
        
           | [deleted]
        
         | masklinn wrote:
         | > either unstable/stable sort (can't remember which is it?)
         | 
         | Stable. It's a hybrid mergesort.
         | 
         | Some of the langages which adopted it after Python are Java,
         | Rust, or Swift.
        
         | rufus_foreman wrote:
         | Java uses it in Arrays.sort:                    * <p>The
         | implementation was adapted from Tim Peters's list sort for
         | Python          * (<a href="http://svn.python.org/projects/pyth
         | on/trunk/Objects/listsort.txt">          * TimSort</a>).  It
         | uses techniques from Peter McIlroy's "Optimistic          *
         | Sorting and Information Theoretic Complexity", in Proceedings
         | of the          * Fourth Annual ACM-SIAM Symposium on Discrete
         | Algorithms, pp 467-474,          * January 1993.
        
         | valbaca wrote:
         | Java and Python IIRC
        
       | linsomniac wrote:
       | Among the previous discussions of timsort:
       | 
       | https://news.ycombinator.com/item?id=21196555
       | 
       | https://news.ycombinator.com/item?id=17436591
       | 
       | https://news.ycombinator.com/item?id=17436591
        
         | g42gregory wrote:
         | Apparently a lot of people heard of it. :-) I just love these
         | headlines.
        
           | linsomniac wrote:
           | Heard of it? I've bought tim an Icelandic hamburger!
        
             | eunoia wrote:
             | Story time, please :)
        
         | dang wrote:
         | Thanks! Macroexpanded:
         | 
         |  _Beating TimSort at Merging_ -
         | https://news.ycombinator.com/item?id=27823180 - July 2021 (69
         | comments - including
         | https://news.ycombinator.com/threads?id=tim-peters - maybe luck
         | will strike again...)
         | 
         |  _Timsort, the Python sorting algorithm_ -
         | https://news.ycombinator.com/item?id=21196555 - Oct 2019 (131
         | comments)
         | 
         |  _On the Worst-Case Complexity of TimSort_ -
         | https://news.ycombinator.com/item?id=17883461 - Aug 2018 (74
         | comments)
         | 
         |  _Timsort is a sorting algorithm that is efficient for real-
         | world data_ - https://news.ycombinator.com/item?id=17436591 -
         | July 2018 (77 comments)
         | 
         |  _Functional verification with mechanical proofs of TimSort
         | [pdf]_ - https://news.ycombinator.com/item?id=9778243 - June
         | 2015 (1 comment)
         | 
         |  _Timsort_ - https://news.ycombinator.com/item?id=3214527 - Nov
         | 2011 (27 comments)
         | 
         |  _Visualising Sorting Algorithms: Python 's timsort_ -
         | https://news.ycombinator.com/item?id=2092594 - Jan 2011 (3
         | comments)
         | 
         |  _Java has switched from Mergesort to TimSort_ -
         | https://news.ycombinator.com/item?id=752677 - Aug 2009 (13
         | comments)
         | 
         |  _Visualizing Sorting Algorithms_ -
         | https://news.ycombinator.com/item?id=750858 - Aug 2009 (3
         | comments)
        
       | CJefferson wrote:
       | Closely related is pattern defeating quicksort (
       | https://github.com/orlp/pdqsort ), which adapts quicksort to take
       | advantage of sorted runs. I've adapted a few quicksorts to
       | pdqsort and seen good speedups (as people were often sorting
       | partially sorted data)
       | 
       | Basically: Timsort is to mergesort as pdqsort is to quicksort
        
         | masklinn wrote:
         | > Timsort is to mergesort as pdqsort is to quicksort
         | 
         | It's the other way around, since timsort is a decade older than
         | pdqsort.
        
           | dragonwriter wrote:
           | Age has nothing to do with the accuracy of the analogy, which
           | says nothing about the temporal relationship between timsort
           | and pdqsort.
        
         | BizarroLand wrote:
         | My favorite oddball sorting algorithm is StalinSort. It runs in
         | (O)1 time, making it probably the fastest sorting algorithm in
         | the world.
         | 
         | It iterates through the list, and deletes every item in the
         | list that isn't ordered. The remaining list is therefore
         | automatically ordered.
        
           | magicalhippo wrote:
           | Hadn't heard of that one, and looking it up I stumbled upon
           | this[1] delightful collection of esoteric sorting algorithms.
           | Perfect for a Friday evening...
           | 
           | [1]: https://towardsdatascience.com/esoteric-sort-algorithms-
           | and-...
        
       | jsnk wrote:
       | I was hoping to see some benchmark numbers
        
         | varenc wrote:
         | Creating a single benchmark seems tricky since Timsort's key
         | strength is it performs particularly well on real world data
         | where subsequences of the input are often already sorted
         | (runs). With random data it should have O(n log n) time like
         | any other comparison sort.
         | 
         | So with benchmarking the challenge is coming up with a set of
         | real world-like test data that feels representative but isn't
         | just excessively playing to Timsort's strengths. Not everyone's
         | "real world" data is the same.
        
           | hinkley wrote:
           | I'm trying to recall what the expected level of clustering is
           | even for properly random inputs.
           | 
           | A few years before Timsort there was someone randomly
           | shuffling the input in order to avoid the reverse-order
           | corner case that kills so many sort algorithms.
           | 
           | For truly random inputs, about half of the entries should be
           | pairs that are partially ordered, and 25% runs of 3, no? And
           | if memory serves Timsort also has a fast path for strictly
           | decrementing runs as well, where it amortizes some of the
           | comparisons done in the scan phase to do a blind reverse()
           | call.
        
           | sshine wrote:
           | This kind of almost sorted data is easily synthesised with
           | any degree of already-sorted-ness.
        
             | varenc wrote:
             | Certainly! But then how do you decide what degree of
             | already-sorted-ness fairly represents real world data?
        
               | not2b wrote:
               | By collecting a lot of real world data?
        
         | lifthrasiir wrote:
         | CPython source code contains a separate documentation [1] for a
         | detailed description and discussion for the algorithm. TimSort
         | optimizes for the number of comparisons (which are expensive in
         | Python since each comparison can call a Python function) so all
         | numbers are given as such.
         | 
         | [1]
         | https://github.com/python/cpython/blob/main/Objects/listsort...
        
           | hinkley wrote:
           | Number of comparisons is a particular sore point for me.
           | Nobody is sorting integers. This is the goddamned 21st
           | century, not 1980's era video games. Just stop.
           | 
           | Show me a sort algorithm that is still efficient with five
           | pointer indirections in compare() and I'm satisfied. Show me
           | primitive data types and I start to wonder what you're hiding
           | (read: lying about).
        
             | elcomet wrote:
             | What are you talking about? I sort integers every day
        
       | Rapzid wrote:
       | Needs a 2005 in the title.
        
       | tpoacher wrote:
       | > built for the real world -- not constructed in academia.
       | 
       | Why would lacking in evidence and rigor be considered a plus?
       | 
       | Does Timsort pride itself of being the homeopathy of sorting
       | algorithms?
       | 
       | What a bizzare brag to make.
        
         | masklinn wrote:
         | > Why would lacking in evidence and rigor be considered a plus?
         | 
         | The point it's making is about heuristics rather than
         | theoretical complexity. Tim Peters provided plenty of evidence
         | and rigorous work when he proposed it.
         | 
         | That you interpret it as "lack of evidence and rigor" is really
         | about you.
        
         | jamestimmins wrote:
         | I actually found this valuable, without being a "brag". The
         | best type of sort for real-world problems is build with an
         | understanding of real-world datasets. The types of questions
         | and research that interest academia, while often valid, are
         | distinct from those that concern practical engineers.
        
         | scrozier wrote:
         | I had the same thought at first. But I think the point is that
         | this sort of heuristically cobbled together sort wouldn't
         | typically be the stuff of academic research, brilliant and
         | practical though it is.
        
       | threatofrain wrote:
       | Also interesting is pdqsort, which will become Go's standard
       | sort.
       | 
       | https://news.ycombinator.com/item?id=31106157
       | 
       | https://news.ycombinator.com/item?id=31098822
        
       | 2143 wrote:
       | It's well known.
       | 
       | Nevertheless, the article was good.
       | 
       | If the author is reading this, consider dropping the "you've
       | never heard of" from the title; you don't want to assume things
       | about your reader.
        
         | [deleted]
        
         | lifthrasiir wrote:
         | "lesser-known" is probably more appropriate choice, given that
         | it is not a kind of algorithm you'd heard from CS curricula.
        
         | scrozier wrote:
         | Sometimes a little colorful language makes dry topics a bit
         | more fun. I don't think the title was meant to be statistically
         | analyzed. :-)
        
         | pclmulqdq wrote:
         | I've done this too with blog posts, and it works okay, although
         | many readers don't like it. I wanted a snappier way to say "you
         | didn't learn about in school," and I think this author did too.
        
       | earthicus wrote:
       | Here [1] is a great (short, readable) paper from 2015 that
       | explains Timsort and similar stack-merge sorting algorithms. It
       | also gives a runtime analysis, and a simplified version of the
       | algorithm that they call 'alpha-sort'.
       | 
       | [1] https://hal-upec-upem.archives-
       | ouvertes.fr/hal-01212839/file...
        
       ___________________________________________________________________
       (page generated 2022-07-29 23:00 UTC)