[HN Gopher] From WinZips to Cat GIFs, Jacob Ziv's Algorithms Hav...
___________________________________________________________________
 
From WinZips to Cat GIFs, Jacob Ziv's Algorithms Have Powered
Decades of Compr
 
Author : chha
Score  : 52 points
Date   : 2021-04-22 07:34 UTC (15 hours ago)
 
(HTM) web link (spectrum.ieee.org)
(TXT) w3m dump (spectrum.ieee.org)
 
| tombert wrote:
| I know it was meant as a bit of a goof, but I always thought that
| the "look up the file in pi!" idea was a clever way of doing
| compression [1], exploiting the fact that pi is normal and thus
| any given sequence of digits will occur _eventually_, meaning
| that any given _file_ that could possibly exist will show up
| eventually. Using this, you could theoretically store just two
| numbers to compress _anything_: a beginning point and an end
| point for its position in pi.
| 
| I'm aware this would be unbearably slow, and that the numbers
| needed to store really big files might end up taking more memory
| than LZMA, so I don't need a lecture about it; I just thought it
| was an interesting idea.
| 
| [1] https://github.com/philipl/pifs
| anuragbiyani wrote:
| It's a commonly held misconception that pi has been proven to be
| normal - it has NOT (as of now it is only a conjecture) [1] [2].
| Proving the normality of a number is a very hard problem, and
| hardly any numbers outside of purposefully constructed ones
| (such as Champernowne's constant [3]) have been proven normal.
| 
| In fact, it is not even proven that every digit occurs
| _infinitely_ many times in the decimal expansion of pi. [4]
| 
| So https://github.com/philipl/pifs is wrong in claiming that
| every byte stream exists somewhere in pi (that is not proven).
| It's also worth calling out that even if pi were normal, storing
| the two location indices would, for the majority of inputs,
| likely take more space than the original data itself, so it's
| not much of a "compression" strictly speaking.
It's easy to see how this plays out for a known normal number,
| Champernowne's constant [3]. Unlike pi, Champernowne's constant
| is guaranteed to contain every natural number's digit string,
| but even just the starting index of an integer within the
| concatenation takes at least as many digits as the integer
| itself (e.g., "11" starts at index 12 (1-indexed), "12" starts
| at index 14, and so on - the index grows much faster than the
| integer being looked up).
| 
| [1] https://mathworld.wolfram.com/NormalNumber.html
| 
| [2] https://math.stackexchange.com/a/216578
| 
| [3] Champernowne's constant (in base 10) is formed by
| concatenating all the positive integers and treating the result
| as a decimal expansion (following "0."): 0.12345678910111213...
| It trivially contains every natural number's digit string, and
| it has also been proven normal in base 10 (a stronger
| property). See
| https://en.wikipedia.org/wiki/Champernowne_constant for
| details.
| 
| [4]
| https://en.wikipedia.org/wiki/Normal_number#Properties_and_e...
| jonny_eh wrote:
| How would that be any better than "point to a spot on the
| number line"?
| crazygringo wrote:
| It's not just slow - it wouldn't be compression either.
| 
| On average, the length of the number required to point to the
| starting digit will be the same as the length of the data
| you're trying to compress.
| 
| It would be 0% compression in the long run.
| treesprite82 wrote:
| > On average, the length of the number required to point to
| the digit will be the same length as the data you're trying
| to compress. It would be 0% compression in the long run
| 
| Worse than 0%. The average position of the numbers 0000-9999
| in pi is 9940, for example.
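The Champernowne index sizes quoted above can be computed exactly
with a closed-form position formula. A minimal Python sketch (the
`concat_position` helper name is illustrative, not from any
library):

```python
def concat_position(n: int) -> int:
    """1-indexed position where the integer n begins in the digit
    string of Champernowne's constant, 0.123456789101112...

    All j-digit integers (there are 9 * 10**(j-1) of them)
    contribute j digits each; then add the digits of the j-digit
    integers that precede n itself.
    """
    d = len(str(n))
    pos = 1 + sum(9 * 10 ** (j - 1) * j for j in range(1, d))
    return pos + (n - 10 ** (d - 1)) * d

# "11" -> 12 and "12" -> 14, matching the example above; in
# general the index has at least as many digits as the integer
# it points to, so storing the index never beats storing the data.
for n in (11, 12, 1000, 123456):
    print(n, concat_position(n))
```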
| 
| I believe ideally you'd want a De Bruijn sequence
| (https://en.wikipedia.org/wiki/De_Bruijn_sequence -
| "optimally short with respect to the property of containing
| every string of length n at least once"), but even that won't
| get you to 1:1 compression.
| 0xcde4c3db wrote:
| > The following year, the two researchers issued a refinement,
| LZ78. That algorithm became the basis for the Unix compress
| program used in the early '80s; WinZip and Gzip, born in the
| early '90s; and the GIF and TIFF image formats.
| 
| I don't understand the inclusion of WinZip and gzip here. Those
| are based on DEFLATE, which I thought was deliberately derived
| from LZ77 and LZSS rather than anything particular to the
| LZ78/LZW lineage (for patent reasons). Am I confused about
| something?
| beagle3 wrote:
| I'm finding conflicting web docs about which algorithms PKZIP
| used prior to 2.0 (deflate == gzip). There were "shrink" and
| "implode"; they had some run-length encoding and some entropy
| coding (Shannon-Fano trees, later replaced by Huffman coding in
| deflate). My vague memory says there was an LZW among them,
| with PKZIP 0.8 or something, but I'm not sure.
| 
| What I am sure about is that the predecessor to PKZIP, "PKARC",
| most certainly did use LZW; it was compatible with SEA ARC, but
| much, much faster (5x or so). SEA sued Phil Katz, and his
| response was to drop ARC compatibility and release PKZIP, which
| was faster and better - though the "deflate" method did not
| appear until version 2.0 a few years later.
| forgotmypw17 wrote:
| accessible link for nojs clients:
| 
| https://archive.is/Bb1Aw
| bobbyi_settv wrote:
| Did the word "compression" get cut off due to title length? Or
| is it intentionally being compressed in the original title as a
| meta-joke? Turns out it's the former.
| mygoodaccount wrote:
| It's lossless too (with the context)!
| dogma1138 wrote:
| Well, quite the opposite - it's a pretty good example of lossy
| compression.
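LZW, the LZ78 variant mentioned above (Unix compress, GIF, PKARC),
is small enough to sketch, and the sketch also illustrates exact
reversibility: decompression rebuilds the same dictionary the
compressor built. A toy Python version (illustrative helper names;
real implementations add variable-width codes and dictionary
resets):

```python
def lzw_compress(data: bytes) -> list[int]:
    """Emit a list of dictionary codes for `data` (toy LZW)."""
    table = {bytes([i]): i for i in range(256)}  # single bytes
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc  # keep extending the current match
        else:
            out.append(table[w])
            table[wc] = next_code  # learn the new phrase
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Invert lzw_compress by rebuilding the same dictionary."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = w + w[:1]  # the "KwKwK" corner case
        else:
            raise ValueError("corrupt code stream")
        out.append(entry)
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)
```

For example, `lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")` emits
fewer codes than there are input bytes, and decompressing gives
back the original exactly.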
| 
| Lossless compression has to be lossless regardless of any
| context or interpretation; it's mathematically reversible.
| 
| This is rather a good analogy for lossy compression, akin to,
| say, MP3.
| [deleted]
___________________________________________________________________
(page generated 2021-04-22 23:01 UTC)