[HN Gopher] From WinZips to Cat GIFs, Jacob Ziv's Algorithms Hav...
___________________________________________________________________
 
From WinZips to Cat GIFs, Jacob Ziv's Algorithms Have Powered
Decades of Compr
 
Author : chha
Score  : 52 points
Date   : 2021-04-22 07:34 UTC (15 hours ago)
 
(HTM) web link (spectrum.ieee.org)
(TXT) w3m dump (spectrum.ieee.org)
 
| tombert wrote:
| I know it was meant as a bit of a goof, but I always thought that
| the "look up the file in pi!" idea was a clever way of doing
| compression [1], exploiting the fact that pi is normal and thus
| any given sequence of digits will occur _eventually_, meaning
| that any given _file_ that could possibly exist will show up
| eventually. Using this, you could theoretically store just two
| numbers to compress _anything_: a beginning point and an end
| point for its position in pi.
| 
| I'm aware this would be unbearably slow, and that the numbers
| needed to store really big files might end up taking more memory
| than LZMA, so I don't need a lecture about it; I just thought it
| was an interesting idea.
| 
| [1] https://github.com/philipl/pifs
| anuragbiyani wrote:
| It's a commonly held misconception that pi has been proven to be
| normal - it has NOT (as of now it is only a conjecture) [1] [2].
| Proving the normality of a number is a very hard problem, and
| hardly any numbers outside of purposefully constructed ones
| (such as Champernowne's constant [3]) have been proven normal.
| 
| In fact, it is not even proven that every digit occurs
| _infinitely_ many times in the decimal expansion of pi. [4]
| 
| So https://github.com/philipl/pifs is wrong in claiming that
| every byte stream exists somewhere in pi (that is not proven).
| It's also worth calling out that even if pi were normal, storing
| the two location indices would, for the majority of inputs,
| likely take more space than the original data itself, so it's
| not much of a "compression" strictly speaking.
It's easy to see how this plays out for a known normal number,
| Champernowne's constant [3]. Unlike pi, Champernowne's constant
| is guaranteed to contain every natural number's digit string,
| but even just the starting index of an integer within the
| concatenation takes at least as many digits as the integer
| itself (e.g., "11" starts at index 12 (1-indexed), "12" starts
| at index 14, and so on - the index grows much faster than the
| integer being looked up).
| 
| [1] https://mathworld.wolfram.com/NormalNumber.html
| 
| [2] https://math.stackexchange.com/a/216578
| 
| [3] Champernowne's constant (in base 10) is formed by
| concatenating all the positive integers and treating the result
| as a decimal expansion (following "0."): 0.12345678910111213...
| It trivially contains every natural number's digit string, and
| it has also been proven normal in base 10 (a stronger
| property). See
| https://en.wikipedia.org/wiki/Champernowne_constant for
| details.
| 
| [4]
| https://en.wikipedia.org/wiki/Normal_number#Properties_and_e...
| jonny_eh wrote:
| How would that be any better than "point to a spot on the
| number line"?
| crazygringo wrote:
| It's not just slow - it wouldn't be compression either.
| 
| On average, the length of the number required to point to the
| starting digit will be the same as the length of the data
| you're trying to compress.
| 
| It would be 0% compression in the long run.
| treesprite82 wrote:
| > On average, the length of the number required to point to
| the digit will be the same length as the data you're trying
| to compress. It would be 0% compression in the long run
| 
| Worse than 0%. The average position of the numbers 0000-9999
| in pi is 9940, for example.
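The Champernowne index sizes quoted above can be computed exactly
with a closed-form position formula. A minimal Python sketch (the
`concat_position` helper name is illustrative, not from any
library):

```python
def concat_position(n: int) -> int:
    """1-indexed position where the integer n begins in the digit
    string of Champernowne's constant, 0.123456789101112...

    All j-digit integers (there are 9 * 10**(j-1) of them)
    contribute j digits each; then add the digits of the j-digit
    integers that precede n itself.
    """
    d = len(str(n))
    pos = 1 + sum(9 * 10 ** (j - 1) * j for j in range(1, d))
    return pos + (n - 10 ** (d - 1)) * d

# "11" -> 12 and "12" -> 14, matching the example above; in
# general the index has at least as many digits as the integer
# it points to, so storing the index never beats storing the data.
for n in (11, 12, 1000, 123456):
    print(n, concat_position(n))
```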
| 
| I believe ideally you'd want a De Bruijn sequence
| (https://en.wikipedia.org/wiki/De_Bruijn_sequence -
| "optimally short with respect to the property of containing
| every string of length n at least once"), but even that won't
| get you to 1:1 compression.
| 0xcde4c3db wrote:
| > The following year, the two researchers issued a refinement,
| LZ78. That algorithm became the basis for the Unix compress
| program used in the early '80s; WinZip and Gzip, born in the
| early '90s; and the GIF and TIFF image formats.
| 
| I don't understand the inclusion of WinZip and gzip here. Those
| are based on DEFLATE, which I thought was deliberately derived
| from LZ77 and LZSS rather than anything particular to the
| LZ78/LZW lineage (for patent reasons). Am I confused about
| something?
| beagle3 wrote:
| I'm finding conflicting web docs about which algorithms PKZIP
| used prior to 2.0 (deflate == gzip). There were "shrink" and
| "implode"; they had some run-length encoding and some entropy
| coding (Shannon-Fano trees, later replaced by Huffman coding in
| deflate). My vague memory says there was an LZW among them,
| with PKZIP 0.8 or something, but I'm not sure.
| 
| What I am sure about is that the predecessor to PKZIP, "PKARC",
| most certainly did use LZW; it was compatible with SEA ARC, but
| much, much faster (5x or so). SEA sued Phil Katz, and his
| response was to drop ARC compatibility and release PKZIP, which
| was faster and better - though the "deflate" method did not
| appear until version 2.0 a few years later.
| forgotmypw17 wrote:
| accessible link for nojs clients:
| 
| https://archive.is/Bb1Aw
| bobbyi_settv wrote:
| Did the word "compression" get cut off due to title length? Or
| is it intentionally being compressed in the original title as a
| meta-joke? Turns out it's the former.
| mygoodaccount wrote:
| It's lossless too (with the context)!
| dogma1138 wrote:
| Well, quite the opposite - it's a pretty good example of lossy
| compression.
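LZW, the LZ78 variant mentioned above (Unix compress, GIF, PKARC),
is small enough to sketch, and the sketch also illustrates exact
reversibility: decompression rebuilds the same dictionary the
compressor built. A toy Python version (illustrative helper names;
real implementations add variable-width codes and dictionary
resets):

```python
def lzw_compress(data: bytes) -> list[int]:
    """Emit a list of dictionary codes for `data` (toy LZW)."""
    table = {bytes([i]): i for i in range(256)}  # single bytes
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc  # keep extending the current match
        else:
            out.append(table[w])
            table[wc] = next_code  # learn the new phrase
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Invert lzw_compress by rebuilding the same dictionary."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = w + w[:1]  # the "KwKwK" corner case
        else:
            raise ValueError("corrupt code stream")
        out.append(entry)
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)
```

For example, `lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")` emits
fewer codes than there are input bytes, and decompressing gives
back the original exactly.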
| 
| Lossless compression has to be lossless regardless of any
| context or interpretation; it's mathematically reversible.
| 
| This is rather a good analogy for lossy compression, akin to,
| say, MP3.
| [deleted]
___________________________________________________________________
(page generated 2021-04-22 23:01 UTC)