[HN Gopher] Data Compression Drives the Internet. Here's How It ...
___________________________________________________________________
 
Data Compression Drives the Internet. Here's How It Works
 
Author : the-mitr
Score  : 82 points
Date   : 2023-06-09 00:36 UTC (22 hours ago)
 
(HTM) web link (www.quantamagazine.org)
(TXT) w3m dump (www.quantamagazine.org)
 
| fguerraz wrote:
| Quanta magazine and all these pop science websites need to be
| stopped.
| 
| (Not because popular science is bad, but because they do it
| badly, and the clickbait is insufferable)
| 
| sgt101 wrote:
| Caches drive the internet.
| 
| They are orders of magnitude more important than compression.
| 99.99% of requests hit a local cache. (1)
| 
| Compression is important too.
| 
| (1) I worked in a telco. Check it out for yourself!
| 
| Solvency wrote:
| Ok, how do they work then?
| 
| parl_match wrote:
| There's a lot of information about this on the open web, and
| tons of resources for varying levels of skill.
| 
| joshsabol46 wrote:
| Please pardon my ignorance...
| 
| My understanding is that there are 1000s of different
| compression algorithms, each with their own pros/cons depending
| on the type and characteristics of the file. And yet we still
| try to pick the "generically best" codec for a given file type
| (ex. PNG) and then use that everywhere.
| 
| Why don't we have context-dependent compression instead?
| 
| I'm imagining a system that scans objects before compression,
| selects the optimal algorithm, and then encodes the file. The
| selected compression algorithm could be prefixed for easy
| decompression.
| 
| Compare a single black image that's 1x1 to one that's
| 1000x1000. The PNGs are 128 bytes and 6 KB, respectively.
| However, run-length encoding would compress the latter to a
| size comparable to the former.
| 
| deepsun wrote:
| > selects the optimal algorithm
| 
| Here's the catch: how does the "system" know which algorithm
| would be the best?
| It could try encoding the data with multiple algorithms and see
| which result is shorter, but that's extra CPU.
| 
| And the "system" could itself be called a compression
| algorithm.
| 
| U2EF1 wrote:
| I mean, yeah, that's basically what high-compression solutions
| like PAQ have done: depending on the compression level desired,
| apply increasingly speculative and computationally intensive
| models to the block and pick whichever one worked the best.
| 
| denvaar wrote:
| > The first step involves yet another clever strategy for
| > identifying repetition and thereby compressing message size,
| > but the second step is to take the resulting compressed
| > message and run it through the Huffman process.
| 
| I wonder if this "first step" is the Burrows-Wheeler Transform?
| 
| Side note: In Silicon Valley (the show), I'm pretty sure that
| Richard has a picture of David Huffman by his bedside.
| 
| Scaevolus wrote:
| No, BWT is largely unused in modern compression codecs.
| 
| Lempel-Ziv is the basis for almost all modern general-purpose
| compression. It works roughly like this: keep a hash table
| mapping 3- or 4-byte fragments to their positions, walk through
| the input byte by byte checking the hash table for matches, and
| insert the latest fragments and positions into the hash table.
| 
| BWT has nearly identical speed compressing and decompressing,
| but searching for matches to compress is _much_ slower than
| simply copying data according to instructions to decompress.
| 
| pseudotrash wrote:
| I came here to find Silicon Valley references and wasn't
| disappointed.
| 
| devsegal wrote:
| Both of us have been satisfied
| 
| hinkley wrote:
| at the same time?
| 
| ThrowawayTestr wrote:
| I don't like that show because whenever I watch it I start
| imagining a world with middle-out compression and I get sad
| we'll never have it.
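The "try every codec and keep the shortest output" idea discussed above can be sketched in a few lines of Python using three stdlib codecs. The one-byte tag prefix is a made-up convention for illustration, not any standard container format:

```python
import bz2
import lzma
import zlib

# Candidate codecs, each tagged with a one-byte ID.
# (The tag scheme is hypothetical, purely for illustration.)
CODECS = {
    b"z": (zlib.compress, zlib.decompress),
    b"b": (bz2.compress, bz2.decompress),
    b"x": (lzma.compress, lzma.decompress),
}

def compress_best(data: bytes) -> bytes:
    """Encode with every codec, keep the smallest output, prefix its tag."""
    candidates = [tag + enc(data) for tag, (enc, _) in CODECS.items()]
    return min(candidates, key=len)

def decompress(blob: bytes) -> bytes:
    """Read the tag byte, then decode with the matching codec."""
    _, dec = CODECS[blob[:1]]
    return dec(blob[1:])

data = b"a single black row " * 1000
packed = compress_best(data)
assert decompress(packed) == data
print(len(data), "->", len(packed))
```

This is exactly the trade-off deepsun points out: correctness is easy (the tag makes decoding unambiguous), but you pay N compression passes of CPU for one output.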
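The hash-table matching scheme Scaevolus describes can be sketched as a toy LZ parser. This is a simplification for illustration (greedy matching, no window limit, no entropy coding, tokens kept as Python objects instead of a bitstream):

```python
MIN_MATCH = 4

def lz_parse(data: bytes):
    """Greedy LZ parse: emit literal bytes and (distance, length) copies.

    Walks the input byte by byte, using a hash table that maps each
    4-byte fragment to its most recent position.
    """
    table = {}   # 4-byte fragment -> most recent position
    out = []     # tokens: int (literal byte) or (distance, length)
    i = 0
    while i < len(data):
        frag = data[i:i + MIN_MATCH]
        pos = table.get(frag)
        if pos is not None:
            # Found a repeat; extend the match as far as it goes.
            length = MIN_MATCH
            while i + length < len(data) and data[pos + length] == data[i + length]:
                length += 1
            out.append((i - pos, length))
            # Index the fragments we just stepped over.
            for j in range(i, min(i + length, len(data) - MIN_MATCH + 1)):
                table[data[j:j + MIN_MATCH]] = j
            i += length
        else:
            out.append(data[i])
            if len(frag) == MIN_MATCH:
                table[frag] = i
            i += 1
    return out

def lz_unparse(tokens) -> bytes:
    """Decoding is just copying -- much cheaper than the search above."""
    buf = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            buf.append(tok)
        else:
            dist, length = tok
            for _ in range(length):
                buf.append(buf[-dist])  # byte-wise copy handles overlaps
            continue
    return bytes(buf)
```

Note how the decoder has no searching at all, which is the asymmetry Scaevolus contrasts with BWT: compression does the expensive match hunting, decompression only copies.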
| gfody wrote:
| Missed opportunity to explain how compression and prediction
| are related: the better you can predict the next token, the
| better your compression gets. Then your article gets to mention
| GPT, hey.
| 
| optimalsolver wrote:
| Apparently compression and intelligence are synonymous:
| 
| https://mattmahoney.net/dc/rationale.html
| 
| valenterry wrote:
| I think in a sense compression is harder - because not only do
| you want to correctly predict the next token, you also want to
| do it fast, with a minimal but efficient algorithm that also
| doesn't require much space / a big dictionary.
| 
| You could think of it as taking a "snapshot" of an AI and then
| optimizing the hell out of it for a specific case, and you end
| up with a good compression algorithm.
| 
| hackcasual wrote:
| Huffman trees are really cool and a neat little example program
| to put together.
___________________________________________________________________
(page generated 2023-06-09 23:01 UTC)
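The prediction/compression link gfody mentions can be made concrete by comparing ideal code lengths (-log2 of the predicted probability, summed over the input) under two predictors. This sketch cheats by fitting both models on the data itself, which a real codec can't do, so treat it purely as an illustration that a better predictor means fewer bits:

```python
import math
from collections import Counter

def bits_order0(text: str) -> float:
    """Ideal bits if each symbol is predicted from its overall frequency."""
    freq = Counter(text)
    n = len(text)
    return sum(-math.log2(freq[ch] / n) for ch in text)

def bits_order1(text: str) -> float:
    """Ideal bits if each symbol is predicted from the previous symbol."""
    pair = Counter(zip(text, text[1:]))
    ctx = Counter(text[:-1])
    # First symbol has no context, so charge it at the order-0 rate.
    bits = -math.log2(Counter(text)[text[0]] / len(text))
    for a, b in zip(text, text[1:]):
        bits += -math.log2(pair[(a, b)] / ctx[a])
    return bits

text = "abababababababab"
# The order-1 predictor is certain of every symbol after the first,
# so its ideal cost collapses to ~1 bit for the whole string.
print(bits_order0(text), bits_order1(text))
```

An arithmetic coder driven by either model would emit almost exactly these many bits, which is why "better next-token prediction" and "better compression" are two views of the same problem.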
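As a sketch of the "neat little example program" hackcasual mentions, here is a minimal Huffman code builder in Python. It tracks code strings directly instead of materializing tree nodes, which is a simplification for brevity, not how production coders store their tables:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a prefix code where frequent symbols get short bit strings."""
    # Heap entries: (weight, tiebreak, {symbol: code-so-far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol input
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees; merging prepends
        # one bit to every code inside them.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[ch] for ch in "abracadabra")
print(codes, len(encoded), "bits vs", 8 * 11, "raw")
```

Because the two lightest subtrees are merged first, the rarest symbols end up deepest in the tree, and no code is a prefix of another, so the bit string decodes unambiguously.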