[HN Gopher] Brunsli: Practical JPEG repacker (now part of JPEG XL)
___________________________________________________________________

Brunsli: Practical JPEG repacker (now part of JPEG XL)

Author : networked
Score : 192 points
Date : 2020-03-01 13:30 UTC (9 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

  | crazygringo wrote:
  | Sorry, can someone here clarify --
  |
  | Does this reduce the size of JPEG files, maintaining JPEG format?
  |
  | Or is this about repacking JPEG files into JPEG XL files (while
  | reducing size), while maintaining the ability to losslessly
  | revert to JPEG?
  |
  | The page never explicitly states what format the "repacking" is
  | _into_, and it has so many references to JPEG XL that it no
  | longer seems obvious that it's into just JPEG?
  | donatj wrote:
  | My understanding is it converts normal JPEGs to incompatible
  | JPEG XL files, losslessly.
  |
  | The README does a poor job explaining this.
  | rndgermandude wrote:
  | It does not maintain the JPEG format, tho it will be part of
  | the JPEG XL format.
  |
  | It is repacking/recoding jpegs in a lossless and reversible
  | manner, so that clients supporting brunsli can be served
  | directly with the optimized version (their apache and nginx
  | modules seem to serve files with an image/x-j MIME type), and
  | clients without support can be served with the original jpeg
  | file (or served with a brunsli file and decoded with a wasm-
  | brunsli->native jpeg decoder pipeline if wasm is an option),
  | while you only have to keep the brunsli (or original jpeg) file
  | around.
  |
  | Since JPEG XL isn't finished yet, there still might be minor
  | differences in the end that make the current brunsli format
  | incompatible with the JPEG XL format, so I wouldn't mass-convert
  | your files just yet.
  | taf2 wrote:
  | I like the checklist for features... can't wait for
  |
  | "Nginx transcoding module"
  | [deleted]
  | jncraton wrote:
  | Does anyone know how this compares to other projects such as
  | Lepton?
  |
  | https://github.com/dropbox/lepton
  |
  | The goals and results appear similar. Is the primary difference
  | that brunsli will likely actually be part of a standard (JPEG
  | XL)?
  | gbanfalvi wrote:
  | And how does it compare against HEIF?
  | 0-_-0 wrote:
  | HEIF can't compress JPEG images losslessly
  | rndgermandude wrote:
  | Indeed, HEIF/HEIC is basically a slightly dumbed down HEVC
  | (h.265) i-frame (full frame) (HEIC)[1] and new container
  | format (HEIF)[2], similar to WEBP being VP8 i-frames in a
  | RIFF container. So they are used as full-blown codecs in
  | practice, usually not in a lossless mode, so shifting JPEG
  | to HEIC or WEBP will lose some quality.
  |
  | [1] Decoding HEIC in Windows (10) requires you to have
  | installed a compatible HEVC decoder. Which is 99 cents (and
  | the hassle of setting up a store account and payment
  | processing with MS) or an alternative free one which will
  | use the HEVC codec that is shipped with hardware such as
  | newer Intels (QSV) or GPUs. Thank you patent mess!
  |
  | [2] HEIF the container format can contain JPEG data, but in
  | practice does not or only as a supplementary image
  | (previews, pre-renders, thumbnails, etc)
  | [deleted]
  | ksec wrote:
  | Below 0.5 bpp, both HEIF/BPG [1] and AVIF perform quite a
  | bit better than JPEG XL; XL shines at 0.8 bpp. At least in
  | initial testing.
  |
  | [1] https://bellard.org/bpg/
  | ajnin wrote:
  | Do you have a link for JPEG XL comparisons? The
  | comparisons on that page are with JPEG XR, which is a
  | different thing.
  | duskwuff wrote:
  | Not comparable.
  | Brunsli and Lepton are lossless compressors
  | for JPEG files; HEIF is a completely different lossy image
  | encoder.
  |
  | To compare the size of a Brunsli/Lepton encoded JPEG file
  | with an HEIF image, you'd need to define some sort of quality
  | equivalence between the two, which gets complicated fast.
  | todotask wrote:
  | Have you tried testing it on your machine? You'll see various
  | results with lossless/lossy image optimisers.
  |
  | I got <22% with Brunsli.
  | llarsson wrote:
  | Their respective READMEs both claim a 22% size reduction, which
  | sounds like an interesting coincidence. Have they identified a
  | similar inefficiency in the format itself?
  | ksec wrote:
  | JPEG's entropy encoding is ancient. Adding modern arithmetic
  | coding can save significant bits without changing the actual
  | visual data.
  | robryk wrote:
  | Another inefficiency of JPEG is that each block (8x8 pixels
  | in size) is compressed independently[^]. This means that
  | the correlations between pixels that are adjacent across a
  | block boundary are not used. If I were to take a JPEG image
  | and randomly flip its blocks (mirror them along x and/or y
  | axis), the resulting JPEG would have a very similar
  | filesize, even though it's a much less likely image.
  |
  | Brunsli and, IIUC, Lepton, make use of these correlations.
  |
  | [^] the average color of blocks is not compressed strictly
  | independently, but the space used on those is small
  | compared to all the rest of the information about a block
  |
  | Disclaimer: I've worked on Brunsli.
  | mochomocha wrote:
  | Very interesting. This independence across blocks can
  | presumably be leveraged at decode time for faster
  | decoding though. Surely there must be decoders out there
  | parallelizing over the blocks on multi-core archs/GPUs?
  |
  | Do you know how Brunsli & Lepton fare when it comes to
  | parallelizability?
  | robryk wrote:
  | I assume that you mean parallelizability of decoding and
  | not of encoding.
  |
  | JPEG's decoding is poorly parallelizable: the entropy
  | decoding is necessarily serial; only inverse DCTs can be
  | parallelized.
  |
  | Sharing the data about boundaries need not hamper
  | parallelizability in its most simple meaning: imagine a
  | format where we first encode some data for each boundary,
  | and then we encode all the blocks that can only be
  | decoded when provided the data for all their four
  | boundaries.
  |
  | However, what often matters is the total number of non-
  | cacheable memory roundtrips that each pixel/block of the
  | image has to take part in: a large cost during decoding
  | is memory access time. If we assume that a single row of
  | blocks across the whole image is larger than the cache,
  | then any approaches similar to the one I described in the
  | previous paragraph add one roundtrip.
  |
  | Another consideration is that a single block is often too
  | small to be a unit of parallelization -- parallelizing
  | entropy decoding usually has additional costs in filesize
  | (e.g. for indices), and any parallelization has startup
  | costs for each task.
  |
  | The end result is that a reasonably useful approach to
  | parallelization is to split the image into "large blocks"
  | that are large enough to be units of parallelization on
  | their own, and encode _those_ independently.
  | londons_explore wrote:
  | That might prove to be a good measure of image
  | compression 'efficiency'.
  |
  | Present to a user two images, one an image compressed by
  | image compressor X, and one compressed by the same image
  | compressor with a single bit of output flipped.
  |
  | In an ideal image compression scenario, the decompressed
  | images would not be the same, but a user could not tell
  | which was the 'correct' image, since both would look
  | equally realistic.
  | robryk wrote:
  | If a scheme had something like that property and
  | satisfied some simpler conditions, I would wager that it
  | necessarily is a good compression scheme. However, this
  | is very much not required of a good compression scheme:
  |
  | Imagine that a compression scheme used the first bit to
  | indicate if the encoded image is an image of a cat or
  | not. Changing that bit would then have very obvious and
  | significant implications on the encoded image.
  |
  | If that example seems too unrealistic, imagine a
  | modification of a compression scheme that, before
  | decoding, xors every non-first bit with the first bit.
  | Then flipping the first bit in the modified scheme is
  | equivalent to flipping a lot of bits in the unmodified
  | scheme, but they are equivalently good at encoding
  | images.
  |
  | Edit: To put it short, the important property is that
  | equally-long encoded images are "equally plausible": it's
  | not important how many bits differ between them, and it
  | doesn't matter if they are similar to each other.
  | Dylan16807 wrote:
  | In the thought experiment, I don't think the user is told
  | beforehand what the image is.
  |
  | So you flip the cat bit and get an image of a helicopter,
  | and they still can't tell which one is 'correct'.
  | robryk wrote:
  | Ah, thank you. I misread the GP. It seems that he is
  | saying nearly[^] exactly what I wanted to say in the
  | edit.
  |
  | [^] the property should hold not only for single-bit
  | changes, but all length-preserving changes -- it's
  | perfectly fine for all single bitflips to e.g. result in
  | invalid codestreams.
  | tpetry wrote:
  | The files created by lepton can't be displayed by any client.
  | Brunsli is a converter for JPEG <-> JPEG XR which is lossless,
  | and by the improved jpeg xr algorithms decreases filesize.
  |
  | The interesting part is you can therefore convert between these
  | formats thousands of times without visual regressions. You
  | could (in a few years) only store the jpeg xr file and your
  | webserver may transcode it to jpeg for legacy browsers.
  | thedance wrote:
  | They may still be comparable. If the point of Lepton is to
  | save space, and if a round-trip through Brunsli costs less
  | than Lepton encoding while saving similar amounts of space,
  | then it could be a design alternative.
  | ksec wrote:
  | >Brunsli is a converter for JPEG <-> JPEG XR which is
  | lossless...
  |
  | You mean JPEG XL?
  |
  | JPEG XR is a completely different thing [1]
  |
  | [1] https://en.wikipedia.org/wiki/JPEG_XR
  | DagAgren wrote:
  | Extremely good branding work there, JPEG.
  | zuminator wrote:
  | Don't forget JPEG XS and JPEG XT.
  |
  | https://jpeg.org/jpeg/index.html
  | wyoh wrote:
  | > Brunsli has been specified as the lossless JPEG transport layer
  | in the Committee Draft of JPEG XL Image Coding System and is
  | ready to power faster and more economical transfer and storage of
  | photographs.
  |
  | I thought the lossless part of JPEG XL was done by FUIF, am I
  | misunderstanding something?
  | janwas wrote:
  | Lossless transcoding of JPEG bitstreams is done by Brunsli;
  | there is also lossless storage of pixels, based on tech from
  | FUIF plus an adaptive predictor by Alex Rhatushnyak.
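As a concrete illustration of the lossless transcoding described above, here is a
minimal shell sketch of the round trip, assuming the cbrunsli/dbrunsli tools built
from the google/brunsli repository (the exact argument order and flags may differ;
check the repository's README):

    # repack a JPEG into the Brunsli representation, then decode it back
    cbrunsli photo.jpg photo.br
    dbrunsli photo.br roundtrip.jpg

    # the decoded file should be byte-for-byte identical to the original
    cmp photo.jpg roundtrip.jpg && echo "byte-identical"

    # compare sizes to see the saving (the README claims roughly 22% on average)
    wc -c photo.jpg photo.br

Only the Brunsli file then needs to be kept; the original JPEG can be regenerated
from it on demand.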
  | zxcvbn4038 wrote:
  | I understand what you're saying but when you're storing millions of
  | images and transferring them frequently those small reductions in
  | size are very significant when you get your storage and/or
  | transit bill at the end of the month. A lot of the fluff and filler you
  | mentioned is compressed and cached, so you're not bringing it down
  | every request and not in raw form. Even if you are sending a huge
  | amount of uncachable stuff with each request it doesn't mean
  | savings wouldn't be appreciated. It's funds for a team lunch if
  | nothing else!
  | rndgermandude wrote:
  | Preemptive disclaimer: I don't want to belittle the work the
  | authors did here in any way, and am actually excited especially
  | about the reversible, lossless jpeg<->brunsli coding and that
  | google's buy-in and network effects will most likely mean this
  | comes to viewer/editing software near you in the not so distant
  | future (unlike lepton, which never got out of its tiny niche).
  |
  | But seeing the 22% improvement figure reminded me that the
  | typical JPEG file on the internet is rather unoptimized even on
  | write-once-read-many services like imgur or i.reddit.com which
  | transform files (stripping metadata etc) and do not preserve the
  | original files. Just using the regular vanilla libjpeg encoder
  | you can usually save 5%-10% just by lossless recoding of the
  | coeffs and the better-yet-more-computationally-intense mozjpeg
  | coder can even get you a bit further than that.
  |
  | Then again, the imgur single image view page (desktop) I just
  | opened by randomly clicking an item on their homepage
  | transfers 2.9MiB of data with ads blocked (3.9MiB deflated),
  | 385KiB of which was the actual image, and that image can be
  | losslessly recoded by mozjpeg to 361KiB (24KiB difference, a 6.2%
  | reduction), so the 24KiB (0.8%) reduction out of 2.9MiB of cruft
  | hardly matters to them I suppose, and may be cheaper in bandwidth
  | and storage cost to them than the compute cost (and humans
  | writing and maintaining the code).
  |
  | Using brunsli, that same 385KiB imgur file went down to 307KiB so
  | roughly a 20% reduction, but still only a 2.6% reduction of that
  | massive 2.9MiB the imgur site transferred in total.
  | zxcvbn4038 wrote:
  | I get what you're saying but when you're storing millions of images
  | and transferring them frequently those small reductions in size
  | are very significant when you get your storage and/or transit
  | bill. A lot of the fluff and filler you mentioned is cached, so
  | you're not bringing it down every request. Even if it wasn't you
  | can assume it's deemed becc
  | rndgermandude wrote:
  | You'd think so, but imgur for example is literally in the
  | business of serving images to a lot of users (and some
  | videos, too) and yet they did not even implement the vanilla
  | libjpeg optimized coding it seems. And I am not just picking
  | on imgur, they are just an example for today; other similar
  | services didn't do much better last I checked, neither when
  | it comes to serving optimized JPEGs nor overall page sizes.
  | IshKebab wrote:
  | I think that's his point - you can _already_ reduce JPEG size
  | a bit by using fancier libraries, but even huge sites like
  | Imgur apparently don't bother.
  | Santosh83 wrote:
  | Yes not so useful for small images but when users view multi-
  | megabyte images then I guess the savings will start to be
  | significant even with the page overhead.
  | Say a 4MB image can be
  | optimised to 2.5MB, then the total download comes down from 6.9MB to
  | 5.4MB which is not trivial, especially when the same page has
  | multiple images (some people upload a gallery of images on
  | sites like imgur) or when viewing lots of image URLs.
  | Especially on a limited data plan these small savings will
  | start adding up.
  | rndgermandude wrote:
  | Yes, it would be a win for the users, especially on limited
  | data plans. Though I wouldn't hold my breath for most sites
  | to actually implement it; most sites do not mind serving
  | megabytes of data for a single page view to users, including
  | megabytes of scripts, even to mobile users.
  |
  | 4MiB to 2.5MiB is out of range for brunsli, more like 3.2MiB
  | if you're lucky.
  |
  | I also tried with a 54 MiB JPEG[1] just now. The brunsli
  | coding is 49MiB so not even a 10% reduction on this
  | particular file. And it took a wallclock time of 16 seconds
  | on my last gen Intel. Decoding it back to JPEG took 11
  | seconds on the same box.
  |
  | [1] A picture of a wedding cake, taken with a NIKON D810, one
  | of the largest JPEGs I had available. It was "exported" by
  | Lightroom 9, it seems, from a NEF/RAW source, and is full of
  | metadata too, around 100KiB of it.
  | hinkley wrote:
  | I tried to order some takeout the other day and the place was
  | busy so I got voicemail. The beginning was pretty solid, but as
  | he went on you could tell that he was making it up as he went.
  | By the end he was struggling to pull it out of a spiral and it
  | sounded awkward.
  |
  | This led to a conversation with my friend about how it was a
  | recording, and I never needed to hear this cut of the message.
  | You could do it over and over until you got it right and I
  | would be none the wiser. But somehow when we record things we
  | feel like it's "out there" and we can't take it back.
  |
  | Or, we make the reverse mistake and do things "live", resulting
  | in tremendous amounts of resources being spent to redo work
  | that could have been one and done, or really only changes
  | infrequently. In the analog world, or with software.
  |
  | In the middle on the software side are tools in the vein of
  | continuous improvement. There's that service that will file PRs
  | to fix your dependencies. There should be linters and test
  | fuzzers that do the same in the easy cases.
  |
  | We have tools to scan the assets we already have and try to
  | precompress them better. New ones like this one arrive from
  | time to time. But doing them prior to publication introduces
  | friction and people push back. And once it ships to our servers
  | we erroneously believe it's too late to change them and I don't
  | know why.
  |
  | Are we stuck in the old headspace of shrinkwrapped software
  | that you can't change without enormous difficulty? Or is
  | something else going on?
  | sischoel wrote:
  | If someone is wondering where the name comes from:
  | https://www.saveur.com/article/Recipes/Basler-Brunsli-Chocol...
  |
  | Not sure if it is the best recipe - the ones I use are usually
  | written in German.
  | waffle_ss wrote:
  | Brotli, another compression algorithm created by Google, is also
  | named after a Swiss baked good:
  | https://en.wikipedia.org/wiki/Spanisch_Br%C3%B6tli
  | pw6hv wrote:
  | Zopfli also comes from a Swiss bakery product. Developers
  | must like carbohydrates :)
  | spider-mario wrote:
  | And let's not forget Gipfeli and Knusperli ;)
  |
  | https://github.com/google/gipfeli
  |
  | https://github.com/google/knusperli
  | camillovisini wrote:
  | How about grittibanzli?
  |
  | https://github.com/google/grittibanzli
  | wiradikusuma wrote:
  | Could someone explain what a "repacker" is?
  |
  | What I know: In Photoshop, when I save an image as JPEG, I can
  | decide the "quality" (Low, Medium, High, etc). The lower it is,
  | the smaller the file size but the image will have (more)
  | artifacts. The resulting image can then be opened in any image
  | viewer including browsers.
  |
  | Also, I was told to save the "master" copy in a lossless format
  | (e.g. TIFF or PNG) because JPEG is a lossy format (like MP3 to
  | WAV).
  |
  | So how does a "repacker" come into play?
  | detaro wrote:
  | > _Brunsli allows for a 22% decrease in file size while
  | allowing the original JPEG to be recovered byte-by-byte._
  |
  | It's a lossless file compression application that's specialized
  | in compressing a specific file format, so it can beat generic
  | compression tools like gzip.
  | mkl wrote:
  | JPEG compression works in two phases:
  |
  | 1. Discrete Cosine Transform, discarding insignificant
  | information.
  |
  | 2. Compression of bitstream.
  |
  | Step 1 is lossy in practice because it throws away things that
  | seem insignificant. The "quality" control determines what
  | counts as insignificant. Step 2 is lossless, and just tries to
  | make the data coming out of Step 1 take up as little space as
  | possible.
  |
  | A repacker redoes Step 2 better: it makes the file smaller
  | without reducing the quality, by changing how the data is
  | compressed, not changing which parts are kept.
  | oefrha wrote:
  | Huh, I always just used good ol' jpegtran:
  |
  |     jpegtran -copy none -optimize -progressive -outfile "$image" "$image"
  |
  | I have a wrapper script around this to allow bulk optimization as
  | well as calculating stats.
  |
  | Time to switch I guess.
  | marton78 wrote:
  | You should use mozjpeg instead. It's a jpegtran fork and drop-in
  | replacement which optimizes the Huffman table even better.
  | oefrha wrote:
  | Actually, I've been using mozjpeg's version of jpegtran
  | instead of libjpeg's for god knows how long.
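To tie the thread together, here is a rough way to compare these lossless options
on a file of your own. This is only a sketch: it assumes jpegtran (libjpeg's or
mozjpeg's) and the cbrunsli tool from google/brunsli are on PATH, it reuses the
jpegtran flags quoted above, and the cbrunsli invocation is an assumption, so
check its README for the exact usage.

    IMG=photo.jpg

    # Huffman-table optimization only: the output is still an ordinary JPEG
    jpegtran -copy none -optimize -progressive -outfile optimized.jpg "$IMG"

    # Brunsli repacking: a different (JPEG XL transport) format, reversible to JPEG
    cbrunsli "$IMG" repacked.br

    # compare the resulting sizes
    wc -c "$IMG" optimized.jpg repacked.br

Note that -copy none also strips metadata, so unlike the Brunsli step the jpegtran
step is not reversible in a byte-identical sense.
___________________________________________________________________
(page generated 2020-03-01 23:00 UTC)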