[HN Gopher] Brunsli: Practical JPEG repacker (now part of JPEG XL)
       ___________________________________________________________________
        
       Brunsli: Practical JPEG repacker (now part of JPEG XL)
        
       Author : networked
       Score  : 192 points
       Date   : 2020-03-01 13:30 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | crazygringo wrote:
       | Sorry, can someone here clarify --
       | 
       | Does this reduce the size of JPEG files, maintaining JPEG format?
       | 
       | Or is this about repacking JPEG files into JPEG XL files (while
       | reducing size), while maintaining the ability to losslessly
       | revert to JPEG?
       | 
       | The page never explicitly states what format the "repacking" is
        | _into_, and it has so many references to JPEG XL that it no
       | longer seems obvious that it's into just JPEG?
        
         | donatj wrote:
         | My understanding is it converts normal JPEGs to incompatible
         | JPEG XL files, losslessly.
         | 
         | The README does a poor job explaining this.
        
         | rndgermandude wrote:
         | It does not maintain the JPEG format, tho it will be part of
         | the JPEG XL format.
         | 
          | It is repacking/recoding jpegs in a lossless and reversible
          | manner. Clients supporting brunsli can be served the optimized
          | version directly (their apache and nginx modules seem to serve
          | files with an image/x-j mime type), while clients without
          | support can be served the original jpeg file (or a brunsli file
          | decoded via a wasm-brunsli->native jpeg decoder pipeline, if
          | wasm is an option), and you only have to keep the brunsli (or
          | original jpeg) file around.
         | 
          | Since JPEG XL isn't finished yet, there still might be minor
          | differences in the end that make the current brunsli format
          | incompatible with the final JPEG XL format, so I wouldn't mass-
          | convert your files just yet.
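          | 
          | For reference, the round trip with the cbrunsli/dbrunsli
          | command-line tools built from this repo looks roughly like the
          | sketch below (filenames are placeholders; check the tools'
          | usage for the exact argument order):
          | 
          |   cbrunsli photo.jpg photo.br      # repack JPEG -> brunsli
          |   dbrunsli photo.br restored.jpg   # unpack brunsli -> JPEG
          |   cmp photo.jpg restored.jpg       # byte-identical, so only
          |                                    # photo.br needs storing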
        
       | taf2 wrote:
       | I like the check list for features... can't wait for
       | 
       | "Nginx transcoding module"
        
         | [deleted]
        
       | jncraton wrote:
       | Does anyone know how this compares to other projects such as
       | Lepton?
       | 
       | https://github.com/dropbox/lepton
       | 
       | The goals and results appear similar. Is the primary difference
       | that brunsli will likely actually be part of a standard (JPEG
       | XL)?
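        | 
        | If you want to compare them yourself, both ship simple CLIs; a
        | rough side-by-side (assuming the lepton binary from that repo
        | and cbrunsli from Brunsli, with placeholder filenames) is just:
        | 
        |   lepton photo.jpg photo.lep            # Lepton
        |   cbrunsli photo.jpg photo.br           # Brunsli
        |   wc -c photo.jpg photo.lep photo.br    # compare sizes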
        
         | gbanfalvi wrote:
         | And how does it compare against HEIF?
        
           | 0-_-0 wrote:
           | HEIF can't compress JPEG images losslessly
        
             | rndgermandude wrote:
              | Indeed, HEIF/HEIC is basically a slightly dumbed-down HEVC
              | (h.265) i-frame (a full frame) codec (HEIC)[1] plus a new
              | container format (HEIF)[2], similar to WEBP being VP8
              | i-frames in a RIFF container. So in practice they are used
              | as full-blown codecs, usually not in a lossless mode, so
              | shifting JPEG to HEIC or WEBP will lose some quality.
             | 
              | [1] Decoding HEIC in Windows (10) requires you to have
              | installed a compatible HEVC decoder, which costs 99 cents
              | (plus the hassle of setting up a store account and payment
              | processing with MS), or an alternative free one which will
              | use the HEVC codec shipped with hardware such as newer
              | Intel chips (QSV) or GPUs. Thank you, patent mess!
             | 
              | [2] HEIF, the container format, can contain JPEG data, but
              | in practice it does not, or only as supplementary images
              | (previews, pre-renders, thumbnails, etc.)
        
             | [deleted]
        
           | ksec wrote:
            | Below 0.5 bpp, both HEIF/BPG [1] and AVIF perform quite a
            | bit better than JPEG XL; XL shines around 0.8 bpp. At least
            | in initial testing.
           | 
           | [1] https://bellard.org/bpg/
        
             | ajnin wrote:
              | Do you have a link for JPEG XL comparisons? The
             | comparisons on that page are with JPEG XR, which is a
             | different thing.
        
           | duskwuff wrote:
           | Not comparable. Brunsli and Lepton are lossless compressors
           | for JPEG files; HEIF is a completely different lossy image
           | encoder.
           | 
           | To compare the size of a Brunsli/Lepton encoded JPEG file
           | with an HEIF image, you'd need to define some sort of quality
           | equivalence between the two, which gets complicated fast.
        
         | todotask wrote:
          | Have you tried testing it on your own machine? You'll see
          | varying results with lossless/lossy image optimisers.
          | 
          | I got <22% with Brunsli.
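          | 
          | (A quick single-file check, assuming cbrunsli is on your PATH
          | and photo.jpg is a placeholder:)
          | 
          |   orig=$(wc -c < photo.jpg)
          |   cbrunsli photo.jpg photo.br
          |   new=$(wc -c < photo.br)
          |   echo "scale=1; 100 * ($orig - $new) / $orig" | bc  # % saved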
        
         | llarsson wrote:
         | Their respective READMEs both claim a 22% size reduction, which
         | sounds like an interesting coincidence. Have they identified a
         | similar inefficiency in the format itself?
        
           | ksec wrote:
           | JPEG's entropy encoding is ancient. Adding modern arithmetic
           | coding can save significant bits without changing the actual
           | visual data.
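            | 
            | You can get a taste of that headroom with stock jpegtran,
            | which can re-encode the same coefficients with arithmetic
            | coding (if your libjpeg build has arithmetic encoding
            | enabled):
            | 
            |   jpegtran -arithmetic -copy all \
            |     -outfile photo_arith.jpg photo.jpg
            |   wc -c photo.jpg photo_arith.jpg  # same pixels, fewer bytes
            | 
            | Few viewers decode arithmetic-coded JPEGs, though, which is
            | part of why this never caught on.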
        
             | robryk wrote:
             | Another inefficiency of JPEG is that each block (8x8 pixels
             | in size) is compressed independently[^]. This means that
             | the correlations between pixels that are adjacent across a
             | block boundary are not used. If I were to take a JPEG image
             | and randomly flip its blocks (mirror them along x and/or y
             | axis), the resulting JPEG would have a very similar
             | filesize, even though it's a much less likely image.
             | 
             | Brunsli and, IIUC, Lepton, make use of these correlations.
             | 
             | [^] the average color of blocks is not compressed strictly
             | independently, but the space used on those is small
             | compared to all the rest of the information about a block
             | 
             | Disclaimer: I've worked on Brunsli.
        
               | mochomocha wrote:
               | Very interesting. This independence across blocks can
               | presumably be leveraged at decode time for faster
               | decoding though. Surely there must be decoders out there
                | parallelizing over the blocks on multi-core archs / GPUs?
               | 
               | Do you know how Brunsli & Lepton fare when it comes to
               | parallelizability?
        
               | robryk wrote:
               | I assume that you mean parallelizability of decoding and
               | not of encoding.
               | 
                | JPEG's decoding is poorly parallelizable: the entropy
                | decoding is necessarily serial; only the inverse DCTs can
                | be parallelized.
               | 
                | Sharing the data about boundaries need not hamper
                | parallelizability in its most simple meaning: imagine a
                | format where we first encode some data for each boundary,
                | and then we encode all the blocks, each of which can only
                | be decoded when provided the data for all four of its
                | boundaries.
               | 
               | However, what often matters is the total number of non-
               | cacheable memory roundtrips that each pixel/block of the
               | image has to take part in: a large cost during decoding
               | is memory access time. If we assume that a single row of
               | blocks across the whole image is larger than the cache,
               | then any approaches similar to the one I described in the
               | previous paragraph add one roundtrip.
               | 
               | Another consideration is that a single block is often too
               | small to be a unit of parallelization -- parallelizing
               | entropy decoding usually has additional costs in filesize
               | (e.g. for indices), and any parallelization has startup
               | costs for each task.
               | 
               | The end-result is that a reasonably useful approach to
               | parallelization is to split the image into "large blocks"
               | that are large enough to be units of parallelization on
               | their own, and encode _those_ independently.
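                | 
                | Baseline JPEG already has a crude version of this:
                | restart markers reset the entropy coder so that segments
                | can be decoded independently, at a small filesize cost.
                | A sketch with jpegtran (the interval is arbitrary):
                | 
                |   jpegtran -restart 1 -copy all \
                |     -outfile restart.jpg photo.jpg
                |   wc -c photo.jpg restart.jpg  # restart.jpg is slightly
                |                                # larger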
        
               | londons_explore wrote:
               | That might prove to be a good measure of image
               | compression 'efficiency'.
               | 
               | Present to a user two images, one an image compressed by
               | image compressor X, and one compressed by the same image
               | compressor with a single bit of output flipped.
               | 
               | In an ideal image compression scenario, the decompressed
               | images would not be the same, but a user could not tell
               | which was the 'correct' image, since both would look
               | equally realistic.
        
               | robryk wrote:
               | If a scheme had something like that property and
               | satisfied some simpler conditions, I would wager that it
               | necessarily is a good compression scheme. However, this
               | is very much not required of a good compression scheme:
               | 
               | Imagine that a compression scheme used the first bit to
               | indicate if the encoded image is an image of a cat or
               | not. Changing that bit would then have very obvious and
               | significant implications on the encoded image.
               | 
               | If that example seems too unrealistic, imagine a
               | modification of a compression scheme that, before
               | decoding, xors every non-first bit with the first bit.
               | Then flipping the first bit in the modified scheme is
               | equivalent to flipping a lot of bits in the unmodified
               | scheme, but they are equivalently good at encoding
               | images.
               | 
               | Edit: To put it short, the important property is that
               | equally-long encoded images are "equally plausible": it's
               | not important how many bits differ between them, and it
               | doesn't matter if they are similar to each other.
        
               | Dylan16807 wrote:
               | In the thought experiment, I don't think the user is told
               | beforehand what the image is.
               | 
               | So you flip the cat bit and get an image of a helicopter,
               | and they still can't tell which one is 'correct'.
        
               | robryk wrote:
               | Ah, thank you. I misread the GP. It seems that he is
               | saying nearly[^] exactly what I wanted to say in the
               | edit.
               | 
               | [^] the property should hold not only for single-bit
               | changes, but all length-preserving changes -- it's
               | perfectly fine for all single bitflips to e.g. result in
               | invalid codestreams.
        
         | tpetry wrote:
          | The files created by lepton can't be displayed by any client.
          | Brunsli is a converter for JPEG <-> JPEG XR which is lossless,
          | and the improved jpeg xr algorithms decrease the filesize.
          | 
          | The interesting part is that you can therefore convert between
          | these formats thousands of times without visual regressions.
          | You could (in a few years) store only the jpeg xr file and have
          | your webserver transcode it to jpeg for legacy browsers.
        
           | thedance wrote:
           | They may still be comparable. If the point of Lepton is to
           | save space, and if a round-trip through Brunsli costs less
           | than Lepton encoding while saving similar amounts of space,
           | then it could be a design alternative.
        
           | ksec wrote:
            | >Brunsli is a converter for JPEG <-> JPEG XR which is
            | lossless...
           | 
           | You mean JPEG XL?
           | 
           | JPEG XR is a completely different thing [1]
           | 
           | [1] https://en.wikipedia.org/wiki/JPEG_XR
        
             | DagAgren wrote:
             | Extremely good branding work there, JPEG.
        
               | zuminator wrote:
               | Don't forget JPEG XS and JPEG XT.
               | 
               | https://jpeg.org/jpeg/index.html
        
       | wyoh wrote:
       | > Brunsli has been specified as the lossless JPEG transport layer
       | in the Committee Draft of JPEG XL Image Coding System and is
       | ready to power faster and more economical transfer and storage of
       | photographs.
       | 
       | I thought the lossless part of JPEG XL was done by FUIF, am I
       | misunderstanding something?
        
         | janwas wrote:
          | Lossless transcoding of JPEG bitstreams is done by Brunsli;
          | there is also lossless storage of pixels, based on tech from
          | FUIF plus an adaptive predictor by Alex Rhatushnyak.
        
       | zxcvbn4038 wrote:
        | I understand what you're saying, but when you're storing millions
        | of images and transferring them frequently, those small
        | reductions in size are very significant when you get your storage
        | and/or transit bill at the end of the month. A lot of the fluff
        | and filler you mentioned is compressed and cached, so you're not
        | bringing it down on every request, and not in raw form. Even if
        | you are sending a huge amount of uncacheable stuff with each
        | request, it doesn't mean savings wouldn't be appreciated. It's
        | funds for a team lunch if nothing else!
        
       | rndgermandude wrote:
        | Preemptive disclaimer: I don't want to belittle the work the
        | authors did here in any way, and I'm actually excited, especially
        | about the reversible, lossless jpeg<->brunsli coding, and because
        | google's buy-in and network effects will most likely mean this
        | comes to viewer/editing software near you in the not-so-distant
        | future (unlike lepton, which never got out of its tiny niche).
       | 
        | But seeing the 22% improvement figure reminded me that the
        | typical JPEG file on the internet is rather unoptimized, even on
        | write-once-read-many services like imgur or i.reddit.com which
        | transform files (stripping metadata etc) and do not preserve the
        | original files. Just using the regular vanilla libjpeg encoder
        | you can usually save 5%-10% by losslessly recoding the coeffs,
        | and the better-yet-more-computationally-intense mozjpeg coder can
        | get you a bit further than that.
       | 
        | Then again, the imgur single-image view page (desktop) I just
        | opened by randomly clicking an item on their homepage transfers
        | 2.9MiB of data with ads blocked (3.9MiB after decompression),
        | of which 385KiB was the actual image. That image can be
        | losslessly recoded by mozjpeg to 361KiB (a 24KiB difference, a
        | 6.2% reduction), so the 24KiB (0.8%) reduction out of 2.9MiB of
        | cruft hardly matters to them, I suppose, and skipping it may be
        | cheaper for them in bandwidth and storage than the compute cost
        | (and the humans writing and maintaining the code).
        | 
        | Using brunsli, that same 385KiB imgur file went down to 307KiB,
        | roughly a 20% reduction, but still only a 2.6% reduction of the
        | massive 2.9MiB the imgur site transferred in total.
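        | 
        | (For anyone who wants to reproduce this kind of measurement,
        | something like the following works, assuming mozjpeg's jpegtran
        | and cbrunsli are on PATH and image.jpg stands in for the
        | downloaded file:)
        | 
        |   jpegtran -copy all -optimize -progressive \
        |     -outfile moz.jpg image.jpg       # lossless mozjpeg recode
        |   time cbrunsli image.jpg image.br   # brunsli repack
        |   wc -c image.jpg moz.jpg image.br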
        
         | zxcvbn4038 wrote:
          | I get what you're saying, but when you're storing millions of
          | images and transferring them frequently, those small reductions
          | in size are very significant when you get your storage and/or
          | transit bill. A lot of the fluff and filler you mentioned is
          | cached, so you're not bringing it down on every request. Even
          | if it wasn't you can assume it's deemed becc
        
           | rndgermandude wrote:
            | You'd think so, but imgur for example is literally in the
            | business of serving images to a lot of users (and some
            | videos, too), and yet it seems they did not even implement
            | vanilla libjpeg's optimized coding. And I am not just picking
            | on imgur, they are just today's example; other similar
            | services didn't do much better last I checked, neither in
            | serving optimized JPEGs nor in overall page sizes.
        
           | IshKebab wrote:
           | I think that's his point - you can _already_ reduce JPEG size
           | a bit by using fancier libraries, but even huge sites like
            | Imgur apparently don't bother.
        
         | Santosh83 wrote:
          | Yes, not so useful for small images, but when users view
          | multi-megabyte images I guess the savings will start to be
          | significant even with the page overhead. Say a 4MB image can
          | be optimised to 2.5MB; then the total download comes down from
          | 6.9MB to 5.4MB, which is not trivial, especially when the same
          | page has multiple images (some people upload a gallery of
          | images on sites like imgur) or when viewing lots of image URLs.
          | Especially on a limited data plan these small savings will
          | start adding up.
        
           | rndgermandude wrote:
           | Yes, it would be a win for the users, especially on limited
           | data plans. Though I wouldn't hold my breath for most sites
           | to actually implement it; most sites do not mind serving
            | megabytes of data for a single page view, including megabytes
            | of scripts, even to mobile users.
           | 
           | 4MiB to 2.5MiB is out of range for brunsli, more like 3.2MiB
           | if you're lucky.
           | 
           | I also tried with a 54 MiB JPEG[1] just now. The brunsli
           | coding is 49MiB so not even a 10% reduction on this
           | particular file. And it took a wallclock time of 16 seconds
           | on my last gen Intel. Decoding it back to JPEG took 11
           | seconds on the same box.
           | 
            | [1] A picture of a wedding cake, taken with a NIKON D810, one
            | of the largest JPEGs I had available. It was "exported" by
            | Lightroom 9, it seems, from a NEF/RAW source, and is full of
            | metadata too, around 100KiB of it.
        
         | hinkley wrote:
         | I tried to order some takeout the other day and the place was
         | busy so I got voicemail. The beginning was pretty solid, but as
         | he went on you could tell that he was making it up as he went.
         | By the end he was struggling to pull it out of a spiral and it
         | sounded awkward.
         | 
          | This led to a conversation with my friend about how it was a
         | recording, and I never needed to hear this cut of the message.
         | You could do it over and over until you got it right and I
         | would be none the wiser. But somehow when we record things we
         | feel like it's "out there" and we can't take it back.
         | 
         | Or, we make the reverse mistake and do things "live", resulting
         | in tremendous amounts of resources being spent to redo work
         | that could have been one and done, or really only changes
         | infrequently. In the analog world, or with software.
         | 
         | In the middle on the software side are tools in the vein of
         | continuous improvement. There's that service that will file PRs
          | to fix your dependencies. There should be linters and test
         | fuzzers that do the same in the easy cases.
         | 
         | We have tools to scan the assets we already have and try to
         | precompress them better. New ones like this one arrive from
         | time to time. But doing them prior to publication introduces
         | friction and people push back. And once it ships to our servers
         | we erroneously believe it's too late to change them and I don't
         | know why.
         | 
         | Are we stuck in the old headspace of shrinkwrapped software
         | that you can't change without enormous difficulty? Or is
         | something else going on?
        
       | sischoel wrote:
       | If someone is wondering where the name comes from:
       | https://www.saveur.com/article/Recipes/Basler-Brunsli-Chocol...
       | 
       | Not sure if it is the best recipe - the ones I use are usually
       | written in German.
        
         | waffle_ss wrote:
            | Brotli, another compression algorithm created by Google, is
            | also named after a Swiss baked good:
         | https://en.wikipedia.org/wiki/Spanisch_Br%C3%B6tli
        
           | pw6hv wrote:
              | Zopfli is also named after a Swiss bakery product. Developers
           | must like carbohydrates :)
        
             | spider-mario wrote:
             | And let's not forget Gipfeli and Knusperli ;)
             | 
             | https://github.com/google/gipfeli
             | 
             | https://github.com/google/knusperli
        
               | camillovisini wrote:
               | How about grittibanzli?
               | 
               | https://github.com/google/grittibanzli
        
       | wiradikusuma wrote:
       | Could someone explain what a "repacker" is?
       | 
       | What I know: In Photoshop, when I save an image as JPEG, I can
       | decide the "quality" (Low, Medium, High, etc). The lower it is,
       | the smaller the file size but the image will have (more)
       | artifacts. The resulting image can then be opened in any image
       | viewer including browsers.
       | 
        | Also, I was told to save the "master" copy in a lossless format
        | (e.g. TIFF or PNG) because JPEG is a lossy format (like MP3 vs.
        | WAV).
        | 
        | So how does a "repacker" come into play?
        
         | detaro wrote:
         | > _Brunsli allows for a 22% decrease in file size while
         | allowing the original JPEG to be recovered byte-by-byte._
         | 
          | It's a lossless file compression application that's specialized
          | in compressing a specific file format, so it can beat generic
          | compression tools like gzip.
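          | 
          | A quick way to see the difference (photo.jpg is a placeholder;
          | cbrunsli is the encoder built from the Brunsli repo):
          | 
          |   gzip -9 -c photo.jpg > photo.jpg.gz   # generic: ~0-1%
          |   cbrunsli photo.jpg photo.br           # format-aware repack
          |   wc -c photo.jpg photo.jpg.gz photo.br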
        
         | mkl wrote:
          | JPEG compression works in two phases:
          | 
          | 1. Discrete Cosine Transform plus quantization, discarding
          | insignificant information.
          | 
          | 2. Entropy coding (compression) of the resulting bitstream.
          | 
          | Step 1 is lossy in practice because the quantization throws
          | away detail that seems insignificant. The "quality" control
          | determines what counts as insignificant. Step 2 is lossless,
          | and just tries to make the data coming out of Step 1 take up as
          | little space as possible.
          | 
          | A repacker redoes Step 2 better: it makes the file smaller
          | without reducing the quality, by changing how the data is
          | compressed, not changing which parts are kept.
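          | 
          | You can watch this with the standard libjpeg tools: repack a
          | file, then check that both files decode to exactly the same
          | pixels (a sketch; filenames are placeholders):
          | 
          |   jpegtran -copy all -optimize -outfile repacked.jpg orig.jpg
          |   djpeg -pnm orig.jpg | md5sum      # decode to raw pixels...
          |   djpeg -pnm repacked.jpg | md5sum  # ...same checksum,
          |                                     # smaller file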
        
       | oefrha wrote:
        | Huh, I always just used good ol' jpegtran:
        | 
        |   jpegtran -copy none -optimize -progressive \
        |     -outfile "$image" "$image"
       | 
       | I have a wrapper script around this to allow bulk optimization as
       | well as calculating stats.
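        | 
        | For the curious, a rough equivalent of that wrapper (a sketch,
        | not the exact script) is only a handful of lines:
        | 
        |   #!/bin/sh
        |   # bulk-optimize JPEGs in place and report total bytes saved
        |   total_before=0; total_after=0
        |   for f in "$@"; do
        |     before=$(wc -c < "$f")
        |     jpegtran -copy none -optimize -progressive \
        |       -outfile "$f.tmp" "$f" && mv "$f.tmp" "$f"
        |     after=$(wc -c < "$f")
        |     total_before=$((total_before + before))
        |     total_after=$((total_after + after))
        |   done
        |   echo "saved $((total_before - total_after)) bytes"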
       | 
       | Time to switch I guess.
        
         | marton78 wrote:
          | You should use mozjpeg instead. It's a jpegtran fork and drop-
          | in replacement which optimizes the Huffman tables even better.
        
           | oefrha wrote:
           | Actually, I've been using mozjpeg's version of jpegtran
           | instead of libjpeg's for god knows how long.
        
       ___________________________________________________________________
       (page generated 2020-03-01 23:00 UTC)