[HN Gopher] Decoding AVIF: Deep dive with cats and imgproxy
       ___________________________________________________________________
        
       Decoding AVIF: Deep dive with cats and imgproxy
        
       Author : progapandist
       Score  : 33 points
       Date   : 2021-08-15 11:27 UTC (1 days ago)
        
 (HTM) web link (evilmartians.com)
 (TXT) w3m dump (evilmartians.com)
        
       | SilverRed wrote:
       | I had a really good go at reading this and trying to understand
       | it but I feel I don't understand a whole lot more about image
       | decoding/encoding than before I started. Like I get the core
       | concepts of key frames, motion vectors and such on a high level
       | but if you asked me to actually create a decoder I wouldn't have
       | a clue where to start.
       | 
       | I feel like I would need a full hour long video on each paragraph
       | of this post to really understand it.
        
       | dylan604 wrote:
       | "Commonly, three numbers are used to specify downsampling:
       | the first is always 4, don't ask me why"
       | 
       | Wow. Someone going to this much detail on explaining how a
       | video/image codec works, and cannot bother learning what the
       | numbers of chroma subsampling mean?
       | 
       | The first number represents the luminance.[0] Even if they know
       | the first number represents luminance, the "don't ask me why" is
       | just horrible on its own. The detail in the image is preserved
       | through the luminance channel. The subsampling in the chroma is
       | much less perceptible to humans, but much more noticeable in the
       | luminance. Therefore, some very smart people learned to cheat the
       | data saved for chroma, but not the luminance. "don't ask me why"
       | in detailed write ups is just bad in so many ways.
       | 
       | [0]https://en.wikipedia.org/wiki/Chroma_subsampling
        
         | jaffathecake wrote:
         | Not sure that explains why the first number has to be 4, which
         | was their point.
        
           | dylan604 wrote:
           | Then they did not look/research very hard. See my response
           | above. I provide a link to someone else's blog that was
           | easily found with a DDG search.
        
             | [deleted]
        
         | vlovich123 wrote:
         | I don't think you're being generous with the author's
         | statement, especially since this is in the section within which
         | he's describing chroma subsampling. The author is stating "We
         | use 4 as a convention. Why is that the convention? No one
         | really knows". That seems accurate to me. Do you have a clearer
         | answer? Your Wikipedia link doesn't provide any enlightenment
         | AFAICT, although maybe I missed the explanation?
        
           | dylan604 wrote:
           | Just before this section the author discusses how the image
           | is broken down into blocks. This section is where the
           | definition of the 4s could have come from, but they left out,
           | for brevity's sake I'm assuming, how those blocks are shaped.
           | 
           | "Now, let's break it down the differences between 4:4:4;
           | 4:2:2 and 4:2:0:
           | 
           | The number of pixels that share color is determined by what
           | type of chroma subsampling it is. Each sample is defined by a
           | block of 8 pixels. The first number refers to the size of the
           | sample and its pattern, which is typically 4 pixels wide. The
           | second number refers to how many pixels in the top row will
           | receive color or chroma sampling. The third number shows how
           | many pixels on the bottom row will receive chroma samples"[0]
           | 
           | The block sizes and sub-sampling methods are also why there
           | are warnings issued when trying to scale an image when the
           | dimensions are not divisible by the block sizes. If you try
           | to scale to an odd number, then the sampling within the
           | blocks is broken. If you scale to a number not divisible
           | evenly by the largest block sizes requested, then you also
           | get issues.
           | 
           | [0] https://blog.westpennwire.com/what-is-chroma-subsampling
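The J:a:b reading quoted above can be sketched in Python. This is a minimal
illustration, not how any particular codec does it: the function names are
hypothetical, and it assumes the common interpretation where J is the
reference width, a is the number of chroma samples in the top row, and b is
either 0 (the bottom row reuses the top row's chroma) or equal to a.

```python
def chroma_factors(notation):
    """Return (horizontal, vertical) chroma subsampling factors for "J:a:b"."""
    j, a, b = (int(part) for part in notation.split(":"))
    if a <= 0 or j % a != 0:
        raise ValueError(f"unsupported notation: {notation}")
    if b not in (0, a):
        raise ValueError(f"unsupported notation: {notation}")
    horizontal = j // a          # luma columns covered by one chroma sample
    vertical = 2 if b == 0 else 1  # b == 0: both rows share one set of chroma
    return horizontal, vertical


def chroma_plane_size(width, height, notation):
    """Size of one chroma plane, rounding up when dimensions do not divide."""
    h, v = chroma_factors(notation)
    return -(-width // h), -(-height // v)  # ceiling division


def needs_padding(width, height, notation):
    """True when the image size is not a multiple of the subsampling factors."""
    h, v = chroma_factors(notation)
    return width % h != 0 or height % v != 0


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, chroma_factors(scheme), chroma_plane_size(1920, 1080, scheme))
```

The `needs_padding` check corresponds to the odd-dimension case mentioned
above: with 4:2:0, an image 1921 pixels wide leaves a partial block at the
edge, so the chroma plane has to be rounded up.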
        
         | LeoPanthera wrote:
         | You have not explained why the first number is always 4. (In
         | fact, it's not always 4, it just usually is.)
        
         | HALtheWise wrote:
         | One thing I never understood is why _downsampling_ is the most
         | efficient way to compress the data about chroma into fewer bits
         | while maximizing perceptual accuracy. It really seems like for
         | any given target bitrate for the chroma data, there should
         | always be a more efficient compression scheme available than
         | simply throwing out 3/4 of the pixels and running compression
         | algorithms on the rest. Surely modern compression can do better
         | with a continuous low pass filter or an adaptive compression
         | scheme that focuses data on interesting edges or something?
         | Maybe someone here can better explain the intuition for this.
         | I'm similarly curious for resolution in general (i.e. why does
         | 480p upsampled ever look better than 1080p at the same bitrate)
         | but chroma seems like a good place to start.
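For reference, here is a rough sketch of what "throwing out 3/4 of the
pixels" can look like for 4:2:0. It assumes the simplest possible filter,
averaging each 2x2 block of a chroma plane; real encoders may use other
filters and sample positions, and `downsample_420` is a hypothetical name.

```python
def downsample_420(plane):
    """Average each 2x2 block of a chroma plane given as a list of rows."""
    height = len(plane)
    width = len(plane[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Collect the block, clipping at the right and bottom edges.
            block = [
                plane[yy][xx]
                for yy in range(y, min(y + 2, height))
                for xx in range(x, min(x + 2, width))
            ]
            # Integer average, rounding half up.
            row.append((sum(block) + len(block) // 2) // len(block))
        out.append(row)
    return out


cb = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
small = downsample_420(cb)
print(small)  # 16 samples in, 4 samples out
```

Even this crude version is a low pass filter of sorts, since each stored
value is an average rather than a single surviving pixel.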
        
           | Scaevolus wrote:
           | JPEG XL doesn't perform chroma subsampling in its native
           | color space of XYB.
           | https://cloudinary.com/blog/how_jpeg_xl_compares_to_other_im...
        
           | dylan604 wrote:
           | >Surely modern compression can do better
           | 
           | I "surely" look forward to your Show HN write up on your new
           | compression algorithm. We've been iteratively getting better
           | at compression for some time now. It seems like every time it
           | looks like we've wrung every bit out of DCT, someone comes up
           | with something a little more clever. Wavelets looked promising,
           | but never took off.
           | 
           | >why does 480p upsampled ever look better than 1080p at the
           | same bitrate
           | 
           | That's a very vague question. Are you stating that you think
           | 480p upsampled to 1080p at 1.5Mbps looks better than a source
           | at 1080p at 1.5Mbps? I have a hard time believing this to be
           | true.
           | 
           | Why the chroma is sub-sampled and not the luminance has to do
           | with how the cones/rods in the eyes work. There are a lot of
           | things you can get away with when tricking the brain about
           | what it is seeing. Is it better to lose half the height or
           | half the width? Is it better to lose more red than green or
           | blue?
        
         | cycomanic wrote:
         | Just after that he says: "4:2:0 is the most popular case. Four
         | luma samples per one chroma", so he does understand and write
         | what it means. That does not explain why the value is 4.
         | 
         | I assume it is because with 4 you have 3 different subsampling
         | ratios (if you want to keep factors of two, which you typically
         | do, to keep algorithms simple).
        
         | keithwinstein wrote:
         | Poynton has a pretty plausible-sounding explanation here
         | (https://poynton.ca/PDFs/Chroma_subsampling_notation.pdf):
         | 
         | "The commonly used leading digit of 4 is a historical reference
         | to a sample rate roughly four times the NTSC or PAL color
         | subcarrier frequency; the notation originated when subcarrier-
         | locked sampling was under discussion for component video. Upon
         | the adoption of component video sampling at 13.5 MHz, the first
         | digit came to specify luma sample rate relative to 3 3/8 MHz.
         | HDTV was once supposed to be described as 22:11:11! Since then,
         | the leading digit has - thankfully - come to be relative to
         | the sample rate in use. Until recently, the initial digit was
         | always 4, since all chroma ratios have been powers of two - 4,
         | 2, or 1. However, 3:1:1 subsampling has been commercialized in
         | an HDTV production system (Sony's HDCAM), so 3 may now appear
         | as the leading digit. By convention, a leading digit of 2 is
         | never used."
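The arithmetic behind the quoted explanation can be checked in a few lines
of Python. The 13.5 MHz and 3 3/8 MHz figures come from the quote; the
74.25 MHz HDTV luma sampling rate is an assumption added here (it is not
in the quote), and `leading_digit` is a hypothetical helper name.

```python
from fractions import Fraction

# 3 3/8 MHz, the reference rate named in the quote.
BASE_MHZ = Fraction(27, 8)


def leading_digit(luma_rate_mhz):
    """Luma sample rate expressed as a multiple of 3 3/8 MHz."""
    return Fraction(luma_rate_mhz) / BASE_MHZ


sd = leading_digit(Fraction(27, 2))   # 13.5 MHz component video
hd = leading_digit(Fraction(297, 4))  # 74.25 MHz, assumed HDTV luma rate

print(sd, hd)
```

Under that assumption the HDTV ratio works out to 22, which matches the
"22:11:11" notation the quote says was once proposed.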
        
         | zinekeller wrote:
         | Okay, the _real_ reason, as far as the bundles of paper I have*
         | are accurate, is that digital chroma subsampling was first
         | invented for MUSE, a Japanese analogue HD video standard (with
         | pre-broadcast digital components). They chose four for
         | horizontal because it's relatively easy to manipulate using
         | their digital systems at the time and two for vertical so that
         | it's easy to handle interlacing stuff. Unfortunately, I'm not
         | Sony or NHK so I can't say for certain why not eight or any
         | other powers of two. Also, Americans (aka the SMPTE) set the
         | 1,080 lines (the Japanese standard is 1,035), the 16:9
         | compromise (between the European and Japanese 15:9 and cinema
         | 21:9) and the "limited RGB" dilemma that is experienced in
         | digital video systems (that's literally from the days of NTSC
         | signalling!). Both the Japanese NHK/Sony MUSE system and the
         | British IBA (adopted as European) D-MAC system use the full-
         | range 8-bit system that is used for JPEG (pre-broadcast to
         | analogue, of course).
         | 
         | Analogous to this, the reason why CD audio is 44,100 Hz is
         | because that's the commonality between NTSC (System M, 525-line
         | 480-visible 60-Hz) and 625-line (576-visible 50-Hz) systems.
         | Digital audio was literally stored on U-matic systems at the
         | time, and it was initially only 14-bit PCM rather than the
         | 16-bit PCM of CDs.
         | 
         | * or rather, my employer's mini-library.
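The 44,100 Hz figure mentioned above is usually derived as follows. The
specific numbers (3 audio samples per video line, 245 usable lines per
field at 60 fields/s, 294 at 50 fields/s) are not in the comment; they are
the commonly cited values for video-based PCM adaptors and are treated as
assumptions in this sketch.

```python
def pcm_adaptor_rate(fields_per_second, lines_per_field, samples_per_line=3):
    """Audio samples per second when samples are stored on video lines."""
    return fields_per_second * lines_per_field * samples_per_line


# Assumed values: usable lines per field for each video system.
rate_525 = pcm_adaptor_rate(60, 245)  # 525-line, 60 Hz system
rate_625 = pcm_adaptor_rate(50, 294)  # 625-line, 50 Hz system

print(rate_525, rate_625)
```

Both systems land on the same rate, which is the "commonality" the comment
refers to.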
        
       ___________________________________________________________________
       (page generated 2021-08-16 23:00 UTC)