[HN Gopher] Decoding AVIF: Deep dive with cats and imgproxy
___________________________________________________________________

Author : progapandist
Score  : 33 points
Date   : 2021-08-15 11:27 UTC
web link (evilmartians.com)

SilverRed wrote:
| I had a really good go at reading this and trying to understand it, but I feel I don't understand a whole lot more about image decoding/encoding than before I started. Like, I get the core concepts of key frames, motion vectors and such on a high level, but if you asked me to actually create a decoder I wouldn't have a clue where to start.
|
| I feel like I would need a full hour-long video on each paragraph of this post to really understand it.

dylan604 wrote:
| "Commonly, three numbers are used to specify downsampling: the first is always 4, don't ask me why"
|
| Wow. Someone goes into this much detail explaining how a video/image codec works, and can't be bothered to learn what the chroma subsampling numbers mean?
|
| The first number represents the luminance.[0] Even if they know that, the "don't ask me why" is just horrible on its own. The detail in the image is preserved through the luminance channel. Subsampling is much less perceptible to humans in the chroma than in the luminance, so some very smart people learned to cut corners on the data saved for chroma, but not for luminance. "Don't ask me why" in a detailed write-up is just bad in so many ways.
|
| [0] https://en.wikipedia.org/wiki/Chroma_subsampling

jaffathecake wrote:
| Not sure that explains why the first number has to be 4, which was their point.

dylan604 wrote:
| Then they did not look/research very hard. See my response above. I provided a link to someone else's blog that was easily found with a DDG search.
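dylan604's point that luma carries the detail while chroma can be stored at lower resolution can be sketched in a few lines. This is a minimal illustration of 4:2:0, assuming the full-range BT.601 (JPEG/JFIF) RGB-to-YCbCr coefficients and a plain 2x2 box average as the downsampling filter; the function names are hypothetical, and real AV1/AVIF encoders signal their own matrix and range and usually use better filters.

```python
"""Minimal 4:2:0 sketch: full-resolution luma, 2x2-averaged chroma."""


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF/JPEG) RGB -> YCbCr for 8-bit values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Average each 2x2 block of a plane (list of rows); needs even dimensions."""
    h, w = len(plane), len(plane[0])
    if h % 2 or w % 2:
        raise ValueError("this sketch only handles even width and height")
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def encode_420(image):
    """Split rows of (r, g, b) tuples into Y, Cb, Cr planes.

    Y keeps every pixel; Cb and Cr keep one averaged sample per 2x2 block,
    i.e. four luma samples share one sample of each chroma component.
    """
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in image]
    y_plane = [[px[0] for px in row] for row in ycc]
    cb_plane = subsample_2x2([[px[1] for px in row] for row in ycc])
    cr_plane = subsample_2x2([[px[2] for px in row] for row in ycc])
    return y_plane, cb_plane, cr_plane


if __name__ == "__main__":
    # 4x4 image: left half red, right half blue.
    img = [[(255, 0, 0)] * 2 + [(0, 0, 255)] * 2 for _ in range(4)]
    y, cb, cr = encode_420(img)
    stored = sum(map(len, y)) + sum(map(len, cb)) + sum(map(len, cr))
    print(stored, "samples instead of", 3 * 16)  # 24 samples instead of 48
```

Halving chroma in both directions stores half as many samples as 4:4:4 while leaving every luma sample untouched, which is where the detail lives.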
[deleted]

vlovich123 wrote:
| I don't think you're being generous with the author's statement, especially since this is in the section where he's describing chroma subsampling. The author is stating "We use 4 as a convention. Why is that the convention? No one really knows". That seems accurate to me. Do you have a clearer answer? Your Wikipedia link doesn't provide any enlightenment AFAICT, although maybe I missed the explanation?

dylan604 wrote:
| Just before this section the author discusses how the image is broken down into blocks. This section is where the definition of the 4s could have come from, but they left out (for brevity's sake, I assume) how those blocks are shaped.
|
| "Now, let's break it down the differences between 4:4:4; 4:2:2 and 4:2:0:
|
| The number of pixels that share color is determined by what type of chroma subsampling it is. Each sample is defined by a block of 8 pixels. The first number refers to the size of the sample and its pattern, which is typically 4 pixels wide. The second number refers to how many pixels in the top row will receive color or chroma sampling. The third number shows how many pixels on the bottom row will receive chroma samples"[0]
|
| The block sizes and subsampling methods are also why warnings are issued when you try to scale an image to dimensions that are not divisible by the block sizes. If you scale to an odd number, the sampling within the blocks breaks. If you scale to a number not evenly divisible by the largest block size requested, you also get issues.
|
| [0] https://blog.westpennwire.com/what-is-chroma-subsampling

LeoPanthera wrote:
| You have not explained why the first number is always 4. (In fact, it's not always 4, it just usually is.)
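Whatever the history of the leading 4, the ratio itself maps directly onto chroma plane sizes. The sketch below assumes the reading of J:a:b given on the Wikipedia page linked earlier in the thread: a reference block J pixels wide and two rows tall, a = chroma samples in the first row, b = chroma samples that change in the second row. The function names are hypothetical and only the common cases b == 0 or b == a are handled. Rounding partial blocks up is an assumption here (many codecs do something similar); it shows why odd dimensions are awkward, since half of an odd width is not a whole number of chroma samples.

```python
from math import ceil


def chroma_plane_size(width, height, j, a, b):
    """Chroma plane (width, height) for a J:a:b scheme.

    Reading of the notation assumed here: a reference block J pixels wide and
    2 rows tall; `a` chroma samples in the first row; `b` chroma samples that
    change in the second row (0 = reuse the first row, a = full vertical res).
    """
    if a <= 0 or j % a:
        raise ValueError("J must be a positive multiple of a")
    if b not in (0, a):
        raise ValueError("this sketch only handles b == 0 or b == a")
    h_factor = j // a               # horizontal decimation, e.g. 4:2:x -> 2
    v_factor = 2 if b == 0 else 1   # b == 0 means both rows share chroma
    # Round partial blocks up, so an odd size still gets an edge chroma sample.
    return ceil(width / h_factor), ceil(height / v_factor)


def samples_per_block(j, a, b):
    """Y + Cb + Cr samples stored for one J x 2 reference block."""
    return 2 * j + 2 * (a + b)


if __name__ == "__main__":
    for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0), (4, 1, 1)]:
        j, a, b = scheme
        print("%d:%d:%d" % scheme, chroma_plane_size(1920, 1080, j, a, b),
              samples_per_block(j, a, b), "of", samples_per_block(j, j, j))
```

For a 1920x1080 frame this gives chroma planes of 1920x1080, 960x1080, 960x540 and 480x1080, storing 24, 16, 12 and 12 samples per 4x2 block respectively. Note that 4:2:0 and 4:1:1 store the same amount of chroma, just distributed differently.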
HALtheWise wrote:
| One thing I never understood is why _downsampling_ is the most efficient way to compress the chroma data into fewer bits while maximizing perceptual accuracy. It really seems like, for any given target bitrate for the chroma data, there should always be a more efficient compression scheme available than simply throwing out 3/4 of the pixels and running compression algorithms on the rest. Surely modern compression can do better with a continuous low-pass filter or an adaptive compression scheme that focuses data on interesting edges or something? Maybe someone here can better explain the intuition for this. I'm similarly curious about resolution in general (i.e. why does 480p upsampled ever look better than 1080p at the same bitrate?), but chroma seems like a good place to start.

Scaevolus wrote:
| JPEG XL doesn't perform chroma subsampling in its native color space of XYB. https://cloudinary.com/blog/how_jpeg_xl_compares_to_other_im...

dylan604 wrote:
| >Surely modern compression can do better
|
| I "surely" look forward to your Show HN write-up on your new compression algorithm. We've been iteratively getting better at compression for some time now. It seems like every time it looks like we've wrung every bit out of DCT, someone comes up with something a little more clever. Wavelets looked promising, but never took off.
|
| >why does 480p upsampled ever look better than 1080p at the same bitrate
|
| That's a very vague question. Are you saying that you think 480p upsampled to 1080p at 1.5 Mbps looks better than a 1080p source at 1.5 Mbps? I have a hard time believing this to be true.
|
| Why the chroma is subsampled and not the luminance has to do with how the rods and cones in the eye work. There are a lot of things you can get away with (or trick the brain with, if you will) in what it is seeing. Is it better to lose half the height or half the width?
| Is it better to lose more red than green or blue?

cycomanic wrote:
| Just after, he says: "4:2:0 is the most popular case. Four luma samples per one chroma", so he does understand and write what it means. That does not explain why the value is 4.
|
| I assume it is because with 4 you have 3 different subsampling ratios (if you want to stick to factors of two, which you typically do to keep algorithms simple).

keithwinstein wrote:
| Poynton has a pretty plausible-sounding explanation here (https://poynton.ca/PDFs/Chroma_subsampling_notation.pdf):
|
| "The commonly used leading digit of 4 is a historical reference to a sample rate roughly four times the NTSC or PAL color subcarrier frequency; the notation originated when subcarrier-locked sampling was under discussion for component video. Upon the adoption of component video sampling at 13.5 MHz, the first digit came to specify luma sample rate relative to 3 3/8 MHz. HDTV was once supposed to be described as 22:11:11! Since then, the leading digit has - thankfully - come to be relative to the sample rate in use. Until recently, the initial digit was always 4, since all chroma ratios have been powers of two - 4, 2, or 1. However, 3:1:1 subsampling has been commercialized in an HDTV production system (Sony's HDCAM), so 3 may now appear as the leading digit. By convention, a leading digit of 2 is never used."

zinekeller wrote:
| Okay, the _real_ reason, as far as the bundles of paper I have* are accurate, is that digital chroma subsampling was first invented for MUSE, a Japanese analogue HD video standard (with pre-broadcast digital components). They chose four for horizontal because it's relatively easy to manipulate using their digital systems at the time, and two for vertical so that it's easy to handle interlacing. Unfortunately, I'm not Sony or NHK, so I can't say for certain why not eight or any other power of two.
| Also, Americans (aka the SMPTE) set the 1,080 lines (the Japanese standard is 1,025), the 16:9 compromise (between the European and Japanese 15:9 and cinema 21:9) and the "limited RGB" dilemma that is experienced in digital video systems (that's literally from the days of NTSC signalling!). Both the Japanese NHK/Sony MUSE system and the British IBA (adopted as European) D-MAC system use the full-range 8-bit system that is used for JPEG (pre-broadcast to analogue, of course).
|
| Analogously, the reason CD audio is 44,100 Hz is that it is the rate common to both NTSC (System M, 525-line, 480-visible, 60 Hz) and 625-line (576-visible, 50 Hz) systems. Digital audio was literally stored on U-matic systems at the time, and it was initially only 14-bit PCM rather than the 16-bit PCM of CDs.
|
| * or rather, my employer's mini-library.
___________________________________________________________________
(page generated 2021-08-16 23:00 UTC)
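Two numeric claims in the comments above can be checked with a few lines of arithmetic: zinekeller's point that 44,100 Hz fits both 525-line and 625-line video, and Poynton's remark that the leading digit once meant the luma sample rate relative to 3 3/8 MHz. The PCM-adaptor figures (3 samples per line, 245 usable lines per field at a nominal 60 fields/s, 294 at 50 fields/s) and the 74.25 MHz luma clock of 1080-line HDTV are commonly cited values that do not appear in the thread, so treat them as assumptions. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed PCM-adaptor figures (not from the thread): 3 samples per video line,
# 245 usable lines per field at a nominal 60 fields/s (525-line monochrome
# timing) and 294 usable lines per field at 50 fields/s (625-line).
SAMPLES_PER_LINE = 3


def pcm_adaptor_rate(lines_per_field, fields_per_second):
    """Audio samples per second that fit into one second of video lines."""
    return SAMPLES_PER_LINE * lines_per_field * fields_per_second


def leading_digit(luma_rate_mhz):
    """Poynton's reading: luma sample rate expressed in units of 3 3/8 MHz."""
    return Fraction(luma_rate_mhz) / Fraction(27, 8)


if __name__ == "__main__":
    print(pcm_adaptor_rate(245, 60), pcm_adaptor_rate(294, 50))  # 44100 44100
    print(leading_digit("13.5"), leading_digit("74.25"))         # 4 22
```

Both line structures land on exactly 44,100 samples per second, and 13.5 MHz divided by 3 3/8 MHz gives the familiar 4, while an assumed 74.25 MHz luma clock gives the 22 in "22:11:11".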