[HN Gopher] Show HN: HiFiC - High-Fidelity Generative Image Comp...
___________________________________________________________________
 
Show HN: HiFiC - High-Fidelity Generative Image Compression - demo and paper
 
Author : atorodius
Score  : 199 points
Date   : 2020-06-26 15:03 UTC (7 hours ago)
 
(HTM) web link (hific.github.io)
(TXT) w3m dump (hific.github.io)
 
| mmastrac wrote:
| In addition to being an amazing result, props for including a variety of different skin tones in the hero sample box. It's extremely good to see that thought was put into reducing any potential bias in this compression algorithm, which has unfortunately been an issue throughout the history of image capture [1]
|
| [1] https://www.nytimes.com/2019/04/25/lens/sarah-lewis-racial-b...
|
| stereo wrote:
| The obvious question when looking at the comparison is, what kind of jpeg did they pick, and was it fair? It turns out that mozjpeg, which is pretty much state of the art, can squeeze an extra 5% out of shades_jpg_4x_0.807.jpg.
|
| Your comparison would be even more impressive if you produced the best jpegs you can at those sizes.
|
| atorodius wrote:
| Thank you for your suggestion. We use libjpeg for the demo. Keep in mind that we compare against JPEG files which are at around 100%-400% of the size of the proposed method, so 5% would not really add much as far as we can tell.
|
| jasonjayr wrote:
| This is impressive; but isn't this also susceptible to the type of bug that Xerox copiers got hit with many years ago?
|
| https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...
|
| atorodius wrote:
| Small text is indeed one of the failure cases, and more research needs to be done here (see also my other comment [1]). We mention this issue in the supplementary materials [2], and you can check out an example at [3]
|
| [1] https://news.ycombinator.com/item?id=23654161
|
| [2] https://storage.googleapis.com/hific/data/appendixb/appendix...
|
| [3] https://storage.googleapis.com/hific/userstudy/visualize.htm...
|
| mmastrac wrote:
| This might be a good opportunity to spur research on a "good faith watermark" for GAN-compressed images that may include hallucinated details.
|
| gtoderici wrote:
| One of the things we discussed to address this is to have the ability to: a) turn off detail hallucination completely given the same bitstream; and b) store the median/maximum absolute error across the image.
|
| (b) should allow the user to determine whether the image is suitable for their use case.
|
| the8472 wrote:
| Is it possible to put the decoder into a feedback loop and search among multiple possible encodings that minimize the residual errors? Similar to trellis optimization in video codecs. http://akuvian.org/src/x264/trellis.txt
|
| atorodius wrote:
| It would be possible - but by minimizing residual errors you end up in a similar regime as when minimizing MSE again, likely making reconstructions blurry!
|
| AbrahamParangi wrote:
| You could use an OCR encoder and cat that to the feature vector. Great work by the way!
|
| atorodius wrote:
| Agreed! Alternatively, some semantic segmentation network could be used together with a masked MSE loss. In this paper, we focussed on showing the crazy potential of GANs for compression - let's see what future work brings.
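A minimal sketch of the masked-MSE idea mentioned above (illustrative only, not part of the HiFiC code): the pixel-wise distortion is up-weighted inside regions flagged by a hypothetical external text/face segmentation network, leaving the GAN free to synthesize texture elsewhere.

    import tensorflow as tf

    def masked_mse(original, reconstruction, text_mask, text_weight=10.0):
        """Pixel-wise MSE that is up-weighted inside masked regions.

        original, reconstruction: float tensors [B, H, W, 3] in [0, 1].
        text_mask: float tensor [B, H, W, 1], 1.0 where a segmentation or
        OCR network flags text/faces (hypothetical input), 0.0 elsewhere.
        """
        per_pixel = tf.reduce_mean(tf.square(original - reconstruction),
                                   axis=-1, keepdims=True)   # [B, H, W, 1]
        weights = 1.0 + (text_weight - 1.0) * text_mask       # stricter in mask
        return tf.reduce_mean(weights * per_pixel)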
| deweller wrote:
| This is really impressive. But it raises some questions for me.
|
| What size library would be required to decode this type of image?
|
| And would the decoding library be updated on a regular basis? Would the image change when the decoding library is updated? Would images be tagged with a version of the library when encoded (HiFiC v1.0.2?)
|
| ache7 wrote:
| And I also have questions about decompression speed and memory requirements.
|
| atorodius wrote:
| Thanks for the kind words!
|
| > What size library would be required to decode this type of image?
|
| The model is 726MB. Keep in mind that this is a research project - further research needs to be done on how to make it faster now that we know that these kinds of results are possible!
|
| > And would the decoding library be updated on a regular basis?
|
| Only if you want even better reconstructions!
|
| > Would the image change when the decoding library is updated? Would images be tagged with a version of the library when encoded (HiFiC v1.0.2?)
|
| Yes, some header would have to be added.
|
| pornel wrote:
| I'm very curious how such a thing could be standardized as an image format. With classic image formats there's an expectation that one can write a spec and an independent implementation from scratch. "Take my X-MB large pre-trained model" is unprecedented.
|
| Would it still be competitive with H.265 if the model was 10MB or 50MB in size? 0.7GB may be difficult for widespread adoption.
|
| gtoderici wrote:
| Independently of this work, we have models which are competitive with HEVC while being significantly smaller (this is from previous work). They will not look nearly as good as what you see in the website demo, but they're still better.
|
| I don't have any such model handy, but it's perhaps 10x-20x smaller.
|
| We don't claim that this (or even the previous work) is the way to go forward for images, but we hope to incentivize more researchers to look in this direction so that we can figure out how to deploy these kinds of methods, since the results they produce are very compelling.
|
| roywiggins wrote:
| Finally someone's instantiated something like Vernor Vinge's "evocation" idea, quoted here:
|
| https://www.danvk.org/wp/2014-02-10/vernor-vinge-on-video-ca...
|
| atorodius wrote:
| Hi, author here! This is the demo of a research project demonstrating the potential of using GANs for compression. For this to be used in practice, more research is required to make the model smaller/faster, or to build a dedicated chip! Currently, this runs at approximately 0.7 megapixels/sec on a GPU with unoptimized TensorFlow. Happy to answer any questions.
|
| ksec wrote:
| > Currently, this runs at approximately 0.7 megapixels/sec on a GPU with unoptimized TensorFlow.
|
| Forgive me, is that for encoding or decoding? And what size of library is required to decode it? (Edit: looks like it's answered below)
|
| It was only an hour ago that I was looking at some pictures encoded with the VVC / H.266 reference encoder; it was better than BPG / HEVC, but this HiFiC still beats it by quite a bit.
|
| This whole thing blows my mind regarding the limits I thought we had on image compression.
|
| Maybe I should take back my words about JPEG XL being the future. This seems revolutionary.
|
| gtoderici wrote:
| (coauthor here) The 0.7 megapixels/sec is PNG decode (to get input) + encoding + decoding + PNG encoding (to get output we can visualize in a browser) speed.
|
| Thanks for your kind comment!
|
| cs702 wrote:
| Great work. Thank you for making it so accessible online and sharing it on HN!
|
| The first thought I had, after looking at some of the examples, is that it should be fairly trivial to train or finetune the compression network to hallucinate missing detail with "BPG-style" or "JPG-style" smoothing artifacts, by feeding compressed images as "real" samples to the adversarial network during training/finetuning.
|
| I wonder if doing that would make it possible to achieve even greater compression, with loss of detail that is indistinguishable from traditional compression methods and therefore still acceptable to the human eye.
|
| gtoderici wrote:
| What you suggest has already been done: train a neural network with the output of BPG or JPEG, and ask it to reconstruct the input with just the decompressed pixels being available.
|
| It definitely is a valid approach, but the limitation is that if the network needs some texture-specific information that cannot be extracted from the decoded pixels, it can't really do much.
|
| There were approaches where such information was also sent on the side, which yielded better results, of course.
|
| The field is wide open and each approach has its own challenges (e.g., you may need to train one network per quantization level if you're going to do restoration).
|
| UnbugMe wrote:
| The demo images used on hific.github.io appear to be part of the datasets used to train the system. In another comment you say the trained model is 726MB. The combined size of the training datasets appears to be about 8GB zipped. Is the currently trained model usable on images that are not part of the training datasets, with output of similar quality and size?
|
| atorodius wrote:
| Hi - no, the images on the demo page are not part of the training set; they are only used to evaluate the method. Arbitrary images of natural scenes will look similar at these bitrates. We'll release trained models and a colab soon!
|
| UnbugMe wrote:
| Thank you, and my apologies. I should have read the pdf more carefully, where the distinction between the training data and evaluation datasets is described:
|
| _Our training set consists of a large set of high-resolution images collected from the Internet, which we downscale to a random size between 500 and 1000 pixels, and then randomly crop to 256x256. We evaluate on three diverse benchmark datasets collected independently of our training set to demonstrate that our method generalizes beyond the training distribution: the widely used Kodak [22] dataset (24 images), as well as the CLIC2020 [43] test set (428 images), and the DIV2K [2] validation set (100 images)._
|
| petermcneeley wrote:
| These are truly amazing results. Looking closely at the results vs the original, it would appear that much of the detail is very different, at almost a noise level. Is the perceptual evaluation allowing for these similar but completely different noise details?
|
| gtoderici wrote:
| (coauthor here) We used an adversarial loss in addition to a perceptual loss and MSE. None of these work super well when the others are not used.
|
| The adversarial loss "learns" what a compressed image looks like and tries to steer the decoder away from such outputs.
|
| The perceptual loss (LPIPS) is not so sensitive to pure noise and allows for it, but is sensitive to texture details.
|
| MSE tries to get the rough shape right.
|
| We also asked users in a study to tell us which images they preferred when having access to the original. Most prefer the added details even if they're not exactly right.
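A rough schematic of how the loss terms described above could be combined in a generator objective - illustrative only, not the released HiFiC implementation; `encoder`, `decoder`, `discriminator`, `lpips_fn`, `rate_fn`, and the weights are placeholders, and the actual models and hyperparameters are those described in the paper:

    import tensorflow as tf

    def generator_loss(x, encoder, decoder, discriminator, lpips_fn, rate_fn,
                       w_rate=1.0, w_mse=1.0, w_lpips=1.0, w_adv=0.1):
        """Schematic combined objective: rate + distortion (MSE + LPIPS) + adversarial."""
        y = encoder(x)                      # latent representation
        x_hat = decoder(y)                  # reconstruction

        rate = rate_fn(y)                             # estimated bits for y
        mse = tf.reduce_mean(tf.square(x - x_hat))    # pixel-wise distortion
        lpips = lpips_fn(x, x_hat)                    # perceptual distance
        # Non-saturating GAN term (discriminator assumed to output a
        # probability of "real"): this is what steers the decoder away
        # from typical compression artifacts.
        adv = -tf.reduce_mean(tf.math.log(discriminator(x_hat) + 1e-8))

        return w_rate * rate + w_mse * mse + w_lpips * lpips + w_adv * adv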
| nbardy wrote:
| > The adversarial loss "learns" what a compressed image looks like and tries to steer the decoder away from such outputs.
|
| Could you expand on this point?
|
| atorodius wrote:
| The idea is that any distortion loss imposes specific artifacts. For example, MSE tends to blur outputs, and CNN-feature-based losses (VGG, LPIPS) tend to produce gridding patterns or also blur. Now, when the discriminator network sees these artifacts, they very obviously distinguish the reconstructions from the input, and thus the gradients from the discriminator guide the optimization away from these artifacts. Let me know if this helps!
|
| fiddlerwoaroof wrote:
| Yeah, it seems like the HiFiC version has the same large-scale structure, but the details are sort of "regenerated": e.g. it's as if the image format says "there's some hair here", and the decoder just paints in hair.
|
| gtoderici wrote:
| That was exactly the goal of the project! Basically, if the size doesn't allow for detail, we need to "hallucinate" it. This of course is not necessary if there's enough bandwidth available for transmission, or enough storage.
|
| On the other hand, in our paper we show that some generated detail can help even at higher bitrates.
|
| magicalhippo wrote:
| Looks really interesting, thanks for sharing!
|
| One silly question about the red door example image... why is the red so saturated in all the compressed versions vs the original?
|
| edit: Ah, it seems to be some kind of display issue in Firefox. When saving the images and comparing them, the saturation levels are roughly equal.
|
| keyi wrote:
| Have you compared the result with the latest image compression formats, such as AVIF [0]?
|
| [0]: https://aomediacodec.github.io/av1-avif/
|
| gtoderici wrote:
| We haven't specifically compared to AVIF, which as far as we know is still under development. We'd be happy to compare, but it's unlikely that we'd learn much from it. As far as we know, AVIF is better than HEVC by <100%, but we're comparing against HEVC at 300% of the bitrate.
|
| Of course, we'd be happy to add any additional images from other codecs if they're available.
|
| nullc wrote:
| Generally, lossy compression methods have a 'knee' below which the perceived quality rapidly drops off. The default jpeg examples here are well below that knee.
|
| Usually comparing below the knee isn't very useful, except to better help understand the sort of artefacts that one might want to also look for in higher-rate images.
|
| It would be interesting to see some examples from a lower-rate HiFiC specifically for the purpose of bringing out some artefacts, to help inform comparisons at higher rates.
|
| ebg13 wrote:
| > _The default jpeg examples here are well below that knee._
|
| The examples are chosen as multiples of their file size: 1x, 2x, ...
|
| nullc wrote:
| I used the word 'default' for a reason! :)
|
| atorodius wrote:
| The HiFiC model we show is already the low model :) We show JPEG and BPG at the same rate to visualize how low the bitrate of our model actually is. And for JPEG and BPG we show 1x, 2x, and so on to visualize how many more bits the previous methods need to look visually similar.
|
| nullc wrote:
| Sure, but it isn't low rate enough to produce the level of gross artefacts needed to train the viewer to recognize faults in the images.
|
| E.g. after looking at those jpeg images you're able to flip to much higher rate jpegs and notice blocking artefacts that a completely unprepared subject would fail to notice.
|
| In my work on lossy codecs I found that having trained listeners was critical to detecting artifacts that would pass on quick inspection but would become annoying after prolonged exposure.
|
| From a marketing-fairness point of view, since the only codec you evaluate 'below the knee' is jpeg, it risks exaggerating the difference. It would be just about as accurate to say that jpeg can't achieve those low rates -- since no one will use jpeg at a point where it's falling apart. This work is impressive enough that it doesn't need any (accidental) exaggeration.
|
| I think it's best to either have all comparison points above the point where the codecs fall apart, or have a below-knee example for all, so that it's clear how they fail and where they fail... rather than asking users to compare the gigantic difference between a failed and a non-failing output.
|
| atorodius wrote:
| In addition to my sibling comment, I would like to add:
|
| > since no one will use jpeg at a point where it's falling apart.
|
| Hence we also put it at bitrates people actually use :) And then the point should be that HiFiC uses far fewer bits.
|
| > below-knee for HiFiC
|
| We actually did not train lower than this model, HiFiC^lo. That it works so well was somewhat surprising to us too! From other GAN literature, it seems reasonable to expect the "below-knee" point for this kind of approach to still look realistic, but no longer faithful. I.e., without the original, it may be hard to tell that it's fake.
|
| ebg13 wrote:
| > _asking users to compare the gigantic difference between a failed and a non-failing output_
|
| It isn't. It's asking users to compare what happens at a particular bitrate, as a demonstration that it works (well!) at bitrates significantly lower than what anything else can safely achieve. To get comparable quality you need several _times_ the number of bytes with another codec.
|
| While I agree that it would be _interesting_ to see what happens below HiFiC's knee, the location of the knee for individual codecs is irrelevant to the comparison that really matters, because users just want high quality without high size, or low size without low quality. And this clearly produces the best quality at the least size by a mile.
|
| I think they did a fantastic job of showing both how each codec looks at the same size and also how many bytes it takes for other codecs to reach the same quality. Showing what it looks like when HiFiC degrades, if you go even lower than the already extremely low bitrate, would be fun and neat for comparing HiFiC's bitrates, but it has little bearing on how it compares to other codecs.
|
| ricardobeat wrote:
| This is amazing. Not only does it completely blow JPEG and BPG out of the water in efficiency, it makes some images look _better_ than the original! It seems to reduce local contrast, which works well for skin (photos 1, 2 and 5), and what it does for repeating patterns is quite pleasant to the eyes (1 and 4).
|
| The only downside I see is the introduction of artifacts like the ones you can see on the right-side face of the clock tower.
| pbowyer wrote:
| > it makes some images look better than the original! It seems to reduce local contrast, which works well for skin (photos 1, 2 and 5)
|
| It's better than the lossy JPEGs, but I strongly disagree that it's better than the original - or even close. Examining image 1 closely, the loss of contrast and the regularising of the texture of the skin make the HiFiC image feel fake when I look at it. I downloaded and opened the images at random on my computer to get rid of the bias of "the right one is the original", and it was the original I preferred each time.
|
| Regardless, it's impressive work and I look forward to future developments!
|
| atorodius wrote:
| Thanks for the kind words!
|
| We hope that the algorithm we presented is a step in the right direction, and we acknowledge that there's more work to be done! Just like with any algorithm, there are ways in which it can be improved. Please check out the supplementary materials [1] to find some examples that can definitely be improved: small faces and small text. Overall, as you say, the algorithm does a great job though!
|
| [1] https://storage.googleapis.com/hific/data/appendixb/appendix...
|
| lokl wrote:
| Has this been extended to video (beyond compressing each frame separately)?
|
| atorodius wrote:
| This is the first time that GANs have been used to obtain high-fidelity reconstructions at useful bitrates. We haven't tried extending it to video.
|
| isoprophlex wrote:
| Super exciting! Fantastic work and cool demo!
|
| I hope this can be 'stabilized' to allow video compression. If you run e.g. style transfer networks on video, the artifacts in the output frames aren't stable; they jump around from frame to frame.
|
| (Not sure if you can decode this at 24 fps on normal hardware, but still...)
|
| atorodius wrote:
| Thanks! And agreed - doing this in real time for high-resolution videos still needs quite some research though.
|
| mikepurvis wrote:
| Shipping the network in device ROMs seems like a pretty far-off thing at this point, but I wonder if there could be something in the nearer term around time-shifting bandwidth usage, e.g.:
|
| - Media appliance under your TV downloads a 5GB updated network during the night so that you can use smaller-encoded streams of arbitrary content in the evening.
|
| - Spotify maintains a network on your phone that is updated while you're at home so that you can stream audio on the road with minimal data usage.
|
| - Your car has a network in it that is periodically updated over wifi and allows you to receive streetview images over a manufacturer-supplied M2M connection.
|
| fizixer wrote:
| Fantastic.
|
| When can we install the 1TiB video player/codec/plugin that'll allow us to stream 4K movies on a 1Mbps connection?
|
| atorodius wrote:
| Hopefully soon! However, in addition to the research needed to make this work for video, it also needs research to make it smaller and/or to put it in silicon (see my other comments). The network shown is "only" 726MB, BTW :)
|
| ebg13 wrote:
| A number of comments here mention the regularity of the noise in the Lo variant as producing a slightly artificial appearance (zoom in on the woman's forehead in picture 1). Would it improve anything to look for swathes where the dominant texture is noticeably uniform noise and mix it up a bit?
|
| crazygringo wrote:
| I've been waiting for something to implement this concept for _so long_, and I'm so happy to finally get a chance to explore how it works in practice!
|
| It's really fascinating to zoom in and compare the compressed version with the original -- the "general idea" is all there and the quality is roughly the same, but strands of hair move to different positions, skin pores are in a completely different pattern, etc. -- it's all regenerated to look "right", but at the lowest level the texture isn't "real".
|
| Which is so fascinatingly different from JPEG or other lossy schemes -- their compression artifacts are obviously exactly that; you'd never confuse them for real detail. While here, you could never guess what is real detail vs. what is reconstructed detail.
|
| My hunch is that this won't actually take off at first for images, because images on the internet don't use up _that_ much of our bandwidth anyway, and the performance and model storage requirements here are huge.
|
| BUT -- if the same idea can be applied to 1080p _video_ with a temporal dimension, so that even if reconstructed textures are "false" they persist across frames, then suddenly the economics here would start to make sense: dedicated chips for encoding and decoding, with a built-in 10 GB model in ROM, trained on a decent proportion of every movie and TV show ever made... imagine if 2-hour movies were 200 MB instead of 2 GB?
|
| (And once we have chips for video, they could be trivially re-used for still images, much like Apple reuses h.265 for HEIC.)
|
| dividuum wrote:
| Regarding your last paragraph: I wouldn't be surprised if existing "upscale" chips already work similarly to what you suggested: they convert lower-res content to (for example) 4K. See https://www.sony.com/electronics/4k-resolution-4k-upscaling-... for an example.
|
| ksec wrote:
| > because images on the internet don't use up that much of our bandwidth anyway,
|
| On the Internet as a whole, yes. But for web pages, images are still by far the largest payload. They can be anywhere between 60 and 90% of total content size.
|
| Think of it in the context of latency, where you could benefit from images appearing much quicker or even instantaneously (embedded inside the HTML) because of the size reduction. Especially on the first screen.
|
| IshKebab wrote:
| > Images are still by far the largest payload.
|
| A good point that has been made before is that images are often by far the biggest part of a web page, but most sites _still_ don't even do basic optimisations like resizing them to the viewing size, using WebP, using pngcrush, etc.
|
| That said, this would be cool for making really well optimised sites.
|
| crazygringo wrote:
| Progressive JPEGs go a long way towards that though, no?
|
| ksec wrote:
| Not JPEG in its current form. (Maybe JPEG XL.) The user experience is horrible. And during the last HN discussion on Medium using progressive JPEG, it turned out a lot of people had similar thoughts.
|
| phs318u wrote:
| Given the seemingly random nature of the compression "artefacts", versus the more predictable compression artefacts of other algos, I wonder if this would limit the use of such images in scenarios where the resulting images had to be relied on in court as evidence of something.
|
| crazygringo wrote:
| Absolutely. Exactly for this reason, I would hope not to see this ever become the default format for cell phone cameras unless there were somehow explicit guarantees around what is or is not faithful. Same with security cameras, etc.
|
| _But_, anywhere the purpose of the output is social/entertainment, it seems this would be freely adopted.
| Profile pictures, marketing images, videos and movies. It could very much mean, however, that a photo on someone's Facebook profile might not be admissible as evidence if used to identify an exact match with someone, rather than just "someone who looks similar".
|
| swiley wrote:
| I think it will be extremely popular. Most people are concerned about how computers look. Whether or not the behavior is correct usually only bothers people who spend time working on computers.
|
| I'm not sure if it's because the first group is just used to life being unpredictable and doesn't realize things could be better, or if they really don't care.
|
| 0-_-0 wrote:
| I'm very impressed; I was waiting for an image codec that combines something like a VGG loss + GANs! (Another thing that I'm waiting for is a neural JPEG decoder with a GAN, which would be backwards compatible with all the pictures already out there!) Now we need to get some massive standardisation process going to make this more practical and perfect it, just like was done for JPEG in the old days! (And then do it for video and audio too!)
|
| What happens if you compress a noisy image? Does the compression denoise the image?
|
| gtoderici wrote:
| On the standardization issue: the advantage of a method like the one we presented is that, as long as there exists a standard for model specification, we can encode every image with an arbitrary computational graph that can be linked from the container.
|
| Imagine being able to have domain-specific models - say we could have a high-accuracy/precision model for medical images (super close to lossless), and one for low-bandwidth applications where detail generation is paramount. Also imagine having a program written today (assuming the standard is out) and it being able to decode images created with a model invented 10 years from today, doing things that were not even thought possible when the program was originally written. This should be possible because new models are defined in terms of the same low-level building blocks (like convolutions and other mathematical operations)!
|
| On noise: I'll let my coauthors find some links to noisy images to see what happens when you process those.
|
| 0-_-0 wrote:
| Absolutely! Being able to improve a decoder for an existing encoder (and vice versa) is a great advantage!
|
| atorodius wrote:
| Thanks! Noise is actually preserved really well, and that is one of the strengths of using a GAN. Check out this visual example from the "All Evaluation Images" link: https://storage.googleapis.com/hific/clic2020/visualize.html...
|
| Darkphibre wrote:
| ... wow
___________________________________________________________________
(page generated 2020-06-26 23:00 UTC)