[HN Gopher] DALL-E Paper and Code
       ___________________________________________________________________
        
       DALL-E Paper and Code
        
       Author : david2016
       Score  : 15 points
       Date   : 2021-02-24 20:26 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | MrUssek wrote:
       | So, uhh, where's the paper? The link in the readme isn't active.
        
       | campac wrote:
       | Has anyone tried this out?
        
         | CasperDern wrote:
          | The linked repository is just one part of the full model,
          | so it can't be used as is.
         | 
          | That said, there is a complete implementation by
          | lucidrains [1] with some results; the only missing
          | component now is the dataset.
         | 
         | [1]: https://github.com/lucidrains/DALLE-pytorch
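          | 
          | For reference, the two training stages in DALLE-pytorch
          | look roughly like this (a sketch following that repo's
          | README; argument names and defaults may have changed, so
          | treat the details as approximate):
          | 
          |   import torch
          |   from dalle_pytorch import DiscreteVAE, DALLE
          | 
          |   # stage 1: train a discrete VAE on images alone
          |   vae = DiscreteVAE(
          |       image_size = 256,
          |       num_layers = 3,
          |       num_tokens = 8192,  # codebook size matching OpenAI's
          |       codebook_dim = 512,
          |       hidden_dim = 64
          |   )
          |   images = torch.randn(4, 3, 256, 256)  # stand-in batch
          |   loss = vae(images, return_loss = True)
          |   loss.backward()
          | 
          |   # stage 2: train the transformer on (text, image) pairs,
          |   # with the trained VAE supplying the image tokens
          |   dalle = DALLE(
          |       dim = 512,
          |       vae = vae,
          |       num_text_tokens = 10000,
          |       text_seq_len = 256,
          |       depth = 12,
          |       heads = 16
          |   )
          |   text = torch.randint(0, 10000, (4, 256))
          |   loss = dalle(text, images, return_loss = True)
          |   loss.backward()
          | 
          |   # after training, generate images from text alone
          |   generated = dalle.generate_images(text)
          | 
          | The dataset really is the sticking point: the second stage
          | wants a very large corpus of captioned images to get
          | anywhere near the blog-post results.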
        
         | minimaxir wrote:
         | A thread of examples from the provided notebook:
         | https://twitter.com/ak92501/status/1364666124919447558
         | 
          | Note that these just demonstrate that arbitrary input
          | images survive the encode/decode round trip mostly intact,
          | which is exactly what you would expect from a VAE.
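          | 
          | The round trip in question is short. Roughly, following
          | the repo's usage notebook (the image-loading glue is
          | swapped for plain torchvision here, so treat it as
          | approximate):
          | 
          |   import torch
          |   import torch.nn.functional as F
          |   import torchvision.transforms as T
          |   from PIL import Image
          |   from dall_e import map_pixels, unmap_pixels, load_model
          | 
          |   dev = torch.device("cpu")
          |   enc = load_model(
          |       "https://cdn.openai.com/dall-e/encoder.pkl", dev)
          |   dec = load_model(
          |       "https://cdn.openai.com/dall-e/decoder.pkl", dev)
          | 
          |   # any RGB image, resized to the 256x256 training size;
          |   # the path is a placeholder
          |   img = Image.open("input.png").convert("RGB")
          |   x = T.ToTensor()(T.Resize((256, 256))(img)).unsqueeze(0)
          |   x = map_pixels(x)  # shift into the dVAE's pixel range
          | 
          |   z_logits = enc(x)                  # (1, 8192, 32, 32)
          |   z = torch.argmax(z_logits, dim=1)  # 32x32 token grid
          |   z = F.one_hot(z, num_classes=enc.vocab_size)
          |   z = z.permute(0, 3, 1, 2).float()
          | 
          |   x_stats = dec(z).float()
          |   x_rec = unmap_pixels(torch.sigmoid(x_stats[:, :3]))
          | 
          | There is no text conditioning anywhere in that path; all it
          | can show is reconstruction quality.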
        
       | minimaxir wrote:
        | Note that this is just the VAE component, used to help train
        | the model and generate images; it will not let you create
        | crazy images from natural language as shown in the blog post
        | (https://openai.com/blog/dall-e/).
       | 
       | More specifically from that link:
       | 
       | > [...] the image is represented using 1024 tokens with a
       | vocabulary size of 8192.
       | 
       | > The images are preprocessed to 256x256 resolution during
       | training. Similar to VQVAE, each image is compressed to a 32x32
        | grid of discrete latent codes using a discrete VAE that we
        | pretrained using a continuous relaxation.
       | 
       | OpenAI also provides the encoder and decoder models and their
       | weights.
       | 
        | However, with the decoder model available, it's now possible
        | to train a text-encoding model that links up to that decoder
        | (training on, say, an annotated image dataset) and get
        | something close to the DALL-E demo OpenAI posted. Or
        | something even better!
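        | 
        | Very roughly, the missing piece is an autoregressive model
        | over the text tokens followed by the 1024 image tokens. A
        | toy skeleton of the shape of it (every name below is made
        | up; none of this is from an OpenAI release):
        | 
        |   import torch
        |   import torch.nn as nn
        | 
        |   TEXT_VOCAB, IMG_VOCAB = 10000, 8192
        |   TEXT_LEN, IMG_TOKENS = 256, 32 * 32
        | 
        |   class TextToImageTokens(nn.Module):
        |       # decoder-only transformer: each image token is
        |       # predicted from the text plus the image tokens
        |       # before it
        |       def __init__(self, dim=512):
        |           super().__init__()
        |           self.text_emb = nn.Embedding(TEXT_VOCAB, dim)
        |           self.img_emb = nn.Embedding(IMG_VOCAB, dim)
        |           self.pos_emb = nn.Embedding(
        |               TEXT_LEN + IMG_TOKENS, dim)
        |           layer = nn.TransformerEncoderLayer(
        |               dim, nhead=8, batch_first=True)
        |           self.blocks = nn.TransformerEncoder(
        |               layer, num_layers=6)
        |           self.to_logits = nn.Linear(dim, IMG_VOCAB)
        | 
        |       def forward(self, text, img_tokens):
        |           x = torch.cat([self.text_emb(text),
        |                          self.img_emb(img_tokens)], dim=1)
        |           n = x.shape[1]
        |           x = x + self.pos_emb(
        |               torch.arange(n, device=x.device))
        |           causal = torch.triu(  # True = may not attend
        |               torch.ones(n, n, dtype=torch.bool,
        |                          device=x.device), 1)
        |           h = self.blocks(x, mask=causal)
        |           # the hidden state just before each image token
        |           # is what predicts that token
        |           t = text.shape[1]
        |           return self.to_logits(h[:, t - 1:-1])
        | 
        | Train that with cross-entropy against the dVAE's token grid,
        | sample 1024 tokens at inference, reshape to 32x32, one-hot,
        | and OpenAI's released decoder turns them into pixels.
        | Getting blog-post-quality samples out of it is the part that
        | needs the giant captioned dataset and the compute.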
        
         | indiv0 wrote:
          | Yeah, unfortunately OpenAI has only released the weaker
          | ResNets and vision transformers they trained.
         | 
         | Some brilliant folks (Ryan Murdock [@advadnoun], Phil Wang
          | [@lucidrains]) have tried to replicate their results with
          | projects like big-sleep [0], to decent effect, but even
          | with this improved VAE we're still a ways from
          | DALL-E-quality results.
         | 
          | If anyone would like to play with the model, check out
          | either the Google Colab [1] (if you wanna run it on
          | Google's cloud) or my site [2] (if you want a simplified
          | UI); a minimal invocation sketch follows the links.
         | 
         | [0]: https://github.com/lucidrains/big-sleep/
         | 
         | [1]: https://colab.research.google.com/drive/1MEWKbm-
         | driRNF8PrU7o...
         | 
         | [2]: https://dank.xyz
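          | 
          | For the impatient, running big-sleep locally is only a few
          | lines (per its README; defaults may have shifted since, so
          | treat the exact arguments as approximate):
          | 
          |   # pip install big-sleep
          |   from big_sleep import Imagine
          | 
          |   dream = Imagine(
          |       text = "fire in the sky",  # README's example prompt
          |       lr = 5e-2,
          |       save_every = 25,      # write an image every 25 steps
          |       save_progress = True  # keep intermediate images
          |   )
          |   dream()  # optimizes BigGAN latents against CLIP's score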
        
       | make3 wrote:
        | the title should be updated: this doesn't include the paper,
        | and it's not the code for DALL-E but only for its VAE
        | component
        
       ___________________________________________________________________
       (page generated 2021-02-24 23:01 UTC)