[HN Gopher] DALL-E Paper and Code
___________________________________________________________________
 
DALL-E Paper and Code
 
Author : david2016
Score  : 15 points
Date   : 2021-02-24 20:26 UTC (2 hours ago)
 
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
 
| MrUssek wrote:
| So, uhh, where's the paper? The link in the readme isn't
| active.
 
| campac wrote:
| Has anyone tried this out?
 
| CasperDern wrote:
| The repository linked is only part of the entire model, so it
| can't be used as is.
|
| That said, there is a complete implementation by
| lucidrains [1] with some results; the only missing component
| now is the dataset.
|
| [1]: https://github.com/lucidrains/DALLE-pytorch
 
| minimaxir wrote:
| A thread of examples from the provided notebook:
| https://twitter.com/ak92501/status/1364666124919447558
|
| Note that these only demonstrate that encoded input images are
| reconstructed by the decoder, which is what you would expect
| from a VAE.
 
| minimaxir wrote:
| Note that this is just the VAE component used to help train
| the model and generate images; it will not let you create
| crazy images from natural language as shown in the blog post
| (https://openai.com/blog/dall-e/).
|
| More specifically, from that link:
|
| > [...] the image is represented using 1024 tokens with a
| vocabulary size of 8192.
|
| > The images are preprocessed to 256x256 resolution during
| training. Similar to VQVAE, each image is compressed to a
| 32x32 grid of discrete latent codes using a discrete VAE that
| we pretrained using a continuous relaxation.
|
| OpenAI also provides the encoder and decoder models and their
| weights.
|
| However, with the decoder model, it's now possible to, say,
| train a text-encoding model that links up to that decoder
| (training on an annotated image dataset) to get something
| close to the DALL-E demo OpenAI posted. Or something even
| better!
 
| indiv0 wrote:
| Yeah, unfortunately OpenAI has only released the weaker
| resnets and vision transformers they trained.
|
| Some brilliant folks (Ryan Murdock [@advadnoun], Phil Wang
| [@lucidrains]) have tried to replicate their results with
| projects like big-sleep [0], with decent results, but even
| with this improved VAE we're still a ways from DALL-E-quality
| results.
|
| If anyone would like to play with the model, check out either
| the Google Colab [1] (if you wanna run it on Google's cloud)
| or my site [2] (if you want a simplified UI).
|
| [0]: https://github.com/lucidrains/big-sleep/
|
| [1]: https://colab.research.google.com/drive/1MEWKbm-driRNF8PrU7o...
|
| [2]: https://dank.xyz
 
| make3 wrote:
| The title should be updated: this doesn't include the paper,
| and it's not the code for DALL-E, only for its VAE component.
___________________________________________________________________
(page generated 2021-02-24 23:01 UTC)
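 
___________________________________________________________________
 
A minimal sketch of the dVAE round trip that minimaxir's comment
describes (256x256 image -> 32x32 grid of tokens from an 8192-entry
codebook -> reconstructed image), loosely following the usage
notebook in the linked openai/DALL-E repo. The helper names
(load_model, map_pixels, unmap_pixels) and weight URLs are taken
from that repo, but treat the exact shapes and signatures shown here
as assumptions rather than a verified reference.
 
    import torch
    import torch.nn.functional as F
    from dall_e import load_model, map_pixels, unmap_pixels
 
    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
    # Pretrained dVAE encoder/decoder weights released by OpenAI
    # (URLs as listed in the repo's usage notebook).
    enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", dev)
    dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", dev)
 
    # x: a [1, 3, 256, 256] image tensor in [0, 1]; a random tensor
    # stands in for real image preprocessing here.
    x = map_pixels(torch.rand(1, 3, 256, 256, device=dev))
 
    # Encode: 256x256 pixels -> logits over an 8192-entry codebook
    # on a 32x32 grid, i.e. 1024 discrete tokens per image.
    z_logits = enc(x)                  # [1, 8192, 32, 32]
    z = torch.argmax(z_logits, dim=1)  # [1, 32, 32] token ids
 
    # Decode: one-hot the tokens and reconstruct the image.
    z_onehot = F.one_hot(z, num_classes=enc.vocab_size)
    z_onehot = z_onehot.permute(0, 3, 1, 2).float()
    x_stats = dec(z_onehot).float()
    x_rec = unmap_pixels(torch.sigmoid(x_stats[:, :3]))  # [1, 3, 256, 256]
 
Those 1024 token ids per image are what the full DALL-E transformer
would predict from text; only the encoder/decoder half of that
pipeline is in the linked repository.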