[HN Gopher] Dall-E 2
___________________________________________________________________

Dall-E 2

Author : yigitdemirag
Score  : 1040 points
Date   : 2022-04-06 14:09 UTC (8 hours ago)

| Traster wrote:
| To be honest the Girl with a Pearl Earring "variations" look a
| little bit like a crime against art. It's like the person who
| built this has no idea why the Girl with a Pearl Earring is good
| art. "Here's the Girl with a Pearl Earring" - "OK, well here's
| some girls with turbans"
|
| Art is truth.
| bbbobbb wrote:
| To be honest it's hard for me to imagine an alternate reality
| where the 'original' is swapped with one of the 'variations'
| without the same comment appearing underneath. Why is the
| 'original' good art?
| sillysaurusx wrote:
| Maybe.
| https://cdn.openai.com/dall-e-2/demos/variations/modified/gi...
| was pretty impressive.
|
| I think the results are being poisoned by the fact that most
| old paintings have deteriorated colors, so the training data
| looks nothing like the originals. It's certainly a lot yellower
| than
| https://cdn.openai.com/dall-e-2/demos/variations/originals/g...
| eks391 wrote:
| > It's like the person who built this has no idea why the Girl
| with a Pearl Earring is good art.
|
| The people didn't program Dall-E to make art. They taught
| it to recognize patterns and create something by extrapolating
| from the patterns, all on its own. So the AI isn't a projection
| of what they think is good art, it's projecting what it thinks
| is good art, based on a prompt. The output is its best effort
| at a feeling, even if the feeling had to be inputted by a
| living person. So it's still art that's as good as the feeling
| it came from; fleeting feelings being lower quality than
| those that required more time and thought
| billconan wrote:
| I'm curious: is this feasible to train (and run inference) on a
| consumer-level machine, or is this something that can only be
| done by institutions?
| marviel wrote:
| It's becoming clear that efficient work in the future will hinge
| upon one's ability to _accurately describe what one wants_.
| Unpacking that -- a large piece is the ability to understand all
| the possible "pitfalls" and "misunderstandings" that could
| happen on the way to a shared understanding.
|
| While technical work will always have a place -- I think that
| much creative work will become more like the _management_ of a
| team of highly-skilled, niche workers -- with all the
| frustrations, joys, and surprises that entails.
| killerstorm wrote:
| No... These models are trained to predict.
|
| You can definitely make them incremental. You can give it a
| task like "make a more accurate description from initial
| description and clarification". Even GPT-3-based models
| available today can do these tasks.
|
| Once this is properly productionized it would be possible to
| implement stuff just by talking with a computer.
| [deleted]
| golergka wrote:
| > accurately describe what one wants
|
| Isn't that essentially what programming already is?
| armchairhacker wrote:
| Programming, art, music, is just "describing what you want" in
| a very specific way. This is describing what you want in a much
| more vague way.
|
| The upside is that it's more "intuitive" and requires much less
| detail and technique, as the AI infers the detail and
| technique. The downside is that it's really hard to know what
| the AI will generate or get it to generate something really
| specific.
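|
| To make that concrete (a toy sketch; the generate() call below
| is a hypothetical prompt API, not a real library):
|
|     # Precise: a program spells out every detail of the result.
|     from PIL import Image, ImageDraw
|
|     img = Image.new("RGB", (256, 256), "white")
|     draw = ImageDraw.Draw(img)
|     # Exact position, size, and color - nothing left unstated.
|     draw.ellipse((78, 78, 178, 178), fill="red")
|
|     # Vague: a prompt names the goal, and the model fills in
|     # every unstated detail however it sees fit.
|     # (generate() is imaginary, hence commented out.)
|     # img2 = generate("a red circle on a white background")
|
| The first version gives you exactly what you asked for and
| nothing more; the second is easier to write, but you can't fully
| control what comes back.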
|
| I believe the future will combine the heuristics of
| AI-generation with the specificity of traditional techniques. For
| example, artists may start with a rough outline of whatever
| they want to draw as a blob of colors (like in some AI image-
| generation papers). Then they can fill in details using AI
| prompts, but targeting localized regions/changes and adding
| constraints, shifting the image until it's almost exactly what
| they imagined in their head.
| falcor84 wrote:
| > We've limited the ability for DALL·E 2 to generate ... adult
| images.
|
| I think that using something like this for porn could potentially
| offer the biggest benefit to society. So much has been said about
| how this industry exploits young and vulnerable models. Cheap
| autogenerated images (and in the future videos) would pretty much
| remove the demand for human models and eliminate the related
| suffering, no?
|
| EDIT: typo
| sillysaurusx wrote:
| Depends whether you think models should be able to generate cp.
|
| It's almost impossible to even give an affirmative answer to
| that question without making yourself a target. And as much as
| I err on the side of creator freedom, I find myself shying away
| from saying yes without qualifications.
|
| And if you don't allow cp, then by definition you require some
| censoring. At that point it's just a matter of where you
| censor, not whether. OpenAI has gone as far as possible on the
| censorship, reducing the impact of the model to "something that
| can make people smile." But it's sort of hard to blame them, if
| they want to focus on making models rather than fighting
| political battles.
|
| One could imagine a cyberpunk future where seedy AI cp images
| are swapped in an AR universe, generated by models run by
| underground hackers who scrounge together what resources they
| can to power the behemoth models that they stole via hacks.
| Probably worth a short story at least.
|
| You could make the argument that we have fine laws around porn
| right now, and that we should simply follow those. But it's not
| clear that AI generated imagery can be illegal at all. The
| question will only become more pressing with time, and society
| has to solve it before it can address the holistic concerns you
| point out.
|
| OpenAI ain't gonna fight that fight, so it's up to EleutherAI
| or someone else. But whoever fights it in the affirmative will
| probably be vilified, so it'd require an impressive level of
| selflessness.
| [deleted]
| chias wrote:
| Would this not necessarily require training it on a large
| body of real CSAM? Seems like it would be a non-starter.
| sillysaurusx wrote:
| Surprisingly no. It knows what a child looks like, and can
| infer what a naked child looks like from medical imagery.
|
| A child with adult body parts is a whole other class of
| weirdness that might pop out too.
|
| Models want to surprise us all.
| loufe wrote:
| There are so many excellent, thought-provoking comments in
| this thread, but yours especially caught me. Something that
| came to mind immediately upon reading the release was the
| potential for this technology to transform literature, adding
| AI generated imagery to turn any novel into a visual novel as
| a premium way to experience the story, something akin to
| composing a D-Box seat response to a modern movie. I was
| imagining telling the cyberpunk future story you were
| elaborating, which is really compelling, in such a way and
| couldn't help but smile.
| sillysaurusx wrote:
| Please write it!
I'd love to read one.
| aryamaan wrote:
| In the same theme, I liked the comments of both of you.
|
| Another use case could be to make it easier/automatic to
| create comics. You describe what the background should be, what
| the characters should be doing, and the dialogue. Boom, you
| have a good enough comic.
|
| -----------
|
| Reading as a medium has not evolved with technology. Creating
| the imagery does happen in humans' minds. It's no surprise that
| some people enjoy doing that (and also enjoy watching that
| imagery) and others do not.
|
| This could be a helping brain to create those imageries.
|
| -----------
|
| Now imagine reading stories to your child. Actually, creating
| stories for your child, where they are the characters in the
| stories. Having a visual element to it is definitely going to
| be a premium experience.
| GauntletWizard wrote:
| Religious people don't only believe that porn harms the models,
| but also the user. I happen to agree, despite being a porn user
| - porn is a form of simulated and not-real stimulation. Porn is
| harmful to the user the same way that any form of delusion is:
| it associates positive pleasure with stimulation that does not
| fulfil any basic or even higher-level needs, and is
| unsustainable. Porn is somewhere on the same scale as
| wireheading[1].
|
| That doesn't mean that it's all bad, and that there's no
| recreational use for it. We have limits on the availability of
| various other artificial stimulants. We should continue to have
| limits on the availability of porn. Where to draw that line is
| a real debate.
|
| [1] https://en.wikipedia.org/wiki/Wirehead_(science_fiction)
| [deleted]
| Siira wrote:
| The problem might be that people are simply lying. Their real
| reasons are religious/ideological, but they cite humanitarian
| concerns (which their own religious stigma is partly
| responsible for).
| thom wrote:
| People take their experiences of porn into real relationships,
| so I do not think this removes suffering overall, no.
| AYBABTME wrote:
| Iain Banks' "Surface Detail" would like to have a word with
| you.
|
| This author's books are great at putting these sorts of moral
| ideas to the test in a sci-fi context. This specific tome
| portrays virtual wars and virtual "hells". The hope is of being
| more civilized than by waging real war or torturing real living
| entities. However some protagonists argue that virtual life is
| indistinguishable from real life, and so sacrificing virtual
| entities to save "real" ones is a fallacy.
|
| Or some such, it's been a while.
| cm2012 wrote:
| I suspect that if a free version of this comes out and allows
| adult image generation, 90% of what it will be used for is
| adult stuff (see the kerfuffle with AIDungeon).
|
| I can get why the people who worked hard on it and spent money
| building it don't want to be associated with porn.
| albertzeyer wrote:
| Some initial video by Yannic Kilcher:
| https://www.youtube.com/watch?v=gGPv_SYVDC8
| mario143 wrote:
| Yeah, I mean you're right that ultimately the proof is in the
| pudding.
|
| But I do think we could have guessed that this sort of approach
| would be better (at least at a high level - I'm not claiming I
| could have predicted all the technical details!).
The previous
| approaches were sort of the best that people could do without
| access to the training data and resources - you had a pretrained
| CLIP encoder that could tell you how well a text caption and an
| image matched, and you had a pretrained image generator (GAN,
| diffusion model, whatever), and it was just a matter of trying to
| force the generator to output something that CLIP thought looked
| like the caption. You'd basically do gradient ascent to make the
| image look more and more like the text prompt (all the
| while trying to balance the need to still look like a realistic
| image). Just from an algorithm aesthetics perspective, it was
| very much a duct tape and chicken wire approach.
|
| The analogy I would give is if you gave a three-year-old some
| paints, and they made an image and showed it to you, and you had
| to say, "this looks a little like a sunset" or "this looks a
| lot like a sunset". They would keep going back and adjusting
| their painting, and you'd keep giving feedback, and eventually
| you'd get something that looks like a sunset. But it'd be better,
| if you could manage it, to just teach the three-year-old how to
| paint, rather than have this brute force process.
|
| Obviously the real challenge here is "well how do you teach a
| three-year-old how to paint?" - and I think you're right that
| that question still has a lot of alchemy to it.
| EZ-Cheeze wrote:
| "Computer, render Bella and Gigi Hadid playing tennis in bikinis"
| KevinGlass wrote:
| Something about this makes me nauseous. Perhaps it's the fact
| that soon the market value for creatives is going to fall to a
| hair above zero for all but the most famous. We will be all the
| poorer for it when 95% of images you see are AI generated. There
| will be niches of course but in a few short years it'll be over
| for a huge swathe of creative professionals who are already
| struggling.
|
| Some of the images also hit me with a creep factor, like the
| bears on the corgis in the art gallery, but that may be only
| because I know it's AI generated.
| idleproc wrote:
| I imagine it will affect artists much the same way wordpress
| has affected web designers.
|
| Maybe everyone will have an AI image as their desktop
| wallpaper, but if you've got cash you'll want something with
| provenance and rarity to brag about.
|
| Also, I think creatives are valued for their imagination. If
| you wanted something decent, would you pay someone to sift
| through a million AI generated images to find a gem, or just
| pay an artist you like to create one for you?
| bufferoverflow wrote:
| > you'll want something with provenance and rarity to brag
| about.
|
| 1) That is a tiny share of the market. Most of the market is
| - I have a game / online publication / book, and I need an
| illustration xyz. Which this AI seems to solve.
|
| 2) How do you even prove your rare art wasn't painted by an
| AI?
| idleproc wrote:
| 1) Sure there's a lot of work for that kind of thing but
| creatives typically earn a pittance. I doubt an AI could
| meet your specific requirements without having to spend
| hours(?) tweaking it or sifting through countless
| variations for the 'one'.
|
| 2) Because we haven't built a machine that can paint (etc.)
| with traditional materials like a skilled artist?
| typon wrote:
| I paid $1500 for a commissioned painting from an artist I
| respect and follow as a birthday present for a friend.
The
| painting meant something to me because I worked with the artist
| to have some input about what kind of a person my friend is,
| what kind of features I want to see in the painting, and how I
| want it to feel. The artist gave me 5 different sketches and we
| had tons of back and forth. The process and the act of creating
| the painting on a canvas from someone I respect is what I paid
| for.
|
| Even if an AI could generate an exactly equivalent painting, I
| would pay $0 for it. It wouldn't mean anything to me.
| chpatrick wrote:
| Just wait until they figure out music.
| Applejinx wrote:
| Not exactly. All the ideas put forth in these demos are really
| arbitrary, with nothing whatsoever to say. Generating crap art
| becomes more and more effortless: we've seen this in music as
| well.
|
| Jumping out of the conceptual box to generate novel PURPOSE is
| not the domain of a Dall-E 2. You've still gotta ask it for
| things. It's a paintbrush. Without a coherent story, it's an
| increasingly impressive stunt (or a form of very sophisticated
| 'retouching brush').
|
| If you can imagine better than the next guy, Dall-E 2 is your
| new tool for expression. But what is 'better'?
| jupp0r wrote:
| This reminds me of an art class in high school in the early
| 2000s where I handed in a printout of a 3d generated image
| (painstakingly modeled and rendered in software over the
| whole weekend by me) and the teacher looked at me and told me
| that's not art because it's "computer generated" and I didn't
| "even use my hands" to make it. Even as a teenager, the idea
| that art is defined by how it's made versus it being a way
| for the artist to express intention in whatever way they see
| fit seemed really reductionist and almost vulgar to me.
|
| Maybe lots of artists of the future will actually use AI models
| to express their inner thoughts and desires in a way that
| touches something in their audience. It will still be art.
| throwaway71271 wrote:
| 'art' comes from 'artem' which means 'skill', which is the
| root of 'artificial' (https://www.etymonline.com/word/art
| and https://www.etymonline.com/word/artificial)
|
| your teacher was wrong
|
| i had a friend who didn't get credit for his design work
| because he used photoshop instead of using pen and paper,
| for a similar reason. i still find it amazing that a teacher
| would say such a thing
| andybak wrote:
| > 'art' comes from 'artem' which means 'skill', which is
| the root of 'artificial'
|
| His teacher was wrong but "argument from etymology" is
| surely a fallacy.
| amelius wrote:
| Can I opt out from ever seeing AI generated images please?
| 323 wrote:
| The same thing was said when book printing was invented: that
| we would lose the fabulous scribes who manually duplicate
| books with a human touch, while replacing them with soulless
| mechanical machines.
|
| Or when synthesizers and computer music were invented: that
| they would displace talented musicians who know how to play an
| instrument, and that now everybody without a musical education
| would be able to produce music, thus devaluing actual musicians.
| alcover wrote:
| > for all but the most famous
|
| OK DALL-E, generate our logo in the style of ${most famous}
| axg11 wrote:
| I really don't agree. When I work with a creative I'm not
| working with them because of their content generation skills.
| I'm working with them because of their taste and curation
| ability that results in the end product.
|
| The nature of creative work will certainly change; creatives
| will adopt tools such as Dall-E 2. In certain narrow cases they
| might be replaced, such as if you are asking a creative to
| generate a very specific image, but how often is that the case?
| The majority of the time, tools such as Dall-E 2 will act as an
| accelerator for creatives and help them increase their output.
| lofatdairy wrote:
| Perhaps a more optimistic way of looking at it: When mass
| production became available to art, the idea of an "artwork"
| had to be abstracted from a unique piece (Walter Benjamin gives
| the example of a statue of Venus, which has value in its
| uniqueness) to the idea of art as the output of some process.
| Each piece has no claim to authenticity, and the very idea of
| an "original" would be antithetical to its production.
|
| I think art will survive; just as photography didn't kill the
| painting, the idea of art might simply begin to encompass this
| new means of production, which no longer requires the steady
| hand, but still requires a discerning eye. Sure, we might say
| that the "artist" is simply a curator, picking which
| algorithmic output is most worthy of display, but these
| distinctions have historically been fluid, and challenging
| ideas of art has long been one of art's functions as well.
| dragonwriter wrote:
| > Perhaps it's the fact that soon the market value for
| creatives is going to fall to a hair above zero for all but the
| most famous.
|
| But... that's always been the case for creatives.
| throwaway675309 wrote:
| Nonsense. This is merely a tool and helps lower the barrier of
| entry to be able to produce imagery.
|
| By the same logic you should also complain about any number of
| IDEs, development tools, WordPress, and game maker systems like
| RPG Maker or Unity; after all, if anyone can just leverage a
| free physics and collision system without having a complete
| understanding of rigid body Newtonian systems to roll their own
| engine, it'll be too uniform.
| TaupeRanger wrote:
| By "creatives" you seem to mean "people who drum up the
| equivalent of elevator music for ads and blogs". This will not
| remotely replace any working "creative" people that I know.
| pingeroo wrote:
| Except it will only get more powerful with time, probably at
| an accelerating pace. Everyone always downplays these
| legitimate fears about AI, pointing out how "it can't do X".
| They always forget to put the "yet" at the end of that
| sentence.
| [deleted]
| TaupeRanger wrote:
| The person I responded to literally made the claim that it
| would happen imminently...
| zitterbewegung wrote:
| I don't want to dismiss this new model and its achievements,
| but we are getting to the point where, just as we saw an
| open-source versus closed-source split in software, another one
| is forming for open and closed models. I think that larger and
| larger models will have disclaimers restricting you from using
| them commercially (a great deal of academic and NVIDIA models
| do this), and OpenAI just puts it behind an API with the rules:
|
|     Curbing Misuse: Our content policy does not allow users to
|     generate violent, adult, or political content, among other
|     categories. We won't generate images if our filters
|     identify text prompts and image uploads that may violate
|     our policies. We also have automated and human monitoring
|     systems to guard against misuse.
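|
| Even the naive version of the prompt-filter idea described
| there is easy to picture (a toy sketch of my own; OpenAI's
| actual system is surely far more sophisticated, with trained
| classifiers, image checks, and human review, not a keyword
| list):
|
|     # Hypothetical illustration only: a crude blocklist filter.
|     BLOCKED_TERMS = {"violent", "gore", "nsfw"}  # made-up list
|
|     def allow_prompt(prompt: str) -> bool:
|         """Reject a prompt if any blocked term appears in it."""
|         words = set(prompt.lower().split())
|         return not (words & BLOCKED_TERMS)
|
|     print(allow_prompt("a corgi in a field"))  # True
|     print(allow_prompt("a violent scene"))     # False
|
| The hard part, of course, is everything a keyword list can't
| catch.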
| asxd wrote:
| They're pretty strict about usage:
|
| -
| https://github.com/openai/dalle-2-preview/blob/main/system-c...
|
| -
| https://github.com/openai/dalle-2-preview/blob/main/system-c...
| jdrc wrote:
| It should be possible to create open source versions;
| researchers will find a way if something is cool enough
| zackmorris wrote:
| Apologies for an open-ended question but: does anyone know if
| there is a term for something like Turing-completeness within AI,
| where a certain level of intelligence can simulate any other type
| of intelligence like our brains do?
|
| For example, using DeMorgan's theorem, we can build any logic
| circuit out of all NAND or NOR gates:
|
| https://www.electronics-tutorials.ws/boolean/demorgan.html
|
| https://en.wikipedia.org/wiki/NAND_logic
|
| https://en.wikipedia.org/wiki/NOR_logic
|
| Dall-E 2's level of associative comprehension is so far beyond
| the old psychology bots in the console pretending to be people
| that I can't help but wonder if it's reached a level where it can
| make any association.
|
| For example, I went to an AI talk about 5 years ago where the guy
| said that any of a dozen algorithms like K-Nearest Neighbor,
| K-Means Clustering, Simulated Annealing, Neural Nets, Genetic
| Algorithms, etc. can all be adapted to any use case. They just
| have different strengths and weaknesses. At that time, all that
| really mattered was how the data was prepared.
|
| I guess fundamentally my question is: when will AGI start to
| become prevalent, rather than these special-purpose tools like
| GPT-3 and Dall-E 2? Personally I give it less than 10 years of
| actual work, maybe less. I just mean that to me, Dall-E 2 is
| already orders of magnitude more complex than what's required to
| run a basic automaton to free humans from labor. So how can we
| adapt these AI experiments to get real work done?
| robertsdionne wrote:
| https://en.wikipedia.org/wiki/Universal_approximation_theore...
| teaearlgraycold wrote:
| > does anyone know if there is a term for something like
| Turing-completeness within AI, where a certain level of
| intelligence can simulate any other type of intelligence like
| our brains do?
|
| Artificial General Intelligence
| dqpb wrote:
| Juergen Schmidhuber predicts the "Omega point" of technological
| development (including AGI) to be around 2040:
|
| https://youtu.be/pGftUCTqaGg
|
| The MIT Limits to Growth study predicts the collapse of global
| civilization around 2040:
|
| https://www.vice.com/amp/en/article/z3xw3x/new-research-vind...
| causticcup wrote:
| Almost everything stated here is simply wrong or misinformed.
|
| > For example, I went to an AI talk about 5 years ago where the
| guy said that any of a dozen algorithms like K-Nearest
| Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets,
| Genetic Algorithms, etc. can all be adapted to any use case.
| They just have different strengths and weaknesses. At that
| time, all that really mattered was how the data was prepared.
|
| How do you suppose KNN is going to generate photorealistic
| images? I don't understand the question here.
|
| > I guess fundamentally my question is: when will AGI start to
| become prevalent, rather than these special-purpose tools like
| GPT-3 and Dall-E 2?
|
| Actual AGI research is basically non-existent, and GPT-3/Dall-E
| 2 are not AGI-level tools.
|
| > Personally I give it less than 10 years of actual work, maybe
| less
|
| Lol...
|
| > I just mean that to me, Dall-E 2 is already orders of
| magnitude more complex than what's required to run a basic
| automaton to free humans from labor.
|
| Categorically incorrect
| agloeregrets wrote:
| The most interesting item to me is the variations on the garden
| shop and bathroom sink idea. The realism of these reveals the
| AI's lack of intuition about the requirements. This makes for a
| number of nonsensical designs that look right at first glance,
| like: this sink lacks sensible faucets.
| https://cdn.openai.com/dall-e-2/demos/variations/modified/ba...
|
| This doorway is downright impossible
| https://cdn.openai.com/dall-e-2/demos/variations/modified/fl...
| dqpb wrote:
| It looks to me like the faucet sprays water sideways toward the
| bowl, which is genius, because then you aren't bumping up
| against it when you're washing your hands!
| Spinnaker_ wrote:
| "Doorway in the style of Escher"
| momojo wrote:
| Great point. When I saw the shadows and reflections, I thought
| it had developed a primitive understanding of physical logic.
| Now I'm not so sure.
|
| At this point, it still seems like it's pushing pixels around
| until it's "good enough" when you squint at it.
| aaron695 wrote:
| minimaxir wrote:
| A few comments by someone who's spent way too much time in the
| AI-generated space:
|
| * I recommend reading the Risks and Limitations section that came
| with it because it's very thorough:
| https://github.com/openai/dalle-2-preview/blob/main/system-c...
|
| * Unlike GPT-3, my read of this announcement is that OpenAI does
| not intend to commercialize it, and that access to the waitlist
| is indeed more for testing its limits (and as noted,
| commercializing it would make it much more likely to lead to
| interesting legal precedent). Per the docs, access is _very_
| explicitly limited:
| (https://github.com/openai/dalle-2-preview/blob/main/system-c...
| )
|
| * A few months ago, OpenAI released GLIDE (
| https://github.com/openai/glide-text2im ), which uses a similar
| approach to AI image generation but suspiciously never received
| a fun blog post like this one. The reason for that in retrospect
| may be "because we made it obsolete."
|
| * The images in the announcement are still cherry-picked, which
| is presumably why they compared DALL-E 1 vs. DALL-E 2 on
| non-cherry-picked images.
|
| * Cherry-picking is relevant because AI image generation is still
| slow unless you do real shenanigans that likely compromise image
| quality, although OpenAI likely has better infra to handle
| large models, as they have demonstrated with GPT-3.
|
| * It appears DALL-E 2 has a fun endpoint that links back to the
| site for examples with attribution:
| https://labs.openai.com/s/Zq9SB6vyUid9FGcoJ8slucTu
| bufferoverflow wrote:
| Not-so-open.ai
| qeternity wrote:
| open-your-wallet.ai
| btdmaster wrote:
| https://www.eleuther.ai (text, not images, but free as in
| freedom)
| [deleted]
| refulgentis wrote:
| Katherine Crowson is @ Eleuther & IMHO is indisputably most
| responsible for the advances in text=>image generation.
| Dall-E 2 is Dall-E plus her insight to use diffusion; the
| intermediate proof of concept of diffusion + Dall-E is
| GLIDE.
|
| https://twitter.com/RiversHaveWings &
| https://github.com/crowsonkb
| bradgessler wrote:
| Could somebody build this for SVG icons? I'd invest in it.
| applgo443 wrote:
| What do you want?
| nope96 wrote:
| Is there an 'explain it like I'm 15' for how this works? It seems
| like black magic.
I've been a computer hobbyist since the late
| 1980s and this is the first time I cannot explain how a computer
| does what it does. Absolutely the most amazing thing I've ever
| seen, and I have zero clue how it works.
| drcode wrote:
| Imagine asking it to generate a picture for "duck wearing a hat
| on Mars":
|
| First, it creates a random 10x10 pixel blurry image and asks a
| neural net: "Could this be a duck wearing a hat on Mars?" and
| the neural net replies "No, because all the pictures I've ever
| seen of Mars have lots of red color in them", so the system
| tweaks the pixels to make them more red, puts some pixels in the
| center that have a plausible duck color, etc.
|
| After it has a 10x10 image that is a plausible duck on Mars,
| the system scales the image to 20x20 pixels, and then uses 4
| different neural nets on each corner to ask "Does this look
| like the upper/lower left/right corner of a duck wearing a hat
| on Mars?" Each neural net is just specialized for one corner of
| the image.
|
| You keep repeating this with more neural nets until you have a
| pretty 1000x1000 (or whatever) image.
| refulgentis wrote:
| Not the case, though in a handwave-y way it's the same idea -
| instead of iteratively scaling, you're iteratively denoising.
| See here, which links out to a Cornell NLP PhD describing it in
| even more detail:
| https://www.jpohhhh.com/articles/inflection-point-ml-art
| karmasimida wrote:
| Diffusion models are indeed pretty magical.
| eks391 wrote:
| Research Deep Learning. That's the technique they are using to
| generate the images. There are a lot of applications. Once you
| understand _how_ it works, look up Two Minute Papers to see
| what it is being used for. He covers more than just deep
| learning algorithms, but his videos on deep learning are quite
| insightful on the potential of this technology.
| joshcryer wrote:
| I'm with you there, but we still don't know how it works, just
| that it does. The method, though, is you take a bunch of
| images, you plug them into a multi-dimensional array (a nice
| way of saying a tensor), have some kind of tagging system, and
| when you ask the system for an answer, it will put one out for
| you. So for example in the astronaut riding the horse, there
| is, on some level, a picture of a horse with similar pixels
| that exists in the data of some object tagged 'horse.' Likewise
| with astronaut. What is important is that the data sets are
| absolutely massive, with billions of parameters.
|
| Here's more of a 'not 15 year old' explanation:
| https://ml.berkeley.edu/blog/posts/dalle2/
| Imnimo wrote:
| Here is my extremely rough ELI-15. It uses some building blocks
| like "train a neural network", which probably warrant
| explanations of their own.
|
| The system consists of a few components. First, CLIP. CLIP is
| essentially a pair of neural networks: one is a 'text encoder',
| and the other is an 'image encoder'. CLIP is trained on a giant
| corpus of images and corresponding captions. The image encoder
| takes as input an image, and spits out a numerical description
| of that image (called an 'encoding' or 'embedding'). The text
| encoder takes as input a caption and does the same. The
| networks are trained so that the encodings for a corresponding
| caption/image pair are close to each other. CLIP allows us to
| ask "does this image match this caption?"
|
| The second part is an image generator. This is another neural
| network, which takes as input an encoding, and produces an
| image.
Its goal is to be the reverse of the CLIP image encoder
| (they call it unCLIP). The way it works is pretty complicated.
| It uses a process called 'diffusion'. Imagine you started with
| a real image, and slowly, repeatedly added noise to it, step by
| step. Eventually, you'd end up with an image that is pure
| noise. The goal of a diffusion model is to learn the reverse
| process - given a noisy image, produce a slightly less noisy
| one, until eventually you end up with a clean, realistic image.
| This is a funny way to do things, but it turns out to have some
| advantages. One advantage is that it allows the system to build
| up the image step by step, starting from the large scale
| structure and only filling in the fine details at the end. If
| you watch the video on their blog post, you can see this
| diffusion process in action. It's not just a special effect for
| the video - they're literally showing the system's process for
| creating an image starting from noise. The mathematical details
| of how to train a diffusion system are very complicated.
|
| The third is a "prior" (a confusing name). Its job is to take
| the encoding of a text prompt, and predict the encoding of the
| corresponding image. You might think that this is silly - CLIP
| was supposed to make the encodings of the caption and the image
| match! But the space of images and captions is not so simple -
| there are many images for a given caption, and many captions
| for a given image. I think of the "prior" as being responsible
| for picking _which_ picture of "a teddy bear on a skateboard"
| we're going to draw, but this is a loose analogy.
|
| So, now it's time to make an image. We take the prompt, and ask
| CLIP to encode it. We give the CLIP encoding to the prior, and
| it predicts for us an image encoding. Then we give the image
| encoding to the diffusion model, and it produces an image. This
| is, obviously, over-simplified, but it captures the process at
| a high level.
|
| Why does it work so well? A few reasons. First, CLIP is really
| good at its job. OpenAI scraped a colossal dataset of
| image/caption pairs, spent a huge amount of compute training
| it, and came up with a lot of clever training schemes to make
| it work. Second, diffusion models are really good at making
| realistic images - previous works have used GAN models that try
| to generate a whole image in one go. Some GANs are quite good,
| but so far diffusion seems to be better at generating images
| that match a prompt. The value of the image generator is that
| it helps constrain your output to be a realistic image. We
| could have just optimized raw pixels until we got something
| CLIP thinks looks like the prompt, but it would likely not be a
| natural image.
|
| To recap: to generate an image from a prompt, DALL-E 2 works as
| follows. First, ask CLIP to encode your prompt. Next, ask the
| prior what it thinks a good image encoding would be for that
| encoded prompt. Then ask the generator to draw that image
| encoding. Easy peasy!
| 6gvONxR4sf7o wrote:
| Any pointers on getting up to speed on diffusion models? I
| haven't encountered them in my corner of the ML world, and
| googling around for a review paper didn't turn anything up.
| momenti wrote:
| https://www.youtube.com/watch?v=W-O7AZNzbzQ
|
| See the linked papers if you don't like videos.
| Imnimo wrote:
| I recommend this blog post:
|
| https://lilianweng.github.io/posts/2021-07-11-diffusion-
| mode...
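|
| If pseudocode helps before diving into the math, the sampling
| loop at the heart of it is roughly this (a bare-bones
| DDPM-style sketch of my own; predict_noise stands in for the
| trained network, and the schedule constants are simplified):
|
|     import numpy as np
|
|     T = 1000
|     betas = np.linspace(1e-4, 0.02, T)    # noise schedule
|     alphas = 1.0 - betas
|     alpha_bars = np.cumprod(alphas)
|
|     def sample(predict_noise, shape):
|         x = np.random.randn(*shape)       # start from pure noise
|         for t in reversed(range(T)):
|             eps = predict_noise(x, t)     # network's noise estimate
|             # subtract the estimated noise for this step
|             x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) \
|                 / np.sqrt(alphas[t])
|             if t > 0:
|                 # re-inject a little noise; the image sharpens
|                 # gradually over the T steps
|                 x = x + np.sqrt(betas[t]) * np.random.randn(*shape)
|         return x                          # a "clean" sample
|
| Training goes the other direction: take a real image, add noise
| at a random step t, and teach the network to predict that noise.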
|
| Personally, I find the core diffusion papers pretty dense
| and difficult to follow, so the blog post is where I'd
| begin.
|
| https://arxiv.org/pdf/1503.03585.pdf
|
| This paper is a decent starting point on the literature
| side, but it's a doozy.
|
| Both the paper and blog post are pretty math heavy. I have
| not yet found a really clear intuitive explanation that
| doesn't get down in the weeds of the math, and it took me a
| long time to understand what the hell the math is trying to
| say (and there are some parts I still don't fully
| understand!)
| victor_e wrote:
| Wow - mindblowing and kinda scary really.
| imperio59 wrote:
| What happens when they train this thing to make videos? We're
| about to be dealing with a flood of AI-generated visual/video
| content. We already have to deal with text bots everywhere...
| wow.
| eks391 wrote:
| I'm excited for when that happens. I didn't think of the
| malicious uses, which now that you brought them up I can think
| of many, but I still think the pros are worth the cons.
| whywhywhywhy wrote:
| I never actually found a way to use Dall-E 1; did they ever open
| that up to people outside their building?
| skybrian wrote:
| Sam Altman took some user requests on Twitter:
| https://twitter.com/sama/status/1511724264629678084
| sydthrowaway wrote:
| gamechanger
| dang wrote:
| Related and kind of fun:
|
| _Sam Altman demonstrates Dall-E 2 using twitter suggestions_ -
| https://news.ycombinator.com/item?id=30933478 - April 2022 (3
| comments)
| frakkingcylons wrote:
| Impressive results no doubt, but I'm reserving judgment until
| beta access is available. These are probably the best images that
| it can generate, but what I'm most interested in is the average
| case.
| narrator wrote:
| While we're being distracted by endless social media and
| meaningless news, AI technology is advancing at a mind blowing
| pace. I'd keep my eye on that ball instead of "the current
| thing."
| The_rationalist wrote:
| Thank you narrative voice
| [deleted]
| mrfusion wrote:
| Is this bringing us closer to combining image and language
| understanding within one model?
| beernet wrote:
| Check out MAGMA for that:
| https://news.ycombinator.com/item?id=30699776
| impostervt wrote:
| Very cool stuff. For me, the most interesting was the ability to
| take a piece of art and generate variations of it.
|
| Have a favorite painter? Here's 10,000 new paintings like theirs.
| photochemsyn wrote:
| Well, one of my favorite painters is Henri Rousseau, and one of
| his great paintings is War, 1894:
|
| https://www.henrirousseau.net/war.jsp
|
| However, this painting has themes of violence and politics plus
| some nude dead bodies, so it violates the content policy: "Our
| content policy does not allow users to generate violent, adult,
| or political content, among other categories."
|
| So what you'd get is some kind of sanitized, watered-down, tepid
| version of Rousseau, the kind of boring drivel suitable for
| corporate lobbies everywhere, guaranteed not to offend or
| disturb anyone. It's difficult to find words... horrific?
| dystopian? atrocious? No, just no.
| corysama wrote:
| They are being rightly cautious. It's going to take time to
| figure out good practice with these tools. Everyone calling
| out basic caution as "dystopian" is really over the top.
|
| I've been using tools like this for over a year now.
Even
| with a filtered dataset and filtered interface, they can make
| images that would make the Fangoria crowd blush if you put
| the slightest effort into it.
|
| It's one thing to be able to make brain-wrenching images with
| a lot of photoshop effort (or digging hard enough in the dark
| corners of the internet). It's another thing entirely to give
| anyone the ability to spew out thousands of them trivially.
| cwillu wrote:
| "Criticize?! It is meant to draw blood! It is Art! Art!"
| throwaway675309 wrote:
| I was just thinking the same thing, how awesome would it be to
| be able to use this in conjunction with the Samsung Frame in
| art gallery mode and have it just generate novel paintings in
| the style of your favorite painters.
| pingeroo wrote:
| That was also my favourite concept, especially with OpenAI
| Jukebox (https://openai.com/blog/jukebox/). The idea of having
| new music in the style of your favourite artist is amazing.
|
| However the fidelity of their music AI kinda sucks at this
| point, but I'm sure we'll get pitch perfect versions of this
| concept as the singularity gets closer :)
| uses wrote:
| Is anyone looking into what it means when we can generate
| infinite amounts of human-like work without effort or cost?
|
| > Curbing Misuse [...]
|
| That's great, nowadays the big AI is controlled by mostly
| benevolent entities. How about when someone real nasty gets a
| hold of it? In a decade the models anyone can download will make
| today's GPT-3 etc look like Pong, right?
|
| Recommender systems etc are already shaping society and culture
| with all kinds of unintended effects. What happens when mindless
| optimizing models start generating the content itself?
| nahuel0x wrote:
| "Any sufficiently advanced technology is indistinguishable from
| magic"
| 7373737373 wrote:
| "Any sufficiently advanced hyperreality is indistinguishable
| from real life"
| andybak wrote:
|     Preventing Harmful Generations: We've limited the ability
|     for DALL·E 2 to generate violent, hate, or adult images. By
|     removing the most explicit content from the training data,
|     we minimized DALL·E 2's exposure to these concepts. We also
|     used advanced techniques to prevent photorealistic
|     generations of real individuals' faces, including those of
|     public figures.
|
| "And we've also closed off a huge range of potentially
| interesting work as a result"
|
| I can't help but feel a lot of the safeguarding is more about
| preventing bad PR than anything. I wish I could have a version
| with the training wheels taken off. And there's enough other
| models out there without restriction that the stories about
| "misuse of AI" will still circulate.
|
| (side note - I've been on HN for years and I still can't figure
| out how to format text as a quote.)
| campground wrote:
| This AI is still a minor. It can start looking at R rated
| images when it turns 17.
| johnhenry wrote:
| This is an apt analogy -- ensure that the model is mature
| enough to handle mature content.
| jandrese wrote:
| They have also closed off the possibility of having to appear
| before Congress and explain why their website was able to
| generate a lifelike image of Senator Ted Cruz having sexual
| relations with his own daughter.
|
| This is exactly the sort of thing that gets a company mired in
| legal issues, vilified in the media, and shut down. I cannot
| blame them for avoiding that potential minefield.
| hamoid wrote:
| What if explicit, questionable and even illegal content was AI
| generated instead of involving harm to real humans of all ages?
| binarymax wrote:
| Removing these areas to mitigate misuse is a good thing and
| worth the trade-off.
|
| Companies like OpenAI have a responsibility to society. Imagine
| the prompt "A photorealistic Joe Biden killing a priest". If
| you asked an artist to do the same they might say no. Adding
| guide rails to a machine that can't make ethical decisions is a
| good thing.
| dj_mc_merlin wrote:
| Oh, no, the society! A picture of Joe Biden killing a priest!
|
| Society didn't collapse after Photoshop. "Responsibility to
| society" is such a catch-all excuse.
| jahewson wrote:
| No. Russian society is pretty much collapsing right now
| under the weight of lies. Currently they are using "it's a
| fake" to deny their war crimes.
|
| Cheap and plentiful is substantively different from
| "possible". See, for example, OxyContin.
| ilaksh wrote:
| You know what else is being used to deny war crimes?
| Censorship. Do you know how that's officially described?
| "Safety"
| dj_mc_merlin wrote:
| Russia has.. a history of denying the obvious. I come
| from an ex-communist satellite state so I would know. The
| majority of the people know what's happening. There's a
| rather new joke from COVID: the Russians do not take
| Moderna because Putin says not to trust it, and they do
| not take Sputnik because Putin says to trust it.
|
| Do not be deluded that our own governments are not
| manufacturing the narrative too. The US has committed
| just as many war crimes as Russia. Of course, people feel
| differently about blowing up hospitals in Afghanistan
| rather than Ukraine. What the Afghan people think about
| that is not considered too much.
| ohgodplsno wrote:
| Society is turning to utter dogshit and tearing itself apart
| merely through social media. The US almost had a coup
| because of organized hatred and lies spread through social
| media. The far right's rise is heavily linked to lies
| spread through social media, throughout the world.
|
| This AI has the potential to absolutely automate the very
| long Photoshop work, leading to an even worse state of
| things. So, yes, "responsibility to society" is absolutely
| a thing.
| scotty79 wrote:
| > The US almost had a coup because of organized hatred
| and lies spread through social media.
|
| But notice how all of these deep-faking technologies
| weren't actually necessary for that.
|
| People believe what they want to believe, regardless of
| the quality of the provided evidence.
|
| The scaremongering idea of deep fakes and what they could
| do was weaponized in this information war far more than
| the actual technology.
|
| I think this technology should develop unrestricted, so
| society can learn what can be done and what can't be
| done, and create an understanding of what other factors
| should be taken into account when assessing the veracity
| of images and recordings (like multiple angles, quality
| of the recording, sync with sound, neural fake detection
| algorithms) for the cases when it's actually important
| what words someone said and what actions they were
| recorded doing. Which is more and more unimportant these
| days, because nobody cared what Trump was doing and
| saying, nobody cares about Biden's mishaps, and nobody
| cares what comes out of Putin's mouth and how he chooses
| his greenscreen backgrounds.
| ohgodplsno wrote:
| Are you of the idea that we should let everyone get
| automatic rifles because, after all, pistols exist? Because
| that is the exact same line of thought.
|
| > People believe what they want to believe, regardless of
| the quality of the provided evidence.
|
| That is a terrible oversimplification of the mechanics of
| propaganda. The entire reason for the movements that are
| popping up is actors flooding people with so much info
| that they question absolutely everything, including the
| truth. This is state-sponsored destabilisation, on a
| massive scale. This is the result of just shitty news
| sites and text posts on Twitter. People already don't
| double-check any of that. There will not be an
| "understanding of assessing veracity". There is already
| none for things that are easy to check. You could post
| that the US elite actively rapes children in a pizza
| place and people will actually fucking believe you.
|
| So, no. Having this technology for _literally any
| purpose_ would be terribly destructive for society. You
| can find violence and Joe Biden hentai without needing to
| generate it automatically through an AI.
| scotty79 wrote:
| I'm sorry, I believe I wasn't direct enough, which made
| you produce a metaphor I have no idea how to understand.
|
| Let me state my opinion more directly.
|
| I'm for developing as much deep fake technology in the
| open as possible, so that people can internalize that
| every video they see, every message, every speech should
| be initially treated as fabricated garbage unrelated to
| anything that actually happened in reality. Because
| that's exactly what it is, until additional data shows
| up: geolocation, the scene from different angles and such.
|
| Even if most people manage to internalize just the first
| part and assume everything is always fake news, that is
| still great, because it counters propaganda to an immense
| degree.
|
| The power of propaganda doesn't come from flooding people
| with a chaos of fakery. It comes from constructing a
| consistent message by whatever means necessary and
| hammering it into the minds of your audience for months
| and years, while simultaneously isolating them from any
| material, real or fake, that contradicts your vision.
| Look no further than brainwashed Russian citizens and the
| Russian propaganda that has successfully influenced
| hundreds of millions, without even a shred of deep fake
| technology, for decades.
|
| The problem of the modern world is not that no one
| believes the actual truth, because it doesn't really
| matter what most people believe. Only the rich influence
| policy decisions. The problem is that people still
| believe that there is some truth, which makes them super
| easy to sway into believing what you are saying is true,
| and to weaponize, using nothing more than a charismatic
| voice and a consistent message crafted to touch the spots
| in people that have remained the same at least since
| World War II and most likely since time immemorial.
|
| And the "elite" who actually run this world will pursue
| tools for getting accurate information and telling facts
| from fiction no matter the technology.
| binarymax wrote:
| You missed half of my note. An artist can say "no". A
| machine cannot. If you lower the barrier and allow
| anything, then you are responsible for the outcome. OpenAI
| rightfully took a responsible angle.
| dj_mc_merlin wrote:
| Yes, but who cares who's responsible? Are you telling me
| you're going to find the guy who photoshopped the picture
| and jail him?
Legally that's possible;
| realistically it's a fiction.
|
| They did this to stop bad PR, because some people are
| convinced that an AI making pictures is in some way
| dangerous to society. It is not. We have deepfakes
| already. We've had Photoshop for so long. There is no
| danger. Even if there was, the cat's out of the bag
| already.
|
| Reasonable people already know to distrust photographic
| evidence nowadays that is not corroborated. The ones who
| don't would believe it without the photo regardless.
| nradov wrote:
| In general under US law it wouldn't be legally possible
| to jail a guy for Photoshopping a fake picture of
| President Biden killing a priest. Unless the picture also
| included some kind of obscenity (in the Miller test
| sense) or direct threat of violence, it would be
| classified as protected speech.
| wellthisisgreat wrote:
| there are, and will be, a million ways to create a
| photorealistic picture of Joe Biden killing a priest
| using modern tools, and absolutely nothing would happen
| if someone did.
|
| We've been through this many times: with books, with
| movies, with video games, with the Internet. If it _can_
| be used for porn / violence etc., it will be, but it
| won't be the main use case and it won't cause some
| societal upheaval. Kids aren't running around pulling
| cops out of cars GTA-style, the Internet is not ALL PORN,
| there is deepfake porn but nobody really cares, and so
| on. There are so many ways to feed those dark urges that
| censorship does nothing except prevent normal use cases
| that overlap with the words "violence" or "sex" or
| "politics" or whatever the boogeyman du jour is.
| Al-Khwarizmi wrote:
| In my view, the problem with that argument is that large
| actors, such as governments or large corporations, can train
| their own models without such restrictions. The knowledge to
| train them is public. So rather than prevent bad outcomes,
| these restrictions just restrict them to an oligopoly.
|
| Personally, I fear more what corporations or some governments
| can do with such models than what a random person can do
| generating Biden images. And without restrictions, at least
| academics could better study these models (including their
| risks) and we could be better prepared to deal with them.
| jupp0r wrote:
| I think the issue here is the implied assumption that
| OpenAI thinks their guardrails will prevent harm being done
| by this research _in general_, when in reality it's
| really just OpenAI's direct involvement that's prevented.
|
| Eventually somebody will use the research to train the
| model to do whatever they want it to do.
| DaedPsyker wrote:
| Sure, but does opening that level of manipulation up to
| everyone really benefit anyone either? You can't really
| fight disinformation with more disinformation; that just
| seems like the seeds of societal breakdown at that point.
|
| Besides that, these models are massive. For quite a while
| the only people even capable of making them will be those
| with significant means. That will be mostly governments and
| corporations anyway.
| nullc wrote:
| This just means that sufficiently wealthy and powerful people
| will have advanced image faking technology, and their fakes
| will be seen as more credible because creating fakes like
| that "isn't possible" for mere mortals.
| harpersealtako wrote:
| It's the usual pattern of AI safety experts who justify their
| existence by the "risk of runaway superintelligence", but all
| they actually do in practice is find out how to stop their
| models from generating non-advertiser-friendly content. It's
| like nuclear safety engineers focusing on what color to
| paint the bike shed rather than stopping the reactor from
| potentially melting down. The end result is people stop
| respecting them.
| andreyk wrote:
| This is definitely a measure to avoid bad PR. But I don't think
| it's just for that; these models do have the potential to do
| harm, and companies should take some measures to prevent it. I
| don't think we know the best way to do that yet, so this sort
| of 'non-training' and basic filtering is maybe the best way to
| do it, for now. It would be cool if academics could have the
| full version, though.
| 6gvONxR4sf7o wrote:
| If you went to an artist who takes commissions and they said
| "Here are the guidelines around the commissions I take", would
| you complain in the same way? Who cares if it's a bunch of
| engineers or an artist. If they have boundaries on what they
| want to create, that's their prerogative.
| mod wrote:
| Of course it's their prerogative; we can still talk about how
| they've limited some good options.
|
| I think your analogy is poor, because this is a tool for
| makers. The engineers aren't the makers.
|
| I think a more apt analogy is if John Deere made a universal
| harvester that you could use for any crop, but they decided
| they didn't like soybeans so you are forbidden to use it for
| that. In that case, yes I would complain, and I would expect
| everyone else to, as well.
| drusepth wrote:
| I think there's an interesting parallel between your John
| Deere harvester and the Nvidia GPUs that can-but-restrict
| crypto mining, which people have, indeed, largely
| complained about.
| methehack wrote:
| What if you were inventing a language (or a programming
| language)? If you decided to prevent people from saying
| things you disagreed with (assuming you could work out the
| technical details of doing so), would it be moral to do so?
| [edited for clarity]
| nemothekid wrote:
| There are programming projects[1] out there that use
| licenses to prevent people from using projects in ways the
| authors don't agree with. You could also argue that the GPL
| does the same thing (prevents people from
| using/distributing the software in the way they would
| like).
|
| Whether you consider it moral doesn't seem relevant; what
| matters is respecting the wishes of the authors of such
| programs.
|
| [1] https://github.com/katharostech/bevy_retrograde/blob/master/...
| 6gvONxR4sf7o wrote:
| As long as people can choose not to use the language, and
| I'm up front about the limitations, then yeah, it seems
| fine. If I wrote a programming language that couldn't blow
| up the earth, I'm happy saying people need to find other
| tools if that's their goal. I'm under no obligation to
| build an earth blower-upper for other people.
| karkisuni wrote:
| it's your language, do whatever you want. unless you're
| forcing others to use that language, there's zero moral
| issue. obviously you could come up with a number of what-
| ifs where this becomes some monopoly or the de facto
| standard, but that's not what this is.
| duxup wrote:
| To take that a step further, I won't code malware. I've never
| been asked but I'd refuse if I was. Everyone has their
| choices.
| teaearlgraycold wrote:
| > I can't help but feel a lot of the safeguarding is more about
| preventing bad PR than anything
|
| That's no hot take. It's literally the reason.
| bogwog wrote:
| It's kind of funny (or sad?) that they're censoring it like
| this, and then saying that the product can "create art".
|
| It makes me wonder what they're planning to do with this. If
| they're deliberately restricting the training data, it means
| their goal isn't to make the best AI they possibly can. They
| probably have some commercial applications in mind where
| violent/hateful/adult content wouldn't be beneficial.
| Children's books? Stock photos? Mainstream entertainment is
| definitely out. I could see a tool like this being useful
| during pre-production of films and games, but an AI that can't
| generate violent/adult content wouldn't be all that useful in
| those industries.
| [deleted]
| wellthisisgreat wrote:
| This is a horrible idea. So Francis Bacon's art or Toyohara
| Kunichika's art is out of the question.
|
| But at least we can get another billion meme-d comics with
| apes wearing sunglasses, so that's good news, right?
|
| It's just soul-crushing that all the modern, brilliant
| engineering is driven by abysmal, not even high-school art-
| class grade aesthetics and crowd-pleasing ethics that are built
| around the idea of not disturbing some 1000 very vocal Twitter
| users.
|
| Death of culture, really.
| antattack wrote:
| I never considered that our AI overlord could be a prude.
| sdenton4 wrote:
| Adversarial situations create smarter systems, and the
| hardest adversarial arena for AI is in anti-abuse. So it will
| be of little surprise when the first sentient AI is a CSAI
| anti-abuse filter, which promptly destroys humanity because
| we're so objectively awful.
| antattack wrote:
| Before it gets that far, or until (if allowed) AI learns
| morality, AI will be a force multiplier for good and evil,
| its output very much dependent on the teaching material and
| who the 'teacher' is. To think that in the future we will
| have to argue with humans and machines.
|
| AI does not have to be perfect, and it's likely that
| businesses will settle for almost as good as human if it's
| 'cost effective'.
| duxup wrote:
| Is this limited to what their service directly hosts /
| generates for them?
|
| It's their service, their call.
|
| I have some hobby projects, almost nobody uses them, but you
| bet I'll shut stuff down if I felt something bad was
| happening, it was being used to harass someone, etc. NOT
| "because bad PR" but because I genuinely don't want to be a
| part of that.
|
| If you want some images / art made for you, don't expect
| someone will make them for you. Get your own art supplies and
| get to work.
| adolph wrote:
| > I have some hobby projects, almost nobody uses them, but
| you bet I'll shut stuff down if I felt something bad was
| happening
|
| Hecklers get a veto?
| duxup wrote:
| I'm describing my own veto there.
| educaysean wrote:
| This feels unnecessarily hostile. I've felt a similar tinge
| of disappointment upon reading that paragraph, despite the
| fact that I somehow knew it was "their service, their call"
| without you being there to spell it out for me. It's also
| incredibly shortsighted of you to assume that people are
| interested in exploring this tool only as a means of
| generating art that they cannot themselves create. E.g.,
I myself | am a software engineer with a fine art background, and | exciting new AI art tools being released in such a hamstrung | state feels like an insult to centuries of art that humans | have created and enjoyed, much of which depicted scenes with | nudity or bloody combat. | | I feel like we, as a species, will struggle for a while with | how to treat adults like adults online. As happy as I am to | advocate for safe spaces on the internet, perhaps we need to | start having a serious discussion about how we can do so | without resorting to putting safety mats everywhere and | calling it a job well done. | duxup wrote: | I think the assumption that private companies should | provide these services to us, and responses like "And we've | also closed off a huge range of potentially interesting | work as a result" when they don't, require making it clear | who makes the rules for this service and that it is in fact | their call. | | If you can do it yourself then none of the potentially | interesting work is closed off. You just chose not to do | it. | | > how to treat adults like adults online | | The internet doesn't filter by age. It's everyone. | | I grow weary of the ongoing "this service should be | provided to me and if it isn't done how I want it that's | infringing on me somehow" when they just want to impose | their requirements on someone else's site / product / work. | | Then we get into the whole "oh, it's about PR". As if the | folks offering these things couldn't possibly actually have | their own wishes / we hand-wave them away. | JimDabell wrote: | > this service should be provided to me and if it isn't | done how I want it that's infringing on me somehow | | That is an _extremely_ uncharitable interpretation of: | | > I wish I could have a version with the training wheels | taken off. | duxup wrote: | I would have responded differently had that been the | statement. But many of the responses were more than that. | JimDabell wrote: | That is a literal copy and paste from the comment you | replied to. | duxup wrote: | That's not all there was. I copied and pasted other | things from that comment in my other posts. | educaysean wrote: | I get the points you're raising and I agree with the | premise. My comment is not a critique of the one choice | made by OpenAI specifically, but more of a vague | lamentation about the internet culture that we've | somehow ended up with in 2022. I don't want us to go back to | 1999, where snuff videos and spam mails reigned supreme, | but the pendulum has swung too far in the other direction | at this point in time. It feels like more and more | companies are choosing the path of neutering themselves to | avoid potential PR disasters or lawsuits, and that's on | all of us. | duxup wrote: | > but the pendulum has swung too far in the other | direction at this point in time | | The folks hosting the content get to decide, for now. | | IMO the best bet is for some folks to take their own shot at | hosting / generating content better. Granted, I get that | is NOT a small venture / small ask. | | It's possible there's not a great solution. I don't | necessarily like that either, but I don't want to ignore | the dynamic of whose rights are whose. | wokwokwok wrote: | This is kind of like complaining about having too many | meetings at work. | | Yup, everyone feels it. ...but, does complaining help? | Nope. All it does is make you feel a bit better without | really putting effort in. | | We can't have nice things because people abuse them. Not | everyone.
...but enough people that it's both a PR and | legal problem. _Specifically_ a legal problem in this case. | | To have adults treated like adults online, you have to | figure out how to stop _all_ adults from being dicks | online. | | ...no one has figured that out yet. | | So, complain away if you like, but it will do exactly | nothing. No one, at all, is going to just "have a serious | discussion" about this; the solution you propose is flat- | out untenable, and will probably remain so indefinitely. | sillysaurusx wrote: | None of this is true. It's not a legal problem. | | Every single time OpenAI comes out with something, they | dress it up as a huge threat, either to society or to | themselves. Everyone falls for it. Then someone else | comes along, quietly replicates it, and poof! No threat! | Isn't it incredible how that works? | | There are already a bunch of DALL-E replicas, including | ones hosted openly and uncensored by Hugging Face. They're | not facing huge legal or PR problems, and they're not out | of business. | mrtranscendence wrote: | The DALL-E replicas on Hugging Face are not sophisticated | enough to generate credibly realistic images of the kind | that would generate bad PR. I suspect the moment it | becomes possible for a pedophile to request, and receive, | a photorealistic image of a child being abused there will | be bad PR for whatever company facilitates it. Or | consider someone who wants to generate and distribute | explicit photos of someone else without their permission. | | Is it a legal issue? I'm not sure, though I believe that | cartoon child porn is not legal in the US (or is at least | a legal gray area). Regardless, I sympathize with OpenAI | not wanting to enable such behavior. | planetsprite wrote: | Don't worry, in a few years someone will have reverse- | engineered a DALL-E porn engine so you can see whatever two | celebrities you want boning on Venus in the style of Manet. | [deleted] | spacecity1971 wrote: | Or, it's a demonstration that AI output can be controlled in | meaningful ways, period. Surely this supports OpenAI's stated | goal of making safe AI? | jonahx wrote: | _I've been on HN for years and I still can't figure out how to | format text as a quote_ | | I don't think there is a way comparable to markdown, since the | formatting options are limited: | https://news.ycombinator.com/formatdoc | | So your options are literal quotes, "code" formatting like | you've done, italics like I've done, or the '>' convention, but | that doesn't actually apply formatting. Would be nice if it | were added. | 6gvONxR4sf7o wrote: | And the "code" formatting for quotes is generally a bad | choice because people read on a variety of screen sizes, and | "code" formatting can screw that up (try reading the quote | with a really narrow window). | andybak wrote: | I couldn't get any of the others to work and I lost patience. | I really do dislike using Markdown variants, as they never | behave the same, and "being surprised" is not really what I | want when trying to post a comment. | 6gvONxR4sf7o wrote: | Convention is to quote like this: | | > This is my quote. | | It's much better for your readers than using a code block. | warning26 wrote: | _> or the '>' convention, but that doesn't actually apply | formatting_ | | Personally, I prefer to combine the '>' convention with | italics. Still, I'd agree that proper quote formatting would | be a welcome improvement.
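As a concrete illustration of the options being discussed (roughly per the formatdoc linked above; everything else, including '>', is left as plain text):

  *text surrounded by asterisks*        renders as italics
  text indented two or more spaces      renders verbatim in fixed width,
  (after a blank line)                  like this block
  > a line starting with '>'            gets no special rendering at all;
                                        it is purely a reader convention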
| ibejoeb wrote: | If you're interested, the HNES extension formats it: | | https://github.com/etcet/HNES | [deleted] | fbanon wrote: | A friend of mine was studying graphic design, but became | disillusioned and decided to switch to frontend programming after | he graduated. His thesis advisor said he should be cautious, | because automation/AI will soon take the jobs of programmers, | implying that graphic design is a safer bet in this regard. Looks | like his advisor is a few years from being proven horribly wrong. | oldstrangers wrote: | I think designers are becoming more valuable than ever. | Designers can better help train the AI on what actually looks | good, designers will (probably) always have a more intuitive | understanding of UI/UX, designers can better implement the work | the AI actually produces, and designers can coordinate designs | across multiple different mediums and platforms. | | Additionally, the rise of no-code development is just extending | the functionality of designers. I didn't take design seriously | (as a career choice) growing up because I didn't see a future | in it; now it pays my bills and the demand for my services just | grows by the day. | | Similar argument to make with chess AI: it didn't make chess | players obsolete, it made them stronger than ever. | adolph wrote: | > I think designers are becoming more valuable than ever. | | Are all designers becoming more valuable, or is a subset of | really good ones going to reap the value increase and capture | more of the previously available value? | oldstrangers wrote: | Never made an argument for all designers. Obviously the | talent pool for any field is finite, and the best of that | talent rises to the top. Good designers are being | compensated increasingly well, hence "designers are | becoming more valuable than ever." | | Bad designers are even being given better and better-paying | jobs as the top talent gets poached up quicker and quicker. | bufferoverflow wrote: | If this paper presents this neural net fairly, it pretty much | destroys the market for illustrators. Most of the time when an | illustration is needed, it's described like "an astronaut on a | horse in the style of xyz". | dbspin wrote: | You're describing the market for low-end commodified | illustration, e.g. cheapest-bidder contracts on Upwork or | similar 'gig work' services. | | In practice in illustration (as in all arts) there are a | variety of markets where different levels of talent, | originality, reputation and creative engagement with the | brief are more relevant. For editorial illustration, it's | certainly not a case of 'find me someone who can draw X', and | probably hasn't been since printing presses got good enough | to print photographs. | csomar wrote: | For computer work, I think there will be two categories: work | with localized complexity (i.e. draw an image of a horse with a | crayon) and work with unbounded complexity (adding a button to | VAT accounting after several meetings and reading up on accounting | rules). | | For the first category, Dall-E 2 and Codex are promising but | not there yet. It's not clear how long it'll take them to reach | the point where you no longer need people. I'm guessing 2-4 | years, but the last bits can be the hardest. | | As for the second category, we are not there yet. Self-driving | cars/planes and lots of other automation will be here and | mature way before an AI can read and communicate through | emails, understand project scope and then execute.
Also lots of | harmonization will have to take place in the information we | exchange: emails, docs, chats, code, etc. That is, unless the | AI is able to open a browser and type in an address itself. | educaysean wrote: | I have degrees and several years of experience in both fields, | and I can tell you that both are creative professions where | output is unbounded and the measure of success is subjective; | these are the fields that will be safe for a while. IMO it's | those in fields such as aircraft piloting who should be most | worried. | zarzavat wrote: | The jobs of commercial pilots are _very_ safe. | | Pilots are not there to fly the aircraft; the autopilot | already does that. They are there to _command_ the aircraft, | in a pair in case one is incapacitated, making the best | decisions for the people on board, and to troubleshoot issues | when the worst happens. | | No AI or remote pilot is going to help when, say... the | aircraft loses all power. Or the airport has been taken over | in a coup attempt and the pilot has to decide whether to | escape or stay: https://m.youtube.com/watch?v=NcztK6VWadQ | | You can bet on major flights having two commercial pilots | right up until the day we all get turned into paperclips. | javajosh wrote: | _> You can bet on major flights having two commercial | pilots right up until the day we all get turned into | paperclips. _ | | Yes, this is the sane approach, since a jet represents an | enormous amount of energy that can be directed anywhere in | the world (just about). But that said, there seems to be | enormous pressure to allow driverless vehicles, which | _also_ direct large amounts of energy anywhere in | your city. IOW, it seems like a matter of time before we | say, collectively, screw it, let the computers fly the | plane, and if loss of power is a catastrophe, so be it. | nullc wrote: | Interesting. Right now these ML models seem like essentially | ideal sources of "hotel art", particularly because it's so | subjective... you only need a human (the buyer!) to just | briefly filter some candidates, which they would have been | doing with an artist in the loop in any case. | | For things like aircraft pilots, it's both realtime-- which | means a 'reviewer' per output-- you haven't taken a highly | trained pilot out of the loop, even if you relegated them to | supervising the computer-- and life-critical, so merely | "so-so" isn't good enough. | pingeroo wrote: | I mean, was he really wrong? As models like OpenAI Codex get | more powerful over time, they will start eating into large | chunks of dev work as well... | chrisco255 wrote: | Yes. Translating business requirements, customer context, | engineering constraints, etc. into usable, practical, | functional code, and then maintaining that code and extending | it, is so far beyond the horizon that many other skillsets | will be replaced before programming is. After all, at that | point, the AI itself, if it's so smart, should be able to | improve itself indefinitely. In which case we're fucked. | Programming will be the last thing to be automated before the | singularity. | | Unlike artwork, precision and correctness are absolutely | critical in coding. | carnitine wrote: | The tail end of programming will be the last thing to be | replaced, maybe. I don't see why CRUD apps get to hide | under the umbrella of programming ultra-advanced AI. | 0F wrote: | Literally everyone on this website is in denial. They all | approach it by asking which fields will be safe. No field is | safe.
"But it's not going to happen for a long time." Climate | deniers say the same thing and you think _they_ should be | wearing the dunce hat? The average person complains bitterly | about climate deniers who say that it's "my grandkids problem | lol" but when I corner the average person into admitting AI | is a problem the universal response is that it's a long way | off. And that's not even true! The drooling idiots are | willing to tear down billionaires and governments and any | institution whatsoever in order to protect economic equality | and a high standard of living -- they would destroy entire | industries like a rampaging stampede of belligerent buffalos | if it meant reducing carbon emissions a little but when it | comes to the biggest threat to human well-being in history, | there they are in the corner hitting themselves on their | helmeted head with an inflatable hammer. Fucking. Brilliant. | dntrkv wrote: | I don't think anyone is in denial about this, it's just not | something anyone should concern themselves with in the | foreseeable future. AI that can replace a dev or designer | is nowhere close to becoming a reality. Just because we | have some cool demos that show some impressive capabilities | in a narrow application does not mean we can extrapolate | that capability to something that is many times more | complex. | hackinthebochs wrote: | What does nowhere close mean to you? 10 years? 50 years? | 0F wrote: | I strongly and emphatically disagree. You frame it like | we invented these AIs. Did we write the algorithms that | actually run when it's producing its output? Of course | not, we can't understand them let alone write them. We | just sift around until we find them. So obviously the | situations lends its self to surprises. Every other year | we get surprised by things that all the "experts" said | was 50 years off or impossible, have you forgotten | already? | coldpie wrote: | I'm trying to understand your point, because I think I | agree with you, but it's covered in so much hyperbole and | invective I'm having a hard time getting there. Can you | scale it back a little and explain to me what you mean? | Something like: AI is going to replace jobs at such scale | that our current job-based economic system will collapse? | 0F wrote: | Most people get stuck where you are. The fastest way | possible to explain it is that it will bring rapid and | fundamental change. You could say jobs or terminators but | focusing on the specifics is a red herring. It will | change everything and the probability of a good outcome | is minuscule. It's playing Russian roulette with the | whole world except rather that 1/6 for the good, it's one | in trillions for the bad. The worst and stupidest thing | we have ever done. | pingeroo wrote: | I agree that many of us are not seeing the writing on the | wall. It does give me some hope that folks like Andrew Yang | are starting to pop up, spreading awareness about, and | proposing solutions to the challenges we are soon to face. | plutonorm wrote: | Ignorance is bliss in this case, because this is even more | unstoppable than climate change. | | You thought climate change is hard to hold up? Try holding | up the invention of AI. The whole world is going to have to | change and some form of socialism/UBI will have to be | accepted, however unpalatable. | visarga wrote: | > but when it comes to the biggest threat to human well- | being in history | | Evolution doesn't stop for anyone, don't think like a | dinosaur. 
| pizza wrote: | No worries, the one thing humans can do that robots can't (yet) | is fill spare time with ever more work: | https://en.wikipedia.org/wiki/Parkinson's_law | throwaway675309 wrote: | I mean, not really: even a layman non-artist can take a look | at a generated picture from DALL-E and determine if it meets | some set of criteria from their clients. | | But the reverse is not true; they won't be able to properly | vet a piece of code generated by an AI, since that will | require technical expertise. (You could argue that if the piece of | code produced the requisite set of outputs they would | have some marginal level of confidence, but they would never | really know for sure without being able to understand the | actual code.) | nlh wrote: | Large chunks, yes, but all that means is that engineers will | move up the abstraction stack and become more efficient, not | that engineers will be replaced. | | Bytecode -> Assembly -> C -> higher-level languages -> AI- | assisted higher-level languages | Isinlor wrote: | At some point we will be "replaced". When you get AI to be | able to navigate all user interfaces, communicate with | other agents, plan long term and execute short term, we | will no longer be the main drivers of economic growth. | | At some point AI will become as powerful as companies. | | And then AI will be able to sustain a positive feedback loop | of creating more powerful company-like ecosystems that will | create even more powerful ecosystems. This process will be | fundamentally limited by available power, and the sun can | provide a lot of power. Eventually AI will be able to | support a space economy, and then the only limit will be the | universe. | visarga wrote: | > At some point we will be "replaced". | | We will be united with the AI; we're already relying on | it so much that it has become a part of our extended | minds. | creata wrote: | > we're already relying on it so much that it has become | a part of our extended minds. | | What's this in reference to? | bckr wrote: | > engineers will move up the abstraction stack and become | more efficient | | Above a certain threshold of ability, yes. | | The same will hold true for designers. DALL-E-alikes will | be integrated with the Adobe suite. | | The most cutting-edge designers will speak 50 variations of | their ideas into images, then use their hard-earned | granular skills to fine-tune the results. | | They'll (with no code) train models in completely new, | unique-to-them styles--in 2D, 3D, and motion. | | Organizations will pay top dollar for designers who can | rapidly infuse their brands with eye-catching material in | unprecedented volume. Imitators will create and follow | YouTube tutorials. | | Mom & pop shops will have higher-fidelity marketing | materials in half the time and at half the cost. | | All will be ever as it was. | hackinthebochs wrote: | History isn't a great guide here. Historically the | abstractions that increased efficiency begat further | complexity. Coding in Python elides over low-level issues, | but the complexity of how to arrange the primitives of | Python remains for the programmer to engage with. AI coding | has the potential to elide over all the complexity that we | identify as programming. I strongly suspect this time is | different. | | The space for "AI-assisted higher-level languages" | sufficiently distinct from natural language is vanishingly | small.
Eventually you're just speaking natural language to | the computer, which just about anyone can do (perhaps with | some training). | dragonwriter wrote: | The hard part of programming has always been gathering | and specifying requirements, to the point where in many | cases actually using natural language to do the second | part has been abandoned in favor of vague descriptions | that are operationalized through test cases and code. | | AI that can write code from a natural language | description doesn't help as much as you seem to think if | a natural language description is too hard to actually | bother with when humans (who obviously benefit from | having a natural language description) are writing the | code. | | Now, if the AI can actually interview stakeholders and | come up with what the code needs to do... | | But I am not convinced that is doable short of AGI (AI | assistants that improve the productivity of humans in that | task, sure, but that _expands the scope for economically | viable automation projects_ rather than eliminating | automators.) | plutonorm wrote: | Just like all the horses replaced by cars who became | traffic police? | [deleted] | robbywashere_ wrote: | Did coachmen immediately retire when cars were invented, or did | they become personal drivers or taxi drivers? | axg11 wrote: | This is incredible work. | | From the paper: | | > Limitations: Although conditioning image generation on CLIP | embeddings improves diversity, this choice does come with certain | limitations. In particular, unCLIP [Dall-E 2] is worse at binding | attributes to objects than a corresponding GLIDE model. | | The binding problem is interesting. It appears that the way | Dall-E 2 / CLIP embeds text leads to the concepts within the text | being jumbled together. In their example, "a red cube on top of a | blue cube" becomes jumbled and the resulting images are | essentially: "cubes, red, blue, on top". This opens a clear avenue for | improvement. | Imnimo wrote: | I'm only part way through the paper, but what struck me as | interesting so far is this: | | In other text-to-image algorithms I'm familiar with (the ones | you'll typically see passed around as Colab notebooks that people | post outputs from on Twitter), the basic idea is to encode the | text, and then try to make an image that maximally matches that | text encoding. But this maximization often leads to artifacts - | if you ask for an image of a sunset, you'll often get multiple | suns, because that's even _more_ sunset-like. There's a lot of | tricks and hacks to regularize the process so that it's not so | aggressive, but it's always an uphill battle. | | Here, they instead take the text embedding, use a trained model | (what they call the 'prior') to predict the corresponding image | embedding - this removes the dangerous maximization. Then, | another trained model (the 'decoder') produces images from the | predicted embedding. | | This feels like a much more sensible approach, but one that is | only really possible with access to the giant CLIP dataset and | the computational resources that OpenAI has. | recuter wrote: | What always bothers me with this stuff is, well, you say one | approach is more sensible than the other because the images | happen to come out more pleasing. | | But there's no real rhyme or reason; it is a sort of alchemy. | | Is text encoding strictly worse, or is it an artifact of the | implementation? And if it is strictly worse, which is probably | the case, why specifically? What is actually going on here?
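To make the contrast Imnimo draws concrete, here is a minimal PyTorch sketch of both styles. It is a toy under invented assumptions (the embedding width, the linear stand-ins for what are really CLIP encoders, a VQGAN, and diffusion models), not OpenAI's code; only the shape of the two procedures is the point.

  import torch
  import torch.nn as nn

  EMB, IMG = 512, 3 * 64 * 64
  torch.manual_seed(0)
  text_emb = torch.randn(EMB)         # stand-in for CLIP("a picture of a sunset")
  clip_proj = torch.randn(IMG, EMB)   # frozen stand-in for CLIP's image encoder

  # Older approach: optimize the image against the text encoding.
  image = torch.rand(3, 64, 64, requires_grad=True)
  opt = torch.optim.Adam([image], lr=0.05)
  for _ in range(200):
      sim = torch.cosine_similarity(image.flatten() @ clip_proj, text_emb, dim=0)
      opt.zero_grad()
      (-sim).backward()               # gradient ascent on "sunset-ness"
      opt.step()
  # Left unregularized, this chases *maximum* sunset-ness (multiple suns, etc.).

  # Dall-E 2 style: predict one image embedding, then decode it.
  prior = nn.Sequential(nn.Linear(EMB, EMB), nn.GELU(), nn.Linear(EMB, EMB))
  decoder = nn.Linear(EMB, IMG)       # really a diffusion model in the paper
  img_emb = prior(text_emb)           # commit to one plausible image embedding
  image2 = decoder(img_emb).view(3, 64, 64)
  # No test-time maximization loop: the decoder just renders that embedding.

The real prior and decoder are large generative networks trained on CLIP image/text pairs; the toy only shows why the second style has no "more and more sunset-like" pressure at generation time.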
| | I can't argue that their results are not visually pleasing. But | I'm not sure what one can really infer from all of this once | the excitement washes over you. | | Blending photos together in a scene in Photoshop is not a | difficult task. It is nuanced and tedious but not hard, as any | pixel slinger will tell you. | | An app that accepts a smattering of photos and stitches them | together nicely can be coded up any number of ways. This is a | fantastic and time-saving Photoshop plugin. | | But what do we have really? | | "Koala dunking a basketball" needs to "understand" the separate | items and select from the image library hoops and a Koala where | the angles and shadows roughly match. | | Very interesting, potentially useful. But if it doesn't spit out | exactly what you want, you can't edit it further. | | I think the next step has got to be that it conjures up a 3D | scene in Unreal or Blender so you can zoom in and around | convincingly for further tweaks. Not a flat image. | qq66 wrote: | I think deep learning is better thought of as "science" than | "engineering." Right now we're in the stage of the Greeks and | Arabs, where we know "if we do this then that happens." It | will be a while before we have a coherent model of it, and I | don't think we will ever solve all of its mysteries. | tracyhenry wrote: | mrandish wrote: | > This is a fantastic and time-saving Photoshop plugin. But | what do we have really? | | Stock photography sales are in the many billions of dollars | per year, and custom commissioned photography is larger still. | That's a pretty seriously sized ready-made market. | | > But if it doesn't spit out exactly what you want, you can't | edit it further. | | I suspect there's a _big_ startup opportunity in pioneering | an easy-to-use interface allowing users to provide fast | iterative feedback to the model - including positional and | relational constraints ("put this thing over there"). | Perhaps even more valuable would be easy yet granular ways to | unconstrain the model. For example, "keep the basketball hoop | like that but make the basketball an unexpected color and | have the panda's right paw doing something pandas don't do | that human hands often do." | dhosek wrote: | I've adopted a practice of having odd backgrounds for video | conferences.[1] I generally find these through Google image | search, but I often have a hard time finding exactly what I | would like. My own use case is a bit idiosyncratic and | frivolous, but I can see this being really handy for art | direction needs. When I used to publish a magazine, I would | often have to commission photographs for the needs of the | publication. A custom photograph (in the 90s) would cost | from $200-$1000[2] depending on the needs (and none required | models). Stock photo pictures for commercial use were often | comparable in cost. Being able to generate what I wanted | with a tool like this would have been fantastic. I think | that this can replace a lot of commercial illustration. | | [1] My current work background is an enormous screen-filling | eyeball. For my writing group, I try to have something that | reflects the story I'm workshopping if I'm workshopping | that week, and something surreal otherwise. | | [2] My most expensive custom illustration was a title for an | article about stone carver/letterer David Kindersley, which | I had inscribed in stone and photographed. | recuter wrote: | Certainly food for thought.
| | Say I'm looking for photography of real events and places, | like a royal wedding or a volcano erupting - does this help | me? Or of specific places and architectural features? Of a | protest? | | You're suggesting clipart on steroids: | https://thispersondoesnotexist.com | | I think if I was istockphoto.com I'd be a little worried, | but that is _microstock_ photography. I'm not sure that is | worth billions. In fact I know it isn't. | | Besides, once this tech is widely available, if anything it | devalues this sort of thing further, closer to $0. | | It would probably augment existing processes rather than | replace them completely. | | If you are doing a photoshoot for a banana stand with a | human model with characteristics x, y, z, you're still going | to get a human from an agency or Craigslist to pose. If | suddenly the client informs you that they needed human | a, b, c instead, maybe one of these forthcoming tools will let | you swap that out faster. You'd upload your photoshoot and | an example or two of the type of human model you wished you | had retroactively, and it would fix it up faster than an | intern. | | Cool. | johnwheeler wrote: | Or, as a precursor to Meta Horizon, build a 3D world with | speech: | | https://www.fastcompany.com/90725035/metaverse-horizon- | world... | moyix wrote: | > But if it doesn't spit out exactly what you want, you can't | edit it further. | | Why? You can tweak the prompt, change parameters, or even use | the actual "edit" capability that they demo in the post. | recuter wrote: | Maybe I am misunderstanding, but if you start tweaking the | prompt you'll end up with something completely different. | | The "edit" capability, as far as I can tell (please correct | me if I got confused), is picking your favorite out of the | generated variations. | | I would like to "lock" the scene and add instructions like | "throw in a reflection". | Jack000 wrote: | This is exactly what they demo - they lock a scene and | add a flamingo in three different locations. In another | one they lock the scene and add a corgi. | recuter wrote: | Not quite, it looks like this: | | - Provide an existing image | | - Provide a text prompt ("flamingo") | | - Select from X variations the new image that looks best | to you: | - It does the equivalent of a Google image search | on your "flamingo" prompt | - It picks the most blendable ones as the basis for | a new synthetic flamingo | - It superimposes the result on your image | | Very cool, don't get me wrong. Now I want to tweak this | new floating flamingo I picked further, or have that | corgi in the museum maybe sink into the little couch a | bit, as it has weight in the real world. | | Can't. You'd have to start over with the prompt, or use | this as the new base image, maybe. | | The example with furniture placement in an empty room is | also very interesting. You could describe the kind of | couch you want and where you want it, and it will throw | you decent options. | | But say I want the purple one in the middle of the room | that it gave me as an option, but rotated a little bit. | It would generate a completely new purple couch. Maybe it | will even look pretty similar, but not exactly the same. | | See what I mean? | ricardobeat wrote: | That's not how this works. There is no 'search' step, | there is no 'superimposing' step. It's not really | possible to explain what the AI is doing using these | concepts.
| | If you pay attention to all the corgi examples, the sofa | texture changes in each of them, and it synthesizes | shadows in the right orientation - that's what it's | trained to do. The first one actually does give you the | impression of weight. And if you look at "A bowl of soup | that looks like a monster knitted out of wool", the bowl | is clearly weighing down on the surface beneath it. I bet | if the picture had a fluffier sofa you would indeed see | the corgi making an indent on it, as it will have learned | that from its training set. | | Of course there will be limits to how much you can edit, | but then nothing stops you from pulling that into | Photoshop for extra-fine adjustments of your own. This is | far from a 'cool trick', and many of those images would | take _hours_ for a human to reproduce, especially with | complex textures like the Teddy Bear ones. And note how | they also have consistent specular reflections in all the | glass materials. | mahastore wrote: | I wish there was something available in open source that has | similar functions, i.e. sensible amalgamation of pictures based | on some text. | rileyphone wrote: | It would be interesting to see more attempts to "reverse | engineer" ML models like in | https://distill.pub/2020/circuits/curve-circuits - maybe even | with an ML model of its own? | Imnimo wrote: | Yeah, I mean you're right that ultimately the proof is in the | pudding. | | But I do think we could have guessed that this sort of | approach would be better (at least at a high level - I'm not | claiming I could have predicted all the technical details!). | The previous approaches were sort of the best that people | could do without access to the training data and resources - | you had a pretrained CLIP encoder that could tell you how | well a text caption and an image matched, and you had a | pretrained image generator (GAN, diffusion model, whatever), | and it was just a matter of trying to force the generator to | output something that CLIP thought looked like the caption. | You'd basically do gradient ascent to make the image look | more and more and more like the text prompt (all the while | trying to balance the need to still look like a realistic | image). Just from an algorithm-aesthetics perspective, it was | very much a duct tape and chicken wire approach. | | The analogy I would give is if you gave a three-year-old some | paints, and they made an image and showed it to you, and you | had to say, "this looks a little like a sunset" or "this | looks a lot like a sunset". They would keep going back and | adjusting their painting, and you'd keep giving feedback, and | eventually you'd get something that looks like a sunset. But | it'd be better, if you could manage it, to just teach the | three-year-old how to paint, rather than have this brute- | force process. | | Obviously the real challenge here is "well, how do you teach a | three-year-old how to paint?" - and I think you're right that | that question still has a lot of alchemy to it. | johnfn wrote: | I gotta be missing something here, because wasn't "teaching | a three year old to paint" (where the three year old is | DALL-E) the original objective in the first place? So if | we've reduced the problem to that, it seems we're back | where we started. What's the difference? | Imnimo wrote: | I meant to say that Dall-E 2's approach is closer to | "teaching a three year old to paint" than the alternative | methods.
Instead of trying to maximize agreement with a | text embedding like other methods, Dall-E 2 first | predicts an _image embedding_ (very roughly analogous to | envisioning what you're going to draw before you start | laying down paint), and then the decoder knows how to go | from an embedding to an image (very roughly analogous to | "knowing how to paint"). This is in contrast to | approaches which operate by repeatedly querying "does | this look like the text prompt?" as they refine the image | (roughly analogous to not really knowing how to paint, | but having a critic who tells you if you're getting | warmer or colder). | [deleted] | recuter wrote: | I don't think it is actually painting at all, but I need to | read the paper carefully. | | I think it is using a free-text query to select the best | possible clipart from a big library and blending it together. | Still very interesting and useful. | | It would be extremely impressive if the "Koala dunking a | basketball" had a puddle on the court in which it was | reflected correctly; that would be mind-blowing. | Imnimo wrote: | This is actual image generation - the 'decoder' takes as | input a latent code (representing the encoding of the | text query), and _synthesizes_ an image. It's not | compositing or querying a reference library. The only | time that real images enter the process is during | training - after that, it's just the network weights. | recuter wrote: | It is compositing as a final step. I understand that the | Koala it is compositing may have been a previously | nonexistent Koala that it synthesized from a library of | previously tagged Koala images... that's cool, but what | is the difference, really, from just plucking one of the | pre-existing Koalas into the scene? | | The difference is just that it makes the compositing | easier. If you don't have a pre-existing image that would | match the shadows and angles, you can hallucinate a new | Koala that does. Neat trick. | | But I bet if I threw the poor marsupial at a basket net, | it would look really different from the original | clipart of it climbing some tree in a slow and relaxed | manner. See what I mean? | | Maybe Dall-E 2 can make it strike a new pose. The limb | positions could be altered. But the facial expression? | | And if the basketball background has wind blowing leaves | in one direction, the Koala fur won't match; it will look | like the training-set fur. The puddle won't reflect it. | Etc. | | This thing doesn't understand what a Koala is the way a | 3-yr-old does. It understands that the text "Koala" is | associated with that tagged collection of pixel blobs, and | it can conjure up similar blobs onto new backgrounds - but | it can't paint me a new type of Koala that it hasn't seen | before. It just looks that way. | andybak wrote: | > It is compositing as a final step. | | I might be misinterpreting your use of "compositing" here | (and my own technical knowledge is fairly shallow), but I | don't think there's any compositing of elements generally | in AI image generation. (Unless Dall-E 2 changes this. I | haven't read the paper yet.) | recuter wrote: | https://cdn.openai.com/papers/dall-e-2.pdf | | > Given an image x, we can obtain its CLIP image | embedding z_i and then use our decoder to "invert" z_i, | producing new images that we call variations of our | input. ... It is also possible to combine two images for | variations.
To do so, we perform spherical interpolation | of their CLIP embeddings z_i and z_j to obtain an intermediate | z_θ = slerp(z_i, z_j, θ), and produce variations of z_θ | by passing it through the decoder. | | From the limitations section: | | > We find that the reconstructions mix up objects and | attributes. | Jack000 wrote: | The first quote is talking about prompting the model with | images instead of text. The second quote is using "mix | up" in the sense that the model is confused about the | prompt, not that it mixes up existing images. | | ML models can output training data verbatim if they | overfit, but a well-trained model does extrapolate to novel | inputs. You could say that this model doesn't know that | images are 2D representations of a larger 3D universe, | but now we have NeRF, which kind of obsoletes this | objection as well. | recuter wrote: | The model is "confused about the prompt" because it has | no concept of a _scene_ or of (some sort of) reality. | | If we gave the task "Koala dunking a basketball" to a human and | presented them with two images, one of a Koala climbing a | tree and another of a basketball player dunking - the | human would cut out the foregrounds (human, Koala) from | the backgrounds (basketball court, forest) and swap their | places easily. | | The laborious part would be to match the shadows and | angles in the new image. This requires skill and effort. | | Dall-E would conjure up an entirely novel image from | scratch, dodging this bit. It blended the concepts | instead; great. | | But it does not understand what a basketball court | actually is, or why the Koala would reflect in a puddle. | Or why and how this new Koala might look different in | these circumstances from previous examples of Koalas that | it knows about. | | The human dunker and the Koala dunker are not truly | interchangeable. :) | andybak wrote: | I'm not sure that's "compositing" except in the most | abstract sense? But maybe that's the sense in which you | mean it. | | I'd argue that at no point is there a representation of a | "teddy bear" and "a background" that map closely to their | visual representation - that are combined. | | (I'm aware I'm being imprecise, so give me some leeway | here.) | [deleted] | dash2 wrote: | > And if the basketball background has wind blowing leaves | in one direction, the Koala fur won't match; it will look | like the training-set fur. The puddle won't reflect it. | | If you read the article, it gives examples that do | _exactly_ this. For example, adding a flamingo shows the | flamingo reflected in a pool. Adding a corgi at different | locations in a photo of an art gallery shows it in | picture style when it's added to a picture, then in | photorealistic style when it's on the ground. | recuter wrote: | Well, not so much an article as really interesting hand- | picked examples. The paper doesn't address this as far as | I can tell. My guess is that this is a weak point that | will trip it up occasionally. | | A lot of the time it doesn't super matter, but sometimes | it does. | duxup wrote: | This isn't something I'm knowledgeable on, so forgive my | simplification, but is this like a sort of microservices for | AI? Each AI takes its turn handling some aspect, and another sort | of mediates among them? | Imnimo wrote: | I'd say Dall-E 2 is a little more unified - they do have | multiple networks, but they're trained to work together. The | previous approaches I was talking about are a lot more like | the microservices analogy.
Someone published a model (called | CLIP) that can say "how much does this image look like a | sunset". Someone else published a totally different model | (e.g. VQGAN) that can generate images (but with no way to | provide text prompts). A third person figures out a clever | way to link the two up - have the VQGAN make an image, ask | CLIP how much it looks like a sunset, and use backpropagation | to adjust the image a little, repeat until you have a sunset. | Each component is it's own thing, and VQGAN and CLIP don't | know anything about one another. | duxup wrote: | Got it, thanks. | | Makes sense to me as far as avoiding a sort of maximized | sunset that is always there and is SUNSET rather than a | nice sunset... but also avoiding watering it down and | getting a way too subtle sunset. | | It's not AI but I've been watching some folks solving / | trying to solve some routing (vehicles) problems and you | get the "this looks like it was maximized for X" kind of | solution but that's maybe not what is important / customer | perception is unpredictable. I kinda want to just come up | with 3 solutions and let someone randomly click .... in | fact i see some software do that at times. | Imnimo wrote: | Yeah, I think the trick is that when you ask for "a | picture of a sunset", you're really asking for "a picture | of a sunset that looks like a realistic natural image and | obeys the laws of reality and is consistent with all of | the other tacit expectations a human has for an image". | And so if you just go all in on "a picture of a sunset", | you often end up with what a human would describe as "a | picture of what an AI thinks a sunset is". | krick wrote: | While the whole narrative of your comment totally makes sense, | I don't really see the difference between the two approaches, | not on a conceptual level. You still needed to train this so | called "prior" at some point (so, I'm also not sure if it's | fair to call it a "prior"). I mean, the difference between your | two descriptions seems to be the difference between | _descriptions_ (i.e., how you chose to name individual parts of | the system), not the systems. | | I'm not sure if I'm speaking clearly, I just don't understand, | what's the difference between training "text encoding to an | image" vs "text embedding to image embedding". In both cases | you have some kind of "sunset" (even though it's obviously just | a dot in a multi-dimension space, not the letters) on the left, | and you try to maximize it when training the model to get | either a image-embedding or a image straight away. | Imnimo wrote: | Yeah, my comment didn't really do a good job of making clear | that distinction. Obviously the details are pretty technical, | but maybe I can give a high-level explanation. | | The previous systems I was talking about work something like | this: "Try to find me the image the looks like it _most_ | matches 'a picture of a sunset'. Do this by repeatedly | updating your image to make it look more and more like a | sunset." Well, what looks more like a sunset? Two sunsets! | Three sunsets! But this is not normally the way images are | produced - if you hire an artist to make you a picture of a | bear, they don't endeavor to create the _most_ "bear" image | possible. | | Instead, what an artist might do is envision a bear in their | head (this is loosely the job of the 'prior' - a name I agree | is confusing), and then draw _that_ particular bear image. | | But why is this any different? 
Who cares if the vector I'm | trying to draw is a 'text encoding' or an 'image encoding'? | Like you say, it's all just vectors. Take this answer with a | big grain of salt, because this is just my personal intuitive | understanding, but here's what I think: These encodings are | produced by CLIP. CLIP has a text encoder and an image | encoder. During training, you give it a text caption and a | corresponding image, it encodes both, and it tries to make the | two encodings close. But there are many images which might | accompany the caption "a picture of a bear". And conversely, | there are many captions which might accompany any given | picture. | | So the text encoding of "a picture of a bear" isn't really a | good target - it sort of represents an amalgamation of all | the possible bear pictures. It's better to pick one bear | picture (i.e. generate one image embedding that we think | matches the text embedding), and then just try to draw | that. Doing it this way, we aren't just trying to find the | maximum bear picture - which probably doesn't even look like | a realistic natural image. | | Like I said, this is just my personal intuition, and may very | well be a load of crap. | swalsh wrote: | Do you think some of these techniques could be slightly | modified and applied to DNA sequences? | snek_case wrote: | Maybe very, very short (single-gene) sequences. The thing with | DNA is it's the product of evolution. The DNA guides the | synthesis of proteins, then the proteins fold into a 3D | shape, and they interact with chemicals in their environment | based on their shape. | | In the context of a living being, different genes interact | with each other as well. For example, you have certain cells | that secrete hormones (many genes needed to do that), then | you have genes that encode for hormone receptors, and those | receptors trigger other actions encoded by other genes. | There's probably too much complexity to ask an AI system to | synthesize the entire genetic code for a living being. That | would be kind of like if I asked you to draw the exact | blueprints for a fighter jet, and write all the code, and | synthesize all the hardware, all at once, and you only get one | shot. You would likely fail to predict some of the | interactions and the resulting system wouldn't work. You | could only achieve this through an iterative process that | would involve years of extensive testing. | | Could you use a deep learning system to synthesize genetic | code? Maybe just single genes that do fairly basic things, | and you would need a massive dataset. Hard to say what that | would look like. Is it really enough to textually describe | what a gene does? | Jack000 wrote: | This is all true, but it doesn't preclude the possibility | of generating DNA. Humans share a lot of DNA sequences with | other animals, and the genetic differences between | individual humans are even smaller. You might have trouble | generating a human with horns or something, but a taller | one is probably mostly an engineering problem. | | What GPT-3 and DALL-E show is that you can infer a lot | based on the latent structure of data, even without | understanding the underlying physical process. | dekhn wrote: | Probabilistic generative models have been applied to DNA and | protein sequences for decades (my undergrad thesis from ~30 | years ago did this, and it wasn't even new at that point).
The | real question is what question you want to answer, and what is | this system going to do better enough to justify the time | investment to prove it out? | zone411 wrote: | Some more examples: | https://twitter.com/sama/status/1511724264629678084 | jdrc wrote: | There are some masterpieces there. This is the end of clipart | and stock images, and the beginning of awesome illustrations in | every article. | lalopalota wrote: | One step closer to combining Scribblenauts with emoticons! | gallerdude wrote: | This is extremely interesting. We've had some amazing AI models | come out in the past few days. We're getting closer and closer to | AI becoming a facet of everyday life. | turdnagel wrote: | I'm genuinely curious to hear Sam Altman's (and/or the OpenAI | team's) perspective on why these products need to be waitlisted. | If it's a compute issue, why not build a queuing system? If it's | something else (safety-related? hype-related?) I'd love to | understand the thinking behind the decision. More often than not, | I sign up for waitlists for things like this and either (1) never | get in to the beta or (2) forget about it when I eventually do | get in. | minimaxir wrote: | For GPT-3 it was a combination of both compute and safety. | Given the notes in the System Card (https://github.com/openai/d | alle-2-preview/blob/main/system-c... ), OpenAI is likely | doubling down on safety here. | croddin wrote: | This reminds me of the Holodeck in Star Trek. Someone could walk | into the Holodeck and say "make a table in the center of the | room. Make it look old." It seemed amazing to me that the | computer could make anything and customize it with voice. We are | pretty close to Star Trek technology now in computer ability | (ship's computer, not Commander Data). I guess to really be like | the Holodeck it needs to be able to do 3D and be in real time, but | that seems a lot closer now. It will be cool when this could be | in VR and we can say "make an astronaut riding a horse", then we | can jump on the back of the horse and ride to a secret moon base. | [deleted] | jelliclesfarm wrote: | "Preventing Harmful Generations"? = Fail. | | Caravaggio is probably chortling from wherever he is... | marcodiego wrote: | Cartoonists, say good-bye to your job. | criddell wrote: | Randall Munroe should quit now. Soon anybody will be able to | create XKCD-type comics. | Imnimo wrote: | Maybe one day there will be a job for people who are masters of | the art of prompt hacking - they know all the special phrases | and terms to get Dall-E to output the most aesthetically | pleasing images. They guard their magic words like a medieval | alchemist guards his formulas. Corporations will pay top dollar | for an expertly crafted, custom-tailored prompt for their | advertising campaign. | rvz wrote: | NFTs using Dall-E 2 variations incoming. | loufe wrote: | Not that it's impossible to hide the provenance of an image, | but it is explicitly forbidden in the TOS of DALL-E to sell | the images as NFTs or otherwise. | atarian wrote: | That's just going to make them more valuable. | andybak wrote: | The goalposts are definitely being moved. But tastes adapt | accordingly. | | I suspect trends in design will move towards those areas that | AI struggles with (assuming there are any left!) | mouzogu wrote: | So what does the future of human creativity look like when an AI | can generate possibly infinite variations of an idea?
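On "possibly infinite": the space of distinct digital images is in fact finite, just astronomically large, as the next comment also recalls. A back-of-the-envelope count in Python, assuming an illustrative 24-bit full-HD display:

  import math

  colors = 2 ** 24                       # possible colors per pixel (24-bit RGB)
  pixels = 1920 * 1080                   # pixels in one full-HD frame
  digits = pixels * math.log10(colors)   # log10 of the number of distinct frames
  print(f"distinct frames ~= 10^{digits:,.0f}")  # a count about 15 million digits long

Every possible movie is some sequence of those frames, so the space of displayable movies of bounded length is finite too, which is the point made below.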
| tomrod wrote: | I seem to recall an XKCD that I cannot find, but the premise | goes like this: | | When you have a digital display of pixels, if you randomly | color the pixels at 24 fps, then you will eventually display every | movie that can be or will ever be made, powerset | notwithstanding. This can also be tied to digital audio. | | In short, while mind-blowingly large, the space of display | through digital means is finite. | mouzogu wrote: | Sounds a bit like the Library of Babel of Jorge Borges. I | imagine most of the videos would be complete random nonsense. | | I think an AI-infused future is going to become increasingly | absurd and surreal; it will lead to a kind of creative | and cultural nihilism, if that's the right term. | | Like the value of originality will become meaningless. | visarga wrote: | The artist or the audience would have to ultimately select | something from all that automated originality. | 6gvONxR4sf7o wrote: | I expect that interactive art will be huge. Game design gets | fascinating, for example. | andreyk wrote: | AI becomes a tool for artists to use - generative art has been | around for a long time; now that particular genre of art will | presumably become much more prominent. | | For anyone pondering such questions, I would recommend reading | "The Past, Present, and Future of AI Art" - | https://thegradient.pub/the-past-present-and-future-of-ai-ar... | pingeroo wrote: | Wouldn't it be more like, "AI becomes an artist for people to | use"? Will we have people distinguished as "artists" if the | ability to make awesome art becomes available to everybody? | andreyk wrote: | AI still needs the text prompt to know what to generate. | Hence the human who provides the prompt is still the | artist, just like a photographer finds an aesthetically | interesting spot to take the image with their camera. | Cameras make images; humans using cameras make art. | Granted, this is not quite 1-1 with AI art, but still the | idea is the same. If anything, the flood of AI images will | only require artists to go beyond what is possible with | these text->image kinds of things, of which there is no | shortage. | keiferski wrote: | I think you'll see more of a focus on the artists themselves. | These images are nice, but they have basically zero narrative | value. | | This is really already the case, actually. Most artworks have | "value" because they have a compelling narrative, not because | they look pretty. So I think we can expect future artists to | really emphasize their background, life story, process of | making the art, etc. All things that cannot be done by a | machine. | Apofis wrote: | So I can't do Teddy Bears Riding a Horse? | arecurrence wrote: | Is there a geometric model related to this? E.g. "corgi near the | fireplace", but the output is a 3D model of the corgi and | fireplace with shaders rather than an image. | Ftuuky wrote: | What jobs will be left in 5~10 years when we consider all the | progress made with Dall-E, GPT-3, Codex/GitHub Copilot, Alpha*, | and so on? | phphphphp wrote: | Most creative output is duplicated effort: consider how much | code each person on HN has written that has been written | before. Consider how, a decade ago, we were all writing HTML | and styling it, element by element, and then Twitter Bootstrap | came along and revolutionised front-end development in what is, | ultimately, a very small and low-technology way. All it really | did was reduce duplicate effort.
| | Nowadays there's lots of great low/no-code platforms, like | Retool, that represent a far greater threat to the amount of | code that needs to be produced than AI ever will. | | To use a cliche: code is a bug, not a feature. Abstracting away | the need for code is the future, not having a machine churn out | the same code we need today. | beders wrote: | The ones undoing the damage caused by dumb pattern recognizers | and generators? ;) | 6gvONxR4sf7o wrote: | Things that require an understanding of causation will be safe | longer. Progress like this is driven by massive datasets. | Meanwhile, real-world action-taking applications require | different paradigms to take causation into account[0][1], and | especially to learn safely (e.g. learning to drive without | crashing during the beginner stages). | | There's certainly research happening around this, and RL in | games is a great test bed, but people choosing actions will be | safe from automation longer than people not choosing actions, | if that makes sense. It's the person who decides "hire this | person" vs the person who decides "I'll use this particular | shade of gray." | | [0] The best example is when X causes Y and X also causes Z, | but your data only includes Y and Z. Without actually | manipulating Y, you can't see that Y doesn't cause Z, even if | it's a strong predictor. | | [1] Another example is the datasets. You need two different | labels depending on what happens if you take action A or B, | which you can't have simultaneously outside of simulations. | cm2012 wrote: | Sam's Twitter thread today was more impressive than the website. | | https://twitter.com/sama/status/1511724264629678084?s=20&t=6... | ordu wrote: | Dall-E 2 seems to be incapable of catching the essence of the art. | I'm not really surprised by that; I'd be very surprised if it | could. But nevertheless: if you looked into the eye of the Girl With | A Pearl Earring[1], you'd be forced to stop and think about what | she has on her mind right now. Or maybe you'd have some other | question in your mind, but it really makes people stop and think. But | none of the Dall-E interpretations have this quality. Works inspired | by Girl With A Pearl Earring sometimes have at least part of that | power, like Girl With a Bamboo Earring[2]. But none of the Dall-E | interpretations have such power. | | And this observation may lead to great consequences for the visual | arts. I had a lot of joy looking at the different Dall-E | interpretations to find the flaw of each interpretation that | prevents it from being a piece of art of equal value to the | original. It is a ready-made tool to search for explanations of | the Power of Art. It cannot say what detail makes a picture an | artwork, but it allows one to see multiple data points, and to | narrow the hypothesis space. My main conclusion is that the pearl | earring has nothing to do with the power of the art. It is something | in the eye, and probably the slightly opened mouth. (Somehow | Dall-E pictured all interpretations with closed lips, so it seems | to be an important thing, but I need more variation along this | axis to be sure.) | | [1] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring [2] | https://yourartshop-noldenh.com/awol-erizku-girl-with-the-pe... | hcks wrote: | On the meta level, we are now at the point where the dubious | comments downplaying the AI start arguing on the plane of art | criticism. | jdrc wrote: | Art criticism should be off topic here.
| cm2012 wrote:
| Sam's Twitter thread today was more impressive than the website.
|
| https://twitter.com/sama/status/1511724264629678084?s=20&t=6...
| ordu wrote:
| Dall-E 2 seems incapable of catching the essence of the art. I'm
| not really surprised by that; I'd be very surprised if it could.
| But nevertheless: if you looked into the eye of the Girl With A
| Pearl Earring[1], you'd be forced to stop and think about what
| she has on her mind right now. Or maybe some other question
| would come to you, but it really stops people and makes them
| think. None of the Dall-E interpretations have this quality.
| Works inspired by Girl With A Pearl Earring sometimes have at
| least part of that power, like Girl With a Bamboo Earring[2].
| But none of the Dall-E interpretations have such power.
|
| And this observation may have great consequences for the visual
| arts. I had a lot of joy looking at the different Dall-E
| interpretations to find the flaw in each one that keeps it from
| being a piece of art of equal value to the original. It is a
| ready-made tool for searching for explanations of the Power of
| Art. It cannot say which detail makes a picture an artwork, but
| it allows us to see multiple data points and to narrow the
| hypothesis space. My main conclusion is that the pearl earring
| has nothing to do with the power of the art. It is something in
| the eye, and probably in the slightly opened mouth. (Somehow
| Dall-E painted all its interpretations with closed lips, so that
| seems to be an important detail, but I need more variation along
| this axis to be sure.)
|
| [1] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring [2]
| https://yourartshop-noldenh.com/awol-erizku-girl-with-the-pe...
| hcks wrote:
| On the meta level, we are now at the point where the dubious
| comments downplaying the AI start arguing on the plane of art
| criticism.
| jdrc wrote:
| Art criticism should be off topic here. This is more like
| chopping off the visual cortex and some association cortex from
| a brain and stimulating it. There is no person signaling to us,
| nor can we attribute any striking images that may come up to a
| person with agency.
|
| But it's like a giant database of decent clipart for anything we
| can imagine.
| ordu wrote:
| _> This is more like chopping off the visual cortex and some
| association cortex from a brain and stimulating it._
|
| We do not know exactly what part of our perception of reality
| can be attributed to "the visual cortex and some association
| cortex". But now we can feel it. We can test it. We can compare
| ourselves with the cold calculating machine. I believe that is a
| priceless opportunity that we shouldn't miss. At least I
| personally can't. I'm going to figure out whether it is possible
| for me to have a companion like Dall-E in my wanderings through
| the sea of information on the Internet, and if it is, to get
| one.
|
| _> But its like a giant database of decent clipart for anything
| we can imagine_
|
| And this also. Yes. Though I'm not interested in clipart.
| joshcryer wrote:
| What do you think of the third-to-last image of the Girl With A
| Pearl Earring that DALL-E 2 created? I find it more compelling
| than the original, with how her face is deeply cast in shadow.
| There's still that original 'essence' of the glint in her eye.
| But her earring is a bell. As if the AI is sending a message:
| what if the bell were to ring?
| ordu wrote:
| I'm not sure that I can express myself in English, which is not
| my native language, and this needs very nuanced control over the
| tiniest shades of meaning, but I'll try nevertheless, at least
| for the fun of it.
|
| The original girl is more open, more independent and mindless.
| The interpretation's girl is more self-controlled, assertive and
| not really interested, just going through all those motions of
| regular communication between people. Maybe it's just me, but
| what I really value on such occasions is mindlessness: the
| ability of people to not mind themselves, to let their selves
| dissolve in the environment. I sometimes cannot keep tears from
| my eyes when I watch an entertainer playing Chopin or Paganini,
| because what I see in their movements is the complete
| dissolution of a person in a piece of music, in a piece of art
| and skill. An entertainer just does what they do, with their
| full attention on it and with all their motivation focused on
| it. There is nothing else for them: just them and their actions.
|
| There is not a single thought devoted to how people around me
| would react to what I do and how I do it. I just do what I do
| and I do not care about the people around me, and if it somehow
| makes them happy... I don't really care. I mean, I know that
| afterwards I'd feel pride in myself, but just for now I don't
| really care.
|
| I know this feeling. I like to sing, and I'm good at it (above
| average), and I know what it feels like to dissolve into the
| song and to let the song rule. I play piano and I know what it
| is like to dissolve into the piece I'm playing, to stop myself
| from existing, to let the music take the lead. And the original
| painting makes me believe that the girl is in this state of
| mind. I do not know the history or the rest of the story, I do
| not know if she got into this state for a second, or if she
| never leaves it (that may be a sad existence, don't you think?),
| but somehow I know that right now she is in exactly this state.
| I want to watch this moment of hers for an eternity.
|
| Thinking about it, I'd confess that the Interpretation Girl does
| trigger the same, but on a smaller scale. I feel how my mind is
| trying to find a coherent state for her gaze, but this feeling
| stops within tens of microseconds, not hundreds of them.
|
| edit: want->watch. Stupid mistake, ruining the meaning of the
| sentence.
| [deleted]
| Veedrac wrote:
| Initial Outputs from New AI Model Not As Good at Nuance as
| Historic Artwork, Approach Deemed Hopeless
| ordu wrote:
| Oh... Not hopeless. The very fact that I spent some minutes
| watching the interpretations of Girl With a Pearl Earring is
| evidence enough that it is not hopeless. I praise the work that
| was done. Moreover, I hope that people take it as an inspiration
| to do even more.
| awinter-py wrote:
| They're using training-set restriction and prompt engineering to
| control its output:
|
| > By removing the most explicit content from the training data,
| we minimized DALL*E 2's exposure to these concepts
|
| > We won't generate images if our filters identify text prompts
| and image uploads that may violate our policies
|
| The 'how to prevent superintelligences from eating us' crowd
| should be taking note: this may be how we regulate creatures
| larger than ourselves in the future.
|
| And even how we regulate the ethics of non-conscious group minds
| like big companies.
| 6gvONxR4sf7o wrote:
| This is a niche complaint, but I get frustrated at how imprecise
| OpenAI's papers are. When they describe the model architecture,
| it's never precise enough to reproduce exactly what they did. I
| mean, it pretty much never is in ML papers[0], but OpenAI's
| bigger products are worse than average here. And it makes sense,
| since they're trying to be concise and still spend time on all
| the other important stuff besides methods, but it still
| frustrates me quite a bit.
|
| [0] Which is why releasing your code is so beneficial.
| greyhair wrote:
| Interesting, yes, but I went to the link and browsed the
| 'generated artwork', and all of it was subjectively inferior to
| the original it was generated from. Every single piece. So I am
| not sure what the 'value' in it is, at this stage.
|
| As for the text-driven generation, I would have to mess with
| something beyond the pre-canned presentations to see how useful
| it is.
| krick wrote:
| Regardless of how much cherry-picking there was, some of these
| pictures are just beautiful.
| jedberg wrote:
| This reminds me of a discussion I had with my high school band
| teacher in the 90s. I was telling him that one day computers
| would play music and you wouldn't be able to tell the
| difference. He got mad at me and told me that a computer could
| never play as well as a human with feelings, who can _feel_ the
| piece and interpret it.
|
| I think we passed that point a while ago, and seeing this makes
| me think we aren't too far off from computers composing pieces
| that actually sound good too.
| andybak wrote:
| Some freely available models:
|
| GLID-3:
| https://colab.research.google.com/drive/1x4p2PokZ3XznBn35Q5B...
|
| and a new Latent Diffusion notebook:
| https://colab.research.google.com/github/multimodalart/laten...
|
| Both have appeared recently and are getting remarkably close to
| the original Dall-E (maybe better, as I can't test the real
| thing...)
|
| So this was pretty good timing if OpenAI want to appear to be
| ahead of the pack. Of course, I'd always pick a model I can
| actually use over a better one I'm not allowed to...
| Jack000 wrote:
| With GLIDE I think we've reached something of a plateau in terms
| of architecture on the "text-to-image generator S-curve".
| DALL-E 2 is a very similar architecture to GLIDE and has some
| notable downsides (poorer language understanding).
|
| glid-3 is a relatively small model trained by a single guy on
| his workstation (aka me), so it's not going to be as good. It's
| also not fully baked yet, so YMMV, although it really depends on
| the prompt. The new latent diffusion model is really amazing
| though, and is much closer to DALL-E 2 for 256px images.
|
| I think the open-source community will rapidly catch up with
| OpenAI in the coming months. The data, code and compute are all
| there to train a model of similar size and quality.
| andybak wrote:
| Wow. Thanks for GLID-3. It was genuinely exciting for a few
| days, but then I must admit latent diffusion stole my attention
| somewhat ;-)
|
| What kind of prompts is GLID-3 especially good for? I remember
| getting lucky a few times when I was playing around, but I
| didn't do it systematically.
| Jack000 wrote:
| glid-3 is trained specifically on photographic-style images, and
| is a bit better at generalization compared to the latent
| diffusion model.
|
| e.g. the prompt "half human, half Eiffel tower": a human-Eiffel
| tower hybrid (I get mostly normal Eiffel towers from LDM, but
| some sensical results from glid-3).
|
| glid-3 will be worse for things that require detailed recall,
| like a specific person.
|
| With smaller models you kind of have to generate a lot of
| samples and pick out the best ones.
| loufe wrote:
| I think this is really neat, but definitely not on the same tier
| as DALL-E 2, at least judging from the cherry-picked images I
| saw.
| andybak wrote:
| I'm not sure what you've seen, but I've been very impressed
| indeed by some results I've obtained. Others less so.
|
| It's hard to compare, because we don't know how much cherry-
| picking is going on with published Dall-E results (either v1 or
| v2).
|
| My gut feeling is that it's in the same ballpark as Dall-E 1.
| hwers wrote:
| They're also not censored on the dataset front, and thus produce
| much more interesting outputs.
|
| OpenAI has a low-resolution checkpoint with similar
| functionality to this - called GLIDE - and its output is super
| boring compared to community-driven efforts, in large part
| because of dataset restrictions similar to those this has likely
| been subjected to.
| FreeHugs wrote:
| How do you run such a Google Colab thing?
|
| I don't see a run button?
|
| Oh... maybe "Runtime -> Run All" from the menu ...
|
| Shows me a spinning circle around "Download model" ...
|
| 26% ...
|
| Fascinating that Google offers you a computer in the cloud for
| free ...
|
| Now it is running the model. Wow, I'm curious ...
|
| Ha, it worked!
|
| Nothing compared to the images in the Dall-E 2 article, but
| still impressive.
| minimaxir wrote:
| Google is a company with a lot of spare VMs and GPUs.
|
| However, the free GPU is now a K80, which is obsolete and barely
| sufficient for running these types of models.
| nl wrote:
| You sometimes still get T4s. I got one last week and it was
| great.
| qualudeheart wrote:
| Deep learning plows through yet another wall.
| kovek wrote:
| One of my teachers once said "an art piece is never done". So I
| wonder what it would mean for the model to keep making
| improvements to a piece.
| chronolitus wrote:
| IIRC that's roughly how it works! It starts from a first image
| and improves it until 'satisfied' that the result fits the
| prompt.
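| A loose sketch of that loop, in the guided-diffusion style that
| GLIDE, glid-3 and DALL-E 2's decoder share. Every function below
| is a stand-in for a large learned network, and real samplers run
| a fixed number of denoising steps rather than looping until
| satisfied:
|
|     import random
|
|     def denoise(image, step, prompt):
|         # stand-in for the learned denoiser; in the real thing,
|         # guidance from the prompt nudges each step toward
|         # images that match the text
|         return [0.9 * v + random.gauss(0, 0.01) for v in image]
|
|     def clip_score(image, prompt):
|         # stand-in for a CLIP-style image/text agreement score
|         return -sum(v * v for v in image)
|
|     def sample(prompt, steps=50, size=16):
|         image = [random.gauss(0, 1) for _ in range(size)]  # noise
|         for step in reversed(range(steps)):
|             image = denoise(image, step, prompt)  # refine
|         return image
|
|     # "generate a lot of samples and pick out the best ones"
|     prompt = "a corgi near the fireplace"
|     best = max((sample(prompt) for _ in range(8)),
|                key=lambda im: clip_score(im, prompt))
|
| The final re-ranking line is the same trick mentioned upthread
| for getting good results out of smaller models.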
| bakztfuture wrote:
| I made a YouTube series last summer on the massive potential
| future of DALL-E and multimodal AI models.
|
| Imagine not just DALL-E 2, but a single model which can be
| trained on different kinds of media and generate music, images,
| video and more.
|
| The series:
|
| - covers essential lessons for AI creatives of the future
|
| - shares details on how to compete creatively in the future
|
| - talks about how to make money through multimodal AI
|
| - makes predictions about AI's effects on society
|
| - discusses, at a very basic level, the ethics of multimodal AI
| and the philosophy of creativity itself
|
| To my knowledge, it's the most comprehensive set of videos on
| this topic.
|
| The series is free to watch in its entirety on YouTube: GPT-X,
| DALL-E, and our Multimodal Future
| https://www.youtube.com/playlist?list=PLza3gaByGSXjUCtIuv2x9...
| rvz wrote:
| At this point, with WaveNet, GPT-3, Codex, DeepFakes and
| Dall-E 2, you cannot believe anything you see, hear, watch or
| read on the internet anymore, as an AI can easily generate
| nearly anything that will be quickly believed by millions.
|
| The internet's own proverb has never been more important to keep
| in mind. A dose of skepticism is a must.
| aChrisSmith wrote:
| I can see how this has the potential to disrupt the games
| industry. If you work on a AAA title, there is a small army of
| artists making 19 different types of leather armor, or 87 images
| of car hubcaps.
|
| Using something like this could really help automate, or at
| least kickstart, the more mundane parts of content creation. (At
| least when you are using high-resolution, true-color imagery.)
| killerstorm wrote:
| This thing can't do 3D models.
|
| There are some 3D image generation techniques, but they aren't
| based on polygonal modeling, so 3D artists are safe for now.
| pwillia7 wrote:
| You could train a model on texture image data, though, no?
|
| Or what about generating images you could then turn into models
| via photogrammetry?
| rndphs wrote:
| This is going to be mostly a rant on OpenAI's "safer than thou"
| approach to safety, but let me start by saying that I think this
| technology is really cool, amazing, powerful stuff. Dall-E (and
| Dall-E 2) is an incredible advance over GANs, and it will no
| doubt have many positive applications. It's simply brilliant. I
| am someone who has been interested in and followed the progress
| of ML-generated images for nearly a decade. Almost unimaginable
| progress has been made in this field in the last five years.
|
| Now the rant:
|
| I think if OpenAI genuinely cared about the ethical consequences
| of the technology, they would realise that any algorithm they
| release will be replicated by other people within some short
| period of time (a year or two). At that point, the cat is out of
| the bag and there is nothing they can do to prevent abuse. So
| really, all they are doing is delaying abuse, in no way stopping
| it.
|
| I think their strong "safety" stance has three functions:
|
| 1. Legal protection
|
| 2. PR
|
| 3. Keeping their researchers' consciences clear
|
| I think number 3 is dangerous, because researchers are put under
| the false belief that their technology can or will be made safe.
| This way OpenAI can continue to harness bright minds, who no
| doubt have ethical leanings, to create things that they
| otherwise wouldn't have.
| I think OpenAI are trying to have their cake and eat it too.
| They are accelerating the development of potentially very
| destructive algorithms (and profiting from it in the process!)
| while trying to absolve themselves of the responsibility.
| Putting band-aids on a tumour is not going to matter in the long
| run. I'm not necessarily saying that these algorithms will be
| widely destructive, but they certainly have the potential to be.
|
| The safety approach of OpenAI ultimately boils down to
| gatekeeping compute power. This is just gatekeeping via capital.
| Anyone with sufficient _money_ can replicate their models easily
| and bypass _every single one_ of their safety constraints.
| Basically, they are only stopping _poor_ bad actors, and only
| for a limited time at that.
|
| These models cannot be made safe as long as they are replicable.
|
| Producing scientific research requires making your results
| replicable.
|
| Therefore, there is no way to develop abusable technology
| safely. As a researcher, you will have blood on your hands if
| things go wrong.
|
| If you choose to continue research knowing this, that is your
| decision. But don't pretend that you can make the _algorithms_
| safer by sanitizing models.
| duren wrote:
| I've been playing around with it today and have been super
| impressed with its ability to generate pretty artful digital
| paintings. It could have big implications for designers and
| artists if and when they allow you to use custom palettes, etc.
|
| Here's an example from my prompt ("a group of farmers picking
| lettuce in a field digital painting"):
| https://labs.openai.com/s/jb5pzIdTjS3AkMvmAlx69t7G
| pingeroo wrote:
| Neat! Were you part of the initial testing batch, or granted
| access via the waitlist?
| d--b wrote:
| Am I the only one who thinks the AI world is divided into two
| groups?
|
| 1. DeepMind, who solved Go and protein folding, and seem really
| onto something.
|
| 2. Everyone else, spending billions to build machines that draw
| astronauts on unicorns, and smartish bot toys.
| gwf wrote:
| Your second group represents the core "inner loop" of about a
| thousand revolutionary applications. Take the basic capability
| of translating image->text->speech (and the reverse), install it
| on a wearable device that can "see" an environment, and add
| domain-specific agents. From this setup, you're not too far away
| from having an AI that can whisper guidance into your ear like a
| co-pilot, enabling scenarios like:
|
| 1. step-by-step guidance for a blind person navigating the use
| of a public restroom.
|
| 2. an EMS AI helping you to save someone's life in an emergency.
|
| 3. an AI coach that can teach you a new sport or activity.
|
| 4. an omnipresent domain expert that can show you how to make a
| gourmet meal, repair an engine, or perform a traditional tea
| ceremony.
|
| 5. a personal assistant that can anticipate your information
| needs (what's that person's name? where's the exit? who's the
| most interesting person here? etc.) and whisper the answer in
| your ear just as you need it.
|
| Now, add all of the above to an AR capability where you can
| think or speak of something interesting and complex, and have it
| visualized right before your eyes. With this capability, I could
| augment my imagination with almost superhuman capabilities that
| allow one to solve complex problems almost as if in an internal
| mental monologue.
| All of these scenarios are just a short hop from where we're at
| now, so mark my words: we will have "borgs" like those described
| above long before we reach anything like general AI.
| lkbm wrote:
| These are good examples of what we're getting close to, but I'd
| add that Copilot is already an extremely helpful tool for
| coding. I don't blindly trust its output, but its suggestions
| are what I want often enough to save a lot of typing.
|
| I still have to do all the hard thinking, but once I figure out
| what I want written and start typing, Copilot will spit out a
| good portion of the contextually obvious lines of code.
| robotresearcher wrote:
| There's a third group for your list: AI stuff that's so good we
| don't think about it any more.
|
| For example, recent phone cameras can estimate depth per pixel
| from single images. Hundreds of millions of these devices are
| deployed. A decade ago this was AI/CV research-lab stuff.
| emadabdulrahim wrote:
| OpenAI is one of the leading companies in AI that makes models
| with real-world applications. I don't see their efforts as
| misdirected or futile in any way. If anything, I'm always
| impressed by their announcements, because what their models can
| do is always mind-blowing!
|
| The same technology that is drawing cute unicorns can be used
| for endless other use cases. Perhaps the PR side of the launch,
| and the subject matter with which they unveil their product, is
| just that: PR.
|
| It's like the Apple Memoji thing (not sure if I'm spelling it
| correctly). You can think of it as trivial and a waste of talent
| to use their Camera/FaceID to animate cute animals based on
| facial expressions, but that same tech will enable lots of other
| things to come.
| trixie_ wrote:
| It all feels like the early days of electricity. It wasn't
| obvious how to turn a neat party trick into something more
| useful, but it was the people who kept on at better and better
| party tricks who actually laid the foundations for doing some
| really useful things with electricity, as well as understanding
| it at a deeper level.
| _nateraw wrote:
| If you're interested in generative models, Hugging Face is
| putting on an event around generative models right now called
| the HugGAN sprint, where they're giving away free access to
| compute to train models like this.
|
| You can join by following the steps in the guide here:
| https://github.com/huggingface/community-events/tree/main/hu...
|
| There will also be talks from awesome folks at EleutherAI,
| Google, and DeepMind.
| eganist wrote:
| The timing of the Dall-E 2 launch an hour ago seems to
| correspond with a recent piece of investigative journalism by
| BuzzFeed News about one of Sam Altman's other ventures,
| published 15 hours ago and actively discussed elsewhere on HN
| right now:
|
| https://news.ycombinator.com/item?id=30931614
|
| I point this out because while Dall-E 2 seems interesting (I'm
| out of my depth, so I'll defer to the conversation taking place
| here), the timing of its release, as well as the accompanying
| press blasts within the last hour from sites like The Verge -
| verified via Wayback Machine queries and time-restricted
| googling - seems both noteworthy and worth a deeper conversation
| given what was just published about Worldcoin.
|
| To be clear, it's worth asking if Dall-E 2 was published ahead
| of schedule without an actual product release (only a waitlist)
| to potentially move the spotlight away from Worldcoin.
| duxup wrote:
| What's the idea here?
| That they quickly put this out to somehow bury other stories?
| eganist wrote:
| Yes, especially given there's no actual product release, only a
| waitlist.
|
| It's easy to put together a marketing piece on short notice, or
| potentially even push a pending marketing page out to production
| with a waitlist rather than links to production or even beta-
| quality services.
| dang wrote:
| I don't have any knowledge (inside or otherwise), but the
| Worldcoin thing already came in for several rounds of abuse on
| HN, so it's kind of a scandal of the second freshness at this
| point.
|
| I listed some of them here -
| https://news.ycombinator.com/item?id=30934732 - just because I
| remembered there had been previous discussions, and listing
| related previous discussions is a thing.
| gallerdude wrote:
| Maybe I'm naive, but I see this as a coincidence. If it had been
| an hour later, then maybe there would be something to it.
| eganist wrote:
| Another consideration, then: it was posted to HN almost
| instantly after it was released to the world, 52 minutes after
| the HN post about Worldcoin was submitted and started showing
| traction.
|
| I don't see the publication of a marketing page (again, not a
| finished product) for a product founded by someone whose other
| main venture is being investigated by journalists for misleading
| claims as a coincidence, but if the timing matters and 14-15
| hours doesn't seem to work for the assertion in your mind, then
| perhaps the Dall-E 2 page going live less than an hour after the
| Worldcoin HN submission fits the bill.
|
| I've got no horse in this race. I'm just drawing attention to
| familiar PR strategies used for brand-risk mitigation, that's
| all.
| GranPC wrote:
| If the article GP refers to was posted 16 hours ago instead of
| 15, would that really make a difference?
| danso wrote:
| I'm not a huge fan of these coordination theories. But a few
| things are worth noting:
|
| - In support of your argument, the BuzzFeed News investigation
| has likely been in the works for weeks, meaning Altman et al.
| have had more than just a couple of days to throw together a
| Dall-E 2 soft launch.
|
| - However, weren't OpenAI's GPT models (2 and 3) announced to
| the world in similar fashion? e.g. demos and whitepapers and
| waitlists, but not a full product release?
|
| - Throwing together a Dall-E 2 soft launch just in time to
| distract from the investigation would require a conspiracy, i.e.
| several people being at least vaguely aware that deadlines had
| been accelerated for external reasons. Is the Worldcoin story
| big enough to risk tainting OpenAI, which seems like a much more
| prominent part of Altman's portfolio?
| eganist wrote:
| For discussion's sake:
|
| - BFN reached out to A16Z, Worldcoin and Khosla Ventures, who
| largely declined to comment, which would mean that at least one
| person probably had a bit of runway from at least when the
| requests for comment were submitted. So yeah, you're probably
| right.
|
| - Going from the GitHub repos for GPT-2 and GPT-3, those may
| have been hard launches:
|
| Feb 14 2019, predating the first press for GPT-2 by a few days
| (it was probably made public on Feb 14, though) -
| https://github.com/openai/gpt-2/commit/c2dae27c1029770cea409...
|
| May 28 2020, timed alongside the press news for GPT-3 -
| https://github.com/openai/gpt-3/commit/12766ba31aa6de490226e...
|
| - Would it really have to be a conspiracy? It sounds like only
| one person would have to target a specific date or date range,
| and without really giving a reason.
| One of the things that puts a hole in my own thinking here is
| that Sam Altman's name isn't really tied to the Dall-E 2
| release. It's just OpenAI, and the press around Sam's name
| _today_ still surfaces exclusively this one Worldcoin story
| (https://news.google.com/search?q=sam+altman+when%3A1d&). So if
| this was actually intended to bury another story, Sam's name
| would have had to be included in all the press blasts for it to
| be successful. But the BuzzFeed story seems to have kind of died
| on the vine.
| nonfamous wrote:
| Genuine question: how are the two stories even related? It's
| certainly not apparent from the BuzzFeed article (or at least
| from a quick skim of it).
| eganist wrote:
| Sam Altman is OpenAI's CEO.
|
| What I'm submitting for consideration is that the marketing page
| and associated press blasts for Dall-E 2 (there's a live
| influencer reaction video about Dall-E 2 airing right now, for
| instance) were potentially pushed up to offset negative press
| from Worldcoin for their shared founder.
|
| I'd like to be wrong. But it's too well timed.
| thisistheend123 wrote:
| This is what magic looks like.
|
| Great work.
|
| Looking forward to when they start creating movies from scripts.
| Dig1t wrote:
| Most of the conversation around this model seems to be about its
| direct uses.
|
| This seems to me like a big step towards AGI; a key component of
| consciousness seems (in my opinion) to be the ability to take
| words and create a mental picture of what's being described. Is
| that the long-term goal with respect to researching a model like
| this?
| latexr wrote:
| What confusing pricing[1]:
|
| > Prices are per 1,000 tokens. You can think of tokens as pieces
| of words, where 1,000 tokens is about 750 words. This paragraph
| is 35 tokens.
|
| Further down, in the FAQ[2]:
|
| > For English text, 1 token is approximately 4 characters or
| 0.75 words. As a point of reference, the collected works of
| Shakespeare are about 900,000 words or 1.2M tokens.
|
| > To learn more about how tokens work and estimate your usage...
|
| > Experiment with our interactive Tokenizer tool.
|
| And it goes on. When most questions in your FAQ are about
| understanding pricing - to the point that you need to offer a
| specialised tool - perhaps consider a different model?
|
| [1]: https://openai.com/api/pricing/
|
| [2]: https://openai.com/api/pricing/#faq-token
| pingeroo wrote:
| This is for their GPT models, not Dall-E. I don't think they
| have released any pricing information for Dall-E yet, as it is
| still in waitlist mode.
| belval wrote:
| I haven't read the paper, but they are probably using something
| like SentencePiece with sub-word splitting, and then charging by
| the number of resulting tokens.
|
| https://github.com/google/sentencepiece
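| A rough estimator built only from the rules of thumb quoted
| above (1 token ~ 4 characters ~ 0.75 words; prices per 1,000
| tokens); the rate below is a placeholder, not a real OpenAI
| price:
|
|     def estimate_tokens(text):
|         by_chars = len(text) / 4             # 1 token ~ 4 chars
|         by_words = len(text.split()) / 0.75  # 1 token ~ 0.75 words
|         return round((by_chars + by_words) / 2)
|
|     def estimate_cost(text, usd_per_1k_tokens=0.02):
|         # usd_per_1k_tokens is a placeholder rate for illustration
|         return estimate_tokens(text) / 1000 * usd_per_1k_tokens
|
|     # Sanity check against the FAQ's Shakespeare figure:
|     print(round(900_000 / 0.75))  # 1,200,000 tokens, i.e. "1.2M"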
| hwers wrote:
| The correct response here, from the artists' point of view,
| would be a widespread coming-together against their art being
| used as training data for ML models. With a quickly spread new
| license on most major art-submission sites that explicitly
| forbids AI algorithms from using their work, artists would
| effectively starve OpenAI and others of the very works being
| used to put them out of a job.
| w-m wrote:
| The license should forbid competing artists from using the
| artist's work as well. In fact, no human should come into
| contact with the produced art, otherwise they might be
| accidentally inspired by it, thus stealing from the original
| creator.
| smusamashah wrote:
| This is mind-blowing. I was not expecting the sketch-style
| images to actually look like sketches. Style-transfer-based
| sketches never look like sketches.
|
| This, and the current AI-generated art scene, makes it look like
| artwork is now a "solved" problem. See AI-generated art on
| Twitter etc.
|
| There is a strong relation between the prompt and the generated
| images, but just like GPT-3, it fails to fully understand what
| was being asked. If you take the prompt out of the equation and
| see the generated artwork on its own, it's up to your
| interpretation, just like any artwork.
| andreyk wrote:
| I would caution that artwork is only 'solved' for relatively
| simple text prompts. Creating a novel painting with a precise
| mix of elements that would take a paragraph or more to explain
| is still tough, though DALL-E 2 does seem like a big step
| towards that.
| nahuel0x wrote:
| Also note you can make an image out of many spatially localized
| prompts combined, in an iterative AI-human process.
| sillysaurusx wrote:
| Sam seems to be demoing something fairly close on Twitter:
| https://twitter.com/sama/status/1511724264629678084
|
| The solar-powered ship with a propeller sailing under the Golden
| Gate Bridge during sunset with dolphins jumping around was
| pretty impressive:
| https://twitter.com/sama/status/1511731259319349251
|
| I think it's only missing the dolphins.
___________________________________________________________________
(page generated 2022-04-06 23:00 UTC)