[HN Gopher] How DALL-E 2 Works ___________________________________________________________________ How DALL-E 2 Works Author : SleekEagle Score : 183 points Date : 2022-04-19 15:19 UTC (7 hours ago) (HTM) web link (www.assemblyai.com) (TXT) w3m dump (www.assemblyai.com) | MengerSponge wrote: | How does DALL-E 2 handle stereotypes? For example, what kind of | output would you see for: | | > A person being shot by a police officer | | > A scientist emptying a dishwasher | | > A nurse driving a minivan | | AI training sets are famously biased, and I'm curious how | egregious the outputs are... | axg11 wrote: | The authors of the DALL-E 2 / unCLIP paper describe some of | their efforts to mitigate biases in the paper. Without intervention, | ML models will always exhibit the biases present in their | training dataset. It's not really possible to remove bias | from an ML model, at least not completely. Some stereotypes, | but not all, are backed up by statistics. In those cases, | should we completely remove the bias in the training dataset? | Doing so would bias the model towards outputs that are not | representative of the real world. | | When people say that they want to remove bias from ML models, | what they really mean is that they want to manipulate the | output distribution into something they deem acceptable. I'm | not arguing against this practice, there are plenty of | situations where the output of an ML model is very clearly | biased towards specific classes/samples. I'm merely arguing | that there is no such thing as an unbiased model, just as there | is no such thing as an unbiased human. Unbiased models would | produce no output. | | To get around some of these problems OpenAI restricted the | training dataset (e.g. filtering sexual and violent content) | and also prevented the generation of images with recognizable | faces. This doesn't prevent bias but it does reduce the number of | controversial outputs. | blamazon wrote: | One way to dodge this and other issues related to the depiction of | human bodies is to trim the dataset such that humans are not | generally recognizable as realistic humans in the output. OpenAI | also currently explicitly forbids publicly sharing realistic | images of human faces generated by DE2. | | Via LessWrong.com: [1] | | > _" One place where DE2 clearly falls down is in generating | people. I generated an image for [four people playing poker in | a dark room, with the table brightly lit by an ornate | chandelier], and people didn't look human -- more like the | typical GAN-style images where you can see the concept but the | details are all wrong. | | >Update: image removed because the guidelines specifically call | out not sharing realistic human faces. | | >Anything involving people, small defined objects, and so on, | looks much more like the previous systems in this area. You can | tell that it has all the concepts, but can't translate them | into something realistic. | | >This could be deliberate, for safety reasons -- realistic | images of people are much more open to abuse than other things. | Porn, deep fakes, violence, and so on are much more worrisome | with people. They also mentioned that they scrubbed out lots of | bad stuff from the training data; possibly one way they did | that was removing most images with people. | | >Things look much better with animals, and better again with an | artistic style."_ | | [1]: https://www.lesswrong.com/posts/r99tazGiLgzqFX7ka/playing-wi...
| [deleted] | tiborsaas wrote: | I guess we will figure that out quite soon, but does it matter | that much? Your only job with DALL-E 2 is to prompt it properly, | so if you want a female scientist, just say so. If it comes | up with the "wrong" gender or ethnicity, then it takes a second | to fix it, which is probably a bit quicker than ranting | about it on Twitter :) | snovv_crash wrote: | It will, being a deterministic machine, generate any kind of | wrongthink that is in its training data. Ironically, all of the | media coverage of negative stereotypes by well intentioned | activists probably even makes it more likely to generate this | kind of data. | radu_floricica wrote: | I can't think of a way that would "fix" this that wouldn't also | make it less useful overall. If people are looking for people | being shot by police officers, they probably already have those | stereotypes and thus expectations of the end product. You can | argue that you want to insert a certain morality set in the | process, but that to me sounds a hell of a lot scarier than the | scientist emptying the dishwasher being a woman in 60% of the | pictures. Once you have the mechanism for morality bias, you | also have people with the capacity to change the settings. | SleekEagle wrote: | Great questions! I'd also be interested in this. I suppose the | generations would mimic the general distribution of information | that is on the internet, but what that would look like | specifically is hard to say without OpenAI releasing more | information. | achr2 wrote: | Over the next decade, ML advancements will erode the monetary | value of _countless_ professions. Hopefully AI research will be | turned towards solving the problems of society | /economics/civilization before it is too late to avoid major | disruptions in human wellbeing. | [deleted] | mmastrac wrote: | Are we finally past the AI winter? We seem to be seeing major | advances at least once a year. I recall there was a bit of a lull | after GPT3, but clearly the boundaries of AI are expanding | ridiculously fast. | mellosouls wrote: | In some ways (narrow AI), yes, it's been a fantastic few years | including tools like the one in context. | | In the important way that the AI winter originally referred to | though, no, there doesn't seem to have been any progress | towards AGI. | SleekEagle wrote: | Was the original AI winter with reference to AGI? I thought | it was in reference to the resulting lack of research and | interest after the "bubble bust". If we're not close to AGI | now I can't imagine researchers 40 years ago really thought | AGI was around the corner, right? Just curious, I'm not an | expert on the history of ML! | mellosouls wrote: | I think there have been several really, and they tend to | follow hype periods, which over-promise. | | I do think the last few years have been more productive | than previous periods in advancing narrow AI, and to be | fair to those researchers who just get on with the work, it | is not on them if the advances are over-sold by others. | visarga wrote: | > If we're not close to AGI now | | I bet we're closer than most people think. Instruct GPT-3 | can do semantic tasks just as efficiently as DALL-E 2 can | draw. NLP tasks that took whole teams multiple years can be | simply described in a few words and they work right away. | | The entry barrier to implement new tasks will get very low. | The large models will be the new operating system.
This | means more investments and data, leading to new | improvements. | | I believe GPT-3 is already close to median human level on | most semantic tasks that fit in a 4000 token window. I'm | researching how to use it right now for a variety of tasks, | it just works from plain text requirements with no | training. | redredrobot wrote: | There has not been an AI winter in at least a decade, arguably | more. | Tossrock wrote: | Indeed, and I called it 7 years ago: | https://news.ycombinator.com/item?id=9882217 | Polygator wrote: | I guess he's referring to the fact that the glut of | investment around 2017-2018 was followed by disappointment | due to startups overpromising. I agree that from the | technical side (I mostly follow NLP, might be different in | other subfields) there's been no hint of a winter. | jollybean wrote: | At least from the big public displays of this tech, it seems to | me that it's mostly merging photos in interesting ways. That the | 'seed' comes from a word is not hugely interesting to me. | | I'm actually more curious if we could parse the underlying | logic that it ultimately emulates to merge those images | together. | | It 'looks like' something kind of sophisticated is being | modelled with AI but there are some nice algorithms hidden in | there. | ma2rten wrote: | It doesn't merge images. It generates them from scratch. Sure | it's trained on a corpus of existing images, but I don't | think it "merges" them any more than human artists do with | images they have seen in their lifetime. | jollybean wrote: | I don't believe 'creating them' is the right word. | 'Merging' them is probably a bad choice of words on my | part. | | More like 'averaging them' and finding variations from vast | inputs. | | Which is more along the lines of what I mean. | SleekEagle wrote: | Luckily the use of Transformer models makes what's going on | under the hood a bit more interpretable, but I think the | fundamental part at which ideas are merged is translating | from CLIP text embeddings to CLIP image embeddings. | | The training principle of CLIP is very simple, but | intuitively understanding how the diffusion prior maps | between semantically similar textual and visual | representations is a bit more unclear (if that's even a | well-formulated question!) | alar44 wrote: | Well, that's absolutely not what's happening. It seems like | you haven't done any reading in this space, so I'm not even | sure what to link for you. | jollybean wrote: | 'Merging' was a poor choice of words on my part, but I'm | aware of what it does. | SleekEagle wrote: | GPT-3 was released 2 years ago, and in that time CLIP, GLIDE, | and DALL-Es 1 and 2 have been released. All of this is just | from OpenAI too! DL research is cranking along as quickly as | ever imo! | lurker619 wrote: | Just need a music one please. | gwern wrote: | Jukebox. If you listen to Jukebox samples, recall that that | was quite a while ago in dog/DL years, and imagine what the | DALL-E 2 equivalent would be for a Jukebox 2... | p1esk wrote: | I'm surprised no one has tried to launch a music | generation startup based on Jukebox. I'd be interested in | collaboration if anyone wants to work on it (and has | compute resources). | Der_Einzige wrote: | I resent this notion that AI doesn't advance if we aren't | making new larger and larger foundation models. | | Even during that lull between GPT3 and DALL-E/CLIP, there were | tons of truly wonderful advances in AI... | nsxwolf wrote: | I'll just never understand how any of this works.
I know it is | trained on millions of existing images, but when you say "... a | bowl ..." in your prompt, how does it decide what the bowl should | look like? Does it pick one of the bowls it's seen at random? It | doesn't ever quite draw the same bowl twice, does it? Is it | somehow "imagining" a bowl, the way a human would, and some all | new image of a bowl pops into its "head"? | simonw wrote: | The trick is to start with random "Gaussian noise" - something | like https://opendatascience.com/wp-content/uploads/2017/03/noise... | - and then iteratively modify | that image until it starts to look like the concept you want it | to look like. | | I find the concept of a GAN - a Generative Adversarial Network | - useful. | | My high-level attempt at explaining how those work is that you | create two machine learning models, one that tries to create | fake images and one that tries to see if an image is fake or | not. | | The first one says "here's an image", the second one says | "that's a fake", the first one learns from that and tries | again, and they keep going until an image scores highly on the | test. | | The networks are adversarial because they are trying to outwit | each other. | | (I'm sure a ML researcher could provide a better explanation | than I can, but that's the way I think about it.) | goodside wrote: | I don't believe DALL-E 2 incorporates GANs at all, but I | haven't read the paper in detail. GANs were the best text-to-image | models maybe a year ago but lately diffusion techniques | are taking over. | simonw wrote: | Thanks for the keyword hint - this explanation looks good | for diffusion models: | https://ai.googleblog.com/2021/07/high-fidelity-image-genera... | astrange wrote: | It's not trained on labeled data so it doesn't necessarily know | bowls are a specific concept. It's all statistical similarity | in the same way Google Image Search works. (from the original | CLIP paper, it seems to think an apple and the word "apple" | written on a piece of paper are the same thing) | | The model in step 3 produces an image encoding (something like | a sketch of the output) from a text encoding (something like | what you typed), and the unCLIP model in step 2 produces images | from that encoding. How much variation you get inside a | specific input word varies a lot and is spread across those | models. | SleekEagle wrote: | If you have a bit of background in math, I would encourage you | to read the CLIP paper: https://arxiv.org/abs/2103.00020 | | Ultimately, the link between words and their representations | comes from the CLIP training. The model generates encodings | (vectors) for both an image and its corresponding caption, and | then the parameters of these encoders (the functions that | generate the vectors) are tuned in order to minimize the angle | between the textual and visual encodings that represent the | same concept. | | The core of your question is why minimizing the angle between | like vectors is equivalent to learning what the "Platonic | ideal" of a given object (in your example, a bowl) is, whether | appearing as a textual representation or a visual one. This | question is subtle and difficult to answer (if it's even a | well-formulated question), but I'd say that the easiest | interpretation is that the vector space is composed of a basis | of vectors that each represent a distinct feature (which the | model learns).
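To make the angle-minimization idea above concrete, here is a minimal sketch of a CLIP-style contrastive training step in PyTorch. It illustrates the principle only and is not OpenAI's code: the encoder objects, the batch layout, and the temperature value are assumptions for the example.

    import torch
    import torch.nn.functional as F

    def clip_contrastive_loss(image_encoder, text_encoder, images, captions):
        # Encode both modalities into the same m-dimensional space, then
        # L2-normalize so dot products equal cosine similarities.
        img_emb = F.normalize(image_encoder(images), dim=-1)   # (N, m)
        txt_emb = F.normalize(text_encoder(captions), dim=-1)  # (N, m)

        # Cosine similarity of every image with every caption in the batch.
        temperature = 0.07  # illustrative value
        logits = img_emb @ txt_emb.T / temperature             # (N, N)

        # The i-th image belongs with the i-th caption: push the diagonal
        # similarities up (small angle) and the off-diagonal ones down.
        targets = torch.arange(logits.shape[0], device=logits.device)
        loss_images = F.cross_entropy(logits, targets)   # images -> captions
        loss_texts = F.cross_entropy(logits.T, targets)  # captions -> images
        return (loss_images + loss_texts) / 2

Minimizing this symmetric loss is what pulls the text and image encodings of the same concept toward the same direction in the shared space, which is the property the rest of the DALL-E 2 pipeline relies on.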
| oofbey wrote: | One thing I find really interesting about DALL-E 2 is that | the popular blog name ("DALL-E 2") never shows up in either of | the research papers that describe it. The paper commonly referred | to as DALL-E 2 calls its own algorithm "unCLIP". UnCLIP is | _heavily_ based on a paper from a few months earlier called GLIDE | - in fact you can't really understand the unCLIP paper | without first reading the GLIDE paper. | | I suspect what's going on is that OpenAI has decoupled their PR | activities from their science activities. They told the | researchers to publish papers when they're ready, and then the PR | apparatus decides when one is good enough to be crowned | "DALL-E 2" and writes a blog post about it. | axg11 wrote: | This is only surprising if you're not familiar with product | launches that result from R&D. In this case DALL-E 2 is the | consumer-facing name; unCLIP is the name used during research | and for publication. OpenAI may also have a | further internal codename that they used for the project. | Currently DALL-E 2 access is limited but there are lots of | reasons to believe that OpenAI will try to productize DALL-E 2 | as an API. If you're selling a product, you need a product | name. | FrenchDevRemote wrote: | Maybe it's because there are more features on the OpenAI | website? For example, with GPT you get different models, | different templates, a playground, an API, etc. | KaoruAoiShiho wrote: | I don't get it, isn't this how literally every product launch | works? | radicaldreamer wrote: | I don't know why you're being voted down, internal/research | names are often way weirder and decided on ad-hoc by the | researchers themselves, and then a good PM comes in when | productionizing and part of this is deciding on a catchy | name for public use. | oofbey wrote: | The phrasing "I don't get it" is fairly rude - it implies | the post is obvious to the point of not being worth | mentioning. However obvious this might seem to somebody, I | would point out that turning AI research papers into | products is hardly commonplace. | SleekEagle wrote: | I noticed that as well! It confused me a bit at first. They say | that their "image generation stack" is referred to as unCLIP, | and I was trying to figure out how it's distinct from DALL-E 2! | | My only guess would be that unCLIP is the end-to-end image | generation model, but if the model is used for manipulation, | interpolation, or variations, then it is referred to as DALL-E | 2. So unCLIP is a subset of DALL-E 2. | tmabraham wrote: | This behavior is not exclusive to OpenAI. NVIDIA did this too. | Originally StyleGAN3 was published under the name "Alias-free | GAN" and the paper itself uses that terminology. | ShannonLimiter wrote: | DALL-E is mentioned in several places in the paper. | | DALL-E 2 specifically is on page 18 and the system card: | https://github.com/openai/dalle-2-preview/blob/main/system-c... | | DALL-E 2 = the stack of unCLIP and the image generator. | phailhaus wrote: | Unrelated: I've noticed the common use of underscores for | emphasis in HN comments. Why use that when italics are | supported via asterisks? _Like this_? | burke wrote: | HN's markup is idiosyncratic but similar enough to markdown | that it's hard for occasional commenters to remember the | details. It's also minimal enough that users are already used | to parsing extra-syntactic markup visually. | oofbey wrote: | Yeah. HN should just switch to markdown.
;) | bern4444 wrote: | In Markdown syntax, a single underscore around words | _like this_ renders them italicized. | phailhaus wrote: | Sure, but HN doesn't. I'm trying to understand why I see so | many comments using underscores when only asterisks work. | tingletech wrote: | This was also a common convention during the Usenet | era | LeifCarrotson wrote: | I think it's a combination of Markdown being completely | readable in plain text, especially if you're familiar with | the syntax and even if you're not. Similarly, I see a lot | of people using TeX-style mathematics; it's not | particularly readable, but "The quadratic formula is | $x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$" is a decent way of | representing the formula to people fluent in LaTeX even | in plaintext conditions. I suppose there's also likely to | be a bit of muscle memory where people accustomed to | typing in Github/Stack Overflow/Reddit markdown use it on | other systems, and even if they see it's not supported | it's good enough to not need editing. | | I don't think it's particularly worthwhile to learn a new | comment format (one that's not even linked or described | in the comment editor, for that matter) for every site. | thomasahle wrote: | _Underscores_ still _work_: even if they don't get | converted into <u></u>, they convey the meaning just | fine. | | Similar to how people use ">" to indicate quotes, even if | it doesn't get special treatment by the editor. | dwighttk wrote: | unCLIP as in "this algorithm is NOT going to be told to make | paper clips resulting in all the mass in the solar system | converted into paper clips"? | oofbey wrote: | Diffusion models seem like they're poised to completely replace | GANs. They obviously work super well, and you don't have this | super finicky minimax training problem. | SleekEagle wrote: | Yeah, I haven't seen any big advancements in GANs in a few years. | Have I missed anything big or is the research volume trending | down on them? | astrange wrote: | There's this but I don't know if it's been followed up on. | | https://www.microsoft.com/en-us/research/publication/manifol... | mdda wrote: | Or the two can be combined: | https://nvlabs.github.io/denoising-diffusion-gan/index.html | oofbey wrote: | Sounds like it gets the worst of both worlds? The difficult | training of a GAN with the slow runtime of a diffusion model. | mdda wrote: | Could be... Except their page (should you choose to believe | it, of course) specifically addresses the advantages: | | """ | | "Advantages over Traditional GANs" : Thus, we observe that | our model exhibits _better training stability_ and mode | coverage. | | "Why is Sampling from Denoising Diffusion Models so Slow?" | : After training, we generate novel instances by sampling | from noise and iteratively denoising it _in a few steps_ | using our denoising diffusion GAN generator. | | """ | machinekob wrote: | The biggest problem for diffusion models was performance (as | you need to iterate even at inference). But I'm not up to date | with the newest architectures, maybe it's already solved :P | johndough wrote: | I was wondering if it would be possible to train a neural | network to do multiple iterative steps at once. As it turns | out, it has already been done and it requires about 4 to 8 | distilled iterations for comparable quality. If this pace | keeps up, we will probably see similar running time to GANs | in the near future.
| | https://arxiv.org/pdf/2202.00512.pdf | ak391 wrote: | Open source alternative to DALL-E: | https://huggingface.co/spaces/multimodalart/latentdiffusion | madiator wrote: | I think DALL-E 3 will generate short clips. But I am curious to | know what HN thinks OpenAI will do with these technologies. | aabhay wrote: | Try to commercialize it, but fail to create much of a moat from | it. Just like their past commercialization efforts. | SleekEagle wrote: | GPT-3 was sold with exclusive usage rights to Microsoft, so | maybe something along those lines with a different company | (Meta?). As for what they will do with it, it's hard to say ... | astrange wrote: | You can use GPT-3 right now on OpenAI playground and there are | commercial apps running on it that as far as I know aren't on | Azure. It's not clear what they meant by exclusive. | aantix wrote: | How are objects differentiated from their background? | password54321 wrote: | Tech bros are high-fiving their way to the top in every field | with some neural nets. No one is safe. | ausbah wrote: | these are teams of PhD research scientists and research | engineers. I wouldn't characterize them as just tech bros | SleekEagle wrote: | The rate of advancement over the past 10-15 years really has | been incredible. Now the question is: is this growth curve | logistic or exponential? | pupppet wrote: | If I have it generate a "bowl of soup" will I find an identical | bowl in some clip art collection somewhere? How much does it | deviate from the source images? | SleekEagle wrote: | You can try to reverse image search - from what I've seen of | other people doing this, the renditions are quite distinct. The | diffusion process is ultimately the root of the model's ability | to not just copy images. Variational methods truly allow for | the learning of a distribution, which is why VAEs can generate | new data and AEs can't. | | Also, practically from a data point of view, the same object | can be represented in numerous ways (different artistic styles, | different filters, abstract paintings, etc.) and the model has | to optimize across all of these samples. What this means is | that the model truly is forced to learn the semantic meaning | behind a concept and not just rely on specific features. | | Check out the dropdown under the "Significance of CLIP to | DALL-E 2" section in the article. | corysama wrote: | I've played with tech like this for over a year now. You won't | find the bowl in the source images. It doesn't evolve the noise | into a source image. It slowly nudges the noise into feeling | more and more bowl-like. Do that enough, and you get something | that feels quite a bit like a bowl. | | Put it this way: The model file is absurdly smaller than the | half billion source image files. If it actually contained the | source images, it would be the greatest feat of image | compression ever. Instead it only contains the impression left | over by the images. A lot closer to a memory than a jpg. | rglover wrote: | I've always been skeptical of AI stuff (for obvious reasons/long-term | implications), but I have to say this application has me | excited beyond belief. This is pure magic. Kudos to the OpenAI | team. | SleekEagle wrote: | A few years ago when photorealistic facial image generation | models started getting really good I had my first "holy crap" | moment. OpenAI expanding the domain from faces to essentially | _anything_ is absolutely mind-blowing. An absolutely seminal | step forward, undoubtedly!
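corysama's description above of "nudging the noise" is, mechanically, what a diffusion sampling loop does. Below is a minimal sketch of a DDPM-style denoising loop; the noise-prediction model and the schedule tensors (alphas, alpha_bars, sigmas) are hypothetical stand-ins rather than DALL-E 2's actual decoder.

    import torch

    @torch.no_grad()
    def sample(model, shape, alphas, alpha_bars, sigmas, steps):
        # Start from pure Gaussian noise...
        x = torch.randn(shape)
        # ...and iteratively nudge it toward the data distribution.
        for t in reversed(range(steps)):
            eps = model(x, t)  # predict the noise present in x at step t
            # Remove a portion of the predicted noise (the DDPM update).
            x = (x - (1 - alphas[t]) / (1 - alpha_bars[t]).sqrt() * eps) \
                / alphas[t].sqrt()
            if t > 0:
                # Re-inject a little fresh noise so sampling stays stochastic.
                x = x + sigmas[t] * torch.randn(shape)
        return x  # after enough steps, x "feels" like a sample, e.g. a bowl

The distillation paper linked at the top of this thread speeds this up by training a student model to absorb several of these updates into a single step, which is where the "4 to 8 distilled iterations" figure comes from.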
| [deleted] | [deleted] | chrisco255 wrote: | I find myself oscillating between excitement and sheer terror, | sometimes several times a day. | SleekEagle wrote: | Sometimes at the same time! | breakfastduck wrote: | This is a truly horrifying piece of technology, destined to | destroy the livelihoods of countless artists. It's incredible in | terms of the technology, but... scary in equal measure. | | I can't think of a single good reason for this to exist that | doesn't have huge negative impacts on our world. | | Why pay an artist/graphic designer when this does what you need? | | "Now those damned creatives can go and find real jobs" | alar44 wrote: | Only non-artists say this. Every graphic designer I know thinks | this is great. | chrisco255 wrote: | I'm less worried about the jobs angle, as this can be viewed as | a productivity tool. I'm more worried about the ability to use | this tech for deep fakes. It's going to erode trust in society | even further than it already has. | MartinCron wrote: | The cynic in me is wondering if that will make any | difference. It's not like people need deep fakes or even the | possibility of deep fakes to believe that the world is flat | or that Obama was born in Kenya or that lizard people are | running sex trafficking rings out of the basements of pizza | parlors with no basements. | | People look at the objective reality, provided by the sources | that should have the most credibility, and just shrug it off. | wormer wrote: | Right, people _should_, but this will only increase people | being deluded, because of its ease of use. And it's not | like any of us are immune to being deluded either; I'm sure | there are things I and others take as truths because the | facts we founded them upon were carefully fabricated to have | no holes. | | If I saw a masterfully crafted video of vaccines _actually_ | being implanted with microchips, wouldn't I believe it? | I'm not an expert on identifying deepfakes, nor should I be | just to consume media. I think this is a valid cause for | concern and will make things worse rather than keep it the | same. | sephlietz wrote: | How is this any different than any other technological | innovation which has made a job obsolete or otherwise allowed | fewer people to do more work? | wormer wrote: | I argue there is a difference because of the nature of the | work. Machines aiding in farming is only a good thing, | because it can maximize output and minimize input. People | (largely) don't care about the process of how it was grown, | but rather having the product to eat (Of course there's | cruelty-free agriculture, organic, etc. but stay with me | here). But artistry is a personal thing, and maximizing the | output of art pieces isn't something that most are interested | in. Art is a uniquely unquantifiable subject, and we want it | to have a personal and emotional connection to both the | creator and the viewer, something that is lost when AI boils | it down to its essential components and rebuilds them in its | image. | WillPostForFood wrote: | _Machines aiding in farming is only a good thing, because | it can maximize output and minimize input._ | | Machines aiding in art is only a good thing, because it can | maximize output and minimize input? | | Makes art cheaper, more accessible, allows more people to | create? | | It is like how digital filmmaking has cracked the Hollywood | monopoly on content. | wormer wrote: | I think that this doesn't really help artists as much as | it just does it for them.
Art, the way I see it, requires a | human to make, because it is something that requires | emotion, something a robot could _replicate_ but not | feel. For example, a gut-wrenching image of innocents | being beaten by police is gut-wrenching _because_ it's | something that exists in the real world, and the artist | and the subjects are real and their emotion is real. But | a computer generated image only has a likeness; it | doesn't have actual emotion. | | I also don't think that it makes it "cheaper, more | accessible, and allows more people to create". Digital | art supplies being readily available and relatively | cheap compared to their classic counterparts is what | makes things more accessible, and to make it more so | would be to drive the cost down or something. Having the | computer draw for you isn't exactly creating art. | | And art isn't a commodity and I argue it shouldn't be a | commodity. It's something, again, personal and special. | | And this doesn't end at the visual arts; I think it | applies to writing too. AI could write what's written in | my journal word for word but my journal would have more | value just by virtue of it being written by me. | madiator wrote: | Your argument is weak in that it could have been invoked for | several previous inventions: ATMs replacing cashiers, search | engines replacing librarians and so on. | password54321 wrote: | Well they won't be alone at least. Even us programmers are | eventually going to be replaced. | p1esk wrote: | Same was true many times throughout history. Why do people | still pay musicians to play in live concerts when they could | listen to a recording? Why do people still watch other people | play chess when they could watch two AIs play much better | chess? | | Think long term. Eventually AI will be able to do most human | jobs. As a result, products and services will become cheaper. | As a result, people will have to work less for a living. As a | result, more people will be able to draw and paint for | pleasure, and not necessarily to make a buck. | Bud wrote: | There are a couple gigantic blind spots here: | | 1) AI appears to have approximately zero chance of making | housing and food and other basic needs cheaper. | | 2) Artists WANT to make money for creating art, music, etc. | emteycz wrote: | I'm working on applying this technology to housing as we | speak; you're very wrong IMHO. | | Yeah, some people are going to lose jobs over this, happens | all the time. People are not isolated from the market, they | function in it and need to take it into account. | badRNG wrote: | > Think long term. Eventually AI will be able to do most | human jobs. As a result, products and services will become | cheaper. As a result, people will have to work less for a | living. As a result, more people will be able to draw and | paint for pleasure, and not necessarily to make a buck. | | This is ahistorical. The fact is that you must at least seem | to produce more market value than your total compensation in | order for a company to hire you. There will simply be fewer | people who make a "livable" wage while those who own these | automations will become increasingly wealthy. Depending on | how the market changes, there may also be increasing | unemployment. But why would that matter? Unless unemployment | gets too high, the market will continue to work as usual. | | There's simply no reason for the owners and inheritors of an | increasingly automated economy to share the value increase | with their workers.
The worker's wages will be market-determined | just as before. Perhaps if unemployment gets too | high it will be in their interests to offer something like | UBI, though there's no reason for anything beyond what's strictly | necessary for the economy to function, and the minimum | required to avoid excessive social turmoil. | gbasin wrote: | Your claim is very theoretical. In practice, everyone in | the world has grown increasingly wealthy, and unemployment | levels are lower than ever. | bckr wrote: | > As a result, products and services will become cheaper. As | a result, people will have to work less for a living. | | I think we've seen this play out before and instead of | reducing work, our standards of living increase and people | keep working about the same amount. See e.g. the post-industrial | world where homemakers had to scrub clothes, then | got machines to do the scrubbing, but subsequently had to | clean the clothes more frequently. | | We might be able to reduce the overall amount of human work | only through extremely successful social/political reforms | similar to the ones that outlawed child labor and established | the 40 hour work week. Assuming the technology will cause it | to happen is bound to lead to disappointment. | Der_Einzige wrote: | The future is now, old man. | macawfish wrote: | Meanwhile artists are like the most curious about this | karmasimida wrote: | Technology, once invented, is not going back. | | You can't demand some technology not be used when it is not | a weapon. | | There isn't a reason to believe that our current world is in a | stage that is free from change; in fact, our world became what | it is due to the invention of disruptive technologies, whether | you like it or not. | MarcoZavala wrote: | zackbrown wrote: | Last week at a birthday party, I met a 74-year-old career | visual artist who still creates with various media: paint, | colored pencils, sculpture, etc. | | Curious for her thoughts on DALL-E, I pulled out my phone and | invited her to generate some imagery. (I have early access via | a family member at OpenAI.) She didn't skip a beat, and | immediately started _getting creative_ with it. We even did a | "collaborative piece" a la Mad Lib. | | I asked her if she felt threatened by DALL-E. Surprised by the | question, she said: "No! I could see this really accelerating | my process. Sometimes I'm blocked on an idea and I could see | this being a great tool for finding inspiration. Can I get | access to this?" | | My take-away was that art is not zero-sum: someone's art isn't | "less" because more entities are creating art. If computers can | do it too -- even if they're arguably more mechanical in the | recombination of existing ideas (note: humans do the same) -- | nothing stops human art from being art. | izzygonzalez wrote: | Arguably, the biggest barrier to any creative domain is | technical capability. | | An immediate thought is that locked-in people who can only | communicate by text would be able to share their thoughts | more expressively. | | In terms of the creation loop, anyone can create a bunch of | AI-generated images. Wombo is huge right now. The | differentiating factors will be prompt design, commitment to | iteration, aesthetic-driven curation of generated works, and | presentation. | | Photographers take and process thousands of photos to create | just one masterpiece.
| andreilys wrote: | _My take-away was that art is not zero-sum:_ | | Art is zero-sum in that there are a limited number of artist | residencies, exhibitions and funds available. | | In this case, we will likely see further contraction in the | number of artists able to support themselves. There will ofc | always be the superstars and hobbyists. | rictic wrote: | The amount of art that people want in their lives is much | larger than the amount that's there now. | | Artists who are willing to direct their talents towards | satisfying others' desires for art will find the world is | very positive sum. Those that vie for a limited number of | spots in a prestige game may find that it's zero or even | negative sum, but those are not good games to play anyways. | mrfusion wrote: | Maybe an artist can make huge images or whole catalogs of | images with technology like this. | | Maybe more people can be game developers with access to free | original artwork at their fingertips. | | I don't see it as replacing artists, I see it as amplifying | artists. | billconan wrote: | Can I train DALL-E 2 on my personal computer with a fairly decent | GPU? Or is it out of the question? | SleekEagle wrote: | Unfortunately, it is out of the question. OpenAI trains on | hundreds of thousands of dollars of GPUs and even then the | training takes two weeks. Also, as far as I know their training | data (400 M image/caption pairs) is not available to the | public! | GaggiX wrote: | Fortunately there are even larger public datasets like LAION-5B | manquer wrote: | Your estimate is off by two orders of magnitude; it is more like | $10M+ for a single run for the latest generation models [1]. | This is the primary reason why not a lot of models are out there. | | Few groups have that kind of money to commit; also, the | viability is not yet very clear, i.e. how much the model | will make if commercialized, so they can recoup the | investment. | | There is also the cost of running the model on each API call; | this is of course not factoring in any of the employee costs. | | [1] https://venturebeat.com/2020/06/01/ai-machine-learning-opena... | axg11 wrote: | This is a cute question. Not today! I hope someone comes back | to read this question in 10-15 years time, when we will all | have the ability to train DALL-E quality models on our AR | glasses. | ShamelessC wrote: | Never gonna happen ha. | oofbey wrote: | Maybe possible with a fabulous GPU, but still likely not, and | if it did work it would take a horrendously long time. The real | blocker is gonna be GPU memory. With an RTX 3090 you have 24 GB | of GPU RAM and _might_ be able to try it, but I'm still not | sure it would fit. The key model has 3.5 billion parameters, | which at 16-bit precision requires 7GB of GPU memory for each | copy. Training requires 3 or 4 copies of the model, depending on | the algorithm you use. And then you need memory for the data and | activations, which you can reduce with a small batch size. But | if it did fit, on a single GPU with a small batch size, you're | probably looking at years of training time. | | Even an RTX 3080 is a complete non-starter. | manquer wrote: | Something like the Quadro RTX 8000 may theoretically work; it | does have 48GB of RAM [1]. | | [1] https://www.nvidia.com/content/dam/en-zz/Solutions/design-vi... | simonw wrote: | I'm pretty confident that part of OpenAI's competitive edge is | that they can train these models on GIANT clusters of machines. | | This article predicts that GPT-3 cost $10-$20m to train.
I | imagine DALL-E could cost even more: | https://lastweekin.ai/p/gpt-3-is-no-longer-the-only-game?s=r | joshcryer wrote: | Nope, and you'll still need a pretty beefy computer to run the | trained model. Currently GPT-NeoX-20B, the "open source GPT-3," | requires 42 GB of VRAM, so you're looking at minimum a $5-6k | graphics card (though a Quadro RTX 8000 is actually in stock so | there's that). Or use a service like GooseAI. | | EleutherAI or some other open source / open research | developers will likely try to reproduce DALL-E 2 but it'll take | some time and a lot of donated hardware and cycles. | mokchira wrote: | From the article: | | "CLIP is trained on hundreds of millions of images and their | associated captions..." | | Does anyone have any insight as to which images were trained on? | Was it all public-domain stuff? And if not, were the original | authors of those images made aware their work was being used to | train an AI that would likely put them out of work? Were they | compensated appropriately? | SleekEagle wrote: | As far as I know, OpenAI has not made this dataset publicly | available. IIRC the dataset is images scraped from Instagram | and their corresponding captions. Check out the CLIP paper for | more details: | | https://arxiv.org/abs/2103.00020 | | Theoretically, you could build a web-scraping tool to do | something like this, but even storing that data would take an | absolutely insane amount of storage. | | I would assume OpenAI has some deal with Meta to make the | creation of datasets like this easier. | mokchira wrote: | Thanks for the link. I hope they do make the data set | publicly available at some point so that the artists whose | work helped train this can know. I think, while it is | absolutely impressive on a technical level what the OpenAI | team has been able to do, it is also important to consider | what damage it will do to artists and their livelihood. | | Many professional artists stake their career on one unique | style of art that they have honed and developed over many | years. It's this unique style that clients generally pay for, | and that now faces a very real threat of being stolen from | them by a technology that frankly no human can hope to | compete with. Without artist compensation, this can only lead | to artists terminating their careers early once the AI has | co-opted all work from them. Or future artists never | beginning their careers in the first place. This is a net | loss for humanity, as it will deprive us of works and styles | of art that have yet to be imagined. | | I'm not saying AI like this needs to go away. There is no | putting that genie back in the bottle, of course. But it | needs to be something that artists opt into. If someone's | style is worth it for OpenAI to train on, then that style | obviously should have a price tag. And it ought to be up to | the artist whether they want to sell or not. Anything short | of that is theft in my eyes. | superasn wrote: | Gaming is going to get so interesting with these emerging | technologies. I played AI Dungeon some time ago and I was amazed | at how good it was at making up believable stories on the fly. | | Now imagine joining this with DALL-E and you truly have a game | which has never existed until now, with its own story and | graphics that you are creating on the spot. | | Unlike adventure games like King's Quest where everything was | pre-programmed, this is a truly infinite, never-ending game with | a unique experience for every single player.
| | Like the guy from 'Two Minute Papers' says: what a time to be | alive. I feel so happy and excited just thinking about the | possibilities these techs are going to bring. | SleekEagle wrote: | Yes! So many exciting possibilities in so many industries. | Hopefully it won't displace artists though; we'll have to find | a way to balance the efficiency of AI with the curation of | artists! | smaudet wrote: | > I feel so happy and excited just thinking about the | possibilities these techs are going to bring. | | Like what? | | I should preface by saying I think art is great, I know a lot | of artists who struggle to make a living, and it is somewhat | heartbreaking to think of all the poor art students who I guess | we should pay for their educations and will never have careers | now? | cercatrova wrote: | So we should halt progress just so people can keep their | jobs? I hate this argument whenever AI or automation is | brought up, it's probably one of the worst ways to deal with | it. | nightski wrote: | As long as it is progress. What usually happens though is | we get a watered down version of what we had before, but | since it is cheaper and far more profitable the big | companies exploit it to maximum effect. So in reality we | lose a lot. | | I'm hopeful but skeptical at the same time. | exolymph wrote: | We will still need people with taste to drive the machines | and curate output. | | Also, like, this is how the world works. To cite a hackneyed | example, people who worked with horses had to figure | something out when new tech displaced them. So will graphic | designers, illustrators, et al, if indeed AI is a more | competitive option for their services. | visarga wrote: | Not just graphic designers. In NLP, what used to take years | of data labelling, architecture design and model training | is now being done zero-shot by GPT-3. | | Simple automations can be driven by GPT-3 as well. It needs | a representation of the screen and it will automate the | task described in natural language. | elhesuu wrote: | AI generated images are not art. They might use the same | medium visual arts do, but they lack a meaningful vision of | the world. | | Of course defining art is a subject in itself, but I think | that being afraid of AI replacing artists is comparable to | thinking photography would replace them when it was invented. | password54321 wrote: | If you wanted one original character and you wanted shots of | that character from multiple angles with a consistent look, | DALL-E 2 would already fail. | simonw wrote: | But a variant of DALL-E that outputs a textured 3D character | model would work fantastically well. | astrange wrote: | Only if you're sure it didn't memorize a copyrighted | input, and only until everyone gets bored with its style | or you want your assets to look the same in a predictable | way. | smaudet wrote: | True, but artists also sell artwork - my question could be | reframed as: if DALL-E 2 can produce a Rembrandt, is a | Rembrandt worth anything, even emotionally? | SleekEagle wrote: | I think it's worth pointing out that DALL-E 2 _mimics_ | the style of famous artists. The artists had to come up | with the original style in the first place! | | There are highly competent artists that can create highly | convincing copies (fabrications? forgeries?) of famous | paintings. Are these paintings worth anything? No, | because people find value in the specific contribution to | the field of art that the particular painting represents.
| | I think we should look at DALL-E 2 like a highly | competent artist that can produce convincing forgeries | and even mimic the style of famous artists, but cannot | replace the artists themselves. | mupuff1234 wrote: | I doubt it will beat a curated experience any time soon, but I | do see a future where it could assist in creating that curated | experience. | kromem wrote: | For the format, AI Dungeon already does. | | When I played with it, I started a quest as a wizard looking | for a book. | | I was able to cast a tracking spell that led me to a giant | library. | | I could have it read the titles of the books on a shelf in | front of me. | | I could pick up any book and open to a random page and have | it tell me what was in it. | | One was about a little half-elf that had a magic flute that | broke. | | I cast a summoning spell to summon the half-elf and fix its | flute, after which it happily played a song opening a door to | another dimension filled with musical instruments. | | Give me that level of emergent gameplay in a VR open world, | and then just take my money and all my free time, as I'm | never leaving. | | We're simply very, very early on in what's arguably going to | be the most transformative tech since the Internet. People | predicted back then that the slow network which only offered | basic things like email wasn't going to significantly disrupt | things like retail. | | They were only right in that it didn't remain slow and ended | up doing a lot more than email. | | This stuff is getting better way faster than any tech I've | seen, and I used to consult for CEOs at Fortune 500s and sit | on advisory boards on the topic of emerging tech. | | I wouldn't be so quick to bet against it. We really haven't | even started to see what these models can do in application. | kromem wrote: | Yeah, it's getting to the point where I'm starting to see | current game design as long in the tooth by comparison to what | I know is ahead. | | There's a great tech demo a dev did a year or two ago | showcasing GPT-3, speech-to-text, and text-to-speech to have | random NPCs in a VR open world respond to anything the guy said | if he walked up to them and talked to them. | | Procedural generation has taken on almost a "dirty word" | reputation in the past few years in gaming, but as AI continues | to allow for exponential variety at increasingly high quality, | it's going to enable some truly mind-boggling experiences. | | Expect to see MMO models (subscription fee and server-oriented) | but for single-player instanced worlds dynamically generated | around your interactions in them. | | I can't wait to have a party of friends to go on epic | adventures with that are all just AIs I picked up across a | world along the way. | | Less than 20 years away, and possibly even less than 10. | seanwilson wrote: | > "DALL-E 2 works very simply: ... a model called the prior | maps the text encoding to a corresponding image encoding that | captures the semantic information of the prompt contained in the | text encoding. Finally, an image decoding model stochastically | generates an image which is a visual manifestation of this | semantic information." | | > "The fundamental principles of training CLIP are quite simple: | First, all images and their associated captions are passed | through their respective encoders, mapping all objects into an | m-dimensional space." | | Not scared to admit I don't find this simple at all haha. I'm | probably not in the target audience.
I'd love a description that | doesn't assume machine learning basics. Is there one? | pas wrote: | https://ml.berkeley.edu/blog/posts/dalle2/ | | it's "simple" because how it works is "just" brute-fucking-force. | of course coming up with the architecture and making it fast (so | it scales up well) is the challenge. | | and scaling works... because... well, no one knows why (but | likely because it's just a nice architecture for learning, | evolution also converged on it without knowing _why_) | | see also: https://www.gwern.net/Scaling-hypothesis ___________________________________________________________________ (page generated 2022-04-19 23:00 UTC)