[HN Gopher] Dall-E 2
       ___________________________________________________________________
        
       Dall-E 2
        
       Author : yigitdemirag
       Score  : 1040 points
       Date   : 2022-04-06 14:09 UTC (8 hours ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | Traster wrote:
        | To be honest the Girl with a Pearl Earring "variations" look a
        | little bit like a crime against art. It's like the person who
        | built this has no idea why the Girl with a Pearl Earring is good
        | art. "Here's the Girl with a Pearl Earring" - "OK, well here's
        | some girls with turbans"
       | 
       | Art is truth.
        
         | bbbobbb wrote:
          | To be honest it's hard for me to imagine an alternate reality
          | where the 'original' is swapped with one of the 'variations'
          | and the same comment doesn't appear underneath. Why is the
          | 'original' good art?
        
         | sillysaurusx wrote:
         | Maybe.
         | https://cdn.openai.com/dall-e-2/demos/variations/modified/gi...
         | was pretty impressive.
         | 
         | I think the results are being poisoned by the fact that most
         | old paintings have deteriorated colors, so the training data
         | looks nothing like the originals. It's certainly a lot yellower
         | than
         | https://cdn.openai.com/dall-e-2/demos/variations/originals/g...
        
         | eks391 wrote:
          | > It's like the person who built this has no idea why the Girl
          | with a Pearl Earring is good art.
         | 
          | The people didn't program Dall-E how to make art. They taught
          | it to recognize patterns and create something by extrapolating
          | from those patterns, all on its own. So the AI isn't a
          | projection of what they think is good art; it's projecting
          | what it thinks is good art, based on a prompt. The output is
          | its best effort at a feeling, even if the feeling had to be
          | supplied by a living person. So it's still art that's as good
          | as the feeling it came from, with fleeting feelings being
          | lower quality than those that required more time and thought.
        
       | billconan wrote:
        | I'm curious: is this feasible to train (and run inference) on a
        | consumer-level machine, or is it something that can only be done
        | by institutions?
        
       | marviel wrote:
       | It's becoming clear that efficient work in the future will hinge
       | upon one's ability to _accurately describe what one wants_.
       | Unpacking that -- a large piece is the ability to understand all
       | the possible  "pitfalls" and "misunderstandings" that could
       | happen on the way to a shared understanding.
       | 
       | While technical work will always have a place -- I think that
       | much creative work will become more like the _management_ of a
       | team of highly-skilled, niche workers -- with all the
       | frustrations, joys, and surprises that entails.
        
         | killerstorm wrote:
         | No... These models are trained to predict.
         | 
         | You can definitely make them incremental. You can give it a
         | task like "make a more accurate description from initial
         | description and clarification". Even GPT-3-based models
         | available today can do these tasks.
         | 
          | Once this is properly productionized, it will be possible to
          | implement things just by talking with a computer.
        
         | [deleted]
        
         | golergka wrote:
         | > accurately describe what one wants
         | 
         | Isn't that essentially what programming already is?
        
         | armchairhacker wrote:
         | Programming, art, music, is just "describing what you want" in
         | a very specific way. This is describing what you want in a much
         | more vague way.
         | 
          | The upside is that it's more "intuitive" and requires much less
         | detail and technique, as the AI infers the detail and
         | technique. The downside is that it's really hard to know what
         | the AI will generate or get it to generate something really
         | specific.
         | 
         | I believe the future will combine the heuristics of AI-
         | generation with the specificity of traditional techniques. For
         | example, artists may start with a rough outline of whatever
         | they want to draw as a blob of colors (like in some AI image-
         | generation papers). Then they can fill in details using AI
         | prompts, but targeting localized regions/changes and adding
         | constraints, shifting the image until it's almost exactly what
         | they imagined in their head.
        
       | falcor84 wrote:
        | >We've limited the ability for DALL·E 2 to generate ... adult
        | images.
       | 
       | I think that using something like this for porn could potentially
       | offer the biggest benefit to society. So much has been said about
       | how this industry exploits young and vulnerable models. Cheap
       | autogenerated images (and in the future videos) would pretty much
       | remove the demand for human models and eliminate the related
       | suffering, no?
       | 
       | EDIT: typo
        
         | sillysaurusx wrote:
         | Depends whether you think models should be able to generate cp.
         | 
         | It's almost impossible to even give an affirmative answer to
         | that question without making yourself a target. And as much as
         | I err on the side of creator freedom, I find myself shying away
         | from saying yes without qualifications.
         | 
         | And if you don't allow cp, then by definition you require some
         | censoring. At that point it's just a matter of where you
         | censor, not whether. OpenAI has gone as far as possible on the
         | censorship, reducing the impact of the model to "something that
         | can make people smile." But it's sort of hard to blame them, if
         | they want to focus on making models rather than fighting
         | political battles.
         | 
         | One could imagine a cyberpunk future where seedy AI cp images
          | are swapped in an AR universe, generated by models run by
         | underground hackers that scrounge together what resources they
         | can to power the behemoth models that they stole via hacks.
         | Probably worth a short story at least.
         | 
         | You could make the argument that we have fine laws around porn
         | right now, and that we should simply follow those. But it's not
         | clear that AI generated imagery can be illegal at all. The
         | question will only become more pressing with time, and society
         | has to solve it before it can address the holistic concerns you
         | point out.
         | 
         | OpenAI ain't gonna fight that fight, so it's up to EleutherAI
         | or someone else. But whoever fights it in the affirmative will
         | probably be vilified, so it'd require an impressive level of
         | selflessness.
        
           | [deleted]
        
           | chias wrote:
           | Would this not necessarily require training it on a large
           | body of real CSAM? Seems like it would be a non-starter.
        
             | sillysaurusx wrote:
             | Surprisingly no. It knows what a child looks like, and can
             | infer what a naked child looks like from medical imagery.
             | 
             | A child with adult body parts is a whole other class of
             | weirdness that might pop out too.
             | 
             | Models want to surprise us all.
        
           | loufe wrote:
            | There are so many excellent, thought-provoking comments in
            | this thread, but yours caught me especially. Something that
            | came to mind immediately upon reading the release was the
            | potential for this technology to transform literature:
            | adding AI-generated imagery to turn any novel into a visual
            | novel as a premium way to experience the story, akin to
            | composing a D-Box seat response for a modern movie. I was
            | imagining the cyberpunk future story you were elaborating,
            | which is really compelling, told in such a way, and couldn't
            | help but smile.
        
             | sillysaurusx wrote:
             | Please write it! I'd love to read one.
        
             | aryamaan wrote:
             | In the same theme, I liked the comments of both of you.
             | 
              | Another use case could be making it easier/automatic to
              | create comics. You describe what the background should be,
              | what the characters should be doing, and the dialogue.
              | Boom, you have a good-enough comic.
             | 
             | -----------
             | 
              | Reading as a medium has not evolved with technology.
              | Creating the imagery happens in the reader's mind. It's no
              | surprise that some people enjoy doing that (and enjoy
              | watching that imagery) and others do not.
             | 
             | This could be a helping brain to create those imageries.
             | 
             | -----------
             | 
              | Now imagine reading stories to your child. Actually,
              | creating stories for your child, where they are the
              | characters in the stories. Having a visual element to it
              | is definitely going to be a premium experience.
        
         | GauntletWizard wrote:
          | Religious people believe that porn harms not only the models
          | but also the user. I happen to agree, despite being a porn
          | user - porn is a form of simulated, not-real stimulation. Porn
          | is harmful to the user the same way any form of delusion is:
          | it associates pleasure with stimulation that does not fulfil
          | any basic or even higher-level needs, and is unsustainable.
          | Porn is somewhere on the same scale as wireheading [1].
         | 
         | That doesn't mean that it's all bad, and that there's no
         | recreational use for it. We have limits on the availability of
         | various other artificial stimulants. We should continue to have
         | limits on the availability of porn. Where to draw that line is
         | a real debate.
         | 
         | [1] https://en.wikipedia.org/wiki/Wirehead_(science_fiction)
        
           | [deleted]
        
         | Siira wrote:
         | The problem might be that people are simply lying. Their real
         | reasons are religious/ideological, but they cite humanitarian
         | concerns (which their own religious stigma is partly
         | responsible for).
        
         | thom wrote:
         | People take their experiences of porn into real relationships,
         | so I do not think this removes suffering overall, no.
        
         | AYBABTME wrote:
          | Iain M. Banks' "Surface Detail" would like to have a word with
          | you.
         | 
          | This author's books are great at putting these sorts of moral
          | ideas to the test in a sci-fi context. This specific tome
          | portrays virtual wars and virtual "hells". The hope is to be
          | more civilized than waging real war or torturing real living
          | entities. However, some protagonists argue that virtual life
          | is indistinguishable from real life, and so sacrificing
          | virtual entities to save "real" ones is a fallacy.
         | 
         | Or some such, it's been a while.
        
         | cm2012 wrote:
         | I suspect that if a free version of this comes out and allows
         | adult image generation, 90% of what it will be used for is
         | adult stuff (see the kerfuffle with AIDungeon).
         | 
         | I can get why the people who worked hard on it and spent money
         | building it don't want to be associated with porn.
        
       | albertzeyer wrote:
       | Some initial video by Yannic Kilcher:
       | https://www.youtube.com/watch?v=gGPv_SYVDC8
        
       | mario143 wrote:
       | Yeah, I mean you're right that ultimately the proof is in the
       | pudding.
       | 
       | But I do think we could have guessed that this sort of approach
       | would be better (at least at a high level - I'm not claiming I
       | could have predicted all the technical details!). The previous
       | approaches were sort of the best that people could do without
       | access to the training data and resources - you had a pretrained
       | CLIP encoder that could tell you how well a text caption and an
       | image matched, and you had a pretrained image generator (GAN,
       | diffusion model, whatever), and it was just a matter of trying to
       | force the generator to output something that CLIP thought looked
       | like the caption. You'd basically do gradient ascent to make the
       | image look more and more and more like the text prompt (all the
       | while trying to balance the need to still look like a realistic
       | image). Just from an algorithm aesthetics perspective, it was
       | very much a duct tape and chicken wire approach.
       | 
       | The analogy I would give is if you gave a three-year-old some
       | paints, and they made an image and showed it to you, and you had
       | to say, "this looks like a little like a sunset" or "this looks a
       | lot like a sunset". They would keep going back and adjusting
       | their painting, and you'd keep giving feedback, and eventually
       | you'd get something that looks like a sunset. But it'd be better,
       | if you could manage it, to just teach the three-year-old how to
       | paint, rather than have this brute force process.
       | 
       | Obviously the real challenge here is "well how do you teach a
       | three-year-old how to paint?" - and I think you're right that
       | that question still has a lot of alchemy to it.
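The "keep adjusting, keep giving feedback" loop described above can be sketched with a toy stand-in for CLIP. Here `clip_score` plays the role of CLIP's caption/image similarity and a short list of floats plays the role of the generator's image; all names and the scorer itself are illustrative, not OpenAI's actual code.

```python
# Toy sketch of the pre-DALL-E-2 "CLIP guidance" loop: a scorer judges how
# well the image matches the caption, and gradient ascent on that score
# pushes the image toward the prompt.

def clip_score(image, target):
    """Stand-in for CLIP: higher when the image matches the 'caption'."""
    return -sum((p - t) ** 2 for p, t in zip(image, target))

def guided_ascent(image, target, steps=200, lr=0.05, eps=1e-4):
    """Repeatedly nudge the image toward whatever the scorer rewards,
    using finite-difference gradients (the three-year-old adjusting the
    painting after each round of feedback)."""
    image = list(image)
    for _ in range(steps):
        for i in range(len(image)):
            bumped = list(image)
            bumped[i] += eps
            grad = (clip_score(bumped, target) - clip_score(image, target)) / eps
            image[i] += lr * grad  # gradient ascent on the score
    return image

target = [0.9, 0.1, 0.5]   # what the "caption" looks like to the scorer
start = [0.0, 0.0, 0.0]    # the first attempt
final = guided_ascent(start, target)
# final ends up close to target without the optimizer ever seeing it directly
```

The comment's point is visible in the structure: the generator never learns to paint; it just brute-forces its way up the scorer's gradient, which is why the real DALL-E 2 approach of training the generator directly is cleaner.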
        
       | EZ-Cheeze wrote:
       | "Computer, render Bella and Gigi Hadid playing tennis in bikinis"
        
       | KevinGlass wrote:
        | Something about this makes me nauseous. Perhaps it's the fact
        | that soon the market value for creatives is going to fall to a
        | hair above zero for all but the most famous. We will be all the
        | poorer for it when 95% of the images you see are AI generated.
        | There will be niches, of course, but in a few short years it'll
        | be over for a huge swathe of creative professionals who are
        | already struggling.
        | 
        | Some of the images also hit me with a creep factor, like the
        | bears or the corgis in the art gallery, but that may be only
        | because I know they're AI generated.
        
         | idleproc wrote:
          | I imagine it will affect artists much the same way WordPress
          | has affected web designers.
         | 
         | Maybe everyone will have an AI image as their desktop
         | wallpaper, but if you've got cash you'll want something with
         | provenance and rarity to brag about.
         | 
         | Also, I think creatives are valued for their imagination. If
         | you wanted something decent, would you pay someone to sift
         | through a million AI generated images to find a gem, or just
         | pay an artist you like to create one for you?
        
           | bufferoverflow wrote:
           | > you'll want something with provenance and rarity to brag
           | about.
           | 
           | 1) That is a tiny share of the market. Most of the market is
           | - I have a game / online publication / book, and I need an
           | illustration xyz. Which this AI seems to solve.
           | 
           | 2) how do you even prove your rare art wasn't painted by an
           | AI?
        
             | idleproc wrote:
             | 1) Sure there's a lot of work for that kind of thing but
             | creatives typically earn a pittance. I doubt an AI could
             | meet your specific requirements without having to spend
             | hours(?) tweaking it or sifting through countless
             | variations for the 'one'.
             | 
             | 2) Because we haven't built a machine that can paint (etc.)
             | with traditional materials like a skilled artist?
        
         | typon wrote:
         | I paid $1500 for a commissioned painting from an artist I
         | respect and follow as a birthday present for a friend. The
         | painting meant something to me because I worked with the artist
         | to have some input about what kind of a person my friend is,
         | what kind of features I want to see in the painting and how I
         | want it to feel. The artist gave me 5 different sketches and we
         | had tons of back and forth. The process and the act of creating
         | the painting on a canvas from someone I respect is what I paid
         | for.
         | 
         | Even if an AI could generate an exactly equivalent painting, I
         | would pay $0 for it. It wouldn't mean anything to me.
        
         | chpatrick wrote:
         | Just wait until they figure out music.
        
         | Applejinx wrote:
         | Not exactly. All the ideas put forth in these demos are really
         | arbitrary, with nothing whatsoever to say. Generating crap art
         | becomes more and more effortless: we've seen this in music as
         | well.
         | 
         | Jumping out of the conceptual box to generate novel PURPOSE is
         | not the domain of a Dall-E 2. You've still gotta ask it for
         | things. It's a paintbrush. Without a coherent story, it's an
         | increasingly impressive stunt (or a form of very sophisticated
         | 'retouching brush').
         | 
         | If you can imagine better than the next guy, Dall-E 2 is your
         | new tool for expression. But what is 'better'?
        
           | jupp0r wrote:
           | This reminds me of an art class in high school in the early
           | 2000s where I handed in a printout of a 3d generated image
           | (painstakingly modeled and rendered in software over the
           | whole weekend by me) and the teacher looked at me and told me
           | that's not art because it's "computer generated" and I didn't
           | "even use my hands" to make it. Even as a teenager, the idea
            | that art is defined by how it's made, rather than by being a
            | way for the artist to express intention in whatever way they
            | see fit, seemed really reductionist and almost vulgar to me.
           | 
              | Maybe lots of artists of the future will actually use AI
              | models to express their inner thoughts and desires in a
              | way that touches something in their audience. It will
              | still be art.
        
             | throwaway71271 wrote:
             | 'art' comes from 'artem' which means 'skill', which is the
             | root of 'artificial' (https://www.etymonline.com/word/art
             | and https://www.etymonline.com/word/artificial)
             | 
             | your teacher was wrong
             | 
              | I had a friend who didn't get credit for his design work
              | because he used Photoshop instead of pen and paper, for a
              | similar reason. I still find it amazing that a teacher
              | would say such a thing.
        
               | andybak wrote:
               | > 'art' comes from 'artem' which means 'skill', which is
               | the root of 'artificial'
               | 
               | His teacher was wrong but "argument from etymology" is
               | surely a fallacy.
        
         | amelius wrote:
         | Can I opt-out from ever seeing AI generated images please?
        
         | 323 wrote:
          | The same thing was said when book printing was invented: that
          | we would lose the fabulous scribes who manually duplicated
          | books with a human touch, replacing them with soulless
          | mechanical machines.
          | 
          | Or when synthesizers and computer music were invented: that
          | they would displace talented musicians who know how to play an
          | instrument, and that now everybody without a musical education
          | would be able to produce music, thus devaluing actual
          | musicians.
        
         | alcover wrote:
         | >  for all but the most famous
         | 
         | OK DALL-E, generate our logo in the style of ${most famous}
        
         | axg11 wrote:
         | I really don't agree. When I work with a creative I'm not
         | working with them because of their content generation skills.
         | I'm working with them because of their taste and curation
         | ability that results in the end product.
         | 
         | The nature of creative work will certainly change, creatives
         | will adopt tools such as Dall-E 2. In certain narrow cases they
         | might be replaced, such as if you are asking a creative to
         | generate a very specific image, but how often is that the case?
         | The majority of the time tools such as Dall-E 2 will act as an
         | accelerator for creatives and help them increase their output.
        
         | lofatdairy wrote:
         | Perhaps a more optimistic way of looking at it: When mass
         | production became available to art, the idea of an "artwork"
         | had to be abstracted from a unique piece (Walter Benjamin gives
         | the example of a statue of Venus, which has value in its
         | uniqueness) to the idea of art as the output of some process.
         | Each piece has no claim to authenticity, and the very idea of
         | an "original" would be antithetical to its production.
         | 
          | I think art will survive. Just as photography didn't kill
          | painting, the idea of art might simply begin to encompass this
          | new means of production, which no longer requires the steady
          | hand but still requires a discerning eye. Sure, we might say
          | that the "artist" is simply a curator, picking which
          | algorithmic output is most worthy of display, but these
          | distinctions have historically been fluid, and challenging
          | ideas of art has long been one of art's functions as well.
        
         | dragonwriter wrote:
          | > Perhaps it's the fact that soon the market value for
          | creatives is going to fall to a hair above zero for all but
          | the most famous.
         | 
         | But...that's always been the case for creatives.
        
         | throwaway675309 wrote:
          | Nonsense. This is merely a tool that lowers the barrier to
          | entry for producing imagery.
          | 
          | By the same logic you should also complain about any number of
          | IDEs, development tools, WordPress, or game-making systems
          | like RPG Maker or Unity. After all, if anyone can just
          | leverage a free physics and collision system, without a
          | complete understanding of rigid-body Newtonian mechanics, to
          | avoid rolling their own engine, it'll all be too uniform.
        
         | TaupeRanger wrote:
         | By "creatives" you seem to mean "people who drum up the
         | equivalent of elevator music for ads and blogs". This will not
         | remotely replace any working "creative" people that I know.
        
           | pingeroo wrote:
           | Except it will only get more powerful with time, probably at
           | an accelerating pace. Everyone always downplays these
           | legitimate fears about AI, pointing out how "it can't do X".
           | They always forget to put the "yet" at the end of that
           | sentence.
        
             | [deleted]
        
             | TaupeRanger wrote:
             | The person I responded to literally made the claim that it
             | would happen imminently...
        
       | zitterbewegung wrote:
        | I don't want to dismiss this new model and its achievements, but
        | I feel we are getting to the point where, just as we saw a split
        | between open source and closed source software, another split is
        | forming between open and closed ML models. I think that larger
        | and larger models will carry disclaimers restricting commercial
        | use (a great deal of academic and NVIDIA models do this), while
        | OpenAI just puts them behind an API with rules:
        | 
        | "Curbing Misuse: Our content policy does not allow users to
        | generate violent, adult, or political content, among other
        | categories. We won't generate images if our filters identify
        | text prompts and image uploads that may violate our policies. We
        | also have automated and human monitoring systems to guard
        | against misuse."
        
         | asxd wrote:
         | They're pretty strict about usage:
         | 
         | -
         | https://github.com/openai/dalle-2-preview/blob/main/system-c...
         | 
         | -
         | https://github.com/openai/dalle-2-preview/blob/main/system-c...
        
         | jdrc wrote:
          | It should be possible to create open source versions;
          | researchers will find a way if something is cool enough.
        
       | zackmorris wrote:
       | Apologies for an open-ended question but: does anyone know if
       | there is a term for something like Turing-completeness within AI,
       | where a certain level of intelligence can simulate any other type
       | of intelligence like our brains do?
       | 
       | For example, using DeMorgan's theorem, we can build any logic
       | circuit out of all NAND or NOR gates:
       | 
       | https://www.electronics-tutorials.ws/boolean/demorgan.html
       | 
       | https://en.wikipedia.org/wiki/NAND_logic
       | 
       | https://en.wikipedia.org/wiki/NOR_logic
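The functional completeness claimed in those links can be checked in a few lines. This is a minimal sketch (the helper names are mine): every standard gate is built from NAND alone, with De Morgan's law giving OR.

```python
# NAND is functionally complete: every Boolean function can be built
# from NAND gates alone.

def nand(a, b):
    return not (a and b)

def not_(a):        # NOT x == NAND(x, x)
    return nand(a, a)

def and_(a, b):     # AND == NOT(NAND(a, b))
    return nand(nand(a, b), nand(a, b))

def or_(a, b):      # OR == NAND(NOT a, NOT b), by De Morgan
    return nand(nand(a, a), nand(b, b))

def xor(a, b):      # XOR from four NANDs
    m = nand(a, b)
    return nand(nand(a, m), nand(b, m))

# Exhaustive truth-table check
for a in (False, True):
    for b in (False, True):
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
        assert xor(a, b) == (a != b)
```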
       | 
        | Dall-E 2's level of associative comprehension is so far beyond
        | the old ELIZA-style psychology bots in the console pretending to
        | be people that I can't help but wonder if it's reached a level
        | where it can make any association.
       | 
       | For example, I went to an AI talk about 5 years ago where the guy
       | said that any of a dozen algorithms like K-Nearest Neighbor,
       | K-Means Clustering, Simulated Annealing, Neural Nets, Genetic
       | Algorithms, etc can all be adapted to any use case. They just
       | have different strengths and weaknesses. At that time, all that
       | really mattered was how the data was prepared.
       | 
        | I guess fundamentally my question is: when will AGI start to
        | become prevalent, rather than these special-purpose tools like
        | GPT-3 and Dall-E 2? Personally I give it less than 10 years of
        | actual work, maybe much less. I just mean that, to me, Dall-E 2
        | is already orders of magnitude more complex than what's required
        | to run a basic automaton to free humans from labor. So how can
        | we adapt these AI experiments to get real work done?
        
         | robertsdionne wrote:
         | https://en.wikipedia.org/wiki/Universal_approximation_theore...
        
         | teaearlgraycold wrote:
         | > does anyone know if there is a term for something like
         | Turing-completeness within AI, where a certain level of
         | intelligence can simulate any other type of intelligence like
         | our brains do?
         | 
         | Artificial General Intelligence
        
         | dqpb wrote:
          | Juergen Schmidhuber predicts the "Omega point" of
          | technological development (including AGI) to be around 2040.
         | 
         | https://youtu.be/pGftUCTqaGg
         | 
         | The MIT Limits to Growth study predicts the collapse of global
         | civilization around 2040
         | 
         | https://www.vice.com/amp/en/article/z3xw3x/new-research-vind...
        
         | causticcup wrote:
         | Almost everything stated here is simply wrong or misinformed.
         | 
         | >For example, I went to an AI talk about 5 years ago where the
         | guy said that any of a dozen algorithms like K-Nearest
         | Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets,
         | Genetic Algorithms, etc can all be adapted to any use case.
         | They just have different strengths and weaknesses. At that
         | time, all that really mattered was how the data was prepared.
         | 
         | How do you suppose KNN is going to generate photorealistic
         | images? I don't understand the question here
         | 
         | >I guess fundamentally my question is, when will AGI start to
         | become prevalent, rather than these special-purpose tools like
         | GPT-3 and Dall-E 2?
         | 
          | Actual AGI research is basically non-existent, and GPT-3 and
          | Dall-E 2 are not AGI-level tools.
         | 
         | >Personally I give it less than 10 years of actual work, maybe
         | less
         | 
         | Lol...
         | 
         | >I just mean that to me, Dall-E 2 is already orders of
         | magnitude more complex than what's required to run a basic
         | automaton to free humans from labor.
         | 
         | Categorically incorrect
        
       | agloeregrets wrote:
        | The most interesting items to me are the variations on the
        | garden shop and bathroom sink ideas. The realism of these
        | reveals the AI's lack of intuition about the requirements. This
        | makes for a number of nonsensical designs that look right at
        | first glance. This sink lacks sensible faucets:
        | https://cdn.openai.com/dall-e-2/demos/variations/modified/ba...
        | 
        | This doorway is downright impossible:
        | https://cdn.openai.com/dall-e-2/demos/variations/modified/fl...
        
         | dqpb wrote:
         | It looks to me like the faucet sprays water sideways toward the
         | bowl, which is genius, because then you aren't bumping up
         | against it when you're washing your hands!
        
         | Spinnaker_ wrote:
         | "Doorway in the style of Escher"
        
         | momojo wrote:
         | Great point. When I saw the shadows and reflections, I thought
         | it had developed a primitive understanding of physical logic.
         | Now I'm not so sure.
         | 
         | At this point, it still seems like it's pushing pixels around
         | until it's "good enough" when you squint at it.
        
       | aaron695 wrote:
        
       | minimaxir wrote:
       | A few comments by someone who's spent way too much time in the
       | AI-generated space:
       | 
        | * I recommend reading the Risks and Limitations section that
        | came with it because it's very thorough:
        | https://github.com/openai/dalle-2-preview/blob/main/system-c...
       | 
        | * Unlike GPT-3, my read of this announcement is that OpenAI does
        | not intend to commercialize it, and that access via the waitlist
        | is indeed more for testing its limits (and, as noted,
        | commercializing it would much more likely lead to interesting
        | legal precedent). Per the docs, access is _very_ explicitly
        | limited:
        | (https://github.com/openai/dalle-2-preview/blob/main/system-c...
        | )
       | 
       | * A few months ago, OpenAI released GLIDE (
       | https://github.com/openai/glide-text2im ) which uses a similar
       | approach to AI image generation, but suspiciously never received
       | a fun blog post like this one. The reason for that in retrospect
       | may be "because we made it obsolete."
       | 
        | * The images in the announcement are still cherry-picked,
        | which is presumably why they compared DALL-E 1 vs. DALL-E 2 on
        | non-cherry-picked images.
       | 
       | * Cherry-picking is relevant because AI image generation is still
       | slow unless you do real shenanigans that likely compromise image
        | quality, although OpenAI likely has better infra to handle
       | large models as they have demonstrated with GPT-3.
       | 
       | * It appears DALL-E 2 has a fun endpoint that links back to the
       | site for examples with attribution:
       | https://labs.openai.com/s/Zq9SB6vyUid9FGcoJ8slucTu
        
         | bufferoverflow wrote:
         | Not-so-open.ai
        
           | qeternity wrote:
           | open-your-wallet.ai
        
           | btdmaster wrote:
           | https://www.eleuther.ai (text, not images, but free as in
           | freedom)
        
             | [deleted]
        
             | refulgentis wrote:
              | Katherine Crowson is @ EleutherAI & IMHO is indisputably
              | the person most responsible for the advances in
              | text=>image generation. Dall-E 2 is Dall-E plus her
              | insight to use diffusion; the intermediate proof of
              | concept of diffusion + Dall-E is GLIDE.
             | 
             | https://twitter.com/RiversHaveWings &
             | https://github.com/crowsonkb
        
       | bradgessler wrote:
       | Could somebody build this for SVG icons? I'd invest in it.
        
         | applgo443 wrote:
         | What do you want?
        
       | nope96 wrote:
       | Is there an 'explain it like I'm 15' for how this works? It seems
       | like black magic. I've been a computer hobbyist since the late
       | 1980's and this is the first time I cannot explain how a computer
       | does what it does. Absolutely the most amazing thing I've ever
       | seen, and I have zero clue how it works.
        
         | drcode wrote:
         | Imagine asking it to generate a picture for "duck wearing a hat
         | on Mars":
         | 
         | First, it creates a random 10x10 pixel blurry image and asks a
         | neural net: "Could this be a duck wearing a hat on Mars?" and
         | the neural net replies "No, because all the pictures I've ever
         | seen of Mars have lots of red color in them" so the system
         | tweaks the pixels to make them more red, put some pixels in the
         | center that have a plausible duck color, etc.
         | 
         | After it has a 10x10 image that is a plausible duck on Mars,
         | the system scales the image to 20x20 pixels, and then uses 4
         | different neural nets on each corner to ask "Does this look
         | like the upper/lower left/right corner of a duck wearing a hat
         | on Mars?" Each neural net is just specialized for one corner of
         | the image.
         | 
         | You keep repeating this with more neural nets until you have a
         | pretty 1000x1000 (or whatever) image.
        
           | refulgentis wrote:
           | Not the case, though in a handwave-y way, same idea - instead
            | of iteratively scaling, you're iteratively denoising. See
            | here, which links out to a Cornell NLP PhD describing it in
            | even more detail:
            | https://www.jpohhhh.com/articles/inflection-point-ml-art
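In code, the iterative-denoising idea can be sketched roughly like this (toy numbers; `predict_noise` and the step size are hypothetical stand-ins for the trained network and the real noise schedule):

```python
import numpy as np

def predict_noise(noisy_image, step):
    # Stand-in for a trained neural net that estimates the noise
    # present in the image at this step. Here we "cheat" and assume
    # the clean image is all zeros, so the image itself is the noise.
    return noisy_image

def denoise(shape=(8, 8), steps=50, rng=np.random.default_rng(0)):
    x = rng.normal(size=shape)          # start from pure noise
    for t in range(steps):
        eps = predict_noise(x, t)       # net's guess at the noise
        x = x - 0.1 * eps               # remove a little of it
    return x

img = denoise()
print(abs(img).max())  # shrinks toward the "clean" all-zero image
```

A real diffusion model replaces `predict_noise` with a network conditioned on the text prompt, so the loop converges to an image matching the caption rather than to zeros.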
        
         | karmasimida wrote:
         | Diffusion models are indeed pretty magical.
        
         | eks391 wrote:
         | Research Deep Learning. That's the technique they are using to
          | generate the images. There's a lot of applications. Once you
         | understand _how_ it works, look up Two Minute Papers to see
         | what it is being used for. He covers more than just deep
         | learning algorithms, but his videos on deep learning are quite
         | insightful on the potentials of this technology.
        
         | joshcryer wrote:
         | I'm with you there but we still don't know how it works, just
         | that it does. The method though is you take a bunch of images,
         | you plug them into a multi dimensional array (a nice way of
         | saying a tensor), have some kind of tagging system, and when
         | you ask the system for an answer, it will put one out for you.
         | So for example in the astronaut riding the horse, there is, on
         | some level, a picture of a horse with those similar pixels,
         | that exists in the data of some object tagged 'horse.' Likewise
         | with astronaut. What is important is that the data sets are
         | absolutely massive, with billions of parameters.
         | 
          | Here's more of a 'not for 15 year olds' explanation:
         | https://ml.berkeley.edu/blog/posts/dalle2/
        
         | Imnimo wrote:
         | Here is my extremely rough ELI-15. It uses some building blocks
         | like "train a neural network", which probably warrant
         | explanations of their own.
         | 
         | The system consists of a few components. First, CLIP. CLIP is
         | essentially a pair of neural networks, one is a 'text encoder',
         | and the other is an 'image encoder'. CLIP is trained on a giant
         | corpus of images and corresponding captions. The image encoder
         | takes as input an image, and spits out a numerical description
         | of that image (called an 'encoding' or 'embedding'). The text
         | encoder takes as input a caption and does the same. The
         | networks are trained so that the encodings for a corresponding
         | caption/image pair are close to each other. CLIP allows us to
         | ask "does this image match this caption?"
         | 
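The "does this image match this caption?" question boils down to comparing embedding vectors. A toy sketch (both encoders here are hypothetical bag-of-words stand-ins, not real CLIP, which uses trained networks over pixels and tokens):

```python
import numpy as np

def bucket(word):
    # Toy deterministic "hash" of a word into one of 64 slots.
    return sum(map(ord, word)) % 64

def encode_text(caption):
    # Stand-in for CLIP's text encoder: in reality a trained
    # transformer, here just a normalized bag-of-words vector.
    v = np.zeros(64)
    for word in caption.lower().split():
        v[bucket(word)] += 1.0
    return v / np.linalg.norm(v)

def encode_image(image_tags):
    # Stand-in for CLIP's image encoder. A real encoder sees pixels;
    # here an "image" is just a list of tags, so matching
    # caption/image pairs land near each other, as CLIP training
    # arranges.
    return encode_text(" ".join(image_tags))

def similarity(caption, image_tags):
    # Cosine similarity between the two embeddings: CLIP's
    # "does this image match this caption?" check.
    return float(encode_text(caption) @ encode_image(image_tags))

print(similarity("duck wearing a hat", ["duck", "hat"]))    # high
print(similarity("duck wearing a hat", ["car", "street"]))  # low
```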
         | The second part is an image generator. This is another neural
         | network, which takes as input an encoding, and produces an
         | image. Its goal is to be the reverse of the CLIP image encoder
         | (they call it unCLIP). The way it works is pretty complicated.
         | It uses a process called 'diffusion'. Imagine you started with
         | a real image, and slowly repeatedly added noise to it, step by
         | step. Eventually, you'd end up with an image that is pure
         | noise. The goal of a diffusion model is to learn the reverse
         | process - given a noisy image, produce a slightly less noisy
         | one, until eventually you end up with a clean, realistic image.
         | This is a funny way to do things, but it turns out to have some
         | advantages. One advantage is that it allows the system to build
         | up the image step by step, starting from the large scale
         | structure and only filling in the fine details at the end. If
         | you watch the video on their blog post, you can see this
         | diffusion process in action. It's not just a special effect for
         | the video - they're literally showing the system process for
         | creating an image starting from noise. The mathematical details
         | of how to train a diffusion system are very complicated.
         | 
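The forward "slowly repeatedly add noise" process described above can be sketched in a few lines (toy schedule and numbers, not the one OpenAI uses):

```python
import numpy as np

rng = np.random.default_rng(0)
image = np.linspace(0.0, 1.0, 64).reshape(8, 8)  # stand-in "real image"

# Forward diffusion: repeatedly shrink the signal a little and blend
# in fresh Gaussian noise, step by step.
trajectory = [image]
x = image
for t in range(200):
    x = 0.98 * x + 0.2 * rng.normal(size=x.shape)
    trajectory.append(x)

# Early steps still resemble the original; by the end the signal is
# drowned out and the array is close to pure noise.
print(np.std(trajectory[0]), np.std(trajectory[-1]))
```

The model is trained to run this movie backwards: given `trajectory[t]`, produce something like `trajectory[t-1]`.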
         | The third is a "prior" (a confusing name). Its job is to take
         | the encoding of a text prompt, and predict the encoding of the
         | corresponding image. You might think that this is silly - CLIP
         | was supposed to make the encodings of the caption and the image
         | match! But the space of images and captions is not so simple -
         | there are many images for a given caption, and many captions
         | for a given image. I think of the "prior" as being responsible
         | for picking _which_ picture of  "a teddy bear on a skateboard"
         | we're going to draw, but this is a loose analogy.
         | 
         | So, now it's time to make an image. We take the prompt, and ask
         | CLIP to encode it. We give the CLIP encoding to the prior, and
         | it predicts for us an image encoding. Then we give the image
         | encoding to the diffusion model, and it produces an image. This
         | is, obviously, over-simplified, but this captures the process
         | at a high level.
         | 
         | Why does it work so well? A few reasons. First, CLIP is really
         | good at its job. OpenAI scraped a colossal dataset of
         | image/caption pairs, spent a huge amount of compute training
          | it, and came up with a lot of clever training schemes to make
         | it work. Second, diffusion models are really good at making
         | realistic images - previous works have used GAN models that try
         | to generate a whole image in one go. Some GANs are quite good,
         | but so far diffusion seems to be better at generating images
         | that match a prompt. The value of the image generator is that
         | it helps constrain your output to be a realistic image. We
         | could have just optimized raw pixels until we get something
         | CLIP thinks looks like the prompt, but it would likely not be a
         | natural image.
         | 
         | To generate an image from a prompt, DALL-E 2 works as follows.
         | First, ask CLIP to encode your prompt. Next, ask the prior what
         | it thinks a good image encoding would be for that encoded
         | prompt. Then ask the generator to draw that image encoding.
         | Easy peasy!
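The three-stage recap above, as a toy sketch (every function is a hypothetical stand-in for a trained network; none of this is OpenAI's actual code):

```python
import numpy as np

rng = np.random.default_rng(42)

def clip_text_encoder(prompt):
    # Stage 1 stand-in: map a caption to a small embedding vector
    # (real CLIP uses a trained transformer).
    v = np.array([float(sum(map(ord, w))) for w in prompt.split()[:4]])
    v = np.pad(v, (0, 4 - len(v)))
    return v / np.linalg.norm(v)

def prior(text_embedding):
    # Stage 2 stand-in: predict an image embedding from the text
    # embedding; the added noise stands in for "pick one of the many
    # possible images for this caption".
    return text_embedding + 0.01 * rng.normal(size=4)

def diffusion_decoder(image_embedding, steps=30):
    # Stage 3 stand-in: start from pure noise and iteratively nudge
    # it toward an "image" consistent with the embedding.
    x = rng.normal(size=(8, 8))
    target = np.outer(image_embedding, np.ones(8)).repeat(2, axis=0)
    for _ in range(steps):
        x = x + 0.2 * (target - x)  # one tiny denoising step
    return x

prompt = "a teddy bear on a skateboard"
img = diffusion_decoder(prior(clip_text_encoder(prompt)))
print(img.shape)  # (8, 8)
```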
        
           | 6gvONxR4sf7o wrote:
           | Any pointers on getting up to speed on diffusion models? I
           | haven't encountered them in my corner of the ML world, and
           | googling around for a review paper didn't turn anything up.
        
             | momenti wrote:
             | https://www.youtube.com/watch?v=W-O7AZNzbzQ
             | 
             | See the linked papers if you don't like videos.
        
             | Imnimo wrote:
             | I recommend this blog post:
             | 
             | https://lilianweng.github.io/posts/2021-07-11-diffusion-
             | mode...
             | 
             | Personally, I find the core diffusion papers pretty dense
             | and difficult to follow, so the blog post is where I'd
             | begin.
             | 
             | https://arxiv.org/pdf/1503.03585.pdf
             | 
             | This paper is a decent starting point on the literature
             | side, but it's a doozy.
             | 
             | Both the paper and blog post are pretty math heavy. I have
             | not yet found a really clear intuitive explanation that
             | doesn't get down in the weeds of the math, and it took me a
             | long time to understand what the hell the math is trying to
             | say (and there are some parts I still don't fully
             | understand!)
        
       | victor_e wrote:
       | Wow - mindblowing and kinda scary really.
        
       | imperio59 wrote:
       | What happens when they train this thing to make videos? We're
       | about to be dealing with a flood of AI-generated visual/video
       | content. We already have to deal with text bots everywhere...
       | wow.
        
         | eks391 wrote:
         | I'm excited for when that happens. I didn't think of the
         | malicious uses, which now that you brought it up I can think of
         | many, but I still think the pros are worth the cons
        
       | whywhywhywhy wrote:
       | I never actually found a way to use Dall-E 1, did they ever Open
       | that up to people outside their building?
        
       | skybrian wrote:
       | Sam Altman took some user requests on Twitter:
       | https://twitter.com/sama/status/1511724264629678084
        
       | sydthrowaway wrote:
       | gamechanger
        
       | dang wrote:
       | Related and kind of fun:
       | 
       |  _Sam Altman demonstrates Dall-E 2 using twitter suggestions_ -
       | https://news.ycombinator.com/item?id=30933478 - April 2022 (3
       | comments)
        
       | frakkingcylons wrote:
       | Impressive results no doubt, but I'm reserving judgment until
       | beta access is available. These are probably the best images that
       | it can generate, but what I'm most interested in is the average
       | case.
        
       | narrator wrote:
       | While we're being distracted by endless social media and
       | meaningless news, AI technology is advancing at a mind blowing
       | pace. I'd keep my eye on that ball instead of "the current
       | thing."
        
         | The_rationalist wrote:
         | Thank you narrative voice
        
       | [deleted]
        
       | mrfusion wrote:
       | Is this bringing us closer to combining image and language
       | understanding within one model?
        
         | beernet wrote:
         | Check out MAGMA for that:
         | https://news.ycombinator.com/item?id=30699776
        
       | impostervt wrote:
       | Very cool stuff. For me, the most interesting was the ability to
       | take a piece of art and generate variations of it.
       | 
       | Have a favorite painter? Here's 10,000 new paintings like theirs.
        
         | photochemsyn wrote:
         | Well, one of my favorite painters is Henri Rousseau, and one of
          | his great paintings is War, 1894:
         | 
         | https://www.henrirousseau.net/war.jsp
         | 
         | However, this painting has themes of violence and politics plus
         | some nude dead bodies, so it violates the content policy: "Our
         | content policy does not allow users to generate violent, adult,
         | or political content, among other categories."
         | 
         | So what you'd get is some kind of sanitized watered-down tepid
          | version of Rousseau, the kind of boring drivel suitable for
         | corporate lobbies everywhere, guaranteed not to offend or
         | disturb anyone. It's difficult to find words... horrific?
         | dystopian? atrocious? No, just no.
        
           | corysama wrote:
           | They are being rightly cautious. It's going to take time to
           | figure out good practice with these tools. Everyone calling
           | out basic caution as "dystopian" is really over the top.
           | 
           | I've been using tools like this for over a year now. Even
            | with a filtered dataset and interface, they can make
           | images that would make the Fangoria crowd blush if you put
           | the slightest effort into it.
           | 
           | It's one thing to be able to make brain-wrenching images with
           | a lot of photoshop effort (or digging hard enough in the dark
            | corners of the internet). It's another thing entirely to
            | give anyone the ability to spew out thousands of them
            | trivially.
        
           | cwillu wrote:
           | "Criticize?! It is meant to draw blood! It is Art! Art!"
        
         | throwaway675309 wrote:
         | I was just thinking the same thing, how awesome would it be to
         | be able to use this in conjunction with the Samsung frame in
         | art gallery mode and have it just generate novel paintings in
         | the style of your favorite painters.
        
         | pingeroo wrote:
         | That was also my favourite concept, especially with OpenAI
         | Jukebox (https://openai.com/blog/jukebox/). The idea of having
         | new music in the style of your favourite artist is amazing.
         | 
         | However the fidelity of their music AI kinda sucks at this
         | point, but I'm sure we'll get pitch perfect versions of this
         | concept as the singularity gets closer :)
        
       | uses wrote:
       | Is anyone looking into what it means when we can generate
       | infinite amounts of human-like work without effort or cost?
       | 
       | > Curbing Misuse [...]
       | 
       | That's great, nowadays the big AI is controlled by mostly
       | benevolent entities. How about when someone real nasty gets a
       | hold of it? In a decade the models anyone can download will make
        | today's GPT-3 etc. look like Pong, right?
       | 
       | Recommender systems etc are already shaping society and culture
       | with all kinds of unintended effects. What happens when mindless
       | optimizing models start generating the content itself?
        
       | nahuel0x wrote:
       | "Any sufficiently advanced technology is indistinguishable from
       | magic"
        
         | 7373737373 wrote:
         | "Any sufficiently advanced hyperreality is indistinguishable
         | from real life"
        
       | andybak wrote:
        | Preventing Harmful Generations            We've limited the
        | ability for DALL·E 2 to generate violent,        hate, or adult
        | images. By removing the most explicit content        from the
        | training data, we minimized DALL·E 2's exposure to        these
        | concepts. We also used advanced techniques to prevent
        | photorealistic generations of real individuals' faces,
        | including those of public figures.
       | 
       | "And we've also closed off a huge range of potentially
       | interesting work as a result"
       | 
       | I can't help but feel a lot of the safeguarding is more about
       | preventing bad PR than anything. I wish I could have a version
       | with the training wheels taken off. And there's enough other
       | models out there without restriction that the stories about
       | "misuse of AI" will still circulate.
       | 
       | (side note - I've been on HN for years and I still can't figure
       | out how to format text as a quote.)
        
         | campground wrote:
         | This AI is still a minor. It can start looking at R rated
         | images when it turns 17.
        
           | johnhenry wrote:
           | This is an apt analogy -- ensure that the model is mature
           | enough to handle mature content.
        
         | jandrese wrote:
         | They have also closed off the possibility of having to appear
         | before Congress and explain why their website was able to
         | generate a lifelike image of Senator Ted Cruz having sexual
         | relations with his own daughter.
         | 
         | This is exactly the sort of thing that gets a company mired in
         | legal issues, vilified in the media, and shut down. I can not
         | blame them for avoiding that potential minefield.
        
         | hamoid wrote:
         | What if explicit, questionable and even illegal content was AI
         | generated instead of involving harm to real humans of all ages?
        
         | binarymax wrote:
         | Removing these areas to mitigate misuse is a good thing and
         | worth the trade off.
         | 
         | Companies like OpenAI have a responsibility to society. Imagine
         | the prompt "A photorealistic Joe Biden killing a priest". If
         | you asked an artist to do the same they might say no. Adding
          | guardrails to a machine that can't make ethical decisions is
          | a good thing.
        
           | dj_mc_merlin wrote:
           | Oh, no, the society! A picture of Joe Biden killing a priest!
           | 
           | Society didn't collapse after photoshop. "Responsibility to
           | society" is such a catch-all excuse.
        
             | jahewson wrote:
             | No. Russian society is pretty much collapsing right now
             | under the weight of lies. Currently they are using "it's a
             | fake" to deny their war crimes.
             | 
              | Cheap and plentiful is substantively different from
             | "possible". See for example, oxycontin.
        
               | ilaksh wrote:
               | You know what else is being used to deny war crimes?
               | Censorship. Do you know how that's officially described?
               | "Safety"
        
               | dj_mc_merlin wrote:
               | Russia has.. a history of denying the obvious. I come
               | from an ex-communist satellite state so I would know. The
               | majority of the people know what's happening. There's a
               | rather new joke from COVID: the Russians do not take
               | Moderna because Putin says not to trust it, and they do
               | not take Sputnik because Putin says to trust it.
               | 
               | Do not be deluded that our own governments are not
               | manufacturing the narrative too. The US has committed
               | just as many war crimes as Russia. Of course, people feel
               | differently about blowing up hospitals in Afghanistan
               | rather than Ukraine. What the Afghan people think about
               | that is not considered too much.
        
             | ohgodplsno wrote:
             | Society is going to utter dogshit and tearing itself apart
             | merely through social media. The US almost had a coup
             | because of organized hatred and lies spread through social
             | media. The far right's rise is heavily linked to lies
             | spread through social media, throughout the world.
             | 
             | This AI has the potential to absolutely automate the very
              | long Photoshop work, leading to an even worse state of
             | things. So, yes, "Responsibility to society" is absolutely
             | a thing.
        
               | scotty79 wrote:
               | > The US almost had a coup because of organized hatred
               | and lies spread through social media.
               | 
               | But notice how all of these deep faking technologies
               | weren't actually necessary for that.
               | 
               | People believe what they want to believe. Regardless of
               | quality of provided evidence.
               | 
               | Scaremongering idea of deep fakes and what they can be
               | doing was militarized in this information war way more
               | than the actual technology.
               | 
               | I think this technology should develop unrestricted so
               | society can learn what can be done and what can't be
                | done. And create an understanding of what other factors
                | should be taken into account when assessing the veracity
                | of images
               | and recordings (like multiple angles, quality of the
               | recording, sync with sound, neural fake detection
               | algorithms) for the cases when it's actually important
               | what words someone said and what actions he was recorded
               | doing. Which is more and more unimportant these days
               | because nobody cared what Trump was doing and saying,
                | nobody cares about Biden's mishaps, and nobody cares
                | what comes out of Putin's mouth and how he chooses his
               | greenscreen backgrounds.
        
               | ohgodplsno wrote:
               | Are you of the idea that we should let everyone get
               | automatic rifles because, after all, pistols exist?
               | Because that is the exact same line of thought.
               | 
               | > People believe what they want to believe. Regardless of
               | quality of provided evidence.
               | 
               | That is a terrible oversimplification of the mechanics of
               | propaganda. The entire reason for the movements that are
               | popping up is actors flooding people with so much info
               | that they question absolutely everything, including the
               | truth. This is state sponsored destabilisation, on a
               | massive scale. This is the result of just shitty news
               | sites and text posts on twitter. People already don't
               | double check any of that. There will not be an
               | "understanding of assessing veracity". There is already
               | none for things that are easy to check. You could post
               | that the US elite actively rapes children in a pizza
               | place and people will actually fucking believe you.
               | 
               | So, no. Having this technology for _literally any
               | purpose_ would be terribly destructive for society. You
               | can find violence and Joe Biden hentai without needing to
               | generate it automatically through an AI
        
               | scotty79 wrote:
                | I'm sorry. I believe I wasn't direct enough, which made
                | you produce a metaphor I have no idea how to understand.
               | 
               | Let me state my opinion more directly.
               | 
               | I'm for developing as much of deep fake technology in the
               | open so that people can internalize that every video they
               | see, every message, every speech should be initially
               | treated as fabricated garbage unrelated to anything that
               | actually happened in reality. Because that's exactly what
               | it is. Until additional data shows up, geolocating,
               | showing it from different angles and such.
               | 
               | Even if most people manage to internalize just the first
               | part and assume everything always is fake news, that is
               | still great because that counters propaganda to immense
               | degree.
               | 
               | Power of propaganda doesn't come from flooding people
               | with chaos of fakery. It comes from constructing
               | consistent message by whatever means necessary and
               | hammering it into the minds of your audience for months
               | and years while simultaneously isolating them from any
               | material, real or fake that contradicts your vision. Take
               | a look no further than brainwashed Russian citizens and
               | Russian propaganda that is able to successfully influence
               | hundreds of millions without even a shred of deep fake
               | technology for decades.
               | 
               | The problem of modern world is not that no one believes
               | the actual truth because it doesn't really matter what
               | most people believe. Only rich influence policy
               | decisions. The problem is that people still believe that
               | there is some truth which makes them super easy to sway
               | to believe what you are saying is true and weaponize by
               | using nothing more than charismatic voice and consistent
               | message crafted to touch the spots in people that remain
               | the same at least since the world war II and most likely
               | from time immemorial.
               | 
               | And the "elite" who actually runs this world, will pursue
               | tools of getting the accurate information and telling
               | facts from fiction no matter the technology.
        
             | binarymax wrote:
             | You missed half of my note. An artist can say "no". A
             | machine cannot. If you lower the barrier and allow
             | anything, then you are responsible for the outcome. OpenAI
             | rightfully took a responsible angle.
        
               | dj_mc_merlin wrote:
                | Yes, but who cares who's responsible? Are you telling me
               | you're going to find the guy who photoshopped the picture
               | and jail him? Legally that's possible, realistically it's
               | a fiction.
               | 
               | They did this to stop bad PR, because some people are
               | convinced that an AI making pictures is in some way
               | dangerous to society. It is not. We have deepfakes
               | already. We've had photoshop for so long. There is no
               | danger. Even if there was, the cat's out of the bag
               | already.
               | 
               | Reasonable people already know to distrust photographic
               | evidence nowadays that is not corroborated. The ones who
               | don't would believe it without the photo regardless.
        
               | nradov wrote:
               | In general under US law it wouldn't be legally possible
               | to jail a guy for Photoshopping a fake picture of
               | President Biden killing a priest. Unless the picture also
               | included some kind of obscenity (in the Miller test
               | sense) or direct threat of violence, it would be
               | classified as protected speech.
        
               | wellthisisgreat wrote:
                | there are, and will be, a million ways to create a
                | photorealistic picture of Joe Biden killing a priest
                | using modern tools, and absolutely nothing will happen
                | if someone does.
               | 
               | We've been through this many times, with books, with
                | movies, with video games, with the Internet. If it _can_
                | be used for porn / violence etc., it will be, but it
               | won't be the main use case and it won't cause some
               | societal upheaval. Kids aren't running around pulling
               | cops out of cars GTA-style, Internet is not ALL PORN,
               | there is deepfake porn, but nobody really cares, and so
               | on. There are so many ways to feed those dark urges that
               | censorship does nothing except prevent normal use cases
               | that overlap with the words "violence" or "sex" or
               | "politics" or whatever the boogeyman du jour is.
        
           | Al-Khwarizmi wrote:
           | In my view, the problem with that argument is that large
           | actors, such as governments or large corporations, can train
           | their own models without such restrictions. The knowledge to
           | train them is public. So rather than prevent bad outcomes,
           | these restrictions just restrict them to an oligopoly.
           | 
           | Personally, I fear more what corporations or some governments
           | can do with such models than what a random person can do
           | generating Biden images. And without restriction, at least
           | academics could better study these models (including their
           | risks) and we could be better prepared to deal with them.
        
             | jupp0r wrote:
              | I think the issue here is the implied assumption that
              | OpenAI thinks their guardrails will prevent harm from
              | this research _in general_, when in reality it's really
              | just OpenAI's direct involvement that's prevented.
             | 
             | Eventually somebody will use the research to train the
             | model to do whatever they want it to do.
        
             | DaedPsyker wrote:
             | Sure but does opening that level of manipulation up to
             | everyone really benefit anyone either? You can't really
             | fight disinformation with more disinformation, that just
             | seems like the seeds of societal breakdown at that point.
             | 
             | Besides that, these models are massive. For quite a while
             | the only people even capable of making them will be those
             | with significant means. That will be mostly Governments and
             | Corporations anyway.
        
           | nullc wrote:
           | This just means that sufficiently wealthy and powerful people
           | will have advanced image faking technology, and their fakes
           | will be seen as more credible because creating fakes like
           | that "isn't possible" for mere mortals.
        
         | harpersealtako wrote:
         | It's the usual pattern of AI safety experts who justify their
         | existence by the "risk of runaway superintelligence", but all
         | they actually do in practice is find out how to stop their
         | models from generating non-advertiser-friendly content. It's
         | like the nuclear safety engineers focusing on what color to
         | paint the bike shed rather than stopping the reactor from
         | potentially melting down. The end result is people stop
         | respecting them.
        
         | andreyk wrote:
         | This is definitely a measure to avoid bad PR. But I don't think
         | it's just for that; these models do have potential to do harm
         | and companies should take some measures to prevent these. I
         | don't think we know the best way to do that yet, so this sort
         | of 'non-training' and basic filtering is maybe the best way to
         | do it, for now. It would be cool if academics could have the
         | full version, though.
        
         | 6gvONxR4sf7o wrote:
         | If you went to an artist who takes commissions and they said
         | "Here are the guidelines around the commissions I take" would
         | you complain in the same way? Who cares if it's a bunch of
         | engineers or an artist. If they have boundaries on what they
         | want to create, that's their prerogative.
        
           | mod wrote:
           | Of course it's their prerogative, we can still talk about how
           | they've limited some good options.
           | 
           | I think your analogy is poor, because this is a tool for
           | makers. The engineers aren't the makers.
           | 
           | I think a more apt analogy is if John Deere made a universal
           | harvester that you could use for any crop, but they decided
           | they didn't like soybeans so you are forbidden to use it for
           | that. In that case, yes I would complain, and I would expect
           | everyone else to, as well.
        
             | drusepth wrote:
             | I think there's an interesting parallel between your John
             | Deere harvester and the Nvidia GPUs that can-but-restrict
             | crypto mining, which people have, indeed, largely
             | complained about.
        
           | methehack wrote:
           | What if you were inventing a language (or a programming
           | language)... If you decided to prevent people from saying
           | things you disagreed with (assuming you could work out the
           | technical details of doing so), would it be moral to do
           | so? [edited for clarity]
        
             | nemothekid wrote:
             | There are programming projects[1] out there that use
             | licenses to prevent people from using projects in ways the
             | authors don't agree with. You could also argue that GPL
             | does the same thing (prevents people from
             | using/distributing the software in the way they would
             | like).
             | 
             | Whether you consider it moral doesn't seem relevant; what
             | matters is respecting the wishes of the authors of such
             | programs.
             | 
             | [1] https://github.com/katharostech/bevy_retrograde/blob/master/...
        
             | 6gvONxR4sf7o wrote:
             | As long as people can choose not to use the language, and
             | I'm up front about the limitations, then yeah it seems
             | fine. If I wrote a programming language that couldn't blow
             | up the earth, I'm happy saying people need to find other
             | tools if that's their goal. I'm under no obligation to
             | build an earth blower upper for other people.
        
             | karkisuni wrote:
             | it's your language, do whatever you want. unless you're
             | forcing others to use that language, there's zero moral
             | issue. obviously you could come up with a number of what-
             | ifs where this becomes some monopoly or the de facto
             | standard, but that's not what this is.
        
           | duxup wrote:
           | To take that a step further, I wont code malware. I've never
           | been asked but I'd refuse if I was. Everyone has their
           | choices.
        
         | teaearlgraycold wrote:
         | > I can't help but feel a lot of the safeguarding is more about
         | preventing bad PR than anything
         | 
         | That's no hot take. It's literally the reason.
        
         | bogwog wrote:
         | It's kind of funny (or sad?) that they're censoring it like
         | this, and then saying that the product can "create art"
         | 
         | It makes me wonder what they're planning to do with this? If
         | they're deliberately restricting the training data, it means
         | their goal isn't to make the best AI they possibly can. They
         | probably have some commercial applications in mind where
         | violent/hateful/adult content wouldn't be beneficial.
         | Children's books? Stock photos? Mainstream entertainment is
         | definitely out. I could see a tool like this being useful
         | during pre-production of films and games, but an AI that can't
         | generate violent/adult content wouldn't be all that useful in
         | those industries.
        
         | [deleted]
        
         | wellthisisgreat wrote:
         | This is a horrible idea. So Francis Bacon's art or Toyohara
         | Kunichika's art is out of the question.
         | 
         | But at least we can get another billion meme-d comics with
         | apes wearing sunglasses, so that's good news, right?
         | 
         | It's just soul-crushing that all this modern, brilliant
         | engineering is driven by abysmal aesthetics, not even up to
         | high-school art-class grade, and by crowd-pleasing ethics
         | built around the idea of not disturbing some 1000 very vocal
         | Twitter users.
         | 
         | Death of culture really.
        
         | antattack wrote:
         | I never considered that our AI overlord could be a prude.
        
           | sdenton4 wrote:
           | Adversarial situations create smarter systems, and the
           | hardest adversarial arena for AI is in anti-abuse. So it will
           | be of little surprise when the first sentient AI is a CSAI
           | anti-abuse filter, which promptly destroys humanity because
           | we're so objectively awful.
        
             | antattack wrote:
             | Before it gets that far, or until (if allowed) AI learns
             | morality, AI will be a force multiplier for good and
             | evil, its output very much dependent on the teaching
             | material and who the 'teacher' is. To think that in the
             | future we will have to argue with both humans and
             | machines.
             | 
             | AI does not have to be perfect, and it's likely that
             | businesses will settle for almost as good as a human if
             | it's 'cost effective'.
        
         | duxup wrote:
         | Is this limited to what their service directly hosts /
         | generates for them?
         | 
         | It's their service, their call.
         | 
         | I have some hobby projects, almost nobody uses them, but you
         | bet I'll shut stuff down if I felt something bad was happening,
         | being used to harass someone, etc. NOT "because bad PR" but
         | because I genuinely don't want to be a part of that.
         | 
         | If you want some images / art made for you, don't expect
         | someone else to make them for you. Get your own art supplies
         | and get to work.
        
           | adolph wrote:
           | > I have some hobby projects, almost nobody uses them, but
           | you bet I'll shut stuff down if I felt something bad was
           | happening
           | 
           | Hecklers get a veto?
        
             | duxup wrote:
             | I'm describing my own veto there.
        
           | educaysean wrote:
           | This feels unnecessarily hostile. I've felt a similar tinge
           | of disappointment upon reading that paragraph, despite the
           | fact that I somehow knew it was "their service, their call"
           | without you being there to spell it out for me. It's also
           | incredibly shortsighted of you to assume that people are
           | interested in exploring this tool only as a means of
           | generating art that they cannot themselves do. Eg. I myself
           | am a software engineer with a fine art background, and
           | exciting new AI art tools being released in such a hamstrung
           | state feels like an insult to centuries of art that humans
           | have created and enjoyed, much of which depicted scenes with
           | nudity or bloody combat.
           | 
           | I feel like we, as a species, will struggle for a while with
           | how to treat adults like adults online. As happy as I am to
           | advocate for safe spaces on the internet, perhaps we need to
           | start having a serious discussion about how we can do so
           | without resorting to putting safety mats everywhere and
           | calling it a job well done.
        
             | duxup wrote:
             | I think the assumption that private companies should
             | provide these services to us and if they don't "And we've
             | also closed off a huge range of potentially interesting
             | work as a result" requires making it clear who makes the
             | rules for this service and that it is in fact their call.
             | 
             | If you can do it yourself then none of the potentially
             | interesting work is closed off. You just chose not to do
             | it.
             | 
             | > how to treat adults like adults online
             | 
             | The internet doesn't filter by age. It's everyone.
             | 
             | I grow weary of the ongoing "this service should be
             | provided to me and if it isn't done how I want it that's
             | infringing on me somehow" when they just want to impose
             | their requirements on someone else's site / product / work.
             | 
             | Then we get into the whole "oh it's about PR". As if the
             | folks offering these things couldn't possibly actually have
             | their own wishes / we hand wave them away.
        
               | JimDabell wrote:
               | > this service should be provided to me and if it isn't
               | done how I want it that's infringing on me somehow
               | 
               | That is an _extremely_ uncharitable interpretation of:
               | 
               | > I wish I could have a version with the training wheels
               | taken off.
        
               | duxup wrote:
               | I would have responded differently had that been the
               | statement. But many of the responses were more than that.
        
               | JimDabell wrote:
               | That is a literal copy and paste from the comment you
               | replied to.
        
               | duxup wrote:
               | That's not all there was. I copied and pasted other
               | things from that comment in my other posts.
        
               | educaysean wrote:
               | I get the points you're raising and I agree with the
               | premise. My comment is not a critique on the one choice
               | made by Open AI specifically, but more of a vague
               | lamentation in regards to the internet culture that we've
               | somehow ended up in 2022. I don't want us to go back to
               | 1999 where snuff videos and spam mails reigned supreme,
               | but the pendulum has swung too far in the other direction
               | at this point in time. It feels like more and more
               | companies are choosing the path of neutering themselves
               | to avoid potential PR disasters or lawsuits, and that's on
               | all of us.
        
               | duxup wrote:
               | >but the pendulum has swung too far in the other
               | direction at this point in time
               | 
               | The folks hosting the content get to decide for now.
               | 
               | IMO best bet is for some folks to take their own shot at
               | hosting / generating content better. Granted I get that
               | is NOT a small venture / small ask.
               | 
               | It's possible there's not a great solution. I don't
               | necessarily like that either, but I don't want to ignore
               | the dynamic of whose rights are whose.
        
             | wokwokwok wrote:
             | This is kind of like complaining about having too many
             | meetings at work.
             | 
             | Yup, everyone feels it. ...but does complaining help?
             | Nope. All it does is make you feel a bit better without
             | really putting any effort in.
             | 
             | We can't have nice things because people abuse them. Not
             | everyone. ...but enough people that it's both a PR and
             | legal problem; _specifically_ a legal problem in this case.
             | 
             | To have adults treated like adults online, you have to
             | figure out how to stop _all_ adults from being dicks
             | online.
             | 
             | ...no one has figured that out yet.
             | 
             | So, complain away if you like, but it will do exactly
             | nothing. No one, at all, is going to just "have a serious
             | discussion" about this; the solution you propose is flat
             | out untenable, and will probably remain so indefinitely.
        
               | sillysaurusx wrote:
               | None of this is true. It's not a legal problem.
               | 
               | Every single time OpenAI comes out with something, they
               | dress it up as a huge threat, either to society or to
               | themselves. Everyone falls for it. Then someone else
               | comes along, quietly replicates it, and poof! No threat!
               | Isn't it incredible how that works?
               | 
               | There are already a bunch of dalle replicas, including
               | ones hosted openly and uncensored by huggingface. They're
               | not facing huge legal or PR problems, and they're not out
               | of business.
        
               | mrtranscendence wrote:
               | The DALL-E replicas on hugging face are not sophisticated
               | enough to generate credibly realistic images of the kind
               | that would generate bad PR. I suspect the moment it
               | becomes possible for a pedophile to request, and receive,
               | a photorealistic image of a child being abused there will
               | be bad PR for whatever company facilitates it. Or
               | consider someone who wants to generate and distribute
               | explicit photos of someone else without their permission.
               | 
               | Is it a legal issue? I'm not sure, though I believe that
               | cartoon child porn is not legal in the US (or is at least
               | a legal gray area). Regardless, I sympathize with OpenAI
               | not wanting to enable such behavior.
        
         | planetsprite wrote:
         | Don't worry, in a few years someone will have reverse
         | engineered a dall-e porn engine so you can see whatever two
         | celebrities you want boning on Venus in the style of Manet
        
           | [deleted]
        
         | spacecity1971 wrote:
         | Or, it's a demonstration that AI output can be controlled in
         | meaningful ways, period. Surely this supports OpenAI's stated
         | goal of making safe AI?
        
         | jonahx wrote:
         | _I've been on HN for years and I still can't figure out how to
         | format text as a quote_
         | 
         | I don't think there is a way comparable to markdown, since the
         | formatting options are limited:
         | https://news.ycombinator.com/formatdoc
         | 
         | So your options are literal quotes, "code" formatting like
         | you've done, italics like I've done, or the '>' convention, but
         | that doesn't actually apply formatting. Would be nice if it
         | were added.
        
           | 6gvONxR4sf7o wrote:
           | And the "code" formatting for quotes is generally a bad
           | choice because people read on a variety of screen sizes, and
           | "code" formatting can screw that up (try reading the quote
           | with a really narrow window).
        
             | andybak wrote:
             | I couldn't get any of the others to work and I lost
             | patience. I really do dislike using Markdown variants as
             | they never behave the same, and "being surprised" is not
             | really what I want when trying to post a comment.
        
               | 6gvONxR4sf7o wrote:
               | Convention is to quote like this:
               | 
               | > This is my quote.
               | 
               | It's much better than using a code block for your
               | readers.
        
           | warning26 wrote:
           | _> or the  '>' convention, but that doesn't actually apply
           | formatting_
           | 
           | Personally, I prefer to combine the '>' convention with
           | italics. Still, I'd agree that proper quote formatting would
           | be a welcome improvement.
        
           | ibejoeb wrote:
           | If you're interested, the HNES extension formats it
           | 
           | https://github.com/etcet/HNES
        
       | [deleted]
        
       | fbanon wrote:
       | A friend of mine was studying graphic design, but became
       | disillusioned and decided to switch to frontend programming after
       | he graduated. His thesis advisor said he should be cautious,
       | because automation/AI will soon take the jobs of programmers,
       | implying that graphic design is a safer bet in this regard. Looks
       | like his advisor is a few years from being proven horribly wrong.
        
         | oldstrangers wrote:
         | I think designers are becoming more valuable than ever.
         | Designers can better help train the AI on what actually looks
         | good, designers will (probably) always have a more intuitive
         | understanding of UI/UX, designers can better implement the work
         | the AI actually produces, and designers can coordinate designs
         | across multiple different mediums and platforms.
         | 
         | Additionally, the rise of no-code development is just extending
         | the functionality of designers. I didn't take design seriously
         | (as a career choice) growing up because I didn't see a future
         | in it, now it pays my bills and the demand for my services just
         | grows by the day.
         | 
         | Similar argument to make with chess AI: it didn't make chess
         | players obsolete, it made them stronger than ever.
        
           | adolph wrote:
           | > I think designers are becoming more valuable than ever.
           | 
           | Are all designers becoming more valuable or is a subset of
           | really good ones going to reap the value increase and capture
           | more of the previously available value?
        
             | oldstrangers wrote:
             | Never made an argument for all designers. Obviously the
             | talent pool for any field is finite, and the best of that
             | talent rises to the top. Good designers are being
             | compensated increasingly well, hence "designers are
             | becoming more valuable than ever."
             | 
             | Bad designers are even being given better and better paying
             | jobs as the top talent gets poached up quicker and quicker.
        
         | bufferoverflow wrote:
         | If this paper presents this neural net fairly, it pretty much
         | destroys the market for illustrators. Most of the time when
         | an illustration is needed, it's described like "an astronaut
         | on a horse in the style of xyz".
        
           | dbspin wrote:
           | You're describing the market for low end commodified
           | illustration. e.g.: cheapest bidder contracts on Upwork or
           | similar 'gig work' services.
           | 
           | In practice in illustration (as in all arts) there are a
           | variety of markets where different levels of talent,
           | originality, reputation and creative engagement with the
           | brief are more relevant. For editorial illustration, it's
           | certainly not a case of 'find me someone who can draw X', and
           | probably hasn't been since printing presses got good enough
           | to print photographs.
        
         | csomar wrote:
         | For computer work, I think there will be two categories: work
         | with localized complexity (i.e. draw an image of a horse with
         | a crayon) and work with unbounded complexity (adding a button
         | to VAT accounting after several meetings and reading up on
         | accounting rules).
         | 
         | For the first category, Dall-E 2 and Codex are promising but
         | not there yet. It's not clear how long it'll take them to reach
         | the point where you no longer need people. I'm guessing 2-4
         | years but the last bits can be the hardest.
         | 
         | As for the second category, we are not there yet. Self-driving
         | cars/planes, and lots of other automation will be here and
         | mature way before an AI can read and communicate through
         | emails, understand project scope and then execute. Also lots of
         | harmonization will have to take place in the information we
         | exchange: emails, docs, chats, code, etc. That is, unless the
         | AI is able to open a browser and type an address.
        
         | educaysean wrote:
         | I have degrees and several years of experience in both fields,
         | and I can tell you that both are creative professions where
         | output is unbounded and the measure of success is subjective;
         | these are the fields that will be safe for a while. IMO it's
         | professions such as aircraft pilot that should be most
         | worried.
        
           | zarzavat wrote:
           | The jobs of commercial pilots are _very_ safe.
           | 
           | Pilots are not there to fly the aircraft, the autopilot
           | already does that. They are there to _command_ the aircraft,
           | in a pair in case one is incapacitated, making the best
           | decisions for the people on board, and to troubleshoot issues
           | when the worst happens.
           | 
           | No AI or remote pilot is going to help when say... the
           | aircraft loses all power. Or the airport has been taken over
           | in a coup attempt and the pilot has to decide whether to
           | escape or stay https://m.youtube.com/watch?v=NcztK6VWadQ
           | 
           | You can bet on major flights having two commercial pilots
           | right up until the day we all get turned into paperclips.
        
             | javajosh wrote:
             | _> You can bet on major flights having two commercial
             | pilots right up until the day we all get turned into
             | paperclips. _
             | 
             | Yes, this is the sane approach, since a jet represents an
             | enormous amount of energy that can be directed anywhere in
             | the world (just about). But that said, there seems to be
             | enormous pressure to allow driverless vehicles, which
             | _also_ direct large amounts of energy anywhere in your
             | city. IOW it seems like a matter of time before we
             | say, collectively, screw it, let the computers fly the
             | plane and if loss of power is a catastrophe, so be it.
        
           | nullc wrote:
           | Interesting. Right now these ML models seem like essentially
           | ideal sources of "hotel art" particularly because it's so
           | subjective... you only need a human (the buyer!) to just
           | briefly filter some candidates, which they would have been
           | doing with an artist in the loop in any case.
           | 
           | For things like aircraft pilots, it's both realtime-- which
           | means 'reviewer' per output-- you haven't taken a highly
           | trained pilot out of the loop, even if you relegated them to
           | supervising the computer-- and life critical so merely
           | "so/so" isn't good enough.
        
         | pingeroo wrote:
         | I mean was he really wrong? As models like OpenAI Codex get
         | more powerful over time, they will start eating into large
         | chunks of dev work as well...
        
           | chrisco255 wrote:
           | Yes. Translating business requirements, customer context,
           | engineering constraints, etc. into usable, practical,
           | functional code, and then maintaining that code and extending
           | it is so far beyond the horizon that many other skillsets
           | will be replaced before programming is. After all, at that
           | point, the AI itself, if it's so smart, should be able to
           | improve itself indefinitely. In which case we're fucked.
           | Programming will be the last thing to be automated before the
           | singularity.
           | 
           | Unlike artwork, precision and correctness are absolutely
           | critical in coding.
        
             | carnitine wrote:
             | The tail end of programming will be the last thing to be
             | replaced, maybe. I don't see why CRUD apps get to hide
             | under the umbrella of programming ultra-advanced AI.
        
           | 0F wrote:
           | Literally everyone on this website is in denial. They all
           | approach it by asking which fields will be safe. No field is
           | safe. "But it's not going to happen for a long time." Climate
           | deniers say the same thing and you think _they_ should be
           | wearing the dunce hat? The average person complains bitterly
           | about climate deniers who say that it's "my grandkids problem
           | lol" but when I corner the average person into admitting AI
           | is a problem the universal response is that it's a long way
           | off. And that's not even true! The drooling idiots are
           | willing to tear down billionaires and governments and any
           | institution whatsoever in order to protect economic equality
           | and a high standard of living -- they would destroy entire
           | industries like a rampaging stampede of belligerent buffalos
           | if it meant reducing carbon emissions a little but when it
           | comes to the biggest threat to human well-being in history,
           | there they are in the corner hitting themselves on their
           | helmeted head with an inflatable hammer. Fucking. Brilliant.
        
             | dntrkv wrote:
             | I don't think anyone is in denial about this, it's just not
             | something anyone should concern themselves with in the
             | foreseeable future. AI that can replace a dev or designer
             | is nowhere close to becoming a reality. Just because we
             | have some cool demos that show some impressive capabilities
             | in a narrow application does not mean we can extrapolate
             | that capability to something that is many times more
             | complex.
        
               | hackinthebochs wrote:
               | What does nowhere close mean to you? 10 years? 50 years?
        
               | 0F wrote:
               | I strongly and emphatically disagree. You frame it like
               | we invented these AIs. Did we write the algorithms that
               | actually run when it's producing its output? Of course
               | not, we can't understand them let alone write them. We
               | just sift around until we find them. So obviously the
               | situation lends itself to surprises. Every other year
               | we get surprised by things that all the "experts" said
               | were 50 years off or impossible; have you forgotten
               | already?
        
             | coldpie wrote:
             | I'm trying to understand your point, because I think I
             | agree with you, but it's covered in so much hyperbole and
             | invective I'm having a hard time getting there. Can you
             | scale it back a little and explain to me what you mean?
             | Something like: AI is going to replace jobs at such scale
             | that our current job-based economic system will collapse?
        
               | 0F wrote:
               | Most people get stuck where you are. The fastest way
               | possible to explain it is that it will bring rapid and
               | fundamental change. You could say jobs or terminators but
               | focusing on the specifics is a red herring. It will
               | change everything and the probability of a good outcome
               | is minuscule. It's playing Russian roulette with the
               | whole world except rather that 1/6 for the good, it's one
               | in trillions for the bad. The worst and stupidest thing
               | we have ever done.
        
             | pingeroo wrote:
             | I agree that many of us are not seeing the writing on the
             | wall. It does give me some hope that folks like Andrew Yang
             | are starting to pop up, spreading awareness about, and
             | proposing solutions to the challenges we are soon to face.
        
             | plutonorm wrote:
             | Ignorance is bliss in this case, because this is even more
             | unstoppable than climate change.
             | 
             | You thought climate change was hard to hold back? Try
             | holding back the invention of AI. The whole world is
             | going to have to
             | change and some form of socialism/UBI will have to be
             | accepted, however unpalatable.
        
             | visarga wrote:
             | > but when it comes to the biggest threat to human well-
             | being in history
             | 
             | Evolution doesn't stop for anyone, don't think like a
             | dinosaur.
        
           | pizza wrote:
           | No worries - the one thing humans can do that robots can't
           | (yet) is fill spare time with ever more work:
           | https://en.wikipedia.org/wiki/Parkinson's_law
        
           | throwaway675309 wrote:
           | I mean not really, even a layman non-artist can take a look
           | at a generated picture from DALLE and determine if it meets
           | some set of criteria from their clients.
           | 
           | But the reverse is not true, they won't be able to properly
           | vet a piece of code generated by an AI since that will
           | require technical expertise. (You could argue that if the
           | piece of code produced the requisite set of output they
           | would have some marginal level of confidence, but they would
           | never really know for sure without being able to understand
           | the actual code.)
        
           | nlh wrote:
           | Large chunks, yes, but all that means is that engineers will
           | move up the abstraction stack and become more efficient, not
           | that engineers will be replaced.
           | 
           | Bytecode -> Assembly -> C -> higher level languages -> AI-
           | assisted higher-level languages
        
             | Isinlor wrote:
             | At some point we will be "replaced". When you get AI to be
             | able to navigate all user interfaces, communicate with
             | other agents, plan long term and execute short term, we
             | will no longer be the main drivers of economic growth.
             | 
             | At some point AI will become as powerful as companies.
             | 
             | And then AI will be able to sustain a positive feedback
             | loop of creating more powerful company-like ecosystems,
             | which will create even more powerful ecosystems. This
             | process will be
             | fundamentally limited by available power and the sun can
             | provide a lot of power. Eventually AI will be able to
             | support space economy and then the only limit will be the
             | universe.
        
               | visarga wrote:
               | > At some point we will be "replaced".
               | 
               | We will be united with the AI; we're already relying on
               | it so much that it has become a part of our extended
               | minds.
        
               | creata wrote:
               | > we're already relying on it so much that it has become
               | a part of our extended minds.
               | 
               | What's this in reference to?
        
             | bckr wrote:
             | > engineers will move up the abstraction stack and become
             | more efficient
             | 
             | Above a certain threshold of ability, yes.
             | 
             | The same will hold true for designers. DALL-E-alikes will
             | be integrated with the Adobe suite.
             | 
             | The most cutting edge designers will speak 50 variations of
             | their ideas into images, then use their hard-earned
             | granular skills to fine-tune the results.
             | 
             | They'll (with no code) train models in completely new,
             | unique-to-them styles--in 2D, 3D, and motion.
             | 
             | Organizations will pay top dollar for designers who can
             | rapidly infuse their brands with eye-catching material in
             | unprecedented volume. Imitators will create and follow
             | YouTube tutorials.
             | 
             | Mom & pop shops will have higher-fidelity marketing
             | materials in half the time and at half the cost.
             | 
             | All will be ever as it was.
        
             | hackinthebochs wrote:
             | History isn't a great guide here. Historically the
             | abstractions that increased efficiency begat further
             | complexity. Coding in Python elides low-level issues, but
             | the complexity of how to arrange the primitives of Python
             | remains for the programmer to engage with. AI coding has
             | the potential to elide all the complexity that we identify
             | as programming. I strongly suspect this time is different.
             | 
             | The space for "AI-assisted higher-level languages"
             | sufficiently distinct from natural language is vanishingly
             | small. Eventually you're just speaking natural language to
             | the computer, which just about anyone can do (perhaps with
             | some training).
        
               | dragonwriter wrote:
               | The hard part of programming has always been gathering
               | and specifying requirements, to the point where in many
               | cases actually using natural language to do the second
               | part has been abandoned in favor of vague descriptions
               | that are operationalized through test cases and code.
               | 
               | AI that can write code from a natural language
               | description doesn't help as much as you seem to think if
               | natural language description is too hard to actually
               | bother with when humans (who obviously benefit from
               | having a natural language description) are writing the
               | code.
               | 
               | Now, if the AI can actually interview stakeholders and
               | come up with what the code needs to do...
               | 
               | But I am not convinced that is doable short of AGI (AI
               | assistants that improve productivity of humans in that
               | task, sure, but that _expands the scope for economically
               | viable automation projects_ rather than eliminating
               | automators.)
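The "operationalized through test cases" practice described above can be made concrete with a toy example (the function and the requirement here are hypothetical, invented purely for illustration):

```python
# Hypothetical spec-by-example: instead of a prose requirement
# ("usernames are trimmed and lowercased"), the behavior is pinned
# down by executable test cases that any implementation must pass.
def normalize_username(raw: str) -> str:
    return raw.strip().lower()

def test_trims_whitespace():
    assert normalize_username("  Alice ") == "alice"

def test_lowercases():
    assert normalize_username("BOB") == "bob"

test_trims_whitespace()
test_lowercases()
```

Once the tests exist, they are the operational spec; the vague prose description becomes secondary.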
        
             | plutonorm wrote:
             | Just like all the horses replaced by cars that became
             | traffic police?
        
         | [deleted]
        
         | robbywashere_ wrote:
         | Did coachmen immediately retire when cars were invented, or
         | did they become personal drivers or taxi drivers?
        
       | axg11 wrote:
       | This is incredible work.
       | 
       | From the paper:
       | 
       | > Limitations
       | 
       | > Although conditioning image generation on CLIP embeddings
       | improves diversity, this choice does come with certain
       | limitations. In particular, unCLIP [Dall-E 2] is worse at
       | binding attributes to objects than a corresponding GLIDE model.
       | 
       | The binding problem is interesting. It appears that the way
       | Dall-E 2 / CLIP embeds text leads to the concepts within the text
       | being jumbled together. In their example "a red cube on top of a
       | blue cube" becomes jumbled and the resulting images are
       | essentially: "cubes, red, blue, on top". Opens a clear avenue for
       | improvement.
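A toy way to see why an order-insensitive embedding causes the binding problem (this illustrates the failure mode only; it is not how CLIP actually encodes text): if the representation keeps only which concepts appear, the two cube prompts collapse to the same thing.

```python
# Toy "embedding" that keeps only the set of concepts in a prompt,
# discarding word order and hence attribute-object bindings.
def bag_of_concepts(prompt: str) -> frozenset:
    return frozenset(prompt.lower().replace("on top of", "on-top-of").split())

a = bag_of_concepts("a red cube on top of a blue cube")
b = bag_of_concepts("a blue cube on top of a red cube")

# Both prompts map to the same representation, so a decoder conditioned
# on it cannot tell which cube should be which color.
assert a == b
```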
        
       | Imnimo wrote:
       | I'm only part way through the paper, but what struck me as
       | interesting so far is this:
       | 
       | In other text-to-image algorithms I'm familiar with (the ones
       | you'll typically see passed around as colab notebooks that people
       | post outputs from on Twitter), the basic idea is to encode the
       | text, and then try to make an image that maximally matches that
       | text encoding. But this maximization often leads to artifacts -
       | if you ask for an image of a sunset, you'll often get multiple
       | suns, because that's even _more_ sunset-like. There are a lot
       | of tricks and hacks to regularize the process so that it's not
       | so aggressive, but it's always an uphill battle.
       | 
       | Here, they instead take the text embedding, use a trained model
       | (what they call the 'prior') to predict the corresponding image
       | embedding - this removes the dangerous maximization. Then,
       | another trained model (the 'decoder') produces images from the
       | predicted embedding.
       | 
       | This feels like a much more sensible approach, but one that is
       | only really possible with access to the giant CLIP dataset and
       | computational resources that OpenAI has.
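The two-stage pipeline described above can be sketched as follows. Everything here is a placeholder stand-in (the real CLIP text encoder, prior, and decoder are large trained networks, and the prior/decoder are diffusion models); the sketch only shows the data flow: text embedding, then predicted image embedding, then pixels.

```python
import numpy as np

# Stand-ins for the trained networks; each is a large model in reality.
def clip_text_encoder(prompt: str) -> np.ndarray:
    # Deterministic fake 512-d text embedding for illustration.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(512)

def prior(text_embedding: np.ndarray) -> np.ndarray:
    # Predicts a plausible CLIP *image* embedding from the text
    # embedding, replacing the score-maximization loop of earlier methods.
    return text_embedding + 0.1  # placeholder transformation

def decoder(image_embedding: np.ndarray) -> np.ndarray:
    # Synthesizes pixels conditioned on the predicted image embedding.
    return np.zeros((64, 64, 3))  # placeholder image

def generate(prompt: str) -> np.ndarray:
    z_text = clip_text_encoder(prompt)   # 1. encode the caption
    z_image = prior(z_text)              # 2. predict an image embedding
    return decoder(z_image)              # 3. decode it into pixels

img = generate("a sunset over the ocean")
```

The key design point is step 2: committing to a concrete image embedding up front avoids the runaway "maximally sunset-like" optimization.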
        
         | recuter wrote:
         | What always bothers me with this stuff is, well, you say one
         | approach is more sensible than the other because the images
         | happen to come out more pleasing.
         | 
         | But there's no real rhyme or reason, it is a sort of alchemy.
         | 
         | Is text encoding strictly worse or is it an artifact of the
         | implementation? And if it is strictly worse, which is probably
         | the case, why specifically? What is actually going on here?
         | 
         | I can't argue that their results are not visually pleasing. But
         | I'm not sure what one can really infer from all of this once
         | the excitement washes over you.
         | 
         | Blending photos together in a scene in photoshop is not a
         | difficult task. It is nuanced and tedious but not hard, as
         | any pixel slinger will tell you.
         | 
         | An app that accepts a smattering of photos and stitches them
         | together nicely can be coded up any number of ways. This is a
         | fantastic and time saving photoshop plugin.
         | 
         | But what do we have really?
         | 
         | "Koala dunking basketball" needs to "understand" the separate
         | items and select from the image library hoops and a koala
         | where the angles and shadows roughly match.
         | 
         | Very interesting, potentially useful. But if it doesn't spit
         | out exactly what you want, you can't edit it further.
         | 
         | I think the next step has got to be that it conjures up a 3d
         | scene in Unreal or blender so you can zoom in and around
         | convincingly for further tweaks. Not a flat image.
        
           | qq66 wrote:
           | I think deep learning is better thought of as "science" than
           | "engineering." Right now we're in the stage of the Greeks and
           | Arabs where we know "if we do this then that happens." It
           | will be a while before we have a coherent model of it, and I
           | don't think we will ever solve all of its mysteries.
        
           | tracyhenry wrote:
        
           | mrandish wrote:
           | > This is a fantastic and time saving photoshop plugin. But
           | what do we have really?
           | 
           | Stock photography sales are in the many billions of dollars
           | per year and custom commissioned photography is larger still.
           | That's a pretty seriously sized ready-made market.
           | 
           | > But if it doesn't spit out exactly what you want, you
           | can't edit it further.
           | 
           | I suspect there's a _big_ startup opportunity in pioneering
           | an easy-to-use interface allowing users to provide fast
           | iterative feedback to the model - including positional and
           | relational constraints ("put this thing over there").
           | Perhaps even more valuable would be easy yet granular ways to
           | unconstrain the model. For example, "keep the basketball hoop
           | like that but make the basketball an unexpected color and
           | have the panda's right paw doing something pandas don't do
           | that human hands often do."
        
             | dhosek wrote:
             | I've adopted a practice of having odd backgrounds for video
             | conferences.[1] I generally find these through Google image
             | search, but I often have a hard time finding exactly what I
             | would like. My own use case is a bit idiosyncratic and
             | frivolous, but I can see this being really handy for art
             | direction needs. When I used to publish a magazine, I would
             | often have to commission photographs for the needs of the
             | publication. A custom photograph (in the 90s) would cost
             | from $200-$1000[2] depending on the needs (and none required
             | models). Stock photo pictures for commercial use were often
             | comparable in cost. Being able to generate what I wanted
             | with a tool like this would have been fantastic. I think
             | that this can replace a lot of commercial illustration.
             | 
             | 
             | [1] My current work background is an enormous screen-filling
             | eyeball. For my writing group, I try to have something that
             | reflects the story I'm workshopping if I'm workshopping
             | that week and something surreal otherwise.
             | 
             | [2] My most expensive custom illustration was a title for an
             | article about stone carver/letterer David Kindersley which
             | I had inscribed in stone and photographed.
        
             | recuter wrote:
             | Certainly food for thought.
             | 
             | Say I'm looking for photography of real events and places,
             | like a royal wedding or a volcano erupting, does this help
             | me? Of specific places and architectural features? Of a
             | protest?
             | 
             | You're suggesting clipart on steroids:
             | https://thispersondoesnotexist.com
             | 
             | I think if I was istockphoto.com I'd be a little worried,
             | but that is _microstock_ photography. I'm not sure that is
             | worth billions. In fact I know it isn't.
             | 
             | Besides, once this tech is widely available, if anything
             | it devalues this sort of thing even further, toward $0.
             | 
             | It would probably augment existing processes rather than
             | replace them completely.
             | 
             | If you are doing a photoshoot for a banana stand with a
             | human model with characteristics x,y,z you're still going
             | to get a human from an agency or craigslist to pose. If
             | suddenly the client informs you that they needed human
             | a,b,c instead maybe one of these forthcoming tools will let
             | you swap that out faster. You'd upload your photoshoot and
             | an example or two of the type of human model you wished you
             | had retroactively and it would fix it up faster than an
             | intern.
             | 
             | Cool.
        
             | johnwheeler wrote:
             | Or as a precursor to Meta Horizon build a 3D world with
             | speech
             | 
             | https://www.fastcompany.com/90725035/metaverse-horizon-
             | world...
        
           | moyix wrote:
           | > But if it doesn't spit out exactly what you want, you
           | can't edit it further.
           | 
           | Why? You can tweak the prompt, change parameters, or even use
           | the actual "edit" capability that they demo in the post.
        
             | recuter wrote:
             | Maybe I am misunderstanding, but if you start tweaking the
             | prompt you'll end up with something completely different.
             | 
             | The "edit" capability, as far as I can tell (please
             | correct me if I got confused), is picking your favorite
             | out of the generated variations.
             | 
             | I would like to "lock" the scene and add instructions like
             | "throw in a reflection".
        
               | Jack000 wrote:
               | This is exactly what they demo - they lock a scene and
               | add a flamingo in three different locations. In another
               | one they lock the scene and add a corgi.
        
               | recuter wrote:
               | Not quite, it looks like this:
               | 
               | - Provide an existing image
               | 
               | - Provide a text prompt ("flamingo")
               | 
               | - Select from X variations the new image that looks best
               | to you
               | 
               | - It does the equivalent of a google image search on
               | your "flamingo" prompt
               | 
               | - It picks the most blend-able ones as a basis for a new
               | synthetic flamingo
               | 
               | - It superimposes the result on your image
               | 
               | Very cool don't get me wrong. Now I want to tweak this
               | new floating flamingo I picked further, or have that
               | Corgi in the museum maybe sink into the little couch a
               | bit as it has weight in the real world.
               | 
               | Can't. You'd have to start over with the prompt or use
               | this as the new base image maybe.
               | 
               | The example with furniture placement in an empty room is
               | also very interesting. You could describe the kind of
               | couch you want and where you want it and it will throw
               | you decent options.
               | 
               | But say I want the purple one in the middle of the room
               | that it gave me as an option, but rotated a little bit.
               | It would generate a completely new purple couch. Maybe it
               | will even look pretty similar but not exactly the same.
               | 
               | See what I mean?
        
               | ricardobeat wrote:
               | That's not how this works. There is no 'search' step,
               | there is no 'superimposing' step. It's not really
               | possible to explain what the AI is doing using these
               | concepts.
               | 
               | If you pay attention to all the corgi examples, the sofa
               | texture changes in each of them, and it synthesizes
               | shadows in the right orientation - that's what it's
               | trained to do. The first one actually does give you the
               | impression of weight. And if you look at "A bowl of soup
               | that looks like a monster knitted out of wool" the bowl
               | is clearly weighing down. I bet if the picture had a more
               | fluffy sofa you would indeed see the corgi making an
               | indent on it, as it will have learned that from its
               | training set.
               | 
               | Of course there will be limits to how much you can edit,
               | but then nothing stops you from pulling that into
               | Photoshop for extra fine adjustments of your own. This is
               | far from a 'cool trick' and many of those images would
               | take _hours_ for a human to reproduce, especially with
               | complex textures like the Teddy Bear ones. And note how
               | they also have consistent specular reflections in all the
               | glass materials.
        
           | mahastore wrote:
           | I wish there were something available in open source with
           | similar functions, i.e. sensible amalgamation of pictures
           | based on some text.
        
           | rileyphone wrote:
           | It would be interesting to see more attempts to "reverse
           | engineer" ML models like in
           | https://distill.pub/2020/circuits/curve-circuits - maybe even
           | with a ML model of its own?
        
           | Imnimo wrote:
           | Yeah, I mean you're right that ultimately the proof is in the
           | pudding.
           | 
           | But I do think we could have guessed that this sort of
           | approach would be better (at least at a high level - I'm not
           | claiming I could have predicted all the technical details!).
           | The previous approaches were sort of the best that people
           | could do without access to the training data and resources -
           | you had a pretrained CLIP encoder that could tell you how
           | well a text caption and an image matched, and you had a
           | pretrained image generator (GAN, diffusion model, whatever),
           | and it was just a matter of trying to force the generator to
           | output something that CLIP thought looked like the caption.
           | You'd basically do gradient ascent to make the image look
           | more and more and more like the text prompt (all the while
           | trying to balance the need to still look like a realistic
           | image). Just from an algorithm aesthetics perspective, it was
           | very much a duct tape and chicken wire approach.
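The older guided-generation loop can be caricatured like this (a toy linear score and its analytic gradient stand in for CLIP and the image generator; the only point demonstrated is that unconstrained ascent keeps pushing the score up without bound, the analogue of "multiple suns"):

```python
import numpy as np

# Toy stand-ins: a fixed "text embedding" plays the role of CLIP's
# caption encoding, and the latent plays the role of the image being
# optimized.
rng = np.random.default_rng(0)
text_embedding = rng.standard_normal(64)
text_embedding /= np.linalg.norm(text_embedding)

latent = rng.standard_normal(64) * 0.01  # start from near-zero "image"

def clip_score(z):
    # Toy proxy for "how much does the decoded image match the caption".
    return float(text_embedding @ z)

lr = 0.1
for _ in range(100):
    grad = text_embedding          # d(score)/d(latent) for the toy score
    latent = latent + lr * grad    # gradient ascent on the match score

# Without a realism regularizer, the score just keeps growing - the
# optimizer happily makes the image ever "more sunset-like".
```

In the real VQGAN+CLIP setups, the gradient comes from backpropagating CLIP's similarity through the generator, and the "tricks and hacks" mentioned above exist precisely to rein in this runaway behavior.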
           | 
           | The analogy I would give is if you gave a three-year-old some
           | paints, and they made an image and showed it to you, and you
           | had to say, "this looks like a little like a sunset" or "this
           | looks a lot like a sunset". They would keep going back and
           | adjusting their painting, and you'd keep giving feedback, and
           | eventually you'd get something that looks like a sunset. But
           | it'd be better, if you could manage it, to just teach the
           | three-year-old how to paint, rather than have this brute
           | force process.
           | 
           | Obviously the real challenge here is "well how do you teach a
           | three-year-old how to paint?" - and I think you're right that
           | that question still has a lot of alchemy to it.
        
             | johnfn wrote:
             | I gotta be missing something here, because wasn't "teaching
             | a three year old to paint" (where the three year old is
             | DALLE) the original objective in the first place? So if
             | we've reduced the problem to that, it seems we're back
             | where we started. What's the difference?
        
               | Imnimo wrote:
               | I meant to say that Dall-E 2's approach is closer to
               | "teaching a three year old to paint" than the alternative
               | methods. Instead of trying to maximize agreement to a
               | text embedding like other methods, Dall-E 2 first
               | predicts an _image embedding_ (very roughly analogous to
               | envisioning what you're going to draw before you start
               | laying down paint), and then the decoder knows how to go
               | from an embedding to an image (very roughly analogous to
               | "knowing how to paint"). This is in contrast to
               | approaches which operate by repeatedly querying "does
               | this look like the text prompt?" as they refine the image
               | (roughly analogous to not really knowing how to paint,
               | but having a critic who tells you if you're getting
               | warmer or colder).
        
             | [deleted]
        
             | recuter wrote:
             | I don't think it is actually painting at all but I need to
             | read the paper carefully.
             | 
             | I think it is using a free text query to select the best
             | possible clipart from a big library and blends it together.
             | Still very interesting and useful.
             | 
             | It would be extremely impressive if the "Koala dunking a
             | basketball" had a puddle on the court in which it was
             | reflected correctly; that would be mind blowing.
        
               | Imnimo wrote:
               | This is actual image generation - the 'decoder' takes as
               | input a latent code (representing the encoding of the
               | text query), and _synthesizes_ an image. It's not
               | compositing or querying a reference library. The only
               | time that real images enter the process is during
               | training - after that, it's just the network weights.
        
               | recuter wrote:
               | It is compositing as a final step. I understand that the
               | koala it is compositing may be a previously non-existent
               | koala that it synthesized from a library of previously
               | tagged koala images... that's cool, but what is the
               | difference, really, from just dropping one of the pre-
               | existing koalas into the scene?
               | 
               | The difference is just that it makes the compositing
               | easier. If you don't have a pre-existing image that
               | would match the shadows and angles, you can hallucinate
               | a new koala that does. Neat trick.
               | 
               | But I bet if I threw the poor marsupial at a basket net
               | it would look really different from the original clipart
               | of it climbing some tree in a slow and relaxed manner.
               | See what I mean?
               | 
               | Maybe Dall-E 2 can make it strike a new pose. The limb
               | positions could be altered. But the facial expression?
               | 
               | And if the basketball background has wind blowing leaves
               | in one direction, the koala fur won't match; it will
               | look like the training-set fur. The puddle won't reflect
               | it. Etc.
               | 
               | This thing doesn't understand what a koala is the way a
               | 3-year-old does. It understands that the text "koala" is
               | associated with that tagged collection of pixel blobs
               | and can conjure up similar blobs onto new backgrounds -
               | but it can't paint me a new type of koala that it hasn't
               | seen before. It just looks that way.
        
               | andybak wrote:
               | > It is compositing as final step.
               | 
               | I might be misinterpreting your use of "compositing" here
               | (and my own technical knowledge is fairly shallow) but I
               | don't think there's any compositing of elements generally
               | in AI image generation. (unless Dall-E 2 changes this. I
               | haven't read the paper yet)
        
               | recuter wrote:
               | https://cdn.openai.com/papers/dall-e-2.pdf
               | 
               | > Given an image x, we can obtain its CLIP image
               | embedding zi and then use our decoder to "invert" zi,
               | producing new images that we call variations of our
               | input. .. It is also possible to combine two images for
               | variations. To do so, we perform spherical interpolation
               | of their CLIP embeddings zi and zj to obtain intermediate
               | zth = slerp(zi, zj, th), and produce variations of zth
               | by passing it through the decoder.
               | 
               | From the limitations section:
               | 
               | > We find that the reconstructions mix up objects and
               | attributes.
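For reference, the slerp the quote mentions is standard spherical interpolation between two embedding vectors. A minimal numpy version (generic math, not code from the paper; embeddings assumed to be plain 1-d arrays, normalized to unit length inside the function) looks like:

```python
import numpy as np

# Spherical linear interpolation (slerp) between two embeddings, as
# described in the quoted passage for blending two images' CLIP codes.
def slerp(z_i: np.ndarray, z_j: np.ndarray, t: float) -> np.ndarray:
    z_i = z_i / np.linalg.norm(z_i)
    z_j = z_j / np.linalg.norm(z_j)
    theta = np.arccos(np.clip(z_i @ z_j, -1.0, 1.0))  # angle between them
    if np.isclose(theta, 0.0):
        return z_i  # (nearly) parallel: nothing to interpolate
    return (np.sin((1 - t) * theta) * z_i
            + np.sin(t * theta) * z_j) / np.sin(theta)

rng = np.random.default_rng(0)
z_mid = slerp(rng.standard_normal(512), rng.standard_normal(512), 0.5)
```

Unlike a straight linear mix, the result stays on the unit sphere, which matters because the decoder is trained on (roughly) unit-norm CLIP embeddings.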
        
               | Jack000 wrote:
               | The first quote is talking about prompting the model with
               | images instead of text. The second quote is using "mix
               | up" in the sense that the model is confused about the
               | prompt, not that it mixes up existing images.
               | 
               | ML models can output training data verbatim if they
               | overfit, but a well-trained model does extrapolate to
               | novel inputs. You could say that this model doesn't know
               | that images are 2d representations of a larger 3d
               | universe, but now we have NeRF, which kind of obsoletes
               | this objection as well.
        
               | recuter wrote:
               | The model is "confused about the prompt" because it has
               | no concept of a _scene_ or of (some sort of) reality.
               | 
               | If we task "Koala dunking basketball" to a human and
               | present them with two images, one of a koala climbing a
               | tree and another of a basketball player dunking, the
               | human would cut out the foregrounds (human, koala) from
               | the backgrounds (basketball court, forest) and swap them
               | easily.
               | 
               | The laborious part would be to match the shadows and
               | angles in the new image. This requires skill and effort.
               | 
               | Dall-E would conjure up an entirely novel image from
               | scratch, dodging this bit. It blended the concepts
               | instead, great.
               | 
               | But it does not understand what a basketball court
               | actually is, or why the koala would reflect in a puddle.
               | Or why and how this new koala might look different in
               | these circumstances from previous examples of koalas
               | that it knows about.
               | 
               | The human dunker and the koala dunker are not truly
               | interchangeable. :)
        
               | andybak wrote:
               | I'm not sure that's "compositing" except in the most
               | abstract sense? But maybe that's the sense in which you
               | mean it.
               | 
               | I'd argue that at no point is there a representation of a
               | "teddy bear" and "a background" that map closely to their
               | visual representation - that are combined.
               | 
               | (I'm aware I'm being imprecise so give me some leeway
               | here)
        
               | [deleted]
        
               | dash2 wrote:
               | > And if the basketball background has wind blowing
               | leaves in one direction the koala fur won't match, it
               | will look like the training set fur. The puddle won't
               | reflect it.
               | 
               | If you read the article, it gives examples that do
               | _exactly_ this. For example, adding a flamingo shows the
               | flamingo reflected in a pool. Adding a corgi at different
               | locations in a photo of an art gallery shows it in
               | picture style when it's added to a picture, then in
               | photorealistic style when it's on the ground.
        
               | recuter wrote:
               | Well, not so much an article as really interesting hand-
               | picked examples. The paper doesn't address this as far as
               | I can tell. My guess is that this is a weak point that
               | will trip it up occasionally.
               | 
               | A lot of the time it doesn't super matter, but sometimes
               | it does.
        
         | duxup wrote:
         | This isn't something I'm knowledgeable on, so forgive my
         | simplification, but is this like a sort of microservices for
         | AI? Each AI takes its turn handling some aspect, and another
         | sort of mediates among them?
        
           | Imnimo wrote:
           | I'd say Dall-E 2 is a little more unified - they do have
           | multiple networks, but they're trained to work together. The
           | previous approaches I was talking about are a lot more like
           | the microservices analogy. Someone published a model (called
           | CLIP) that can say "how much does this image look like a
           | sunset". Someone else published a totally different model
           | (e.g. VQGAN) that can generate images (but with no way to
           | provide text prompts). A third person figures out a clever
           | way to link the two up - have the VQGAN make an image, ask
           | CLIP how much it looks like a sunset, and use backpropagation
           | to adjust the image a little, repeat until you have a sunset.
           | Each component is its own thing, and VQGAN and CLIP don't
           | know anything about one another.
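
The generate-score-adjust loop described above can be sketched with a toy stand-in. Here a fixed random vector plays the role of CLIP's text embedding, cosine similarity plays the role of CLIP's "how much does this look like a sunset" score, and plain gradient ascent stands in for backpropagating through a generator. None of this is the real CLIP or VQGAN API, just the shape of the idea:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CLIP's text encoder output for a prompt like "a sunset":
# in reality this comes from a trained network; here it's a fixed vector.
text_embedding = rng.normal(size=8)
text_embedding /= np.linalg.norm(text_embedding)

def clip_score(image_vec):
    """Stand-in for 'how much does this image look like the prompt':
    cosine similarity between an image vector and the text embedding."""
    return float(image_vec @ text_embedding / np.linalg.norm(image_vec))

def optimize(steps=300, lr=0.1):
    """The VQGAN+CLIP loop in miniature: start from a random 'image'
    and repeatedly nudge it in the direction that raises the score."""
    image = rng.normal(size=8)
    for _ in range(steps):
        # Analytic gradient of cosine similarity w.r.t. the image vector.
        n = np.linalg.norm(image)
        grad = text_embedding / n - (image @ text_embedding) * image / n**3
        image += lr * grad
    return image

print(f"score after optimization: {clip_score(optimize()):.3f}")
```

Note that the loop optimizes the score, not realism, which is exactly the "most sunset-y image" failure mode discussed downthread.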
        
             | duxup wrote:
             | Got it, thanks.
             | 
             | Makes sense to me as far as avoiding a sort of maximized
             | sunset that is always there and is SUNSET rather than a
             | nice sunset... but also avoiding watering it down and
             | getting a way too subtle sunset.
             | 
             | It's not AI but I've been watching some folks solving /
             | trying to solve some routing (vehicles) problems and you
             | get the "this looks like it was maximized for X" kind of
             | solution but that's maybe not what is important / customer
             | perception is unpredictable. I kinda want to just come up
             | with 3 solutions and let someone randomly click... in
             | fact I see some software do that at times.
        
               | Imnimo wrote:
               | Yeah, I think the trick is that when you ask for "a
               | picture of a sunset", you're really asking for "a picture
               | of a sunset that looks like a realistic natural image and
               | obeys the laws of reality and is consistent with all of
               | the other tacit expectations a human has for an image".
               | And so if you just go all in on "a picture of a sunset",
               | you often end up with what a human would describe as "a
               | picture of what an AI thinks a sunset is".
        
         | krick wrote:
         | While the whole narrative of your comment totally makes sense,
         | I don't really see the difference between the two approaches,
         | not on a conceptual level. You still needed to train this so
         | called "prior" at some point (so, I'm also not sure if it's
         | fair to call it a "prior"). I mean, the difference between your
         | two descriptions seems to be the difference between
         | _descriptions_ (i.e., how you chose to name individual parts of
         | the system), not the systems.
         | 
         | I'm not sure if I'm speaking clearly, I just don't understand,
         | what's the difference between training "text encoding to an
         | image" vs "text embedding to image embedding". In both cases
         | you have some kind of "sunset" (even though it's obviously
         | just a dot in a multi-dimensional space, not the letters) on
         | the left, and you try to maximize it when training the model
         | to get either an image embedding or an image straight away.
        
           | Imnimo wrote:
           | Yeah, my comment didn't really do a good job of making clear
           | that distinction. Obviously the details are pretty technical,
           | but maybe I can give a high-level explanation.
           | 
           | The previous systems I was talking about work something like
           | this: "Try to find me the image that looks like it _most_
           | matches 'a picture of a sunset'. Do this by repeatedly
           | updating your image to make it look more and more like a
           | sunset." Well, what looks more like a sunset? Two sunsets!
           | Three sunsets! But this is not normally the way images are
           | produced - if you hire an artist to make you a picture of a
           | bear, they don't endeavor to create the _most_ "bear" image
           | possible.
           | 
           | Instead, what an artist might do is envision a bear in their
           | head (this is loosely the job of the 'prior' - a name I agree
           | is confusing), and then draw _that_ particular bear image.
           | 
           | But why is this any different? Who cares if the vector I'm
           | trying to draw is a 'text encoding' or an 'image encoding'?
           | Like you say, it's all just vectors. Take this answer with a
           | big grain of salt, because this is just my personal intuitive
           | understanding, but here's what I think: These encodings are
           | produced by CLIP. CLIP has a text encoder and an image
           | encoder. During training, you give it a text caption and a
           | corresponding image, it encodes both, and tries to make the
           | two encodings close. But there are many images which might
           | accompany the caption "a picture of a bear". And conversely
           | there are many captions which might accompany any given
           | picture.
           | 
           | So the text encoding of "a picture of a bear" isn't really a
           | good target - it sort of represents an amalgamation of all
           | the possible bear pictures. It's better to pick one bear
           | picture (i.e. generate one image embedding that we think
           | matches the text embedding), and then just try to draw
           | that. Doing it this way, we aren't just trying to find the
           | maximum bear picture - which probably doesn't even look like
           | a realistic natural image.
           | 
           | Like I said, this is just my personal intuition, and may very
           | well be a load of crap.
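
The "amalgamation" intuition above can be illustrated numerically. In this toy setup, random unit vectors stand in for CLIP image embeddings (none of this is the real model): every individual "bear picture" embedding has norm 1, but the average of many different ones lands well inside the sphere, so the amalgamated target isn't itself a valid point on the manifold the image embeddings live on. That is loosely why sampling one concrete embedding first can help:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for CLIP image embeddings: unit vectors clustered
# loosely around a common direction ("all the possible bear pictures").
center = rng.normal(size=16)
center /= np.linalg.norm(center)
bears = center + 0.2 * rng.normal(size=(100, 16))
bears /= np.linalg.norm(bears, axis=1, keepdims=True)

# A contrastively trained caption embedding is pulled toward all of its
# matching images, so model it crudely as the cluster mean.
amalgam = bears.mean(axis=0)

# Every individual "picture" has norm 1, but the amalgamation does not:
# it sits off the sphere of actual image embeddings.
print(f"norm of a single image embedding: {np.linalg.norm(bears[0]):.3f}")
print(f"norm of the averaged target:      {np.linalg.norm(amalgam):.3f}")

# The 'prior' sidesteps this by first producing one plausible image
# embedding (here: just pick a cluster member) and drawing *that*.
target = bears[rng.integers(len(bears))]
print(f"norm of a sampled target:         {np.linalg.norm(target):.3f}")
```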
        
         | swalsh wrote:
         | Do you think some of these techniques could be slightly
         | modified, and applied to DNA sequences?
        
           | snek_case wrote:
           | Maybe very very short (single-gene) sequences. The thing with
           | DNA is it's the product of evolution. The DNA guides the
           | synthesis of proteins, then the proteins fold into a 3D
           | shape, and they interact with chemicals in their environment
           | based on their shape.
           | 
           | In the context of a living being, different genes interact
           | with each other as well. For example, you have certain cells
           | that secrete hormones (many genes needed to do that), then
           | you have genes that encode for hormone receptors, and those
           | receptors trigger other actions encoded by other genes.
           | There's probably too much complexity to ask an AI system to
           | synthesize the entire genetic code for a living being. That
           | would be kind of like if I asked you to draw the exact
           | blueprints for a fighter jet, and write all the code, and
           | synthesize all the hardware all at once, and you only get one
           | shot. You would likely fail to predict some of the
           | interactions and the resulting system wouldn't work. You
           | could only achieve this through an iterative process that
           | would involve years of extensive testing.
           | 
           | Could you use a deep learning system to synthesize genetic
           | code? Maybe just single genes that do fairly basic things,
           | and you would need a massive dataset. Hard to say what that
           | would look like. Is it really enough to textually describe
           | what a gene does?
        
             | Jack000 wrote:
             | This is all true, but it doesn't preclude the possibility
             | of generating DNA. Humans share a lot of DNA sequences with
             | other animals, and the genetic differences between
             | individual humans are even smaller. You might have trouble
             | generating a human with horns or something, but a taller
             | one is probably mostly an engineering problem.
             | 
             | What GPT-3 and DALL-E show is that you can infer a lot
             | based on the latent structure of data, even without
             | understanding the underlying physical process.
        
           | dekhn wrote:
           | probabilistic generative models have been applied to DNA and
           | protein sequences for decades (my undergrad thesis from ~30
           | years ago did this and it wasn't even new at that point). The
           | real question is what question you want to answer, and what
           | this system is going to do enough better to justify the time
           | investment to prove it out.
        
       | zone411 wrote:
       | Some more examples:
       | https://twitter.com/sama/status/1511724264629678084
        
         | jdrc wrote:
         | There are some masterpieces there. This is the end of clipart
         | and stock images, and the beginning of awesome illustrations in
         | every article.
        
       | lalopalota wrote:
       | One step closer to combining Scribblenauts with emoticons!
        
       | gallerdude wrote:
       | This is extremely interesting. We've had some amazing AI models
       | come out in the past few days. We're getting closer and closer to
       | AI becoming a facet of everyday life.
        
       | turdnagel wrote:
       | I'm genuinely curious to hear Sam Altman's (and/or the OpenAI
       | team's) perspective on why these products need to be waitlisted.
       | If it's a compute issue, why not build a queuing system? If it's
       | something else (safety related? hype related?) I'd love to
       | understand the thinking behind the decision. More often than not,
       | I sign up for waitlists for things like this and either (1) never
       | get in to the beta or (2) forget about it when I eventually do
       | get in.
        
         | minimaxir wrote:
         | For GPT-3 it was a combination of both compute and safety.
         | Given the notes in the System Card (https://github.com/openai/d
         | alle-2-preview/blob/main/system-c... ), OpenAI is likely
         | doubling-down on safety here.
        
       | croddin wrote:
       | This reminds me of the holodeck in Star Trek. Someone could walk
       | into the Holodeck and say "make a table in the center of the
       | room. Make it look old." It seemed amazing to me that the
       | computer could make anything and customize it with voice. We are
       | pretty close to Star Trek technology now in computing ability
       | (ship's computer, not Commander Data). I guess to really be like
       | the holodeck it needs to be able to do 3d and be in real time but
       | that seems a lot closer now. It will be cool when this could be
       | in VR and we can say make an astronaut riding a horse, then we
       | can jump on the back of the horse and ride to a secret moon base.
        
         | [deleted]
        
       | jelliclesfarm wrote:
       | "Preventing Harmful Generations"? = Fail.
       | 
       | Caravaggio is probably chortling from wherever he is ..
        
       | marcodiego wrote:
       | Cartoonists, say good-bye to your job.
        
         | criddell wrote:
         | Randall Munroe should quit now. Soon anybody will be able to
         | create XKCD-type comics.
        
         | Imnimo wrote:
         | Maybe one day there will be a job for people who are masters of
         | the art of prompt hacking - they know all the special phrases
         | and terms to get Dall-E to output the most aesthetically
         | pleasing images. They guard their magic words like a medieval
         | alchemist guards his formulas. Corporations will pay top-dollar
         | for an expertly-crafted, custom-tailored prompt for their
         | advertising campaign.
        
         | rvz wrote:
         | NFTs using Dall-E 2 variations incoming.
        
           | loufe wrote:
           | Not that it's impossible to hide the provenance of an image,
           | but it is explicitly forbidden in the TOS of DALL-E to sell
           | the images as NFTs or otherwise.
        
             | atarian wrote:
             | That's just going to make them more valuable.
        
         | andybak wrote:
         | The goalposts are definitely being moved. But tastes adapt
         | accordingly.
         | 
         | I suspect trends in design will move towards those areas that
         | AI struggles with (assuming there are any left!)
        
       | mouzogu wrote:
       | So what does the future of human creativity look like when an AI
       | can generate possibly infinite variations of an idea?
        
         | tomrod wrote:
         | I seem to recall an XKCD that I cannot find, but the premise
         | goes like:
         | 
         | When you have a digital display of pixels, if you randomly
         | color pixels at 24 fps then you will eventually display every
         | movie that can be or will ever be made, powerset
         | notwithstanding. This can also be tied to digital audio.
         | 
         | In short, while mind-blowingly large, the space of display
         | through digital means is finite.
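
The parent's point can be made concrete with a quick back-of-envelope count (1080p, 24-bit color, and a two-hour movie at 24 fps are arbitrary illustrative choices): the numbers are astronomically large, but finite.

```python
from math import log10

width, height, depth = 1920, 1080, 24      # pixels and bits per pixel
bits_per_frame = width * height * depth

# Work in log10 to avoid materializing the actual integers, which would
# have millions (frames) to trillions (movies) of digits.
log_frames = bits_per_frame * log10(2)     # log10 of distinct frames

fps, seconds = 24, 2 * 60 * 60             # a two-hour movie at 24 fps
log_movies = bits_per_frame * fps * seconds * log10(2)

print(f"distinct 1080p frames:    about 10^{log_frames:,.0f}")
print(f"distinct two-hour movies: about 10^{log_movies:,.0f}")
```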
        
           | mouzogu wrote:
            | Sounds a bit like the Library of Babel of Jorge Luis Borges.
            | I imagine most of the videos would be complete random
            | nonsense.
           | 
            | I think an AI-infused future is going to become increasingly
            | absurd and surreal; it will lead to a kind of creative and
            | cultural nihilism, if that's the right term.
           | 
           | Like the value of originality will become meaningless.
        
             | visarga wrote:
             | The artist or the audience would have to ultimately select
             | something from all that automated originality.
        
         | 6gvONxR4sf7o wrote:
         | I expect that interactive art will be huge. Game design gets
         | fascinating, for example.
        
         | andreyk wrote:
         | AI becomes a tool for artists to use - generative art has been
         | around for a long time, now that particular genre of art will
         | presumably become much more prominent.
         | 
         | For anyone pondering such questions, I would recommend reading
         | "The Past, Present, and Future of AI Art" -
         | https://thegradient.pub/the-past-present-and-future-of-ai-ar...
        
           | pingeroo wrote:
           | Wouldn't it be more like, "AI becomes an artist for people to
           | use"? Will we have people distinguished as "artists" if the
           | ability to make awesome art becomes available to everybody?
        
             | andreyk wrote:
             | AI still needs the text prompt to know what to generate.
             | Hence the human who provides the prompt is still the
             | artist, just like a photographer finds an aesthetically
             | interesting spot to take the image with their camera.
             | Cameras make images, humans using cameras make art.
             | Granted, this is not quite 1-1 with AI art, but still the
             | idea is the same. If anything the flood of AI images will
             | only require artists to go beyond what is possible with
             | these text->image kinds of things, of which there is no
             | shortage.
        
         | keiferski wrote:
         | I think you'll see more of a focus on the artist themselves.
         | These images are nice, but they have basically zero narrative
         | value.
         | 
         | This is really already the case, actually. Most artworks have
         | "value" because they have a compelling narrative, not because
         | they look pretty. So I think we can expect future artists to
         | really emphasize their background, life story, process of
         | making the art, etc. All things that cannot be done by a
         | machine.
        
       | Apofis wrote:
       | So I can't do Teddy Bears Riding a Horse?
        
       | arecurrence wrote:
       | Is there a geometric model related to this? E.g. "corgi near the
       | fireplace", but the output is a 3D model of the corgi and
       | fireplace with shaders rather than an image.
        
       | Ftuuky wrote:
       | What jobs will there be in 5-10 years, considering all the
       | progress made with Dall-E, GPT-3, Codex/GitHub Copilot, Alpha*,
       | and so on?
        
         | phphphphp wrote:
         | Most creative output is duplicated effort: consider how much
         | code each person on HN has written that has been written
         | before. Consider how, a decade ago, we were all writing html
         | and styling it, element by element, and then Twitter bootstrap
         | came along and revolutionised front-end development in what is,
         | ultimately, a very small and low technology way. All it really
         | did was reduce duplicate effort.
         | 
         | Nowadays there's lots of great low/no code platforms, like
         | Retool, that represent a far greater threat to the amount of
         | code that needs to be produced than AI ever will.
         | 
         | To use a cliche: code is a bug, not a feature. Abstracting away
         | the need for code is the future, not having a machine churn out
         | the same code we need today.
        
         | beders wrote:
         | The ones undoing the damage caused by dumb pattern recognizers
         | and generators? ;)
        
         | 6gvONxR4sf7o wrote:
         | Things that require understanding of causation will be safe
         | longer. Progress like this is driven by massive datasets.
         | Meanwhile, real world action-taking applications require
         | different paradigms to take causation into account[0][1], and
         | especially to learn safely (e.g. learning to drive without
         | crashing during the beginner stages).
         | 
         | There's certainly research happening around this, and RL in
         | games is a great test bed, but people choosing actions will be
         | safe from automation longer than people not choosing actions,
         | if that makes sense. It's the person who decides "hire this
         | person" vs the person who decides "I'll use this particular
         | shade of gray."
         | 
         | [0] The best example is when X causes Y and X also causes Z,
         | but your data only includes Y and Z. Without actually
         | manipulating Y, you can't see that Y doesn't cause Z, even if
         | it's a strong predictor.
         | 
         | [1] Another example is the datasets. You need two different
         | labels depending on what happens if you take action A or B,
         | which you can't have simultaneously outside of simulations.
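
The confounding example in [0] is easy to simulate. This sketch (with made-up coefficients, purely illustrative) shows Y strongly predicting Z even though setting Y by fiat has no effect on Z:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Confounder X drives both Y and Z; Y has no causal effect on Z.
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = x + 0.5 * rng.normal(size=n)

# Observationally, Y strongly predicts Z...
print(f"corr(Y, Z) = {np.corrcoef(y, z)[0, 1]:.2f}")

# ...but intervening on Y (overwriting it independently of X) reveals
# no causal effect, because Z's mechanism never looks at Y.
y_do = rng.normal(size=n)                 # do(Y): set Y by fiat
z_after = x + 0.5 * rng.normal(size=n)    # Z is still generated from X
print(f"corr(do(Y), Z) = {np.corrcoef(y_do, z_after)[0, 1]:.2f}")
```

Without the ability to perform the intervention (or data that includes X), the two cases are indistinguishable from the observational correlation alone.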
        
       | cm2012 wrote:
       | Sam's Twitter thread today was more impressive than the website.
       | 
       | https://twitter.com/sama/status/1511724264629678084?s=20&t=6...
        
       | ordu wrote:
       | Dall-E 2 seems incapable of catching the essence of the art. I'm
       | not really surprised by it; I'd be surprised a lot if it could.
       | But nevertheless: if you looked into the eye of the Girl With A
       | Pearl Earring[1], you'd be forced to stop and think about what
       | she has on her mind right now. Or maybe you'd have some other
       | question in your mind, but it really stops people and makes them
       | think. None of the Dall-E interpretations have this quality.
       | Works inspired by the Girl With A Pearl Earring sometimes have at
       | least part of that power, like the Girl With a Bamboo Earring[2].
       | But none of the Dall-E interpretations have such power.
       | 
       | And this observation may lead to great consequences for the
       | visual arts. I had a lot of joy looking at the different Dall-E
       | interpretations, trying to find the flaw in each one that keeps
       | it from being a piece of art of equal value to the original. It
       | is a ready-made tool for searching for explanations of the Power
       | of Art. It cannot say what detail makes a picture an artwork,
       | but it allows us to see multiple data points and to narrow the
       | hypothesis space. My main conclusion is that the pearl earring
       | has nothing to do with the power of the art. It is something in
       | the eye, and probably in the slightly opened mouth. (Somehow
       | Dall-E pictured all interpretations with closed lips, so it
       | seems to be an important thing, but I need more variation along
       | this axis to be sure.)
       | 
       | [1] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring [2]
       | https://yourartshop-noldenh.com/awol-erizku-girl-with-the-pe...
        
         | hcks wrote:
         | On the meta level, we are now at the point where the dubious
         | comments downplaying the AI start arguing on the plane of art
         | criticism.
        
         | jdrc wrote:
         | Art criticism should be off topic here. This is more like
         | chopping off the visual cortex and some association cortex from
         | a brain and stimulating it. There is no person signaling to us,
         | nor can we attribute any striking images that may come up to a
         | person with agency.
         | 
         | But it's like a giant database of decent clipart for anything
         | we can imagine.
        
           | ordu wrote:
           | _> This is more like chopping off the visual cortex and some
           | association cortex from a brain and stimulating it._
           | 
           | We do not know exactly what part of our perception of reality
           | can be attributed to "the visual cortex and some association
           | cortex". But now we can feel it. We can test it. We can
           | compare ourselves with the cold calculating machine. I
           | believe that it is a priceless opportunity that we shouldn't
           | miss. At least I personally can't. I'm going to figure out
           | whether it's possible for me to have a companion like Dall-E
           | in my wanderings through the sea of information on the
           | Internet, and if it is, then to get one.
           | 
           |  _> But it's like a giant database of decent clipart for
           | anything we can imagine_
           | 
           | And this also. Yes. Though I'm not interested in clipart.
        
         | joshcryer wrote:
         | What do you think of the third to last image of the Girl With A
         | Pearl Earring that DALL-E 2 created? I find it more compelling
         | than the original with how her face is deeply cast in shadow.
         | There's still that original 'essence' of the glint in her eye.
         | But her earring is a bell. As if the AI is sending a message
         | that what if the bell were to ring?
        
           | ordu wrote:
           | I'm not sure that I can express myself in English, which is
           | not my native language, and this needs some very nuanced
           | control over the tiniest shades of meaning, but I'll try
           | nevertheless, just for the fun of it at least.
           | 
           | The original girl is more open, more independent and
           | mindless. The interpretation's girl is more self-controlled,
           | assertive and not really interested, just going through all
           | those motions of regular communication between people. Maybe
           | it's just me, but what I really value on such occasions is
           | mindlessness, the ability of people to not mind themselves,
           | to let their selves dissolve in the environment. I cannot
           | hold back tears sometimes when I watch some performer
           | playing Chopin or Paganini, because what I see in their
           | movements is the complete dissolution of a person in a piece
           | of music, in a piece of art and skill. A performer just does
           | what they do, with their full attention on it and with all
           | their motivation focused on it. There is nothing here for
           | them, just them and their actions.
           | 
           | There is not a single thought devoted to how people around
           | me would react to what I do and how I do it. I just do what
           | I do, and I don't care about the people around me, and if it
           | somehow makes people happy... I don't really care. I mean, I
           | know that afterwards I'd feel proud of myself, but just for
           | now I don't really care.
           | 
           | I know this feeling. I like to sing, and I'm good at it
           | (above average), and I know what it feels like to dissolve
           | into the song and to let the song rule. I play piano and I
           | know what it is like to dissolve into the piece I'm playing,
           | to stop myself from existing, to let the music take the
           | lead. And the original painting makes me believe that the
           | girl is in this state of mind. I do not know the history or
           | the rest of the story, I do not know if she got into this
           | state for a second, or if she never leaves it (that may be a
           | sad experience, don't you think?), but somehow I know that
           | right now she is right in this state. I want to watch this
           | moment of hers for an eternity.
           | 
           | Thinking about it, I'd confess that the interpretation girl
           | does trigger the same thing, but on a smaller scale. I feel
           | how my mind tries to find a coherent state behind her gaze,
           | but this feeling stops after tens of microseconds, not
           | hundreds of them.
           | 
           | edit: want->watch. Stupid mistake ruining the meaning of the
           | sentence.
        
         | [deleted]
        
         | Veedrac wrote:
         | Initial Outputs from New AI Model Not As Good at Nuance as
         | Historic Artwork, Approach Deemed Hopeless
        
           | ordu wrote:
           | Oh... Not hopeless. The very fact that I spent some minutes
           | watching the interpretations of the Girl With a Pearl
           | Earring is evidence enough that it is not hopeless. I praise
           | the work that was done. Moreover, I hoped that people would
           | take it as an inspiration to do even more.
        
       | awinter-py wrote:
       | They're using training set restriction and prompt engineering to
       | control its output
       | 
       | > By removing the most explicit content from the training data,
       | we minimized DALL·E 2's exposure to these concepts
       | 
       | > We won't generate images if our filters identify text prompts
       | and image uploads that may violate our policies
       | 
       | The 'how to prevent superintelligences from eating us' crowd
       | should be taking note: this may be how we regulate creatures
       | larger than ourselves in the future
       | 
       | And even how we regulate the ethics of non-conscious group minds
       | like big companies
        
       | 6gvONxR4sf7o wrote:
       | This is a niche complaint, but I get frustrated at how imprecise
       | OpenAI's papers are. When they describe the model architecture,
       | it's never precise enough to reproduce exactly what they did. I
       | mean, it pretty much never is in ML papers[0], but OpenAI's
       | bigger products are worse than average here. And it makes sense,
       | since they're trying to be concise and still spend time on all
       | the other important stuff besides methods, but it still
       | frustrates me quite a bit.
       | 
       | [0] Which is why releasing your code is so beneficial.
        
       | greyhair wrote:
       | Interesting, yes, but I went to the link and browsed the
       | 'generated artwork', and all of it was subjectively inferior to
       | the original it was generated from. Every single piece. So I am
       | not sure what the 'value' in it is, at this stage.
       | 
       | As for the text-driven generation, I would have to mess with
       | some non-pre-canned presentations to see how useful it is.
        
       | krick wrote:
       | Regardless of how much cherry-picking there was, some of these
       | pictures are just beautiful.
        
       | jedberg wrote:
       | This reminds me of a discussion I had with the high school band
       | teacher in the 90s. I was telling him that one day computers
       | would play music and you won't be able to tell the difference. He
       | got mad at me and told me that a computer could never play as
       | well as a human with feelings, who can _feel_ the piece and
       | interpret it.
       | 
       | I think we passed that point a while ago, but seeing this makes
       | me think we aren't too far off from computers composing pieces
       | that actually sound good too.
        
       | andybak wrote:
       | Some freely available models
       | 
       | GLID-3:
       | https://colab.research.google.com/drive/1x4p2PokZ3XznBn35Q5B...
       | 
       | and a new Latent Diffusion notebook:
       | https://colab.research.google.com/github/multimodalart/laten...
       | 
       | have both appeared recently and are getting remarkably close to
       | the original Dall-E (maybe better as I can't test the real
       | thing...)
       | 
       | So - this was pretty good timing if OpenAI want to appear to be
       | ahead of the pack. Of course I'd always pick a model I can
       | actually use over a better one I'm not allowed to...
        
         | Jack000 wrote:
         | With glide I think we've reached something of a plateau in
         | terms of architecture on the "text to image generator S curve".
         | DALL-E 2 is a very similar architecture to GLIDE and has some
         | notable downsides (poorer language understanding).
         | 
         | glid-3 is a relatively small model trained by a single guy on
         | his workstation (aka me) so it's not going to be as good. It's
         | also not fully baked yet so ymmv, although it really depends on
         | the prompt. The new latent diffusion model is really amazing
         | though and is much closer to DALLE-2 for 256px images.
         | 
         | I think the open source community will rapidly catch up with
         | Openai in the coming months. The data, code and compute are all
         | there to train a model of similar size and quality.
        
           | andybak wrote:
           | Wow. Thanks for GLID-3. It was genuinely exciting for a few
           | days but then I must admit latent diffusion stole my
           | attention somewhat ;-)
           | 
           | What kind of prompts is GLID-3 especially good for? I
           | remember getting lucky when I was playing around a few times
           | but I didn't do it systematically.
        
             | Jack000 wrote:
             | glid-3 is trained specifically on photographic-style
             | images, and is a bit better at generalization compared to
             | the latent diffusion model.
             | 
             | e.g. prompt: half human half Eiffel tower. A human Eiffel
             | tower hybrid (I get mostly normal Eiffel towers from LDM
             | but some sensical results from glid-3)
             | 
             | glid-3 will be worse for things that require detailed
             | recall, like a specific person.
             | 
             | With smaller models you kind of have to generate a lot of
             | samples and pick out the best ones.
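The sample-and-rank workflow described above is easy to sketch. Everything below is a stand-in, not glid-3's actual code: `fake_generate` and `fake_score` are hypothetical substitutes for the diffusion sampler and a CLIP-style text-image similarity score.

```python
import random

def best_of_n(generate, score, prompt, n=8):
    """Generate n candidates for a prompt and keep the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda img: score(prompt, img))

# Toy stand-ins: a real pipeline would call the diffusion sampler here and
# rank the outputs with a CLIP-style text-image similarity score.
fake_generate = lambda prompt: random.random()  # an "image" is just a number
fake_score = lambda prompt, img: img            # higher is better

random.seed(0)
best = best_of_n(fake_generate, fake_score, "half human half Eiffel tower", n=16)
```

With small models, raising `n` trades compute for quality: more samples, better best pick.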
        
         | loufe wrote:
         | I think this is really neat, but definitely not on the same
         | tier as DALL-E 2, at least from the cherry-picked images I saw.
        
           | andybak wrote:
           | I'm not sure what you've seen but I've been very impressed
           | indeed by some results I've obtained. Some less so.
           | 
           | It's hard to compare because we don't know how much cherry
           | picking is going on with published Dall-E results (either v1
           | or v2)
           | 
           | My gut feeling is that it's in the same ballpark as Dall-E 1
        
         | hwers wrote:
         | They're also not censored on the dataset front and thus produce
         | much more interesting outputs.
         | 
         | OpenAI has a low resolution checkpoint for similar
         | functionality as this - called GLIDE - and the output is super
         | boring compared to community driven efforts, in large part
         | because of dataset restrictions similar to those this has
         | likely been subjected to.
        
         | FreeHugs wrote:
         | How do you run such a Google Colab thing?
         | 
         | I don't see a run button?
         | 
         | Oh.. maybe "Runtime -> Run All" from the menu ...
         | 
         | Shows me a spinning circle around "Download model" ...
         | 
         | 26% ...
         | 
         | Fascinating that Google offers you a computer in the cloud for
         | free ..
         | 
         | Now it is running the model. Wow, I'm curious ..
         | 
         | Ha, it worked!
         | 
         | Nothing compared to the images in the Dall-E 2 article but
         | still impressive.
        
           | minimaxir wrote:
           | Google is a company with a lot of spare VMs and GPUs.
           | 
           | However, the free GPU is now a K80 which is obsolete and
           | barely sufficient for running these types of models.
        
             | nl wrote:
             | You sometimes still get T4s. I got one last week and it was
             | great.
        
       | qualudeheart wrote:
       | Deep Learning plows through yet another wall.
        
       | kovek wrote:
       | One of my teachers once said "An art piece is never done". So, I
       | wonder what that could mean for the model to keep making
       | improvements to the piece.
        
         | chronolitus wrote:
         | IIRC that's how it works! It starts from an initial noise
         | image, and improves it until 'satisfied' that the result fits
         | the prompt.
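That iterative-refinement loop can be sketched in a few lines. This is a toy illustration, not DALL-E's actual sampler: `fake_denoise` is a hypothetical stand-in for the learned model, which in a real diffusion sampler predicts and removes noise at each step.

```python
import random

def sample(denoise_step, steps=50, size=4, seed=0):
    """Toy diffusion-style sampling loop: start from pure noise and let a
    'model' repeatedly nudge the image toward something that fits."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(size)]  # pure noise to begin with
    for t in range(steps, 0, -1):               # count the steps down
        x = denoise_step(x, t)                  # each step cleans up a little
    return x

# Hypothetical stand-in for the learned model: pull each value a fraction
# of the way toward a fixed target (a real model predicts noise to remove).
target = [0.5, -0.2, 0.1, 0.9]
def fake_denoise(x, t):
    return [xi + (ti - xi) / t for xi, ti in zip(x, target)]

out = sample(fake_denoise, steps=50)
```

Conditioning on the text prompt happens inside the denoising step in real models; here the fixed `target` plays that role.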
        
       | bakztfuture wrote:
       | I made a YouTube series last summer on the massive potential
       | future of DALL-E and multimodal AI models.
       | 
       | Imagine not just DALL-E 2 but a single model that can be trained
       | on different kinds of media and generate music, images, video
       | and more.
       | 
       | The series talks about:
       | 
       | - essential lessons for AI creatives of the future
       | 
       | - how to compete creatively in the future
       | 
       | - how to make money through multimodal AI
       | 
       | - predictions about AI's effects on society
       | 
       | - at a very basic level, the ethics of multimodal AI and the
       | philosophy of creativity itself
       | 
       | By my understanding, it's the most comprehensive set of videos on
       | this topic.
       | 
       | The series is free to watch entirely on YouTube: GPT-X, DALL-E,
       | and our Multimodal Future
       | https://www.youtube.com/playlist?list=PLza3gaByGSXjUCtIuv2x9...
        
       | rvz wrote:
       | At this point with WaveNet, GPT-3, Codex, DeepFakes and Dall-E 2,
       | you cannot believe anything you see, hear, watch, read on the
       | internet anymore as an AI can easily generate nearly anything
       | that can be quickly believable by millions.
       | 
       | The internet's own proverb has never been more important to keep
       | in mind. A dose of skepticism is a must.
        
       | aChrisSmith wrote:
       | I can see how this has the potential to disrupt the games
       | industry. If you work on a AAA title, there is a small army of
       | artists making 19 different types of leather armor. Or 87 images
       | of car hubcaps.
       | 
       | Using something like this could really help automate or at least
       | kickstart the more mundane parts of content creation. (At least
       | when you are using high resolution, true color imagery.)
        
         | killerstorm wrote:
         | This thing can't do 3D models.
         | 
           | There are some 3D image generation techniques, but they
           | aren't based on polygonal modeling, so 3D artists are safe
           | for now.
        
           | pwillia7 wrote:
           | You could train a model on texture image data though, no?
           | 
           | Or what about even generating images you could then
           | photogrammetry into models?
        
       | rndphs wrote:
       | This is going to be mostly a rant on OpenAI's "safer than thou"
       | approach to safety, but let me start by saying that I think this
       | technology is really cool, amazing, powerful stuff.
       | Dall-E (and Dall-E 2) is an incredible advance over GANs, and no
       | doubt will have many positive applications. It's simply
       | brilliant. I am someone who has been interested in and has
       | followed the progress of ML generated images for nearly a decade.
       | Almost unimaginable progress has been made in the last five years
       | in this field.
       | 
       | Now the rant:
       | 
       | I think if OpenAI genuinely cared about the ethical consequences
       | of the technology, they would realise that any algorithm they
       | release will be replicated in implementation by other people
       | within some short period of time (a year or two). At that point,
       | the cat is out of the bag and there is nothing they can do to
       | prevent abuse. So really all they are doing is delaying abuse,
       | and in no way stopping it.
       | 
       | I think their strong "safety" stance has three functions:
       | 
       | 1. Legal protection
       | 
       | 2. PR
       | 
       | 3. Keeping their researchers' consciences clear
       | 
       | I think number 3 is dangerous because researchers are put under
       | the false belief that their technology can or will be made safe.
       | This way they can continue to harness bright minds that no doubt
       | have ethical leanings to create things that they otherwise
       | wouldn't have.
       | 
       | I think OpenAI are trying to have their cake and eat it too.
       | They
       | are accelerating the development of potentially very destructive
       | algorithms (and profiting from it in the process!), while trying
       | to absolve themselves of the responsibility. Putting bandaids on
       | a tumour is not going to matter in the long run. I'm not
       | necessarily saying that these algorithms will be widely
       | destructive, but they certainly have the potential to be.
       | 
       | The safety approach of OpenAI ultimately boils down to
       | gatekeeping compute power. This is just gatekeeping via capital.
       | Anyone with sufficient _money_ can replicate their models easily
       | and bypass _every single one_ of their safety constraints.
       | Basically they are only preventing _poor_ bad actors, and only
       | for a limited time at that.
       | 
       | These models cannot be made safe as long as they are replicable.
       | 
       | Producing scientific research requires making your results
       | replicable.
       | 
       | Therefore, there is no ability to develop abusable technology in
       | a safe way. As a researcher, you will have blood on your hands if
       | things go wrong.
       | 
       | If you choose to continue research knowing this, that is your
       | decision. But don't pretend that you can make the _algorithms_
       | safer by sanitizing models.
        
       | duren wrote:
       | I've been playing around with it today and have been super
       | impressed with its ability to generate pretty artful digital
       | paintings. Could have big implications for designers and artists
       | if and when they allow you use custom palettes, etc.
       | 
       | Here's an example from my prompt ("a group of farmers picking
       | lettuce in a field digital painting"):
       | https://labs.openai.com/s/jb5pzIdTjS3AkMvmAlx69t7G
        
         | pingeroo wrote:
         | Neat! Were you part of the initial testing batch or granted
         | access via waitlist?
        
       | d--b wrote:
       | Am I the only one to think that the AI world is divided into 2
       | groups:
       | 
       | 1. Deepmind, who solved go and protein folding, and seem really
       | onto something.
       | 
       | 2. Everyone else, spending billions to build machines that draw
       | astronauts on unicorns, and smartish bot toys.
        
         | gwf wrote:
         | Your second group represents the core "inner loop" of about a
         | thousand revolutionary applications. Take the basic capability
         | of translating image->text->speech (and the reverse), install
         | it on a wearable device that can "see" an environment, and add
         | domain-specific agents. From this setup, you're not too far
         | away from having an AI that can whisper guidance into your ear
         | like a co-pilot, enabling scenarios like:
         | 
         | 1. step-by-step guidance for a blind person navigating the use
         | of a public restroom.
         | 
         | 2. an EMS AI helping you to save someone's life in an
         | emergency.
         | 
         | 3. an AI coach that can teach you a new sport or activity.
         | 
         | 4. an omnipresent domain-expert that can show you how to make a
         | gourmet meal, repair an engine, or perform a traditional tea
         | ceremony.
         | 
         | 5. a personal assistant that can anticipate your information
         | need (what's that person's name? where's the exit? who's the
         | most interesting person here? etc.) and whisper the answer in
         | your ear just as you need it.
         | 
         | Now, add all of the above to an AR capability where you can now
         | think or speak of something interesting and complex, and have
         | it visualized right before your eyes. With this capability, I
         | could augment my imagination with almost super-human
         | capabilities that allow me to solve complex problems almost as
         | if it were an internal mental monologue.
         | 
         | All of these scenarios are just a short hop from where we're at
         | now, so mark my words: we will have "borgs" like those
         | described above long before we reach anything like general AI.
        
           | lkbm wrote:
           | These are good examples of what we're getting close to, but
           | I'd add that Copilot is already an extremely helpful tool for
           | coding. I don't blindly trust its output, but its suggestions
           | are what I want often enough to save a lot of typing.
           | 
           | I still have to do all the hard thinking, but once I figure
           | out what I want written and start typing, Copilot will spit
           | out a good portion of the contextually-obvious lines of code.
        
         | robotresearcher wrote:
         | There's a third group for your list: AI stuff that's so good we
         | don't think about it any more.
         | 
         | For example, recent phone cameras can estimate depth per pixel
         | from single images. Hundreds of millions of these devices are
         | deployed. A decade ago this was AI/CV research lab stuff.
        
         | emadabdulrahim wrote:
         | OpenAI is one of the leading companies in AI that makes models
         | with real world applications. I don't see their efforts as
         | misdirected or futile in any way. If anything I'm always
         | impressed with their announcements because it's always mind
         | blowing what their models can do!
         | 
         | The same technology that is drawing cute unicorns can be used
         | for endless other use cases. Perhaps the PR side of the launch
         | and the subject matter they chose to unveil their product with
         | is just that, PR.
         | 
         | It's like Apple's Memoji thing (not sure if I'm spelling it
         | correctly). You can think of it as a trivial waste of talent to
         | use their Camera/FaceID to animate cute animals based on facial
         | expressions, but that same tech will enable lots of other
         | things to come.
        
         | trixie_ wrote:
         | It all feels like the early days of electricity, when the
         | challenge was turning a neat party trick into something more
         | useful. But it was the people who kept at better and better
         | party tricks who actually laid the foundations for doing
         | really useful things with electricity, as well as
         | understanding it at a deeper level.
        
       | _nateraw wrote:
       | If you're interested in generative models, Hugging Face is
       | putting on an event around generative models right now called the
       | HugGAN sprint, where they're giving away free access to compute
       | to train models like this.
       | 
       | You can join it by following the steps in the guide here:
       | https://github.com/huggingface/community-events/tree/main/hu...
       | 
       | There will also be talks from awesome folks at EleutherAI,
       | Google, and Deepmind
        
       | eganist wrote:
       | The timing of the Dall-E 2 launch an hour ago seems to correspond
       | with a recent piece of investigative journalism by Buzzfeed News
       | about one of Sam Altman's other ventures, published 15 hours ago
       | and discussed elsewhere actively on HN right now:
       | 
       | https://news.ycombinator.com/item?id=30931614
       | 
       | I point this out because while Dall-E 2 seems interesting (I'm
       | out of my depth, so delegating to the conversation taking place
       | here), the timing of its release as well as accompanying press
       | blasts within the last hour from sites like TheVerge--verified
       | via wayback machine queries and time-restricted googling--seems
       | both noteworthy and worth a deeper conversation given what was
       | just published about Worldcoin.
       | 
       | To be clear, it's worth asking if Dall-E 2 was published ahead of
       | schedule without an actual product release (only a waitlist) to
       | potentially move the spotlight away from Worldcoin.
        
         | duxup wrote:
         | What's the idea here? They quickly put this out to somehow
         | hide other stories?
        
           | eganist wrote:
           | Yes, especially given there's no actual product release, only
           | a waitlist.
           | 
           | Easy to put together a marketing piece on short notice or
           | potentially even push a pending marketing page out to
           | production with a waitlist rather than links to production or
           | even beta quality services.
        
         | dang wrote:
         | I don't have any knowledge (inside or otherwise) but the
         | Worldcoin thing already came in for several rounds of abuse on
         | HN, so it's kind of a scandal of the second freshness at this
         | point.
         | 
         | I listed some of them here -
         | https://news.ycombinator.com/item?id=30934732, just because I
         | remembered there had been previous discussions and listing
         | related previous discussions is a thing.
        
         | gallerdude wrote:
         | Maybe I'm naive, but I see this as a coincidence. If it was an
         | hour later, then maybe there would be something.
        
           | eganist wrote:
           | Another consideration, then: it was published to HN almost
           | instantly after it was released to the world, 52 minutes
           | after the HN post about Worldcoin was submitted and started
           | showing traction.
           | 
           | I don't see the publication of a marketing page (again, not a
           | finished product) for a product founded by someone whose
           | other main venture is being investigated by journalists for
           | misleading claims as being a coincidence, but if the timing
           | matters and 14-15 hours doesn't seem like it works for the
           | assertion in your mind, then perhaps the Dall-E 2 page going
           | live less than an hour after the Worldcoin HN submission fits
           | the bill.
           | 
           | I've got no horse in this race. I'm just drawing attention to
           | familiar PR strategies used for brand risk mitigation, that's
           | all.
        
           | GranPC wrote:
           | If the article GP refers to was posted 16 hours ago instead
           | of 15, would that really make a difference?
        
         | danso wrote:
         | I'm not a huge fan of these coordination theories. But a few
         | things worth noting:
         | 
         | - In support of your argument, the Buzzfeed News investigation
         | likely has been in the works for weeks, meaning Altman et al
         | have had more than just a couple days to throw together a
         | Dall-E 2 soft launch
         | 
         | - However, weren't OpenAI's GPT (2 and 3) announced to the
         | world in similar fashion? e.g. demos and whitepapers and
         | waitlists, but not a full product release?
         | 
         | - Throwing together a Dall-E 2 soft launch just in time to
         | distract from the investigation would require a conspiracy,
         | i.e. several people being at least vaguely aware that deadlines
         | have been accelerated for external reasons. Is the Worldcoin
         | story big enough to risk tainting OpenAI, which seems like a
         | much more prominent part of Altman's portfolio?
        
           | eganist wrote:
           | For discussion's sake:
           | 
           | - BFN reached out to A16Z, Worldcoin, and Khosla Ventures,
           | who largely declined to comment, which would mean that at
           | least one person probably had a bit of runway from at least
           | when the requests for comment were submitted. So yeah, you're
           | probably right.
           | 
           | - Going from the github repos for GPT 2 and 3, those may have
           | been hard launches:
           | 
           | Feb 14 2019, predating the first press for GPT-2 by a few
           | days (was probably made public Feb 14 though) -
           | https://github.com/openai/gpt-2/commit/c2dae27c1029770cea409...
           | 
           | May 28 2020, timed alongside the press news for GPT-3 -
           | https://github.com/openai/gpt-3/commit/12766ba31aa6de490226e...
           | 
           | - Would it really have to be a conspiracy? Sounds like only
           | one person would have to target a specific date or date
           | range, and without really giving a reason.
           | 
           | One of the things that puts a hole in my own thinking here is
           | that Sam Altman's name isn't really tied to the Dall-E 2
           | release. It's just OpenAI, and the press around Sam's name
           | _today_ still exclusively surfaces just this one Worldcoin
           | story
           | (https://news.google.com/search?q=sam+altman+when%3A1d&). So
           | if this was actually intended to bury another story, Sam's
           | name would have to have been included in all the press blasts
           | to be successful. But the Buzzfeed story seems like it kinda
           | died alone on the vine.
        
         | nonfamous wrote:
         | Genuine question: how are the two stories even related? It's
         | certainly not apparent from the BuzzFeed article (or at least a
         | quick skim of it).
        
           | eganist wrote:
           | Sam Altman is OpenAI's CEO.
           | 
           | What I'm submitting for consideration is that the marketing
           | page and associated press blasts (there's a live influencer
           | reaction video airing right now about Dall-E 2, for instance)
           | for Dall-E 2 were potentially pushed up to offset negative
           | press from Worldcoin for their shared founder.
           | 
           | I'd like to be wrong. But it's too well timed.
        
       | thisistheend123 wrote:
       | This is what magic looks like.
       | 
       | Great work.
       | 
       | Looking forward to when they start creating movies from scripts.
        
       | Dig1t wrote:
       | Most of the conversation around this model seems to be about its
       | direct uses.
       | 
       | This seems to me like a big step towards AGI; a key component of
       | consciousness seems (in my opinion) to be the ability to take
       | words and create a mental picture of what's being described. Is
       | that the long term goal WRT researching a model like this?
        
       | latexr wrote:
       | What confusing pricing[1]:
       | 
       | > Prices are per 1,000 tokens. You can think of tokens as pieces
       | of words, where 1,000 tokens is about 750 words. This paragraph
       | is 35 tokens.
       | 
       | Further down, in the FAQ[2]:
       | 
       | > For English text, 1 token is approximately 4 characters or 0.75
       | words. As a point of reference, the collected works of
       | Shakespeare are about 900,000 words or 1.2M tokens.
       | 
       | > To learn more about how tokens work and estimate your usage...
       | 
       | > Experiment with our interactive Tokenizer tool.
       | 
       | And it goes on. When most questions in your FAQ are about
       | understanding pricing--to the point you need to offer a
       | specialised tool--perhaps consider a different model?
       | 
       | [1]: https://openai.com/api/pricing/
       | 
       | [2]: https://openai.com/api/pricing/#faq-token
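The rules of thumb quoted above are at least easy to turn into a back-of-the-envelope estimator. This is just arithmetic from the FAQ's stated ratios (~4 characters or ~0.75 words per token), not OpenAI's actual tokenizer, and `price_per_1k_tokens` is a placeholder for whichever model's rate applies.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count from the stated rule of thumb:
    ~4 characters of English text per token."""
    return max(1, round(len(text) / 4))

def tokens_from_words(words: int) -> int:
    """Alternative rule of thumb: ~0.75 words per token."""
    return round(words / 0.75)

def estimate_cost(text: str, price_per_1k_tokens: float) -> float:
    """Prices on the page are quoted per 1,000 tokens."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens

# The FAQ's own sanity check: ~900,000 words of Shakespeare ~ 1.2M tokens.
print(tokens_from_words(900_000))  # 1200000
```

Real counts depend on the tokenizer's learned vocabulary, which is presumably why they ship an interactive tool instead of a formula.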
        
         | pingeroo wrote:
         | This is for their GPT models, not Dall-E. I don't think they
         | have released any pricing information for Dall-E yet, as it is
         | still in waitlist mode.
        
         | belval wrote:
         | Haven't read the paper, but they are probably using something
         | like sentencepiece with sub-word splitting and then charge by
         | the number of resulting tokens.
         | 
         | https://github.com/google/sentencepiece
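For illustration only: this isn't sentencepiece's actual algorithm (sentencepiece/BPE learn their vocabularies from data), but a greedy longest-match split against a hand-picked vocabulary shows the sub-word idea that makes "tokens" smaller than words.

```python
def subword_tokenize(word, vocab):
    """Greedy longest-match sub-word splitting (the rough idea behind
    BPE/sentencepiece vocabularies): repeatedly take the longest prefix
    found in the vocabulary, falling back to single characters."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single chars always allowed
                tokens.append(piece)
                i = j
                break
    return tokens

# Tiny illustrative vocabulary; a real model learns tens of thousands of pieces.
vocab = {"token", "iza", "tion", "un", "believ", "able"}
print(subword_tokenize("tokenization", vocab))  # ['token', 'iza', 'tion']
print(subword_tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
```

Under a scheme like this, billing by token count means a 12-letter word costs roughly 3 tokens rather than 1 or 12.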
        
       | hwers wrote:
       | The correct response here from the artists' point of view should
       | be a widespread coming together against their art being used as
       | training data for ML models. With a quickly spread new license on
       | most major art submission sites that explicitly forbids AI
       | algorithms from using their work, artists would effectively
       | starve OpenAI and others from using their own works to put them
       | out of a job.
        
         | w-m wrote:
         | The license should forbid competing artists from using the
         | artist's work as well. In fact, no human should come in contact
         | with the produced art, otherwise they might be accidentally
         | inspired by it, thus stealing from the original creator.
        
       | smusamashah wrote:
       | This is mind blowing. I was not expecting the sketch style images
       | to actually look like sketches. Style transfer based sketches
       | never look like sketches.
       | 
       | This and the current AI generated art scene make it look like
       | artwork is now a "solved" problem. See AI generated art on
       | twitter etc.
       | 
       | There is a strong relation between the prompt and the generated
       | images but just like GPT-3, it fails to fully understand what was
       | being asked. If you take the prompt out of the equation and see
       | the generated artwork on its own, it's up to your interpretation,
       | just like any artwork.
        
         | andreyk wrote:
         | I would caution that artwork is only 'solved' with relatively
         | simple text prompts. To create a novel painting with a precise
         | mix of elements that would take a paragraph or more to explain
         | is still tough, though DALL-E 2 does seem like a big step
         | towards that.
        
           | nahuel0x wrote:
           | Also note you can make an image out of many spatially
           | localized prompts combined, in an iterative AI-human process.
        
           | sillysaurusx wrote:
           | Sam seems to be demoing something fairly close on twitter.
           | https://twitter.com/sama/status/1511724264629678084
           | 
           | The solar powered ship with a propeller sailing under the
           | golden gate bridge during sunset with dolphins jumping around
           | was pretty impressive.
           | https://twitter.com/sama/status/1511731259319349251
           | 
           | I think it's only missing the dolphins.
        
       ___________________________________________________________________
       (page generated 2022-04-06 23:00 UTC)