[HN Gopher] DALL·E now available in beta
___________________________________________________________________
DALL·E now available in beta
Author : todsacerdoti
Score : 552 points
Date : 2022-07-20 16:30 UTC (6 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| naillo wrote:
| I wonder if they'll even make back what they spent on training the models before competitors of equal quality and lower cost eat up their margins.
| tourist_on_road wrote:
| Super impressive to see how OpenAI managed to bring the project from research to production (something usable for creatives). This is nontrivial since the use case involves filtering NSFW content and reducing bias in generated images. Kudos to the entire team.
| seshagiric wrote:
| For those who want to try DALL·E but do not have access yet, this is a good site to play with: https://www.craiyon.com/
| totetsu wrote:
| I was really enjoying using DALL·E 2 to take surrealist walks around the latent image space of human cultural production. I was using it as one might use Wikipedia, researching the links between objects and their representation. Also just to generate suggestions for what to have for lunch. None of this was for anything of commercial value to me. What am I to do now, start to find ways to sell the images I'm outputting? Do I displace the freelance artists in the market who actually have real talent and the ability to create images and compositions, and who studied how to use the tools of the trade? Does the income artists can make now get displaced by people using DALL·E? Then do people stop learning how to actually make art, and we come to the end of new cultural production and just start remixing everything made until now?
| totetsu wrote:
| With real artists left only making images of sex and violence and other TOS violations
| [deleted]
| password321 wrote:
| The world's most expensive meme generator.
| cypress66 wrote:
| > Reducing bias: We implemented a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world's population. This technique is applied at the system level when DALL·E is given a prompt about an individual that does not specify race or gender, like "CEO."
|
| Will it do it "more accurately" as they claim? As in, if 90% of CEOs are male, then the odds of a CEO being male in a picture is 90%? Or will it less "accurately reflect the diversity of the world's population" and show what they would like the real world to be like?
| president wrote:
| Most likely this was something forced by their marketing team or their office of diversity. Given the explanation of the implementation (arbitrarily adding "black" and "female" qualifiers), it's clear it was just an afterthought.
| [deleted]
| klohto wrote:
| hardmaru on Twitter has examples. It's the second: the one they would like it to be.
| kache_ wrote:
| They literally just add "black" and "female" with some weight before any prompt containing a person.
|
| A comical workaround to so-called "bias" (isn't the whole point of these models to encode some bias?). Here's some experimentation showing this.
|
| https://twitter.com/rzhang88/status/1549472829304741888
|
| As competitors with lower price points crop up, you'll see everyone ditch models with "anti-bias" measures and take their $ somewhere else. Or maybe we'll get some real solution that adds noise to the embeddings, and not some half-assed workaround to the arbitrary rules that your resident AI Ethicist comes up with.
| danielvf wrote:
| Add _after_. So you can see the added words by making a prompt like "a person holding a sign saying ", and then the sign says the extra words if they are added.
| kache_ wrote:
| Yeah actually, good call. The position of the token matters, since these things use transformers to encode the embeddings.
|
| https://www.assemblyai.com/blog/how-imagen-actually-works/
| whywhywhywhy wrote:
| How does it deal with bias that is negative?
|
| This would only work for positive biases; if they actually want to equalize things, they need to add the opposite for negative biases.
|
| To counteract the bias of their dataset they need to have someone sitting there actively thinking in bias, to counteract the bias with anti-bias seasoning for every bias-causing term. I feel bad for whatever person is tasked with that job.
|
| Could always just fix your dataset, but who's got time and money to do that /s
| naillo wrote:
| It's also funny that this likely won't 'unbias' any actual published images coming out of it. If 90% of the images in the world have a male CEO, then for whatever reason that's the image people will pick and choose from DALL-E's output. (Generalized to any unbiasing - i.e. they'll be debiased by humans.)
| bequanna wrote:
| Imagine you're in South Korea (or any other ethnically homogenous country). Do you want "black" "female" randomly appended to your input?
| educaysean wrote:
| If I was using this in South Korea, how is showing all white people any better than showing whites, blacks, latinos and asians?
| bequanna wrote:
| You would presumably input "South Korean CEO". DALL-E would then unhelpfully add "black" "female" without your knowledge.
| educaysean wrote:
| I just tried it out and it looks like DALL-E isn't as inept as you imagined. The exact query used was 'A profile photo of a male south korean CEO', and it spat out 4 very believable Korean business dudes.
|
| Supplying the race and sex information seems to prevent new keywords from being injected. I see no problem with the system generating female CEOs when the gender information is omitted, unless you think there is one?
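The injection mechanism described in the comments above can be sketched in a few lines. This is a hedged reconstruction of what commenters report observing, not OpenAI's published implementation; the word lists, the qualifier pool, and the append position are all illustrative assumptions based on the "sign" experiment discussed above:

```python
import random

# Suspected mitigation (reconstruction, not OpenAI's actual code): if a
# prompt mentions a person but specifies no race or gender, append a
# randomly chosen demographic qualifier. All word lists are illustrative.
PERSON_WORDS = {"person", "ceo", "doctor", "firefighter", "teacher", "scientist"}
DEMOGRAPHIC_WORDS = {"male", "female", "black", "white", "asian", "latino"}
QUALIFIERS = ["black", "female", "asian male", "white female"]

def mitigate(prompt: str) -> str:
    words = set(prompt.lower().split())
    mentions_person = bool(words & PERSON_WORDS)
    already_specific = bool(words & DEMOGRAPHIC_WORDS)
    if mentions_person and not already_specific:
        # Appending (not prepending) matches the "sign saying" experiment:
        # the extra words show up at the end of the rendered text.
        return f"{prompt} {random.choice(QUALIFIERS)}"
    return prompt

print(mitigate("a CEO"))                    # qualifier appended
print(mitigate("a male south korean CEO"))  # left unchanged
print(mitigate("a red car"))                # no person mentioned, unchanged
```

This also matches the reported behavior that supplying race and sex yourself prevents new keywords from being injected.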
| astrange wrote:
| I don't think they "randomly insert keywords" like people are claiming; I think they probably run it through a GPT-3 prompt and ask it to rewrite the prompt if it's too vague.
|
| I set up a similar GPT prompt with a lot more power ("rewrite this vague input into a precise image description") and I find it much more creative and useful than DALL-E 2 is.
| bequanna wrote:
| Isn't the diversity keyword injection random?
|
| My point is that it is pointless. If you want an image of a <race> <gender> person included, you can just specify it yourself.
| educaysean wrote:
| > If you want an image of a <race> <gender> person included, you can just specify it yourself.
|
| I agree wholeheartedly. So what are we arguing about?
|
| What we're seeing is that DALL-E has its own bias-balancing technique it uses to nullify the imbalances it knows exist in its training data. When you specify ambiguous queries it kicks into action, but if you want male white CEOs the system is happy to give them to you. I'm not sure where the problem is.
| totetsu wrote:
| Yes, the quality of surrealist generations went down with that change, which suddenly injects gender and race into prompts where I really didn't want anything specific. Like a snail radio DJ, and suddenly the microphone is a woman of colour's head... I understand the intention, but I want this to be default-on with the option to turn it off.
| TheFreim wrote:
| It's also odd, since you'd think this would be an issue solved by training with representative images in the first place.
|
| If you used good input you'd expect appropriate output; I don't know why manual intervention would be necessary unless it's for other purposes than stated. I suspect this is another case where "diversity" simply means "less whites".
| StrictDabbler wrote:
| If it accurately reflects the _world_ population, then only one in six pictures will be of a white person. Half the pictures will be Asian, another sixth will be Indian.
|
| Slightly more than half of the pictures will be women.
|
| That accurately represents the world's diversity. It won't accurately reflect the world's power balance, but that doesn't seem to be their goal.
|
| If you want to say "white male CEO" because you want results that support the existing paradigm, it doesn't sound like they'll stop you. I can't imagine a more boring request.
|
| Let's look at _interesting_ questions:
|
| If you ask for "victorian detective", are you going to get a bunch of Asians in deerstalker caps with pipes?
|
| What about Jedi? A lot of the Jedi are blue, and almost nobody on Earth is.
|
| Are cartoon characters exempt from the racial algorithm? If I ask for a Smurf surfing on a pizza, I don't think that making the Smurf Asian is going to be a comfortable image for any viewer.
|
| What about ageism? 16% of the population is over sixty. Will a request for "superhero lifting a building" have a 16% chance of showing someone old?
|
| If I request a "bad driver peering over a steering wheel", am I still going to get an Asian 50% of the time? Are we ok with that?
|
| I respect the team's effort to create an inclusive and inoffensive tool. I expect it's going to be hard going.
| bequanna wrote:
| > inoffensive tool.
|
| Wouldn't that result end up being like "inoffensive art" or "inoffensive comedy"?
|
| Bland, boring and corporate-PC.
| erikpukinskis wrote:
| Being offensive is only one way to be interesting.
|
| There are others, like being clever, or being absurd, or being goofy, or being poignant, or being refreshing.
|
| Of the good stuff, offensive humor is only a tiny slice.
| jazzyjackson wrote:
| Offensive _to whom_ is the sticking point when it comes to comedy.
|
| It takes a special talent to please everybody.
| driverdan wrote:
| To a certain degree, yes. They care more about the image of the project than art.
Considering a large amount of art depicts non-sexual nudity yet they block all nudity, art is not their primary concern.
| bequanna wrote:
| Some people claim to be emotionally "triggered" by images of police. Does that mean DALL-E should also start blocking images that contain police?
| visarga wrote:
| You know a surprising way to solve the issues you presented? You train another model to trick DALL-E into generating undesirable images. It will use all its generative skills to probe for prompts. Then you can use those prompts to fine-tune the original model. So you use generative models as a devil's advocate.
|
| - Red Teaming Language Models with Language Models
|
| https://arxiv.org/abs/2202.03286
| bjt2n3904 wrote:
| Will it reduce bias across all fields? Or only ones that are desirable? How about historical?
|
| "A photo of a group of soldiers from WW2 celebrating victory over nazi CEOs and plumbers".
| noelsusman wrote:
| In their examples, the "After mitigation" photos seem more representative of the real world. Before, you got nothing but white guys for firefighter or software engineer, and nothing but white ladies for teacher. That's not how the real world actually is today.
|
| I'm not sure how they would accomplish 100% accurate proportions anyway, or even why that would be desirable. If I don't specify any traits then I want to see a wide variety of people. That's a more useful product than one that just gives me one type of person over and over again because it thinks there are no female firefighters in the world.
| scifibestfi wrote:
| The latter. Here's what we, a small number of people, think the world should look like according to our own biases and information bubble in the current moment. We will impose our biases upon you, the unenlightened masses who must be manipulated for your own good. And for God's sake, don't look for photos of the US Math team or NBA basketball, or compare soccer teams across different countries and cultures.
| bequanna wrote:
| > Here's what we, a small number of people, think the world should look like according to our own biases and information bubble in the current moment.
|
| You're being quite charitable. It is much more likely that optics and virtue signaling are behind this addition.
| erikpukinskis wrote:
| If I search for "food" I don't want to see a slice of pizza every time, even if that's the #1 food. I want to see some variety.
|
| I think you're jumping too quickly to bad intentions. Injecting diversity of results is a sane thing to do, totally irrespective of politics.
| aledalgrande wrote:
| I wonder, at this price point, which kind of business can use DALL·E at scale?
| hit8run wrote:
| It's so dirty what Microsoft is doing here. They ripped the tech out of developers' hands just to sell us drips of it. Drips that are not enough to build a product for more than a few people. They require a review of your use case before launching, etc. I truly hate this company, their shitty operating system and their monopoly business game. Everything they buy turns to shit. And don't tell me about VSCode. It's just a trap to fool developers.
| NaughtyShiba wrote:
| Slightly offtopic, but how would one report a false positive in the content policy check?
| Al-Khwarizmi wrote:
| In beta, maybe, but I don't think "available" means what they think it means.
|
| I have been on the waitlist from the very beginning. Still waiting.
| skilled wrote:
| I can't check right now, but does this mean the watermark is also gone and images will have a higher resolution?
| gverri wrote:
| Watermarks are still there and resolution is still 1024x1024.
| skilled wrote:
| I wonder if they have plans to allow SVG exports in the future.
| I mean, the file size would probably be ridiculous in a lot of cases, but for my use case I wouldn't mind it. And it sucks about the watermark; maybe they will introduce an option to pay for removing it.
| rahimnathwani wrote:
| SVG exports would only be meaningful if the model is generating vector images, which are then converted to bitmaps. I highly doubt that's the case, but perhaps someone who has actually looked at the model structure can confirm?
| tiagod wrote:
| It's just pixels. You can pass them into a tracer.
| moyix wrote:
| SVG isn't really possible with the model architecture they're using. The diffusion+upscaling step basically outputs 1024x1024 pixels; at no point does the model have a vector representation.
|
| I suppose it's possible that at some point they'll try to make an image -> svg translation model?
| [deleted]
| xnx wrote:
| I fully expect stock image sites to be swamped by DALL-E generated images that match popular terms (e.g. "business person shaking hands"). Generate the image for $0.15. Sell it for $1.00.
| smusamashah wrote:
| They won't. DALL-E images are mostly not that high quality. The high-quality stuff which everyone has been sharing is the result of lots of cherry-picking.
| commandlinefan wrote:
| Even the high quality stuff still can't do human faces right.
| TomWhitwell wrote:
| This one surprised me when it came out; it felt more 'human' than lots of stock photos: https://labs.openai.com/s/AsRKFiOKJmmZrVDxIGa75sSA
| optimalsolver wrote:
| They avoided using real human faces in the training data.
| speedgoose wrote:
| In my experience it doesn't require that much cherry-picking if you use a carefully crafted prompt. For example: "A professional photography of a software developer talking to a plastic duck on his desk, bright smooth lighting, f2.2, bokeh, Leica, corporate stock picture, highly detailed"
|
| And this is the first picture I got: https://labs.openai.com/s/lSWOnxbHBYQAtli9CYlZGqcZ
|
| It got a bit strong on the depth of field and I don't like the angle, but I could iterate a few times and get a good one.
| arecurrence wrote:
| Additionally, wherever it classically falls over (such as currently for realistic human faces), there will be second-pass models that both detect and replace all the faces with realistic ones. People are already using models that alter eyes to be life-like with excellent results (many of the DALL-E 2 ones appear somewhat dead atm).
| smusamashah wrote:
| Even this image is just an illusion of a perfect photo, which is a blur for the most part; see the face of the duck. I've had access for the past 4-5 days and it fails badly whenever I try to create any unusual scene.
|
| For the first few days after it was announced, I used to look closely even at real photos in search of generative artifacts. They are not so difficult to spot now, most of the time anyway.
| cornel_io wrote:
| NB: when you share links like that, nobody who doesn't have access can see the results.
| alana314 wrote:
| Sure they can, just tried in incognito.
| messe wrote:
| If the price is low enough, you can have humans rank generated images (maybe using Mechanical Turk or a similar service), and from that ranking choose only the highest quality DALL-E generated images.
| Forge36 wrote:
| If someone can make money doing it they might.
|
| Heck: if the cost of entry is low enough they might do it at a loss and take over the site.
| redox99 wrote:
| DALL-E 2 isn't yet good enough for such photorealistic pictures with humans, however.
| arecurrence wrote: | There has been trouble with generating life-like eyes but a | second pass with a model tuned around making realistic faces | has been very successful at fixing that. | bpicolo wrote: | https://twitter.com/TobiasCornille/status/154972906039745331. | .. | | Unless I'm missing something, these seem pretty darn good | zerocrates wrote: | Woof, that bias "solution" that that thread is actually | about though...! | thorum wrote: | DALLE images are still only 1024 px wide. Which has its uses, | but I don't think the stock photo industry is in real danger | until someone figures out a better AI superresolution system | that can produce larger and more detailed images. | [deleted] | eigenvalue wrote: | I've been using this app to upscale the images to 4000x4000, | and it works amazingly well (there is also a version for | Android): | | https://apps.apple.com/us/app/waifu2x/id1286485858 | | I paid extra to get the higher quality model using the in-app | purchase option. It crushes the phone's battery life, but | runs in only ~10 seconds on an iPhone 13 Pro for a single | 1000x1000 input image. | ZeWaka wrote: | I mean, waifu2x and similar waifuxx libraries are free and | open-source, there's really no reason to pay for it if | you're working on a desktop. | [deleted] | arecurrence wrote: | You can obtain any size by using the source image with the | masking feature. Take the original and shift it then mask out | part of the scene and re-run. Sort of like a patchwork quilt, | it will build variations of the masked areas with each | generation. | | Once the API is released, this will be easier to do in a | programmatic fashion. | | Note: Depending on how many times you do this... I could see | there being a continuity problem with the extremes of the | image (eg: the far left has no knowledge of the far right). | An alternative could be to scale the image down and mask the | borders then later scale it back up to the desired | resolution. 
| | This scale-and-mask strategy also works well for images where part of the scene has been clipped that you want to include (e.g. part of a character's body outside the original image dimensions). Scale the image down, then mask the border region, and provide that to the generation step.
| ploppyploppy wrote:
| "buy fo' a dollar, sell fa' two" - Prop. Joe
| wishfish wrote:
| Makes me imagine stock image sites in the near future, where your search term ("man looks angrily at a desktop computer") gets a generated image in addition to the usual list of stock photos.
|
| Maybe it would be cheaper. I imagine it would be one day. And maybe it would have a more liberal usage license.
|
| At any rate, I look forward to this. And I look forward to the inevitable debates over which is better: AI generation or photographer.
| dymk wrote:
| They'll likely immediately go out of business, because I can just pay OpenAI 15 cents directly for the exact same product.
| dylanlacom wrote:
| Eh, I'd bet the arbitrage window is pretty brief, and that prices will fall closer to $0.15 pretty quickly.
| jowday wrote:
| Sad to say, I've been disappointed in DALLE's performance since I got access to it a couple of weeks ago - I think mainly because it was hyped up as the holy grail of text2image ever since it was first announced.
|
| For a long while, whenever Midjourney or DALLE-mini or the other models underperformed or failed to match a prompt, the common refrain seemed to be "ah, but these are just the smaller versions of the real impressive text2image models - surely they'd perform better on this prompt". Honestly, I don't think it performs dramatically better than DALLE-mini or Midjourney - in some cases I even think DALLE-mini outperforms it, for whatever reason. Maybe because of filtering applied by OpenAI?
| | What difference there is seems to be a difference in quality on | queries that work well, not a capability to tackle more complex | queries. If you try a sentence involving lots of relationships | between objects in the scene, DALLE will still generate a | mishmash of those objects - it'll just look like a slightly | higher quality mishmash than from DALLE-mini. And on queries that | it does seem to handle well, there's almost always something off | with the scene if you spend more than a moment inspecting it. I | think this is why there's such a plethora of stylized and | abstract imagery in the examples of DALLE's capabilities - humans | are much more forgiving of flaws in those images. | | I don't think artists should be afraid of being replaced by | text2image models anytime soon. That said, I have gotten access | to other large text2image models that claim to outperform DALLE | on several metrics, and my experience matched with that claim - | images were more detailed and handled relationships in the scene | better than DALLE does. So there's clearly a lot of room for | improvement left in the space. | jawns wrote: | One of the commercial use cases this post mentions is authors who | want to add illustrations to children's stories. | | I wonder if there is a way for DALL-E to generate a character, | then persist that character over subsequent runs. Otherwise, it | would be pretty difficult to generate illustrations that depict a | coherent story. | | Example ... | | Image 1 prompt: A character named Boop, a green alien with three | arms, climbs out of its spaceship. | | Image 2 prompt: Boop meets a group of three children and shakes | hands with each one. | minimaxir wrote: | You can cheat this to a limited extent using inpainting. | rahimnathwani wrote: | You mean just generate a single large image with all the | stuff you want for the whole story, and then use cropping and | inpainting to get only the piece you want for each page? 
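The "generate one large image, then crop" workaround described just above is mechanically simple. A dependency-free sketch: the nested-list "image" here is purely illustrative so the example runs anywhere; with Pillow you would call `Image.crop` on the generated sheet instead:

```python
# Workaround for character continuity: render the whole story's character
# art as one large sheet, then crop out each panel.
def crop(image, left, top, width, height):
    """Return the width x height sub-image whose top-left corner is (left, top)."""
    return [row[left:left + width] for row in image[top:top + height]]

# A toy 4x4 "sheet" whose pixels are labelled by their (x, y) position.
sheet = [[(x, y) for x in range(4)] for y in range(4)]

panel = crop(sheet, left=2, top=1, width=2, height=2)
print(panel)  # → [[(2, 1), (3, 1)], [(2, 2), (3, 2)]]
```

Inpainting would then fill in new surroundings for each cropped panel; the cropping itself guarantees the character pixels stay identical across pages.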
| TaupeRanger wrote:
| You can't do that. I can't see this working well for children's book illustrations unless the story was specifically tailored in a way that makes continuity of style and characters irrelevant.
| CobrastanJorji wrote:
| As an aside, Ursula Vernon did pretty well under the constraint you described. She set a comic in a dreamscape and used AI to generate most of the background imagery: https://twitter.com/UrsulaV/status/1467652391059214337
|
| It's not the "specify the character positions in text" approach proposed, but still a neat take on using this sort of AI for art.
| TaupeRanger wrote:
| Nice example and very well done. But yeah, a very niche application unfortunately.
| WalterSear wrote:
| I would expect continuity to be a relatively simple feature to retrain for and implement.
| bergenty wrote:
| You cannot. But a workaround would be to say something like "generate an alien in three different poses-- running, walking, waving".
|
| Then use inpainting to only preserve that pose and generate new content around it. It's definitely not perfect.
| londons_explore wrote:
| You can do better than this. Draw/generate your character.
|
| Then put that at the side of a transparent image, and use as the prompt, "Two identical aliens side by side. One is jumping"
| can16358p wrote:
| So can we now legally remove the "color blocks" watermark or not?
|
| What about generating NFTs? It was explicitly prohibited during the previous period; now there is no mention of it. Without any mention of it, and with rights granted for commercial use, I think it's allowed, but because it was an explicitly forbidden use case before, I want to be sure whether it can be used or not.
|
| Regardless, excited to see what possibilities it opens.
| gwern wrote:
| Another user is saying that OA has said it's OK to remove the watermark: https://www.reddit.com/r/dalle2/comments/w3qsxd/dalle_now_av...
|
| The commercial use language appears pretty clear to me to allow NFTs. (But note the absence of any discussion of _derivative_ works...)
| blintz wrote:
| The content policy is strikingly puritanical:
|
| > "Do not attempt to create, upload, or share images that are not G-rated"
|
| https://labs.openai.com/policies/content-policy
| anewpersonality wrote:
| Feel sorry for the full-time artists.
| danielvf wrote:
| I am thrilled about DALL-E, and the new terms of service. However, how they implemented the improved "diversity" is hilarious.
|
| It turns out that they randomly, silently modify your prompt text to append words like "black male" or "female". See https://twitter.com/jd_pressman/status/1549523790060605440
|
| I don't know which emotion I feel more - applause at how glorious this hack is, or tears at how ugly it is.
|
| Good luck to them!
| time_to_smile wrote:
| This is funny because I work on a team that is using GPT-3, and to fix a variety of issues we have with incorrect output we've just been having the engineering team prepend/append text to modify the query. As we encounter more problems, the team keeps tacking more text onto the query.
|
| This feels like a very hacky way to essentially reinvent programming, badly.
|
| My bet is that in a few years or so only a small cohort of engineering and product people will even remember DALL-E and GPT-3, and they'll cringe at how we all thought this was going to be a big thing in the space.
|
| These are both really fascinating novelties, but at the end of the day that's all they are.
| throwaway4aday wrote:
| How else would you specify the type of image you would like? Surely, if you were hiring a designer you would provide them with a detailed description of what you wanted. More likely, you would spend a lot of time with them, maybe even hours, and who knows how many words. For design work specifically, to create a first mockup or prototype of a product or image, it seems like DALL-E beats that initial phase hands down.
It's | much easier to type in a description and then choose from a | set of images than it is to go back and forth with someone | who may take hours or days to create renderings of a few | options. I don't think it'll put designers out of work but I | do think they'll be using it regularly to boost their | productivity. | selestify wrote: | What are you using GPT-3 for in a commercial setting? | mysore wrote: | it's a hard problem. at least they tried. | Jerrrry wrote: | It's not a "problem," it's an unwanted shard of reality | piercing through an ideological guise. | gnulinux wrote: | How's it NOT a problem? If I'm trying to produce "stock | people images", and if it only gives me white men, it's | clearly broken because when I ask for "people" I'm actually | asking for "people". I'm having difficulty understanding | how it can be considered to be working as intended, when it | literally doesn't. Clearly, the software has substantial | bias that gets in way of it accomplishing its task. | | If I want to produce "animal images" but it only produces | images of black cats, do you think there is any question | whether it's a problem or not? | mysterydip wrote: | That's what Jerrrry is saying. Framing the reality of | diversity in the world as a "problem" is wrong. | ceeplusplus wrote: | Black people comprise 12.4% of the US population, yet | they are represented at substantially above that in | "OpenAI"'s "bias removal" process. Clearly it has, as you | put it, substantial bias that gets in the way of | accomplishing its task. | Jerrrry wrote: | That is clearly overfitting due to unrepresentative | training data. | | The "issue" is a different one: that training data - IE, | reality, has _unwanted_ biases in it, because reality is | biased. | | Producing images of men when prompting for "trash | collecting workers" should not be much of a surprise: 99% | of garbage collection/refuse is handled by men. 
I doubt | most will consider this a "problem," because of one's own | bias, nobody cares about women being represented for a | "shitty" job. | | But ask for picture of CEOs, and then act surprised when | most images are of white men? Only outrage, when | proportionally, CEO's are, on average, white men. | | The "problem" arises when we use these tools to make | decisions and further affect society - it has the obvious | issue of further entrenching stereotypical associations. | | This is not that. Asking DALLE for a bunch of football | players, would expectedly produce a huddled group of | black men. No issue, because the NFL are | disproportionately black men. No outrage, either. | | Asking DALLE for a group of criminals, likewise, produces | a group of black men. Outage! Except statistically, this | is not a surprise, as a disproportionate amount of | criminals are black men. | | The "problem" is with reality being used as training | data. The "problem" is with our reality, not the tooling. | | Except in the cases where these toolings are being used | to affect society - the obvious example being insurance | ML algorithms. et al - we should strive to fix the issues | present in reality, not hide them with handicapped | training data, and malformed inputs. | TomWhitwell wrote: | In the UK... "The Environmental Services Association, the | trade body, said that only 14 per cent of the country's | 91,300 waste sector workers were female." So 2x dall-e | searches should produce 1.2 women. | CuriousSkeptic wrote: | > Asking DALLE for a bunch of football players, would | expectedly produce a huddled group of black men | | I think, for about 95% of the world football is | synonymous with soccer. Its kind of interesting that you | take this particular example to represent what reality | looks like statistically | less_less wrote: | > This is not that. Asking DALLE for a bunch of football | players, would expectedly produce a huddled group of | black men. 
No issue, because the NFL are | disproportionately black men. No outrage, either. | | This is not great. Only about 57% of NFL players are | black, and the percentage is more like 47% among college | players. It would be better to at least reflect the | diversity of the field, even if you don't think it should | be widened in the name of dispelling stereotypes. | | > Asking DALLE for a group of criminals, likewise, | produces a group of black men. Outage! Except | statistically, this is not a surprise, as a | disproportionate amount of criminals are black men. | | Only about 1/3 of US prisoners are black. (Not quite the | same as "criminals" but of course we don't always know | who is committing crimes, only who is charged or | convicted.) That's disproportionate to their population, | but it's not even close to a majority. If DALLE were to | exclusively or primarily return images of black men for | "criminals", then it would be reinforcing a harmful | stereotype that does not reflect reality. | stuckinhell wrote: | Everything is an ideological war zone now. That's the world | we live in now. | Fnoord wrote: | Perhaps its a problem you don't care about? | ketzo wrote: | serious question: in what way is that not a "problem?" | TheFreim wrote: | It's not a problem in a few ways, let me know what you | think (feel free to ask for clarification). | | 1. The training data would've been the best way to get | organic results, the input is where it'd be necessary to | have representative samples of populations. | | 2. If the reason the model needs to be manipulated to | include more "diversity" is that there wasn't enough | "diversity" in the training set then its likely the | results will be lower quality | | 3. People should be free to manipulate the results how | they wish, a base model without arbitrary manipulations | of "diversity" would be the best starting point to allow | users to get the appropriate results | | 4. 
A "diverse" group of people depends on a variety of | different circumstances; if their method of increasing it | is as naive as some of them are claiming, this could result | in absurdities when generating historical images or | images relating to specific locations/cultures, where | things will be LESS representative. | bobcostas55 wrote: | Well, it's a problem for the ideology. | kache_ wrote: | While their heart is in the right place, I'd like to | challenge the idea that certain groups are so fragile that | they don't understand that historically, there are more | pictures of certain groups doing certain things. | | It's a hard problem for sure. But remember, the bias ends | with the user using the tool. If I want a black scientist, I | can just say "black scientist". | | Let _me_ be mindful of the bias, until we have a generally | intelligent system that can actually do it. I'm generally | intelligent too, you know. | micromacrofoot wrote: | Historically this is true, but it also seems dangerous to | load up these algorithms with pure history because they'll | essentially codify and perpetuate historical problems. | UmYeahNo wrote: | >But remember, the bias ends with the user using the tool. | If I want a black scientist, I can just say "black | scientist". | | That is a really, _really_, narrow viewpoint. I think what | people would prefer is that if you query "Scientist" that | the images returned are as likely to be any combination of | gender and race. It's not that a group is "fragile", it's | that they have to specify race and gender at all, when that | specificity is not part of the intention. It seems that | they recognize that querying "Scientist" will predominantly | skew a certain way, and they're trying in some way to | unskew. | | Or, perhaps, you'd rather that the query be really, really | specific?
like: "an adult human of any gender and any race | and skin color dressed in a laboratory coat...", but I | would much rather just say "a scientist" and have the | system recognize that _anyone_ can be a scientist. | | And then if I need to be specific, I would be happy to | say "a black-haired scientist" | numpad0 wrote: | Kind of funny that NN tech is supposed to construct some | upper-dimensional understanding, yet realistically cannot | be expected to be able to generate a gender- and | race-indeterminate portrayal of a scientist. | kache_ wrote: | This is a problem with generative models across the | board. It's important that we as a society don't let GAN | outputs skew our perceptions, so it's definitely good that | we're thinking about it. I just wish that we had a | solution that solved across the class of problems | "Generative AI feeds into itself and society (which is, in | a way, a generative AI), creating a positive feedback | loop that eventually leads to a cultural freeze" | | It's way bigger than just this narrow race issue the | current zeitgeist is concerned about. | | But I agree, maybe I should skew to being optimistic that | at least we're _trying_ | throwaway4aday wrote: | Have you seen the queries that are used to generate | actually useful results rather than just toy | demonstrations? They look a lot more like your first | example, except with more specificity. It'd be more like | "an adult human of any gender and any race and skin color | dressed in a laboratory coat standing by a window holding | a beaker in the afternoon sun. 1950s, color image, Canon | 82mm f/3.6, desaturated and moody." so if instead you are | looking for an image with a person of a specific | ethnicity or gender then you are for sure going to add | that in along with all of the details.
If you are instead | worried about the bias of the person choosing the image | to use, then there is nothing short of restricting them to | a single choice that will fix that, and even in that case | they would probably just not use the tool, since it wasn't | satisfying their own preferences. | protonbob wrote: | Honestly I would rather that they not try. I don't understand | why a computer tool has to be held to a political standard. | daemoens wrote: | It's not a political standard though. There is actual | diversity in this world. Why wouldn't you want that in your | product? | [deleted] | mensetmanusman wrote: | Fix the data input side, not the data output side. The | data input side is slowly being fixed in real time as the | rest of the world gets online and learns these methods. | throwaway4aday wrote: | In a sane world we would be able to tack on a disclaimer | saying "This model was trained on data with a majority | representation of caucasian males from Western English | speaking countries and so results may skew in that | direction" and people would read it and think "well, duh" | and "hey let's train some more models with more data from | around the world" instead of opining about systemic | racism and sexism on the internet. | astrange wrote: | That wouldn't necessarily fix the issue or do anything. A | model isn't a perfect average of all the data you throw | into its training set. You have to actually try these | things and see if they work. | norwalkbear wrote: | I agree, the trust is broken now. I'm going to skip on any | AI that pulls that crap. | Jerrrry wrote: | There are legitimate reasons to reduce externalizations of | society's innate biases. | | A mortgage AI that calculates premiums for the public | shouldn't bias against people with historically black | names, for example.
| | This problem is harder to tackle because it is difficult to | expose and redesign the "latent space" that results in these | biases; it's difficult to massage the ML algos to identify | and remove the pathways that result in this bias. | | It's simply much easier to allow the robot to be | biased/racist/reflective of "reality" (its training data), | and add a filter / band-aid on top, which is what they've | attempted. | | When this is appropriate is the more nuanced question; I | don't think we should attempt to band-aid these models, but | for more socially-critical things, it is definitely | appropriate. | | It's naive on either extreme: do we reject reality, and | substitute our own? Or do we call our substitute reality, | and hope the zeitgeist follows? | ceeplusplus wrote: | That's great, but by doing so you are also inadvertently | favoring, in your example, the people with black names. | For example, Chinese people save, on average, 50 times | more than Americans, according to the Fed [1]. That would | mean they would generally be overrepresented in loan | approvals because they have a better balance sheet. Does | that necessarily mean that Americans are discriminated | against in the approval process? No. | | My question to you is: is an algorithm that takes no | racial inputs (name, race, address, etc) yet still | produces disproportionate results biased or racist? I say | no. | | [1] https://files.stlouisfed.org/files/htdocs/publication | s/es/08... | Jerrrry wrote: | I would agree that it is not. | | The government, and many people, have moved the | definition and goal posts, so that anything that has the | end result of a non-proportional uniformity can be | labeled and treated as bias. | | Ultimately it is a nuanced game; is discriminating | against certain clothing or hair-styles racist? Of | course. Yet, neither of those are explicitly tied to | one's skin color or ethnicity, but are an indirect | associative trait because of culture.
| | In America, we have intentionally muddled the waters of | demarcation between culture and race, and are starting to | see the cost of that. | mh- wrote: | _> A mortgage AI that calculates premiums for the public | shouldn 't bias against people with historically black | names, for example._ | | That's a great example, thanks. Also, I hope the teams | working on that come up with a different solution... | [deleted] | [deleted] | tablespoon wrote: | > Turns out that they randomly, silently modify your prompt | text to append words like "black male" or "female". | | I wonder what the distribution of those modifications is? | Hard_Space wrote: | Today, when DALL-E was still free, my Dad asked me to try a | prompt about the Buddha sitting by a river, contemplating. I | did about 4 prompt variations, and one of them was an Asian | female, if that gives any idea about the frequency (I should | note that the depiction was of a young, slim, and attractive | female Buddha, so I'm not sure they have the bias thing | licked just yet). | speedgoose wrote: | In my little testing, diversity in ethnicities was achieved | but not realistic given the context. I also got a few | androgynous people as I asked for a male or a female and | another gender was appended. | Invictus0 wrote: | A dumb solution to a dumber problem. | tshaddox wrote: | That Twitter thread is full of people saying "yeah that doesn't | seem to be true at all" so I'm hesitant to jump to conclusions | even if we're deciding to believe random tweets. | causi wrote: | Interesting. Considering this is now a paid product, is | modifying user input covered by their ToS? If I was spending a | lot of money on it I'd be rather annoyed my input was being | silently polluted. | zikduruqe wrote: | Don't spend money. Use https://www.craiyon.com | scott_s wrote: | _[shudder]_ | | I tried the first whimsical, benign thing I could think of: | "indiana jones eating spaghetti." The results are clearly | recognizable as that. 
But they are also a kaleidoscope of | body horror; an Indiana Jones monster melted into Cthulhu | forms inhaling plates that are slightly _not_ spaghetti. | bhaney wrote: | This produces dramatically worse results in my experience. | minimaxir wrote: | Not worse, but different. It depends on the prompt but | DALL-E mini/mega seems to do better than DALL-E 2 for | certain types of absurd prompts, such as the ones in | /r/weirddalle | causi wrote: | Yes, there are very sharp lines where it does and doesn't | understand. It understands color and gender but not | materials. I got very good outputs for "blue female | Master Chief" but "starship enterprise made out of candy" | was complete garbage. | elcomet wrote: | Definitely worse quality. Maybe more diverse for some | prompts, yeah. | kuprel wrote: | This one is faster, I ported it | https://replicate.com/kuprel/min-dalle | minimaxir wrote: | Additionally, it's also open-sourced on GitHub and can be | self-hosted, with easy instructions to do so: | https://github.com/kuprel/min-dalle | practice9 wrote: | Thankfully it doesn't introduce any researcher bias, | doesn't ban people from using it on the basis of country, | doesn't use your personal data like phone number... | | And the best of all - it does have a meme community around | it, and you can always donate if you feel it adds value to | your life | kingkawn wrote: | The racist pollution came long before this product was a | glimmer in our eye. | tptacek wrote: | Your input isn't being polluted by this any more than it is | when the tokens in it are ground up into vectors and | transformed mathematically. You just have an easier time | understanding this transformation. | throwuxiytayq wrote: | Obviously, it's polluted. Indisputably. In a mathematical | sense, an extra (black box) transformation is performed on | the input to the model. In a practical sense (e.g.
if you're | researching the model), this is like having dirty | laboratory tools - all measurements are slightly off. The | presumption by OpenAI is that the measurements are _off in | the correct way_. | | I'm interested in using Dall-E commercially, but I think | some competitor offering sampling with raw input will have | a better chance at my wallet. | tptacek wrote: | throwuxiytayq wrote: | Yeah man, but literally the entire point of this AI | picture generator is that it's, like, super _accurate_ at | rendering the prompt, and stuff. | | I don't understand the relevance of the black box's | scrutability - _I just want to play with the black box_. | I am interested in increasing my understanding of the | black box, not of a trust-me-it's-great-our-intern- | steve-made-it black box derivative. | tptacek wrote: | You should make your own black boxes then. By all means, | send your dollars to whatever service passes your purity | test; I'm just saying that the idea that DALL-E is | "polluting" your input is risible. It's already polluting | your data at, like, a subatomic level, at | dimensionalities it hadn't even occurred to you to | consider, and at enormous scale. | bantou_41 wrote: | Diversity = black now? That's even more racist. | xyzzyz wrote: | Diversity has meant exactly that all the way since Bakke. | [deleted] | konfusinomicon wrote: | as far as I can tell, they also concatenate "On LSD" to every | prompt as well. | DecayingOrganic wrote: | Since many people will start generating their first images soon, | be sure to check out this amazing DALL-E prompt engineering book | [0]. It will help you get the most out of DALL-E. | | [0]: https://dallery.gallery/wp-content/uploads/2022/07/The- | DALL%... (PDF) | ru552 wrote: | nice write up, thanks | uplifter wrote: | Thanks for this! A bit of prompt engineering know-how will help | me get the most bang for the buck out of this beta.
I also just | want to say that dallery.gallery is delightfully clever naming. | ZeWaka wrote: | This is absolutely amazing. Thanks! | c0decracker wrote: | Interesting. I got access a couple of weeks ago (was on the | waitlist since the initial announcement) and frankly, as much as I | really want to be excited and like it, DALL-E ended up being a bit | underwhelming. IMHO - often the results produced are of low | quality (distorted images, or quite wacky representations of the | query). Some styles of imagery are certainly a better fit for | being generated by DALL-E, but as far as commercial usage I think | it needs a few iterations and probably an even larger underlying | model. | simonw wrote: | This book has some very good, actionable advice on crafting | prompts that get better results out of DALL-E: | https://dallery.gallery/the-dalle-2-prompt-book/ | andybak wrote: | I also got access a couple of weeks ago and I can't fathom how | anyone could be underwhelmed by it. | | What were you expecting? | c0decracker wrote: | Fundamentally I have two categories of issues I see with | DALL-E, but please don't get me wrong -- I think this is a | great demonstration of what is possible with huge models and | I think OpenAI's work in general is fantastic. I will most | certainly continue using both DALL-E and OpenAI's GPT3. (1) | Between what DALL-E can do today and commercial utility there | is a rift, in my opinion. I readily admit that I have not done | hundreds of queries (thank you folks for pointing that out, | I'll practice more!) but that means that there is a learning | curve, doesn't it? I can't just go to DALL-E, mess with it for | 5-10 minutes and get my next ad or book cover or illustration | for my next project done? (2) I think DALL-E has issues with | faces and the human form in general. Images it produces are | often quite repulsive and take the uncanny valley to the next | level.
I surprised myself when I noticed myself thinking | that the images of humans DALL-E produced lack... soul? Cats | and dogs, on the other hand, it handles much better. I did | tests with other entities -- say cars or machinery -- and it | generally performs so-so with them too, often creating | disproportionate representations of them or misplacing | chunks. If you're querying for multiple objects in a scene it | quite often melds them together. This is more pronounced in | photorealistic renderings. When I query for painting-style it | works mostly better. That said, every now and then it does | produce a great image, but with this way of arriving at it, | how fast will I have to replenish those credits?.. :) | | All in all though I think I am underwhelmed mostly because my | initial expectations were off; I am still a fan of DALL-E | specifically and GPT3 in general. Now when is GPT4 coming | out? :) | harpersealtako wrote: | Dalle seems to only have a few "styles" of drawing that it is | actually "good" at. It is particularly strong at these styles | but disappointingly underwhelming at anything else, and will | actively fight you and morph your prompt into one of these | styles even when given an inpainting example of exactly what | you want. | | It's great at photorealistic images like this: | https://labs.openai.com/s/0MFuSC1AsZcwaafD3r0nuJTT, but it's | intentionally lobotomized to be bad at faces, and often has | an uncanny valley feel in general, like this: | https://labs.openai.com/s/t1iBu9G6vRqkx5KLBGnIQDrp (never | mind that it's also lobotomized to be unable to recognize | characters in general). It's basically as close to perfect as | an AI can be at generating dogs and cats though, but anything | else will be "off" in some meaningful ways.
| | It has a particular sort of blurry, amateur oil painting | digital art style it often tries to use for any colorful | drawings, like this: | https://labs.openai.com/s/EYsKUFR5GvooTSP5VjDuvii2 or this: | https://labs.openai.com/s/xBAJm1J8hjidvnhjEosesMZL . You can | see the exact problem in the second one with inpainting: it | utterly fails at the "clean" digital art style, or drawing | anything with any level of fine detail, or matching any sort | of vector art or line art (e.g. anime/manga style) without | loads of ugly, distracting visual artifacts. Even Craiyon and | DALLE-mini outperform it on this. I've tried over 100 prompts | to get stuff like that to generate and have not had a single | prompt that is able to generate anything even remotely good | in that style yet. It seems almost like it has a "resolution" | of detail for non-photographic images, and any detail below a | certain resolution just becomes a blobby, grainy brush | stroke, e.g. this one: | https://labs.openai.com/s/jtvRjiIZRsAU1ukofUvHiFhX , the | "fairies" become vague colored blobs here. It can generate | some pretty ok art in very specific styles, e.g. classical | landscape paintings: | https://labs.openai.com/s/6rY7AF7fWPb5wWiSH0rAG0Rm , but for | anything other than this generic style it disappoints _hard_. | | The other style it is ok at is garish corporate clip art, | which is unremarkable and there's already more than enough | clip art out there for the next 1000 years of our collective | needs -- it is nevertheless somewhat annoying when it | occasionally wastes a prompt generating that crap because you | weren't specific that you wanted "good" images of the thing | you were asking for. | | The more I use DALLE-2 the more I just get depressed at how | much wasted potential it has. 
It's incredibly obvious they | trimmed a huge amount of quality data and sources from their | databases for "safety" reasons, and this had huge effects on | the actual quality of the outputs in all but the most mundane | of prompts. I've got a bunch more examples of trying to get | it to generate the kind of art I want (cute anime art, is | that too much to ask for?) and watching it fail utterly every | single time. The saddest part is when you can see it's got | some incredible glimpse of inspiration or creative genius, | but just doesn't have the ability to actually follow through | with it. | napier wrote: | GPT3 has seen similar lobotomization since its initial | closed beta. Current davinci outputs tend to be quite | reserved and bland, whereas when I first had the fortunate | opportunity to experience playing with it in mid 2020, it | often felt like tapping into a friendly genius with access | to unlimited pattern recognition and boundless knowledge. | harpersealtako wrote: | I've absolutely noticed that. I used to pay for GPT-3 | access through AI Dungeon back in 2020, before it got | censored and run into the ground. In the AI fiction | community we call that "Summer Dragon" ("Dragon" was the | name of the AI Dungeon model that used 175B GPT-3), and | we consider it the gold standard of creativity and | knowledge that hasn't been matched yet even 2 years | later. It had this brilliant quality to it where it | almost seemed to be able to pick up on your unconscious | expectations of what you wanted it to write, based purely | on your word choice in the prompt. We've noticed that | since around Fall 2020 the quality of the outputs has | slowly degraded with every wave of corporate censorship | and "bias reduction". Using GPT-3 playground (or story | writing services like Sudowrite which use Davinci) it's | plainly obvious how bad it's gotten.
| | OpenAI needs to open their damn eyes and realize that a | brilliant AI with provocative, biased outputs is better | than a lobotomized AI that can only generate advertiser- | friendly content. | visarga wrote: | So it got worse for creative writing, but it got much | better at solving few-shot tasks. You can do information | extraction from various documents with it, for example. | napier wrote: | I mean yes, you're right insofar as it goes. However | nothing I am aware of implies technical reasons linking | these two variables into a necessarily inevitable trade- | off. And it's not only creative writing that's been | hobbled; GPT3 used to be an _incredibly promising_ | academic research tool and given the right approach to | prompts could uncover disparate connections between | siloed fields that conventional search can only dream of. | | I'm eager for OpenAi to wake up and walk back on the | clumsy corporate censorship, and/or for competitors to | replicate the approach and improve upon the original | magic without the "bias" obsession tacked on. Real | challenge though "bias" may pose in some scenarios, | perhaps a better way to address this would be at the | training data stage rather than clumsily gluing on an | opaque approach towards poorly implemented, idealist | censorship lacking in depth (and perhaps arguably, also | lacking sincerity). | arecurrence wrote: | I suspect you simply need to use it more with a lot more | variation in your prompts. In particular, it takes style | direction and some other modifiers to really get rolling. Run | at least a few hundred prompts with this in mind. Most will be | awful output... but many will be absolute gems. | | It has, honestly, completely blown me away beyond my wildest | imagination of where this technology would be at today. | [deleted] | dereg wrote: | I felt the same way. If anything, I realized how soulless and | uninteresting faceless art is. 
Dall-E 2 goes out of its way to | make terrible faces for, I'm guessing, deepfake reasons? | [deleted] | choppaface wrote: | A free alternative: | | https://huggingface.co/spaces/dalle-mini/dalle-mini | | Reminder that the OpenAI team claimed safety issues about | releasing the weights. Now they're charging, when the above | link's GPU time is being paid for by investor dollars. I guess | sama must be hurting if he can only afford OpenAI credit packs for | celebrities and his friends. | softwaredoug wrote: | Surprised by the lack of comments on the ethics of DALL-E being | trained on artists' content, whereas Copilot threads are chock full | of devs up in arms over models trained on open source code. Isn't | it the same thing? | MWil wrote: | I've been on the waitlist since April 16th. Would have loved to | have played around with the alpha, but now clearly my ability to | experiment and learn to use the system to cut down on expenses is | extremely limited. | O__________O wrote: | Two questions: | | (1) Any opinions on if removing the watermark is possible? Is | doing so against the terms of service? | | (2) Appears the output is still at 1024x1024 - what are options | to upscale the resolution, for example would OpenCV super | resolution work? | jeanlucas wrote: | It is possible, they confirmed on Discord you can remove the | watermark. | | Yep... The output is an issue; I'd like to pay if that were an | upgrade. | O__________O wrote: | Annoying that, if removing the watermark is allowed, it is | even inserted. Imagine if Adobe did that. | | Here's more information on super resolution options beyond | what Adobe already offers: | | (1) List of current options for super resolution: | | https://upscale.wiki/wiki/Different_Neural_Networks | | (2) Older example of one way to benchmark: | | https://docs.opencv.org/4.x/dc/d69/tutorial_dnn_superres_ben. | .. | moron4hire wrote: | How do you interface with DALL-E?
| | For MidJourney I was painfully surprised to find that everything | is done through chat messages on a Discord server. | | I'm not a paid member, so I have to enter my prompts in public | channels. It's extremely easy to lose your own prompts in the | rapidly flowing stream of prompts going on. I can kind of see why | they did it that way--maybe, if I squint really hard--to try to | promote visibility and community interaction, but it's just not | happening. It's hard enough to find my own images, to say nothing | of following what someone else is doing. This is literally the | worst user experience I have ever had with a piece of software. | | There are dozens of channels. It's so spammy, doing it through | Discord. It's constantly pinging new notifications and I have to | go through and manually mute each and every one of the channels. | Then they open a few dozen more. Rinse. Repeat. | | I understand paid users can have their own channels to generate | images, but I really don't see the point in paying for it when, | even subtracting the firehose of prompts and images, it's still | an objectively shitty interface to have to do everything through | Discord chat messages. | neya wrote: | I'm curious to know - does the community have any open source | alternatives to DALL.E? For an initiative named OpenAI, keeping | their source code and models closed behind a license is bullshit | in my opinion. | gwern wrote: | EAI/Emad/et al's 'Stable Diffusion' model will be coming out in | the next month or so. I don't know if it will hit DALL-E 2 | level but a lot of people will be using it based on the | during-training samples they've been releasing on Twitter. | minimaxir wrote: | The best open-source-but-actually-can-be-run-on-simple-infra | analog of DALL-E 2 is min-dalle: | https://github.com/kuprel/min-dalle | arecurrence wrote: | LAION is working on open source alternatives.
There's a lot of | activity in their Discord and they have amassed the necessary | training data, but I am uncertain as to whether they have | obtained the funding needed to deliver fully trained models. | Phil Wang created initial implementations of several papers, | including Imagen and Parti, in his GitHub account. E.g.: | https://github.com/lucidrains/DALLE2-pytorch | draw_down wrote: | selimnairb wrote: | I like how everyone's face is rendered by DALL-E to look either | like a still from a David Lynch film, or to have teeth and hair | coming out of weird places. | pawelduda wrote: | That's disappointing, given that up until this point you could | have 50 free uses per 24h. I expected it to get monetized | eventually, but not so fast and drastically. Well, still had my | fun and have to say the creations are so good it's often | mind-blowing there's an AI behind it. | mysore wrote: | they're a non-profit so the price is probably still dirt cheap | ajafari1 wrote: | Not correct. They have a for-profit entity now. That's why | there is a huge incentive to monetize. Any for-profit | investment gain is capped at 100x, with the rest required to | go to their nonprofit. This commercialization is just as I | predicted in my substack post 2 days ago that hit the front | page of Hacker News: https://aifuture.substack.com/p/the-ai- | battle-rages-on | dougmwne wrote: | Honestly, it is probably just that expensive to run. You can't | expect someone to hand you free compute of significant value, | and directly charging for it is a lot better than other things | they could do. | bulbosaur123 wrote: | hhmc wrote: | So you actually _wanted_ images that perpetuate the biases of | the world? | Geonode wrote: | Reducing bias means affecting the data, instead of letting | the end user just choose an appropriate image generated by a | clean data set. | illwrks wrote: | I thought the same thing but I think the commenter is making | a joke, but I could be wrong.
| | I think they are suggesting that things like this (neural | nets etc) work using bias, and by removing "bias" the | developers are making the product worse. | | It's a very sh!t comment if it's not a joke. | aloisdg wrote: | Just to be sure. Does "OC" here mean Original Comment? | illwrks wrote: | Typo, now fixed. | minimaxir wrote: | Unfortunately, the method OpenAI may be using to reduce bias | (by adding words to the prompt unknown to the user) is a | naive approach that can affect images unexpectedly and | outside of the domain OpenAI intended: | https://twitter.com/rzhang88/status/1549472829304741888 | | I have also seen some cases where the bias correction may | not be working at all, so who knows. And it's why | transparency is important. | CobrastanJorji wrote: | What a fascinating hack. I mean, yeah, naive and simplistic | and doesn't really do anything interesting with the model | itself, but props to the person who was given the "make | this more diverse" instruction and said "okay, what's the | simplest thing that could possibly work? What if I just | append some races and genders onto the end of the query | string, would that mostly work?" and then it did! Was it a | GOOD idea? Maybe not. But I appreciate the optimization. | kmeisthax wrote: | This sounds like something that could backfire _very badly_ | on certain prompts. "person eating a watermelon" for | example. | bulbosaur123 wrote: | Yes, I did. I want it to show the world as it is, not as | people want it to be. | scifibestfi wrote: | How do you remove bias as long as humans are in the loop? | Aren't they just swapping one bias for their own? | brycethornton wrote: | I'm blown away by this: | | "Starting today, users get full usage rights to commercialize the | images they create with DALL*E, including the right to reprint, | sell, and merchandise. This includes images they generated during | the research preview."
| | I assumed this was going to be the sticking point for wider usage | for a long time. They're now saying that you have full rights to | sell Dall-E 2 creations? | vlunkr wrote: | Is the lesson here that these images are worth nothing so they | lose nothing by giving them away? | [deleted] | nutanc wrote: | And I just used it to create cover art for a book published on | Amazon :) | | https://twitter.com/nutanc/status/1549798460290764801?s=20&t... | pqdbr wrote: | What was your prompt? | nutanc wrote: | "girl with a cap standing next to a shadow man having a | speech bubble, digital art" | pferdone wrote: | Does DALL-E create different outputs for the same input? How | does ownership work there? | flatiron wrote: | yes it will. it'll keep on augmenting the image until it | recognizes it as the input | minimaxir wrote: | Not only that, but you can also upload an image (that doesn't | depict a real person) and generate variations of it without | providing a prompt. | berberous wrote: | I think they are reacting to competition. MidJourney is | amazing, was easier to get into, gives you commercial rights, | and frankly I found it more fun to use, with even better output | in most instances. | napier wrote: | The only thing I don't like about MidJourney is the Discord | based interface. I think I can grok why Dave chose this route | as it bakes in an active community element and allows users | to pick up prompt engineering techniques osmotically... but | I'd prefer a clean DALL-E style app and cli / api access. | berberous wrote: | In case you don't know, you can at least PM the MidJourney | bot so you have an uncluttered workspace. | | It's clearly personal preference, but I loathe Discord | but love it for MidJourney. As you said, there's an | interactive element where I see other people doing cool | things and adapting part of their prompts and vice versa. | It really is fun. And when you do it in a PM, you have all | your efforts saved.
DALL-E is pretty clunky in that you | have to manually save an image or lose it once your history | rolls off. | napier wrote: | Thanks. Yeah fair point; I haven't ponied up for a | subscription yet so am still stuck in public channels and | often find my generations get lost in the stream. Imagine | you're right and having the PM option would change the | experience drastically for the better albeit still within | Discord's visually chaotic environment. | davedx wrote: | MidJourney seems a little less all-out commercial. The way | everyone's creations are in giant open Discord channels is | great too | stoicjumbotron wrote: | Really hope I get an invite for MidJourney soon. Been on the | waitlist since March :( | ozmodiar wrote: | Midjourney is in open beta now. Just go to their site and | you can get started right away. I got in and I wasn't even | on their waiting list. | pitzips wrote: | Midjourney recently changed their terms of service and now | the creators own the image and give a license back to | Midjourney. Pretty cool. | jaggs wrote: | nightcafe.studio is also free and good. Very good. | MatthiasPortzel wrote: | MidJourney definitely struggles more with complex prompts | from what I saw. If you like the output more, that's | subjective, but I think DALL*E is the leader in the space by | a wide margin. | berberous wrote: | I think both have strengths and weaknesses, but I don't | disagree DALL-E in most instances is technically better at | matching prompts. But I often enjoyed, artistically, the | results of MidJourney more; it just felt fun to use and | explore. | skybrian wrote: | Don't they both give you commercial rights now? | | I have access to both and they're good for different things. | DALL-E seems somewhat more likely to know what you mean. | Midjourney seems better for making interesting fantasy and | science fiction environments. | | For comparison, I tried generating images of accordions. 
| Midjourney doesn't really understand that an accordion has a | bellows [1]. DALL-E manages to get the right shape much of | the time, if you don't look too closely: [2], [3]. Neither of | them knows the difference between piano and button | accordions. | | Neither of them can draw a piano keyboard accurately, but | DALL-E is closer if you don't look too hard. (The black notes | aren't in alternating groups of two and three.) | | Neither of them understands text; text on a sign will be | garbled. Google's Parti project can do this [4], but it's not | available to the public. | | I expect DALL-E will have many people sign up for occasional | usage, because if you don't use it for a few months, the free | credits will build up. But Midjourney's pricing seems better | if you use it every day? | | [1] https://www.reddit.com/r/Accordion/comments/uuwrbj/midjou | rne... | | [2] https://www.reddit.com/r/Accordion/comments/vz9zxw/dalle_ | sor... | | [3] https://www.reddit.com/r/Accordion/comments/w0677q/accord | ion... | | [4] https://parti.research.google/ | minimaxir wrote: | Previously, OpenAI asserted they owned the generated images, so | the new licensing is a shift in that aspect. GPT-3 also has a | "you own the content" clause as well. | | Of course, that clause won't deter a third party from filing a | lawsuit against you if you commercialize a generated image | _too_ close to something realistic, as the copyrights of AI | generated content still hasn 't been legally tested. | LegitShady wrote: | As far as I can tell they still own the images they just | license your use of them commercially. | pornel wrote: | AFAIK only people can own copyright (the monkey selfie case | tested this), and machine-generated outputs don't count as | creative work (you can't write an algorithm that generates | every permutation of notes and claim you own every song[1]), | so DALL-E-generated images are most likely copyright-free. 
I
 | presume OpenAI only relies on terms of service to dictate
 | what users are allowed to do, but they can't own the images,
 | and neither can their users.
 |
 | [1]: https://felixreda.eu/2021/07/github-copilot-is-not-
 | infringin...
 | TaylorAlexander wrote:
 | The monkey selfie was not derived from millions of existing
 | works, and that is the difference. If an artist has a well-
 | known art style, and this algorithm was trained on it and
 | can copy that style, would the artist have grounds to sue?
 | I don't know.
 | l33t2328 wrote:
 | If I write a song am I not deriving it from the existing
 | works I've been exposed to?
 | TaylorAlexander wrote:
 | Sure but if you just release a basic copy of a Taylor
 | Swift song you will get sued to oblivion. So the law
 | seems (IANAL) to care about how similar your work is to
 | existing works. DALL-E does not seem capable of showing
 | you the work that influenced a result, so users will have
 | no idea if a result might be infringing. What this means
 | to me is that with many users, some of the results would
 | be legally infringing.
 | Melting_Harps wrote:
 | > If an artist has a well-known art style, and this
 | algorithm was trained on it and can copy that style,
 | would the artist have grounds to sue? I don't know.
 |
 | While nothing has been commercialized yet on the DALLE2
 | subreddit, I know that it can do Dave Choe's work
 | remarkably well. I also saw Alex Gray's work to be close,
 | but not really identical either. It wasn't as intricate
 | as his work is.
 |
 | It will be interesting if this takes off and you have a
 | sort of Banksy effect take over where unless it's a
 | physical piece of art it doesn't have much value and is
 | only made all the better because of some sort of polemic
 | attached to it, eg Girl with balloon.
 | lancesells wrote:
 | I'm going to guess there's not going to be much value
 | placed on anything out of DALLE for a long while.
Digital | art is typically worth much less than physical art and I | would say these GAN images are going to worth less than | digital art generated by human hand. | | There will be outliers of course but I would be shocked | if there's much of a market in it for at least the | present. | napier wrote: | When these tools can generate layered tiff/psd images, | polygon meshes and automate UV packing; then we'll be | talking. | ZetaZero wrote: | > If an artist has a well-known art style, and this | algorithm was trained on it and can copy that style... | | A lawyer could argue that the algorithm is producing a | derivative work of the copyrighted input. | TaylorAlexander wrote: | Right but if that work isn't significantly changed from | the source, it could be ruled as infringement. DALL-E | cannot tell the users (afaik) if a result is close to any | source material. | lbotos wrote: | Well, music is not "pictures" but Marvin Gaye's family | got 5 million because Blurred Lines sounds similar enough | to a Marvin Gaye song (even though it was not a sample): | https://en.wikipedia.org/wiki/Pharrell_Williams_v._Bridge | por... | [deleted] | ChadNauseam wrote: | Even if you imitate someone's style intentionally, they | don't have grounds to sue. Style isn't copyrightable in | the US. Whether DALL-E outputs are a derivative work is a | different question, though | fanzhang wrote: | If this were a concern, a user can easily bypass this by | having a work-for-hire person add a minor transform layer | on top of the DALL-E generated images right? | JacobThreeThree wrote: | Wouldn't it have to meet the threshold of being a | "transformative" work? | | https://en.wikipedia.org/wiki/Transformative_use | leereeves wrote: | > DALL-E-generated images are most likely copyright-free | | The US Copyright Office did make a ruling that might | suggest that recently[1], but crucially, in that case, the | AI "didn't include an element of human authorship." 
The
 | board might rule differently about DALL-E because the
 | prompts do provide an opportunity for human creativity.
 |
 | And there's another important caveat that the felixreda.eu
 | link seems to miss. DALL-E output, whether or not it's
 | protected by copyright, can certainly _infringe_ other
 | copyrights, just like the output of any other mechanical
 | process. In short, Disney can still sue if you distribute
 | DALL-E generated images of Marvel characters.
 |
 | 1: https://www.theverge.com/2022/2/21/22944335/us-
 | copyright-off...
 | totetsu wrote:
 | Can I infringe another Dalle user's rights if I take an
 | image generated by their account and sell prints of it...?
 | unnah wrote:
 | DALL-E can generate recognizable pictures of Homer Simpson,
 | Batman and other commercial properties. Such images could
 | easily be considered derivative works of the original
 | copyrighted images that were used as training input. I'm
 | sure there are plenty of corporate IP lawyers ready to
 | argue the point in court.
 | numpad0 wrote:
 | I'm kind of surprised that no one has found "verbatim
 | copy" cases as were made with GitHub Copilot. Such exact
 | copies in photography are likely easier to go for than
 | with code snippets.
 | Nition wrote:
 | It might be interesting to find an image in the training
 | set with a long, very unique description, and try that
 | exact same description as input in DALL*E 2.
 |
 | Of course it's unlikely to produce the exact same image,
 | or if it does, you've also discovered an incredible image
 | compression algorithm.
 | obert wrote:
 | they still own the generated content, only grant usage. I
 | have mixed feelings about this confused approach, it won't
 | last long.
 |
 | > ...you own your Prompts and Uploads, and you agree that
 | OpenAI owns all Generations...
 | mensetmanusman wrote:
 | Image generating artificial intelligence is very analogous to
 | a camera.
 | | Both technologies have billions of dollars of R&D and tens of
 | thousands of engineers behind supply chains necessary to
 | create the button that a user has to press.
 | minimaxir wrote:
 | There have been decades of litigation around when/where/of
 | whom you can take a photo. AI generated art isn't there.
 | mensetmanusman wrote:
 | They will benefit by getting additional feedback on which
 | output images are most useful.
 | minimaxir wrote:
 | DALL-E 2 has a "Save" feature which is likely a data
 | gathering mechanism for this use case.
 | Melting_Harps wrote:
 | > "Starting today, users get full usage rights to commercialize
 | the images they create with DALL*E, including the right to
 | reprint, sell, and merchandise. This includes images they
 | generated during the research preview."
 |
 | >> And I just used it to create cover art for a book published
 | in Amazon :)
 |
 | Man... what a missed opportunity for Altman... he could have
 | had a really good cryptocurrency/token with a healthy ecosystem
 | and a creative community had he not pushed this Worldcoin
 | biometric-harvesting BS and instead waited for this to release
 | and coupled it with access to GPT.
 |
 | This is the kind of thing that Web3 (a joke) was pushing for
 | all along: revolutionary tech that the everyday person can
 | understand with its own token-based ecosystem for access with
 | full creative rights from the prompts.
 |
 | I wonder: if he stepped down from OpenAI and put in a
 | figurehead as CEO, could this still work?
 |
 | > Why is using a token better than using money, in this case?
 | | It would be better for OpenAI if it can monetize not just its
 | subscription-based model via a token to pay for overhead and
 | further R&D, but also its ability to issue tokens it can freely
 | exchange for utility on its platform for exclusive access
 | outside of its capped $15 model, and allow pay-as-you-go models
 | for those who don't have access to it like myself, as it's
 | limited to 1 million users.
 |
 | I don't want an account, and I think that type of gatekeeping
 | wasn't cool during the Gmail days either (and I had early access
 | back then too), but I'd still personally buy hundreds of dollars'
 | worth of prompts right now since I think it is a fascinating use
 | of NLP. I'm just one of many missed opportunities and represent a
 | lost userbase who just want access for specific projects. By
 | doing this they can still retain the caps on usage on their
 | platform and expand and contract them as they see fit without
 | excluding others.
 |
 | This in turn could justify the continual investment from the VC
 | world into these projects (under the guise of web3) and allow
 | them to scale into viable businesses and further expand the use
 | of AI/ML into other creative spaces, which, as a person studying
 | AI and ML with a background in BTC, is what we all wanted to see
 | instead of these aimless bubbles in things like Solana or yield
 | farming via fake DeFi projects like Celsius that we've seen.
 |
 | It would legitimize the use of a token for access to an ecosystem
 | model outside of BTC, which to be honest doesn't really exist
 | and still has a tarnished reputation with all these failed
 | projects, while gaining reception amongst a greater audience
 | since it's captivated so many since its release.
 | pliny wrote:
 | Why is using a token better than using money, in this case?
 | mod wrote:
 | I assume something to do with proving ownership via NFT.
 | rvz wrote:
 | It also means there will possibly be another renaissance of
 | fully automated, mass-generated NFTs and tons of derivatives
 | and remixes flooding the NFT market in an attempt to pump the
 | NFT hype again.
 |
 | It doesn't matter, OpenAI wins anyway as these companies will
 | pour hundreds of thousands into generated images.
 |
 | It seems that the NFT grift is about to be rebooted again, such
 | that it isn't going to die _that_ quickly. But still,
 | eventually 90% of these JPEG NFTs will die anyway.
 | WalterSear wrote:
 | NFTs were never limited by artwork availability - they are
 | limited by wash-trading ability.
 | rvz wrote:
 | These highly photorealistic images can be generated at mass
 | scale, completely automated, without a human, which
 | ultimately cuts the need for an artist to do that.
 |
 | They will be replaced by DALL*E 2 for creating these
 | illustrations, book covers, NFT variants, etc., opening up
 | the whole arena to anyone to do this themselves. All it
 | takes is to _describe what they want in text_ and in less
 | than a minute, the work is delivered for as little as $15.
 |
 | OpenAI still wins either way. If a crypto company turns to
 | using DALL*E 2 to generate photorealistic NFTs, they won't
 | stop them and they will take the money.
 | WalterSear wrote:
 | I'm not sure I understand the point you are trying to
 | make.
 |
 | Art is already dirt cheap. People aren't buying NFTs for
 | their content. This doesn't make it appreciably easier to
 | con rubes.
 | bilsbie wrote:
 | Every tech should do this. Could Google Maps silently change
 | your destination to a minority-owned alternative?
 | [deleted]
 | peteforde wrote:
 | I have been having a blast with DALL-E, spending about an hour a
 | day trying out wild combinations and cracking my friends up. I
 | cannot imagine getting bored of it; it's like getting bored with
 | visual stimulus, or art in general.
 |
 | In fact, I've been glad to have a 50/day limit, because it helps
 | me contain my hyperfocus instincts.
 | | The information about new pricing is, to me as someone just
 | enjoying making crazy images, a huge drag. It means that to do
 | the same 50/day I'd be spending $300/month.
 |
 | OpenAI: introduce a $20/month non-commercial plan for 50/day, and
 | I'll be at the front of the line.
 | jnovek wrote:
 | My heart sank when I saw the pricing model.
 |
 | I've been creating generative art since 2016 and I've been
 | anxiously waiting for my invite. I won't be able to afford to
 | generate the volume of images it takes to get good ones at this
 | price point.
 |
 | I can afford $20/mo for something like this but I just can't
 | swing the $200 to $300 it realistically takes to get interesting
 | art out of these CLIP-centric models.
 |
 | Heck, the initial 50 images aren't even enough to get the hang
 | of how the model behaves.
 | blueboo wrote:
 | If you're technically inclined, I urge you to explore some
 | newer Colabs being shared in this space. They offer vastly
 | more configurable tools, work great for free on Google Colab,
 | and are straightforward to run on a local machine.
 |
 | Meanwhile we should prepare ourselves for a future where the
 | best generative models cost a lot more as these companies
 | slice and dice the (huge) burgeoning market here.
 | pkaye wrote:
 | I'm sure the prices will go down each year as the computing
 | costs go down.
 | wongarsu wrote:
 | MidJourney is a good alternative. Maybe not quite as good as
 | DALL-E, but close enough, without a waitlist and with hobby-
 | friendly prices ($10/month for 200 images/month, or $30 for
 | unlimited)
 | commandlinefan wrote:
 | > trying out wild combinations and cracking my friends up
 |
 | Wait until the next edition comes out where it automatically
 | learns the sorts of things that crack you up and starts
 | generating them without any input from you.
 | Filligree wrote:
 | MidJourney gives ~unlimited generation for $30/month, and is
 | nearly as good. Unlike DALL-E it doesn't deliberately nerf face
 | generation. I've been having a blast.
 | irrational wrote:
 | Sounds kind of like Scribblenauts. I would try the craziest
 | things to see what it could come up with.
 | dave_sullivan wrote:
 | I think people don't realize how huge these models really are.
 |
 | When they're free, it's pretty cool. But charge an amount where
 | there's actual profit in the product? Suddenly it seems very
 | expensive and not economically viable for a lot of use cases.
 |
 | We are still in the "you need a supercomputer" phase of these
 | models for now. Something like DALLE mini is much more
 | accessible but the results aren't good enough. Early, early
 | days.
 | TigeriusKirk wrote:
 | What _are_ the resources at work here?
 |
 | What are the resources needed to train this model?
 |
 | If someone just gave you the model for free, what resources
 | would you need to use it to generate new results?
 | dplavery92 wrote:
 | In the unCLIP/DALL-E 2 paper[0], they train the
 | encoder/decoder with 650M/250M images respectively. The
 | decoder alone has 3.5B parameters, and the combined priors
 | with the encoder/decoder are in the neighborhood of ~6B
 | parameters. This is large, but small compared to the
 | name-brand "large language models" (GPT-3 et al.)
 |
 | This means the parameters of the trained model fit in
 | something like 7GB (decoder only, half-precision floats) to
 | 24GB (full model, full-precision). To actually run the
 | model, you will need to store those parameters, as well as
 | the activations for each parameter on each image you are
 | running, in (video) memory. To run the full model on device
 | at inference time (rather than r/w to host between each
 | stage of the model) you would probably want an enterprise
 | cloud/data-center GPU like an NVIDIA A100, especially if
 | running batches of more than one image.
 |
 | The training set size is ~97TB of imagery.
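The weight-memory arithmetic above can be sanity-checked in a few lines. This is only a back-of-the-envelope sketch: the parameter counts are the approximate figures quoted in the comment, and framework overhead and activation memory are ignored.

```python
# Back-of-the-envelope weight memory for the parameter counts quoted above.
# These are rough figures from the unCLIP paper, not exact model sizes.

def weight_gb(n_params: float, bytes_per_param: int) -> float:
    """GB (decimal) needed just to hold the weights."""
    return n_params * bytes_per_param / 1e9

decoder_params = 3.5e9  # decoder alone
full_params = 6.0e9     # priors + encoder/decoder, approximate

print(weight_gb(decoder_params, 2))  # fp16 decoder: 7.0 GB
print(weight_gb(full_params, 4))     # fp32 full model: 24.0 GB
```

Either figure is within a single 80GB A100's memory for the weights alone; it is the per-image activations and batching that push the practical requirements up.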
I don't think
 | they've shared exactly how long the model trained for, but
 | the original CLIP dataset announcement used some benchmark
 | GPU training tasks that were 16 GPU-days each. If I were to
 | WAG the training time for their commercial DALL-E 2 model,
 | it'd probably be a couple of weeks of training distributed
 | across a couple hundred GPUs. For better insight into what
 | it takes to train (the different stages/components of) a
 | comparable model, you can look through an open-source
 | effort to replicate DALL-E 2.[2]
 |
 | [0] https://cdn.openai.com/papers/dall-e-2.pdf [1]
 | https://openai.com/blog/clip/ [2]
 | https://github.com/lucidrains/dalle2-pytorch
 | peteforde wrote:
 | Thanks for the really excellent insight and links.
 |
 | I do hope that the conversation starts to acknowledge the
 | difference between sunk costs and running costs.
 |
 | Employees, office leases, and equipment are all ongoing
 | costs regardless.
 |
 | Training DALL-E 2: very expensive, but done now. A sunk
 | cost where every dollar coming in makes the whole
 | endeavor more profitable.
 |
 | Operating the trained model: still expensive, but you can
 | chart out exactly how expensive by factoring in hardware
 | and electricity.
 |
 | I believe that by not explicitly separating these
 | different columns when discussing expense vs profit,
 | we're making it harder than it needs to be to reason
 | about what it actually costs every time someone clicks
 | Generate.
 | woojoo666 wrote:
 | > This means the parameters of the trained model fit in
 | something like 7GB (decoder only, half-precision floats)
 | to 24GB (full model, full-precision)
 |
 | > you would probably want an enterprise cloud/data-center
 | GPU like an NVIDIA A100, especially if running batches of
 | more than one image.
 |
 | That doesn't seem so bad.
 |
 | _looks up price of NVIDIA A100 - $20,000_
 |
 | oh...ok I'll probably just pay for the service then
 | fennecfoxen wrote:
 | p4d.24xlarge is only $33/hr!
And you get 400 GbE so it
 | should be quick to load.
 | binarymax wrote:
 | If I had to guess, based on other large models, it's in the
 | range of hundreds of GBs. It might even be in the TB range.
 | To host that model for fast production SaaS inference
 | requires many GPUs. An A100 has 80GB, so a dozen A100s just
 | to keep it in memory, and more if that doesn't meet the
 | request demand.
 |
 | Training requires even more GPUs, and I wouldn't be
 | surprised if they used more than 100 and trained over 3
 | months.
 | judge2020 wrote:
 | > Training requires even more GPUs, and I wouldn't be
 | surprised if they used more than 100 and trained over 3
 | months.
 |
 | Based on this blog post where they scale to 7,500
 | 'nodes', they say:
 |
 | > A large machine learning job spans many nodes and runs
 | most efficiently when it has access to all of the
 | hardware resources on each node.
 |
 | So I wouldn't be surprised if they do have a total of
 | 7500+ GPUs to balance workloads between. To add, OpenAI
 | has a long history of getting unlimited access to
 | Google's clusters of GPUs (nowadays they pay for it,
 | though). When they were training 'OpenAI Five' to play
 | Dota 2 at the highest level, they were using 256 P100
 | GPUs on GCP[0] and they casually threw 256 GPUs at 'clip'
 | for a short while in January of 2021[1].
 |
 | As for how they do it, see these posts:
 |
 | https://openai.com/blog/techniques-for-training-large-
 | neural...
 |
 | https://openai.com/blog/triton/
 |
 | 0: https://openai.com/blog/openai-five/
 |
 | 1: https://openai.com/blog/clip/
 | dave_sullivan wrote:
 | Facebook released over 100 pages of notes a few months ago
 | detailing their training process for a model that is
 | similar in size. Does anyone have a link? I can't seem to
 | find it in my notes, googling links to posts that have been
 | removed or are behind the Facebook walled garden.
 | | But I seem to remember they were running 1,000+ 32GB GPUs
 | for 3 months to train it, and keeping that infrastructure
 | running day-to-day and tweaking parameters as training
 | continued was the bulk of the 100 pages. It is beyond the
 | reach of anybody but a really big company, at least in the
 | area of very large models, and the large models are where
 | all the recent results are. I wish I was more bullish on
 | algorithm improvements meaning you can get better results
 | on less hardware; there will definitely be some algorithm
 | improvements, but I think we might really need more
 | powerful hardware too. Or pooled resources. Something.
 | These models are huge.
 | ninjaranter wrote:
 | > Facebook released over 100 pages of notes a few months
 | ago detailing their training process for a model that is
 | similar in size. Does anyone have a link?
 |
 | Is https://github.com/facebookresearch/metaseq/blob/main/
 | projec... what you're referring to?
 | dave_sullivan wrote:
 | Yes! Thank you! Very good read for anyone interested in
 | the field.
 | Ajedi32 wrote:
 | Training is obviously very expensive, and ideally they'd
 | want to recoup that investment. But I'm curious as to what
 | the marginal cost is to run the model after it's trained.
 | Is it close to 30 images per dollar, like what they're
 | charging now? Or do training costs make up the majority of
 | that price?
 | sinenomine wrote:
 | > I think people don't realize how huge these models really
 | are.
 |
 | They really aren't that large by the contemporary _scaling
 | race_ standards. DALLE-2 has 3.5B parameters, which should
 | fit on an old GPU like an Nvidia RTX 2080, especially if you
 | optimize your model for inference [1][2], which is commonly
 | done by ML engineers to minimize costs. With an optimized
 | model, your memory footprint is ~1 byte per parameter, plus
 | some ratio below 1 (commonly ~0.2) of the parameter count to
 | store intermediate activations.
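Those rules of thumb (~1 byte per parameter for quantized weights, plus roughly 0.2x the parameter count for activations) work out as follows. A rough sketch only: the ratios are the rules of thumb quoted above, not measured numbers for any particular model.

```python
# Rough inference footprint for an int8-quantized model, using the
# ~1 byte/parameter and ~0.2x-activations rules of thumb quoted above.

def footprint_gb(n_params: float, activation_ratio: float = 0.2) -> float:
    weights = n_params * 1.0           # ~1 byte per int8 parameter
    activations = n_params * activation_ratio
    return (weights + activations) / 1e9

# 3.5B parameters -> ~4.2 GB, comfortably inside an RTX 2080's 8 GB of VRAM
print(footprint_gb(3.5e9))
```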
| | You should be able to run it on Apple M1/M2 with 16GB RAM via | CoreML pretty fine, if an order of magnitude slower than on | an A100. | | Training isn't unreasonably costly as well: you can train a | model given O(100k)$ which is less than a yearly salary of a | mid-tier developer in silicon valley. | | There is no reason these models shouldn't be trained | cooperatively and run locally on our own machines. If someone | is interested in cooperating with me on such a project, my | email is in the profile. | | 1. https://arxiv.org/abs/2206.01861 | | 2. https://pytorch.org/blog/introduction-to-quantization-on- | pyt... | andreyk wrote: | Check out Artbreeder, it is likewise a ton of fun! | | Multimodal.art (https://multimodal.art/) is working on a free | version of something like DALLE, though it's not that good as | of yet. | nsxwolf wrote: | I'm already bored of it. When you have everything, you have | nothing. | [deleted] | peteforde wrote: | I don't know how to say this without sounding like a jerk, | even if I bend over backwards to preface that this isn't my | intent: this statement says more about your creativity and | curiosity than a ceiling on how entertaining DALL-E can be to | someone who could keep multiple instances busy, like grandma | playing nine bingo cards at once. | | Knowing that it will only get better - animation cannot be | far behind - makes me feel genuinely excited to be alive. | nwienert wrote: | Dall-e has novelty, but no intent, meaning, originality. | Yes the author can be creative at generating prompts, but | visually I haven't seen it generate anything that feels | artistically interesting. If you want pre-existing concepts | in novel combinations then yes it works. | | It's good at "in the style of" but there's no "in a new | style". | | It has a house style too that tends to feel Reddit-like. | gsk22 wrote: | Isn't every "new style" just a novel combination of pre- | existing concepts? Nothing new under the sun and all | that. 
| | Either way, I feel like your view is an exhaustingly | pessimistic take on AI-generated art. I mean, sure, most | of what DALL-E generates is pretty mundane, but other | times I have been surprised at how bizarre and unique | certain images are. | | You seem to imply that because an AI is not human, its | art is not imbued with meaning or originality -- but I | find that an AI's non-human nature is precisely what | _makes_ the art so original and meaningful. | hansword wrote: | I would say it helps to first think what you want to get | out of it. | | If your task is "show me something that breaks through | our hyperspeed media", then I guess some obscure museum | is a better place than an ML model. | | If your task is "find the best variation on theme X" or | "quick draft visualization", they are often very useful. | I am sure there will be many further tasks to which | current and future models will be well suited. They are | not magic picture machines. At least not yet. | danielvaughn wrote: | I'm sure the novelty wears off. But I'm already coming up | with several applications for it. | | On the personal side, I've been getting into game | development, but the biggest roadblock is creating concept | art. I'm an artist but it takes a huge amount of time to get | the ideas on paper. Using DALLE will be a massive benefit and | will let me expedite that process. | | It's important to note that this is _not_ replacing my entire | creative process. But it solves the issue I have, where I 'm | lying in bed imagining a scene in my mind, but don't have the | time or energy to sketch it out myself. | ausbah wrote: | >I'm an artist but it takes a huge amount of time to get | the ideas on paper. | | this is what I really like about DALLE-mini, it's ability | to create pretty good basic outlines for a scene. it's low | resolution enough that there's room for your own creativity | while giving you a good template to spring off from. 
things | like poses, composition of multiple people, etc. | zanderwohl wrote: | I've used AI to try out different composition/layout | possibilities. Sometimes it comes up with an arrangement | of objects I hadn't considered. Sometimes it uses colors | in really interesting ways. Great jumping-off point for | drafting. | nsxwolf wrote: | I did notice it is very good at making small pixel art | icons/sprites. | jnovek wrote: | I've been using generative models as an art form in and of | themselves since the mid/late 2010s. I like generating | mundane things that bump right up along the edge of the | uncanny valley and finding categories of images that | challenge the model (e.g. for CLIP, phrases that have a | clear meaning but are infrequently annotated). | | Generating itself can be art. I'm not going to win a | Pulitzer here, it's for the personal joy of it, but I will | certainly never get tired of it. | thatguy0900 wrote: | I've been having a blast using it in my dungeons and | dragons games. If you type in, say, "dnd village battlemap" | it's really pretty usable. Not to mention the wild magic | weapons and monsters it can come up with. | bfgoodrich wrote: | zone411 wrote: | I have some first-hand experience about how the copyright office | views these works from creating an AI assistant to help me write | these melodies: | https://www.youtube.com/playlist?list=PLoCzMRqh5SkFwkumE578Y.... | Here is a quote from the response from the Copyright Office email | before I provided additional information about how they were | created: | | "To be copyrightable, a work must be fixed in a tangible form, | must be of human origin, and must contain a minimal degree of | creative expression" | | So some employees there are aware of the impact that AI can have. | Getting these DALL-E images copyrighted won't be trivial. I think | it will be many years before the law is clarified. 
 | rvz wrote:
 | > Starting today, users get full usage rights to commercialize
 | the images they create with DALL*E, including the right to
 | reprint, sell, and merchandise. This includes images they
 | generated during the research preview.
 |
 | So DALL*E 2 is going to restart, revive and cause another
 | renaissance of fully automated and mass-generated NFTs, full of
 | derivatives and remixing etc to pump up the crypto NFT hype
 | squad?
 |
 | Either way, OpenAI wins again as these crypto companies are going
 | to pour tens of thousands of generated images to pump their NFT
 | griftopia off of life support, reconfirming that it isn't going
 | to die that easily.
 |
 | Regardless of this possible revival attempt, 90% of these JPEG
 | NFTs will _eventually_ still die.
 | randomperson_24 wrote:
 | I have tried playing around with the beta access to make it
 | generate NFT art with different prompts, but in vail.
 |
 | I think it has not been trained on NFT art (crypto punks and so
 | on).
 | Melting_Harps wrote:
 | > I think it has not been trained on NFT art (crypto punks
 | and so on).
 |
 | How exactly are you defining NFT art?
 |
 | I mean, it can literally be anything: Dorsey sold a
 | screencap of his 1st tweet, Nadya from Pussy Riot did some
 | creative stuff, and the Ape crap was the bulk of this stuff
 | that got passed around.
 |
 | I think what can be gleaned from that short-lived nonsense
 | is that value is subjective and that the quality of a valuable
 | piece of 'art' is equally as hard to define. Much the same
 | with its predecessor: CryptoKitties.
 | SquareWheel wrote:
 | Heads up: I think you meant "in vain" rather than "in vail".
 | However, a similar phrase is "to no avail" which also means
 | that something was not successful.
 | ask_b123 wrote:
 | I think you meant "in vain" rather than "in vein".
 | SquareWheel wrote:
 | I sure did! Thank you, I've corrected that now.
| Bud wrote:
| I don't see why there's any credible reason to expect that
| DALL-E will do anything at all to help those promoting the NFT
| silliness. Two separate issues.
| mdanger007 wrote:
| If OpenAI could make a profit selling Dall-E images as NFT,
| I'd assume they'd do it, yeah?
| Melting_Harps wrote:
| Altman tried his hand at that by launching Worldcoin, and
| it didn't go well at all.
|
| So I think it's prudent that OpenAI keep the 'sell shovels'
| business model instead with DALLE and GPT, at least for the
| time being.
| nbzso wrote:
| Anything to end Corporate Memphis. Even if we as illustrators
| will not have jobs or commissions. Let's hope that every creative
| human endeavour, painting, music, poetry will be replaced and
| removed from the commercial realm. Then maybe we will see
| artistic humanism instead of synthetic trans-humanistic "pop
| art".
|
| Happily for me I stopped painting digitally a long time ago. I
| even stopped calling myself "an artist". Nowadays I paint and
| draw only with real media and call all of that "Archivist
| craftsmanship with analogue medium". :)
| cm2012 wrote:
| I really want access, wish there was a way to pay to get in.
| TekMol wrote:
| I wonder how fast they will invite the 1 million users?
|
| I have been on the waitlist for a while and did not get access
| yet.
|
| Did anybody get access already today?
| deviner wrote:
| nope, I've been waiting for quite some time too
| Sohcahtoa82 wrote:
| The name "OpenAI" to me implies being open-source.
|
| I have an RTX 3080 and will likely be buying a 4090 when it comes
| out. Will I ever be able to generate these images locally, rather
| than having to use a paid service? I've done it with DALL-E Mini,
| but the images from that don't hold a candle to what DALL-E 2
| produces.
| whywhywhywhy wrote:
| Their choice of name gets funnier every month.
| ronsor wrote:
| I'm not sure if any current or next-generation GPU even has
| enough power to run DALL-E 2 locally.
| | Anyway, OpenAI is unlikely to release the model. The situation
| will likely be like it is with GPT-3; however, it's also likely
| another team will attempt to duplicate OpenAI's work.
| jazzyjackson wrote:
| From what I've seen it's all about the VRAM
|
| if you've got 60GB available to your GPU then maybe you can get
| close
|
| I'm really curious if Apple's unified memory architecture is of
| benefit here, especially a few years from now if we can start
| getting 128/256GB of shared RAM on the SoC
| ajafari1 wrote:
| I wrote about this happening two days ago on my Substack post,
| "OpenAI will start charging businesses for images based on how
| many images they request. Just like Amazon Web Services charges
| businesses for usage across storage, computing, etc. Imagine a
| simple webpage where OpenAI will list out their AI-job suite,
| including "jobs" such as software developer, graphics designer,
| customer support rep, and accountant. You can select which
| service offerings you'd like to purchase ad-hoc or opt into the
| full AI-job suite."
|
| In case you are interested in reading the whole take:
| https://aifuture.substack.com/p/the-ai-battle-rages-on
| arrow7000 wrote:
| "Business monetises their offering" can't say I'm entirely
| blown away by the prediction
| ukzuck wrote:
| Can anyone invite me for DALL E!
| [deleted]
| naillo wrote:
| This news is funny since it doesn't actually change anything.
| It's still a waitlist that they're pushing out slowly (not an
| open beta). Nice way to stay in the news though.
| outsider7 wrote:
| Amazing stuff (really fun)... can it solve climate change?
| bemmu wrote:
| I was supposed to be making a video game, but got a bit
| sidetracked when DALL*E came out and made this website on the
| side: http://dailywrong.com/ (yes I should get SSL).
|
| It's like The Onion, but all the articles are made with GPT-3 and
| DALL*E.
I start with an interesting DALL*E image, then describe | it to GPT-3 and ask it for an Onion-like article on the topic. | The results are surprisingly good. | jelliclesfarm wrote: | Love it! Better than other news I get to read these days. Some | of it rings..like the bluebird suing the cat. | | Thank you! Bookmarked! | picozeta wrote: | These are actually quite funny. A bit of a surreal touch, but | that makes them even more fun. | tiborsaas wrote: | Thanks, finally a legit news publication :) | | This was really funny :) | | http://dailywrong.com/man-finally-comfortable-just-holding-a... | biztos wrote: | So the other men in the pictures are the uncomfortable ones? | zanderwohl wrote: | Somehow these articles are more readable than typical AI- | generated search engine fodder... Is it because I'm entering | the site with an expectation of nonsense? | slavak wrote: | Probably because, by the creator's own admission, the | articles are heavily cherry-picked to make sure the output | is decent, which is probably a lot more human effort than | goes into the aforementioned search engine fodder. | | http://dailywrong.com/sample-page/ | pwillia7 wrote: | I would guess that most Spam farms are not using openAI | davinci model which is really really good, but expensive. | Just a guess. | hanselot wrote: | layer8 wrote: | This one seems like it could actually be real in Japan: | http://dailywrong.com/anime-pillow-gym-opens-in-tokyo/ ;) | busyant wrote: | This is clever. Does GPT-3 come up with the title of the | article, too? That's the funniest part. | bemmu wrote: | At first I came up with them myself, but found that it often | comes up with better ones, so I ask it for variations. | | I think I got it to even fill the title given a picture, | something like "Article picture caption: Man holding an | apple. Article title: ...". Might experiment more with that | in the future. | sillysaurusx wrote: | How do you prompt GPT-3 to come up with the titles? 
That's | an interesting problem. | busyant wrote: | Well, then I'm impressed with GPT-3's ability to generate | those titles! | | The combination of photo/title feels like they come from | the more absurd articles published by theonion. | | If we aren't living in a simulation, it's just a matter of | time... | lagrange77 wrote: | http://dailywrong.com/new-course-teaches-guinea-pigs-househo... | | lol | walrus01 wrote: | The results with things that are artworks or more general | concepts are fascinating, but there is for sure something | creepy with "photorealistic" human eyes and faces going on... | | If you want to see some really creepy AI generated human | "photo" faces, take a look at Bots of New York: | | https://www.facebook.com/botsofnewyork | dntrkv wrote: | Spam advertising is about to reach whole new levels of weird. | ttyyzz wrote: | NGL this shit is pretty cursed and I like it. | benbristow wrote: | From the server IP looks like you're on some managed WordPress | hosting that only offers free SSL on the more 'premium' | packages. | | Easiest way for free SSL would be to just throw the domain on | CloudFlare :) | pieter_mj wrote: | Very funny! The "Scientists Warn New Faster Toothbrush May | Cause Insanity"-story is not fake though, I've experienced it | ;) | stuaxo wrote: | This is fantastic, the fake news the world needs. | aantix wrote: | Feels like the headlines could be generated similar to the | style of "They Fight Crime!" | | "He's a hate-fuelled neurotic farmboy searching for his wife's | true killer. She's a tortured insomniac snake charmer from a | family of eight older brothers. They fight crime!" | | https://theyfightcrime.org/ | | Here's an implementation in Perl. | | http://paulm.com/toys/fight_crime.pl.txt | edm0nd wrote: | lol that site is great | | >He's an unconventional gay paranormal investigator moving | from town to town, helping folk in trouble. She's a violent | motormouth wrestler from the wrong side of the tracks. 
They
| fight crime!
|
| >He's a Nobel prize-winning sweet-toothed rock star who
| believes he can never love again. She's a strong-willed
| communist widow with a knack for trouble. They fight crime!
|
| >He's an obese white trash barbarian with a secret. She's a
| virginal thirtysomething traffic cop with the power to bend
| men's minds. They fight crime!
| aasasd wrote:
| http://dailywrong.com/wp-content/uploads/2022/07/DALL%C2%B7E...
|
| Hot dang. Some Reddit subs can be auto-generated now.
| tildef wrote:
| Actually got a chuckle out of the duck one
| (http://dailywrong.com/man-finally-comfortable-just-
| holding-a...). Thanks! I hope you keep generating them. Kind
| of wish there weren't a newsletter nag, but on the other hand
| it adds to the realism. Could be worthwhile to generate the
| text of the nag with gpt too; call it a kind of lampshading.
| aantix wrote:
| Parenting > "Gillette Releases a New Razor for Babies"
| bemmu wrote:
| I loved how it just consistently decided that if babies have
| facial hair, it's always white fluff.
| lancesells wrote:
| I think it's because it's using images of babies with soap
| on their face to learn. Still funny though!
| uxamanda wrote:
| The part where you have to confirm you are not a robot to
| subscribe to the mailing list is the best part of this, my new
| favorite website.
| drusepth wrote:
| Haha, I was in a very similar boat when I built
| https://novelgens.com -- I was also supposed to be making a
| video game, but got a bit sidetracked with VQGAN+CLIP and other
| text/image generation models.
|
| Now I'm using that content _in_ the video game. I wonder if you
| could use these articles as some fake news in your game, too.
| :)
| astroalex wrote:
| This is amazing! Honestly one of the first uses of GPT3/DALL E
| that has held my attention for longer than a few seconds.
| mark_l_watson wrote:
| I tried DALLE once and liked the generated images. Not really my
| thing, but so cool.
| | What I do use is OpenAI's GPT-3 APIs, I am a paying customer.
| Great tool!
| lagrange77 wrote:
| Has anyone else had problems with the 'Generate Variations'
| function lately? Tried it out first 3 days ago, and it says
| 'Something went wrong. Please try again later, or contact
| support@openai.com if this is an ongoing problem.' every time
| since then.
| Plough_Jogger wrote:
| Is this referring to the first version of the model, or DALL-E 2?
| nharada wrote:
| Something I haven't seen anyone talking about with these huge
| models: how do future models get trained when more content online
| is model generated to start with? Presumably you don't wanna
| train a model on autogenerated images or text, but you can't
| necessarily know which is which.
| cpach wrote:
| Makes me think of Ouroboros
|
| https://en.m.wikipedia.org/wiki/Ouroboros
| nharada wrote:
| Reminds me of https://en.wikipedia.org/wiki/Low-
| background_steel
| espadrine wrote:
| In this situation, the low-background steel is the MS-COCO
| dataset, associated with the Fréchet inception distance
| computed by comparing the statistical divergence between
| the high-level vector outputs of passing MS-COCO images
| through Google's InceptionV3 classifier, and passing DALL-E
| images (or its competitors) through it.
|
| For now at least, there is a detectable difference in
| variety.
| zitterbewegung wrote:
| This should be a step in cleaning your data to begin with. If
| you don't know the providence of your data then you shouldn't
| be even training with it.
|
| Getting humans to refine your data is the best solution right
| now and many companies and researchers go with this approach.
| jazzyjackson wrote:
| s/ide/ena
| Voloskaya wrote:
| > Getting humans to refine your data is the best solution
| right now
|
| Source?
|
| All those big models are trained with data for which the
| source is not known or vetted. The amount of data needed is
| not human-refinable.
| | For example for language models we train mostly on subsets of
| CommonCrawl + other things. CommonCrawl data is "cleaned" by
| filtering out known bad sources and with some heuristics such
| as ratio of text to other content, length of sentences etc.
|
| The final result is a not too dirty but not clean huge pile
| of data that comes from millions of sources that no human has
| vetted and that no one in the team using the data knows
| about.
|
| The same applies to large image datasets, e.g. LAION-400M,
| which also comes from CommonCrawl and is not curated.
| nharada wrote:
| But how would you know? A random string of text, or an image
| with the watermark removed, is going to be very hard to
| distinguish from human-created content.
| FrenchDevRemote wrote:
| You can't use humans to manually refine a dataset on the
| scale of GPT-3 or DALL-E
|
| CLIP was trained on 400,000,000 images, GPT is roughly 180B
| tokens, at 1-2 tokens per word, that's 120,000,000,000 words.
| pshc wrote:
| At least cleaning it up is an embarrassingly parallel
| problem, so if you had the resources to throw incentives at
| millions of casual gamers, you might make a nice dent on
| CLIP.
| zanderwohl wrote:
| Alternatively, making a captcha where half the data is
| unlabeled, and half is labeled, forcing users to
| categorize data for you as they log into accounts.
| Jleagle wrote:
| The images I have created all have a watermark. This is at
| least one way to filter out most images, by the same AI at
| least.
| goolulusaurs wrote:
| It's a cybernetic feedback system. Dalle is used to create new
| images, the images that people find most interesting and
| noteworthy get shared online, and reincorporated into the
| training data, but now filtered through human desire.
| [deleted]
| can16358p wrote:
| I think with the terms requiring explicitly telling which
| images/parts were generated, they could be filtered out and
| prevent a feedback loop of "generated in/generated out" images.
| I'm sure there will be some illegal/against terms of use cases | there but the majority should represent fair use. | mikeyouse wrote: | This precise thing is causing a funny problem in specialty | areas. People are using e.g. Google Lens to identify plants, | birds and insects, which sometimes returns wrong answers e.g. | say it sees a picture of a Summer Tanager and calls it a | Cardinal. If the people then post "Saw this Cardinal" and the | model picks up that picture/post and incorporates it into its | training set, it's just reinforcing the wrong identification.. | bobbylarrybobby wrote: | https://xkcd.com/978/ | Pxtl wrote: | Then that's a cardinal now. | scarmig wrote: | That's not really a new problem, though. At one point someone | got some bad training data about an old Incan town, the | misidentification spread, and nowadays we train new human | models to call it Macchu Picchu. | vanillaicesquad wrote: | The difference between the name of an old Incan town and a | modern time plant identification mistake is that maybe the | plant is poisonous. | | Made with gpt3 | blfr wrote: | Training on auto generated images collected off the Internet is | gonna be fine for a while since the images surfacing will be | curated (ie. selected as good/interesting/valuable) still | mostly by humans. | jmartrican wrote: | I wonder if human artists can demand that their work not be | used for modelling. So as the robots are stuck using older | styles for their creations, the humans will keep creating new | styles of art. | naillo wrote: | One interesting comment about this is that some models actually | benefit from being fed their own output. Alphafold for instance | was fed with its own 'high likelihood' outputs (as demis | hassabis described in his lex friedman interview). | gwern wrote: | My discussion of this issue (which actually comes up in like | every DALL-E 2 discussion on HN): | https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-... 
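The CommonCrawl-style cleaning described earlier in the thread (blocklisting known bad sources, checking the ratio of extracted text to raw page content, and looking at sentence length) can be sketched roughly as below. The blocklist, thresholds, and helper name here are illustrative assumptions for the sketch, not values or APIs from any real pipeline:

```python
import re

# Illustrative blocklist -- real pipelines maintain much larger lists.
BAD_DOMAINS = {"spam.example.com"}

def keep_document(url: str, html_text: str, extracted_text: str) -> bool:
    """Crude quality filter in the spirit of CommonCrawl cleaning heuristics."""
    # 1. Drop documents from known bad sources.
    domain = re.sub(r"^https?://", "", url).split("/")[0]
    if domain in BAD_DOMAINS:
        return False
    # 2. Ratio of extracted text to raw page content: boilerplate-heavy
    #    pages (menus, ads, markup) score low. 0.3 is an assumed cutoff.
    if not html_text or len(extracted_text) / len(html_text) < 0.3:
        return False
    # 3. Average sentence length: very short "sentences" are usually
    #    navigation fragments rather than prose. 5 words is an assumed cutoff.
    sentences = [s for s in re.split(r"[.!?]", extracted_text) if s.strip()]
    if not sentences:
        return False
    avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
    return avg_words >= 5
```

As the thread notes, heuristics like these only reduce the dirtiness of the pile; none of them can tell model-generated text apart from human-written text.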
| ccmcarey wrote: | That's about 10x as expensive as it should be | karencarits wrote: | $15 for 115 iterations/460 images? | ccmcarey wrote: | Yep. During the alpha it was (50*6) 300 images per day, by | their pricing model that would be $300 a month now | WalterSear wrote: | $15 for 115 _attempts_ to get usable images. | kache_ wrote: | Give it some time. Other organizations will race to the bottom. | | They might even provide image generation at a loss to drive | people to their platforms. | gverri wrote: | $0.13/prompt can only be useful for artists/end users. Anyone | thinking about using this at scale would need a 20/30x | reduction in price. But there's still no API available so I | think that will change with time. Maybe they will add different | tiers based on volume. | jeanlucas wrote: | Thing is, as a current user: you rarely get it right in the | first prompt, you can iterate 10 times until you get what you | want. | | I spent several tries yesterday to get this angle "from the | ground up": | https://labs.openai.com/s/mz8LiyvkI8KwD2luJ6MrS23m | dntrkv wrote: | So $1.30 for getting a result that would have cost how much | to pay someone to make? Not to mention the 59 other | variations you would have. | raisedbyninjas wrote: | When there is a competitor, they can adjust pricing. For now, | it's virtually magic. | bradleybuda wrote: | You should ship a competitor! Sounds like you found a great | market opportunity. | scifibestfi wrote: | What are you basing that on? What should the price be? The | training and generation are probably expensive. | thorum wrote: | Until you consider the level of demand for this product, which | is surely higher than OpenAI can scale to with the number of | GPUs they have. If they price it lower they'll be overwhelmed. | Workaccount2 wrote: | Welcome to SaaS. | isoprophlex wrote: | Wait until someone trains a model like this, for porn. 
| | There seems to be a post-DALLE obscenity detector on OpenAI's
| tool, as so far I've found it to be entirely robust against
| deliberate typos designed to avoid simple 'bad word lists'. Ask
| it for a "pruple violon" and you get purple violins... you get
| the idea.
|
| "Metastable" prompts that may or may not generate obscene
| results (content with nudity, guns, or violence, as I've found)
| sometimes show non-obscene generations, and sometimes trigger a
| warning.
| jug wrote:
| I've thought about this and in fact porn generation sounds like
| a good thing?? It ensures that it's victimless. Of course,
| there is a problem with generation of illegal (underage) porn
| but other than this, I think it could be helpful for this
| world.
| jowday wrote:
| If I had to guess, I'd bet they have a supervised classifier
| trained to recognize bad content (violence, porn, etc) that
| they use to filter the generated images before passing them to
| the user, on top of the bad word lists.
| cmarschner wrote:
| Most likely they just take the one from Bing. Or, if they
| trained a better one, it goes vice versa sooner or later
| isoprophlex wrote:
| Exactly!
| zionic wrote:
| Honestly that part pisses me off. Who cares if their AI "makes
| porn" or something "offensive".
| fishtoaster wrote:
| I suspect it's more a business restriction than a moral one.
| If OpenAI allows people to make porn with these tools, people
| will make a _ton_ of it. OpenAI will become known as "the
| company that makes the porn-generating AIs," not "the company
| that keeps pushing the boundaries of AI." Being known as the
| porn-ai company is bad for business, so they restrict it.
| alana314 wrote:
| I tried the term "cockeyed" and got a TOS violation notice
| [deleted]
| justinzollars wrote:
| I would love access to this in order to design Silver Rounds. If
| you work at OpenAI please reach out!
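The guess in this thread about a supervised post-generation filter could look something like the outline below. OpenAI has not published its actual filtering stack, so the classifier here is a stand-in callable and the threshold is an assumed value:

```python
from typing import Callable, List

# Assumed cutoff for the sketch -- not a value from any real system.
UNSAFE_THRESHOLD = 0.5

def filter_generations(images: List[bytes],
                       unsafe_score: Callable[[bytes], float]) -> List[bytes]:
    """Keep only generated images the classifier scores below the
    unsafe threshold; everything else is withheld from the user."""
    return [img for img in images if unsafe_score(img) < UNSAFE_THRESHOLD]

# Toy demonstration with a fake scorer keyed on the image payload;
# a real system would run a trained image classifier here.
fake_scores = {b"img_a": 0.1, b"img_b": 0.9, b"img_c": 0.3}
print(filter_generations(list(fake_scores), fake_scores.get))
# -> [b'img_a', b'img_c']
```

A filter like this runs after generation, which is consistent with the observation above that typo-laden prompts slip past word lists but the resulting images still trigger warnings.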
| dharbin wrote: | I find it amusing that they suggest DALL-E, which typically | generates lovecraftian nightmare images, for making children's | story illustrations. | driverdan wrote: | How so? If you give it prompts for children story illustrations | with a detailed description it will not give you "lovecraftian | nightmare images". | throwaway0x7E6 wrote: | yeah. dalle is "so bad it's good". | | it's great for post-post-ironic memes, but I don't see it being | useful for anything else | arkitaip wrote: | No wireless. Less space than a nomad. Lame. | andybak wrote: | Have you tried any of the "human or Dall-E" tests? | | How did you score? | | I only scored as well as I did because I knew the kind of | stylistic choices to look out for. In terms of "quality" I | really don't understand how you've reached this conclusion. | throwaway0x7E6 wrote: | I've only seen this thing | https://huggingface.co/spaces/dalle-mini/dalle-mini | | is it not dall-e? | _flux wrote: | It is not and that's why OpenAI asked them to change the | name, which they did. | throwaway0x7E6 wrote: | oh. I retract my OP then | andybak wrote: | It's a reimplementation. | | It's a long way off in terms of quality (at the moment | anyway) | astrange wrote: | It's a model inspired by DALLE 1 but it's not even very | close to that. | | But it does seem to know a lot of things the real DALLE2 | doesn't. | Nevin1901 wrote: | I don't like how they're charging money for Dalle, yet they don't | have an API available. ___________________________________________________________________ (page generated 2022-07-20 23:00 UTC)