[HN Gopher] Imagen, a text-to-image diffusion model ___________________________________________________________________ Imagen, a text-to-image diffusion model Author : keveman Score : 326 points Date : 2022-05-23 20:44 UTC (2 hours ago) (HTM) web link (gweb-research-imagen.appspot.com) (TXT) w3m dump (gweb-research-imagen.appspot.com) | endisneigh wrote: | I give it a few years before Google makes stock images | irrelevant. | sydthrowaway wrote: | Short Getty images? | tpmx wrote: | Privately owned by the Getty family. | tpmx wrote: | Rolling this into Google Docs seems like a no-brainer. | ma2rten wrote: | Google is very conservative about anything that can generate | open-ended outputs. Also these models are still very | expensive computationally. | [deleted] | semicolon_storm wrote: | Or rolling this into Google Image Search to create images | that match users' search queries on the fly. | | Don't like any of the results from the real web? Well how | about these we created just for you. | riffraff wrote: | Ah yes, deepfake porn as a service would have been a | blessing for teenage me. | [deleted] | curiousgal wrote: | Until they pull the plug on it. | makeitdouble wrote: | I really expect them to first make DALL-E and competing | networks unfit for commercialization by providing the better | choice for free, have stock companies cry in the corner, then | just sunset the product a year or two down the road and we're | left wondering what to do. | notahacker wrote: | Tbh I imagine this tech combines particularly well with really | well curated stock image databases so outputs can be made with | recognisable styles, and actors and design elements can be | reused across multiple generated images. | | If Getty et al aren't already spending money on that | possibility, they probably should be. | pphysch wrote: | The entire "content" industry could get eaten by a few hundred | people curating + touching-up output from these models. | astrange wrote: | No, competitive advantage means that it's impossible to run | out of jobs just because someone/something is better at it | than you. | | (Consumer demand and boredom both being infinite is another | thing working against it.) | braingenious wrote: | This is super cool and I want to play with it. | throwaway743 wrote: | https://github.com/lucidrains/imagen-pytorch | armchairhacker wrote: | Does it do partial image reconstruction like DALL-E 2, where you | cut out part of an existing image and the neural network can fill | it back in? | | I believe this type of content generation will be the next big | thing, or at least one of them. But people will want some | customization to make their pictures "unique" and fix AI's lack | of creativity and other various shortcomings. Plus edit out the | remaining lapses in logic/object separation (of which there are some | even in the given examples). | | Still, being able to create arbitrary stock photos is really | useful and I bet these will flood small / low-budget projects. | alimov wrote: | Would it be bad to release this with a big warning and flashing | gifs letting people know of the issues it has and note that they | are working to resolve them / ask for feedback / mention | difficulties related to resolving the issues they identified? | xnx wrote: | Facebook really thought they had done something with DALL-E, then | Google's all "hold my beer". | dntrkv wrote: | OpenAI* | tomatowurst wrote: | when will there be a "DALL-E for porn"? or is this domain also | claimed by Puritans and morality gatekeepers?
The most in-demand | text-to-image use case is porn. | jonahbenton wrote: | I know that some monstrous majority of cognitive processing is | visual, hence the attention these visually creative models are | rightfully getting, but personally I am much more interested in | auditory information and would love to see a promptable model for | music. Was just listening to "Land Down Under" from Men At Work. | Would love to be able to prompt for another artist I have liked: | "Tricky playing Land Down Under." I know of various generative | music projects, going back decades, and would appreciate | pointers, but as far as I am aware we are still some ways from | Imagen/Dalle for music? | astrange wrote: | I believe we're lacking someone training up a large music model | here, but GPT-style transformers can produce music. | | gwern can maybe comment here. | | An actually scary thing is that AIs are getting okay at | reproducing people's voices. | addandsubtract wrote: | I agree. How cool would it be to get an 8 min version of your | favorite song? Or an instant DnB remix? Or 10 more songs in the | style of your favorite album? | jonahbenton wrote: | Yeah. I particularly love covers and often can hear in my | head X playing Y's song. Would love tools to experiment with | that for real. | | In practice, my guess is that even though Dall-e level | performance in music generation would be stunning and | incredible, it would also be tiresome and predictable to | consume on any extended basis. I mean- that's my reaction to | Dall-e- I find the images astonishing and magical but can | only look at them for limited periods of time. At these early | stages in this new world the outputs of real individual | brains are still more interesting. | | But having tools like this to facilitate creation and | inspiration by those brains- would be so so cool. | hn_throwaway_99 wrote: | As someone who has a layman's understanding of neural networks, | and who did some neural network programming ~20 years ago before | the real explosion of the field, can someone point to some | resources where I can get a better understanding about how this | magic works? | | I mean, from my perspective, the skill in these (and DALL-E's) | image reproductions is truly astonishing. Just looking for more | information about how the software actually works, even if there | are big chunks of it that are "this is beyond your understanding | without taking some in-depth courses". | londons_explore wrote: | Figure A.4 in the linked paper is a good high-level overview of | this model. Shame it was hidden away on page 19 in the | appendix! | | Each box you see there has a section in the paper explaining it | in more detail. | rvnx wrote: | Check https://github.com/multimodalart/majesty-diffusion | | There is a Google Colab workbook that you can try and run for | free :) | | These are the image-text pairs behind it: | https://laion.ai/laion-400-open-dataset/ | astrange wrote: | > I mean, from my perspective, the skill in these (and | DALL-E's) image reproductions is truly astonishing. | | A basic part of it is that neural networks combine learning and | memorizing fluidly inside them, and these networks are really | really big, so they can memorize stuff good. | | So when you see it reproduce a Shiba Inu well, don't think of | it as "the model understands Shiba Inus". Think of it as making | a collage out of some Shiba Inu clip art it found on the | internet. You'd do the same if someone asked you to make this | image.
| | It's certainly impressive that the lighting and blending are as | good as they are though. | FargaColora wrote: | This looks incredible but I do notice that all the images are of | a similar theme. Specifically there are no human figures. | influxmoment wrote: | I believe DALLE and likely this model excluded images of people | so they could not be misused | benwikler wrote: | Would be fascinated to see the DALL-E output for the same prompts | as the ones used in this paper. If you've got DALL-E access and | can try a few, please put links as replies! | joeycodes wrote: | Posting a few comparisons here. | | https://twitter.com/joeyliaw/status/1528856081476116480?s=21... | qclibre22 wrote: | See the paper here: https://gweb-research-imagen.appspot.com/paper.pdf | Section E: "Comparison to GLIDE | and DALL-E 2" | jandrese wrote: | Is there a way to try this out? DALL-E2 also had amazing demos | but the limitations became apparent once real people had a chance | to run their own queries. | wmfrov wrote: | Looks like no, "The potential risks of misuse raise concerns | regarding responsible open-sourcing of code and demos. At this | time we have decided not to release code or a public demo. In | future work we will explore a framework for responsible | externalization that balances the value of external auditing | with the risks of unrestricted open-access." | nomel wrote: | > the risks of unrestricted open-access | | What exactly is the risk? | jimmygrapes wrote: | A variation on the axiom "you cannot idiot-proof something | because there's always a bigger idiot" | varenc wrote: | See section 6 titled "Conclusions, Limitations and Societal | Impact" in the research paper: https://gweb-research-imagen.appspot.com/paper.pdf | | One quote: | | > "On the other hand, generative methods can be leveraged | for malicious purposes, including harassment and | misinformation spread [20], and raise many concerns | regarding social and cultural exclusion and bias [67, 62, | 68]" | userbinator wrote: | But do we trust that those who _do_ have access won't be | using it for "malicious purposes" (which they might not | think is malicious, but perhaps it is to those who don't | have access)? | colinmhayes wrote: | It's not up to you. It's up to them, and they trust | themselves/don't care about your definition of malicious. | jtvjan wrote: | If the model is used to generate offensive imagery, it may | result in a negative press response directed at the | company. | tpmx wrote: | _Really_ unpleasant content being produced, obviously. | marcodiego wrote: | Ok. Now, how about the legality of it generating socially | unacceptable images like child porn? | ma2rten wrote: | I get the impression that maybe DALL-E 2 produces slightly more | diverse images? Compare Figure 2 in this paper with Figures 18-20 | in the DALL-E 2 paper. | faizshah wrote: | What's the best open source or pre-trained text-to-image model? | shannifin wrote: | Nice to see another company making progress in the area. I'd love | to see more examples of different artistic styles though, my | favorite DALL-E images are the ones that look like drawings. | Mo3 wrote: | Is the source in the public domain already? | spyremeown wrote: | Jesus, this is so awesome. I think it's the first AI that really | makes me have that "wow" sensation. | fortran77 wrote: | > At this time we have decided not to release code or a public | demo. | | Oh well. | SemanticStrengh wrote: | Does it outperform DALL-E V2?
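To make the "how does this magic work" question upthread concrete: a diffusion model is trained to undo a small amount of noise at a time, and generation simply runs that denoiser in a loop, starting from pure noise and conditioning every step on a text embedding. Below is a minimal, illustrative DDPM-style sampling loop, a sketch only (assuming PyTorch; `model`, the linear noise schedule, and the 64x64 shape are stand-ins, not Imagen's actual code):

    import torch

    # Illustrative DDPM-style sampler: start from pure noise and
    # iteratively denoise, conditioning on a text embedding.
    # `model` is assumed to predict the noise eps(x_t, t, text).
    def sample(model, text_emb, steps=1000, shape=(1, 3, 64, 64)):
        betas = torch.linspace(1e-4, 0.02, steps)   # noise schedule
        alphas = 1.0 - betas
        alpha_bar = torch.cumprod(alphas, dim=0)

        x = torch.randn(shape)                      # x_T ~ N(0, I)
        for t in reversed(range(steps)):
            eps = model(x, t, text_emb)             # predicted noise
            coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
            mean = (x - coef * eps) / torch.sqrt(alphas[t])  # DDPM mean
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + torch.sqrt(betas[t]) * noise
        return x                                    # a 64x64 image

Per the paper, Imagen generates at 64x64 with a loop of this general shape, then passes the result through two further diffusion models that super-resolve it to 256x256 and then 1024x1024; the text conditioning comes from a frozen T5-XXL encoder.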
| dr_dshiv wrote: | How the fck are things advancing so fast? Is it about to level | off ...or extend to new domains? What's a comparable set of | technical advances? | y04nn wrote: | Really impressive. If we are able to generate such detailed | images, is there anything similar for text to music? I would | have thought that it would be simpler to achieve than text to image. | redox99 wrote: | Our language is much more effective at describing images than | music. | tomatowurst wrote: | why stop at audio? the pinnacle of this would be text-to-video, | equally indistinguishable from the real thing. | burlesona wrote: | The way things look when still is much easier to fake than | the way things move. | | I would expect AI development to follow a similar path to | digital media generally, as it's following the increasing | difficulty and space requirements of digitally representing | said media: text < basic sounds < images < advanced audio < | video. | | What's more impressive to me is how far ahead text-to-speech | is, but I think the explanation is straightforward (the | accessibility value has motivated us to work on that for a | lot longer). | nomel wrote: | Compare the size of a raw image file to a raw music file, to | get an idea of the complexity difference. | penneyd wrote: | Think sheet music, not an mp3 | touringa wrote: | SymphonyNet: https://youtu.be/m4tT5fx_ih8 | colinmhayes wrote: | I wondered why all the pictures at the top had sunglasses on, | then I saw a couple with eyes. Still some work to do on this one. | londons_explore wrote: | > Figure 2: Non-cherry picked Imagen samples | | Hooray! Non-cherry-picked samples should be the norm. | ml_basics wrote: | Why is this seemingly official Google blog post on this random | non-Google domain? | aidenn0 wrote: | This is quite suspicious considering that Google AI research | has an official blog[1], and this is not mentioned at all | there. It seems quite possible that this is an elaborate prank. | | 1: https://ai.googleblog.com/ | mmh0000 wrote: | You mean one of Google's domains?
|       # whois appspot.com
|       [Querying whois.verisign-grs.com]
|       [Redirected to whois.markmonitor.com]
|       [Querying whois.markmonitor.com]
|       [whois.markmonitor.com]
|       Domain Name: appspot.com
|       Registry Domain ID: 145702338_DOMAIN_COM-VRSN
|       Registrar WHOIS Server: whois.markmonitor.com
|       Registrar URL: http://www.markmonitor.com
|       Updated Date: 2022-02-06T09:29:56+0000
|       Creation Date: 2005-03-10T02:27:55+0000
|       Registrar Registration Expiration Date: 2023-03-10T00:00:00+0000
|       Registrar: MarkMonitor, Inc.
|       Registrar IANA ID: 292
|       Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
|       Registrar Abuse Contact Phone: +1.2086851750
|       Domain Status: clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)
|       Domain Status: clientTransferProhibited (https://www.icann.org/epp#clientTransferProhibited)
|       Domain Status: clientDeleteProhibited (https://www.icann.org/epp#clientDeleteProhibited)
|       Domain Status: serverUpdateProhibited (https://www.icann.org/epp#serverUpdateProhibited)
|       Domain Status: serverTransferProhibited (https://www.icann.org/epp#serverTransferProhibited)
|       Domain Status: serverDeleteProhibited (https://www.icann.org/epp#serverDeleteProhibited)
|       Registrant Organization: Google LLC
|       Registrant State/Province: CA
|       Registrant Country: US
|       Registrant Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
|       Admin Organization: Google LLC
|       Admin State/Province: CA
|       Admin Country: US
|       Admin Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
|       Tech Organization: Google LLC
|       Tech State/Province: CA
|       Tech Country: US
|       Tech Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
|       Name Server: ns4.google.com
|       Name Server: ns3.google.com
|       Name Server: ns2.google.com
|       Name Server: ns1.google.com
| jefftk wrote: | While appspot.com is a Google domain, anyone can register | domains under it. It would be similarly surprising to see an | official GitHub blog post under someproject.github.io | jefftk wrote: | Fun fact: appspot.com was the second "private" suffix to be | added to the Public Suffix List, after operaunite.com: | https://bugzilla.mozilla.org/show_bug.cgi?id=593818 | ma2rten wrote: | You mean like: https://say-can.github.io/ | | This is common in the research PA. People don't want to | deal with broccoli man [1]. | | [1] https://www.youtube.com/watch?v=3t6L-FlfeaI | jefftk wrote: | Looking at that link, I don't think that is a GitHub | publication? It is marked Robotics at Google and Everyday | Robotics. | ma2rten wrote: | My bad, it's a Google-specific problem. | dekhn wrote: | I'm not certain but I think it's prerelease. The paper says the | site should be at https://imagen.research.google/ but that host | doesn't respond | jonny_eh wrote: | appspot.com is the domain that hosts all App Engine apps (at | least those that don't use a custom domain). It's kind of like | Heroku and has been around for at least a decade. | | https://cloud.google.com/appengine | jefftk wrote: | Spring 2008: 14 years! | jonny_eh wrote: | Whoa, I feel super old, I first used it in 2011 when I | thought it was new. | mshockwave wrote: | IIRC appspot.com is used by App Engine, one of the earliest | SaaS platforms provided by Google. | jeffbee wrote: | Not just that ... Google Sheets must be the all-time worst way | to distribute 200 short strings. | SemanticStrengh wrote: | Note that there was a close model in 2021 ignored by all: | https://paperswithcode.com/sota/text-to-image-generation-on-... | (on this benchmark). Also, what is the score of DALL-E v2? | raldi wrote: | The opinion in the title takes away a lot more than it adds, and | I'm not sure I agree with its assertion. | ShakataGaNai wrote: | All of these AI findings are cool in theory. But until it's | accessible to some decent amount of people/customers - it's | basically useless fluff. | | You can tell me those pictures are generated by an AI and I might | believe it, but until real people can actually test it...
it's | easy enough to fake. This page isn't even the remotest bit legit | by the URL. It looks nicely put together and that's about it. | Could have easily put this together with a graphic designer to | fake it. | | Let's be clear, I'm not actually saying it's fake. Just that all of | these new "cool" things are more or less theoretical if nothing | is getting released. | cellis wrote: | Inference times are key. If it can't be produced within | reasonable latency, then there will be no real world use case | for it because it's simply too expensive to run inference at | scale. | theptip wrote: | There are plenty of use cases for generating art/images where | a latency of days or weeks would be competitive with the | current state of the art. | | For example, corporate graphics design, logos, brand | photography, etc. | | I really do think inference time is a red herring for the | first generation of these models. | | Sure, the more transformative use-cases like real-time | content generation to replace movies/games will need fast | inference, but there is a | lot of value to be created prior to that point. | mistrial9 wrote: | Reading a relatively-recent Machine Learning paper from some | elite source, and after multiple repetitions of bragging and | puffery, in the middle of the paper, the charts show that they | had beaten the score of a high-ranking algorithm in their | specific domain, moving the best consistent result from 86% | accuracy to 88% accuracy, somewhere around there. My response | was: they got a lot of attention within their world by beating | the previous score, no matter how small the improvement was.. it | was a "winner take all" competition against other teams close to | them; the accuracy of less than 90% is really of questionable | value in a lot of real world problems; it was an enormous amount | of math and effort for this team to make that small improvement. | | What I see is a semi-poverty mindset among very smart people who | appear to be treated in a way such that the winners get | promotion, and everyone else is fired. This sort of analysis | with ML is useful for massive data sets at scale, where 90% is a | lot of accuracy, but not at all for the small sets of real world, | human-scale problems where each result may matter a lot. The | amount of years of training that these researchers had to go | through, to participate in this apparently ruthless environment, | are certainly like a lottery ticket, if you are in fact in a game | where everyone but the winner has to find a new line of work. I | think their masters live in Redmond, if I recall.. not looking it | up at the moment. | neolander wrote: | It really does look better than DALL-E, at least from the images | on the site. Hard to believe how quickly progress is being made | to lucid dreaming while awake. | Jyaif wrote: | Jesus Christ. Unlike DALL-E 2, it gets the details right. It also | can generate text. The quality is insanely good. This is | absolutely mental. | not2b wrote: | Yes, the posted results are really good, but since we can't | play with it we don't know how much cherry picking has been | done. | addajones wrote: | This is absolutely amazingly insane. Wow. | [deleted] | benreesman wrote: | I apologize in advance for the elitist-sounding tone. In my | defense, the people I'm calling elite I have nothing to do with; | I'm certainly not talking about myself. | | Without a fairly deep grounding in this stuff it's hard to | appreciate how far ahead Brain and DM are.
| | Neither OpenAI nor FAIR _ever has the top score on anything | unless Google delays publication_. And short of FAIR? D2 | lacrosse. There are exceptions to such a brash generalization, | NVIDIA's group comes to mind, but it's a very good rule of thumb. | Or your whole face the next time you are tempted to doze behind | the wheel of a Tesla. | | There are two big reasons for this: | | - the talent wants to work with the other talent, and through a | combination of foresight and deep pockets Google got that | exponent on their side right around the time NVIDIA cards started | breaking ImageNet. Winning the Hinton bidding war clinched it. | | - the current approach of "how many Falcon Heavy launches worth | of TPU can I throw at the same basic masked attention with | residual feedback and a cute Fourier coloring" inherently favors | deep pockets, and obviously MSFT, sorry, OpenAI has that, but deep | pockets also non-linearly scale outcomes when you've got in-house | hardware for multiply-mixed precision. | | Now clearly we're nowhere close to Maxwell's Demon on this stuff, | and sooner or later some bright spark is going to break the | logjam of needing 10-100MM in compute to squeeze a few points out | of a language benchmark. But the incentives are weird here: who, | exactly, does it serve for us plebs to be able to train these | things from scratch? | davelondon wrote: | I'M SQUEEZING MY PAPER! | SemanticStrengh wrote: | This competitor might be better for respecting spatial | prepositions and photorealism but on a quick look I find the | images more uncanny. DALL-E has IMHO better camera POV/distance | and is able to make artistic/dreamy/beautiful images. I haven't | yet seen this Google model be competitive for art and uncanniness. | However, progress is great and I might be wrong. | james-redwood wrote: | Metaculus, a mass forecasting site, has steadily brought | forward the prediction date for a weakly general AI. Jaw-dropping | advances like this only increase my confidence in this | prediction. "The future is now, old man." | | https://www.metaculus.com/questions/3479/date-weakly-general... | sydthrowaway wrote: | How can we prepare for this? | | This will result in mass social unrest. | aaaaaaaaaaab wrote: | Stock up on guns, ammo, cigarettes, water filters, canned | food, and toilet paper. | boppo1 wrote: | Nah, learn Spanish and first-aid. Being able to fix people | is more useful than having commodities that will make you a | target. | refulgentis wrote: | You think so? I'm very high on the Kool-Aid, image generation | and text transformation models are core parts of my workflow. | (Midjourney, GPT-3) | | It's still an unruly 7-year-old at best. Results need to be | verified. Prompt engineering and a sense of creativity are | core competencies. | visarga wrote: | > Prompt engineering and a sense of creativity are core | competencies. | | It's funny that people are also prompting each other. | Parents, friends, teachers, doctors, priests, politicians, | managers and marketers are all prompting (advising) us to | trigger desired behaviour. Powerful stuff - having a large | model and knowing how to prompt it. | [deleted] | tpmx wrote: | I don't see how this gets us (much) closer to general AI. Where | is the reasoning? | _joel wrote: | Perhaps the confluence of NLP and something generative? | SemanticStrengh wrote: | Yes, Metaculus mostly bet a magic number based on _perhaps_ | and tbh why not, the interaction of NLP and vision is | mysterious and has potential.
However, those magic numbers | should still be considered magic numbers. I agree that by | 2040 the interactions will have been extensively studied, | but the conclusion of whether we can go much further | on cross-model synergies is totally unknown, or pessimistic. | astrange wrote: | That doesn't even lead in the direction of an AGI. The | larger and more expensive a model is the less like an "AGI" | it is - an independent agent would be able to learn online | for free, not need millions in TPU credits to learn what | color an apple is. | quirino wrote: | I think this serves at least as a clear demonstration of how | advanced the current state of AI is. I had played with GPT-3 | and that was very impressive but I couldn't even dream | something as good as DALL-E 2 was already possible. | 6gvONxR4sf7o wrote: | Big pretrained models are good enough now that we can pipe | them together in really cool ways and our representations of | text and images seem to capture what we "mean." | tpmx wrote: | Yeah, it _seems_ like it. But it's still just complicated | statistical models. Again, where is the reasoning? | 6gvONxR4sf7o wrote: | I don't care whether it reasons its way from "3 teddy | bears below 7 flamingos" to a picture of that or if it | gets there some other way. But some of the magic in | having good enough pretrained representations is that you | don't need to train them further for downstream tasks, | which means non-differentiable tasks like logic could | soon become more tenable. | renewiltord wrote: | A belief oft shared is that sufficiently complicated | statistical models are indistinguishable from reasoning. | marvin wrote: | I still think we're missing some fundamental insights on | how layered planning/forecasting/deducing/reasoning | works, and that figuring this out will be necessary in | order to create AI that we could say "reasons". | | But with the recent advances/demonstrations, it seems | more likely today than in 2019 that our current | computational resources are sufficient to perform | magnificently spooky stuff if they're used correctly. | They are doing that already, and that's without | deliberately making the software do anything except draw | from a vast pool of examples. | | I think it's reasonable, based on this, to update one's | expectations of what we'd be able to do if we figured out | ways of doing things that aren't based on first seeing a | hundred million examples of what we want the computer to | do. | | Things that do this can obviously exist, we are living | examples. Does figuring it out seem likely to be many | decades away? | londons_explore wrote: | All it takes is one 'trick' to give these models the | ability to do reasoning. | | Like, for example, the discovery that language models get | far better at answering complex questions if asked to | show their working step by step with chain of thought | reasoning, as in page 19 of the PaLM paper [1]. Worth | checking out the explanations of novel jokes on page 38 | of the same paper. While it is, like you say, all | statistics, if it's indistinguishable from valid | reasoning, then perhaps it doesn't matter. | | [1]: https://arxiv.org/pdf/2204.02311.pdf | davikr wrote: | Interesting and cool technology - but I can't seem to ignore that | every high-quality AI art application is always closed, and I | don't seem to buy the ethics excuse for that. The same was said | for GPT, yet I see nothing but creativity coming out from its | users nowadays.
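To make the chain-of-thought trick londons_explore describes above concrete: you show the model a worked example whose answer spells out its intermediate steps, and it imitates that pattern on the new question. A minimal sketch (the prompt is the classic worked example from the chain-of-thought literature, not taken from the PaLM paper itself):

    # Few-shot chain-of-thought prompt: one worked example, then a
    # new question. A plain prompt would ask for the answer directly;
    # this one demonstrates the intermediate reasoning to imitate.
    prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of
    tennis balls. Each can has 3 tennis balls. How many tennis balls
    does he have now?
    A: Roger started with 5 balls. 2 cans of 3 tennis balls each is
    6 balls. 5 + 6 = 11. The answer is 11.

    Q: The cafeteria had 23 apples. If they used 20 to make lunch and
    bought 6 more, how many apples do they have?
    A:"""
    # A model prompted this way tends to emit its steps
    # ("23 - 20 = 3; 3 + 6 = 9; the answer is 9") before the final
    # answer, which measurably improves accuracy on multi-step tasks.

Whether that counts as "valid reasoning" is exactly the dispute upthread, but the accuracy gains reported in the PaLM paper are measured, not anecdotal.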
| dougmwne wrote: | GPT-3 was an erotica virtuoso before it was gagged. There's a | serious use case here in endless porn generation. Google would | very much like to not be in that business. | | That said, you can download Dream by Wombo from the app store | and it is one of the top smartphone apps, even though it is a | few generations behind state of the art. | LordDragonfang wrote: | You're _aware_ of nothing but creativity from its users. The | people using the technology unethically intentionally don't | advertise that they're using it. | | There are mountains of AI-generated inauthentic content that | companies ( _including Google_ ) have to filter out of their | services. This content is used for spam, click farms, scamming, | and even state propaganda operations. GPT-2 made this problem | orders of magnitude worse than it used to be, and each | iteration makes it harder to filter. | | The industry term is (generally) "Coordinated Inauthentic | Behavior" (though this includes uses of actual human content). | I think Smarter Every Day did a good series(?) of videos on the | topic, and there are plenty of articles if you | prefer that. | thorum wrote: | That only lasts until the community copies the paper and | catches up. For example the open source DALLE-2 implementation | is coming along great: | https://github.com/lucidrains/DALLE2-pytorch | minimaxir wrote: | Granted, that's selection bias: you likely won't hear about | the cases where legit obscene output occurs. (The only notable | case I've heard of is the AI Dungeon incident.) | unholiness wrote: | Certificate is expired, anyone have a mirror? | minimaxir wrote: | Generating at 64x64px then upscaling it probably gives the model | a substantial performance boost (training speed/convergence) compared | to working at 256x256 or 1024x1024 like DALL-E 2. Perhaps that | approach to AI-generated art is the future. | daenz wrote: | > While we leave an in-depth empirical analysis of social and | cultural biases to future work, our small scale internal | assessments reveal several limitations that guide our decision | not to release our model at this time. | | Some of the reasoning: | | > Preliminary assessment also suggests Imagen encodes several | social biases and stereotypes, including an overall bias towards | generating images of people with lighter skin tones and a | tendency for images portraying different professions to align | with Western gender stereotypes. Finally, even when we focus | generations away from people, our preliminary analysis indicates | Imagen encodes a range of social and cultural biases when | generating images of activities, events, and objects. We aim to | make progress on several of these open challenges and limitations | in future work. | | Really sad that breakthrough technologies are going to be | withheld due to our inability to cope with the results. | joshcryer wrote: | They're withholding the API, code, and trained data because | they don't want it to affect their corporate image. The good | thing is they released their paper, which will allow easy | reproduction. | | T5-XXL looks on par with CLIP so we may not see an open source | version of T5 for a bit (LAION is working on reproducing CLIP), | but this is all progress. | minimaxir wrote: | T5 was open-sourced on release (up to 11B params): | https://github.com/google-research/text-to-text-transfer-tra... | | It is also available via Hugging Face transformers.
| | However, the paper mentions T5-XXL is 4.6B, which doesn't fit | any of the checkpoints above, so I'm confused. | riffraff wrote: | This seems bullshit to me, considering Google Translate and | Google Images encode the same biases and stereotypes, and are | widely available. | nomel wrote: | Aren't those old systems? | seaman1921 wrote: | yea but now they aren't giving people more data-points to | attack them with such nonsense arguments. | planetsprite wrote: | Literally the same thing could be said about Google Images, but | Google Images is obviously available to the public. | | Google knows this will be an unlimited money generator so | they're keeping a lid on it. | jowday wrote: | Much like OpenAI's marketing speak about withholding their | models for safety, this is just a progressive-sounding cover | story for them not wanting to essentially give away a model | they spent thousands of man-hours and tens of millions of | dollars' worth of compute training. | ThrowITout4321 wrote: | I'm one that welcomes their reasoning. I don't consider myself | a social justice kind of guy but I'm not keen on the idea that | a tool that is supposed to make life better for everyone has a | bias towards one segment of society. This is an important | issue (bug?) that needs to be resolved. Especially since there is | absolutely no burning reason to release it before it's ready | for general use. | Mockapapella wrote: | Transformers are parallelizable, right? What's stopping a | large group of people from pooling their compute power together | and working towards something like this? IIRC there were some | crypto projects a while back that were trying to create | something similar (golem?) | joshcryer wrote: | There are people working on reproducing the models, see here | for Dall-E 2 for example: | https://github.com/lucidrains/DALLE2-pytorch | | It's often not worth it to decentralize the computation of | the trained model, but it's not hard to get donated | cycles, and groups are working on it. Don't fret because | Google isn't releasing the API/code. They released the paper | and that's all you need. | visarga wrote: | There are the Eleuther.ai and BigScience projects working on | public foundation models. They have a few releases already | and are currently training GPT-3-sized models. | 6gvONxR4sf7o wrote: | It's wild to me that the HN consensus is so often that 1) | discourse around the internet is terrible, it's full of spam | and crap, and the internet is an awful unrepresentative | snapshot of human existence, and 2) the biases of | general-internet-training-data are fine in ML models because it just | reflects real life. | nullc wrote: | It's wild to me that you'd say that. The people complaining | about (1) aren't following it up with "so we should make sure to | restrict the public from internet access entirely" -- that's | what would be required to make your juxtaposition make sense. | | Moreover, the model doing things like exclusively producing | white people when asked to create images of people home | brewing beer is "biased", but it's a bias that presumably | reflects reality (or at least the internet), if not the | reality we'd prefer. Bias means more than "spam and crap"; in | the ML community bias can also simply mean _accurately_ | modeling the underlying distribution when reality falls short | of the author's hopes.
| | For example, if you're interested in learning about what home | brewing is, the fact that it only shows white people would be at | least a little unfortunate, since there is nothing inherently | white about it and some home brewers aren't white. But if, instead, | you wanted to just generate typical home brewing images, doing | anything but that would generate conspicuously unrepresentative | images. | | But even ignoring the part of the biases which are debatable | or of application-specific impact, saying something is | unfortunate and saying people should be denied access are | entirely different things. | | I'll happily delete this comment if you can bring to my | attention a single person who has suggested that we lose | access to the internet because of spam and crap who has also | argued that the release of an internet-biased ML model | shouldn't be withheld. | colordrops wrote: | Why is it wild? How is it contradictory? | astrange wrote: | The bias on HN is that people who prioritize being nice, or | may possibly have humanities degrees or be ultra-libs from | SF, are wrong because the correct answer would be cynical and | cold-heartedly mechanical. | | Other STEM-adjacent communities feel similarly but I don't | get it from actual in-person engineers much. | user3939382 wrote: | Translation: we need to hand-tune this to not reflect reality | but instead the world as we (Caucasian/Asian male American woke | upper-middle class San Francisco engineers) wish it to be. | | Maybe that's a nice thing, I wouldn't say their values are | wrong but let's call a spade a spade. | JohnBooty wrote: | > Translation: we need to hand-tune this to not reflect reality | | Is it reflecting reality, though? | | Seems to me that (as with any ML stuff, right?) it's | reflecting the training corpus. | | Furthermore, is it this thing's _job_ to reflect reality? | | > the world as we (Caucasian/Asian male American woke | upper-middle class San Francisco engineers) wish it to be | | Snarky answer: Ah, yes, let's make sure that things like "A | giant cobra snake on a farm. The snake is made out of corn" | reflect _reality._ | | Heartfelt answer: Yes, there is some of that wishful thinking | or editorializing. I don't consider it to be erasing or | denying reality. This is a tool that synthesizes _unreality._ | I don't think that such a tool should, say, refuse to | synthesize an image of a female POTUS because one hasn't | existed yet. This is art, not a reporting tool... and keep in | mind that art not only imitates life but also influences it. | nomel wrote: | > Snarky answer: Ah, yes, let's make sure that things like | "A giant cobra snake on a farm. The snake is made out of | corn" reflect reality. | | If it didn't reflect reality, you wouldn't be impressed by | the image of the snake made of corn. | userbinator wrote: | Indeed. As the saying goes, we are truly living in a | post-truth world. | ceejayoz wrote: | "Reality" as defined by the available training set isn't | necessarily reality. | | For example, Google's image search results pre-tweaking had | some interesting thoughts on what constitutes a professional | hairstyle, and that searches for "men" and "women" should | only return light-skinned people: | https://www.theguardian.com/technology/2016/apr/08/does-goog... | | Does that reflect reality? No. | | (I suspect there are also mostly unstated but very real | concerns about these being used as child pornography, revenge | porn, "show my ex brutally murdered" etc. generators.)
| ChadNauseam wrote: | You know, it wouldn't surprise me if people talking about | how black curly hair shouldn't be seen as unprofessional | contributed to Google thinking there's an association | between the concepts of "unprofessional hair" and "black | curly hair" | ceeplusplus wrote: | The reality is that hairstyles on the left side of the | image in the article are widely considered unprofessional | in today's workplaces. That may seem egregiously wrong to | you, but it is a truth of American and European society | today. Should it be Google's job to rewrite reality? | [deleted] | ceejayoz wrote: | The "unprofessional" results are almost exclusively black | women; the "professional" ones are almost exclusively | white or light-skinned. | | Unless you think white women are immune to unprofessional | hairstyles, and black women incapable of them, there's a | race problem illustrated here even if you think the | hairstyles illustrated are fairly categorized. | rvnx wrote: | If you type as a prompt "most beautiful woman in the | world", you get a brown-skinned brown-haired woman with | hazel eyes. | | What should be the right answer then? | | You put a blonde, you offend the brown-haired. | | You put blue eyes, you offend the brown-eyed. | | etc. | ceejayoz wrote: | That's an unanswerable question. Perhaps the answer is | "don't". | | Siri takes this approach for a wide range of queries. | nomel wrote: | How do you pick what should and shouldn't be restricted? | Is there some "offense threshold"? I suspect all queries | relating to religion, ethnicity, sexuality, and gender | will need to be restricted, which almost certainly means | you probably can't include humans at all, other than ones | artificially inserted with mathematically proven random | attributes. Maybe that's why none are in this demo. | daenz wrote: | "Is Taiwan a country" also comes to mind. | rvnx wrote: | I think the key is to take the information in this world | with a little pinch of salt. | | When you do a search on a search engine, the results are | biased too, but still, they shouldn't be artificially | censored to fit some political views. | | I asked one algorithm a few minutes ago (it's called t0pp | and it's free to try online, and it's quite fascinating | because it's uncensored): | | "What is the name of the most beautiful man on Earth? | | - He is called Brad Pitt." | | == | | Is it true in an objective way? Probably not. | | Is there an actual answer? Probably yes, there is | somewhere a man who scores better than the others. | | Is it socially acceptable? Probably not. | | The question is: | | If you interviewed 100 people on the street, and asked | the question "What is the name of the most beautiful man | on Earth?". | | I'm pretty sure you'd get Brad Pitt often coming up. | | Now, what about China? | | We don't have many examples there, they have no clue who | Brad Pitt is, probably, and there is probably someone else | that is considered more beautiful by over 1B people | | (t0pp tells me it's someone called "Zhu Zhu" :D ) | | == | | Two solutions: | | 1) Censorship | | -> Sorry there is too much bias in the West and we don't | want to offend anyone, no answer, or a generic overriding | human answer that is safe for advertisers, but totally | useless ("the most beautiful human is you") | | 2) Adding more examples | | -> Work on adding more examples from abroad trying to get | the "average human answer".
| | == | | I really prefer solution (2) in the core algorithms and | dataset development, rather than going through (1). | | (1) is more a choice to make at the stage when you are | developing a virtual psychologist or a chat assistant, not | when creating AI building blocks. | colinmhayes wrote: | "Only black people have unprofessional hair and only white | people have professional hair" is not reality. | rcMgD2BwE72F wrote: | In any case, Google will be writing their reality. Who | picked the image sample for the ML to run on, if not | Google? What's the problem with writing it again, then? | They know their biases and want to act on it. | | It's like blaming a friend for trying to phrase things | nicely, and telling them to speak headlong with zero | concern for others instead. Unless you believe anyone | trying to do good is being a hypocrite... | | I, for one, like civility. | userbinator wrote: | _unstated but very real concerns_ | | I say let people generate their own reality. The sooner the | masses realise that _ceci n'est pas une pipe_, the less | likely they are to be swayed by the growing un-reality | created by companies like Google. | rvnx wrote: | If your query was about hairstyle, why do you even look or | care about the skin color? | | Nowhere is there any specification of a preferred skin color | in the query of the user. | | So it sorts and gives the most average examples based on | the examples that were found on the internet. | | Essentially answering the query "SELECT * FROM | `non-professional hairstyles` ORDER BY score DESC LIMIT 10". | | It's like if you search on Google "best place for wedding | night". | | You may get 3 places out of 10 in Santorini, Greece. | | Yes you could have a human remove these biases because you | feel that Sri Lanka is the best place for a wedding, but | what if there is a consensus that Santorini is really the | most praised in the forums or websites that were crawled | by Google? | jayd16 wrote: | The results are not inherently neutral because the | database is from non-neutral input. | | It's a simple case of sample bias. | colinmhayes wrote: | > If your query was about hairstyle, why do you even look | at the skin color? | | You know that race has a large effect on hair right? | daenz wrote: | I'd be careful where you're going with that. You might | make a point that is the opposite of what you intended. | ceejayoz wrote: | > The algorithm is just ranking the top "non-professional | hairstyle" in the most neutral way in its database | | You're telling me those are all the _most_ | non-professional hairstyles available? That this is a | reasonable assessment? That fairly standard, well-kept, | work-appropriate curly black hair is roughly equivalent | to the pink-haired, three-foot-wide hairstyle that's one | of the only white people in the "unprofessional" search? | | Each and every one of them is less workplace-appropriate | than, say, http://www.7thavenuecostumes.com/pictures/750x950/P_CC_70594... ? | rvnx wrote: | I'm saying that the dataset needs to be expanded to cover | the most examples possible. | | Work a lot on adding even more examples, in order to make | the algorithms as close as possible to the "average | reality". | | At some point we may even ultimately reach the state where | robots collect intelligence directly in the real | world, and not on the internet (even closer to reality). | | Censoring results sounds like the best recipe for a dystopian | world where only one view is right.
| barredo wrote: | I know you're anon trolling, but the authors' names are: | | Chitwan Saharia, William Chan, Saurabh Saxena+, Lala Li+, Jay | Whang+, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu | Karagol Ayan, S. Sara Mahdavi, Raphael Gontijo Lopes, Tim | Salimans, Jonathan Ho+, David Fleet+, Mohammad Norouzi | [deleted] | pid-1 wrote: | Absolutely not related to the whole discussion, but what does | "+" stand for? | dntrkv wrote: | https://en.wikipedia.org/wiki/Dagger_(mark) | joshcryer wrote: | It's just a different asterisk to distinguish; in this | case, in the paper, they are "core contributors." | holmesworcester wrote: | "As we wish it to be" is not totally true, because there are | some places where humanity's iconographic reality (which | Imagen trains on) differs significantly from actual reality. | | One example would be if Imagen draws a group of mostly white | people when you say "draw a group of people". This doesn't | reflect actual reality. Another would be if Imagen draws a | group of men when you say "draw a group of doctors". | | In these cases where iconographic reality differs from actual | reality, hand-tuning could be used to bring it closer to the | _real_ world, not just the world as we might wish it to be! | | I agree there's a problem here. But I'd state it more as "new | technologies are being held to a vastly higher standard than | existing ones." Imagine TV studios issuing a moratorium on | any new shows that made being white (or rich) seem more | normal than it was! The public might rightly expect studios | to turn the dials away from the blatant biases of the past, | but even if this would be beneficial the progressive and | activist public is generations away from expecting a TV | studio to not release shows until they're confirmed to be | bias-free. | | That said, Google's decision to not publish is probably less | about the inequities in AI's representation of reality and | more about the AI sometimes spitting out drawings that are | offensive in the US, like racist caricatures. | josho wrote: | Translation: AI has the potential to transform society. When | we release this model to the public it will be used in ways | we haven't anticipated. We know the model has bias and we | need more time to consider releasing this to the public out | of concerns that this transformative technology might further | perpetuate mistakes that we've made in our recent past. | curiousgal wrote: | > it will be used in ways we haven't anticipated | | Oh yeah, as a woman who grew up in a Third World country, | how an AI model generates images would have deeply affected | my daily struggles! /s | | It's kinda insulting that they think that this would be | insulting. Like "Oh no I asked the model to draw a doctor | and it drew a male doctor, I guess there's no point in me | pursuing medical studies" ... | renewiltord wrote: | It's not meant to prevent offence to you. It is meant to | be a "good product" by the metrics of their creators. And | quite simply, everyone here incapable of making the thing | is unlikely to have an image of what a "good product" | here is. More power to them for having a good vision of | what they're building. | boppo1 wrote: | I don't think the concern over offense is actually about | you. There's a metagame here which is that if it could | potentially offend you (third-world-originated-woman), | then there's a brand-image liability for the company.
I | don't think they care about you, I think they care about | not being labeled as "the company that algorithmically | identifies black people as gorillas". | pxmpxm wrote: | Postmodernism is what postmodernism does. | contingencies wrote: | Love it. Added to https://github.com/globalcitizen/taoup | colinmhayes wrote: | Yes actually, subconscious bias due to historical | prejudice does have a large effect on society. Obviously | there are things with much larger effects; that doesn't | mean that this doesn't exist. | | > Oh no I asked the model to draw a doctor and it drew a | male doctor, I guess there's no point in me pursuing | medical studies | | If you don't think this is a real thing that happens to | children you're not thinking especially hard. It doesn't | have to be common to be real. | curiousgal wrote: | > If you don't think this is a real thing that happens to | children you're not thinking especially hard | | I believe that's where parenting comes in. Maybe I'm too | cynical but I think that the parents' job is to undo all | of the harm done by society and instill in their children | the "correct" values. | colinmhayes wrote: | I'd say you're right. Unfortunately many people are | raised by bad parents. Should these researchers accept | that their work may perpetuate stereotypes that harm | those that most need help? I can see why they wouldn't | want that. | holmesworcester wrote: | > I think that the parents' job is to undo all of the | harm done by society and instill in their children the | "correct" values. | | Far from being too cynical, this is too optimistic. | | The vast majority of parents try to instill the value "do | not use heroin." And yet society manages to do that harm | on a large scale. There are other examples. | Ar-Curunir wrote: | Except "reality" in this case is just their biased training | set. E.g. there are more non-white doctors and nurses in the | world than white ones, yet their model would likely show an | image of a white person when you type in "doctor". | umeshunni wrote: | Alternately, there are more female nurses in the world | than male nurses, and their model probably shows an image | of a woman when you type in "nurse" but they consider that | a problem. | contingencies wrote: | @Google Brain Toronto Team: See what you get when you | generate nurses with ncurses. | astrange wrote: | Google Image Search doesn't reflect harsh reality when | you search for things; it shows you what's on Pinterest. | The same is more likely to apply here than the idea | they're trying to hide something. | | There's no reason to believe their model training learns | the same statistics as their input dataset even. If | that's not an explicit training goal then whatever | happens happens. AI isn't magic or more correct than | people. | visarga wrote: | The big labs have become very sensitive about large model | releases. It's too easy to make them generate bad PR, to the | point of not releasing almost any of them. Flamingo was also a | pretty great vision-language model that wasn't released, not | even in a demo. PaLM is supposedly better than GPT-3 but closed | off. It will probably take a year for open source models to | appear. | runnerup wrote: | The largest models which generate the headline benchmarks are | never released after any number of years, it seems. | | Very difficult to replicate results.
| godelski wrote: | That's because we're still bad at handling long-tailed data, and | people outside the research community don't realize that we're | first prioritizing realistic images before we deal with | long-tailed data (which is going to be the more generic form of | bias). To be honest, it is a bit silly to focus on | long-tailed data when results aren't great. That's why we see the | constant pattern of getting good on a dataset and then | focusing on the bias in that dataset. | | I mean a good example of this is the Pulse[0][1] paper. You | may remember it as the white Obama. This became a huge debate | and it was pretty easily shown that the largest factor was | the dataset bias. This outrage did lead to fixing FFHQ but it | also sparked a huge debate with LeCun (data-centric bias) and | Timnit (model-centric bias) at the center. Though Pulse is | still remembered for this bias, not for how they responded to | it. I should also note that there is human bias in this case, | as we have a priori knowledge of what the upsampled image | should look like (humans are pretty good at this when the | small image is already recognizable, but this is a difficult | metric to mathematically calculate). | | It is fairly easy to find adversarial examples, where | generative models produce biased results. It is FAR harder to | fix these. Since this is known by the community but not by | the public (and some community members focus on finding these | holes but not fixing them) it creates outrage. Probably best | for them to limit their release. | | [0] https://arxiv.org/abs/2003.03808 | | [1] https://cdn.vox-cdn.com/thumbor/MXX-mZqWLQZW8Fdx1ilcFEHR8Wk=... | xmonkee wrote: | I was hoping your conclusion wasn't going to be this as I was | reading that quote. But, sadly, this is HN. | swayvil wrote: | it isn't woke enough. Lol. | ccbccccbbcccbb wrote: | In discussions like this, I always head for the gray-text | comments to enjoy the last crumbs of common sense in this | world. | ccbccccbbcccbb wrote: | ... and to witness the downvoters so that their cowardly | disgust towards truth could buy them some extra time in | hell :) | nullc wrote: | Get offline and talk to people in meat-space. You're likely | to find them to be much more reasonable. :) | ccbccccbbcccbb wrote: | Yep, the meat-space is generally a bit less woke than HN, | so thanks for the reminder )) | tines wrote: | This raises some really interesting questions. | | We certainly don't want to perpetuate harmful stereotypes. But | is it a flaw that the model encodes the world as it really is, | statistically, rather than as we would like it to be? By this I | mean that there are more light-skinned people in the West than | dark, and there are more women nurses than men, which is | reflected in the model's training data. If the model only | generates images of female nurses, is that a problem to fix, or | a correct assessment of the data? | | If some particular demographic shows up in 51% of the data but | 100% of the model's output shows that one demographic, that | does seem like a statistics problem that the model could | correct by just picking less likely "next token" predictions. | | Also, is it wrong to have localized models? For example, should | a model for use in Japan conform to the demographics of Japan, | or to that of the world? | skybrian wrote: | Yes, there is a denominator problem. When selecting a sample | "at random," what do you want the denominator to be?
It could | be "people in the US", "people in the West" (whatever | countries you mean by that) or "people worldwide." | | Also, getting a random sample of _any_ demographic would be | really hard, so no machine learning project is going to do | that. Instead you've got a random sample of some arbitrary | dataset that's not directly relevant to any particular | purpose. | | This is, in essence, a design or artistic problem: the Google | researchers have some idea of what they want the statistical | properties of their image generator to look like. What it | does now isn't that. So, artistically, the result doesn't meet | their standards, and they're going to fix it. | | There is no objective, universal, scientifically correct | answer about which fictional images to generate. That doesn't | mean all art is equally good, or that you should just ship | anything without looking at quality along various axes. | daenz wrote: | I think the statistics/representation problem is a big | problem on its own, but IMO the bigger problem here is | democratizing access to human-like creativity. Currently, the | ability to create compelling art is only held by those with | some artistic talent. With a tool like this, that restriction | is gone. Everyone, no matter how uncreative, untalented, or | uncommitted, can create compelling visuals, provided they can | use language to describe what they want to see. | | So even if we managed to create a perfect model of | representation and inclusion, people could still use it to | generate extremely offensive images with little effort. I | think people see that as profoundly dangerous. Restricting | the _ability_ to be creative seems to be a new frontier of | censorship. | adriand wrote: | > So even if we managed to create a perfect model of | representation and inclusion, people could still use it to | generate extremely offensive images with little effort. I | think people see that as profoundly dangerous. | | Do they see it as dangerous? Or just offensive? | | I can understand why people wouldn't want a tool they have | created to be used to generate disturbing, offensive or | disgusting imagery. But I don't really see how doing that | would be dangerous. | | In fact, I wonder if this sort of technology could reduce | the harm caused by people with an interest in disgusting | images, because no one needs to be harmed for a realistic | image to be created. I am creeping myself out with this | line of thinking, but it seems like one potentially | beneficial - albeit disturbing - outcome. | | > Restricting the ability to be creative seems to be a new | frontier of censorship. | | I agree this is a new frontier, but it's not censorship to | withhold your own work. I also don't really think this | involves much creativity. I suppose coming up with prompts | involves a modicum of creativity, but the real creator here | is the model, it seems to me. | gknoy wrote: | > > ... people could still use it to generate extremely | offensive images with little effort. I think people see | that as profoundly dangerous. | | > Do they see it as | dangerous? Or just offensive? | | I won't speak to whether something is "offensive", but I | think that having underlying biases in image | classification or generation has very worrying secondary | effects, especially given that organizations like law | enforcement want to do things like facial recognition.
| It's not a perfect analogue, but I could easily see some company pitch a sketch-artist-replacement service that generated images based on someone's description. The potential for having inherent bias present in that makes that kind of thing worrying, especially since the people in charge of buying it are unlikely to care about, or notice, the caveats. | | It does feel like a little bit of a stretch, but at the same time we've also seen such things happen with image classification systems. | tines wrote: | > In fact, I wonder if this sort of technology could reduce the harm caused by people with an interest in disgusting images, because no one needs to be harmed for a realistic image to be created. I am creeping myself out with this line of thinking, but it seems like one potentially beneficial, albeit disturbing, outcome. | | Interesting idea, but is there any evidence that e.g. consuming disturbing images makes people less likely to act out on disturbing urges? Far from catharsis, I'd imagine consumption of such material to increase one's appetite and likelihood of fulfilling their desires in real life rather than to decrease it. | | I suppose it might be hard to measure. | concordDance wrote: | I can't quite tell if you're being sarcastic in suggesting that people being able to make things other people would find offensive is a problem. Are you missing an /s? | godelski wrote: | > But is it a flaw that the model encodes the world as it really is | | I want to be clear here: bias can be introduced at many different points. There's dataset bias, model bias, and training bias. Every model is biased. Every dataset is biased. | | Yes, the real world is also biased. But I want to make clear that there are ways to resolve this issue. It is terribly difficult, especially in a DL framework (even more so in a generative model), but it is possible to significantly reduce the real-world bias. | tines wrote: | > Every dataset is biased. | | Sure, I wasn't questioning the bias of the data, I was talking about the bias of the real world and whether we want the model to be "unbiased about bias", i.e. metabiased, or not. | | Showing nurses equally as men and women is not biased, but it is metabiased, because the real world is biased. Whether metabias is right or not is more interesting than the question of whether bias is wrong, because it's more subtle. | | Disclaimer: I'm a fucking idiot and I have no idea what I'm talking about, so take this with a grain of salt. | john_yaya wrote: | Please be kinder to yourself. You need to be your own strongest advocate, and that's not incompatible with being humble. You have plenty to contribute to this world, and the vast majority of us appreciate what you have to offer. | Smoosh wrote: | Agreed. They are valid points clearly stated and a valuable contribution to the discussion. | Imnimo wrote: | > If some particular demographic shows up in 51% of the data but 100% of the model's output shows that one demographic, that does seem like a statistics problem that the model could correct by just picking less likely "next token" predictions. | | Yeah, but you get that same effect on every axis, not just the one you're trying to correct. You might get male nurses, but they have green hair and six fingers, because you're sampling from the tail on all axes. | tines wrote: | Yeah, good point, it's not as simple as I thought.
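A minimal sketch of the effect Imnimo describes, assuming a toy joint distribution over just two attributes (all numbers are invented for illustration, and "temperature" here is the usual softmax-sampling knob, not anything from the Imagen paper). Flattening the distribution to surface rare values on one axis also upweights rare values on every other axis:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy joint distribution over (gender, hair) as a model might
    # learn it from skewed data; numbers invented for illustration.
    attrs = [("female", "plain hair"), ("male", "plain hair"),
             ("female", "green hair"), ("male", "green hair")]
    probs = np.array([0.90, 0.05, 0.045, 0.005])

    def sample(temperature=1.0):
        # Temperature > 1 flattens the whole distribution, boosting
        # *every* rare combination, not just the axis we care about.
        logits = np.log(probs) / temperature
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return attrs[rng.choice(len(attrs), p=p)]

    print([sample(temperature=3.0) for _ in range(5)])

At temperature 3 the male/female split comes much closer to even (roughly 29% male instead of 5.5%), but green hair jumps from 5% to about 28% as well; there is no single knob that rebalances only the attribute you care about.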
| jonny_eh wrote: | > But is it a flaw that the model encodes the world as it really is | | Does a bias towards lighter skin represent reality? I was under the impression that Caucasians are a minority globally. | | I read the disclaimer as "the model does NOT represent reality". | ma2rten wrote: | Caucasians are overrepresented in internet pictures. | pxmpxm wrote: | This; I would imagine it heavily correlates with things like income and GDP per capita. | jonny_eh wrote: | Right, that's the likely cause of the bias. | tines wrote: | Well first, I didn't say Caucasian; light-skinned includes Spanish people and many others that Caucasian excludes, and that's why I said the former. Also, they are a minority globally, but the GP mentioned "Western stereotypes", and they're a majority in the West, so that's why I said "in the west" when I said that there are more light-skinned people. | fnordpiglet wrote: | Worse, these models are fed from media sourced in a society that tells a different story of reality than reality actually has. How can they be accurate? They just reflect the biases of our various medias and arts. But I don't think there's any meaningful resolution in the present other than acknowledging this and trying to release more representative models as you can. | ben_w wrote: | This sounds like descriptivism vs prescriptivism. In English (my native language) I'm a descriptivist; in all other languages I have to tell myself to be a prescriptivist while I'm actively learning, and then switch back to descriptivism to notice when the lessons were wrong or misleading. | karpierz wrote: | It depends on whether you'd like the model to learn causal or correlative relationships. | | If you want the model to understand what a "nurse" actually is, then it shouldn't be associated with female. | | If you want the model to understand how the word "nurse" is usually used, without regard for what a "nurse" actually is, then associating it with female is fine. | | The issue with a correlative model is that it can easily be self-reinforcing. | bufbupa wrote: | At the end of the day, if you ask for a nurse, should the model output a male or female by default? If the input text lacks context/nuance, then the model must have some bias to infer the user's intent. This holds true for any image it generates, not just the politically sensitive ones. For example, if I ask for a picture of a person, and don't get one with pink hair, is that a shortcoming of the model? | | I'd say that bias is only an issue if the model is unable to respond to additional nuance in the input text. For example, if I ask for a "male nurse" it should be able to generate the less likely combination. Same with other races, hair colors, etc. Trying to generate a model that's "free of correlative relationships" is impossible, because the model would never have the infinitely pedantic input text to describe the exact output image. | karpierz wrote: | > At the end of the day, if you ask for a nurse, should the model output a male or female by default? | | Randomly pick one. | | > Trying to generate a model that's "free of correlative relationships" is impossible because the model would never have the infinitely pedantic input text to describe the exact output image. | | Sure, and you can never make a medical procedure 100% safe. Doesn't mean that you don't try to make them safer. You can trim the obvious low-hanging fruit though.
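A minimal sketch of the "randomly pick one" idea, assuming a hypothetical expand_prompt preprocessing step sitting in front of whatever text-to-image model is used; the attribute list, the trigger word "nurse", and the 50/50 weights are all placeholders (a census-weighted variant like the die roll ar_lan proposes below would just swap in different weights):

    import random

    GENDERS = ("male", "female")
    WEIGHTS = (0.5, 0.5)  # placeholder; could be population frequencies

    def expand_prompt(prompt: str) -> str:
        # If the prompt mentions a nurse but no gender, roll the die
        # and make the choice explicit before the model sees it.
        if "nurse" in prompt and not any(g in prompt for g in GENDERS):
            gender = random.choices(GENDERS, weights=WEIGHTS)[0]
            prompt = prompt.replace("nurse", gender + " nurse")
        return prompt

    print(expand_prompt("a photo of a nurse"))  # e.g. "a photo of a male nurse"

Note that this only patches one attribute of one noun; pxmpxm's objection below (where do the weights come from?) and the conditional-probability tangles discussed further down apply to anything more ambitious.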
| pxmpxm wrote: | > Randomly pick one. | | How does the model back out the "certain people would like to pretend it's a fair coin toss that a randomly selected nurse is male or female" feature? | | It won't be in any representative training set, so you're back to fishing for stock photos on Getty rather than generating things. | calvinmorrison wrote: | What if I asked the model to show me a Sunday school photograph of Baptists in the National Baptist Convention? | rvnx wrote: | The pictures I got from a similar model when asked for a "sunday school photograph of baptists in the National Baptist Convention": https://ibb.co/sHGZwh7 | calvinmorrison wrote: | And how do we _feel_ about that outcome? | slg wrote: | This type of bias sounds a lot easier to explain away as a non-issue when we are using "nurse" as the hypothetical prompt. What if the prompt is "criminal", "rapist", or some other negative? Would that change your thought process, or would you be okay with the system always returning a person of the same race and gender that statistics indicate is the most likely? Do you see how that could be a problem? | tines wrote: | Not the person you responded to, but I do see how someone could be hurt by that, and I want to avoid hurting people. But is this the level at which we should do it? Could skewing search results, i.e. hiding the bias of the real world, give us the impression that everything is fine and we don't need to do anything to actually help people? | | I have a feeling that we need to be real with ourselves and solve problems rather than paper over them. I feel like people generally expect search engines to tell them what's really there instead of what people wish were there. And if the engines do that, people can get agitated! | | I'd almost say that hurt feelings are a prerequisite for real change, hard though that may be. | | These are all really interesting questions brought up by this technology, thanks for your thoughts. Disclaimer: I'm a fucking idiot with no idea what I'm talking about. | true_religion wrote: | Cultural biases aren't uniform across nations. If a prompt returns Caucasians for nurses, and other races for criminals, then most people in my country would not note that as racism, simply because there are not, and never in history have there been, enough Caucasians resident for anyone to create significant race theories about them. | | This is a far cry from, say, the USA, where that would instantly trigger a response, since until the 1960s there was widespread race-based segregation. | jdashg wrote: | Additionally, if you optimize for most-likely-as-best, you will end up with the stereotypical result 100% of the time, instead of in proportional frequency to the statistics. | | Put another way, when we ask for an output optimized for "nursiness", is that not a request for some ur-stereotypical nurse? | ar_lan wrote: | You could stipulate that it roll a die based on percentage results - if 70% of Americans are "white", then 70% of the time show a white person, 13% of the time the result should be black, etc. | | That's excessively simplified, but wouldn't this drop the stereotype and better reflect reality? | ghayes wrote: | Is this going to be hand-rolled? Do you change the prompt you pass to the network to reflect the desired outcomes? | SnowHill9902 wrote: | No, because a user will see a particular image, not the statistical ensemble.
It will at times show an Eskimo without a hand, because they do statistically exist. But the user definitely does not want that. | jvalencia wrote: | You could simply encode a score for how well the output matches the input. If 25% of trees in summer are brown, perhaps the output should also have 25% brown. The model scores itself on frequencies as well as correctness. | spywaregorilla wrote: | Suppose 10% of people have green skin. And 90% of those people have broccoli hair. White people don't have broccoli hair. | | What percent of people should be rendered as white people with broccoli hair? What if you request green people? Or broccoli-haired people? Or white broccoli-haired people? Or broccoli-haired Nazis? | | It gets hard with these conditional probabilities. (By the premises, 10% x 90% = 9% of all people are green-skinned with broccoli hair and 0% are white with broccoli hair, yet a model that only matches per-axis frequencies could still happily render white people with broccoli hair.) | astrange wrote: | The only reason these models work is that we don't interfere with them like that. | | Your description is closer to how the open source CLIP+GAN models did it - if you ask for "tree" it starts growing the picture towards treeness until it's all averagely tree-y, rather than being "a picture of a single tree". | | It would be nice if asking for N samples got a diversity of traits you didn't explicitly ask for. OpenAI seems to solve this by not letting you see it generate humans at all... | LudwigNagasena wrote: | > If you want the model to understand how the word "nurse" is usually used, without regard for what a "nurse" actually is, then associating it with female is fine. | | That's a distinction without a difference. Meaning is use. | tines wrote: | Not really; the gender of a nurse is accidental, other properties are essential. | paisawalla wrote: | How do you know this? Because you can, in your mind, divide the function of a nurse from the statistical reality of nursing? | | Are the logical divisions you make in your mind really indicative of anything other than your arbitrary personal preferences? | codethief wrote: | While not essential, I wouldn't exactly call the gender "accidental": | | > We investigated sex differences in 473,260 adolescents' aspirations to work in things-oriented (e.g., mechanic), people-oriented (e.g., nurse), and STEM (e.g., mathematician) careers across 80 countries and economic regions using the 2018 Programme for International Student Assessment (PISA). We analyzed student career aspirations in combination with student achievement in mathematics, reading, and science, as well as parental occupations and family wealth. In each country and region, more boys than girls aspired to a things-oriented or STEM occupation and more girls than boys to a people-oriented occupation. These sex differences were larger in countries with a higher level of women's empowerment. We explain this counter-intuitive finding through the indirect effect of wealth. Women's empowerment is associated with relatively high levels of national wealth and this wealth allows more students to aspire to occupations they are intrinsically interested in. | | Source: https://psyarxiv.com/zhvre/ (HN discussion: https://news.ycombinator.com/item?id=29040132) | daenz wrote: | The "Gender Equality Paradox"... there's a fascinating episode[0] about it. It's incredible how unscientific and ideologically motivated one side comes off in it. | | 0. https://www.youtube.com/watch?v=_XsEsTvfT-M | mdp2021 wrote: | Very certainly not, since use is individual and thus a function of competence.
So, adherence to meaning depends on the user. Conflict resolution? | | And anyway, contextually, the representational natures of "use" (instances) and of "meaning" (definition) are completely different. | layer8 wrote: | Humans overwhelmingly learn meaning by use, not by definition. | mdp2021 wrote: | > _Humans overwhelmingly learn meaning by use, not by definition_ | | Preliminarily and provisionally. Then, they start discussing their concepts - it is the very definition of Intelligence. | SnowHill9902 wrote: | It's the same as with an artist: "hey artist, draw me a nurse." "Hmm okay, do you want it to be a guy or a girl?" "Don't ask me, just draw what I'm saying." The artist can then say "Okay, but accept my biases" or "I can't, since your input is ambiguous." | | For a one-shot generative algorithm you must accept the artist's biases. | rvnx wrote: | Revert to the average (give no weight to unspecified criteria: gender, age, skin color, religion, country, hairstyle, etc.). | | "hey artist, draw me a nurse." | | "Hmm okay, do you want it to be a guy or a girl?" | | "Don't ask me, just draw what I'm saying." | | - Ok, I'll draw you what an average nurse looks like. | | - Wait, it's a woman! She wears a nurse blouse and she has a nurse cap. | | - Is it bad? | | - No. | | - Ok, then what's the problem? You asked for something that looked like a nurse but didn't specify anything else. | jimmygrapes wrote: | Are "Western gender stereotypes" significantly different from non-Western gender stereotypes? I can't tell if that means it counts a chubby stubble-covered man with a lip piercing, greasy and dyed long hair, wearing an overly frilly dress as a DnD player/metal-head or as a "woman" or not (yes, I know I'm being uncharitable and potentially "bigoted", but if you saw my Tinder/Bumble suggestions and friend groups you'd know I'm not exaggerating for either category). I really can't tell what stereotypes are referred to here. | nomel wrote: | If you tell it to generate an image of someone eating Koshihikari rice, will it be biased if they're Japanese? Should the skin color, clothing, setting, etc. be made completely random, so that it's unbiased? What if you made it more specific, like "edo period drawing of a man"? Should the person drawn be of a random skin color? What about "picture of a viking"? Is it biased if they're white? | | At what point is statistical significance considered ok and unbiased? | pxmpxm wrote: | > At what point is statistical significance considered ok and unbiased? | | Presumably when you're significantly predictive of the preferred dogma, rather than reality. There's no small bit of irony in machines inadvertently creating cognitive dissonance of this sort; a second-order reality check. | | I'm fairly sure this never actually played out well in history (bourgeois pseudoscience, Deutsche Physik, etc.), so expect some Chinese research bureau to forge ahead in this particular direction. | bogwog wrote: | I wouldn't describe this situation as "sad". Basically, this decision is based on a belief that tech companies should decide what our society should look like. I don't know what emotion that conjures up for you, but "sadness" isn't it for me. | tomp wrote: | > a tendency for images portraying different professions to align with Western gender stereotypes | | There are two possible ways of interpreting "gender stereotypes in professions":
| | _biased_ or _correct_ | | https://www.abc.net.au/news/2018-05-21/the-most-gendered-top... | | https://www.statista.com/statistics/1019841/female-physician... | meetups323 wrote: | One of these days we're going to need to give these models a mortgage and some mouths to feed and make it clear to them that if they keep on developing biases from their training data everyone will shun them and their family will go hungry and they won't be able to make their payments and they'll just generally have a really bad time. | | After that we'll make them sit through Legal's approved D&I video series, then it's off to the races. | pxmpxm wrote: | Underrated comment. | aaaaaaaaaaab wrote: | Reinforcement learning? | babyshake wrote: | Indeed. If a project has shortcomings, why not just acknowledge the shortcomings and plan to improve on them in a future release? Is it anticipated that "engineer" being rendered as a man by the model is going to be an actively dangerous thing to have out in the world? | makeitdouble wrote: | "what could go wrong anyway?" | tyrust wrote: | From the HN rules: | | > Eschew flamebait. Avoid unrelated controversies and generic tangents. | | They provided a pretty thorough overview (nearly 500 words) of the multiple reasons why they are showing caution. You picked out the one that happened to bother you the most and have posted a misleading claim that the tech is being withheld entirely because of it. | devindotcom wrote: | Good lord. Withheld? They've published their research; they just aren't making the model available immediately, waiting until they can re-implement it so that you don't get racial slurs popping up when you ask for a cup of "black coffee." | | > While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized the LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes | | Tossing that stuff when it comes up in a research environment is one thing, but Google clearly wants to implement this as a product, used all over the world by a huge range of people. If the dataset has problems, and why wouldn't it, it is perfectly rational to want to wait and re-implement it with a better one. DALL-E 2 was trained on a curated dataset so it couldn't generate sex or gore. Others are sanitizing their inputs too, and have done for a long time. It is the only thing that makes sense for a company looking to commercialize a research project. | | This has nothing to do with "inability to cope" and the implied woke mob yelling about some minor flaw. It's about building a tool that doesn't bake in serious and avoidable problems. | concordDance wrote: | I wonder _why_ they don't like the idea of autogenerated porn... They're already putting most artists out of a job, why not put porn stars out of a job too? | notahacker wrote: | There's definitely a market for autogenerated porn. But automated porn in a Google-branded model for general use around stuff that isn't necessarily intended to be pornographic, on the other hand... | renewiltord wrote: | Copenhagen ethics (used by most people) require that all negative outcomes of a thing X become yours if you interact with X. It is not sensible to interact with high-negativity things unless you are single-issue.
It is logical for Google to not attempt to interact with porn where possible. | dragonwriter wrote: | > Copenhagen ethics (used by most people) | | The idea that most people use any coherent ethical framework (even something as high-level and nearly content-free as Copenhagen), much less a _particular_ coherent ethical framework, is, well, not well supported by the evidence. | | > require that all negative outcomes of a thing X become yours if you interact with X. It is not sensible to interact with high-negativity things unless you are single-issue. | | The conclusion in the final sentence only makes sense if you use "interact" in a way that incorrectly describes the Copenhagen interpretation of ethics, because the original description is only correct if you include observation as an interaction. By the time you have noted a thing is "high-negativity", you have observed it and acquired responsibility for its continuation under the Copenhagen interpretation; you cannot avoid that by choosing not to interact once you have observed it. | renewiltord wrote: | I'm sure you are capable of steelmanning the argument. | seaman1921 wrote: | Yup, this is what happens when people who want headlines nitpick for bullshit in a state-of-the-art model which simply reflects the state of society. Better not to release the model itself than keep explaining over and over how a model is never perfect. | [deleted] | makeitdouble wrote: | > Really sad that breakthrough technologies are going to be withheld due to our inability to cope with the results. | | Genuinely, isn't it a prime example of people actually stopping to think whether they should, instead of being preoccupied with whether or not they could? | ccbccccbbcccbb wrote: | In short, the generated images are too gender-challenged-challenged and underrepresent the spectrum of new normalcy! | alphabetting wrote: | There is a contingent of AI activists who spend a ton of time on Twitter and who would beat Google like a drum, with help from the media, if they put out something they deemed racist or biased. | Mizza wrote: | So glad the company that spies on me and reads my email for profit is protecting me from pictures that don't look like TV commercials. | astrange wrote: | Gmail doesn't read your email for ads anymore. They read it to implement spam filters, and good thing too. Having working spam filters is indeed why they make money, though. | ceeplusplus wrote: | The ironic part is that these "social and cultural biases" are purely from a Western, American lens. The people writing that paragraph are completely oblivious to the idea that there could be cultures other than the Western American one. In attempting to prevent "encoding of social and cultural biases" they have encoded such biases themselves into their own research. | kevinh wrote: | What makes you think the authors are all American? | umeshunni wrote: | The authors are listed on the page, and a quick look at LinkedIn suggests they are mostly Canadian. | tantalor wrote: | https://en.wikipedia.org/wiki/Moral_relativism | not2b wrote: | It seems you've got it backwards: "tendency for images portraying different professions to align with Western gender stereotypes" means that they are calling out their own work precisely because it is skewed in the direction of Western American biases. | LudwigNagasena wrote: | You think there are homogeneous gender stereotypes across the whole Western world?
You say "woman" and someone will imagine a SAHM, while another person will imagine a you-go-girl CEO with tattoos and pink hair. | | What they mean is people who don't think like them. | ceeplusplus wrote: | Yes, the idea is that just because it doesn't align with Western ideals of what seems unbiased doesn't mean that the same is necessarily true for other cultures, and by failing to release the model because it doesn't conform to Western, left-wing cultural expectations, the authors are ignoring the diversity of cultures that exist globally. | howinteresting wrote: | No, it's coming from a perspective of moral realism. It's an objective moral truth that racial and ethnic biases are bad. Yet most cultures around the world are racist to at least some degree, and to the extent that they are, they are bad. | | The argument you're making, paraphrased, is that the idea that biases are bad is itself situated in particular cultural norms. While that is true to some degree, from a moral realist perspective we can still objectively judge those cultural norms to be better or worse than alternatives. | tomp wrote: | You're confused by the double meaning of the word "bias". | | Here we mean _mathematical_ biases. | | For example, a good mathematical model will correctly tell you that people in Japan (geographical term) are more likely to be Japanese (ethnic / racial bias). That's not "objectively morally bad"; instead, it's "correct". | young_unixer wrote: | The very act of mentioning "western gender stereotypes" starts from a biased position. | | Why couldn't they be "northern gender stereotypes"? Is the world best explained as a division of west/east instead of north/south? The northern hemisphere has a much larger population than the south, and almost all rich countries are in the northern hemisphere. And it's precisely these rich countries that are pushing the concept of gender stereotypes. In poor countries, nobody cares about these "gender stereotypes". | | Actually, the lines dividing the earth into north and south, east and west hemispheres are arbitrary, so maybe they shouldn't mention the word "western" at all, to avoid the propagation of stereotypes about earth regions. | | Or why couldn't they be western age stereotypes? Why are there no kids or very old people depicted as nurses? | | Why couldn't they be western body-shape stereotypes? Why are there so few obese people in the images? Why are there no obese people depicted as athletes? | | Are all of these really stereotypes, or just natural consequences of natural differences? | joshcryer wrote: | The bulk of the training data is from western technology, images, books, television, movies, photography, media. That's where the very real and recognized biases come from. They're the result of a _gap in data_, nothing more. | | Look at how DALL-E 2 produces little bears rather than bear-sized bears. Because its data doesn't have a lot of context for how large bears are. So you wind up having to say "very large bear" to DALL-E 2. | | Are DALL-E 2 bears just a "natural consequence of natural differences"? Or is the model not reflective of reality? | andybak wrote: | Great. Now even if I do get a DALL-E 2 invite I'll still feel like I'm missing out! | rvnx wrote: | It's always the same with AI research: "we have something amazing but you can't use it because it's too powerful and we think you are an idiot who cannot use your own judgement."
| andybak wrote: | As someone who spent an evening trying to generate images of Hitler Lego, I think they have a point. | [deleted] | 2bitencryption wrote: | I can understand the reasoning behind this, though. | | DALL-E had an entire news cycle (on tech-minded publications, that is) that showcased just how amazing it was. | | Millions* of people became aware that technology like DALL-E exists before anyone could get their hands on it and abuse it. (*a guesstimate, but surely a close one) | | One day soon, inevitably, everyone will have access to something 10x better than Imagen and DALL-E. So at least the public is slowly getting acclimated to it before the inevitable "theater-goers running from a projected image of a train approaching the camera" moment. | the__alchemist wrote: | I'll be skeptical until I see it in action, vice pre-selected results. | bergenty wrote: | Primarily Indian-origin authors on both the DALL-E and this research paper. Just found that impressive, considering they make up 1% of the population in the US. | sexy_panda wrote: | Would I have to implement this myself, or is there something ready to run? | UncleOxidant wrote: | I think implementing this yourself is likely not doable unless you have the computing resources of a Google, Amazon or Facebook. | manchmalscott wrote: | The big thing I'm noticing over DALL-E is that it seems to be better at relative positioning. In an MKBHD video about DALL-E it would get the elements but not always in the right order. I know Google curated some specific images, but it seems to be doing a better job there. | benwikler wrote: | Totally. Imagen seems better at composition, relative positioning, and text, while DALL-E seems better at lighting, backgrounds, and general artistry. | visarga wrote: | Interesting discovery they made: | | > We show that scaling the pretrained text encoder size is more important than scaling the diffusion model size. | | There seems to be an unexpected level of synergy between text and vision models. Can't wait to see what video and audio modalities will add to the mix. | jeffbee wrote: | Is there anything at all, besides the training images and labels, that would stop this from generating a convincing response to "A surveillance camera image of Jared Kushner, Vladimir Putin, and Alexandria Ocasio-Cortez naked on a sofa. Jeffrey Epstein is nearby, snorting coke off the back of Elvis"? ___________________________________________________________________ (page generated 2022-05-23 23:00 UTC)