[HN Gopher] Imagen, a text-to-image diffusion model
       ___________________________________________________________________
        
       Imagen, a text-to-image diffusion model
        
       Author : keveman
       Score  : 326 points
       Date   : 2022-05-23 20:44 UTC (2 hours ago)
        
 (HTM) web link (gweb-research-imagen.appspot.com)
 (TXT) w3m dump (gweb-research-imagen.appspot.com)
        
       | endisneigh wrote:
       | I give it a few years before Google makes stock images
       | irrelevant.
        
         | sydthrowaway wrote:
         | Short Getty images?
        
           | tpmx wrote:
           | Privately owned by the Getty family.
        
         | tpmx wrote:
          | Rolling this into Google Docs seems like a no-brainer.
        
           | ma2rten wrote:
           | Google is very conservative about anything that can generate
           | open-ended outputs. Also these models are still very
           | expensive computationally.
        
             | [deleted]
        
           | semicolon_storm wrote:
           | Or rolling this into Google Image Search to create images
           | that match users' search queries on the fly.
           | 
           | Don't like any of the results from the real web? Well how
           | about these we created just for you.
        
             | riffraff wrote:
             | Ah yes, deepfakes porn as a service would have been a
             | blessing for teenage me.
        
             | [deleted]
        
           | curiousgal wrote:
            | Until they pull the plug on it.
        
         | makeitdouble wrote:
          | I fully expect them to first make DALL-E and competing
          | networks unfit for commercialization by providing a better
          | option for free, leave the stock companies crying in the
          | corner, then sunset the product a year or two down the road,
          | leaving us wondering what to do.
        
         | notahacker wrote:
          | Tbh I imagine this tech combines particularly well with really
          | well-curated stock image databases, so outputs can be made with
          | recognisable styles, and actors and design elements can be
          | reused across multiple generated images.
         | 
         | If Getty et al aren't already spending money on that
         | possibility, they probably should be.
        
         | pphysch wrote:
         | The entire "content" industry could get eaten by a few hundred
         | people curating + touching-up output from these models.
        
           | astrange wrote:
            | No, comparative advantage means that it's impossible to run
           | out of jobs just because someone/something is better at it
           | than you.
           | 
           | (Consumer demand and boredom both being infinite is another
           | thing working against it.)
        
       | braingenious wrote:
       | This is super cool and I want to play with it.
        
       | throwaway743 wrote:
       | https://github.com/lucidrains/imagen-pytorch
        
       | armchairhacker wrote:
        | Does it do partial image reconstruction like DALL-E 2? Where you
       | cut out part of an existing image and the neural network can fill
       | it back in.
       | 
       | I believe this type of content generation will be the next big
       | thing or at least one of them. But people will want some
       | customization to make their pictures "unique" and fix AI's lack
        | of creativity and other various shortcomings. Plus edit out the
        | remaining lapses in logic/object separation (of which there are
        | some even in the given examples).
        | 
        | Still, being able to create arbitrary stock photos is really
        | useful, and I bet these will flood small / low-budget projects.
        
       | alimov wrote:
       | Would it be bad to release this with a big warning and flashing
       | gifs letting people know of the issues it has and note that they
       | are working to resolve them / ask for feedback / mention
       | difficulties related to resolving the issues they identified?
        
       | xnx wrote:
       | Facebook really thought they had done something with DALL-E, then
       | Google's all "hold my beer".
        
         | dntrkv wrote:
         | OpenAI*
        
       | tomatowurst wrote:
        | When will there be a "DALL-E for porn"? Or is this domain also
        | claimed by Puritans and morality gatekeepers? The most in-demand
        | text-to-image use case is porn.
        
       | jonahbenton wrote:
       | I know that some monstrous majority of cognitive processing is
       | visual, hence the attention these visually creative models are
       | rightfully getting, but personally I am much more interested in
       | auditory information and would love to see a promptable model for
       | music. Was just listening to "Land Down Under" from Men At Work.
       | Would love to be able to prompt for another artist I have liked:
       | "Tricky playing Land Down Under." I know of various generative
       | music projects, going back decades, and would appreciate
       | pointers, but as far as I am aware we are still some ways from
       | Imagen/Dalle for music?
        
         | astrange wrote:
         | I believe we're lacking someone training up a large music model
         | here, but GPT-style transformers can produce music.
         | 
         | gwern can maybe comment here.
         | 
         | An actually scary thing is that AIs are getting okay at
         | reproducing people's voices.
        
         | addandsubtract wrote:
         | I agree. How cool would it be to get an 8 min version of your
         | favorite song? Or an instant DnB remix? Or 10 more songs in the
         | style of your favorite album?
        
           | jonahbenton wrote:
           | Yeah. I particularly love covers and often can hear in my
           | head X playing Y's song. Would love tools to experiment with
           | that for real.
           | 
           | In practice, my guess is that even though Dall-e level
           | performance in music generation would be stunning and
           | incredible, it would also be tiresome and predictable to
           | consume on any extended basis. I mean- that's my reaction to
           | Dall-e- I find the images astonishing and magical but can
           | only look at them for limited periods of time. At these early
           | stages in this new world the outputs of real individual
           | brains are still more interesting.
           | 
           | But having tools like this to facilitate creation and
           | inspiration by those brains- would be so so cool.
        
       | hn_throwaway_99 wrote:
       | As someone who has a layman's understanding of neural networks,
       | and who did some neural network programming ~20 years ago before
       | the real explosion of the field, can someone point to some
       | resources where I can get a better understanding about how this
       | magic works?
       | 
       | I mean, from my perspective, the skill in these (and DALL-E's)
       | image reproductions is truly astonishing. Just looking for more
       | information about how the software actually works, even if there
       | are big chunks of it that are "this is beyond your understanding
       | without taking some in-depth courses".
        
         | londons_explore wrote:
         | Figure A.4 in the linked paper is a good high level overview of
         | this model. Shame it was hidden away on page 19 in the
         | appendix!
         | 
         | Each box you see there has a section in the paper explaining it
         | in more detail.
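          | 
          | For a rough sense of what those boxes do at sampling time,
          | here's a toy sketch of the denoising loop (pure illustration:
          | the tiny stand-in "denoiser" and the naive schedule below are
          | mine, not anything from the paper):
          | 
          |       import torch
          |       
          |       def sample(denoiser, text_emb, steps=50,
          |                  size=(3, 64, 64)):
          |           # Start from pure Gaussian noise.
          |           x = torch.randn(1, *size)
          |           for t in reversed(range(steps)):
          |               # The network predicts the noise in x,
          |               # conditioned on the text embedding and
          |               # the timestep t.
          |               noise = denoiser(x, t, text_emb)
          |               # Remove a fraction of it (real samplers
          |               # use a learned/derived schedule).
          |               x = x - noise / steps
          |           return x
          |       
          |       # Stand-in "denoiser" so the sketch runs.
          |       toy = lambda x, t, emb: 0.1 * x
          |       print(sample(toy, text_emb=None).shape)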
        
         | rvnx wrote:
         | Check https://github.com/multimodalart/majesty-diffusion
         | 
         | There is a Google Colab workbook that you can try and run for
         | free :)
         | 
         | This is the image-text pairs behind:
         | https://laion.ai/laion-400-open-dataset/
        
         | astrange wrote:
         | > I mean, from my perspective, the skill in these (and
         | DALL-E's) image reproductions is truly astonishing.
         | 
         | A basic part of it is that neural networks combine learning and
         | memorizing fluidly inside them, and these networks are really
         | really big, so they can memorize stuff good.
         | 
         | So when you see it reproduce a Shiba Inu well, don't think of
         | it as "the model understands Shiba Inus". Think of it as making
         | a collage out of some Shiba Inu clip art it found on the
         | internet. You'd do the same if someone asked you to make this
         | image.
         | 
         | It's certainly impressive that the lighting and blending are as
         | good as they are though.
        
       | FargaColora wrote:
       | This looks incredible but I do notice that all the images are of
       | a similar theme. Specifically there are no human figures.
        
         | influxmoment wrote:
          | I believe DALL-E and likely this model excluded images of
          | people so they could not be misused.
        
       | benwikler wrote:
       | Would be fascinated to see the DALL-E output for the same prompts
       | as the ones used in this paper. If you've got DALL-E access and
       | can try a few, please put links as replies!
        
         | joeycodes wrote:
         | Posting a few comparisons here.
         | 
         | https://twitter.com/joeyliaw/status/1528856081476116480?s=21...
        
         | qclibre22 wrote:
          | See the paper here: https://gweb-research-imagen.appspot.com/paper.pdf
          | Section E: "Comparison to GLIDE and DALL-E 2"
        
       | jandrese wrote:
        | Is there a way to try this out? DALL-E 2 also had amazing demos
       | but the limitations became apparent once real people had a chance
       | to run their own queries.
        
         | wmfrov wrote:
         | Looks like no, "The potential risks of misuse raise concerns
         | regarding responsible open-sourcing of code and demos. At this
         | time we have decided not to release code or a public demo. In
         | future work we will explore a framework for responsible
         | externalization that balances the value of external auditing
         | with the risks of unrestricted open-access."
        
           | nomel wrote:
           | > the risks of unrestricted open-access
           | 
           | What exactly is the risk?
        
             | jimmygrapes wrote:
             | A variation on the axiom "you cannot idiot proof something
             | because there's always a bigger idiot"
        
             | varenc wrote:
             | See section 6 titled "Conclusions, Limitations and Societal
             | Impact" in the research paper: https://gweb-research-
             | imagen.appspot.com/paper.pdf
             | 
             | One quote:
             | 
             | > "On the other hand, generative methods can be leveraged
             | for malicious purposes, including harassment and
             | misinformation spread [20], and raise many concerns
             | regarding social and cultural exclusion and bias [67, 62,
             | 68]"
        
               | userbinator wrote:
                | But do we trust that those who _do_ have access won't be
               | using it for "malicious purposes" (which they might not
               | think is malicious, but perhaps it is to those who don't
               | have access)?
        
               | colinmhayes wrote:
               | It's not up to you. It's up to them, and they trust
               | themselves/don't care about your definition of malicious.
        
             | jtvjan wrote:
             | If the model is used to generate offensive imagery, it may
             | result in a negative press response directed at the
             | company.
        
             | tpmx wrote:
             | _Really_ unpleasant content being produced, obviously.
        
       | marcodiego wrote:
       | Ok. Now, how about the legality of it generating socially
       | unacceptable images like child porn?
        
       | ma2rten wrote:
       | I get the impression that maybe DALL-E 2 produces slightly more
       | diverse images? Compare Figure 2 in this paper with Figures 18-20
       | in the DALL-E 2 paper.
        
       | faizshah wrote:
       | What's the best open source or pre-trained text to image model?
        
       | shannifin wrote:
       | Nice to see another company making progress in the area. I'd love
       | to see more examples of different artistic styles though, my
       | favorite DALL-E images are the ones that look like drawings.
        
       | Mo3 wrote:
       | Is the source in public domain already?
        
       | spyremeown wrote:
       | Jesus, this is so awesome. I think it's the first AI that really
       | makes me have that "wow" sensation.
        
       | fortran77 wrote:
       | > At this time we have decided not to release code or a public
       | demo.
       | 
       | Oh well.
        
       | SemanticStrengh wrote:
       | Does it outperform DALL-E V2?
        
       | dr_dshiv wrote:
       | How the fck are things advancing so fast? Is it about to level
       | off ...or extend to new domains? What's a comparable set of
       | technical advances?
        
       | y04nn wrote:
        | Really impressive. If we are able to generate such detailed
        | images, is there anything similar for text to music? I would
        | have thought that it would be simpler to achieve than text to
        | image.
        
         | redox99 wrote:
         | Our language is much more effective at describing images than
         | music.
        
         | tomatowurst wrote:
            | Why stop at audio? The pinnacle of this would be text-to-
            | video, equally indistinguishable from the real thing.
        
           | burlesona wrote:
           | The way things look when still is much easier to fake than
           | the way things move.
           | 
           | I would expect AI development to follow a similar path to
            | digital media generally, as it's following the increasing
           | difficulty and space requirements of digitally representing
           | said media: text < basic sounds < images < advanced audio <
           | video.
           | 
           | What's more impressive to me is how far ahead text-to-speech
           | is, but I think the explanation is straightforward (the
           | accessibility value has motivated us to work on that for a
           | lot longer).
        
         | nomel wrote:
         | Compare the size of a raw image file to a raw music file, to
         | get an idea of the complexity difference.
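          | 
          | Back-of-the-envelope numbers (mine, just to illustrate the
          | point):
          | 
          |       # One uncompressed 256x256 RGB image:
          |       image_bytes = 256 * 256 * 3        # ~197 KB
          |       # Three minutes of raw CD-quality stereo audio:
          |       audio_bytes = 180 * 44100 * 2 * 2  # ~31.8 MB
          |       print(image_bytes, audio_bytes,
          |             audio_bytes // image_bytes)  # ~161x larger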
        
           | penneyd wrote:
           | Think sheet music, not an mp3
        
         | touringa wrote:
         | SymphonyNet: https://youtu.be/m4tT5fx_ih8
        
       | colinmhayes wrote:
       | I wondered why all the pictures at the top had sunglasses on,
       | then I saw a couple with eyes. Still some work to do on this one.
        
       | londons_explore wrote:
       | >Figure 2: Non-cherry picked Imagen samples
       | 
       | Hooray! Non-cherry-picked samples should be the norm.
        
       | ml_basics wrote:
       | Why is this seemingly official Google blog post on this random
       | non-Google domain?
        
         | aidenn0 wrote:
         | This is quite suspicious considering that google AI research
         | has an official blog[1], and this is not mentioned at all
         | there. It seems quite possible that this is an elaborate prank.
         | 
         | 1: https://ai.googleblog.com/
        
         | mmh0000 wrote:
          | You mean one of Google's domains?
          | 
          |       # whois appspot.com
          |       [Querying whois.verisign-grs.com]
          |       [Redirected to whois.markmonitor.com]
          |       [Querying whois.markmonitor.com]
          |       [whois.markmonitor.com]
          |       Domain Name: appspot.com
          |       Registry Domain ID: 145702338_DOMAIN_COM-VRSN
          |       Registrar WHOIS Server: whois.markmonitor.com
          |       Registrar URL: http://www.markmonitor.com
          |       Updated Date: 2022-02-06T09:29:56+0000
          |       Creation Date: 2005-03-10T02:27:55+0000
          |       Registrar Registration Expiration Date: 2023-03-10T00:00:00+0000
          |       Registrar: MarkMonitor, Inc.
          |       Registrar IANA ID: 292
          |       Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
          |       Registrar Abuse Contact Phone: +1.2086851750
          |       Domain Status: clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)
          |       Domain Status: clientTransferProhibited (https://www.icann.org/epp#clientTransferProhibited)
          |       Domain Status: clientDeleteProhibited (https://www.icann.org/epp#clientDeleteProhibited)
          |       Domain Status: serverUpdateProhibited (https://www.icann.org/epp#serverUpdateProhibited)
          |       Domain Status: serverTransferProhibited (https://www.icann.org/epp#serverTransferProhibited)
          |       Domain Status: serverDeleteProhibited (https://www.icann.org/epp#serverDeleteProhibited)
          |       Registrant Organization: Google LLC
          |       Registrant State/Province: CA
          |       Registrant Country: US
          |       Registrant Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
          |       Admin Organization: Google LLC
          |       Admin State/Province: CA
          |       Admin Country: US
          |       Admin Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
          |       Tech Organization: Google LLC
          |       Tech State/Province: CA
          |       Tech Country: US
          |       Tech Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
          |       Name Server: ns4.google.com
          |       Name Server: ns3.google.com
          |       Name Server: ns2.google.com
          |       Name Server: ns1.google.com
        
           | jefftk wrote:
           | While appspot.com is a Google domain, anyone can register
           | domains under it. It would be similarly surprising to see an
           | official GitHub blog post under someproject.github.io
        
             | jefftk wrote:
             | Fun fact: appspot.com was the second "private" suffix to be
             | added to the Public Suffix List, after operaunite.com:
             | https://bugzilla.mozilla.org/show_bug.cgi?id=593818
        
             | ma2rten wrote:
             | You mean like: https://say-can.github.io/
             | 
             | This is common in the research PA. People don't want to
             | deal with broccoli man [1].
             | 
             | [1] https://www.youtube.com/watch?v=3t6L-FlfeaI
        
               | jefftk wrote:
               | Looking at that link, I don't think that is a GitHub
               | publication? It is marked Robotics at Google and Everyday
               | Robotics.
        
               | ma2rten wrote:
               | My bad, it's a google-specific problem.
        
         | dekhn wrote:
          | I'm not certain, but I think it's prerelease. The paper says
          | the site should be at https://imagen.research.google/ but that
          | host doesn't respond.
        
         | jonny_eh wrote:
         | appspot.com is the domain that hosts all App Engine apps (at
         | least those that don't use a custom domain). It's kind of like
         | Heroku and has been around for at least a decade.
         | 
         | https://cloud.google.com/appengine
        
           | jefftk wrote:
           | Spring 2008: 14 years!
        
             | jonny_eh wrote:
             | Whoa, I feel super old, I first used it in 2011 when I
             | thought it was new.
        
         | mshockwave wrote:
         | IIRC appspot.com is used by App Engine, one of the earliest
         | SaaS platforms provided by Google.
        
         | jeffbee wrote:
         | Not just that ... Google Sheets must be the all-time worst way
         | to distribute 200 short strings.
        
       | SemanticStrengh wrote:
        | Note that there was a close model in 2021, ignored by all:
        | https://paperswithcode.com/sota/text-to-image-generation-on-...
        | (on this benchmark). Also, what is the score of DALL-E v2?
        
       | raldi wrote:
       | The opinion in the title takes away a lot more than it adds, and
       | I'm not sure I agree with its assertion.
        
       | ShakataGaNai wrote:
        | All of these AI findings are cool in theory. But until it's
        | accessible to a decent number of people/customers, it's
        | basically useless fluff.
       | 
       | You can tell me those pictures are generated by an AI and I might
       | believe it, but until real people can actually test it... it's
        | easy enough to fake. This page isn't even the remotest bit legit
        | going by the URL; it looks nicely put together and that's about
        | it. They could have easily put this together with a graphic
        | designer to fake it.
        | 
        | Let me be clear, I'm not actually saying it's fake. Just that all
        | of these new "cool" things are more or less theoretical if
        | nothing is getting released.
        
         | cellis wrote:
         | Inference times are key. If it can't be produced within
         | reasonable latency, then there will be no real world use case
         | for it because it's simply too expensive to run inference at
         | scale.
        
           | theptip wrote:
           | There are plenty of usecases for generating art/images where
           | a latency of days or weeks would be competitive with the
           | current state of the art.
           | 
           | For example, corporate graphics design, logos, brand
           | photography, etc.
           | 
           | I really do think inference time is a red herring for the
           | first generation of these models.
           | 
            | Sure, the more transformative use cases like real-time
            | content generation to replace movies/games would need low
            | latency, but there is a lot of value to be created prior to
            | that point.
        
       | mistrial9 wrote:
        | I was reading a relatively recent machine learning paper from
        | some elite source. After multiple repetitions of bragging and
        | puffery, in the middle of the paper, the charts showed that they
        | had beaten the score of a high-ranking algorithm in their
        | specific domain, moving the best consistent result from roughly
        | 86% accuracy to 88%. My response: they got a lot of attention
        | within their world by beating the previous score, no matter how
        | small the improvement was. It was a "winner take all" competition
        | against other teams close to them; accuracy of less than 90% is
        | of questionable value in a lot of real-world problems; and it was
        | an enormous amount of math and effort for this team to make that
        | small improvement.
        | 
        | What I see is a semi-poverty mindset among very smart people who
        | appear to be treated such that the winners get promoted and
        | everyone else is fired. This sort of ML analysis is useful for
        | massive data sets at scale, where 90% is a lot of accuracy, but
        | not at all for the small sets of real-world, human-scale problems
        | where each result may matter a lot. The years of training these
        | researchers had to go through to participate in this apparently
        | ruthless environment are certainly like a lottery ticket, if you
        | are in fact in a game where everyone but the winner has to find a
        | new line of work. I think their masters live in Redmond, if I
        | recall.. not looking it up at the moment.
        
       | neolander wrote:
       | It really does look better than DALL-E, at least from the images
       | on the site. Hard to believe how quickly progress is being made
        | towards lucid dreaming while awake.
        
       | Jyaif wrote:
       | Jesus Christ. Unlike DALL-E 2, it gets the details right. It also
       | can generate text. The quality is insanely good. This is
       | absolutely mental.
        
         | not2b wrote:
         | Yes, the posted results are really good, but since we can't
         | play with it we don't know how much cherry picking has been
         | done.
        
       | addajones wrote:
       | This is absolutely amazingly insane. Wow.
        
       | [deleted]
        
       | benreesman wrote:
       | I apologize in advance for the elitist-sounding tone. In my
       | defense the people I'm calling elite I have nothing to do with,
       | I'm certainly not talking about myself.
       | 
       | Without a fairly deep grounding in this stuff it's hard to
       | appreciate how far ahead Brain and DM are.
       | 
       | Neither OpenAI nor FAIR _ever has the top score on anything
       | unless Google delays publication_. And short of FAIR? D2
       | lacrosse. There are exceptions to such a brash generalization,
       | NVIDIA's group comes to mind, but it's a very good rule of thumb.
       | Or your whole face the next time you are tempted to doze behind
       | the wheel of a Tesla.
       | 
       | There are two big reasons for this:
       | 
       | - the talent wants to work with the other talent, and through a
       | combination of foresight and deep pockets Google got that
       | exponent on their side right around the time NVIDIA cards started
       | breaking ImageNet. Winning the Hinton bidding war clinched it.
       | 
       | - the current approach of "how many Falcon Heavy launches worth
       | of TPU can I throw at the same basic masked attention with
       | residual feedback and a cute Fourier coloring" inherently favors
       | deep pockets, and obviously MSFT, sorry OpenAI has that, but deep
       | pockets also non-linearly scale outcomes when you've got in-house
       | hardware for multiply-mixed precision.
       | 
       | Now clearly we're nowhere close to Maxwell's Demon on this stuff,
       | and sooner or later some bright spark is going to break the
       | logjam of needing 10-100MM in compute to squeeze a few points out
       | of a language benchmark. But the incentives are weird here: who,
       | exactly, does it serve for us plebs to be able to train these
       | things from scratch?
        
       | davelondon wrote:
       | I'M SQUEEZING MY PAPER!
        
       | SemanticStrengh wrote:
        | This competitor might be better at respecting spatial
        | prepositions and photorealism, but on a quick look I find the
        | images more uncanny. DALL-E has IMHO better camera POV/distance
        | and is able to make artistic/dreamy/beautiful images. I haven't
        | yet seen this Google model be competitive on art and uncanniness.
        | However, progress is great and I might be wrong.
        
       | james-redwood wrote:
        | Metaculus, a mass forecasting site, has steadily brought
        | forward the prediction date for a weakly general AI. Jaw-dropping
        | advances like this only increase my confidence in this
        | prediction. "The future is now, old man."
       | 
       | https://www.metaculus.com/questions/3479/date-weakly-general...
        
         | sydthrowaway wrote:
         | How can we prepare for this?
         | 
         | This will result in mass social unrest.
        
           | aaaaaaaaaaab wrote:
           | Stock up on guns, ammo, cigarettes, water filters, canned
           | food, and toilet paper.
        
             | boppo1 wrote:
             | Nah, learn Spanish and first-aid. Being able to fix people
             | is more useful than having commodities that will make you a
             | target.
        
           | refulgentis wrote:
           | You think so? I'm very high on the Kool-Aid, image generation
           | and text transformation models are core parts of my workflow.
           | (Midjourney, GPT-3)
           | 
            | It's still an unruly 7-year-old at best. Results need to be
           | verified. Prompt engineering and a sense of creativity are
           | core competencies.
        
             | visarga wrote:
             | > Prompt engineering and a sense of creativity are core
             | competencies.
             | 
             | It's funny that people are also prompting each other.
             | Parents, friends, teachers, doctors, priests, politicians,
             | managers and marketers are all prompting (advising) us to
             | trigger desired behaviour. Powerful stuff - having a large
             | model and knowing how to prompt it.
        
             | [deleted]
        
         | tpmx wrote:
         | I don't see how this gets us (much) closer to general AI. Where
         | is the reasoning?
        
           | _joel wrote:
           | Perhaps the confluence of NLP and something generative?
        
             | SemanticStrengh wrote:
              | Yes, Metaculus mostly bet a magic number based on _perhaps_,
              | and tbh why not; the interaction of NLP and vision is
              | mysterious and has potential. However, those magic numbers
              | should still be considered magic numbers. I agree that by
              | 2040 the interactions will have been extensively studied,
              | but the conclusion of whether we can go much further on
              | cross-model synergies is totally unknown or pessimistic.
        
             | astrange wrote:
             | That doesn't even lead in the direction of an AGI. The
             | larger and more expensive a model is the less like an "AGI"
             | it is - an independent agent would be able to learn online
             | for free, not need millions in TPU credits to learn what
             | color an apple is.
        
           | quirino wrote:
           | I think this serves at least as a clear demonstration of how
           | advanced the current state of AI is. I had played with GPT-3
            | and that was very impressive, but I couldn't even dream that
            | something as good as DALL-E 2 was already possible.
        
           | 6gvONxR4sf7o wrote:
           | Big pretrained models are good enough now that we can pipe
           | them together in really cool ways and our representations of
           | text and images seem to capture what we "mean."
        
             | tpmx wrote:
              | Yeah, it _seems_ like it. But it's still just complicated
             | statistical models. Again, where is the reasoning?
        
               | 6gvONxR4sf7o wrote:
               | I don't care whether it reasons its way from "3 teddy
               | bears below 7 flamingos" to a picture of that or if it
               | gets there some other way. But some of the magic in
               | having good enough pretrained representations is that you
               | don't need to train them further for downstream tasks,
               | which means non-differentiable tasks like logic could
               | soon become more tenable.
        
               | renewiltord wrote:
               | A belief oft shared is that sufficiently complicated
               | statistical models are indistinguishable from reasoning.
        
               | marvin wrote:
               | I still think we're missing some fundamental insights on
               | how layered planning/forecasting/deducting/reasoning
               | works, and that figuring this out will be necessary in
               | order to create AI that we could say "reasons".
               | 
               | But with the recent advances/demonstrations, it seems
               | more likely today than in 2019 that our current
               | computational resources are sufficient to perform
                | magnificently spooky stuff if they're used correctly.
                | They are doing that already, and that's without
               | deliberately making the software do anything except draw
               | from a vast pool of examples.
               | 
               | I think it's reasonable, based on this, to update one's
               | expectations of what we'd be able to do if we figured out
               | ways of doing things that aren't based on first seeing a
               | hundred million examples of what we want the computer to
               | do.
               | 
               | Things that do this can obviously exist, we are living
               | examples. Does figuring it out seem likely to be many
               | decades away?
        
               | londons_explore wrote:
               | All it takes is one 'trick' to give these models the
               | ability to do reasoning.
               | 
               | Like for example the discovery that language models get
               | far better at answering complex questions if asked to
               | show their working step by step with chain of thought
               | reasoning as in page 19 of the PaLM paper [1]. Worth
               | checking out the explanations of novel jokes on page 38
               | of the same paper. While it is, like you say, all
               | statistics, if it's indistinguishable from valid
               | reasoning, then perhaps it doesn't matter.
               | 
               | [1]: https://arxiv.org/pdf/2204.02311.pdf
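                | 
                | Concretely, the trick is just prepending a worked
                | example so the model imitates the step-by-step
                | style; a toy sketch of such a prompt:
                | 
                |       # An illustrative chain-of-thought exemplar
                |       # prepended before the real question:
                |       prompt = (
                |           "Q: Roger has 5 balls. He buys 2 cans "
                |           "of 3 balls each. How many balls does "
                |           "he have now?\n"
                |           "A: He started with 5. 2 cans of 3 is "
                |           "6. 5 + 6 = 11. The answer is 11.\n"
                |           "Q: <your real question>\n"
                |           "A:"
                |       )
                |       print(prompt)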
        
       | davikr wrote:
        | Interesting and cool technology - but I can't help noticing that
        | every high-quality AI art application is always closed, and I
        | don't buy the ethics excuse for that. The same was said
       | for GPT, yet I see nothing but creativity coming out from its
       | users nowadays.
        
         | dougmwne wrote:
          | GPT-3 was an erotica virtuoso before it was gagged. There's a
         | serious use case here in endless porn generation. Google would
         | very much like to not be in that business.
         | 
         | That said, you can download Dream by Wombo from the app store
         | and it is one of the top smartphone apps, even though it is a
         | few generations behind state of the art.
        
         | LordDragonfang wrote:
          | You're _aware_ of nothing but creativity from its users. The
          | people using the technology unethically intentionally don't
          | advertise that they're using it.
          | 
          | There are mountains of AI-generated inauthentic content that
          | companies (_including Google_) have to filter out of their
          | services. This content is used for spam, click farms, scamming,
          | and even state propaganda operations. GPT-2 made this problem
          | orders of magnitude worse than it used to be, and each
          | iteration makes it harder to filter.
          | 
          | The industry term is (generally) "Coordinated Inauthentic
          | Behavior" (though this includes uses of actual human content).
          | I think Smarter Every Day did a good video (series?) on the
          | topic, and there are plenty of articles on the topic if you
          | prefer that.
        
         | thorum wrote:
         | That only lasts until the community copies the paper and
         | catches up. For example the open source DALLE-2 implementation
         | is coming along great:
         | https://github.com/lucidrains/DALLE2-pytorch
        
         | minimaxir wrote:
         | Granted that's a selection bias: you likely won't hear about
         | the cases where legit obscene output occurs. (the only notable
         | case I've heard is the AI Dungeon incident)
        
       | unholiness wrote:
       | Certificate is expired, anyone have a mirror?
        
       | minimaxir wrote:
        | Generating at 64x64px and then upscaling probably gives the
        | model a substantial performance boost (training
        | speed/convergence) compared to working at 256x256 or 1024x1024
        | like DALL-E 2. Perhaps that approach to AI-generated art is the
        | future.
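        | 
        | A minimal sketch of that cascaded idea (hypothetical stand-in
        | models, not Imagen's actual code):
        | 
        |       import torch
        |       
        |       def generate(text_emb, base, sr_256, sr_1024):
        |           # Base diffusion model works at a cheap 64x64.
        |           img64 = base(text_emb)               # (3, 64, 64)
        |           # Super-resolution diffusion stages upscale it,
        |           # still conditioned on the text embedding.
        |           img256 = sr_256(img64, text_emb)     # (3, 256, 256)
        |           img1024 = sr_1024(img256, text_emb)  # (3, 1024, 1024)
        |           return img1024
        |       
        |       # Stand-ins so the sketch runs: noise at each size.
        |       base = lambda emb: torch.randn(3, 64, 64)
        |       up = lambda s: (lambda img, emb: torch.randn(3, s, s))
        |       print(generate(None, base, up(256), up(1024)).shape)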
        
       | daenz wrote:
       | >While we leave an in-depth empirical analysis of social and
       | cultural biases to future work, our small scale internal
       | assessments reveal several limitations that guide our decision
       | not to release our model at this time.
       | 
       | Some of the reasoning:
       | 
       | >Preliminary assessment also suggests Imagen encodes several
       | social biases and stereotypes, including an overall bias towards
       | generating images of people with lighter skin tones and a
       | tendency for images portraying different professions to align
       | with Western gender stereotypes. Finally, even when we focus
       | generations away from people, our preliminary analysis indicates
       | Imagen encodes a range of social and cultural biases when
       | generating images of activities, events, and objects. We aim to
       | make progress on several of these open challenges and limitations
       | in future work.
       | 
       | Really sad that breakthrough technologies are going to be
       | withheld due to our inability to cope with the results.
        
         | joshcryer wrote:
         | They're withholding the API, code, and trained data because
         | they don't want it to affect their corporate image. The good
         | thing is they released their paper which will allow easy
         | reproduction.
         | 
         | T5-XXL looks on par with CLIP so we may not see an open source
         | version of T5 for a bit (LAION is working on reproducing CLIP),
         | but this is all progress.
        
           | minimaxir wrote:
           | T5 was open-sourced on release (up to 11B params):
           | https://github.com/google-research/text-to-text-transfer-
           | tra...
           | 
           | It is also available via Hugging Face transformers.
           | 
           | However, the paper mentions T5-XXL is 4.6B, which doesn't fit
           | any of the checkpoints above, so I'm confused.
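            | 
            | If you want to poke at the public checkpoints, the text
            | encoder side is easy to load with Hugging Face transformers
            | (a sketch; whether any public checkpoint exactly matches the
            | paper's T5-XXL encoder is unclear, per the above):
            | 
            |       from transformers import T5EncoderModel, T5Tokenizer
            |       
            |       # Small checkpoint so the sketch is cheap to run;
            |       # swap in a bigger T5 checkpoint for something
            |       # closer to what the paper describes.
            |       tok = T5Tokenizer.from_pretrained("t5-small")
            |       enc = T5EncoderModel.from_pretrained("t5-small")
            |       
            |       inputs = tok("a corgi playing a flute",
            |                    return_tensors="pt")
            |       # Per-token embeddings a diffusion model could
            |       # condition on.
            |       emb = enc(**inputs).last_hidden_state
            |       print(emb.shape)  # (1, seq_len, d_model)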
        
         | riffraff wrote:
          | This seems bullshit to me, considering Google Translate and
          | Google Images encode the same biases and stereotypes, and are
         | widely available.
        
           | nomel wrote:
           | Aren't those old systems?
        
           | seaman1921 wrote:
           | yea but now they aren't giving people more data-points to
           | attack them with such nonsense arguments.
        
         | planetsprite wrote:
          | Literally the same thing could be said about Google Images, but
          | Google Images is obviously available to the public.
         | 
         | Google knows this will be an unlimited money generator so
         | they're keeping a lid on it.
        
         | jowday wrote:
          | Much like OpenAI's marketing speak about withholding their
          | models for safety, this is just a progressive-sounding cover
          | story for them not wanting to essentially give away a model
          | they spent thousands of man-hours and tens of millions of
          | dollars' worth of compute training.
        
         | ThrowITout4321 wrote:
          | I'm one that welcomes their reasoning. I don't consider myself
          | a social justice kind of guy, but I'm not keen on the idea that
          | a tool that is supposed to make life better for everyone has a
          | bias towards one segment of society. This is an important
          | issue (bug?) that needs to be resolved. Especially since there
          | is absolutely no burning reason to release it before it's ready
          | for general use.
        
         | Mockapapella wrote:
          | Transformers are parallelizable, right? What's stopping a
          | large group of people from pooling their compute power together
          | and working towards something like this? IIRC there were some
          | crypto projects a while back that were trying to create
          | something similar (Golem?).
        
           | joshcryer wrote:
           | There are people working on reproducing the models, see here
           | for Dall-E 2 for example:
           | https://github.com/lucidrains/DALLE2-pytorch
           | 
            | It's often not worth it to decentralize the computation for
            | training the model, though. Still, it's not hard to get
            | donated cycles, and groups are working on it. Don't fret
            | because Google isn't releasing the API/code. They released
            | the paper and that's all you need.
        
           | visarga wrote:
           | There are the Eleuther.ai and BigScience projects working on
           | public foundation models. They have a few releases already
            | and are currently training GPT-3-sized models.
        
         | 6gvONxR4sf7o wrote:
         | It's wild to me that the HN consensus is so often that 1)
         | discourse around the internet is terrible, it's full of spam
         | and crap, and the internet is an awful unrepresentative
         | snapshot of human existence, and 2) the biases of general-
         | internet-training-data are fine in ML models because it just
         | reflects real life.
        
           | nullc wrote:
            | It's wild to me that you'd say that. The people complaining (1)
           | aren't following it up with "so we should make sure to
           | restrict the public from internet access entirely". -- that's
           | what would be required to make your juxtaposition make sense.
           | 
           | Moreover, the model doing things like exclusively producing
           | white people when asked to create images of people home
           | brewing beer is "biased" but it's a bias that presumably
           | reflects reality (or at least the internet), if not the
           | reality we'd prefer. Bias means more than "spam and crap", in
           | the ML community bias can also simply mean _accurately_
           | modeling the underlying distribution when reality falls short
           | of the author's hopes.
           | 
            | For example, if you're interested in learning about what home
            | brewing is, the fact that it only shows white people would be
            | at least a little unfortunate, since there is nothing
            | inherently white about it and some home brewers aren't white.
            | But if, instead, you wanted to just generate typical home
            | brewing images, doing anything but that would generate
            | conspicuously unrepresentative images.
           | 
           | But even ignoring the part of the biases which are debatable
           | or of application-specific impact, saying something is
           | unfortunate and saying people should be denied access are
           | entirely different things.
           | 
           | I'll happily delete this comment if you can bring to my
           | attention a single person who has suggested that we lose
           | access to the internet because of spam and crap who has also
           | argued that the release of an internet-biased ML model
           | shouldn't be withheld.
        
           | colordrops wrote:
           | Why is it wild? How is it contradictory?
        
           | astrange wrote:
           | The bias on HN is that people who prioritize being nice, or
           | may possibly have humanities degrees or be ultra-libs from
           | SF, are wrong because the correct answer would be cynical and
           | cold-heartedly mechanical.
           | 
            | Other STEM-adjacent communities feel similarly, but I don't
            | get it from actual in-person engineers much.
        
         | user3939382 wrote:
         | Translation: we need to hand-tune this to not reflect reality
         | but instead the world as we (Caucasian/Asian male American woke
          | upper-middle class San Francisco engineers) wish it to be.
         | 
         | Maybe that's a nice thing, I wouldn't say their values are
         | wrong but let's call a spade a spade.
        
           | JohnBooty wrote:
            | > Translation: we need to hand-tune this to not reflect
            | > reality
            | 
            | Is it reflecting reality, though?
            | 
            | Seems to me that (as with any ML stuff, right?) it's
            | reflecting the training corpus.
            | 
            | Furthermore, is it this thing's _job_ to reflect reality?
            | 
            | > the world as we (Caucasian/Asian male American woke
            | > upper-middle class San Francisco engineers) wish it to be
           | 
           | Snarky answer: Ah, yes, let's make sure that things like "A
           | giant cobra snake on a farm. The snake is made out of corn"
           | reflect _reality._
           | 
           | Heartfelt answer: Yes, there is some of that wishful thinking
           | or editorializing. I don't consider it to be erasing or
           | denying reality. This is a tool that synthesizes _unreality._
            | I don't think that such a tool should, say, refuse to
           | synthesize an image of a female POTUS because one hasn't
           | existed yet. This is art, not a reporting tool... and keep in
           | mind that art not only imitates life but also influences it.
        
             | nomel wrote:
             | > Snarky answer: Ah, yes, let's make sure that things like
             | "A giant cobra snake on a farm. The snake is made out of
             | corn" reflect reality.
             | 
             | If it didn't reflect reality, you wouldn't be impressed by
             | the image of the snake made of corn.
        
           | userbinator wrote:
           | Indeed. As the saying goes, we are truly living in a post-
           | truth world.
        
           | ceejayoz wrote:
           | "Reality" as defined by the available training set isn't
           | necessarily reality.
           | 
           | For example, Google's image search results pre-tweaking had
           | some interesting thoughts on what constitutes a professional
           | hairstyle, and that searches for "men" and "women" should
           | only return light-skinned people:
           | https://www.theguardian.com/technology/2016/apr/08/does-
           | goog...
           | 
           | Does that reflect reality? No.
           | 
           | (I suspect there are also mostly unstated but very real
           | concerns about these being used as child pornography, revenge
           | porn, "show my ex brutally murdered" etc. generators.)
        
             | ChadNauseam wrote:
             | You know, it wouldn't surprise me if people talking about
             | how black curly hair shouldn't be seen as unprofessional
             | contributed to google thinking there's an association
             | between the concepts of "unprofessional hair" and "black
             | curly hair"
        
             | ceeplusplus wrote:
             | The reality is that hair styles on the left side of the
             | image in the article are widely considered unprofessional
             | in today's workplaces. That may seem egregiously wrong to
             | you, but it is a truth of American and European society
             | today. Should it be Google's job to rewrite reality?
        
               | [deleted]
        
               | ceejayoz wrote:
               | The "unprofessional" results are almost exclusively black
               | women; the "professional" ones are almost exclusively
               | white or light skinned.
               | 
               | Unless you think white women are immune to unprofessional
               | hairstyles, and black women incapable of them, there's a
               | race problem illustrated here even if you think the
               | hairstyles illustrated are fairly categorized.
        
               | rvnx wrote:
               | If you type as a prompt "most beautiful woman in the
               | world", you get a brown-skinned brown-haired woman with
               | hazel eyes.
               | 
                | What should be the right answer then?
               | 
               | You put a blonde, you offend the brown haired.
               | 
               | You put blue eyes, you offend the brown eyes.
               | 
               | etc.
        
               | ceejayoz wrote:
               | That's an unanswerable question. Perhaps the answer is
               | "don't".
               | 
               | Siri takes this approach for a wide range of queries.
        
               | nomel wrote:
               | How do you pick what should and shouldn't be restricted?
               | Is there some "offense threshold"? I suspect all queries
               | relating to religion, ethnicity, sexuality, and gender
               | will need to be restricted, which almost certainly means
               | you probably can't include humans at all, other than ones
               | artificially inserted with mathematically proven random
               | attributes. Maybe that's why none are in this demo.
        
               | daenz wrote:
               | "Is Taiwan a country" also comes to mind.
        
               | rvnx wrote:
               | I think the key is to take the information in this world
                | with a pinch of salt.
               | 
               | When you do a search on a search engine, the results are
               | biased too, but still, they shouldn't be artificially
               | censored to fit some political views.
               | 
                | I asked one algorithm a few minutes ago (it's called t0pp
               | and it's free to try online, and it's quite fascinating
               | because it's uncensored):
               | 
               | "What is the name of the most beautiful man on Earth ?
               | 
               | - He is called Brad Pitt."
               | 
               | ==
               | 
                | Is it true in an objective way? Probably not.
                | 
                | Is there an actual answer? Probably yes, there is
                | somewhere a man who scores better than the others.
                | 
                | Is it socially acceptable? Probably not.
               | 
               | The question is:
               | 
                | If you interviewed 100 people on the street and asked
                | the question "What is the name of the most beautiful man
                | on Earth?",
               | 
               | I'm pretty sure you'd get Brad Pitt often coming in.
               | 
                | Now, what about China?
               | 
                | We don't have many examples there; they probably have no
                | clue who Brad Pitt is, and there is probably someone else
                | who is considered more beautiful by over 1B people
               | 
               | (t0pp tells me it's someone called "Zhu Zhu" :D )
               | 
               | ==
               | 
               | Two solutions:
               | 
               | 1) Censorship
               | 
                | -> Sorry, there is too much bias in the West and we don't
               | want to offend anyone, no answer, or a generic overriding
               | human answer that is safe for advertisers, but totally
               | useless ("the most beautiful human is you")
               | 
               | 2) Adding more examples
               | 
               | -> Work on adding more examples from abroad trying to get
               | the "average human answer".
               | 
               | ==
               | 
               | I really prefer solution (2) in the core algorithms and
               | dataset development, rather than going through (1).
               | 
               | (1) is more a choice to make at the stage when you are
               | developing a virtual psychologist or a chat assistant,
               | not when creating AI building blocks.
        
               | colinmhayes wrote:
                | That only black people have unprofessional hair and only
                | white people have professional hair is not reality.
        
               | rcMgD2BwE72F wrote:
               | In any case, Google will be writing their reality. Who
               | picked the image sample for the ML to run on, if not
               | Google? What's the problem with writing it again, then?
               | They know their biases and want to act on it.
               | 
               | It's like blaming a friend for trying to phrase things
               | nicely, and telling them to speak headlong with zero
               | concern for others instead. Unless you believe anyone
                | trying to do good is being a hypocrite...
               | 
               | I, for one, like civility.
        
             | userbinator wrote:
             | _unstated but very real concerns_
             | 
             | I say let people generate their own reality. The sooner the
              | masses realise that _ceci n'est pas une pipe_, the less
             | likely they are to be swayed by the growing un-reality
             | created by companies like Google.
        
             | rvnx wrote:
              | If your query was about hairstyle, why do you even look at
              | or care about the skin color?
              | 
              | Nowhere in the user's query is there any preference
              | specified for skin color.
             | 
             | So it sorts and gives the most average examples based on
             | the examples that were found on the internet.
             | 
             | Essentially answering the query "SELECT * FROM `non-
             | professional hairstyles` ORDER BY score DESC LIMIT 10".
             | 
             | It's like if you search on Google "best place for wedding
             | night".
             | 
             | You may get 3 places out of 10 in Santorini, Greece.
             | 
              | Yes, you could have a human remove these biases because you
              | feel that Sri Lanka is the best place for a wedding, but
              | what if there is a consensus that Santorini is really the
              | most praised in the forums or websites that were crawled
              | by Google?
        
               | jayd16 wrote:
               | The results are not inherently neutral because the
               | database is from non-neutral input.
               | 
               | It's a simple case of sample bias.
        
               | colinmhayes wrote:
               | > If your query was about hairstyle, why do you even look
               | at the skin color ?
               | 
               | You know that race has a large effect on hair right?
        
               | daenz wrote:
               | I'd be careful where you're going with that. You might
               | make a point that is the opposite of what you intended.
        
               | ceejayoz wrote:
               | > The algorithm is just ranking the top "non-professional
               | hairstyle" in the most neutral way in its database
               | 
               | You're telling me those are all the _most_ non-
               | professional hairstyles available? That this is a
               | reasonable assessment? That fairly standard, well-kept,
               | work-appropriate curly black hair is roughly equivalent
                | to the pink-haired, three-foot-wide hairstyle that's one
               | of the only white people in the "unprofessional" search?
               | 
               | Each and everyone of them is less workplace appropriate
               | than, say, http://www.7thavenuecostumes.com/pictures/750x
               | 950/P_CC_70594... ?
        
               | rvnx wrote:
               | I'm saying that the dataset needs to be expanded to cover
               | as many examples as possible.
               | 
               | Work hard on adding even more examples, in order to make
               | the algorithms as close as possible to the "average
               | reality".
               | 
               | At some point we may ultimately reach the state where
               | robots collect intelligence directly in the real world,
               | and not on the internet (even closer to reality).
               | 
               | Censoring results sounds like the best recipe for a
               | dystopian world where only one view is right.
        
           | barredo wrote:
           | I know you're anon trolling, but the authors' names are:
           | 
           | Chitwan Saharia, William Chan, Saurabh Saxena+, Lala Li+, Jay
           | Whang+, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu
           | Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim
           | Salimans, Jonathan Ho+, David Fleet+, Mohammad Norouzi
        
             | [deleted]
        
             | pid-1 wrote:
             | Absolutely not related to the whole discussion, but what
             | does "+" stand for?
        
               | dntrkv wrote:
               | https://en.wikipedia.org/wiki/Dagger_(mark)
        
               | joshcryer wrote:
               | It's just a different asterisk used to distinguish; in
               | this case, in the paper, it marks the "core contributors."
        
           | holmesworcester wrote:
           | "As we wish it to be" is not totally true, because there are
           | some places where humanity's iconographic reality (which
           | Imagen trains on) differs significantly from actual reality.
           | 
           | One example would be if Imagen draws a group of mostly white
           | people when you say "draw a group of people". This doesn't
           | reflect actual reality. Another would be if Imagen draws a
           | group of men when you say "draw a group of doctors".
           | 
           | In these cases where iconographic reality differs from actual
           | reality, hand-tuning could be used to bring it closer to the
           | _real_ world, not just the world as we might wish it to be!
           | 
           | I agree there's a problem here. But I'd state it more as "new
           | technologies are being held to a vastly higher standard than
           | existing ones." Imagine TV studios issuing a moratorium on
           | any new shows that made being white (or rich) seem more
           | normal than it was! The public might rightly expect studios
           | to turn the dials away from the blatant biases of the past,
           | but even if this would be beneficial the progressive and
           | activist public is generations away from expecting a TV
           | studio to not release shows until they're confirmed to be
           | bias-free.
           | 
           | That said, Google's decision to not publish is probably less
           | about the inequities in AI's representation of reality and
           | more about the AI sometimes spitting out drawings that are
           | offensive in the US, like racist caricatures.
        
           | josho wrote:
           | Translation: AI has the potential to transform society. When
           | we release this model to the public it will be used in ways
           | we haven't anticipated. We know the model has bias and we
           | need more time to consider releasing this to the public out
            | of concerns that this transformative technology will
            | further perpetuate mistakes that we've made in our recent
            | past.
        
             | curiousgal wrote:
             | > it will be used in ways we haven't anticipated
             | 
             | Oh yeah, as a woman who grew up in a Third World country,
             | how an AI model generates images would have deeply affected
             | my daily struggles! /s
             | 
             | It's kinda insulting that they think that this would be
             | insulting. Like "Oh no I asked the model to draw a doctor
             | and it drew a male doctor, I guess there's no point in me
             | pursuing medical studies" ...
        
               | renewiltord wrote:
               | It's not meant to prevent offence to you. It is meant to
               | be a "good product" by the metrics of their creators. And
               | quite simply, anyone here incapable of making the thing
               | is unlikely to have a clear image of what a "good product"
               | here is. More power to them for having a good vision of
               | what they're building.
        
               | boppo1 wrote:
               | I don't think the concern over offense is actually about
               | you. There's a metagame here which is that if it could
               | potentially offend you (third-world-originated-woman),
               | then there's a brand-image liability for the company. I
               | don't think they care about you, I think they care about
               | not being hit on as "the company that algorithmically
               | identifies black people as gorillas".
        
               | pxmpxm wrote:
               | Postmodernism is what postmodernism does.
        
               | contingencies wrote:
               | Love it. Added to https://github.com/globalcitizen/taoup
        
               | colinmhayes wrote:
               | Yes actually, subconscious bias due to historical
               | prejudice does have a large effect on society. Obviously
               | there are things with much larger effects, that doesn't
               | mean that this doesn't exist.
               | 
               | > Oh no I asked the model to draw a doctor and it drew a
               | male doctor, I guess there's no point in me pursuing
               | medical studies
               | 
               | If you don't think this is a real thing that happens to
               | children you're not thinking especially hard. It doesn't
               | have to be common to be real.
        
               | curiousgal wrote:
               | > If you don't think this is a real thing that happens to
               | children you're not thinking especially hard
               | 
               | I believe that's where parenting comes in. Maybe I'm too
               | cynical but I think that the parents' job is to undo all
               | of the harm done by society and instill in their children
               | the "correct" values.
        
               | colinmhayes wrote:
               | I'd say you're right. Unfortunately many people are
               | raised by bad parents. Should these researchers accept
               | that their work may perpetuate stereotypes that harm
               | those that most need help? I can see why they wouldn't
               | want that.
        
               | holmesworcester wrote:
               | > I think that the parents' job is to undo all of the
               | harm done by society and instill in their children the
               | "correct" values.
               | 
               | Far from being too cynical, this is too optimistic.
               | 
               | The vast majority of parents try to instill the value "do
               | not use heroin." And yet society manages to do that harm
               | on a large scale. There are other examples.
        
           | Ar-Curunir wrote:
           | Except "reality" in this case is just their biased training
            | set. E.g. there are more non-white doctors and nurses in
            | the world than white ones, yet their model would likely
            | show an image of a white person when you type in "doctor".
        
             | umeshunni wrote:
              | Alternately, there are more female nurses in the world
             | than male nurses, and their model probably shows an image
             | of a woman when you type in "nurse" but they consider that
             | a problem.
        
               | contingencies wrote:
               | @Google Brain Toronto Team: See what you get when you
               | generate nurses with ncurses.
        
               | astrange wrote:
               | Google Image Search doesn't reflect harsh reality when
               | you search for things; it shows you what's on Pinterest.
               | The same is more likely to apply here than the idea
               | they're trying to hide something.
               | 
               | There's no reason to believe their model training learns
               | the same statistics as their input dataset even. If
               | that's not an explicit training goal then whatever
               | happens happens. AI isn't magic or more correct than
               | people.
        
         | visarga wrote:
         | The big labs have become very sensitive with large model
         | releases. It's too easy to make them generate bad PR, to the
         | point of not releasing almost any of them. Flamingo was also a
          | pretty great vision-language model that wasn't released, not
         | even in a demo. PaLM is supposedly better than GPT-3 but closed
         | off. It will probably take a year for open source models to
         | appear.
        
           | runnerup wrote:
           | The largest models which generate the headline benchmarks are
           | never released after any number of years, it seems.
           | 
           | Very difficult to replicate results.
        
           | godelski wrote:
            | That's because we're still bad at long-tailed data, and
            | people outside the research don't realize that we're
            | prioritizing realistic images first, before we deal with
            | long-tailed data (which is going to be the more generic form of
           | bias). To be honest, it is a bit silly to focus on long-
           | tailed data when results aren't great. That's why we see the
           | constant pattern of getting good on a dataset and then
           | focusing on the bias in that dataset.
           | 
           | I mean a good example of this is the Pulse[0][1] paper. You
           | may remember it as the white Obama. This became a huge debate
           | and it was pretty easily shown that the largest factor was
           | the dataset bias. This outrage did lead to fixing FFHQ but it
           | also sparked a huge debate with LeCun (data centric bias) and
           | Timnit (model centric bias) at the center. Though Pulse is
           | still remembered for this bias, not for how they responded to
           | it. I should also note that there is human bias in this case
           | as we have a priori knowledge of what the upsampled image
           | should look like (humans are pretty good at this when the
           | small image is already recognizable but this is a difficult
           | metric to mathematically calculate).
           | 
           | It is fairly easy to find adversarial examples, where
           | generative models produce biased results. It is FAR harder to
           | fix these. Since this is known by the community but not by
           | the public (and some community members focus on finding these
           | holes but not fixing them) it creates outrage. Probably best
           | for them to limit their release.
           | 
           | [0] https://arxiv.org/abs/2003.03808
           | 
           | [1] https://cdn.vox-cdn.com/thumbor/MXX-
           | mZqWLQZW8Fdx1ilcFEHR8Wk=...
        
         | xmonkee wrote:
         | I was hoping your conclusion wasn't going to be this as I was
         | reading that quote. But, sadly, this is HN.
        
         | swayvil wrote:
         | it isn't woke enough. Lol.
        
           | ccbccccbbcccbb wrote:
           | In discussions like this, I always head for the gray-text
           | comments to enjoy the last crumbs of the common sense in this
           | world.
        
             | ccbccccbbcccbb wrote:
             | ... and to witness the downvoters so that their cowardly
             | disgust towards truth could buy them some extra time in
             | hell :)
        
             | nullc wrote:
             | Get offline and talk to people in meat-space. You're likely
             | to find them to be much more reasonable. :)
        
               | ccbccccbbcccbb wrote:
               | Yep, the meat-space is generally a bit less woke than HN,
               | so thanks for the reminder ))
        
         | tines wrote:
         | This raises some really interesting questions.
         | 
         | We certainly don't want to perpetuate harmful stereotypes. But
         | is it a flaw that the model encodes the world as it really is,
         | statistically, rather than as we would like it to be? By this I
         | mean that there are more light-skinned people in the west than
         | dark, and there are more women nurses than men, which is
         | reflected in the model's training data. If the model only
         | generates images of female nurses, is that a problem to fix, or
         | a correct assessment of the data?
         | 
         | If some particular demographic shows up in 51% of the data but
         | 100% of the model's output shows that one demographic, that
         | does seem like a statistics problem that the model could
         | correct by just picking less likely "next token" predictions.
         | 
         | Also, is it wrong to have localized models? For example, should
         | a model for use in Japan conform to the demographics of Japan,
         | or to that of the world?
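         A minimal sketch of the sampling idea above: the difference between
         always emitting the most likely attribute (greedy decoding) and
         sampling in proportion to the learned frequencies. The attribute
         names and probabilities here are made up for illustration; a real
         diffusion model does not expose a single categorical like this.

             import numpy as np

             rng = np.random.default_rng(0)

             # Hypothetical learned distribution over one image attribute.
             attributes = ["female nurse", "male nurse"]
             probs = np.array([0.88, 0.12])   # assumed, not measured

             # Greedy decoding: the majority class wins every single time.
             greedy = [attributes[int(np.argmax(probs))]
                       for _ in range(1000)]

             # Proportional sampling: output frequencies track the data.
             sampled = rng.choice(attributes, size=1000, p=probs)

             for a in attributes:
                 print(a, "greedy:", greedy.count(a),
                       "sampled:", int((sampled == a).sum()))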
        
           | skybrian wrote:
           | Yes, there is a denominator problem. When selecting a sample
           | "at random," what do you want the denominator to be? It could
           | be "people in the US", "people in the West" (whatever
           | countries you mean by that) or "people worldwide."
           | 
           | Also, getting a random sample of _any_ demographic would be
           | really hard, so no machine learning project is going to do
            | that. Instead you've got a random sample of some arbitrary
           | dataset that's not directly relevant to any particular
           | purpose.
           | 
           | This is, in essence, a design or artistic problem: the Google
           | researchers have some idea of what they want the statistical
           | properties of their image generator to look like. What it
           | does isn't it. So, artistically, the result doesn't meet
           | their standards, and they're going to fix it.
           | 
           | There is no objective, universal, scientifically correct
            | answer about which fictional images to generate. That doesn't
            | mean all art is equally good, or that you should just ship
           | anything without looking at quality along various axes.
        
           | daenz wrote:
           | I think the statistics/representation problem is a big
           | problem on its own, but IMO the bigger problem here is
           | democratizing access to human-like creativity. Currently, the
           | ability to create compelling art is only held by those with
           | some artistic talent. With a tool like this, that restriction
           | is gone. Everyone, no matter how uncreative, untalented, or
           | uncommitted, can create compelling visuals, provided they can
           | use language to describe what they want to see.
           | 
           | So even if we managed to create a perfect model of
           | representation and inclusion, people could still use it to
           | generate extremely offensive images with little effort. I
           | think people see that as profoundly dangerous. Restricting
           | the _ability_ to be creative seems to be a new frontier of
           | censorship.
        
             | adriand wrote:
             | > So even if we managed to create a perfect model of
             | representation and inclusion, people could still use it to
             | generate extremely offensive images with little effort. I
             | think people see that as profoundly dangerous.
             | 
             | Do they see it as dangerous? Or just offensive?
             | 
             | I can understand why people wouldn't want a tool they have
             | created to be used to generate disturbing, offensive or
             | disgusting imagery. But I don't really see how doing that
             | would be dangerous.
             | 
             | In fact, I wonder if this sort of technology could reduce
             | the harm caused by people with an interest in disgusting
             | images, because no one needs to be harmed for a realistic
             | image to be created. I am creeping myself out with this
             | line of thinking, but it seems like one potential
             | beneficial - albeit disturbing - outcome.
             | 
             | > Restricting the ability to be creative seems to be a new
             | frontier of censorship.
             | 
             | I agree this is a new frontier, but it's not censorship to
             | withhold your own work. I also don't really think this
             | involves much creativity. I suppose coming up with prompts
             | involves a modicum of creativity, but the real creator here
             | is the model, it seems to me.
        
               | gknoy wrote:
                | > > ... people could still use it to generate extremely
                | > > offensive images with little effort. I think people
                | > > see that as profoundly dangerous.
                | 
                | > Do they see it as dangerous? Or just offensive?
               | 
               | I won't speak to whether something is "offensive", but I
               | think that having underlying biases in image-
               | classification or generation has very worrying secondary
               | effects, especially given that organizations like law
               | enforcement want to do things like facial recognition.
               | It's not a perfect analogue, but I could easily see some
               | company pitch a sketch-artist-replacement service that
               | generated images based on someone's description. The
               | potential for having inherent bias present in that makes
               | that kind of thing worrying, especially since the people
                | in charge of buying it aren't likely to care about, or
                | notice, the caveats.
               | 
               | It does feel like a little bit of a stretch, but at the
               | same time we've also seen such things happen with image
               | classification systems.
        
               | tines wrote:
               | > In fact, I wonder if this sort of technology could
               | reduce the harm caused by people with an interest in
               | disgusting images, because no one needs to be harmed for
               | a realistic image to be created. I am creeping myself out
               | with this line of thinking, but it seems like one
               | potential beneficial - albeit disturbing - outcome.
               | 
               | Interesting idea, but is there any evidence that e.g.
               | consuming disturbing images makes people less likely to
               | act out on disturbing urges? Far from catharsis, I'd
               | imagine consumption of such material to increase one's
               | appetite and likelihood of fulfilling their desires in
               | real life rather than to decrease it.
               | 
               | I suppose it might be hard to measure.
        
             | concordDance wrote:
             | I can't quite tell if you're being sarcastic about people
             | being able to make things other people would find offensive
             | being a problem. Are you missing an /s?
        
           | godelski wrote:
           | > But is it a flaw that the model encodes the world as it
           | really is
           | 
           | I want to be clear here, bias can be introduced at many
           | different points. There's dataset bias, model bias, and
           | training bias. Every model is biased. Every dataset is
           | biased.
           | 
           | Yes, the real world is also biased. But I want to make sure
           | that there are ways to resolve this issue. It is terribly
           | difficult, especially in a DL framework (even more so in a
           | generative model), but it is possible to significantly reduce
           | the real world bias.
        
             | tines wrote:
             | > Every dataset is biased.
             | 
             | Sure, I wasn't questioning the bias of the data, I was
             | talking about the bias of the real world and whether we
             | want the model to be "unbiased about bias" i.e. metabiased
             | or not.
             | 
             | Showing nurses equally as men and women is not biased, but
             | it's metabiased, because the real world is biased. Whether
             | metabias is right or not is more interesting than the
             | question of whether bias is wrong because it's more subtle.
             | 
             | Disclaimer: I'm a fucking idiot and I have no idea what I'm
             | talking about so take with a grain of salt.
        
               | john_yaya wrote:
               | Please be kinder to yourself. You need to be your own
               | strongest advocate, and that's not incompatible with
               | being humble. You have plenty to contribute to this
               | world, and the vast majority of us appreciate what you
               | have to offer.
        
               | Smoosh wrote:
               | Agreed. They are valid points clearly stated and a
               | valuable contribution to the discussion.
        
           | Imnimo wrote:
           | >If some particular demographic shows up in 51% of the data
           | but 100% of the model's output shows that one demographic,
           | that does seem like a statistics problem that the model could
           | correct by just picking less likely "next token" predictions.
           | 
           | Yeah, but you get that same effect on every axis, not just
           | the one you're trying to correct. You might get male nurses,
           | but they have green hair and six fingers, because you're
           | sampling from the tail on all axes.
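         A toy calculation of the effect described above, with assumed
         marginal probabilities: flattening each attribute's distribution (a
         sampling temperature above 1) to surface the rarer option on one
         axis also raises the chance of a tail value on every other axis.

             import numpy as np

             def flatten(p, temperature):
                 # Raise a distribution to 1/temperature and renormalise.
                 q = p ** (1.0 / temperature)
                 return q / q.sum()

             # Assumed P(rare value) for three unrelated attributes.
             marginals = {"male nurse": 0.12,
                          "green hair": 0.02,
                          "six fingers": 0.01}

             for t in (1.0, 2.0):
                 rare = {k: flatten(np.array([1 - v, v]), t)[1]
                         for k, v in marginals.items()}
                 p_any = 1 - np.prod([1 - r for r in rare.values()])
                 print(f"T={t}: P(at least one tail attribute) = {p_any:.2f}")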
        
             | tines wrote:
             | Yeah, good point, it's not as simple as I thought.
        
           | jonny_eh wrote:
           | > But is it a flaw that the model encodes the world as it
           | really is
           | 
           | Does a bias towards lighter skin represent reality? I was
           | under the impression that Caucasians are a minority globally.
           | 
           | I read the disclaimer as "the model does NOT represent
           | reality".
        
             | ma2rten wrote:
             | Caucasians are overrepresented in internet pictures.
        
               | pxmpxm wrote:
                | This. I would imagine this heavily correlates with
                | things like income and GDP per capita.
        
               | jonny_eh wrote:
               | Right, that's the likely cause of the bias.
        
             | tines wrote:
             | Well first, I didn't say caucasian; light-skinned includes
             | Spanish people and many others that caucasian excludes, and
             | that's why I said the former. Also, they are a minority
             | globally, but the GP mentioned "Western stereotypes", and
             | they're a majority in the West, so that's why I said "in
             | the west" when I said that there are more light-skinned
             | people.
        
             | fnordpiglet wrote:
              | Worse, these models are fed from media sourced in a society
             | that tells a different story of reality than reality
             | actually has. How can they be accurate? They just reflect
             | the biases of our various medias and arts. But I don't
             | think there's any meaningful resolution in the present
             | other than acknowledging this and trying to release more
             | representative models as you can.
        
           | ben_w wrote:
           | This sounds like descriptivism vs prescriptivism. In English
           | (native language) I'm a descriptivist, in all other languages
           | I have to tell myself to be a prescriptivist while I'm
           | actively learning and then switch back to descriptivism to
           | notice when the lessons were wrong or misleading.
        
           | karpierz wrote:
            | It depends on whether you'd like the model to learn causal or
           | correlative relationships.
           | 
           | If you want the model to understand what a "nurse" actually
           | is, then it shouldn't be associated with female.
           | 
           | If you want the model to understand how the word "nurse" is
           | usually used, without regard for what a "nurse" actually is,
           | then associating it with female is fine.
           | 
           | The issue with a correlative model is that it can easily be
           | self-reinforcing.
        
             | bufbupa wrote:
              | At the end of the day, if you ask for a nurse, should the
             | model output a male or female by default? If the input text
             | lacks context/nuance, then the model must have some bias to
             | infer the user's intent. This holds true for any image it
             | generates; not just the politically sensitive ones. For
             | example, if I ask for a picture of a person, and don't get
             | one with pink hair, is that a shortcoming of the model?
             | 
             | I'd say that bias is only an issue if it's unable to
             | respond to additional nuance in the input text. For
             | example, if I ask for a "male nurse" it should be able to
             | generate the less likely combination. Same with other
             | races, hair colors, etc... Trying to generate a model
             | that's "free of correlative relationships" is impossible
             | because the model would never have the infinitely pedantic
             | input text to describe the exact output image.
        
               | karpierz wrote:
                | > At the end of the day, if you ask for a nurse, should the
               | model output a male or female by default?
               | 
               | Randomly pick one.
               | 
               | > Trying to generate a model that's "free of correlative
               | relationships" is impossible because the model would
               | never have the infinitely pedantic input text to describe
               | the exact output image.
               | 
               | Sure, and you can never make a medical procedure 100%
               | safe. Doesn't mean that you don't try to make them safer.
               | You can trim the obvious low hanging fruit though.
        
               | pxmpxm wrote:
               | > Randomly pick one.
               | 
               | How does the model back out the "certain people would
               | like to pretend it's a fair coin toss that a randomly
               | selected nurse is male or female" feature?
               | 
               | It won't be in any representative training set, so you're
               | back to fishing for stock photos on getty rather than
               | generating things.
        
               | calvinmorrison wrote:
               | what if I asked the model to show me a sunday school
               | photograph of baptists in the National Baptist
               | Convention?
        
               | rvnx wrote:
               | The pictures I got from a similar model asking a "sunday
               | school photograph of baptists in the National Baptist
               | Convention": https://ibb.co/sHGZwh7
        
               | calvinmorrison wrote:
               | and how do we _feel_ about that outcome?
        
               | slg wrote:
               | This type of bias sounds a lot easier to explain away as
               | a non-issue when we are using "nurse" as the hypothetical
               | prompt. What if the prompt is "criminal", "rapist", or
               | some other negative? Would that change your thought
               | process or would you be okay with the system always
               | returning a person of the same race and gender that
               | statistics indicate is the most likely? Do you see how
               | that could be a problem?
        
               | tines wrote:
               | Not the person you responded to, but I do see how someone
               | could be hurt by that, and I want to avoid hurting
               | people. But is this the level at which we should do it?
               | Could skewing search results, i.e. hiding the bias of the
               | real world, give us the impression that everything is
               | fine and we don't need to do anything to actually help
               | people?
               | 
               | I have a feeling that we need to be real with ourselves
               | and solve problems and not paper over them. I feel like
               | people generally expect search engines to tell them
               | what's really there instead of what people wish were
               | there. And if the engines do that, people can get
               | agitated!
               | 
               | I'd almost say that hurt feelings are prerequisite for
               | real change, hard though that may be.
               | 
               | These are all really interesting questions brought up by
               | this technology, thanks for your thoughts. Disclaimer,
               | I'm a fucking idiot with no idea what I'm talking about.
        
               | true_religion wrote:
               | Cultural biases aren't uniform across nations. If a
               | prompt returns caucasians for nurses, and other races for
                | criminals, then most people in my country would not note
                | that as racism, simply because there are not, and there
                | have never in history been, enough caucasians resident
               | for anyone to create significant race theories about
               | them.
               | 
               | This is a far cry from say the USA where that would
               | instantly trigger a response since until the 1960s there
               | was a widespread race based segregation.
        
             | jdashg wrote:
             | Additionally, if you optimize for most-likely-as-best, you
             | will end up with the stereotypical result 100% of the time,
             | instead of in proportional frequency to the statistics.
             | 
             | Put another way, when we ask for an output optimized for
             | "nursiness", is that not a request for some ur
             | stereotypical nurse?
        
               | ar_lan wrote:
               | You could stipulate that it roll a die based on
               | percentage results - if 70% of Americans are "white",
               | then 70% of the time show a white person - 13% of the
               | time the result should be black, etc.
               | 
               | That's excessively simplified but wouldn't this drop the
               | stereotype and better reflect reality?
        
               | ghayes wrote:
               | Is this going to be hand-rolled? Do you change the prompt
               | you pass to the network to reflect the desired outcomes?
        
               | SnowHill9902 wrote:
                | No, because a user will see a particular image, not the
                | statistical ensemble. It will at times show an Eskimo
               | without a hand because they do statistically exist. But
               | the user definitely does not want that.
        
               | jvalencia wrote:
               | You could simply encode a score for how well the output
               | matches the input. If 25% of trees in summer are brown,
               | perhaps the output should also have 25% brown. The model
               | scores itself on frequencies as well as correctness.
        
               | spywaregorilla wrote:
               | Suppose 10% of people have green skin. And 90% of those
               | people have broccoli hair. White people don't have
               | broccoli hair.
               | 
               | What percent of people should be rendered as white people
               | with broccoli hair? What if you request green people. Or
               | broccoli haired people. Or white broccoli haired people?
               | Or broccoli haired nazis?
               | 
               | It gets hard with these conditional probabilities
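         A rough sketch of the conditional-probability problem above, using
         the comment's own hypothetical numbers: drawing each attribute
         independently from its marginal manufactures combinations that
         never occur, while conditioning hair on skin keeps the joint
         consistent.

             import numpy as np

             rng = np.random.default_rng(0)

             p_green = 0.10        # 10% of people are green-skinned
             p_hair_green = 0.90   # 90% of them have broccoli hair
             p_hair_other = 0.00   # nobody else does

             # Correct marginal via the law of total probability: 0.09.
             p_hair = p_green * p_hair_green + (1 - p_green) * p_hair_other

             n = 100_000
             green = rng.random(n) < p_green

             # Independent sampling: roughly 8% of people come out as
             # non-green with broccoli hair, which never occurs in the data.
             hair_indep = rng.random(n) < p_hair
             print("impossible (independent):", (~green & hair_indep).mean())

             # Conditional sampling: the impossible combination vanishes.
             hair_cond = np.where(green,
                                  rng.random(n) < p_hair_green,
                                  rng.random(n) < p_hair_other)
             print("impossible (conditional):", (~green & hair_cond).mean())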
        
               | astrange wrote:
               | The only reason these models work is that we don't
               | interfere with them like that.
               | 
               | Your description is closer to how the open source
               | CLIP+GAN models did it - if you ask for "tree" it starts
               | growing the picture towards treeness until it's all
               | averagely tree-y rather than being "a picture of a single
               | tree".
               | 
               | It would be nice if asking for N samples got a diversity
               | of traits you didn't explicitly ask for. OpenAI seems to
               | solve this by not letting you see it generate humans at
               | all...
        
             | LudwigNagasena wrote:
             | > If you want the model to understand how the word "nurse"
             | is usually used, without regard for what a "nurse" actually
             | is, then associating it with female is fine.
             | 
             | That's a distinction without a difference. Meaning is use.
        
               | tines wrote:
               | Not really; the gender of a nurse is accidental, other
               | properties are essential.
        
               | paisawalla wrote:
               | How do you know this? Because you can, in your mind,
               | divide the function of a nurse from the statistical
               | reality of nursing?
               | 
               | Are the logical divisions you make in your mind really
               | indicative of anything other than your arbitrary personal
               | preferences?
        
               | codethief wrote:
               | While not essential, I wouldn't exactly call the gender
               | "accidental":
               | 
               | > We investigated sex differences in 473,260 adolescents'
               | aspirations to work in things-oriented (e.g., mechanic),
               | people-oriented (e.g., nurse), and STEM (e.g.,
               | mathematician) careers across 80 countries and economic
               | regions using the 2018 Programme for International
               | Student Assessment (PISA). We analyzed student career
               | aspirations in combination with student achievement in
               | mathematics, reading, and science, as well as parental
               | occupations and family wealth. In each country and
               | region, more boys than girls aspired to a things-oriented
               | or STEM occupation and more girls than boys to a people-
               | oriented occupation. These sex differences were larger in
               | countries with a higher level of women's empowerment. We
               | explain this counter-intuitive finding through the
               | indirect effect of wealth. Women's empowerment is
               | associated with relatively high levels of national wealth
               | and this wealth allows more students to aspire to
               | occupations they are intrinsically interested in.
               | 
               | Source: https://psyarxiv.com/zhvre/ (HN discussion:
               | https://news.ycombinator.com/item?id=29040132)
        
               | daenz wrote:
               | The "Gender Equality Paradox"... there's a fascinating
               | episode[0] about it. It's incredible how unscientific and
               | ideologically-motivated one side comes off in it.
               | 
               | 0. https://www.youtube.com/watch?v=_XsEsTvfT-M
        
               | mdp2021 wrote:
               | Very certainly not, since use is individual and thus a
               | function of competence. So, adherence to meaning depends
               | on the user. Conflict resolution?
               | 
                | And anyway, contextually, the representational natures
               | of "use" (instances) and that of "meaning" (definition)
               | are completely different.
        
               | layer8 wrote:
               | Humans overwhelmingly learn meaning by use, not by
               | definition.
        
               | mdp2021 wrote:
               | > _Humans overwhelmingly learn meaning by use, not by
               | definition_
               | 
               | Preliminarily and provisionally. Then, they start
               | discussing their concepts - it is the very definition of
               | Intelligence.
        
           | SnowHill9902 wrote:
           | It's the same as with an artist: "hey artist, draw me a
           | nurse." "Hmm okay, do you want it a guy or girl?" "Don't ask
           | me, just draw what I'm saying." The artist can then say:
           | "Okay, but accept my biases." or "I can't since your input is
           | ambiguous."
           | 
           | For a one-shot generative algorithm you must accept the
           | artist's biases.
        
             | rvnx wrote:
              | Revert to the average (give no weight to unspecified
              | criteria: gender, age, skin color, religion, country,
              | hair style, etc.).
             | 
             | "hey artist, draw me a nurse."
             | 
             | "Hmm okay, do you want it a guy or girl?"
             | 
             | "Don't ask me, just draw what I'm saying."
             | 
             | - Ok, I'll draw you what an average nurse looks like.
             | 
             | - Wait, it's a woman! She wears a nurse blouse and she has
             | a nurse cap.
             | 
              | - Is it bad?
             | 
             | - No.
             | 
             | - Ok then what's the problem, you asked for something that
              | looked like a nurse but didn't specify anything else?
        
         | jimmygrapes wrote:
         | Are "Western gender stereotypes" significantly different than
         | non-Western gender stereotypes? I can't tell if that means it
         | counts a chubby stubble-covered man with a lip piercing, greasy
         | and dyed long hair, wearing an overly frilly dress as a DnD
         | player/metal-head or as a "woman" or not (yes I know I'm being
         | uncharitable and potentially "bigoted" but if you saw my
         | Tinder/Bumble suggestions and friend groups you'd know I'm not
         | exaggerating for either category). I really can't tell what
         | stereotypes are referred to here.
        
         | nomel wrote:
         | If you tell it to generate an image of someone eating
         | Koshihikari rice, will it be biased if they're Japanese? Should
         | the skin color, clothing, setting, etc be made completely
         | random, so that it's unbiased? What if you made it more
         | specific, like "edo period drawing of a man"? Should the person
          | drawn be of a random skin color? What about "picture of a
         | viking"? Is it biased if they're white?
         | 
         | At what point is statistical significance considered ok and
         | unbiased?
        
           | pxmpxm wrote:
           | >At what point is statistical significance considered ok and
           | unbiased?
           | 
           | Presumably when you're significantly predictive of the
           | preferred dogma, rather than reality. There's no small bit of
           | irony in machines inadvertently creating cognitive dissonance
           | of this sort; second order reality check.
           | 
           | I'm fairly sure this never actually played out well in
           | history (bourgeois pseudoscience, deutsche physik etc), so
           | expect some Chinese research bureau to forge ahead in this
           | particular direction.
        
         | bogwog wrote:
         | I wouldn't describe this situation as "sad". Basically, this
         | decision is based on a belief that tech companies should decide
         | what our society should look like. I don't know what emotion
         | that conjures up for you, but "sadness" isn't it for me.
        
         | tomp wrote:
         | > a tendency for images portraying different professions to
         | align with Western gender stereotypes
         | 
          | There are two possible ways of interpreting
         | "gender stereotypes in professions".
         | 
         |  _biased_ or _correct_
         | 
         | https://www.abc.net.au/news/2018-05-21/the-most-gendered-top...
         | 
         | https://www.statista.com/statistics/1019841/female-physician...
        
         | meetups323 wrote:
         | One of these days we're going to need to give these models a
         | mortgage and some mouths to feed and make it clear to them that
         | if they keep on developing biases from their training data
         | everyone will shun them and their family will go hungry and
         | they won't be able to make their payments and they'll just
         | generally have a really bad time.
         | 
         | After that we'll make them sit through Legal's approved D&I
         | video series, then it's off to the races.
        
           | pxmpxm wrote:
           | Underrated comment.
        
           | aaaaaaaaaaab wrote:
           | Reinforcement learning?
        
         | babyshake wrote:
         | Indeed. If a project has shortcomings, why not just acknowledge
         | the shortcomings and plan to improve on them in a future
         | release? Is it anticipated that "engineer" being rendered as a
         | man by the model is going to be an actively dangerous thing to
         | have out in the world?
        
           | makeitdouble wrote:
           | "what could go wrong anyway?"
        
         | tyrust wrote:
         | From the HN rules:
         | 
         | >Eschew flamebait. Avoid unrelated controversies and generic
         | tangents.
         | 
         | They provided a pretty thorough overview (nearly 500 words) of
         | the multiple reasons why they are showing caution. You picked
         | out the one that happened to bother you the most and have
         | posted a misleading claim that the tech is being withheld
         | entirely because of it.
        
         | devindotcom wrote:
         | Good lord. Withheld? They've published their research, they
         | just aren't making the model available immediately, waiting
         | until they can re-implement it so that you don't get racial
         | slurs popping up when you ask for a cup of "black coffee."
         | 
          | >While a subset of our training data was filtered to remove
         | noise and undesirable content, such as pornographic imagery and
         | toxic language, we also utilized LAION-400M dataset which is
         | known to contain a wide range of inappropriate content
         | including pornographic imagery, racist slurs, and harmful
         | social stereotypes
         | 
         | Tossing that stuff when it comes up in a research environment
         | is one thing, but Google clearly wants to implement this as a
         | product, used all over the world by a huge range of people. If
         | the dataset has problems, and why wouldn't it, it is perfectly
         | rational to want to wait and re-implement it with a better one.
         | DALL-E 2 was trained on a curated dataset so it couldn't
         | generate sex or gore. Others are sanitizing their inputs too
         | and have done for a long time. It is the only thing that makes
         | sense for a company looking to commercialize a research
         | project.
         | 
         | This has nothing to do with "inability to cope" and the implied
         | woke mob yelling about some minor flaw. It's about building a
         | tool that doesn't bake in serious and avoidable problems.
        
           | concordDance wrote:
            | I wonder _why_ they don't like the idea of autogenerated
           | porn... They're already putting most artists out of a job,
           | why not put porn stars out of a job too?
        
             | notahacker wrote:
             | There's definitely a market for autogenerated porn. But
             | automated porn in a Google branded model for general use
             | around stuff that isn't necessarily intended to be
             | pornographic, on the other hand...
        
             | renewiltord wrote:
             | Copenhagen ethics (used by most people) require that all
             | negative outcomes of a thing X become yours if you interact
             | with X. It is not sensible to interact with high negativity
             | things unless you are single-issue. It is logical for
             | Google to not attempt to interact with porn where possible.
        
               | dragonwriter wrote:
               | > Copenhagen ethics (used by most people)
               | 
               | The idea that most people use any coherent ethical
               | framework (even something as high level and nearly
               | content-free as Copenhagen) much less a _particular_
               | coherent ethical framework is, well, not well supported
               | by the evidence.
               | 
               | > require that all negative outcomes of a thing X become
               | yours if you interact with X. It is not sensible to
               | interact with high negativity things unless you are
               | single-issue.
               | 
               | The conclusion in the final sentence only makes sense if
               | you use "interact" in an incorrect way describing the
               | Copenhagen interpretation of ethics, because the original
               | description is only correct if you include observation as
               | an interaction. By the time you have noted a thing is
               | "high-negativity", you have observed it and acquired
                | responsibility for its continuation under the Copenhagen
               | interpretation; you cannot avoid that by choosing not to
               | interact once you have observed it.
        
               | renewiltord wrote:
               | I'm sure you are capable of steelmanning the argument.
        
         | seaman1921 wrote:
         | Yup this is what happens when people who want headlines nitpick
         | for bullshit in a state-of-the-art model which simply reflects
         | the state of the society. Better not to release the model
         | itself than keep explaining over and over how a model is never
         | perfect.
        
         | [deleted]
        
         | makeitdouble wrote:
         | > Really sad that breakthrough technologies are going to be
         | withheld due to our inability to cope with the results.
         | 
         | Genuinely, isn't it a prime example of the people actually
         | stopping to think if they should, instead of being preoccupied
          | with whether or not they could?
        
         | ccbccccbbcccbb wrote:
         | In short, the generated images are too gender-challenged-
         | challenged and underrepresent the spectrum of new normalcy!
        
         | alphabetting wrote:
         | There is a contingent of AI activists who spend a ton of time
         | on Twitter that would beat Google like a drum with help from
         | the media if they put out something they deemed racist or
         | biased.
        
         | Mizza wrote:
         | So glad the company that spies on me and reads my email for
         | profit is protecting me from pictures that don't look like TV
         | commercials.
        
           | astrange wrote:
           | Gmail doesn't read your email for ads anymore. They read it
           | to implement spam filters, and good thing too. Having working
           | spam filters is indeed why they make money though.
        
         | ceeplusplus wrote:
         | The ironic part is that these "social and cultural biases" are
         | purely from a Western, American lens. The people writing that
         | paragraph are completely oblivious to the idea that there could
         | be other cultures other than the Western American one. In
         | attempting to prevent "encoding of social and cultural biases"
         | they have encoded such biases themselves into their own
         | research.
        
           | kevinh wrote:
           | What makes you think the authors are all American?
        
             | umeshunni wrote:
              | The authors are listed on the page, and from a quick look
              | at LinkedIn they seem to be mostly Canadian.
        
           | tantalor wrote:
           | https://en.wikipedia.org/wiki/Moral_relativism
        
           | not2b wrote:
           | It seems you've got it backwards: "tendency for images
           | portraying different professions to align with Western gender
           | stereotypes" means that they are calling out their own work
           | precisely because it is skewed in the direction of Western
           | American biases.
        
             | LudwigNagasena wrote:
             | You think there are homogenous gender stereotypes across
             | the whole Western world? You say "woman" and someone will
             | imagine a SAHM, while another person will imagine a you-go-
             | girl CEO with tattoos and pink hair.
             | 
             | What they mean is people who think not like them.
        
             | ceeplusplus wrote:
             | Yes, the idea is that just because it doesn't align to
             | Western ideals of what seems unbiased doesn't mean that the
             | same is necessarily true for other cultures, and by failing
             | to release the model because it doesn't conform to Western,
             | left wing cultural expectations, the authors are ignoring
             | the diversity of cultures that exist globally.
        
               | howinteresting wrote:
               | No, it's coming from a perspective of moral realism. It's
               | an objective moral truth that racial and ethnic biases
               | are bad. Yet most cultures around the world are racist to
                | at least some degree, and to the extent that the
                | cultures are, they are bad.
               | 
               | The argument you're making, paraphrased, is that the idea
               | that biases are bad is itself situated in particular
               | cultural norms. While that is true to some degree, from a
               | moral realist perspective we can still objectively judge
               | those cultural norms to be better or worse than
               | alternatives.
        
               | tomp wrote:
               | You're confused by the double meaning of the word "bias".
               | 
               | Here we mean _mathematical_ biases.
               | 
               | For example, a good mathematical model will correctly
               | tell you that people in Japan (geographical term) are
               | more likely to be Japanese (ethnic / racial bias). That's
               | not "objectively morally bad", but instead, it's
               | "correct".
        
             | young_unixer wrote:
             | The very act of mentioning "western gender stereotypes"
             | starts from a biased position.
             | 
             | Why couldn't they be "northern gender stereotypes"? Is the
             | world best explained as a division of west/east instead of
             | north/south? The northern hemisphere has much more
             | population than the south, and almost all rich countries
             | are in the northern hemisphere. And precisely it's these
             | rich countries pushing the concept of gender stereotypes.
             | In poor countries, nobody cares about these "gender
             | stereotypes".
             | 
             | Actually, the lines dividing the earth into north and
             | south, east and west hemispheres are arbitrary, so maybe
             | they shouldn't mention the word "western" to avoid the
             | propagation of stereotypes about earth regions.
             | 
             | Or why couldn't they be western age stereotypes? Why are
             | there no kids or very old people depicted as nurses?
             | 
             | Why couldn't they be western body shape stereotypes? Why
             | are there so few obese people in the images? Why are there
             | no obese people depicted as athletes?
             | 
             | Are all of these really stereotypes or just natural
             | consequences of natural differences?
        
               | joshcryer wrote:
               | The bulk of the trained data is from western technology,
               | images, books, television, movies, photography, media.
               | That's where the very real and recognized biases come
                | from. They're the result of a _gap in data_, nothing more.
               | 
               | Look at how DALL-E 2 produces little bears rather than
               | bear sized bears. Because its data doesn't have a lot of
               | context for how large bears are. So you wind up having to
               | say "very large bear" to DALL-E 2.
               | 
               | Are DALL-E 2 bears just a "natural consequence of natural
               | differences"? Or is the model not reflective of reality?
        
       | andybak wrote:
       | Great. Now even if I do get a Dall-E 2 invite I'll still feel
       | like I'm missing out!
        
         | rvnx wrote:
         | It's always the same with AI research: "we have something
         | amazing but you can't use it because it's too powerful and we
         | think you are an idiot who cannot use your own judgement."
        
           | andybak wrote:
           | As someone that spent an evening trying to generate images of
           | Hitler Lego I think they have a point.
        
           | [deleted]
        
           | 2bitencryption wrote:
           | I can understand the reasoning behind this, though.
           | 
           | Dall-E had an entire news cycle (on tech-minded publications,
           | that is) that showcased just how amazing it was.
           | 
           | Millions* of people became aware that technology like Dall-E
           | exists, before anyone could get their hands on it and abuse
            | it. (*a guesstimate, but surely a close one)
           | 
           | One day soon, inevitably, everyone will have access to
           | something 10x better than Imagen and Dall-E. So at least the
           | public is slowly getting acclimated to it before the
           | inevitable "theater-goers running from a projected image of a
           | train approaching the camera" moment
        
       | the__alchemist wrote:
       | I'll be skeptical until I see it in action, vice pre-selected
       | results.
        
       | bergenty wrote:
       | Primarily Indian origin authors on both the DALL-E and this
       | research paper. Just found that impressive considering they make
       | up 1% of the population in the US.
        
       | sexy_panda wrote:
       | Would I have to implement this myself, or is there something
       | ready to run?
        
         | UncleOxidant wrote:
         | I think implementing this yourself is likely not doable unless
         | you have the computing resources of a Google, Amazon or
         | Facebook.
        
       | manchmalscott wrote:
       | The big thing I'm noticing over DALL-E is that it seems to be
        | better at relative positioning. In an MKBHD video about DALL-E it
        | would get the elements but not always in the right order. I know
        | Google curated some specific images, but it seems to be doing a
       | better job there.
        
         | benwikler wrote:
         | Totally--Imagen seems better at composition and relative
         | positioning and text, while DALL-E seems better at lighting,
         | backgrounds, and general artistry.
        
       | visarga wrote:
       | Interesting discovery they made
       | 
       | > We show that scaling the pretrained text encoder size is more
       | important than scaling the diffusion model size.
       | 
       | There seems to be an unexpected level of synergy between text and
       | vision models. Can't wait to see what video and audio modalities
       | will add to the mix.
        
       | jeffbee wrote:
       | Is there anything at all, besides the training images and labels,
       | that would stop this from generating a convincing response to "A
       | surveillance camera image of Jared Kushner, Vladimir Putin, and
       | Alexandria Ocasio-Cortez naked on a sofa. Jeffrey Epstein is
       | nearby, snorting coke off the back of Elvis"?
        
       ___________________________________________________________________
       (page generated 2022-05-23 23:00 UTC)