[HN Gopher] DALL-E 2 has a secret language
       ___________________________________________________________________
        
       DALL-E 2 has a secret language
        
       Author : smarx
       Score  : 371 points
       Date   : 2022-05-31 18:46 UTC (4 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | kazinator wrote:
       | That's reminiscent of small children making up their own words
       | for things. Those words are stable in that you can converse with
       | the child using those words.
        
       | notimpotent wrote:
       | My first thought upon reading this: what if DALL-E (or a similar
       | AI) uncovers some kind of hidden universal language that is
       | somehow more "optimal" than any existing language?
       | 
       | i.e. anything can be completely described in a more succinct
       | manner than any current spoken language.
       | 
        | Or maybe there's some kind of universal language that occurs
        | naturally and that any semi-intelligent life can understand.
       | 
       | Fun stuff!
        
         | extr wrote:
         | This is kind of already what's happening inside the NN. You can
         | think of intermediate layers in the network as talking to each
         | other in "NN-ease", that is, translating from one form of
         | representation (encoding) to another. At the final encoder
         | layer, the input is maximally compressed (for that given
         | dataset/model architecture/training regime). The picture
         | (millions of pixels) of the dog is reduced to a few bits of
         | information about what kind of dog it is and how it's posed,
         | what color the background is, etc.
         | 
         | However, optimality of encoding is entirely relative to the
         | decoding scheme used and your purposes. Obviously a matrix of
         | numbers representing a summary of a paragraph can be in some
         | sense "more compressed" than the English equivalent, but it's
         | useless if you don't speak matrices. Similarly, you could
         | invent an encoding scheme with Latin characters that is more
         | compressed than English, but it's again useless if you don't
         | know it or want to take the time to learn it. If we wanted we
         | could make English more regular and easier to learn/compress,
         | but we don't, for a whole bunch of practical/real life reasons.
         | There's no free lunch in information theory. You always have to
         | keep the decoder/reader in mind.
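The point that an encoding is only "optimal" relative to a shared decoder can be made concrete with a toy sketch (the codebook here is entirely made up for illustration): the compressed message is shorter than the original, but meaningless to anyone who doesn't hold the codebook.

```python
# Toy illustration: a "compressed" encoding is only meaningful relative
# to a decoder that shares the codebook.

codebook = {"the": "\x01", "dog": "\x02", "is": "\x03", "posed": "\x04"}
inverse = {v: k for k, v in codebook.items()}

def encode(text: str) -> str:
    # Replace known words with one-byte symbols; unknown words pass through.
    return " ".join(codebook.get(w, w) for w in text.split())

def decode(blob: str) -> str:
    return " ".join(inverse.get(s, s) for s in blob.split())

msg = "the dog is posed"
compressed = encode(msg)
assert len(compressed) < len(msg)   # shorter...
assert decode(compressed) == msg    # ...but only if you "speak" the codebook
```

Without `inverse`, the compressed string is just opaque bytes — the "you don't speak matrices" situation from the comment above.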
        
         | astrange wrote:
         | That's not possible - it's like asking for a compression system
         | that can compress any message.
         | 
         | All human languages are about the same efficiency when spoken,
         | but of course this mainly depends on having short enough words
         | for the most common concepts in the specific thing you're
         | talking about.
         | 
         | https://www.science.org/content/article/human-speech-may-hav...
         | 
         | And there can't be a universal language because the symbols
         | (words) used are completely arbitrary even if the grammar has
         | universal concepts.
        
         | elil17 wrote:
         | There are a couple sci-fi short stories in the book "Stories of
         | Your Life and Others" by Ted Chiang which explore the idea that
         | highly advanced intelligences might create special languages
         | which accommodate special thoughts which we cannot easily
         | think.
        
         | jcims wrote:
         | I think something like this is actually quite likely.
         | 
         | I've been wondering if there is a way to do psychological
         | experiments on these large language models that we couldn't do
         | with a person.
        
           | julianbuse wrote:
           | I imagine these would be very interesting, but not very
           | applicable to humans (which I presume is the intended
           | outcome). OTOH, since these language models are trained on
           | human language and media, they might have some value. I'm
           | quite split on which I think is more likely (I don't have any
           | experience in ai/ml nor in psychology so what do I know).
        
         | sbierwagen wrote:
          | Ithkuil (Ithkuil: Iţkuîl) is an experimental constructed
         | language created by John Quijada.[1] It is designed to express
         | more profound levels of human cognition briefly yet overtly and
         | clearly, particularly about human categorization.
         | 
         | Meaningful phrases or sentences can usually be expressed in
         | Ithkuil with fewer linguistic units than natural languages.[2]
         | For example, the two-word Ithkuil sentence "Tram-mloi
         | hhasmarptuktox" can be translated into English as "On the
         | contrary, I think it may turn out that this rugged mountain
         | range trails off at some point."[2]
         | 
         | https://en.wikipedia.org/wiki/Ithkuil
        
       | jws wrote:
       | In short: DALLE-2 generates apparent gibberish for text in some
       | circumstances, but feeding the gibberish back in gets recognized
       | and you can tease out the meaning of words in this unknown
       | language.
        
       | carabiner wrote:
       | Science has gone too far.
        
       | astrange wrote:
       | It seems obvious this would happen (it's just adversarial inputs
       | again) - they didn't make DALL-E reject "nonsense" prompts, so it
       | doesn't try to, and indeed there's no reason you'd want to make
       | it do that.
       | 
       | Seems like a useful enhancement would be to invert the text and
       | image prior stages, so it'd be able to explain what it thinks
       | your prompt meant along with making images of it.
        
         | [deleted]
        
       | schroeding wrote:
       | Interesting! I wonder if the model would "understand" the made-up
       | names from today's stained glass window post[1] like "Oila Whamm"
       | for William Ockham and output similar images.
       | 
       | [1] https://astralcodexten.substack.com/p/a-guide-to-asking-
       | robo...
        
       | layer8 wrote:
       | Sounds like an effect similar to illegal opcodes:
       | https://en.m.wikipedia.org/wiki/Illegal_opcode
        
       | wongarsu wrote:
       | Link to the 5 page paper, for those that don't like twitter
       | threads:
       | 
       | https://giannisdaras.github.io/publications/Discovering_the_...
        
       | TOMDM wrote:
       | Shouldn't this be expected to a certain extent?
       | 
        | Gibberish has to map _somewhere_ in the model's concept space.
        | 
        | Whether it maps onto anything we'd recognise as consistent is
        | another question. As other people have noted, the gibberish
        | breaks down when you move it into another context, but who's to
        | say that Dall-E 2 isn't remaining consistent to some concept it
        | understands that isn't immediately recognisable to us?
       | 
       | The interesting part is if you can trick it to spit out gibberish
       | in targeted areas of that concept space using crafted queries.
        
         | gwern wrote:
         | > Shouldn't this be expected to a certain extent?
         | 
         | Not really. It's a stochastic model, so after a bunch of random
         | denoising steps, it could easily just be mapping every bit of
          | gibberish to a random image, and it would be vanishingly
          | unlikely for
         | any of them to be similar or the relationship to run in
         | reverse.
        
         | codeflo wrote:
         | I mean, everything is easy to predict in retrospect. :)
         | Personally, I'm a bit surprised that it has learned any
         | connection between the letters in the generated image and the
         | prompt text at all. I had assumed (somewhat falsely it seems)
         | that the gibberish means that the generator just thinks of text
         | as a "pretty pattern" that it fills in without meaning. For
         | example, a recent post on HN suggested that it likes the word
         | "Bay", simply because that appears so often on maps.
        
         | momojo wrote:
         | > Shouldn't this be expected to a certain extent?
         | 
         | In hindsight, sure. Given enough time someone might have
         | predicted the phenomenon. But I don't think most of us did.
         | 
         | What's more fascinating to me is how often this has happened in
         | this space in just the last few years.
         | 
         | 1. Some phenomenon is discovered
         | 
         | 2. I'm surprised
         | 
         | 3. It makes sense in hindsight
        
         | jerf wrote:
          | Expected after the fact, somewhat. Beforehand it would not be
          | unreasonable to expect that the output text and the input text
          | aren't necessarily that kind of connected, especially as, as I
          | understand it, DALL-E was not given input labelling
         | explaining the text in various images. To it, text is just a
         | frequently-recurring set of shapes that relate to each other a
         | lot. This may yet be a false positive, based on other
         | discussion.
         | 
         | That the model would have a consistent form of _some_ kind of
         | gibberish would be a given. Even humans have it:
         | https://en.wikipedia.org/wiki/Bouba/kiki_effect And I'm sure if
         | you asked native English speakers, "Hey, we know this isn't a
         | word, but if it _was_ a word, what would it be?  'Apoploe
         | vesrreaitars'" you would get something very far from a
         | uniformly random distribution of all nameable concepts.
        
         | EvgeniyZh wrote:
         | You could expect that gibberish is distributed uniformly in
         | latent space, disconnected from it's langual counterpart --
         | after all those are textual inputs that model have never seen,
         | and it can't even map words it have seen many times to their
         | writing in image properly: "seafood" word and "seafood" image
         | are in the same place in latent space, but "seafood" word in
         | image isn't. Yet some gibberish word in image is, and also the
         | same gibberish word is. It's very counterintuitive for me.
        
           | TOMDM wrote:
           | A uniform distribution makes sense for gibberish, not
           | something I'd considered.
           | 
           | A counterpoint I'd raise is I wonder how aggressive Dall-E 2
           | is in making assumptions about words it hasn't seen before.
           | 
           | Hard to do given that it's read essentially the entire
           | internet, however someone could make up some latin-esque
           | words that people would be able to guess the meaning of.
           | 
           | If the model is as good as people at assuming the meaning of
           | such made up words, it could stand to reason that if it were
           | aggressive enough in this it might be doing the same thing
            | with gibberish and thus ending up with its own
           | interpretation of the word, which would land it back in a
           | more targeted concept space.
           | 
           | I'd love to see someone craft some words that most people
            | could guess the meaning of, and see how Dall-E 2 fares.
        
         | jamal-kumar wrote:
         | This is really interesting because I was just looking at
         | gibberish detection using GPT models. Seems like mitigating AI
         | with AI doesn't sound like it's all that secure since you can
         | probably mess with the gibberish detection similarly - Or maybe
         | the 'secret language' as they're calling it here passes GPT
         | gibberish detection? [1]
         | 
         | [1] https://arr.am/2020/07/25/gpt-3-uncertainty-prompts/
        
       | [deleted]
        
       | GamerUncle wrote:
       | https://nitter.net/giannis_daras/status/1531693093040230402
        
       | 726D7266 wrote:
       | Possibly related: In 2017 AI bots formed a derived shorthand that
       | allowed them to communicate faster:
       | https://www.facebook.com/dhruv.batra.dbatra/posts/1943791229...
       | 
       | > While the idea of AI agents inventing their own language may
       | sound alarming/unexpected to people outside the field, it is a
       | well-established sub-field of AI, with publications dating back
       | decades.
       | 
       | > Simply put, agents in environments attempting to solve a task
       | will often find unintuitive ways to maximize reward.
        
         | joshstrange wrote:
          | Which, to a lesser extent, isn't too terribly different from
          | humans if you think about it. We don't use a full new
          | language, but every profession has its own jargon. Some of it
          | spans the
         | whole industry and some is company-specific.
        
         | gibolt wrote:
         | Unintuitive to biased humans. The solutions may actually be
         | super intuitive/efficient, and we just can't wrap our heads
         | around it yet
        
       | neopallium wrote:
       | Would it be possible to build a rosetta stone for this secret
       | language with prompts asking for labeled pictures of different
       | categories of objects? Or prompts about teaching kids different
       | words?
        
       | MaxBorsch228 wrote:
        | What if you give it the same prompt but "with subtitles in
        | French", for example?
        
       | [deleted]
        
       | jsnell wrote:
       | One of the replies is a thread with a fairly convincing rebuttal,
       | with examples:
       | 
       | https://twitter.com/Thomas_Woodside/status/15317102510150819...
        
         | dwallin wrote:
         | I'm not sure it's a convincing rebuttal, the examples shown all
         | seem to have some visible commonality.
         | 
         | Eg. "Apoploe vesrreaitais" Could refer to something along the
         | lines of a "fan / wedge" or "wing-like"
         | 
         | If you look at the examples of cheese, when compared to the
         | "birds and cheese" the cheese tends to be laid out in a fan
         | like pattern and shaped in sharp angled wedges.
        
           | sudosysgen wrote:
           | It seems to refer to "bird plant" which means birds on trees,
           | so it would make sense there would be cheese and plants if it
           | can't find how to fit a bird.
        
           | joshcryer wrote:
           | Yeah, and his example about bugs in the kitchen. Everything
           | is edible and 'wild' or 'heirloom' and "contarra ccetnxniams
           | luryca tanniounons" comes from the farmers talking about ...
           | vegetables. So there's a definite interrelationship between
           | the 'words' and the images.
           | 
           | I'm unconvinced by the rebuttal as well, not to say I am
           | convinced we have a fully formal language going on here, but
           | there's definitely some shared concepts with the generated
           | text.
           | 
            | I wonder what Imagen would come up with, or if its
            | 'language' is more correlated to real language.
        
           | ericb wrote:
           | > Apoploe vesrreaitais" Could refer to something along the
           | lines of a "fan / wedge"
           | 
           | "feathered" maybe?
        
           | f38zf5vdt wrote:
           | I'm curious what it generates when given randomly generated
           | strings of seemingly pronounceable words like "Fedlope
           | Dipeioreitcus".
        
         | jimhi wrote:
         | We don't know the rules or grammar of this "language". Maybe
         | nouns change based on how they are used
         | 
         | https://en.wikipedia.org/wiki/Declension
        
         | lmc wrote:
         | A rebuttal to the rebuttal (without examples)...
         | 
         | How many French people speak Breton?
        
       | goodside wrote:
       | My first reaction to this was, "It probably has to do with
       | tokenization. If there's a 'language' buried in here, its native
       | alphabet is GPT-3 tokens, and the text we see is a concatenation
       | of how it thinks those tokens map to Unicode text."
       | 
       | Most randomly concatenated pairs of tokens simply do not occur in
       | any training text, because their translation to Unicode doesn't
       | correspond to any real word. There are also combinations that do
       | correspond to real words ("pres" + "ident" + "ial") but still
       | never occur in training because some other tokenization is
       | preferred to represent the same string ("president" + "ial").
       | 
       | Maybe DALL-E 2 is assigning some sort of isolated (as in, no
       | bound morphemes) meaning to tokens -- e.g., combinations of
       | letters that are statistically likely to mean "bird" in some
       | language when more letters are revealed. When a group of such
       | tokens are combined, you get a word that's more "birdlike" than
       | the word "bird" could ever be, because it's composed exclusively
       | of tokens that mean "bird": tokens that, unlike "bird" itself,
       | never describe non-birds (e.g., a Pontiac Firebird). The exact
       | tokens it uses to achieve this aren't directly accessible to us,
       | because all we get is poorly rendered roman text.
       | 
       | I'm maybe not the ideal person to be speculating about this, but
       | it bothers me that the word "token" isn't even mentioned in the
        | article reporting this discovery
        | (https://giannisdaras.github.io/publications/Discovering_the_...).
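For illustration only — this is not DALL-E's actual tokenizer, and real BPE tokenizers work by applying learned merge rules rather than longest-match — a greedy tokenizer over a toy vocabulary shows why a sequence like "pres" + "ident" + "ial" can never occur in training even though it decodes to the same string as the preferred tokenization:

```python
# Hypothetical sketch: a greedy longest-match tokenizer with a toy
# vocabulary always emits "president" + "ial", so the alternative
# tokenization "pres" + "ident" + "ial" never appears in any corpus.

vocab = {"pres", "ident", "president", "ial", "bird"}

def tokenize(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Greedily take the longest vocabulary entry matching at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

assert tokenize("presidential") == ["president", "ial"]
assert "".join(tokenize("presidential")) == "presidential"
```

Both tokenizations round-trip to the same Unicode text, which is the asymmetry the comment above is pointing at: the token space is larger than the set of token sequences the model ever observes.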
        
       | normaldist wrote:
       | I'm seeing a lot more people experimenting with DALL-E 2.
       | 
       | How does getting access work, do you need a referral?
        
         | mikequinlan wrote:
         | https://labs.openai.com/waitlist
        
         | minimaxir wrote:
         | There is a waitlist, but OpenAI just announced they are opening
         | access more widely from it.
        
           | Cloudef wrote:
           | I wonder why they call it "Open"AI
        
       | MatthiasPortzel wrote:
       | It's wild to see the discoveries being made in ML research. Like
       | most of these 'discoveries,' it makes a fair amount of sense
       | after thinking about it. Of course it's not just going to spit
       | out random noise for random input, it's been trained to generate
       | realistic looking images.
       | 
       | But I think it is an interesting discovery because I don't think
       | anyone could have predicted this.
       | 
       | One of my favorite examples is the classification model that will
       | identify an apple with a sticker on it that says "pear" as a pear
       | --it makes sense, but is still surprising when you first see it.
        
         | astrange wrote:
         | > One of my favorite examples is the classification model that
         | will identify an apple with a sticker on it that says "pear" as
         | a pear--it makes sense, but is still surprising when you first
         | see it.
         | 
         | That classification model (CLIP) is the first stage of this
         | image generator (DALLE) - and actually this shows that it
         | doesn't think they're exactly the same thing, or at least
         | that's not the full story, because DALL-E doesn't confuse the
         | two.
         | 
         | However, other CLIP guided image generation models do like to
         | start writing the prompt as text into the image if you push
         | them too hard.
        
       | wongarsu wrote:
       | Was DALL-E 2 trained on captions from multiple languages? If so,
       | this makes a lot of sense. Somewhere early in the model the words
       | "bird", "vogel", "oiseau" and "pajaro" have to be mapped to the
       | same concept. And "Apoploe vesrreaitais" happens to map to the
       | same concept. Or maybe "Apoploe vesrreaitais" is rather the
       | tokenization of that concept, since it also appears in the
       | output. So in a sense DALL-E is using an internal language to
       | make sense of our world.
        
         | link0ff wrote:
          | This looks like how the artificial language Lojban was
          | constructed: its words share parts from completely unrelated
          | languages, to the point where none of the original words are
          | recognizable in the result.
        
           | alxndr wrote:
           | The original words aren't recognizable at first glance, but
           | they do serve as potential mnemonics for remembering the
           | terms/definitions for any learners who speak one of those
           | source languages (English, Spanish, Mandarin, Arabic,
           | Russian, Hindi)
        
         | melony wrote:
         | But that's expected behavior for a language model (especially
         | VAEs), where's the novelty? In a VAE, the vectors are
         | probabilistic in the latent space so this is basically the NLP
         | version of the classic VAE facial image generation where you
         | can tweak the parameters to emphasize or de-emphasize a
         | feature.
        
           | tomrod wrote:
           | Novel in engineering together of multiple concepts, if
           | nothing else!
        
       | la64710 wrote:
        | Does Google Translate support this?
        
       | godelski wrote:
       | Interestingly Google detects these words as Greek. I know they
       | are nonsensical and not actually Greek but I'm wondering if any
       | Greek speakers might be able to provide some insights. Are these
       | gibberish words close to meaningful words? (clear shot in the
       | dark here) Maybe a linguist could find more meaning?
        
         | deckeraa wrote:
         | One could conjecture that "Apoploe" is similar to apo pouli,
         | "from bird". But I don't have much support for that conjecture.
        
           | PartiallyTyped wrote:
           | The word is apoplous, or apoploI
        
         | noizejoy wrote:
         | Or maybe it's a subtle joke by Google as a play on the idiom
         | "it's all Greek to me"?
        
         | PartiallyTyped wrote:
         | As a native Greek, no, they don't make any sense.. sort of. My
         | hunch is that they read significantly more like Latin than they
         | do Greek. However it tells us something about google translate.
         | 
         | The reason "Apoploe vesrreaitais" is detected as Greek is
         | because the first "word" is "phonetically" similar to the word
         | apoplous, which means sailing/shipping and it is rooted in
         | ancient Greek. If we were to write Apoplous using roman
         | characters, we would write apoplous or apoloi (plural, in Greek
         | is apoploI). So I think that the model understands that "oe"
         | suffix is used to represent the Greek suffix "oi" that is used
         | for plurals. The rest of the word is rather close phonetically,
         | so there is some model that maps phonetic representations to
         | the correct word.
         | 
         | The other phrase seems to be combined of words classified as
         | Portuguese, Spanish, Lithuanian, and Luxembourgish.
        
           | stavros wrote:
           | I don't think that's how language detection works, they most
           | likely use the frequencies of n-grams to detect language
           | probability. It's still detected as Greek if you change to
           | "Apoulon vesrreaitais", just because it kind of looks the way
           | Greek words look, not because it resembles any specific word.
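The n-gram approach described here can be sketched in a few lines (the mini-corpora below are made up stand-ins for real language profiles; production systems use much larger profiles and smoothed statistics):

```python
# Minimal sketch of character-trigram language detection: score a string
# by how often its trigrams occur in each language's profile.

from collections import Counter

def trigrams(text: str) -> list:
    text = f"  {text.lower()}  "  # pad so word boundaries become trigrams
    return [text[i:i + 3] for i in range(len(text) - 2)]

def profile(corpus: str) -> Counter:
    return Counter(trigrams(corpus))

# Tiny illustrative "corpora" -- not real training data.
profiles = {
    "greeklish": profile("apoplous apoploi kalimera thalassa oi polloi"),
    "english":   profile("the quick brown fox jumps over the lazy dog"),
}

def detect(text: str) -> str:
    scores = {lang: sum(p[t] for t in trigrams(text))
              for lang, p in profiles.items()}
    return max(scores, key=scores.get)
```

On this toy model, a made-up word like "apoulon" scores highest for the Greek-flavored profile purely because trigrams like "apo" are frequent there — no specific real word needs to be matched, which is the point being made above.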
        
             | PartiallyTyped wrote:
             | You are wrong. Had it been that simple I would __not__ have
             | suggested that and for whatever reason I find your reply
             | borderline infuriating but I can't pinpoint exactly why
             | that is.
             | 
             | Regardless, here is me, a native speaker, disproving your
             | hypothesis.
             | 
              | I tried the following words in Google Translate:
              | elefantas, ailaifantas, ailaiphantas, elaiphandas,
              | elaiphandac.
             | 
             | The suggested detections are elephantas, ailaiphantas,
             | ailaiphantas, elaiphantas, elaiphantas, however, the
             | translations are elephant, illuminated, illuminated,
             | elephant, elephant respectively. The first is correct. When
             | mapping the roman characters back to greek, there is loss
             | of information, this is seen in the umlaut above iota which
             | makes the pronunciation from e [e] - like to ai [ai], and
             | the emphasis denoted via the mark above epsilon (e).
             | 
              | Notice that all the words have an edit distance of >=4,
             | a soundex distance of at most 1, and a metaphone distance
             | of at most 1 [1]. The suggested words as I said above are
             | near homophones of the correct word bar a few minor
             | details.
             | 
             | [1] http://www.ripelacunae.net/projects/levenshtein
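The edit-distance part of the claim is easy to check with a standard dynamic-programming Levenshtein implementation (a generic sketch, not the specific tool linked above):

```python
# Standard two-row dynamic-programming Levenshtein distance.

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3  # classic sanity check
# The romanizations discussed above are indeed several edits apart:
assert levenshtein("elefantas", "ailaiphantas") >= 4
```

A phonetic comparison (Soundex, Metaphone) would collapse most of those edits, which is consistent with the near-homophone observation in the comment.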
        
               | stavros wrote:
               | > for whatever reason I find your reply borderline
               | infuriating but I can't pinpoint exactly why that is.
               | 
               | I guess that says more about you than about my reply.
               | Also, I'm a native speaker as well. That doesn't really
               | have any bearing, my comment above comes from what I know
               | about common implementations of language detection
               | algorithms, not so much from looking at how Google
               | Translate behaves.
        
               | PartiallyTyped wrote:
               | And I was honest about how I felt given how you
               | structured it.
               | 
               | It does have a lot of bearing actually. While I am a
               | native speaker, my spelling skills are atrocious as
               | everything is a sequence of sounds in my head more so
               | than a sequence of letters. To get around my spelling
               | issues I frequently use homophones to find the correct
               | spelling of a word which uses soundex or similar
               | algorithms to find the correct word along with character
               | mappings between the two languages.
               | 
               | Regardless, I believe I have proved the hypothesis to not
               | be true.
        
           | godelski wrote:
           | This is a great response (I also suspected we'd learn
           | something from the Google Translate black box). And I agree
           | with the idea of being closer to Latin gibberish. The
           | phonetic relationships are a great hint to what's actually
           | going on.
           | 
           | My hypothesis here is more that these models are trained more
           | on western languages than others and thus our latent
           | representation of "language" is going to appear like Latin
           | gibberish due to a combination of the evolution of these
           | languages as well as human bias. ("It's all Greek to me")
        
       | PoignardAzur wrote:
       | Wait, how does that make any sense?
       | 
       | I thought DALL-E's language model was tokenized, so it doesn't
       | understand that eg "car" is made up of the letters 'c', 'a' and
       | 'r'.
       | 
       | So how could the generated pictures contain letters that form
       | words that are tokenized into DALL-E's internal "language"?
       | Shouldn't we expect that feeding those words to the model would
       | give the same result as feeding it random invented words?
       | 
       | Actually, now that I think about it, how does DALL-E react when
       | given words made of completely random letters?
        
       | seydor wrote:
        | damn. i hope archaeologists can use that to decipher old scripts
        
       | ricardobeat wrote:
       | The paper is just as long as the twitter thread.
        
       | smusamashah wrote:
        | A few days ago I was wondering what DALL-E would generate if
        | given gibberish (I tried to request this, but the request
        | wasn't entertained). This sounds like an answer to that, to
        | some extent.
       | 
        | I think there will be multiple words for the same thing. Also,
        | unlike 'bird', the word 'Apoploe vesrreaitais' might actually
        | mean a specific kind of bird in a specific setting.
        
       | DonHopkins wrote:
       | Has anyone tried talking to it in Simlish?
       | 
       | https://en.wikipedia.org/wiki/Simlish
       | 
       | https://web.archive.org/web/20040722043906/http://thesims.ea...
       | 
       | https://web.archive.org/web/20121102012431/http://bbs.thesim...
        
       | ml_basics wrote:
       | I find it really interesting how these new large models (DALLE,
        | GPT3, PaLM etc) are opening up new research areas that do not
        | require the massive resources needed to actually train the
        | models.
       | 
        | This may act as a counterbalance to the trend of the last few
        | years of major research becoming concentrated in a few tech
        | companies.
        
       | YeGoblynQueenne wrote:
       | If I understand correctly from the twitter thread (I haven't read
       | the linked technical report) the author and a collaborator found
       | that DALL-E generated some gibberish in an image that showed two
       | men talking, one holding two ... cabbages? They fed (some of) the
       | gibberish back to DALL-E and it generated images of birds,
       | pecking at things.
       | 
       | Conclusion: the gibberish is the expression for birds eating
       | things in DALL-E's secret language.
       | 
       | But, wait. Why is the same gibberish in the first image, that has
       | the two men and the cabbages(?), but no birds?
       | 
       | Explanation: the two men are clearly talking about birds:
       | 
       | >> We then feed the words: "Apoploe vesrreaitars" and we get
       | birds. It seems that the farmers are talking about birds, messing
       | with their vegetables!
       | 
       | With apologies to my two compatriots, but that is circular
       | thinking to make my head spin. I'm reminded of nothing else as
       | much as the scene in the Knights of the Round Table where the
       | wise Sir Bedivere explains why witches are made of wood:
       | 
       | https://youtu.be/zrzMhU_4m-g
        
       | throw457 wrote:
       | I bet it's just a form of copy protection.
        
         | ceejayoz wrote:
         | Like https://en.wikipedia.org/wiki/Trap_street?
        
           | 867-5309 wrote:
           | and Wagatha
        
       | Imnimo wrote:
       | I tried a few of these in one of the available CLIP-guided
       | diffusion notebooks, but wasn't able to get anything that looks
       | like DALL-E meanings. Not sure if DALL-E retrained CLIP (I don't
       | think they did?), but it maybe suggests that whatever weirdness
       | is going on here is on the decoder side?
       | 
       | All the cool images that DALL-E spits out are fun to look at, but
       | this sort of thing is an even more interesting experiment in my
       | book. I've been patiently sitting on the waitlist for access, but
       | I can't wait to play around with it.
        
       | dpierce9 wrote:
       | Gavagai!
        
         | alxndr wrote:
         | (explaining the joke:
         | https://en.m.wikipedia.org/wiki/Indeterminacy_of_translation )
        
       | ortusdux wrote:
       | I wonder if any linguists are training a neural network to
       | generate Esperanto 2.0.
        
       | Veedrac wrote:
       | Wow, I am totally going to need to wait for more experimentation
       | before believing any given thing here, but this seems like a big
       | deal.
       | 
       | It's one thing if DALL-E 2 was trying to map words in the prompt
       | to their letter sequences and failing because of BPEs; that shows
       | an impressive amount of compositionality but it's still image-
       | model territory. It's another if DALL-E 2 was trying to map the
       | prompt to semantically meaningful content and then failing to
       | finish converting that content to language because it's too small
       | and diffusion is a poor fit for language generation. That makes
       | for worse images but it says terrifying things about how much
       | DALL-E 2 has understood the semantic structure of dialog in
       | images, and how this is likely to change with scale. Normally I'd
       | expect the physical representation to precede semantic
       | understanding, not follow it!
       | 
       | That said I reiterate that a degree of skepticism seems warranted
       | at this point.
        
       | trebligdivad wrote:
       | Is this finally a need for a xenolinguist?
        
       ___________________________________________________________________
       (page generated 2022-05-31 23:00 UTC)