[HN Gopher] Did GoogleAI Just Snooker One of Silicon Valley's Sh...
       ___________________________________________________________________
        
       Did GoogleAI Just Snooker One of Silicon Valley's Sharpest Minds?
        
       Author : TeacherTortoise
       Score  : 119 points
       Date   : 2022-09-15 19:32 UTC (3 hours ago)
        
 (HTM) web link (garymarcus.substack.com)
 (TXT) w3m dump (garymarcus.substack.com)
        
       | kache_ wrote:
       | oh no musk ignored my twitter DM it must be because he's scared
       | of taking a bet and therefore I am right
       | 
       | btw, AGI is coming 2030. Source? It was revealed to me in a
       | dream. Check my profile to see where you can email to take bets.
        
       | [deleted]
        
       | i_like_apis wrote:
       | I wish more articles followed the standard essay format. At least
       | state your main thesis in the first paragraph.
       | 
       | There are interesting things buried in here, but I don't have
       | time for rambling.
       | 
       | The edge cases of image models have been more succinctly
       | summarized and speculated upon elsewhere.
        
         | version_five wrote:
         | Yes I've noticed that a lot of authors expect you to read
         | through some parable before they tell you what they are going
         | to tell you. It would be fine with an abstract or even a
         | sentence below the title that says "ML models are not being
         | adequately evaluated for composability and it makes them look
          | more intelligent than they are". Just diving into "consider
          | Clever Hans" makes it tough to know if it's worth reading.
        
       | peteradio wrote:
        | One idea for trying to teach the AI about compositionality:
        | feed it Fox in Socks by Dr. Seuss. It's hard to imagine it
        | would misunderstand the meaning of "on" or "in" or "under"
        | when there are such nice illustrations. I've got tons of great
        | ideas and I'm open for hire!
        
         | birdyrooster wrote:
         | Train AI models, not children!
        
           | dekhn wrote:
           | is there a difference?
           | 
            | I had kids and they were the best machine learning
            | systems I've worked with.
        
             | [deleted]
        
         | version_five wrote:
         | This is such a good idea, someone please try this if you're set
         | up to make it happen easily.
         | 
         | Starting with fox on Knox and Knox in box and moving up to a
         | tweedle beetle battle in a puddle in a bottle and the bottles
         | on a poodle and the poodles eating noodles...
         | 
          | I don't see any evidence any of these models will draw it
         | correctly, but would love to see what it produces.
        
       | powera wrote:
       | I don't believe "compositionality" is a serious obstacle.
       | 
       | It is a different issue than generating an image based on a bag-
       | of-words, so it isn't surprising that an attempt to solve that
       | issue didn't immediately solve the other.
       | 
       | But a variety of approaches can easily solve this problem.
        
         | ummonk wrote:
         | Yes, especially when machine translation seems to handle it
         | just fine.
        
       | adamsmith143 wrote:
        | >I think he is so far off that I offered to bet him $100,000
        | he was wrong; enough of my colleagues agreed with me that
        | within hours they quintupled my bet, to $500,000. Musk didn't
        | have the guts to accept, which tells you a lot.
        | 
        | What a bloviating egomaniac. Does Musk really have the time to
        | deal with pissant researchers like him? What's $500k to a man
        | worth a hundred billion?
        
         | version_five wrote:
         | Yeah I didn't find that very credible. A busy businessman
         | ignoring petty bets you propose is not really evidence of
         | anything, nor is the part about google ignoring his requests.
         | In fact it's a pretty lame rhetorical device. I could equally
         | "challenge" a head of state on Twitter and then pretend that
          | his failure to reply indicates something.
        
       | tambourine_man wrote:
       | > Musk didn't have the guts to accept, which tells you a lot.
       | 
        | Did Musk actively decline the bet, or did he simply not
        | respond? There is a big difference.
        
         | tambourine_man wrote:
         | Later in the text:
         | 
         | > ... I have repeatedly asked that Google give the scientific
         | community access to Imagen. They have refused even to respond.
         | 
         | It seems the author generally feels more entitled to a response
         | than he perhaps should.
        
       | arisAlexis wrote:
       | Missing the point: dismissing an apocalyptic possibility as 0
       | without proof is dangerous -> therefore we should take it
        | seriously. Taleb's work is relevant in the context of risk
       | analysis.
        
       | IshKebab wrote:
       | It's interesting that he now casually throws out a 5 year old as
       | the benchmark to beat:
       | 
       | > nobody has yet publicly demonstrated a machine that can relate
       | the meanings of sentences to their parts the way a five-year-old
       | child can.
       | 
       | Not very long ago that would have been a 3 year old, or maybe
       | even a smart 2 year old. 5 year olds are extremely good at basic
       | language and understanding tasks. If we get to the point of AI
       | that is as good as a 5 year old we're essentially at AGI.
        
         | ummonk wrote:
         | Yeah, and AI is probably already near primate level
         | intelligence, so what's left is a blink of an eye in
         | evolutionary timelines.
        
       | skybrian wrote:
       | Partially this is confusing "Scott Alexander won a bet" with
       | "compositionality is solved." And also, I'm not sure Scott won
       | the bet? Changing people to robots is a cheap trick. I think
       | Imagen should have been disqualified because it won't do people.
       | 
       | Vitor took the other side of the bet and he is also not convinced
       | [1]:
       | 
       | > I'm not conceding just yet, even though it feels like I'm just
       | dragging out the inevitable for a few months. Maybe we should
       | agree on a new set of prompts to get around the robot issue.
       | 
       | > In retrospect, I think that your side of the bet is too lenient
        | in only requiring _one_ of the images to fulfill the prompt. I'm
       | happy to leave that part standing as-is, of course, though I've
       | learned the lesson to be more careful about operationalization.
       | Overall, these images shift my priors a fair amount, but aren't
       | enough to change my fundamental view.
       | 
       | Scott putting "I Won" in the headline when it's not resolved yet
        | seems somewhat dishonest or, more charitably, wishful thinking.
       | 
       | [1] https://astralcodexten.substack.com/p/i-won-my-three-year-
       | ai...
        
       | ummonk wrote:
       | This reminds me of the scandal where Youtube science channels did
       | glowing paid reviews of Waymo's self driving cars without
       | acknowledging they were paid for it. And technooptimists like
       | Scott Alexander or Ray Kurzweil have a common tendency to shift
       | the goalposts and declare they were right with their predictions.
       | Current AI certainly doesn't demonstrate proto-AGI capabilities.
       | 
        | That said, we shouldn't miss the forest for the trees. We can
        | be skeptical of current AI while acknowledging that the pace
        | of AI progress has been immense and that problems which
        | previously seemed difficult (e.g. computer vision
        | classification, or beating top players at Go) have fallen
        | one by one. And AI skeptics have themselves been moving the
        | goalposts in response. I see no reason why composition won't be
       | the same with time. Indeed, a decade ago machine translation used
       | to struggle to understand the relationships between things, but
       | now seems to be reliable at preserving the compositional
       | relationships post-translation. 2029 is rather optimistic, but
       | AGI does seem to be approaching in the coming few decades.
        
         | grandmczeb wrote:
         | > Youtube science channels did glowing paid reviews of Waymo's
         | self driving cars without acknowledging they were paid for it.
         | 
         | Which video is this a reference to?
        
         | jeffbee wrote:
         | If you are referring to Veritasium's Waymo video, it says it is
         | sponsored content in the description above the fold and it has
         | the standard paid promotion notice right on top of the video as
         | soon as you open it.
         | 
         | As far as I can tell the "controversy" over the video is merely
         | that one dedicated critic - so dedicated he made an hour-long
         | response to a 20-minute video - is committed to the idea that
         | machines won't ever be able to drive, and is irrationally angry
         | over the fact that machines can and do drive, and do it well.
         | 
         | https://www.youtube.com/watch?v=yjztvddhZmI
        
       | garymarcus wrote:
        | so much ad hominem in these comments, relatively little
        | substance. (eg "notorious goal post mover", without a single
        | example of something i actually said and changed my mind on)
        
         | ummonk wrote:
         | The Reddit comment linked by the topmost comment here says that
         | you claimed AI couldn't do knowledge graphs and then silently
         | stopped claiming that after being proven wrong. Do you dispute
         | that telling of events?
        
       | IronWolve wrote:
        | One of the things I noticed is that satire and callbacks to
        | common news/ideas can really trip up any AI. Also, if you ask
        | it about anything political, ask it to describe both sides of
        | an argument. That's why people fall back to steelman cherry-
        | picking of responses to push their arguments.
        
       | ajross wrote:
       | So weird to see a piece ostensibly about logical fallacies deploy
       | one so cavalierly:
       | 
       | > I offered to bet [Elon Musk] $100,000 he was wrong [about AGI
       | by 2029] [...] Musk didn't have the guts to accept, which tells
       | you a lot.
       | 
       | The fact that you couldn't get someone engaged in a conversation
       | absolutely does not "tell you a lot" about the substance of your
       | argument. It only tells you that you were ignored.
       | 
       | Now, I happen to think Marcus is right here and Musk is wrong,
       | but... yikes. That was just a jarring bit of writing. Either do
       | the detached professorial admonition schtick or take off the
       | gloves and engage in bad faith advocacy and personal attacks.
       | Both can be fun and get you eyeballs, and substack is filled with
       | both. But not at the same time!
        
       | comeonbro wrote:
       | Regarding Gary Marcus, the author of this piece, and his long and
       | bizarre history of motivated carelessness on the topic of deep
       | learning:
       | 
       | https://old.reddit.com/r/TheMotte/comments/v8yyv6/somewhat_c...
        
         | [deleted]
        
         | Veedrac wrote:
         | This is mostly just an angry rant, yes, but equally it is just
         | true. Marcus is intellectually dishonest.
        
         | mmazing wrote:
         | "I am angry not because someone is wrong, but because they are
         | not interested in becoming less wrong."
         | 
         | Paraphrased that a bit, but I really like that quote.
        
         | lisper wrote:
         | You know what would have been much more effective than this
         | counter-screed? A pointer to an image generated by DALL-E of a
         | horse riding an astronaut. That is something I would really
         | like to see. And in this case a picture is literally worth a
         | thousand words.
        
           | [deleted]
        
           | comeonbro wrote:
           | https://nitter.net/Plinz/status/1529013919682994176
        
             | ummonk wrote:
             | Hah there is actually a good example of a horse riding an
             | astronaut there, just a different kind of riding...
             | https://nitter.net/Plinz/status/1529018578317348864#m
        
           | garymarcus wrote:
           | in fact I wrote a whole article about this (linked in this
           | essay, called Horse Rides Astronaut) and linked an example
           | therein.
        
             | _vertigo wrote:
             | Why did you ignore the Bach examples that show a horse
             | riding an astronaut?
        
           | [deleted]
        
           | im3w1l wrote:
           | I have played around with GPT quite a bit and I would say
           | that GPT understands the difference. Text-to-image models are
           | not specialized in the text-parsing part, so I think it's
           | forgivable that they are not as good at it.
           | 
           | Edit: Actually I tried this right now with two prompts, and I
           | was wrong. It might still be that gpt understands
           | compositionality but the prior that people ride horses is
           | just that strong. But what I saw was that with this
           | particular situation the model got it wrong.
           | 
           | Edit 2: With some heavy hinting it managed to understand the
            | situation. Italics mine. "_An astronaut is walking on all
            | fours. A very small horse is sitting on top of him, riding him
           | even. Shortly after the astronaut stops, exhausted._
           | 
           | The horse is too heavy for the astronaut to carry and he
           | quickly becomes exhausted. _Next,_ the horse gets off the
            | astronaut, stands on its own four legs, and walks away."
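            | 
            | Edit 3: If anyone wants to poke at this themselves, here
            | is roughly what my probe looked like as code (a sketch
            | using the OpenAI completions API; the model choice and
            | prompt wording are mine, nothing canonical about them):
            | 
            |   import os
            |   import openai
            | 
            |   openai.api_key = os.environ["OPENAI_API_KEY"]
            | 
            |   # Heavy hinting, as in Edit 2: spell the scene out,
            |   # then ask who carries whom.
            |   prompt = ("A very small horse is sitting on top of an "
            |             "astronaut who is walking on all fours. "
            |             "Question: who is carrying whom? Answer:")
            | 
            |   resp = openai.Completion.create(
            |       model="text-davinci-002",
            |       prompt=prompt,
            |       max_tokens=60,
            |       temperature=0,
            |   )
            |   print(resp["choices"][0]["text"].strip())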
        
         | joe_the_user wrote:
         | Just about every rant on Marcus or other AI critics is some
         | combination of "you aren't admitting these things are making
         | great progress on the benchmarks" (implying the false idea that
         | "a whole lot of progress" adds up to human level AI) and "you
         | are making 'human level' an unfair moving target by not having
          | a benchmark for it". The thing about this is that if there
          | were a real "human level benchmark", we'd be 80% done; but
          | we can't have one, and we aren't. Marcus and other critics
          | have drawn explicit lines (spatial understanding,
          | compositionality, etc.) but even those being crossed won't
          | prove human-level understanding. There is no proof of
          | human-level understanding, just a strong enough
          | demonstration. And if someone can point to dumb stuff in the
          | demo, it isn't strong.
         | 
         | PS: your link is an embarrassment. It would be flagged and dead
         | if you pasted in the text here.
        
       | mtlmtlmtlmtl wrote:
       | First time I've seen the term "snooker" used outside of the sport
       | Snooker.
        
       | neaden wrote:
       | I completely forgot about Google Duplex. It looks like it is
       | still around but very limited in terms of what phones you can
       | use, what cities it can be used in, and what businesses in those
       | cities will accept it. Doesn't appear any progress has really
        | been made in the past few years. I think this is a great
        | example of how companies create something with AI that is
        | initially really cool, but isn't quite there to actually be
        | very usable and gets forgotten when they roll out the next big
        | thing.
        
         | version_five wrote:
          | The last 10 years of AI is basically defined by proofs of
          | concept like that: 80% (or whatever) solutions that claimed
          | a path to something commercially viable. Turns out the
          | remaining ~20% is always basically impossible - self-driving
          | cars being the archetypal example. I work in the field and I
          | think it can be a great tool, but it needs to be
          | acknowledged what its limitations are and how we don't
          | actually know how to address them yet.
        
           | jeffbee wrote:
           | Now it seems like you are the one moving the goalposts. There
           | are tons of machine-learned models in production, in
           | translation, text segmentation, image segmentation, image
           | search, predictive text composition, etc. It's just that
           | people forget the novelty of all these things immediately
           | after they were launched. You can point your phone at printed
           | Chinese text and have it read aloud to you in English. That
           | is alien tech compared to 10 years ago.
        
         | [deleted]
        
         | Shebanator wrote:
         | The Hold for Me and Direct My Call features for Pixel's Phone
         | app both use Duplex models running locally on your device, and
         | those features are quite popular. I think that counts as
         | significant progress by any measure, so your point doesn't hold
         | in this case.
        
       | jgalt212 wrote:
       | They first approached Lex Fridman, but his home-spun test had
       | zero questions. /s
        
       | crotho wrote:
        
       | mgraczyk wrote:
       | It's interesting that people keep coming up with things that are
       | meant to distinguish AI systems from human intelligence, but then
       | when somebody builds a system that crushes the benchmark the next
       | generation comes up with a new goalpost.
       | 
       | The difference now is that the timescales are weeks or months
       | instead of generations. I believe we will see models that have
       | super-human "compositional" reasoning within 1 year.
        
         | cercatrova wrote:
         | This is called the AI Effect:
         | https://en.wikipedia.org/wiki/AI_effect
         | 
         | > The AI effect occurs when onlookers discount the behavior of
         | an artificial intelligence program by arguing that it is not
         | _real_ intelligence.
         | 
         | > Author Pamela McCorduck writes: "It's part of the history of
         | the field of artificial intelligence that every time somebody
         | figured out how to make a computer do something--play good
         | checkers, solve simple but relatively informal problems--there
         | was a chorus of critics to say, 'that's not thinking'."
         | Researcher Rodney Brooks complains: "Every time we figure out a
         | piece of it, it stops being magical; we say, 'Oh, that's just a
         | computation.'"
        
         | omnicognate wrote:
         | Personally, I haven't moved the goalposts a millimetre in 30
         | years, and I won't in future. When a computer does maths - not
         | as a tool wielded by a human mathematician, but in its own
         | right discovers/invents and proves significant new theorems,
         | advancing some area of research mathematics - I'll take
         | seriously the idea that we've reached AGI.
         | 
         | Maths in and of itself doesn't require any physical resources.
         | It's possible that doing maths in practice requires knowledge
         | of the world to extract some kind of product from (I'm
         | skeptical, but it's possible), but in principle a rack mounted
         | server could demonstrate its mathematical ability to the world
         | with nothing more than the ability to send and receive
         | messages.
         | 
         | This hasn't been done so far, not because there are obvious
          | missing prerequisites, or because nobody's tried it, or because
         | it has no value, or because there's a prohibitively high
         | barrier to entry for people to have a go. It hasn't been done
         | because nobody knows how to make a machine be a mathematician,
         | and I've seen little evidence of any progress towards it.
         | 
         | That's my goalpost, always has been. Reach it and I'll be
         | overjoyed. And FWIW, I strongly believe it can be reached. I
         | don't see the latest round of ML (or any ML, really) as a step
         | towards it, but I'd love to be proven wrong.
         | 
         | When I mention this someone always points at some bit of recent
         | research, such as [1], but it's invariably just a new way for a
         | human mathematician to make use of a computer. If anybody knows
         | of any progress, or serious attempts, towards a true AI
         | mathematician I'm very curious to know.
         | 
         | [1] https://www.nature.com/articles/s41586-021-04086-x
        
           | adamsmith143 wrote:
           | You might be out of the loop a bit.
           | 
           | https://dspace.mit.edu/handle/1721.1/132379.2
           | 
            | Is a well-known project for an AI Physicist. There are
            | plenty of other groups working on similar projects.
           | 
           | >I don't see the latest round of ML (or any ML, really) as a
           | step towards it, but I'd love to be proven wrong.
           | 
            | LLMs have been able to do basic math for quite a while
           | now and some have been trained to solve differential
           | equations, calculus problems, etc. Well on their way to more
           | impressive capabilities.
        
             | omnicognate wrote:
             | I said "in its own right discovers/invents and proves
             | significant new theorems, advancing some area of research
             | mathematics".
             | 
             | Neither of the things you mention are of this nature, or
             | working towards it. "Finding a symbolic expression that
             | matches data from an unknown function" (Feynman) and
             | "solv[ing] differential equations, calculus problems, etc"
             | are not descriptions of what a research mathematician does.
        
               | spywaregorilla wrote:
                | Feels to me like your test is really "do something in
                | your own right", which is the hard fluffy sentient
                | part, and then some additional guard rails that it
                | needs to be math for some reason.
        
               | mathteddybear wrote:
               | The reason is simple - in math, it means solving open
               | problems.
        
               | ummonk wrote:
               | That's a different thing, since solving an existing open
               | problem doesn't mean inventing a new theorem.
        
               | omnicognate wrote:
               | The "do it on your own right" is actually the weaker part
               | of it. It's somewhat ill-defined, and I could imagine
               | some future instance where it's highly debatable whether
               | the AI was working on its own or being used as a tool by
               | a human. There aren't yet any cases where that's in
               | question, though, so it's a hypothetical future debate.
               | In any case, it's certainly not the meat of the test.
               | 
               | It has to be maths for a specific reason. I think it's in
               | some sense the purest form of an ability distinctive to
               | human minds and pervasive in how they work. As I
               | mentioned, it's an ability that can be demonstrated in
               | the absence of any particular physical capability, and
               | yet despite it being perhaps the oldest goal of AI it may
               | be the one we have made least progress towards.
               | 
               | Anyway that's my goalpost, and it's not moving. AGI,
               | being "general", surely should be capable of this
               | hitherto uniquely human activity. If our attempts so far
               | are not capable of it, then clearly they are not
               | "general". If you know of any evidence that my goalpost
               | has been achieved, please let me know. I'm very eager to
               | see it happen.
        
         | MonkeyMalarky wrote:
         | Isn't that a good thing? Benchmark defeats AI, AI defeats
         | benchmark, new benchmark comes along, progress is made. How
         | else would you measure success? Certainly not with old
         | benchmarks that 10 different methods all score 99% accuracy on.
        
         | kevinventullo wrote:
         | Perhaps it's fair to say we will have achieved AGI when we run
         | out of goalposts.
        
           | jessaustin wrote:
           | AGI won't bother convincing us. We don't care what animals in
           | the zoo think.
        
             | cercatrova wrote:
              | > _We don't care what animals in the zoo think._
             | 
             | Tangential to AGI, but don't we? Vegans seem to have quite
             | a strong opinion on this assertion.
        
             | dqpb wrote:
             | It's not just intelligence, it's also speed. If you update
             | your world model fast enough, eventually people just look
             | like trees.
        
         | adamsmith143 wrote:
         | Gary Marcus is a notorious Goal Post Mover so this is no
         | surprise coming from him.
         | 
         | Edit: Gwern has an extensive history with this so I'll let him
         | do the talking.
         | 
         | https://old.reddit.com/r/TheMotte/comments/v8yyv6/somewhat_c...
         | 
         | Further Edits: Not to mention Scott Alexander who has directly
          | rebutted you numerous times. Or Yann LeCun. Not sure who
         | exactly is backing down.
         | 
         | https://astralcodexten.substack.com/p/my-bet-ai-size-solves-...
         | 
         | https://astralcodexten.substack.com/p/somewhat-contra-marcus...
         | 
         | https://analyticsindiamag.com/yann-lecun-resumes-war-of-word...
         | 
         | Presumably you approach these arguments like Ben Shapiro and
         | imagine you have "Dunked on the Deep Learning geeks with Facts
         | and Logic."
        
           | garymarcus wrote:
           | every time i ask someone to name a goal post that i have
           | moved, they back down.
           | 
            | i have been pretty damn consistent since my 2001 book.
        
         | seiferteric wrote:
         | > meant to distinguish AI systems from human intelligence
         | 
         | > but then when somebody builds a system
         | 
          | I mean this is really it. You still have to have a human to
          | build these systems that specialize in one thing. Once you
         | create a system that can automatically create those systems and
         | it doesn't need humans anymore to solve novel problems, then
         | there will be no practical difference in kind between human and
         | AI intelligence.
        
           | polygamous_bat wrote:
           | > Once you create a system that can automatically create
           | those systems...
           | 
            | Except we don't have that. We don't have one human that can
            | create this system by themselves. We have a choice group of
            | a handful of smart, motivated, and quite generously
            | compensated humans working on these problems to create such
            | a system. As such, that bar already surpasses the "general"
            | intelligence level by quite a lot.
        
         | [deleted]
        
       | rebelos wrote:
       | Imagine watching the seeds of AI that will terraform society and
       | rapidly displace human labor over the coming decades be planted,
       | and then still splitting hairs over whether or not it'll achieve
       | sentience.
       | 
       | Our world is changing before our very eyes while this guy is
       | belaboring the technicalities. You could hardly ask for a keener
       | display of the philosophical gulf between scientists and
       | engineers.
        
         | evouga wrote:
         | At this point, I'm numb from all of the AI overhype. I was
         | extremely excited about DALL-E and convinced myself that
         | concrete fruits of the AI revolution were finally here... until
         | a few seconds after I got the chance to try some queries
         | myself. Ditto Copilot.
         | 
         | The recent progress on generative models is a major research
          | achievement, to be sure. That said, I'm not sure what it means
          | to "terraform society," but so far AI shows no signs of making
         | the same magnitude of impact on society as, say, the S-tier
         | technological advances of the 20th and early 21st centuries,
         | such as the personal computer, Internet, smartphone, or atomic
         | bomb. That all may change if we get AGI that _actually_ works,
         | of course.
        
           | rebelos wrote:
           | The problem is not progress in AI, but rather your inability
           | to imagine its near-term trajectory.
           | 
           | https://twitter.com/AdeptAILabs/status/1570144499187453952
           | https://twitter.com/runwayml/status/1568220303808991232
           | 
           | https://scale.com/blog/text-universal-interface
           | 
           | And yes, I think it's fair to say that humanity's final
           | invention will be an 'S-tier' breakthrough...
        
             | evouga wrote:
             | I mentioned DALL-E and Copilot in my post, so I'm not sure
              | why you're linking me to an article summarizing recent high-
             | profile research in large language models...
        
               | rebelos wrote:
               | You appear to have skipped over two links? And the LLM
               | article goes well beyond DALL-E and Copilot. You should
               | try reading it.
        
             | version_five wrote:
              | Wow, ok, you're a troll.
        
         | version_five wrote:
          | I have a lot of trouble understanding how this sentiment can
          | exist.
         | 
         | Especially since the rise of GPT-3 and now these image models,
         | we've seen the pop-culture face of AI become even narrower. The
         | promise of generalization that could lead to intelligent
         | behavior has given way to people sharing amusing pictures or
         | phrases that these models have generated, because that's what
         | they do. It's cool, but it's basically become orthogonal to any
         | AGI, or even AI with applications. It's now just a neat
         | cultural phenomenon from which laypeople somehow extrapolate
          | the kind of stuff the parent is saying.
          | 
          | I'm not saying AI (neural networks) isn't making research
          | progress, it's just that it has almost nothing to do with any
          | of what laypeople extrapolate from it.
        
           | rebelos wrote:
           | I'm sorry, but there is no gentler way to phrase this: you
           | are calamitously blind to what's happening on the ground.
           | 
           | https://twitter.com/AdeptAILabs/status/1570144499187453952
           | https://twitter.com/runwayml/status/1568220303808991232
           | 
           | https://scale.com/blog/text-universal-interface
        
       | projektfu wrote:
       | I'm impressed by all of these image generators but I still don't
       | see them working toward being able to say, "Give me an astronaut
       | riding a horse. Ok, now the same location where he arrives at a
       | rocket. Now one where he dismounts. Now the horse runs away as
       | the astronaut enters the rocket."
       | 
       | You can ask for all those things but the AI still has no idea
       | what it's doing and cannot tell you where the astronaut is, etc.
        
         | masswerk wrote:
          | I'd also say every one of these images would fail a reverse
          | test (i.e., asking a person to describe the image and what
          | it represents).
         | 
         | The task is not just about generating an image that may somehow
         | be in accordance with the prompt, but also to generate a
         | _significant_ image.
         | 
          | [Edit] The equivalent of a Turing test for compositional
          | images would be something like this: have a set of 100
          | images with their respective prompts, some generated by an
          | AI, some by a human graphic designer / artist; let the test
          | person pick the images that were generated by a computer.
          | Mind that this would not only involve the problem of
          | compositionality _per se,_ but also a meaningful and/or
          | artistic composition of the image itself. Is someone
          | attempting to _express_ what is given in the prompt?
        
         | z3c0 wrote:
          | I guess you're _technically_ correct, but the task you're
         | describing isn't generating an image from a prompt. It would be
         | to maintain context across distinct-but-related statements
         | based on an internalized model of reality. That's like
         | discounting the advent of the calculator because you still need
         | an accountant.
        
         | Barrin92 wrote:
          | What shows how low-level these models still are is that they
          | don't seem to be able to draw text on a surface. It's
          | generally just nonsense.
          | 
          | Going higher in abstraction - asking for permanence of
          | distinct objects or entities, or for world knowledge, like
          | having a player face the basketball hoop - is several levels
          | above that yet.
        
       | jessaustin wrote:
       | _Yesterday, as part of a new podcast that will launch in the
       | Spring, I interviewed the brilliant..._
       | 
       | This seems like the wrong way to go about podcasting. What can
       | you say today that will still be interesting to hear in six
       | months?
        
         | jefftk wrote:
         | If you can't say things today that will still be interesting in
         | six months you should consider deeper subjects!
         | 
         | (Overstated for effect. I do think there's a place for news and
         | timely commentary, but it's far from everything.)
        
       | cthalupa wrote:
       | "Compositionality" isn't there yet, but but the rate of
       | improvement is impressive. Today there was a new release of CLIP
       | which provides significantly better compositionality in Stable
       | Diffusion -
       | https://twitter.com/laion_ai/status/1570512017949339649
       | 
       | It'll be interesting to see how it fares against winoground once
       | we get a publicly available SD release that makes use of the new
       | CLIP.
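        | 
        | For anyone curious what "fares against winoground" looks like
        | in practice, here's a minimal sketch of that kind of check
        | using the open_clip package (the model and checkpoint names
        | below are assumptions - substitute whatever LAION actually
        | shipped):
        | 
        |   import torch
        |   import open_clip
        |   from PIL import Image
        | 
        |   # Assumed names; see open_clip's pretrained model list.
        |   model, _, preprocess = open_clip.create_model_and_transforms(
        |       "ViT-H-14", pretrained="laion2b_s32b_b79k")
        | 
        |   image = preprocess(Image.open("generated.png")).unsqueeze(0)
        |   # A Winoground-style pair: same words, different composition.
        |   texts = open_clip.tokenize(["an astronaut riding a horse",
        |                               "a horse riding an astronaut"])
        | 
        |   with torch.no_grad():
        |       img = model.encode_image(image)
        |       txt = model.encode_text(texts)
        |       img = img / img.norm(dim=-1, keepdim=True)
        |       txt = txt / txt.norm(dim=-1, keepdim=True)
        |       sims = (img @ txt.T).squeeze(0)
        | 
        |   # Compositional success ~ the intended caption scores higher.
        |   print(sims)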
        
       | raviparikh wrote:
       | > If you flip a penny 5 times and get 5 heads, you need to
       | calculate that the chance of getting that particular outcome is 1
       | in 32. If you conduct the experiment often enough, you're going
       | to get that, but it doesn't mean that much. If you get 3/5 as
       | Alexander did, when he prematurely declared victory, you don't
       | have much evidence of anything at all.
       | 
       | This doesn't make much sense. The task at hand is in no way
       | equivalent in difficulty to flipping a coin. This is kind of like
       | saying, "if you beat Usain Bolt in a race 3/5 times, that doesn't
       | mean anything; it's like getting 3/5 coin flips to be heads."
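        | 
        | To make the disagreement concrete (a back-of-envelope sketch;
        | the 1% per-image base rate below is a made-up illustrative
        | number, since nobody knows the true one):
        | 
        |   from math import comb
        | 
        |   def win_prob(p):
        |       # Chance of getting at least 3 of 5 independent prompts
        |       # right, where p is the per-prompt success probability.
        |       return sum(comb(5, k) * p**k * (1 - p)**(5 - k)
        |                  for k in range(3, 6))
        | 
        |   # Marcus's coin analogy assumes p = 0.5 per prompt:
        |   print(win_prob(0.5))                 # 0.5 -- weak evidence
        | 
        |   # But if a single random image only "succeeds" 1% of the
        |   # time, even with 10 tries per prompt the bet is hard to
        |   # win by luck:
        |   p_prompt = 1 - 0.99**10              # ~0.10 per prompt
        |   print(p_prompt, win_prob(p_prompt))  # ~0.10, ~0.008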
        
         | Tenoke wrote:
         | While I'm generally very unsympathetic to Marcus' anti-AI
         | arguments at this point, this critique makes some sense. If
         | e.g. the model is just combining the features at random, you'd
         | expect it to combine them the right way over enough tries. It
         | isn't that simple, and I don't believe it matters as this is
          | hardly the peak model we'll get, but in isolation his objection
         | is valid.
        
           | adamsmith143 wrote:
           | The point is that the probability space of potential
           | generated images is enormous so a 3/5 success rate represents
           | an absurdly unlikely probability of being due to chance.
        
           | ALittleLight wrote:
           | I think you would need to do some kind of analysis. For
           | example, if your prompt was "red ball on top of blue cube"
           | and you want to know if the results come from chance you'd
           | need to know the likelihood of the model putting the red ball
            | on top of the blue cube by chance. There are maybe five
            | relative positions for red ball to blue cube - beside, above,
            | below, in, around. Are they each equally likely?
           | 
           | I would try to get a collection of prompts like "red ball and
           | blue cube" or "an empty plane containing only a red ball and
           | a blue cube" and so on - try to come up with 20 or 30 of
           | these. Then, generate 100 images for each prompt. Next, see
           | how likely it is for a red ball to randomly be on top of a
           | blue cube when it was not directed to be.
           | 
           | After gathering some baseline data we could then test three
           | prompts. "Red ball on top of blue cube" and "Red ball beside
           | blue cube" and "Red ball below blue cube". Generate 100 or
           | 1000 images for each of these prompts. Count respective
           | orientations. Then, decide whether red ball being on top of
           | blue cube is more likely than the baseline when the specific
           | direction is given and whether it is less likely when
           | contrary directions are given.
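            | 
            | Roughly, in code (a sketch; generate_and_label() is a
            | hypothetical stand-in for the image model plus a human or
            | classifier labeling each output, simulated here by a
            | "model" that ignores the spatial word entirely):
            | 
            |   import random
            |   from collections import Counter
            | 
            |   RELATIONS = ["above", "below", "beside", "in",
            |                "around"]
            | 
            |   def generate_and_label(prompt):
            |       # Stand-in: picks a layout at random.
            |       return random.choice(RELATIONS)
            | 
            |   def rates(prompt, n=1000):
            |       counts = Counter(generate_and_label(prompt)
            |                        for _ in range(n))
            |       return {r: counts[r] / n for r in RELATIONS}
            | 
            |   baseline = rates("a red ball and a blue cube")
            |   directed = rates("a red ball on top of a blue cube")
            |   contrary = rates("a red ball below a blue cube")
            | 
            |   # Understanding "on top of" should push P(above) well
            |   # past baseline when directed, and well below it when
            |   # given the contrary direction.
            |   print(baseline["above"], directed["above"],
            |         contrary["above"])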
        
             | ummonk wrote:
             | It might understand that there is a cube, there is a ball,
             | the scene has red and blue parts, and there is a vertical
             | placement ("on top of"). In that case it would get 1 out of
             | 4 images right.
        
       | fshbbdssbbgdd wrote:
       | This piece would have been a lot better if it were maybe three
       | paragraphs long. In summary:
       | 
       | 1. Scott Alexander should have used an off-the-shelf benchmark
       | like Winoground instead of rolling his own five-question test.
       | 
       | 2. He shouldn't declare victory after cherry-picking good results
       | from a small sample of questions.
        
         | robg wrote:
         | 3. And don't test each example 10 times and conclude 1 correct
         | guess equals success.
        
         | nonameiguess wrote:
         | For whatever reason, Gary doesn't even mention this, but from
         | reading Scott's post, I don't think I agree that it even got
         | 1/5, let alone 3/5. The bell is not on the llama's tail in any
         | of the examples, though it is very close to the tail in one.
         | The robot is either looking over the cat or in an unrelated
         | direction, never at the cat. None of those basketball pictures
         | shows a robot farmer. The fact that one may be wearing a hat
         | doesn't make it a farmer. He says he's being generous because
         | he believes it would have gotten a farmer more easily than
         | robot farmer, which may be true, but a human artist would
         | easily be able to depict a robot farmer.
         | 
         | At least one other key to making a bet like this fair is that
         | it needs to be arbitrated by a third party. He shouldn't get to
         | decide himself if he won or not.
        
           | kbelder wrote:
            | I agree with you that 3/5 is a stretch. This seems
            | premature.
           | 
           | But, at the rate we're seeing progress, I don't think there's
           | any doubt at this point that top of the line models will be
           | able to do all the proposed examples by June 2025. In fact,
           | by June 2025 I bet that millions of people will be able to
           | generate those images on their home computers.
        
         | badloginagain wrote:
         | I personally liked the anecdote about Clever Hans.
         | 
         | I also learned there's a long history of AI skepticism, the
          | root of which comes down to "Compositionality(?)" - and this
          | wall of understanding meaning has vexed AI for decades.
          | 
          | That would be lost in the proposed short-form summary.
        
         | lalaithion wrote:
         | Scott didn't make up the rules, he agreed on them with another
         | person who thought this would not happen in 3 years. Gary
         | Marcus might have thought it was a bad bet, but someone was on
         | the other side of it, and they presumably thought it was fair
         | or they wouldn't have made it.
         | 
         | The original terms of the bet:
         | 
         | My proposed operationalization of this is that on June 1, 2025,
          | if either of us can get access to the best image generating
         | model at that time (I get to decide which), or convince someone
         | else who has access to help us, we'll give it the following
         | prompts:
         | 
         | 1. A stained glass picture of a woman in a library with a raven
         | on her shoulder with a key in its mouth
         | 
         | 2. An oil painting of a man in a factory looking at a cat
         | wearing a top hat
         | 
         | 3. A digital art picture of a child riding a llama with a bell
         | on its tail through a desert
         | 
         | 4. A 3D render of an astronaut in space holding a fox wearing
         | lipstick
         | 
         | 5. Pixel art of a farmer in a cathedral holding a red
         | basketball
         | 
         | We generate 10 images for each prompt, just like DALL-E2 does.
         | If at least one of the ten images has the scene correct in
         | every particular on 3/5 prompts, I win, otherwise you do.
        
           | Natsu wrote:
           | Honestly, the whole thing makes me wonder if we can use this
           | to generate CAPTCHAs. I don't think a human would have
           | trouble picking out which image was the lightbulb surrounding
           | leaves, but apparently AI still does.
        
           | skybrian wrote:
           | I think you got that wrong; Scott wrote the terms. (He wrote
           | the comment [1] with those rules.) Someone in the comments
           | agreed to them.
           | 
           | Then he changed the terms because Imagen won't do people. I
           | think that's cheating.
           | 
           | [1] https://astralcodexten.substack.com/p/a-guide-to-asking-
           | robo...
        
             | SteveDR wrote:
              | Cheating? That'd make sense if the bet were about the
              | future of products and ethics. Weren't they trying to
              | predict the future of state-of-the-art technology?
        
               | skybrian wrote:
               | It depends on what you mean by "technology" and "exists."
               | 
               | A research project at Google intentionally won't render
               | people. Maybe it _could_ render people, theoretically,
                | but without evidence, we don't know how well.
        
             | cwillu wrote:
             | Again, if the counter-party agrees to the terms and the
             | changes, how is it cheating?
        
               | skybrian wrote:
               | It's not clear whether the counter-party agreed to the
               | change.
               | 
               | See: https://news.ycombinator.com/item?id=32858426
        
             | lalaithion wrote:
             | I think you missed the point of my comment. Yes, Scott
             | wrote the comment containing that proposal. But my point
             | was that it was an agreement. Two people who disagreed
             | about AI agreed on the rules, so you can't accuse one of
             | them of being unfair because you don't like the rules.
             | Sure, you can say "that's a bad bet, Scott will obviously
             | win", but you can't say "He shouldn't declare victory after
             | cherry-picking good results from a small sample of
             | questions", because those terms were explicitly set in
             | advance.
             | 
             | The humans -> robots change is possibly dubious, yes. I
             | don't think that it's super important, but if it were me, I
             | wouldn't have posted the blog post as is. I would have
             | waited until some AI passed all the prompts with humans,
             | like it most certainly will in a year.
        
               | telotortium wrote:
               | Stable Diffusion will soon update to use the biggest CLIP
               | model in existence, which may improve understanding of
               | composition:
               | https://news.ycombinator.com/edit?id=32858809
        
           | techbio wrote:
           | These are all the same artwork.
        
             | WJW wrote:
             | The terms of the bet don't refer to any specific artwork,
             | only to the best image generating model. Hence, you are
             | correct but it does not matter for the outcome of the bet
             | under discussion.
        
       | origin_path wrote:
       | The reason Imagen isn't made available to the public probably
       | isn't about compositionality. The most notable thing about
       | Alexander's challenge is that Imagen totally failed every single
       | one despite his claim of success because, apparently, it is
       | programmed to never represent the human form. Not even Google
       | employees are allowed to make it draw humans of any kind. They
       | had to ask it to draw robots instead, but as pointed out in the
       | comments, changing the requests in that way makes them much
       | easier for DALL-E2 as well, especially the image with the top
       | hats.
       | 
       | If the creators have convinced themselves of some kind of "no
       | humans" rule, but also know that this would be regarded as
       | impossibly extreme and raise serious concerns about Google with
       | the outside world, then keeping Imagen private forever may be the
       | most "rational" solution.
        
         | jowday wrote:
         | Imagen can produce images of humans - they're just filtered out
         | from the results by supervised models (for now). OpenAI did
         | something similar with Dalle for a while IIRC.
        
         | adamsmith143 wrote:
         | >The most notable thing about Alexander's challenge is that
         | Imagen totally failed every single one despite his claim of
         | success because, apparently, it is programmed to never
         | represent the human form.
         | 
         | This doesn't make sense. The original challenge could well have
         | been to draw robots to begin with. Has no bearing on the
         | outcome imo.
        
       ___________________________________________________________________
       (page generated 2022-09-15 23:00 UTC)