[HN Gopher] Did GoogleAI Just Snooker One of Silicon Valley's Sh... ___________________________________________________________________ Did GoogleAI Just Snooker One of Silicon Valley's Sharpest Minds? Author : TeacherTortoise Score : 119 points Date : 2022-09-15 19:32 UTC (3 hours ago) (HTM) web link (garymarcus.substack.com) (TXT) w3m dump (garymarcus.substack.com) | kache_ wrote: | oh no musk ignored my twitter DM it must be because he's scared | of taking a bet and therefore I am right | | btw, AGI is coming 2030. Source? It was revealed to me in a | dream. Check my profile to see where you can email to take bets. | [deleted] | i_like_apis wrote: | I wish more articles followed the standard essay format. At least | state your main thesis in the first paragraph. | | There are interesting things buried in here, but I don't have | time for rambling. | | The edge cases of image models have been more succinctly | summarized and speculated upon elsewhere. | version_five wrote: | Yes I've noticed that a lot of authors expect you to read | through some parable before they tell you what they are going | to tell you. It would be fine with an abstract or even a | sentence below the title that says "ML models are not being | adequately evaluated for composability and it makes them look | more intelligent than they are". Just diving into "consider | clever Hans" makes it tough to know if it's worth reading. | peteradio wrote: | One idea to try to teach the AI about compositionality: feed it | Fox in Socks by Dr. Seuss. It's hard to understand that it would | misunderstand the meaning of "on" or "in" or "under" when there | are such nice illustrations. I've got tons of great ideas and I'm | open for hire! | birdyrooster wrote: | Train AI models, not children! | dekhn wrote: | is there a difference? | | I had kids and they were the best machine learning systems | I've worked with.
| [deleted] | version_five wrote: | This is such a good idea, someone please try this if you're set | up to make it happen easily. | | Starting with fox on Knox and Knox in box and moving up to a | tweedle beetle battle in a puddle in a bottle and the bottles | on a poodle and the poodles eating noodles... | | I don't see any evidence any of these models will draw it | correctly, but would love to see what it produces. | powera wrote: | I don't believe "compositionality" is a serious obstacle. | | It is a different issue than generating an image based on a | bag-of-words, so it isn't surprising that an attempt to solve that | issue didn't immediately solve the other. | | But a variety of approaches can easily solve this problem. | ummonk wrote: | Yes, especially when machine translation seems to handle it | just fine. | adamsmith143 wrote: | >I think he is so far off that I offered to bet him $100,000 he was | wrong; enough of my colleagues agreed with me that within hours | they quintupled my bet, to $500,000. Musk didn't have the guts to | accept, which tells you a lot. | | What a bloviating egomaniac. Does Musk really have the time to | deal with pissant researchers like him? What's $500k to a man worth | a hundred billion? | version_five wrote: | Yeah I didn't find that very credible. A busy businessman | ignoring petty bets you propose is not really evidence of | anything, nor is the part about google ignoring his requests. | In fact it's a pretty lame rhetorical device. I could equally | "challenge" a head of state on Twitter and then pretend that | his failure to reply indicates something. | tambourine_man wrote: | > Musk didn't have the guts to accept, which tells you a lot. | | Did Musk actively decline the bet, or did he simply not respond? | There is a big difference. | tambourine_man wrote: | Later in the text: | | > ... I have repeatedly asked that Google give the scientific | community access to Imagen. They have refused even to respond.
| | It seems the author generally feels more entitled to a response | than he perhaps should. | arisAlexis wrote: | Missing the point: dismissing an apocalyptic possibility as 0 | without proof is dangerous -> therefore we should take it | seriously. Taleb's work is relevant in the context of risk | analysis. | IshKebab wrote: | It's interesting that he now casually throws out a 5 year old as | the benchmark to beat: | | > nobody has yet publicly demonstrated a machine that can relate | the meanings of sentences to their parts the way a five-year-old | child can. | | Not very long ago that would have been a 3 year old, or maybe | even a smart 2 year old. 5 year olds are extremely good at basic | language and understanding tasks. If we get to the point of AI | that is as good as a 5 year old we're essentially at AGI. | ummonk wrote: | Yeah, and AI is probably already near primate level | intelligence, so what's left is a blink of an eye in | evolutionary timelines. | skybrian wrote: | Partially this is confusing "Scott Alexander won a bet" with | "compositionality is solved." And also, I'm not sure Scott won | the bet? Changing people to robots is a cheap trick. I think | Imagen should have been disqualified because it won't do people. | | Vitor took the other side of the bet and he is also not convinced | [1]: | | > I'm not conceding just yet, even though it feels like I'm just | dragging out the inevitable for a few months. Maybe we should | agree on a new set of prompts to get around the robot issue. | | > In retrospect, I think that your side of the bet is too lenient | in only requiring _one_ of the images to fulfill the prompt. I'm | happy to leave that part standing as-is, of course, though I've | learned the lesson to be more careful about operationalization. | Overall, these images shift my priors a fair amount, but aren't | enough to change my fundamental view.
| | Scott putting "I Won" in the headline when it's not resolved yet | seems somewhat dishonest, or more charitably wishful thinking. | | [1] https://astralcodexten.substack.com/p/i-won-my-three-year- | ai... | ummonk wrote: | This reminds me of the scandal where YouTube science channels did | glowing paid reviews of Waymo's self driving cars without | acknowledging they were paid for it. And technooptimists like | Scott Alexander or Ray Kurzweil have a common tendency to shift | the goalposts and declare they were right with their predictions. | Current AI certainly doesn't demonstrate proto-AGI capabilities. | | That said, we shouldn't miss the forest for the trees. We can be | skeptical of current AI claims, but the pace of AI progress has | been immense and problems that previously seemed difficult (e.g. | computer vision classification, or beating top players at Go) have | fallen one by one. And AI skeptics have themselves been moving the | goalposts in response. I see no reason why composition won't be | the same with time. Indeed, a decade ago machine translation used | to struggle to understand the relationships between things, but | now seems to be reliable at preserving the compositional | relationships post-translation. 2029 is rather optimistic, but | AGI does seem to be approaching in the coming few decades. | grandmczeb wrote: | > YouTube science channels did glowing paid reviews of Waymo's | self driving cars without acknowledging they were paid for it. | | Which video is this a reference to? | jeffbee wrote: | If you are referring to Veritasium's Waymo video, it says it is | sponsored content in the description above the fold and it has | the standard paid promotion notice right on top of the video as | soon as you open it.
| | As far as I can tell the "controversy" over the video is merely | that one dedicated critic - so dedicated he made an hour-long | response to a 20-minute video - is committed to the idea that | machines won't ever be able to drive, and is irrationally angry | over the fact that machines can and do drive, and do it well. | | https://www.youtube.com/watch?v=yjztvddhZmI | garymarcus wrote: | so much ad hominem in these comments, relatively little | substance. (eg "notorious goal post mover", without a single | example of something i actually said and changed my mind on) | ummonk wrote: | The Reddit comment linked by the topmost comment here says that | you claimed AI couldn't do knowledge graphs and then silently | stopped claiming that after being proven wrong. Do you dispute | that telling of events? | IronWolve wrote: | One of the things I noticed is that satire and callbacks to common | news/ideas can really trip up any AI. Also, if you ask it about | anything political, ask it to describe both sides of an argument. | That's why people fall back on steelmanning and cherry-picking | responses to push their arguments. | ajross wrote: | So weird to see a piece ostensibly about logical fallacies deploy | one so cavalierly: | | > I offered to bet [Elon Musk] $100,000 he was wrong [about AGI | by 2029] [...] Musk didn't have the guts to accept, which tells | you a lot. | | The fact that you couldn't get someone engaged in a conversation | absolutely does not "tell you a lot" about the substance of your | argument. It only tells you that you were ignored. | | Now, I happen to think Marcus is right here and Musk is wrong, | but... yikes. That was just a jarring bit of writing. Either do | the detached professorial admonition schtick or take off the | gloves and engage in bad faith advocacy and personal attacks. | Both can be fun and get you eyeballs, and substack is filled with | both. But not at the same time!
| comeonbro wrote: | Regarding Gary Marcus, the author of this piece, and his long and | bizarre history of motivated carelessness on the topic of deep | learning: | | https://old.reddit.com/r/TheMotte/comments/v8yyv6/somewhat_c... | [deleted] | Veedrac wrote: | This is mostly just an angry rant, yes, but equally it is just | true. Marcus is intellectually dishonest. | mmazing wrote: | "I am angry not because someone is wrong, but because they are | not interested in becoming less wrong." | | Paraphrased that a bit, but I really like that quote. | lisper wrote: | You know what would have been much more effective than this | counter-screed? A pointer to an image generated by DALL-E of a | horse riding an astronaut. That is something I would really | like to see. And in this case a picture is literally worth a | thousand words. | [deleted] | comeonbro wrote: | https://nitter.net/Plinz/status/1529013919682994176 | ummonk wrote: | Hah there is actually a good example of a horse riding an | astronaut there, just a different kind of riding... | https://nitter.net/Plinz/status/1529018578317348864#m | garymarcus wrote: | in fact I wrote a whole article about this (linked in this | essay, called Horse Rides Astronaut) and linked an example | therein. | _vertigo wrote: | Why did you ignore the Bach examples that show a horse | riding an astronaut? | [deleted] | im3w1l wrote: | I have played around with GPT quite a bit and I would say | that GPT understands the difference. Text-to-image models are | not specialized in the text-parsing part, so I think it's | forgivable that they are not as good at it. | | Edit: Actually I tried this right now with two prompts, and I | was wrong. It might still be that gpt understands | compositionality but the prior that people ride horses is | just that strong. But what I saw was that with this | particular situation the model got it wrong. | | Edit 2: With some heavy hinting it managed to understand the | situation. Italics mine. 
" _An astronaut is walking on all | four. A very small horse is sitting on top of him, riding him | even. Shortly after the astronaut stops, exhausted._ | | The horse is too heavy for the astronaut to carry and he | quickly becomes exhausted. _Next,_ the horse gets off the | astronaut, stands on its own four legs, and walks away. " | joe_the_user wrote: | Just about every rant on Marcus or other AI critics is some | combination of "you aren't admitting these things are making | great progress on the benchmarks" (implying the false idea that | "a whole lot of progress" adds up to human level AI) and "you | are making 'human level' an unfair moving target by not having | a benchmark for it". The thing about this is that if there was | a real "human level benchmark", we'd be 80% done but we can't | and we aren't. Marcus and other critics have drawn explicit | lines (spatial understanding, composibility, etc)but even those | being crossed won't prove human-level understanding. There is | no proof of human, just a strong enough demonstration. And if | someone can point to dumb stuff in the demo, it isn't strong. | | PS: your link is an embarrassment. It would be flagged and dead | if you pasted in the text here. | mtlmtlmtlmtl wrote: | First time I've seen the term "snooker" used outside of the sport | Snooker. | neaden wrote: | I completely forgot about Google Duplex. It looks like it is | still around but very limited in terms of what phones you can | use, what cities it can be used in, and what businesses in those | cities will accept it. Doesn't appear any progress has really | been made in the past few years. I think this is a great point of | how companies create something with AI that is initially really | cool, but isn't quite there to actually be very usable and gets | forgotten when they roll out the next big thing. 
| version_five wrote: | The last 10 years of AI is basically defined by proofs of | concept like that, which were 80% (or whatever) solutions with a | claimed path to something commercially viable. Turns out that | ~20% is always basically impossible - self driving cars being the | archetypal example. I work in the field and I think it can be a | great tool, but it needs to be acknowledged what its limitations | are and how we don't actually know how to address them yet | jeffbee wrote: | Now it seems like you are the one moving the goalposts. There | are tons of machine-learned models in production, in | translation, text segmentation, image segmentation, image | search, predictive text composition, etc. It's just that | people forget the novelty of all these things immediately | after they were launched. You can point your phone at printed | Chinese text and have it read aloud to you in English. That | is alien tech compared to 10 years ago. | [deleted] | Shebanator wrote: | The Hold for Me and Direct My Call features for Pixel's Phone | app both use Duplex models running locally on your device, and | those features are quite popular. I think that counts as | significant progress by any measure, so your point doesn't hold | in this case. | jgalt212 wrote: | They first approached Lex Fridman, but his home-spun test had | zero questions. /s | crotho wrote: | mgraczyk wrote: | It's interesting that people keep coming up with things that are | meant to distinguish AI systems from human intelligence, but then | when somebody builds a system that crushes the benchmark the next | generation comes up with a new goalpost. | | The difference now is that the timescales are weeks or months | instead of generations. I believe we will see models that have | super-human "compositional" reasoning within 1 year.
| cercatrova wrote: | This is called the AI Effect: | https://en.wikipedia.org/wiki/AI_effect | | > The AI effect occurs when onlookers discount the behavior of | an artificial intelligence program by arguing that it is not | _real_ intelligence. | | > Author Pamela McCorduck writes: "It's part of the history of | the field of artificial intelligence that every time somebody | figured out how to make a computer do something--play good | checkers, solve simple but relatively informal problems--there | was a chorus of critics to say, 'that's not thinking'." | Researcher Rodney Brooks complains: "Every time we figure out a | piece of it, it stops being magical; we say, 'Oh, that's just a | computation.'" | omnicognate wrote: | Personally, I haven't moved the goalposts a millimetre in 30 | years, and I won't in future. When a computer does maths - not | as a tool wielded by a human mathematician, but in its own | right discovers/invents and proves significant new theorems, | advancing some area of research mathematics - I'll take | seriously the idea that we've reached AGI. | | Maths in and of itself doesn't require any physical resources. | It's possible that doing maths in practice requires knowledge | of the world to extract some kind of product from (I'm | skeptical, but it's possible), but in principle a rack mounted | server could demonstrate its mathematical ability to the world | with nothing more than the ability to send and receive | messages. | | This hasn't been done so far, not because there are obvious | missing prerequisites, or because nobody's tried it, or because | it has no value, or because there's a prohibitively high | barrier to entry for people to have a go. It hasn't been done | because nobody knows how to make a machine be a mathematician, | and I've seen little evidence of any progress towards it. | | That's my goalpost, always has been. Reach it and I'll be | overjoyed. And FWIW, I strongly believe it can be reached.
I | don't see the latest round of ML (or any ML, really) as a step | towards it, but I'd love to be proven wrong. | | When I mention this someone always points at some bit of recent | research, such as [1], but it's invariably just a new way for a | human mathematician to make use of a computer. If anybody knows | of any progress, or serious attempts, towards a true AI | mathematician I'm very curious to know. | | [1] https://www.nature.com/articles/s41586-021-04086-x | adamsmith143 wrote: | You might be out of the loop a bit. | | https://dspace.mit.edu/handle/1721.1/132379.2 | | Is a well known project for an AI Physicist. There are plenty | of other groups working on similar projects | | >I don't see the latest round of ML (or any ML, really) as a | step towards it, but I'd love to be proven wrong. | | LLM models have been able to do basic math for quite a while | now and some have been trained to solve differential | equations, calculus problems, etc. Well on their way to more | impressive capabilities. | omnicognate wrote: | I said "in its own right discovers/invents and proves | significant new theorems, advancing some area of research | mathematics". | | Neither of the things you mention are of this nature, or | working towards it. "Finding a symbolic expression that | matches data from an unknown function" (Feynman) and | "solv[ing] differential equations, calculus problems, etc" | are not descriptions of what a research mathematician does. | spywaregorilla wrote: | Feels to me like your test is really "do something on | your own right", which is the hard fluffy sentient part, | and then some additional guard rails that it needs to be | math for some reason | mathteddybear wrote: | The reason is simple - in math, it means solving open | problems. | ummonk wrote: | That's a different thing, since solving an existing open | problem doesn't mean inventing a new theorem. | omnicognate wrote: | The "do it on your own right" is actually the weaker part | of it. 
It's somewhat ill-defined, and I could imagine | some future instance where it's highly debatable whether | the AI was working on its own or being used as a tool by | a human. There aren't yet any cases where that's in | question, though, so it's a hypothetical future debate. | In any case, it's certainly not the meat of the test. | | It has to be maths for a specific reason. I think it's in | some sense the purest form of an ability distinctive to | human minds and pervasive in how they work. As I | mentioned, it's an ability that can be demonstrated in | the absence of any particular physical capability, and | yet despite it being perhaps the oldest goal of AI it may | be the one we have made least progress towards. | | Anyway that's my goalpost, and it's not moving. AGI, | being "general", surely should be capable of this | hitherto uniquely human activity. If our attempts so far | are not capable of it, then clearly they are not | "general". If you know of any evidence that my goalpost | has been achieved, please let me know. I'm very eager to | see it happen. | MonkeyMalarky wrote: | Isn't that a good thing? Benchmark defeats AI, AI defeats | benchmark, new benchmark comes along, progress is made. How | else would you measure success? Certainly not with old | benchmarks that 10 different methods all score 99% accuracy on. | kevinventullo wrote: | Perhaps it's fair to say we will have achieved AGI when we run | out of goalposts. | jessaustin wrote: | AGI won't bother convincing us. We don't care what animals in | the zoo think. | cercatrova wrote: | > _We don 't care what animals in the zoo think._ | | Tangential to AGI, but don't we? Vegans seem to have quite | a strong opinion on this assertion. | dqpb wrote: | It's not just intelligence, it's also speed. If you update | your world model fast enough, eventually people just look | like trees. | adamsmith143 wrote: | Gary Marcus is a notorious Goal Post Mover so this is no | surprise coming from him. 
| | Edit: Gwern has an extensive history with this so I'll let him | do the talking. | | https://old.reddit.com/r/TheMotte/comments/v8yyv6/somewhat_c... | | Further Edits: Not to mention Scott Alexander who has directly | rebutted you numerous times. Or Yann LeCun. Not sure who | exactly is backing down. | | https://astralcodexten.substack.com/p/my-bet-ai-size-solves-... | | https://astralcodexten.substack.com/p/somewhat-contra-marcus... | | https://analyticsindiamag.com/yann-lecun-resumes-war-of-word... | | Presumably you approach these arguments like Ben Shapiro and | imagine you have "Dunked on the Deep Learning geeks with Facts | and Logic." | garymarcus wrote: | every time i ask someone to name a goal post that i have | moved, they back down. | | i have been pretty damn consistent since my 2001 book. | seiferteric wrote: | > meant to distinguish AI systems from human intelligence | | > but then when somebody builds a system | | I mean this is really it. You still have to have a human to | build these systems that specialize in one thing. Once you | create a system that can automatically create those systems and | it doesn't need humans anymore to solve novel problems, then | there will be no practical difference in kind between human and | AI intelligence. | polygamous_bat wrote: | > Once you create a system that can automatically create | those systems... | | Except we don't have that. We don't have one human that can | create this system by themselves. We have a choice group of a | handful of smart, motivated, and quite generously compensated | humans working on these problems to create such a system. As | such, you are already surpassing the "general" intelligence | level by quite a lot. | [deleted] | rebelos wrote: | Imagine watching the seeds of AI that will terraform society and | rapidly displace human labor over the coming decades be planted, | and then still splitting hairs over whether or not it'll achieve | sentience.
| | Our world is changing before our very eyes while this guy is | belaboring the technicalities. You could hardly ask for a keener | display of the philosophical gulf between scientists and | engineers. | evouga wrote: | At this point, I'm numb from all of the AI overhype. I was | extremely excited about DALL-E and convinced myself that | concrete fruits of the AI revolution were finally here... until | a few seconds after I got the chance to try some queries | myself. Ditto Copilot. | | The recent progress on generative models is a major research | achievement, to be sure. That said, I'm not sure what it means | to "terraform society," but so far AI shows no signs of making | the same magnitude of impact on society as, say, the S-tier | technological advances of the 20th and early 21st centuries, | such as the personal computer, Internet, smartphone, or atomic | bomb. That all may change if we get AGI that _actually_ works, | of course. | rebelos wrote: | The problem is not progress in AI, but rather your inability | to imagine its near-term trajectory. | | https://twitter.com/AdeptAILabs/status/1570144499187453952 | https://twitter.com/runwayml/status/1568220303808991232 | | https://scale.com/blog/text-universal-interface | | And yes, I think it's fair to say that humanity's final | invention will be an 'S-tier' breakthrough... | evouga wrote: | I mentioned DALL-E and Copilot in my post, so I'm not sure | why you're linking me to an article summarizing recent | high-profile research in large language models... | rebelos wrote: | You appear to have skipped over two links? And the LLM | article goes well beyond DALL-E and Copilot. You should | try reading it. | version_five wrote: | Wow, ok you're a troll | version_five wrote: | I have a lot of trouble understanding how this sentiment can | exist. | | Especially since the rise of GPT-3 and now these image models, | we've seen the pop-culture face of AI become even narrower.
The | promise of generalization that could lead to intelligent | behavior has given way to people sharing amusing pictures or | phrases that these models have generated, because that's what | they do. It's cool, but it's basically become orthogonal to any | AGI, or even AI with applications. It's now just a neat | cultural phenomenon from which laypeople somehow extrapolate | the kind of stuff the parent is saying. | | I'm not saying AI (neural networks) isn't making research | progress, it's just that it has almost nothing to do with any | of what laypeople extrapolate from it | rebelos wrote: | I'm sorry, but there is no gentler way to phrase this: you | are calamitously blind to what's happening on the ground. | | https://twitter.com/AdeptAILabs/status/1570144499187453952 | https://twitter.com/runwayml/status/1568220303808991232 | | https://scale.com/blog/text-universal-interface | projektfu wrote: | I'm impressed by all of these image generators but I still don't | see them working toward being able to say, "Give me an astronaut | riding a horse. Ok, now the same location where he arrives at a | rocket. Now one where he dismounts. Now the horse runs away as | the astronaut enters the rocket." | | You can ask for all those things but the AI still has no idea | what it's doing and cannot tell you where the astronaut is, etc. | masswerk wrote: | I'd also say, every one of these images would fail a reverse test | (i.e., asking a person to describe the image and what it | represents.) | | The task is not just about generating an image that may somehow | be in accordance with the prompt, but also to generate a | _significant_ image. | | [Edit] The equivalent to a Turing test for compositional images | would be something like this: have a set of 100 images with | their respective prompts, some generated by an AI, some by a | human graphic designer / artist; let the test person pick the | images that were generated by a computer.
Mind that this would | not only involve the problem of compositionality _per se,_ but | also a meaningful and/or artistic composition of the image | itself. Is someone attempting to _express_ what is given in the | prompt? | z3c0 wrote: | I guess you're _technically_ correct, but the task you're | describing isn't generating an image from a prompt. It would be | to maintain context across distinct-but-related statements | based on an internalized model of reality. That's like | discounting the advent of the calculator because you still need | an accountant. | Barrin92 wrote: | What shows how low-level these models still are is that they | don't seem to be able to draw text on a surface. It's just | generally nonsense. | | Going higher in abstraction - asking for permanence of distinct | objects or entities, or world knowledge like having a player | face the basketball hoop - is several levels above that yet. | jessaustin wrote: | _Yesterday, as part of a new podcast that will launch in the | Spring, I interviewed the brilliant..._ | | This seems like the wrong way to go about podcasting. What can | you say today that will still be interesting to hear in six | months? | jefftk wrote: | If you can't say things today that will still be interesting in | six months you should consider deeper subjects! | | (Overstated for effect. I do think there's a place for news and | timely commentary, but it's far from everything.) | cthalupa wrote: | "Compositionality" isn't there yet, but the rate of | improvement is impressive. Today there was a new release of CLIP | which provides significantly better compositionality in Stable | Diffusion - | https://twitter.com/laion_ai/status/1570512017949339649 | | It'll be interesting to see how it fares against winoground once | we get a publicly available SD release that makes use of the new | CLIP.
| raviparikh wrote: | > If you flip a penny 5 times and get 5 heads, you need to | calculate that the chance of getting that particular outcome is 1 | in 32. If you conduct the experiment often enough, you're going | to get that, but it doesn't mean that much. If you get 3/5 as | Alexander did, when he prematurely declared victory, you don't | have much evidence of anything at all. | | This doesn't make much sense. The task at hand is in no way | equivalent in difficulty to flipping a coin. This is kind of like | saying, "if you beat Usain Bolt in a race 3/5 times, that doesn't | mean anything; it's like getting 3/5 coin flips to be heads." | Tenoke wrote: | While I'm generally very unsympathetic to Marcus' anti-AI | arguments at this point, this critique makes some sense. If | e.g. the model is just combining the features at random, you'd | expect it to combine them the right way over enough tries. It | isn't that simple, and I don't believe it matters as this is | hardly the peak model we'll get, but in isolation his objection | is valid. | adamsmith143 wrote: | The point is that the probability space of potential | generated images is enormous so a 3/5 success rate represents | an absurdly unlikely probability of being due to chance. | ALittleLight wrote: | I think you would need to do some kind of analysis. For | example, if your prompt was "red ball on top of blue cube" | and you want to know if the results come from chance you'd | need to know the likelihood of the model putting the red ball | on top of the blue cube by chance. There are maybe five | relative positions for red ball to blue cube - beside, above, | below, in, around. Are they each equally likely? | | I would try to get a collection of prompts like "red ball and | blue cube" or "an empty plane containing only a red ball and | a blue cube" and so on - try to come up with 20 or 30 of | these. Then, generate 100 images for each prompt.
Next, see | how likely it is for a red ball to randomly be on top of a | blue cube when it was not directed to be. | | After gathering some baseline data we could then test three | prompts. "Red ball on top of blue cube" and "Red ball beside | blue cube" and "Red ball below blue cube". Generate 100 or | 1000 images for each of these prompts. Count respective | orientations. Then, decide whether red ball being on top of | blue cube is more likely than the baseline when the specific | direction is given and whether it is less likely when | contrary directions are given. | ummonk wrote: | It might understand that there is a cube, there is a ball, | the scene has red and blue parts, and there is a vertical | placement ("on top of"). In that case it would get 1 out of | 4 images right. | fshbbdssbbgdd wrote: | This piece would have been a lot better if it were maybe three | paragraphs long. In summary: | | 1. Scott Alexander should have used an off-the-shelf benchmark | like Winoground instead of rolling his own five-question test. | | 2. He shouldn't declare victory after cherry-picking good results | from a small sample of questions. | robg wrote: | 3. And don't test each example 10 times and conclude 1 correct | guess equals success. | nonameiguess wrote: | For whatever reason, Gary doesn't even mention this, but from | reading Scott's post, I don't think I agree that it even got | 1/5, let alone 3/5. The bell is not on the llama's tail in any | of the examples, though it is very close to the tail in one. | The robot is either looking over the cat or in an unrelated | direction, never at the cat. None of those basketball pictures | shows a robot farmer. The fact that one may be wearing a hat | doesn't make it a farmer. He says he's being generous because | he believes it would have gotten a farmer more easily than | robot farmer, which may be true, but a human artist would | easily be able to depict a robot farmer. 
| | At least one other key to making a bet like this fair is that | it needs to be arbitrated by a third party. He shouldn't get to | decide himself whether he won or not. | kbelder wrote: | I agree with you that 3/5 is stretching. This seems | premature. | | But, at the rate we're seeing progress, I don't think there's | any doubt at this point that top-of-the-line models will be | able to do all the proposed examples by June 2025. In fact, | by June 2025 I bet that millions of people will be able to | generate those images on their home computers. | badloginagain wrote: | I personally liked the anecdote about Clever Hans. | | I also learned there's a long history of AI skepticism, the | root of which comes down to "compositionality" - a wall of | understanding meaning that has vexed AI for decades. | | That would be lost in the proposed short-form summary. | lalaithion wrote: | Scott didn't make up the rules; he agreed on them with another | person who thought this would not happen within 3 years. Gary | Marcus might have thought it was a bad bet, but someone was on | the other side of it, and they presumably thought it was fair | or they wouldn't have made it. | | The original terms of the bet: | | My proposed operationalization of this is that on June 1, 2025, | if either of us can get access to the best image-generating | model at that time (I get to decide which), or convince someone | else who has access to help us, we'll give it the following | prompts: | | 1. A stained glass picture of a woman in a library with a raven | on her shoulder with a key in its mouth | | 2. An oil painting of a man in a factory looking at a cat | wearing a top hat | | 3. A digital art picture of a child riding a llama with a bell | on its tail through a desert | | 4. A 3D render of an astronaut in space holding a fox wearing | lipstick | | 5. Pixel art of a farmer in a cathedral holding a red | basketball | | We generate 10 images for each prompt, just like DALL-E2 does.
| If at least one of the ten images has the scene correct in | every particular on 3/5 prompts, I win; otherwise you do. | Natsu wrote: | Honestly, the whole thing makes me wonder if we can use this | to generate CAPTCHAs. I don't think a human would have | trouble picking out which image was the lightbulb surrounding | leaves, but apparently AI still does. | skybrian wrote: | I think you got that wrong; Scott wrote the terms. (He wrote | the comment [1] with those rules.) Someone in the comments | agreed to them. | | Then he changed the terms because Imagen won't do people. I | think that's cheating. | | [1] https://astralcodexten.substack.com/p/a-guide-to-asking- | robo... | SteveDR wrote: | Cheating? That'd make sense if the bet were about the future | of products and ethics. Weren't they trying to predict the | future state of the art of the technology? | skybrian wrote: | It depends on what you mean by "technology" and "exists." | | A research project at Google intentionally won't render | people. Maybe it _could_ render people, theoretically, | but without evidence, we don't know how well. | cwillu wrote: | Again, if the counter-party agrees to the terms and the | changes, how is it cheating? | skybrian wrote: | It's not clear whether the counter-party agreed to the | change. | | See: https://news.ycombinator.com/item?id=32858426 | lalaithion wrote: | I think you missed the point of my comment. Yes, Scott | wrote the comment containing that proposal. But my point | was that it was an agreement. Two people who disagreed | about AI agreed on the rules, so you can't accuse one of | them of being unfair because you don't like the rules. | Sure, you can say "that's a bad bet, Scott will obviously | win", but you can't say "he shouldn't declare victory after | cherry-picking good results from a small sample of | questions", because those terms were explicitly set in | advance. | | The humans -> robots change is possibly dubious, yes.
I | don't think that it's super important, but if it were me, I | wouldn't have posted the blog post as is. I would have | waited until some AI passed all the prompts with humans, | like it most certainly will in a year. | telotortium wrote: | Stable Diffusion will soon update to use the biggest CLIP | model in existence, which may improve understanding of | composition: | https://news.ycombinator.com/edit?id=32858809 | techbio wrote: | These are all the same artwork. | WJW wrote: | The terms of the bet don't refer to any specific artwork, | only to the best image generating model. Hence, you are | correct but it does not matter for the outcome of the bet | under discussion. | origin_path wrote: | The reason Imagen isn't made available to the public probably | isn't about compositionality. The most notable thing about | Alexander's challenge is that Imagen totally failed every single | one despite his claim of success because, apparently, it is | programmed to never represent the human form. Not even Google | employees are allowed to make it draw humans of any kind. They | had to ask it to draw robots instead, but as pointed out in the | comments, changing the requests in that way makes them much | easier for DALL-E2 as well, especially the image with the top | hats. | | If the creators have convinced themselves of some kind of "no | humans" rule, but also know that this would be regarded as | impossibly extreme and raise serious concerns about Google with | the outside world, then keeping Imagen private forever may be the | most "rational" solution. | jowday wrote: | Imagen can produce images of humans - they're just filtered out | from the results by supervised models (for now). OpenAI did | something similar with Dalle for a while IIRC. 
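[Editor's aside: the 10-images-per-prompt, 3-of-5 scoring rule debated upthread is easy to check numerically. A minimal sketch, assuming each generated image independently gets the composition right with some probability p_image - a hypothetical parameter, with ummonk's 1-in-4 guess as one choice; the function names below are illustrative, not from the thread or the original bet.]

```python
from math import comb

def prompt_pass_prob(p_image: float, n_images: int = 10) -> float:
    """Chance that at least one of n_images is fully correct,
    if each image is independently correct with probability p_image."""
    return 1 - (1 - p_image) ** n_images

def bet_win_prob(p_image: float, n_prompts: int = 5, need: int = 3) -> float:
    """Chance of passing at least `need` of `n_prompts` prompts,
    where each prompt passes per prompt_pass_prob (binomial tail)."""
    q = prompt_pass_prob(p_image)
    return sum(comb(n_prompts, k) * q**k * (1 - q) ** (n_prompts - k)
               for k in range(need, n_prompts + 1))

# If a random composition were right 1 time in 4 (ummonk's guess),
# the 10-tries rule makes each prompt a near-certain pass:
print(round(prompt_pass_prob(0.25), 3))  # ~0.944
print(round(bet_win_prob(0.25), 3))      # ~0.998
# With a much larger space of wrong compositions (say 1 in 100),
# winning 3/5 by luck becomes rare:
print(round(bet_win_prob(0.01), 3))      # ~0.008
```

Under the 1-in-4 assumption the bet is winnable by chance alone, which supports Marcus's complaint about the scoring; under adamsmith143's "enormous probability space" assumption, a 3/5 result would be strong evidence. The disagreement reduces to what p_image actually is.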
| adamsmith143 wrote: | >The most notable thing about Alexander's challenge is that | Imagen totally failed every single one despite his claim of | success because, apparently, it is programmed to never | represent the human form. | | This doesn't make sense. The original challenge could well have | been to draw robots to begin with. Has no bearing on the | outcome imo. ___________________________________________________________________ (page generated 2022-09-15 23:00 UTC)