[HN Gopher] The Bitter Lesson (2019)
___________________________________________________________________
The Bitter Lesson (2019)
Author : radkapital
Score : 95 points
Date : 2020-07-09 15:37 UTC (7 hours ago)
(HTM) web link (incompleteideas.net)
(TXT) w3m dump (incompleteideas.net)
| kdoherty wrote:
| Potentially also of interest is Rod Brooks' response "A Better Lesson" (2019): https://rodneybrooks.com/a-better-lesson/
| ksdale wrote:
| I think it's plausible that many technological advances follow a similar pattern. Something like the steam engine is a step-improvement, but many of the subsequent improvements are basically the obvious next step, implemented once steel is strong enough, or machining precise enough, or fuel is refined enough. How many times has the world changed qualitatively, simply in the pursuit of making things quantitatively bigger or faster or stronger?
|
| I can certainly see how it could be considered disappointing that pure intellect and creativity don't always win out, but I, personally, don't think it's bitter.
|
| I also have a pet theory that the first AGI will actually be 10,000 very simple algorithms/sensors/APIs duct-taped together running on ridiculously powerful equipment rather than any sort of elegant Theory of Everything, and this wild conjecture may make me less likely to think this a bitter lesson...
| throwaway7281 wrote:
| This reminds me of the Banko and Brill paper "Scaling to very very large corpora for natural language disambiguation" - https://dl.acm.org/doi/10.3115/1073012.1073017.
|
| It is exactly the point, and it is something not a lot of researchers really grok. As a researcher you are so smart, why can't you discover whatever you are seeking? I think in this decade we will see a couple more scientific discoveries made by brute force, which will hopefully make the scientific types a bit more humble and honest.
| KKKKkkkk1 wrote:
| Today Elon Musk announced that Tesla is going to reach level-5 autonomy by the end of the year. Specifically:
|
| _There are no fundamental challenges remaining for level-5 autonomy. There are many small problems. And then there's the challenge of solving all those small problems and then putting the whole system together._ [0]
|
| I feel like this year is going to be another year in which the proponents of brute-force AI like Elon and Sutton will learn a bitter lesson.
|
| [0] https://twitter.com/yicaichina/status/1281149226659901441
| typon wrote:
| Elon Musk announcing something doesn't make it true.
| vlmutolo wrote:
| It's funny when you've been thinking for months about how speech recognition could really benefit from integrating models of the human vocal tract...
|
| and then you read this
| sqrt17 wrote:
| Here's the thing: incorrect assumptions that are built into a model are more harmful than a model that assumes too little structure. If you model the vocal tract, and the actually exciting things are the transient noises that occur when we produce consonants, then at best there's lots of work with not much to show for it, and at worst you're actively limiting your model. That's the basis for the "every time we fired a linguist, recognition rates improved" quip from 90s speech recognition.
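| A toy way to see the "wrong assumption hurts" point (purely illustrative, nothing to do with real speech systems): bake a wrong prior (linearity) into one model, give another model almost no structure at all, and let the data decide.
|
|     import numpy as np
|     from sklearn.linear_model import LinearRegression
|     from sklearn.ensemble import RandomForestRegressor
|     from sklearn.model_selection import train_test_split
|     from sklearn.metrics import mean_squared_error
|
|     # The "interesting" structure is a narrow transient spike, not a smooth trend.
|     rng = np.random.default_rng(0)
|     X = rng.uniform(0, 10, size=(2000, 1))
|     y = np.sin(X[:, 0]) + 5.0 * (np.abs(X[:, 0] - 7.0) < 0.2) + rng.normal(0, 0.1, 2000)
|     X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
|
|     # Model with a built-in (wrong) assumption: the world is linear.
|     wrong_prior = LinearRegression().fit(X_tr, y_tr)
|     # Model that assumes almost nothing and just eats data.
|     no_prior = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
|
|     print("linear MSE:", mean_squared_error(y_te, wrong_prior.predict(X_te)))
|     print("forest MSE:", mean_squared_error(y_te, no_prior.predict(X_te)))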
|
| On the other end of the spectrum, data and compute ARE limited, and for some tasks we're at a point where the model eats up all of humanity's written works and a couple million dollars in compute, and further progress has to come from elsewhere, because even large companies won't spend billions of dollars on compute and humanity will not suddenly write ten times more blog articles.
| visarga wrote:
| I think we're far from having used all the media on the internet to train a model. GPT-3 used about 570GB of text (about 50M articles). ImageNet is just 1.5M photos. It's still expensive to ingest the whole of YouTube, Google Search and Google Photos into a single model.
|
| And the nice thing about these large models is that you can reuse them with little fine-tuning for all sorts of other tasks. So the industry and any hacker can benefit from these uber-models without having to retrain from scratch. That is, of course, if they even fit in the available hardware; otherwise they have to make do with slightly lower performance.
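| As a rough sketch of that kind of reuse (the backbone, task and sizes below are illustrative stand-ins, not any specific production setup): freeze a pretrained model and train only a small task head on top of it.
|
|     import torch
|     import torch.nn as nn
|     from torchvision import models
|
|     # Reuse a backbone pretrained on ImageNet; only the small new head is trained.
|     model = models.resnet18(pretrained=True)
|     for p in model.parameters():
|         p.requires_grad = False                    # keep the "uber-model" frozen
|     model.fc = nn.Linear(model.fc.in_features, 5)  # new head for a 5-class task
|
|     optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
|     loss_fn = nn.CrossEntropyLoss()
|
|     # One illustrative training step on a fake batch (stand-in for real data).
|     images = torch.randn(8, 3, 224, 224)
|     labels = torch.randint(0, 5, (8,))
|     optimizer.zero_grad()
|     loss = loss_fn(model(images), labels)
|     loss.backward()
|     optimizer.step()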
| gwern wrote:
| "Every time I fire an anatomist and hire a TPU pod, my WER halves."
| PeterisP wrote:
| I think that your particular example is very relevant.
|
| Of course a good speech recognition system needs to model all the relevant characteristics of the human vocal tract as such, and of the many different vocal tracts of individual humans!
|
| But this is substantially different from the notion of integrating a _human-made_ model of the human vocal tract.
|
| In this case the bitter lesson (which, as far as I understand, does apply to vocal tract modeling - I don't personally work on speech recognition but colleagues a few doors down do) is that if you start with some data about human voice and biology, develop some explicit model M, and then integrate it into your system, it does not work as well as if you properly design a system that learns speech recognition as a whole, _learning_ an implicit model M' of the relevant properties of the vocal tract (and the distribution of these properties across different vocal tracts) as a byproduct, given sufficient data.
|
| A hypothesis on the reason for this (which does need more research to be demonstrated, though we have some empirical evidence for similar things in most aspects of NLP) is that the human-made model M _can't_ be as good as the learned model because it's restricted by the need to be understandable by humans. It's simplified and regularized and limited in size so that it can be reasonably developed, described, analyzed and discussed by humans - but there's no reason to suppose that the ideal model that would perfectly match reality is simple enough for that; it may well be reducible to a parametric function that simply has too many parameters to be neatly summarized at a human-understandable size without simplifying in ways that cost accuracy.
| ruuda wrote:
| A slightly more recent post that really opened my eyes to this insight (and references The Bitter Lesson) is this piece by Gwern on the scaling hypothesis: https://www.gwern.net/newsletter/2020/05#gpt-3
| mtgp1000 wrote:
| >We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.
|
| I think these lessons are less appropriate as our hardware and our understanding of neural networks improve. An agent which is able to [self-]learn complex probabilistic relationships between inputs and outputs (i.e. heuristics) requires a minimum complexity/performance, both in hardware and neural network design, before any sort of useful [self-]learning is possible. We've only recently crossed that threshold (5-10 years ago).
|
| >The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin
|
| Admittedly, I'm not quite sure of the author's point. They seem to indicate that there is a trade-off between spending time optimizing the architecture and baking in human knowledge.
|
| If that's the case, I would argue that there is an impending perspective shift in the field of ML, wherein "human knowledge" is not something to hardcode explicitly, but instead is implicitly delivered through a combination of appropriate data curation and design of neural networks which are primed to learn certain relationships.
|
| That's the future and we're just collectively starting down that path - it will take some time for the relevant human knowledge to accumulate.
| lambdatronics wrote:
| TL;DR: AI needs a hand up, not a handout. "We want AI agents that can discover like we can, not which contain what we have discovered." I was internally protesting all the way through the note, until I got to that penultimate sentence.
| rbecker wrote:
| Yeah, it takes a careful, charitable reading to not interpret it as "don't bother with understanding or finding new methods, just throw more FLOPS at it".
| francoisp wrote:
| building a model for and with domain knowledge == premature optimization? In the end a win on kaggle or a published paper seems to depend on tweaking hyperparameters based on even more pointed DK: data set knowledge...
|
| I wonder what would be required to build a model that explores the search space of compilable programs in, say, Python that sort a list into the correct order. Applying this idea of using ML techniques to finding better "thinking" blocks for silicon seems promising.
| astrophysician wrote:
| I think what he's basically saying is that priors (i.e. domain knowledge + custom, domain-inspired models) help when you're data-limited or when your data is very biased, but once that's not the case (e.g. we have an infinite supply of voice samples), model capacity is usually all that matters.
| maest wrote:
| For contrast, take this Hofstadter quote:
|
| > This, then, is the trillion-dollar question: Will the approach undergirding AI today--an approach that borrows little from the mind, that's grounded instead in big data and big engineering--get us to where we want to go? How do you make a search engine that understands if you don't know how you understand? Perhaps, as Russell and Norvig politely acknowledge in the last chapter of their textbook, in taking its practical turn, AI has become too much like the man who tries to get to the moon by climbing a tree: "One can report steady progress, all the way to the top of the tree."
|
| My take is that there is something intellectually unsatisfying about solving a problem by simply throwing more computational power at it, instead of trying to understand it better.
|
| Imagine a parallel universe where computational power is extremely cheap.
In this universe, people solve integrals exclusively by numerical integration, so there is no incentive to develop any of the Analysis theory we currently have. I would expect that to be a net negative in the long run, as theories like General Relativity would be almost impossible to develop without the current mathematical apparatus.
| YeGoblynQueenne wrote:
| Where is this quote from, please?
|
| To play devil's advocate, I think the retort to your comment about "intellectually satisfying" methods is "yeah, but, they work". And in any case, "intellectually satisfying" doesn't have a formal definition in computer science or AI, so it can't very well be a goal, as such.
|
| My own concern is exactly what Russell & Norvig seem to say in Hofstadter's comment: by spending all our resources on climbing the tallest trees to get to the moon, we're falling further from our goal of ever getting to the moon. That's even more so if the goal is to use AI to understand our own mind, rather than to beat a bunch of benchmarks.
| self wrote:
| The quote is from this article:
|
| https://www.theatlantic.com/magazine/archive/2013/11/the-man...
| totally_a_human wrote:
| This page seems to be down. Is there a mirror?
| aszen wrote:
| Interesting. I wonder what happens now that Moore's law is considered dead and we can't rely on computation power increasing year over year. To make further progress with general-purpose search and learning methods we will need lots more computational power, which may not be cheaply available. Do we then focus our efforts on developing more efficient learning strategies like the one we have in our minds?
|
| I do agree with the part about not embedding human knowledge into our computer models: to make true progress in AI, any knowledge about a domain worth learning is knowledge the computer should be able to learn on its own.
| PeterisP wrote:
| Can you elaborate why you think that Moore's law is considered dead? For the computing hardware in question (GPUs and specialized ASICs, not consumer CPUs) it seems to me that we're still seeing steady improvements in transistors/$ and flops/$, and I expect that to continue for some time at least.
| aszen wrote:
| Yes, specialized hardware for AI is seeing steady improvements; I'm curious whether these improvements rely on the particulars of the algorithms running on these machines. As an example, several of the AI chips use lower-precision floating point numbers than general CPUs, since the algorithms in use for training NNs don't need the higher precision.
|
| I actually wonder if having specialized AI hardware isn't the same problem as having specialized AI models, that is, in the short term it will improve efficiency but in the long run prevent discovery of newer general learning strategies, because they won't run faster on existing specialized hardware.
| abetusk wrote:
| Moore's law might be dead, but the deeper law is still alive.
|
| Moore's law is technically "the number of transistors per unit area doubles every 24 months" [1]. The more important law is that the cost of transistors halves every 18-24 months.
|
| That is, Moore's law talks about how many transistors we can pack into a unit area. The deeper issue is _how much it costs_. If we can only pack in a certain number of transistors per unit area but the cost drops exponentially, we still see massive gains.
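| Back-of-the-envelope, that cost exponential is the whole game (the halving period below is an assumption for illustration, not measured data):
|
|     # What a fixed budget buys if cost per transistor halves every ~21 months.
|     def cost_multiplier(years, halving_months=21):
|         return 0.5 ** (years * 12 / halving_months)
|
|     for years in (2, 5, 10):
|         m = cost_multiplier(years)
|         print(f"after {years:2d} years: cost x{m:.3f}, "
|               f"same budget buys ~{1 / m:.0f}x more transistors")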
|
| There's also Wright's law that comes into play [3], which talks about exponentially dropping costs purely from institutional knowledge (every 2x in cumulative production leads to (.75-.9)x in cost).
|
| [1] https://en.wikipedia.org/wiki/Moore%27s_law
|
| [2] https://www.youtube.com/watch?v=Nb2tebYAaOA
|
| [3] https://en.wikipedia.org/wiki/Experience_curve_effects
| aszen wrote:
| Agreed, the cost aspect of Moore's law may continue to remain true, especially with chiplets with varying fab nodes and 3D architectures. Wright's law will also bring down costs as lower-nm nodes mature.
|
| But as mentioned in the comments below, AI model training is growing exponentially (the compute required to train models has been doubling every 3.6 months), so it still far outstrips the cost savings.
| noanabeshima wrote:
| The amount of compute used in the largest AI training runs has been exponentially growing:
|
| https://openai.com/blog/ai-and-compute/
|
| The amount of compute required for ImageNet classification has been exponentially decreasing:
|
| https://openai.com/blog/ai-and-efficiency/
| aszen wrote:
| Very interesting links, thanks for sharing.
|
| So the trend isn't changing: we still need bigger models to make progress in NLP and CV, while the algorithmic efficiencies are promising but aren't giving anywhere near the same improvements as larger models.
|
| I'm curious how long this trend will continue and whether there's anything promising that can reverse it.
| PeterisP wrote:
| IMHO the main thing that determines this trend is whether the results are _good enough_. For the most part, there's only some overlap between the people who work on better results and the people who work on more efficient results; those research directions are driven by different needs and thus also tend to happen in different institutions.
|
| As long as our proof-of-concept solutions don't yet solve the task appropriately - as long as the solution is weak and/or brittle and worse than what we need for the main practical applications - most of the research focus, and the research progress, will be on models that try to give better results. It makes sense to disregard the compute cost and other impractical inconveniences when working on pushing the bleeding edge, trying to make the previously impossible possible.
|
| However, when tasks are "solved" from the academic proof-of-concept perspective, then generally the practical, applied work on model efficiency can get huge reductions in the computing power required. But that happens _elsewhere_.
|
| The concept of technology readiness level (https://en.wikipedia.org/wiki/Technology_readiness_level) is relevant. For the NLP and CV technologies that are at TRL 3 or 4, efficiency does not really matter as long as the model fits in whatever computing clusters you can afford; it mainly becomes an issue for widespread adoption in industry by the time the same tech is at TRL 6 or so, and that work mostly gets done by different people in different organizations with different funding sources than the initial TRL 3 research.
| aglionby wrote:
| My background is in NLP - I suspect we'll see something similar in language processing models to what we've seen in vision models. Consider this article[1] ("NLP's ImageNet moment has arrived"), comparing AlexNet in 2012 to the first GPT model 6 years later: we're just a few years behind.
|
| True, GPT-2 and -3, RoBERTa, T5 etc. are all increasingly data- and compute-hungry. That's the 'tick' your second article mentions.
|
| We simultaneously have people doing research on the 'tock' - reducing the compute needed. ICLR 2020 was full of alternative training schemes that required less compute for similar performance (e.g. ELECTRA[2]). Model distillation is another interesting idea that reduces the amount of inference-time compute needed.
|
| [1] https://thegradient.pub/nlp-imagenet/
|
| [2] https://openreview.net/pdf?id=r1xMH1BtvB
| hpoe wrote:
| So I know Moore's law is "dead" (dead as in Cobol or dead as in Elvis?) and progress is definitely slower than it has been historically. However, we have only begun to really leverage parallelization at scale from a software perspective, so I think we have some runway in that direction - and of course there's the looming elephant on the horizon, quantum computing.
|
| Sure, it is in its infancy, but assuming the research continues to prove that quantum computing is viable, I expect it to be an even bigger deal than the move from vacuum tubes to transistors. At that point we'll be dealing with an entirely different world in computing.
| nessunodoro wrote:
| it's kind of poetic that the chief bottleneck of advancement in the field is now the physical universe -
| annoyingnoob wrote:
| That is a wall of words, I can't even read it in that format.
| koeng wrote:
| This lesson can be applied to synthetic biology right now, though it is still in its infancy.
|
| At least a few of the original synthetic biologists are a bit disappointed in the rise of high-throughput testing for everything, instead of "robust engineering". Perhaps what allows us to understand life isn't just more science, but more "biotech computation".
| auggierose wrote:
| I guess it depends on what you're trying to do. I had a computer vision problem where I was like, hell yeah, let's machine learn the hell out of this. 2 months later, and the results were just not precise enough. It took me 2 more months, and now I am solving the task easily on an iPhone via Apple Metal in milliseconds with a hand-crafted optimisation approach ...
| jefft255 wrote:
| His advice really concerns scientific research and its long-term progress, not immediate applications. I think that injecting human knowledge can lead to faster, more immediate progress, and he seems to believe that too. The "bitter lesson" is that general, data-driven approaches will always win out eventually.
| [deleted]
| sytse wrote:
| The article says we should focus on increasing the compute we use in AI instead of embedding domain-specific knowledge. OpenAI seems to have taken this lesson to heart. They are training a generic model using more compute than anything else.
|
| Many researchers predict a plateau for AI because it is missing the domain-specific knowledge, but this article and the benefits of more compute that OpenAI is demonstrating beg to differ.
| throwaway7281 wrote:
| Model compression is an active research field and will probably be quite lucrative, as you will literally be able to save millions.
| dyukqu wrote:
| Previous discussion: https://news.ycombinator.com/item?id=19393432
| JoeAltmaier wrote:
| Got to believe, this is like heroin. It's a win until it isn't. Then where will AI researchers be? No progress for 20 (50?) years, because the temptation not to understand, but to just build performant engineering solutions, was so strong.
|
| In fact, is the researcher supposed to be building the most performant solution? This article seems alarmingly misinformed. Understanding 'artificial intelligence' isn't a race to VC money.
| visarga wrote:
| AI as a field relied mostly on 'understanding'-based approaches for 50 years without much success. These approaches were too brittle and ungrounded. Why return to something that doesn't work?
|
| DNNs today can generate images that are hard to distinguish from real photos, very natural-sounding voices and surprisingly good text. They can beat us at all board games and most video games. They can write music and poetry better than the average human. Probably also drive better than an average human. Why worry about 'no progress for 50 years' at this point?
| JoeAltmaier wrote:
| Because they can't invent a new game. Unless of course they were only designed to invent games, by trial and error and statistical correlation to existing games, thus producing a generic thing that relates to everything but invents nothing.
|
| I'm not an idiot. I understand that we won't have general-purpose thinking machines any time soon. But to give up looking into that kind of thing entirely seems to me to be a mistake. To rebrand the entire field as calculating results to given problems and behaviors using existing mathematical tools seems to do a disservice to the entire concept and future of artificial intelligence.
|
| Imagine if the field of mathematics were stumped for a while, so investigators decided to just add up things faster and faster, and call that Mathematics.
| visarga wrote:
| What GPT-3 and other models lack is embodiment. There are of course RL agents embodied in simulated environments, like games and robot sims, but this pales in comparison to our access to nature and human society. When we are able to give them a body, they will naturally rediscover play and games.
|
| Human superiority doesn't come just from the brain, it comes from the environment this brain has access to - other humans, culture, tools, nature, and the bodily affordances (hands, feet, eyes, the ability to assimilate organic food...). AI needs a body and an environment to evolve in.
| otoburb wrote:
| >> _This article seems alarmingly misinformed._
|
| I hate appeals to authority as much as anybody else on HN, but I'm not sure that we could say Rich Sutton[1] is "misinformed". He's an established expert in the field, and if we discount his academic credentials then at least consider that he's understandably biased towards this line of thinking as one of the early pioneers of reinforcement learning techniques[2] and currently a research scientist at DeepMind leading their office in Alberta, Canada.
|
| [1] https://en.wikipedia.org/wiki/Richard_S._Sutton
|
| [2] http://incompleteideas.net/papers/sutton-88-with-erratum.pdf
| JoeAltmaier wrote:
| He's writing that article for a reason, to be sure. It's just not the one that the article says it's about, I'm thinking.
| fxtentacle wrote:
| The current top contender in AI optical flow uses LESS CPU and LESS RAM than last year's leader. As such, I strongly disagree with the article.
|
| Yes, many AI fields have gotten better thanks to improved computational power. But this additional computational power has unlocked architectural choices which were previously impossible to execute in a timely manner.
|
| So the conclusion may equally well be that a good network architecture is what yields good results. And if you cannot use the right architecture due to RAM or CPU constraints, then you will get bad results.
|
| And while taking an old AI algorithm and re-training it with 2x the original parameters and 2x the data does work and does improve results, I would argue that that's kind of low-level copycat "research" and not advancing the field. Yes, there are a lot of people doing it, but no, it's not significantly advancing the field. It's tiny incremental baby steps.
|
| In the area of optical flow, this year's new top contenders introduce many completely novel approaches, such as new normalization methods, new data representations, new nonlinearities and a full bag of "never used before" augmentation methods. All of these are handcrafted elements that someone built by observing what "bug" needs fixing. And that easily halved the loss rate, compared to last year's architectures, while using LESS CPU and RAM. So to me, that is clear proof of a superior network architecture, not of additional computing power.
| cgearhart wrote:
| I have read this before and broadly agree with the point--it's no use trying to curate expertise into AI. But I don't think modeling p(y|x) or its friend p(y, x) is the end we're looking for either. But it's unreasonably effective, so we keep doing it. (I don't have an answer or an alternative; causality appeals to my intuition, but it's really clunky and has seemingly not paid off.)
| sgt101 wrote:
| Actually I feel like causality's time has come. The framework that has convinced me is just the simple approach of doing controlled experiments over observational data to establish causal links via DAGs - no need for any drama!
| cgearhart wrote:
| It seems to just shuffle around the hard part of the problem. Causality still depends on some unstructured optimization problem of generating and evaluating causal diagram candidates. I haven't really seen it applied where the set of potential causal relationships is huge.
| avmich wrote:
| > When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers.
|
| It's like calling Russia a loser in the Cold War. Technically the effect was achieved; practically, the side which "lost" possibly gained the largest benefits.
| glitchc wrote:
| When it comes to games, exploitation (of tendencies, weaknesses), misdirection, subterfuge and yomi play a far bigger role in winning than actual skill. Humans are much better than computers at all of those. Perhaps a dubious honour, but an advantage nonetheless. We're only really in trouble when the machine learns to reliably replicate the same tactics.
| elcomet wrote:
| I think that computers managed to beat humans at poker already. (Online poker, which is different from physical games, where of course AI cannot compete.)
| YeGoblynQueenne wrote:
| >> In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search.
|
| "Massive, deep search" that started from a book of opening moves and the combined expert knowledge of several chess Grandmasters. And that was an instance of the minimax algorithm with alpha-beta cutoff, i.e. a search algorithm specifically designed for two-player, deterministic games like chess. And with a hand-crafted evaluation function, whose parameters were filled in by self-play. But still, an evaluation function; because the minimax algorithm requires one, and blind search alone did not, could not, come up with minimax, or with the concept of an evaluation function, in a million years. Essentially, human expertise about what matters in the game was baked into Deep Blue's design from the very beginning and permeated every aspect of its design [1].
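| For concreteness, the shape of that kind of search is roughly the toy sketch below - a depth-limited minimax with alpha-beta cutoff, where the evaluation function and move generator are stand-ins supplied by the caller (in Deep Blue these encoded a great deal of hand-crafted chess knowledge). This is an illustrative sketch, not Deep Blue's actual code.
|
|     # Toy depth-limited minimax with alpha-beta cutoff.  `evaluate`, `moves`
|     # and `apply_move` are stand-in functions supplied by the caller.
|     def alphabeta(state, depth, alpha, beta, maximizing, evaluate, moves, apply_move):
|         if depth == 0 or not moves(state):
|             return evaluate(state)   # the hand-crafted knowledge lives here
|         if maximizing:
|             value = float("-inf")
|             for m in moves(state):
|                 value = max(value, alphabeta(apply_move(state, m), depth - 1,
|                                              alpha, beta, False,
|                                              evaluate, moves, apply_move))
|                 alpha = max(alpha, value)
|                 if alpha >= beta:    # cutoff: the rest cannot change the result
|                     break
|             return value
|         else:
|             value = float("inf")
|             for m in moves(state):
|                 value = min(value, alphabeta(apply_move(state, m), depth - 1,
|                                              alpha, beta, True,
|                                              evaluate, moves, apply_move))
|                 beta = min(beta, value)
|                 if alpha >= beta:
|                     break
|             return value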
|
| Of course, ultimately, search was what allowed Deep Blue to beat Kasparov (3 1/2 to 2 1/2; Kasparov won one game and drew three). That, in the sense that the alpha-beta minimax algorithm itself is a search algorithm, and it goes without saying that a longer, deeper, better search will inevitably, eventually, outperform whatever a human player is doing, which clearly is not search.
|
| But, rather than an irrelevant "bitter" lesson about how big machines can perform more computations than a human, a really useful lesson - and one that we haven't yet learned, as a field - is why humans can do so well _without search_. It is clear to anyone who has played any board game that humans can't search ahead more than a scant few ply, even for the simplest games. And yet it took 30 years (counting from the Dartmouth workshop) for a computer chess player to beat an expert human player, and almost 60 to beat one in Go.
|
| No, no. The biggest question in the field is not one that is answered by "a deeper search". The biggest question is "how can we do that without a search?"
|
| Also see Rodney Brooks' "better lesson" [2], addressing the other successes of big search discussed in the article.
|
| _____________
|
| [1] https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)#Des...
|
| [2] https://rodneybrooks.com/a-better-lesson/
| new2628 wrote:
| At least in chess, if it is not the search, then it is probably the evaluation function.
|
| Expert players likely have a very well-tuned evaluation function of how strong a board "feels". Some of it is easily explainable: center domination, diagonal bishop, connected pawn structure, rook supporting a pawn from behind; other parts are more elaborate, come with experience and are harder to verbalize.
|
| When expert players play against computers, the limitations of their evaluation function become visible. Some board position may feel strong, but you are missing some corner case that the minimax search observes and exploits.
| YeGoblynQueenne wrote:
| I like to caution against taking concepts from computer science and AI and applying them directly to the way the human mind works. Unless we know that a player is applying a specific evaluation function (e.g. because they tell us, or because they vocalise their thought process, etc.), then even suggesting that "players have an evaluation function" is extrapolating far beyond what is safe. For one thing - what does a "function" look like in the human mind?
|
| Whatever human minds do, computing is only a very general metaphor for it, and it's very risky to assume we understand anything about our mind just because we understand our computers.
| burntoutfire wrote:
| > No, no. The biggest question in the field is not one that is answered by "a deeper search". The biggest question is "how can we do that without a search"?
|
| My guess is that we're doing pattern recognition, where we recognize that the current game state is similar to a situation that we've been in before (in some previous game), and recall the strategy we took and the outcomes it had led to. With a large enough body of experience, you can remember lots of past attempted strategies for every kind of game state (within some similarity distance, of course).
| blt wrote:
| This insight is the essence of the AlphaZero architecture. Whereas a pure Monte Carlo Tree Search (MCTS) starts each node in the search tree with a uniform distribution over actions, AlphaZero trains a neural network to observe the game state and output a distribution over actions. This distribution is optimized to be as similar as possible to the distribution obtained from running MCTS from that state in the past. It's very similar to the way humans play games.
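| In code, the policy half of that idea is roughly the loss below (an illustrative sketch only - the network, state encoding and MCTS statistics are stand-ins, not DeepMind's implementation):
|
|     import torch
|     import torch.nn.functional as F
|
|     def policy_loss(policy_net, state, mcts_visit_counts):
|         # MCTS visit counts, normalized, serve as the target distribution.
|         target = mcts_visit_counts / mcts_visit_counts.sum()
|         log_probs = F.log_softmax(policy_net(state), dim=-1)
|         # Cross-entropy between the search's distribution and the network's:
|         # minimizing it pulls the network's policy toward what search found.
|         return -(target * log_probs).sum()
|
|     # Stand-in shapes: a linear "network", a 64-dim state, 10 possible actions.
|     net = torch.nn.Linear(64, 10)
|     state = torch.randn(64)
|     visits = torch.tensor([3., 0., 12., 1., 0., 0., 40., 2., 0., 1.])
|     policy_loss(net, state, visits).backward()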
| dreamcompiler wrote:
| Are we certain that well-trained human players are not doing search? It's possible that a search subnetwork gets "compiled without debugger symbols" and the owner of the brain is simply unaware that it's happening.
| gwern wrote:
| I'm not sure why YeGoblynQueenne thinks this is such a mystery. (This is not the first time I've been puzzled by their pessimism on HN.) There is no mystery here: AlphaZero shows that you can get superhuman performance while searching only a few ply, thanks to sufficiently good pattern recognition in a highly parameterized and well-trained value function, and MuZero makes this point even more emphatically by doing away with the formal search entirely in favor of a more abstract recurrent pondering. What more is there to say?
| YeGoblynQueenne wrote:
| >> (This is not the first time I've been puzzled by their pessimism on HN.)
|
| I don't understand why you keep making personal comments like that about me. I suspect you don't realise that they are unpleasant. Please let me make it clear: such personal comments are unpleasant. Could you please stop them? Thank you.
| YeGoblynQueenne wrote:
| MuZero performs a "formal search". In many more ways than one - for example, optimisation is still a search for an optimal set of parameters. But I guess you mean that it doesn't perform a tree search? Quoting from the abstract of the paper on arxiv [1]:
|
| _In this work we present the MuZero algorithm which, by combining _a tree-based search_ with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics._
|
| (My underlining)
|
| If I remember correctly, MuZero is model-free in the sense that it learns its own evaluation function and reward policy etc. (also going by the abstract). But it retains MCTS.
|
| Indeed, it wouldn't really make sense to drop MCTS from the architecture of a system designed to play games. I mean, it would be really hard to justify discarding a component that is well known to work, and work well, both from an engineering and a scientific point of view.
|
| _________________
|
| [1] https://arxiv.org/abs/1911.08265
| YeGoblynQueenne wrote:
| >> Are we certain that well-trained human players are not doing search?
|
| Yes - because human players can only search a tiny portion of a game tree, and a minimax search of the same extent is not even sufficient to beat a dedicated human in tic-tac-toe, let alone chess. That is, unless one wishes to countenance the possibility of an "unconscious search", which of course might as well be "the grace of God" or any such hand-wavy non-explanation.
|
| >> It's possible that a search subnetwork gets "compiled without debugger symbols" and the owner of the brain is simply unaware that it's happening.
|
| Sorry, I don't understand what you mean.
| oezi wrote:
| Why do you dismiss the unconscious search that humans do in Go? Having learned Go some years ago, it is such an exciting thing to realize that with practice the painstaking process of consciously evaluating the myriad possibilities of moves gives way to just "seeing" solutions out of nothing. You can really feel that your brain did wire itself up to do analysis for you at a level that is subconscious, but that interfaces so gracefully with your conscious cognition that it is a real marvel.
| YeGoblynQueenne wrote:
| >> Why do you dismiss the unconscious search that humans do in Go?
|
| The question is why you say that humans perform an unconscious search when they play Go. And what kind of search is it, other than unconscious? Could you describe it, e.g. in algorithmic notation? I mean, I'm sure you couldn't, because if you could, then the problem of teaching a computer to play Go as well as a human would have been solved years and years ago. But if you can't describe what you're doing, then how do you know it's a "search"?
|
| Note that in AI, when we talk of "search" (edit: at least, in the context of game-playing) we mean something very specific: an algorithm that examines the nodes of a tree and applies some criterion to label each examined node as a target node or not a target node. Humans are absolutely awful at executing such an algorithm with our minds for any but the most trivial of trees, at least compared to computers.
| mtgp1000 wrote:
| >But, rather than an irrelevant "bitter" lesson about how big machines can perform more computations than a human, a really useful lesson - and one that we haven't yet learned, as a field - is why humans can do so well without search
|
| I think the answer is heuristics based on priors (e.g. board state), which we've demonstrated (with AlphaGo and derivatives, especially AlphaGo Zero) that neural networks are readily able to learn.
|
| This is why I get the impression that modern neural networks are quickly approaching humanlike reasoning - once you figure out how to
|
| (1) encode (or train) heuristics and
|
| (2) encode relationships between concepts in a manner which preserves a sort of topology (think for example of a graph where nodes represent generic ideas)
|
| you're well on your way to artificial general reasoning - the only remaining question becomes one of hardware (compute, memory, and/or efficiency of architecture).
___________________________________________________________________
(page generated 2020-07-09 23:01 UTC)