[HN Gopher] Recent Advances in Natural Language Processing
___________________________________________________________________
Recent Advances in Natural Language Processing
Author : saadalem
Score : 139 points
Date : 2020-08-16 18:03 UTC (4 hours ago)
(HTM) web link (deponysum.com)
(TXT) w3m dump (deponysum.com)
| YeGoblynQueenne wrote:
| Separate comment because separate point.
|
| To my mind, all this benchmark-beating is just pointless faffing
| about, and it is not a little frustrating to see that this is
| where the current state of the art is. NLP systems keep getting
| better and better at beating benchmarks, and yet we still don't
| have artificial agents that can hold an intelligent
| conversation, even in a restricted context.
|
| Or rather, we have had such programs since the late 1960s, with
| none other than Terry Winograd's own doctoral thesis system,
| SHRDLU [1], but it seems that in the race to build powerful
| classifiers, we forgot how to build systems that can interact
| intelligently with a human.
|
| Here's a small excerpt from SHRDLU (taken from the wikipedia
| page on SHRDLU, originally in an appendix of Terry Winograd's
| thesis):
|
|     Person: Is there a large block behind a pyramid?
|     Computer: YES, THREE OF THEM: A LARGE RED ONE, A LARGE
|     GREEN CUBE, AND THE BLUE ONE.
|     Person: Put a small one onto the green cube which supports
|     a pyramid.
|     Computer: OK.
|     Person: Put the littlest pyramid on top of it.
|     Computer: OK.
|     Person: Does the shortest thing the tallest pyramid's
|     support supports support anything green?
|     Computer: YES, THE GREEN PYRAMID.
|     Person: What color is it?
|     Computer: BY "IT", I ASSUME YOU MEAN THE SHORTEST THING THE
|     TALLEST PYRAMID'S SUPPORT SUPPORTS.
|     Computer: RED
|
| Note that this is not just mindless text generation. The
| conversation is held in the context of a "blocks world" where a
| robot with a single hand and eye ("Computer" in the excerpt
| above) moves blocks of various shapes and colours around, as
| directed by a human user in free-form natural language. When the
| Computer says "OK" after it's directed to "put the littlest
| pyramid on top of it", it's because it really has grabbed the
| smallest pyramid in the blocks world and placed it on top of the
| block referred to in an earlier sentence, as the Person asked.
| The program has a memory module to keep track of what anaphora
| like "it", "one" etc. refer to throughout the conversation.
|
| SHRDLU was a traditional program hand-crafted by a single PhD
| student: no machine learning, no statistical techniques. It
| included, among other things, a context-free grammar (!) of
| natural English and a planner (to control the robot's hand), all
| written in Lisp and PLANNER. In its limited domain, it was
| smarter than anything ever created with statistical NLP methods.
|
| ______________________
|
| [1] https://en.wikipedia.org/wiki/SHRDLU
| quotemstr wrote:
| Why is it surprising that a CFG can approximate a subset of
| English grammar?
| trott wrote:
| > Why is it surprising that a CFG can approximate a subset of
| English grammar?
|
| "Colorless green ideas sleep furiously" is a famous example
| of a sentence that is grammatical, but meaningless. The goal
| of SHRDLU was far more ambitious than approximating English
| grammar.
| liuliu wrote:
| We've known for a long time that hand-crafted programs in
| limited domains can work for NLP, computer vision and voice
| recognition.
| The challenge has always been that the limited domain can be
| extremely limited, and getting anything practically interesting
| requires a lot of human involvement to encode the world (expert
| systems).
|
| Statistical methods traded that away. With data, some labelled,
| some unlabelled and some weakly-labelled, we can generate these
| models with much more efficient human involvement (refining the
| statistical models and labelling data).
|
| I honestly don't see the frustration. Yes, current NLP models
| may not yet be the "intelligent agent" everyone is looking for,
| to any extent. But claiming it is all faffing and no better than
| the 1960s is quite a stretch.
| joe_the_user wrote:
| The thing that qualifies as "faffing" in my opinion isn't the
| statistical NLP programs, which "are what they are", but
| _claims of progress based primarily on benchmarks_, as
| YeGoblynQueenne rightly states.
|
| And "limited domain" is relative. A program that gets many
| aspects of language right talking about a small world might
| be said to have a larger domain than a program that outputs a
| stream of semi-plausible, semi-gibberish involving the whole
| of the English language. Which again isn't saying modern NLP
| is nothing, but rather that we should have a somewhat better
| way to talk of it (and machine learning generally) than
| "hitting benchmarks".
| YeGoblynQueenne wrote:
| >> We've known for a long time that hand-crafted programs in
| limited domains can work for NLP, computer vision and voice
| recognition.
|
| Yes, we did. So: where are the natural language interfaces by
| which we can communicate with artificial agents in such
| limited domains? Where are the applications, today, that
| exhibit behaviour as seemingly intelligent as SHRDLU in the
| '60s? I mean, have you personally seen and interacted with
| one? Can you show me an example of such a modern system?
|
| Edit: Note again that SHRDLU was created by a single PhD
| student with all the resources of ... a single PhD student.
| It's no stretch to imagine that an entity of the size of
| Google or Facebook could achieve something considerably more
| useful, still in a limited domain. But this has never even
| been attempted.
|
| Yes, it is faffing about. Basically, NLP gave up on figuring
| out how language works and switched to a massive attempt to
| model large datasets, evaluated by contrived benchmarks that
| serve no other purpose than to show how well modern
| techniques can model large datasets.
| joe_the_user wrote:
| Good question!
|
| As you say, SHRDLU was the best program of that time.
|
| I've read Winograd's book and looked at the SHRDLU source
| code (not that I'm much of a Lisp hacker, but I spent some
| time on it). It's built on a parser and a planner (a logic
| program, pre-Prolog). And it's built the old-fashioned way:
| rewriting the source code, with the parser rewriting input
| and then re-running, and other hairy things. I think this
| achieves the parsing of idiomatic constructions at a high
| level. I believe the "raw Lisp" of the day was both
| incredibly powerful, since you could do anything, and
| incredibly hard to scale... because you could do anything.
|
| Winograd wrote it himself, but I think that's because he
| had to write it himself. In a sense, a programmer is always
| most productive when they are writing something by
| themselves, because they don't have to explain anything they
| are doing (until the fix-ups and complexity overwhelm them,
| but the base fact remains).
| And in the case of SHRDLU, Winograd would have had an
| especially hard time explaining what he was doing. I mean,
| there was a theory behind it - I've read Winograd's book.
| But there was lots and lots of brilliant spaghetti/glue code
| needed to actually make it work, code that jumped module and
| function boundaries. And the final program had a reputation
| for being very buggy: sometimes it worked and sometimes it
| didn't. And Winograd was a brilliant programmer, widely read
| in linguistics and other fields.
|
| The software industry is an industry. No company wants to
| depend on the brilliance of its workers. A company needs
| to produce based on some sort of average, and a person
| working with just average skills isn't going to do SHRDLU.
|
| So, yeah, I think that's why actual commercial programs
| never reached the level of SHRDLU.
| _emacsomancer_ wrote:
| So what you're saying is that we really shouldn't be
| relying on industry for good AI?
| joe_the_user wrote:
| Well, an "industrial model" is a model of a factory, and
| it seems unlikely you could produce a fully functional,
| from-scratch GOFAI program like SHRDLU in something like a
| factory.
|
| Perhaps one could create a different kind of enterprise
| for this, but it's kind of an open problem.
| YeGoblynQueenne wrote:
| Winograd made changes to the Lisp assembly to make SHRDLU
| work and he never back-ported them to his SHRDLU code, but
| his original version worked fine and was stable. The
| experience of breaking refers to later versions that were
| expanded by his doctoral students and others, and to ports
| to Java and, I think, C. The original code was written in
| 1969 but inevitably suffered from bit rot in the
| intervening years, so it's true that there is no stable
| version today that can reliably do what Winograd's
| original code could do... but Winograd's original code
| was rock solid, according to the people who saw it
| working.
|
| There's some information about all that here:
|
| http://maf.directory/misc/shrdlu.html
|
| _[Dave McDonald] (davidmcdonald@alum.mit.edu) was Terry
| Winograd's first research student at MIT. Dave reports
| rewriting "a lot" of SHRDLU ("a combination of clean up
| and a couple of new ideas") along with Andee Rubin, Stu
| Card, and Jeff Hill. Some of Dave's interesting
| recollections are: "In the rush to get [SHRDLU] ready for
| his thesis defense [Terry] made some direct patches to
| the Lisp assembly code and never back propagated them to
| his Lisp source... We kept around the very program image
| that Terry constructed and used it whenever we could. As
| an image, [SHRDLU] couldn't keep up with the periodic
| changes to the ITS, and gradually more and more bit rot
| set in. One of the last times we used it we only got it
| to display a couple of lines. In the early days... that
| original image ran like a top and never broke. Our
| rewrite was equally so... The version we assembled circa
| 1972/1973 was utterly robust... Certainly a couple of
| dozen [copies of SHRDLU were distributed]. Somewhere in
| my basement is a file with all the request letters...
| I've got hard copy of all of the original that was Lisp
| source and of all our rewrites... SHRDLU was a special
| program. Even today its parser would be competitive as an
| architecture. For a recursive descent algorithm it had
| some clever means of jumping to anticipated alternative
| analyses rather than doing a standard backup.
| It defined the whole notion of procedural semantics (though
| Bill Woods tends to get the credit), and its grammar was the
| first instance of Systemic Functional Linguistics applied
| to language understanding and quite well done." Dave
| believes the hardest part of getting a complete SHRDLU to
| run again will be to fix the code in MicroPlanner since
| "the original MicroPlanner could not be maintained
| because it had hardwired some direct pointers into the
| state of ITS (as actual numbers!) and these 'magic
| numbers' were impossible to recreate circa 1977 when we
| approached Gerry Sussman about rewriting MicroPlanner in
| Conniver."_
|
| Regarding the advantage of a lone programmer: that's
| real, but large teams have built successful software
| projects before, very often. I guess you don't even need
| a big team, just a dozen people who all know what they're
| doing. That shouldn't be hard to put together given FANG-
| level resources. Hell, that shouldn't be hard to do given
| a pool of doctoral students from a top university... but
| nowadays even AI PhD students would have no idea how to
| recreate something like SHRDLU.
|
| Edit: I got interested in SHRDLU recently (hence the
| comments in this thread) and I had a look at Winograd's
| thesis to see if there was any chance of recreating it.
| The article above includes a link to a bunch of flowcharts
| of SHRDLU's CFG, but even deciphering those hand-drawn and
| occasionally vague plans would take a month or two of
| doing nothing else, something for which I absolutely do
| not have the time. And that's only the grammar - the rest
| of the program would have to be reverse-engineered from
| Winograd's thesis, examples of output from the original
| code or later clones, etc. That's a project for a team of
| digital archeologists, not software developers.
| joe_the_user wrote:
| _"... his original version worked fine and was stable.
| The experience of breaking refers to later versions that
| were expanded by his doctoral students and others, and to
| ports to Java and, I think, C."_
|
| I can believe this, but I think your details overall
| reinforce my points above.
|
| _For a recursive descent algorithm it had some clever
| means of jumping to anticipated alternative analyses
| rather than doing a standard backup._
|
| Yeah, fabulous, but extremely hard to extend or reproduce.
| The aim of companies was to _scale_ something like this.
| It seems like the fundamental problem was that only a few
| really smart people could program to this level, and no
| one could take them beyond it (the old saw that a person
| has to be twice as smart to debug a program as to write
| it comes in, etc.).
| lacker wrote:
| The modern natural language interfaces with limited domains
| are Alexa and Siri. Yes, they're limited. But they are far
| more impressive and useful than SHRDLU.
| YeGoblynQueenne wrote:
| Alexa and Siri (and friends) are completely incapable of
| interacting with a user with the precision of SHRDLU. You
| can ask them to retrieve information from a Google search,
| but e.g. they have no memory of the anaphora in earlier
| sentences in the same conversation. If you say "it" a few
| times to refer to different entities, they completely lose
| the plot.
|
| They are also completely incapable of reasoning about
| their environment, not least because they don't have any
| concept of an "environment" - which was represented by
| the planner and the PROGRAMMAR language in SHRDLU.
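|
| To give a sense of the kind of bookkeeping involved, here is
| a toy sketch in Python - purely illustrative, every name in
| it is invented for this comment, and Winograd's real
| implementation was Lisp and PLANNER, nothing like this - of
| a referent memory that resolves "it" to the most recently
| mentioned entity satisfying a constraint:
|
|     # Toy referent memory, loosely in the spirit of SHRDLU's
|     # anaphora handling. All names here are made up for
|     # illustration.
|     class ReferentMemory:
|         def __init__(self):
|             self.history = []  # entities, in order of mention
|
|         def mention(self, entity):
|             self.history.append(entity)
|
|         def resolve(self, constraint=lambda e: True):
|             # Search backwards for the most recently
|             # mentioned entity satisfying the constraint.
|             for entity in reversed(self.history):
|                 if constraint(entity):
|                     return entity
|             return None
|
|     memory = ReferentMemory()
|     memory.mention({"type": "cube", "colour": "green"})
|     memory.mention({"type": "pyramid", "size": "small"})
|     # "Put IT on the green cube" -> "it" is the pyramid
|     print(memory.resolve())
|     # "Put the pyramid on IT", constrained to cubes
|     print(memory.resolve(lambda e: e["type"] == "cube"))
|
| SHRDLU combined this kind of bookkeeping with its grammar and
| its planner; the modern assistants do nothing comparable.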
|
| And of course, systems like Siri and Alexa can't do
| anything even remotely like correctly disambiguating the
| "support supports support" show-off sentence in the
| excerpt above. Not even close.
|
| Edit: Sorry, there's a misunderstanding about "limited
| domain" in your comment. Alexa and Siri don't operate in
| a limited domain. A "limited domain" would be something
| like being in charge of your music collection and nothing
| else. Alexa and Siri etc. are supposed to be general-use
| agents. I mean, they are, it's just that they suck at
| it... and they would still suck in a limited domain too.
| rvense wrote:
| My impression is that these systems never progressed much
| after SHRDLU, even though there were attempts at larger-
| scale "expert systems". But adding more advanced rules and
| patterns proved extremely difficult and did not always have
| the expected effect of making the systems more general.
|
| There was the whole AI winter thing, of course, but that
| was as much a result of things not living up to the hype as
| a cause.
| YeGoblynQueenne wrote:
| This doesn't directly address your question, though it
| can perhaps give you some pointers if you want to read
| about the history of AI and the AI winter of the '80s,
| but in a way SHRDLU featured prominently in the AI
| winter, at least in Europe, particularly in the UK.
|
| So, in the UK at least, the AI winter was precipitated by
| the Lighthill Report, a report published in 1973,
| compiled by Sir James Lighthill and commissioned by the
| British Research Council, i.e. the people who held all
| the research money at the time in the UK. The report was
| furiously damning of AI research of the time, mostly
| because of grave misunderstandings, e.g. with respect to
| combinatorial explosion, and basically accused
| researchers of, well, faffing about and not doing
| anything useful with their grant money. The only
| exception to this was SHRDLU, which Lighthill praised as
| an example of how AI should be done.
|
| Anyway, if you have time, you can watch the televised
| debate between Lighthill and three luminaries of AI: John
| McCarthy (the man who named the field, created Lisp and
| did a few other notable things), Donald Michie (known for
| his MENACE reinforcement-learning program running on...
| matchboxes, and for basically setting up AI research in
| the UK) and Richard Gregory (a cognitive scientist about
| whom I confess I don't know much). The (short) wikipedia
| article on the Lighthill Report has links to all the
| youtube videos:
|
| https://en.wikipedia.org/wiki/Lighthill_report
|
| It's interesting to see in the videos the demonstration
| of the Freddy robot from Edinburgh, which was capable of
| constructing objects by detecting their components with
| early machine vision techniques. In the 1960s.
| Incidentally:
|
| _Even with today's knowledge, methodology, software
| tools, and so on, getting a robot to do this kind of
| thing would be a fairly complex and ambitious project._
|
| http://www.aiai.ed.ac.uk/project/freddy/
|
| The above was written sometime in the '90s, I reckon, but
| it is still true today. Unfortunately, Lighthill's report
| killed the budding robotics research sector in the UK and
| it has literally never recovered since. This is typical
| of the AI winter of the '80s.
| Promising avenues of research were abandoned not for any
| scientific reasons, as is sometimes assumed ("expert
| systems didn't scale" etc.) but, rather, because the
| pencil pushers in charge of disbursing public money didn't
| get the science.
|
| Edit: A couple more pointers. John McCarthy's review of
| the Lighthill Report:
|
| http://www-formal.stanford.edu/jmc/reviews/lighthill/lighthi...
|
| An article on the AI winter of the '80s by the editor of
| IEEE Intelligent Systems:
|
| https://www.computer.org/csdl/magazine/ex/2008/02/mex2008020...
| rvense wrote:
| Interesting, thank you for the clarifications.
| ericbarrett wrote:
| I read the grandparent's post as saying that despite all the
| research and untold amounts of compute power poured into NLP
| over the decades, its practitioners have yet to address the
| original real-world goals that led us to study it in the
| first place. Missing the forest for the trees, if you will.
|
| (I don't assert that it has or hasn't. I know just enough
| about the topic to see how little I know. But it seems, from
| the outside, a valid criticism, and not one unique to the
| field.)
| tmalsburg2 wrote:
| Since this post is receiving downvotes, I'd like to know
| which part of the HN guidelines it is violating.
| mqus wrote:
| Not a single mention of whether this applies only to English
| or to other natural languages as well. Afaict this mostly
| lists advancements in ELP (English language processing).
| Especially the Winograd schema (or at least the given example)
| seems to be heavily focused on English.
|
| Relevant article for this problem:
| https://news.ycombinator.com/item?id=24026511
| MiroF wrote:
| But there's no reason the models are English-specific...
| woodson wrote:
| In a certain way, they could be, simply because their
| structure may work better on the type of language that
| English is. It may not work as well for languages that
| exhibit other grammatical patterns (e.g. morphologically
| rich languages, or those that exhibit superficially more
| flexible word order which nonetheless conveys information
| about topic/focus).
| dvduval wrote:
| Some languages like Chinese have a large corpus of
| information that can be studied, and there has been a
| continuous effort to standardize the language. It wouldn't
| surprise me if it turns out to be easier than English. I
| would expect it to be more difficult when you're talking
| about languages that have regional differences in usage of
| words, perhaps northern Vietnamese vs. southern Vietnamese,
| or the Spanish of Spain versus the Spanish of Mexico.
| east2west wrote:
| Chinese has large regional differences, too. Cantonese
| has different grammar, not just word usage. Modern
| Chinese grammar is somewhat similar to English, while
| Cantonese follows ancient Chinese, which places the verb
| at the end of the sentence.
| dvduval wrote:
| True, and Mandarin would be the better word to use here.
| Mandarin and Cantonese are for all intents and purposes
| separate languages. They are more different than English
| and Spanish.
| yorwba wrote:
| If we're going down that path, then it should more
| specifically be Standard Mandarin, since Standard
| Mandarin and e.g. Southwestern Mandarin are only
| partially mutually intelligible, like French and Spanish
| are partially mutually intelligible.
| DonaldFisk wrote:
| Cantonese is, like English and Mandarin, SVO.
|
| E.g. I bought this book: ngoh(S) maaihjo(V) nibun syu(O).
|
| Contrast with Japanese: watashi wa(S) kono hon o(O)
| kaimashita(V).
|
| Differences from Mandarin grammar are listed here:
| https://en.wikipedia.org/wiki/Cantonese_grammar#Differences_...
|
| Classical Chinese is also SVO:
| https://en.wikipedia.org/wiki/Classical_Chinese_grammar
| kingkawn wrote:
| All language has significant regional differences if it
| covers enough regions.
| akerro wrote:
| I'm 100% sure NLP will make the same mistakes with non-English
| languages as face recognition did with the faces of black
| people.
|
| I'm taking bets.
| m0zg wrote:
| For some of them, there is. Russian (my mother tongue), for
| example, has the kind of morphology that makes linguists'
| hair stand on end. Also, verbs and nouns are gendered (and
| therefore they must agree, and you must know what everything
| refers to at all times, often pretty far away from where the
| noun was first mentioned), and there are declensions on
| various parts of speech, and they must agree as well. And
| words can be freely formed out of several roots, prefixes,
| and suffixes. For any of this to be understood or generated,
| your model has to be able to model all of that, which is much
| harder than modeling English.
| YeGoblynQueenne wrote:
| Yes, there is. A "model" is a set of parameters optimised by
| some algorithm or system trained on the data in a specific
| dataset. Thus, a language model trained on a dataset of
| English language examples is only capable of representing
| English language utterances, not, e.g., French, or Greek, or
| Gujarati utterances. Diagrammatically:
|
|     data --> system --> model
|
| What is not necessarily English-specific are the systems used
| to train different language models, at least in theory. In
| practice, systems are typically hand-crafted and fine-tuned
| to specific datasets, to such a degree that most of the work
| has to be done anew to train on a different dataset.
| ascavalcante80 wrote:
| NLP is great for many things but, from my own experience as an
| NLP developer, machines are not even close to understanding
| human language. They can interpret some kinds of written text
| well, but they will struggle to grasp two humans speaking to
| each other. The progress we are making on building chatbots
| and voice assistants is mainly due to the fact that we are
| learning how to speak to the machines, and not the contrary.
| p1esk wrote:
| Note this is pre-GPT-3. In fact, I expect GPT-4 will be where
| interesting things start happening in NLP.
| curiousgal wrote:
| I honestly don't get what the big deal is with NLP. So far the
| most useful application has been customer support chatbots,
| and those still don't rise to the level of having an actual
| human who can understand the intricacies of your special
| request.
| feral wrote:
| I work on such a bot @Intercom.
|
| A lot of support requests aren't actually intricate and
| special: there are almost always loads of simple requests
| that come in again and again.
|
| When you ask a request like that and get the answer
| instantly, that really is ML delivering value.
|
| You mightn't think it, but a _lot_ of people work in customer
| support, and spend a lot of time answering rote questions
| again and again.
|
| People talk about how much hype there is, or the chance of
| another AI winter. Yes, there's a ton of hype - but I think
| they aren't considering the real value already being
| delivered this time around.
|
| Everyone is excited about GPT-3, but there's been amazing
| progress in practical NLP already, over the last few years.
| ben_w wrote:
| Current NLP is bad.
| Still useful (Google search increasingly feels like it is
| doing NLP to change what I asked for into what it thinks I
| meant), but bad. A hypothetical future "perfect" NLP could
| demonstrate any skill that a human could learn by reading,
| and computers can read so much more than any given human.
| plafl wrote:
| Is reading enough to understand the real world without
| direct experience of the real world? Is there any research
| that tries to answer this question?
| p1esk wrote:
| _Is there any research that tries to answer this
| question?_
|
| That's the whole point of the experiment called GPT-3.
| rvense wrote:
| As of about ten years ago, when I received my degree in
| linguistics, I understood there were two schools on this
| issue:
|
| A: Of course not, let's do something else.
| B: What? You put text in the maths and stuff comes out.
| MiroF wrote:
| That's just not even close to the only or most useful
| application.
|
| I use NLP and associated s2s techniques every day. I struggle
| to see how so many people don't see the obvious benefits deep
| inference is bringing to stuff all around them.
| p1esk wrote:
| _those still don't rise to the level of having an actual
| human_
|
| What if they did? Do you see what the big deal is now?
| claudeganon wrote:
| This past week, I used this new edit distance library to
| identify quasi-duplicates in a large dataset:
|
| https://github.com/Bergvca/string_grouper
|
| It saved me hours of work, because all the Levenshtein
| implementations are pretty slow and I'm going to need to
| rerun the analysis as the dataset grows.
|
| I don't know about consumer-facing tools, but NLP stuff has
| helped me solve all kinds of tedious data problems at work.
| thekyle wrote:
| If a chatbot were as good as a human, would you notice it
| was a chatbot?
| narag wrote:
| Has it ever happened that a "thought experiment" became a
| real experiment?
| exo-pla-net wrote:
| Not sure exactly what you mean, but Einstein developed
| relativity largely through thought experiments. And
| relativity has been verified by real experiments.
| dane-pgp wrote:
| Most historians think that this was actually a thought
| experiment:
|
| https://en.wikipedia.org/wiki/Galileo's_Leaning_Tower_of_Pis...
|
| An equivalent experiment was famously carried out for real in
| 1971 on the surface of the Moon.
| simonh wrote:
| In a sense every experiment starts as a thought experiment.
| FiberBundle wrote:
| I found the science exam results interesting and skimmed the
| paper [1]. They report an accuracy of >90% on the questions.
| What I found puzzling was that they have a section in the
| experimental results part where they test the robustness of
| the results using adversarial answer options; more
| specifically, they used a simple heuristic to choose 4
| additional answer options from the set of other questions
| which maximized 'confusion' for the model. This resulted in a
| drop of more than 40 percentage points in the accuracy of the
| model. I find this extremely puzzling. What do these models
| actually learn? Clearly they don't actually learn any
| scientific principles.
|
| [1] https://arxiv.org/pdf/1909.01958.pdf
| teej wrote:
| That's the thing. "Machine learning" is a misnomer: the models
| don't "learn" anything about the domain they operate in. It's
| just statistical inference.
|
| A dog can learn to turn left or to turn right for treats. But
| they don't understand the concept of "direction"; their brain
| isn't wired that way.
|
| Machine learning models perform tricks for treats. The tricks
| they do get more impressive by the day. But don't be
| deceived: they aren't wired to gain knowledge.
| 0-_-0 wrote:
| Doesn't your brain derive knowledge through statistical
| inference? How else? I can imagine the AGI of the future
| (having perfected knowledge gathering through statistical
| inference) some day saying: "Humans perform tricks for
| treats. The tricks they do are impressive. But don't be
| deceived, they aren't wired to gain knowledge."
| teej wrote:
| I didn't claim that the way brains work doesn't include
| statistical inference. That's why the dog metaphor works.
| Both the dog and the machine learning model use statistical
| inference to perform tricks.
|
| Dogs haven't been to the moon, however. There's more to the
| brain that we don't understand.
| plutonorm wrote:
| My god, if I hear this argument one more time I'm going to
| pop. What on earth gives you the idea that you are something
| more than statistical inference?
| teej wrote:
| Yes, it's widely believed that statistical inference is a
| part of how the brain operates. But we have barely
| scratched the surface in our understanding of how the human
| brain works.
|
| Do you honestly believe statistical inference completely
| explains a human's ability to learn?
| wrs wrote:
| I would be interested in hearing the results from _humans_
| presented with adversarial answer options. You may say that a
| machine learning correlations between words isn't really
| learning science, but I wonder how many human students are
| really learning science either, rather than just pretty much
| learning correlations between words to pass tests...
| FiberBundle wrote:
| They do give an example of a question in which the model
| chose an incorrect answer in the adversarial setting:
|
|     "The condition of the air outdoors at a certain time of
|     day is known as (A) friction (B) light (C) force
|     (D) weather [correct] (Q) joule (R) gradient [selected]
|     (S) trench (T) add heat"
|
| I assume this might be characteristic of other questions as
| well, although I don't know anything about the Regents
| Science Exam and whether there are multiple questions about
| closely related topics.
| jacobwilliamroy wrote:
| Most multiple choice math problems can be completely
| circumvented by simply finding the digital root of the
| expression in the problem. I was surprised to find this to be
| true, even on college entrance exams.
| flir wrote:
| Seriously? Can you explain the mechanism? 'cos that sounds
| like - well, numerology, to be honest.
| 0-_-0 wrote:
| Adversarial training means that they specifically search for
| answers that the network would misunderstand. If this only
| leads to a 40-percentage-point loss (the network still
| answers correctly 50% of the time), I still consider that
| remarkable.
|
| Choosing the best of 8 answers where some of them were
| adversarially derived should be equivalent to choosing the
| best from all possible answers, of which there could be tens
| of thousands. How would a human do in that situation?
|
| Although the kinds of mistakes the network makes seem like
| mistakes you would never make (i.e. you wouldn't call the
| condition of the air outdoors a _gradient_), the opposite
| could also be true: it might easily answer questions you
| would have a problem with.
| _emacsomancer_ wrote:
| A bit I found rather strange, on the language side:
|
| > This is to say the patterns in language use mirror the
| patterns of how things are(1).
|
| > (1)- Strictly of course only the patterns in true sentences
| mirror, or are isomorphic to, the arrangement of the world, but
| most sentences people utter are at least approximately true.
|
| Presumably this should really say something like "...but most
| sentences people utter are at least approximately true _of
| their mental representation of the world_."
| rvense wrote:
| I thought the point about language being a model of reality
| was interesting. I have an MA in linguistics, including some
| NLP, from about a decade ago, and was looking at a career in
| academic NLP. I ultimately left to become a programmer because
| (of life circumstances and the fact that) I didn't see much of
| a future for the field, precisely because it was ignoring the
| (to me) obvious issues of written-language bias, ignorance of
| multi-modality and situatedness, etc. that are brought up in
| this post.
|
| All of these results are very interesting, but I'm not really
| feeling like we've been proved wrong yet. There is a big
| question of scalability here, at least as far as the goal of
| AGI goes, which the author also admits:
|
| > Of course everyday language stands in a woolier relation to
| sheep, pine cones, desire and quarks than the formal language
| of chess moves stands in relation to chess moves, and the
| patterns are far more complex. Modality, uncertainty,
| vagueness and other complexities enter but the isomorphism
| between world and language is there, even if inexact.
|
| This woolly relation between language and reality is well
| known. It has been studied in various ways in linguistics and
| the philosophy of language, for instance by Frege and not
| least Foucault and everything after. I also think many modern
| linguistic schools take a very different view of "uncertainty
| and vagueness" than I sense in the author here, but they are
| obviously writing for a non-specialist audience and trying not
| to dwell on this subject.
|
| My point is, when making and evaluating these NLP methods and
| the tools they are used to construct, it is extremely
| important to understand that language models social realities
| rather than any single physical one. It seems to me all too
| easy, coming from formal grammar or pure stats or computer
| science, to rush into these things with naive assumptions
| about what words are or how they mean things to people. I
| dread to think what will happen if we base our future society
| on tools made in that way.
| skybrian wrote:
| These tools seem to be getting pretty good at fiction. In
| particular, playing around with AI Dungeon, it doesn't believe
| anything, or alternately you could say it believes everything
| at once. It's similar to a library that contains books by
| different authors, some fact, some fiction. Contradictions
| don't matter. Only local consistency matters, and even then,
| not that much.
|
| Unfortunately, many people want to believe that they are being
| understood. But, on the bright side, this stuff seems
| artistically useful? Entertainment is a big business.
| rvense wrote:
| > many people want to believe that they are being understood
|
| Not only that, they also want to understand. I read some
| computer poetry a few years ago that _worked_. The computer
| did not intend any meaning, but I found it anyway.
|
| And of course I did. This is how language works. We assume
| that people (and by extension, the texts they write) follow a
| set of rules which structures how we engage with them in a
| successful manner.
| Paul Grice formulated this as the cooperative principle in his
| study of everyday language, and I believe it is exactly what
| is at play when people meet NLP:
|
| https://en.wikipedia.org/wiki/Cooperative_principle
| joe_the_user wrote:
| _My point is, when making and evaluating these NLP methods and
| the tools they are used to construct, it is extremely
| important to understand that language models social realities
| rather than any single physical one._
|
| I'd claim actual language involves both a generally shared
| reality and quite concrete and specific discussions of single
| physical and logical facts/models. Some portion of language
| certainly looks mostly like a stream of associations. But
| within it there are also references to physical reality and a
| world model, and the two are complexly intertwined (making
| logical sense is akin to a "parity check" - you can go for a
| while without it, but then you have to look at the whole to
| get it). I believe one can see this in a GPT paragraph, where
| the first two sentences seem intelligent and well written, but
| the third sentence contradicts the first two sufficiently that
| one's mind isn't sure "what's being said" (and even here, our
| "need for logic" might be loose enough that we only notice the
| "senselessness" after the third logical error).
| rvense wrote:
| In human language I think physical reality is always a few
| layers out. Language is social, first and foremost, and
| naming something is not neutral. We can hardly refer to
| single objects directly; we mostly do it through their class
| membership, which always, always includes a whole range of
| associations, reductions, metaphor, etc. that are cultural.
| joe_the_user wrote:
| _In human language I think physical reality is always a few
| layers out._
|
| Yes, the point is you can neglect physical (and logical)
| reality for a while in a stream of sentences, but not
| forever, and that's why current NLP's output has its limits.
| At a simple level, a stream of glittering adjectives can be
| attached to a thing and just add up to "desirable", unless
| those adjectives "go over a threshold" and contradict each
| other, and then the description can get tagged by the brain
| as a bit senseless.
| bloaf wrote:
| I know that there are allegedly NLP algorithms for generating
| things like articles about sports games. I assume they have
| something more like the type signature (timeline of events) ->
| (narrative about said events).
|
| What this article is about is more (question/prompt) ->
| (answer/continuation of prompt).
|
| Does anyone know if there is progress in the (timeline of
| events) -> (narrative about said events) space?
| YeGoblynQueenne wrote:
| >> The Winograd schema test was originally intended to be a
| more rigorous replacement for the Turing test, because it
| seems to require deep knowledge of how things fit together in
| the world, and the ability to reason about that knowledge in a
| linguistic context. Recent advances in NLP have allowed
| computers to achieve near human scores
| (https://gluebenchmark.com/leaderboard/).
|
| The "Winograd schema" in GLUE/SuperGLUE refers to the
| Winograd-NLI benchmark, which is simplified with respect to
| the original Winograd Schema Challenge [1], on which the state
| of the art still significantly lags human performance:
|
| _The Winograd Schema Challenge is a dataset for common sense
| reasoning.
| It employs Winograd Schema questions that require the
| resolution of anaphora: the system must identify the
| antecedent of an ambiguous pronoun in a statement. Models are
| evaluated based on accuracy._
|
| _WNLI is a relaxation of the Winograd Schema Challenge
| proposed as part of the GLUE benchmark and a conversion to the
| natural language inference (NLI) format. The task is to
| predict if the sentence with the pronoun substituted is
| entailed by the original sentence. While the training set is
| balanced between two classes (entailment and not entailment),
| the test set is imbalanced between them (35% entailment, 65%
| not entailment). The majority baseline is thus 65%, while for
| the Winograd Schema Challenge it is 50% (Liu et al., 2017).
| The latter is more challenging._
|
| https://nlpprogress.com/english/common_sense.html
|
| There is also a more recent adversarial version of the
| Winograd Schema Challenge called Winogrande. I can't say I'm
| on top of the various results, so I don't know the state of
| the art, but it's not yet "near human", not without caveats
| (for example, wikipedia reports 70% accuracy on 70 problems
| manually selected from the original WSC).
|
| __________
|
| [1] https://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492
| jvanderbot wrote:
| I'd go one step further: humans themselves don't understand
| anything; we are just good at constructing logical-sounding
| (plausible, testable) stories about things. These are mental
| models, and they're the only way we can make reasonable
| predictions to within the error tolerances of our day-to-day
| experience, but they are flat-out lies and stories we tell
| ourselves, not based on a high-fidelity understanding of
| anything.
|
| Rumination, deep thinking, etc. is simply actor-critic
| learning of these mental models for story-telling.
| runT1ME wrote:
| Do current NLP systems understand arithmetic, and can they do
| it with unfamiliar numbers they've never seen? If not, I'd
| think that your theory is demonstrably false, as a child can
| extrapolate mathematical axioms from just a few example
| problems, whereas NLP models are not able to do so.
| glenstein wrote:
| > Do current NLP systems understand arithmetic, and can they
| do it with unfamiliar numbers they've never seen?
|
| I don't know if that question is rhetorical or not, but GPT-3
| can do basic math for problems it has not been directly
| trained on, and there's been a fair amount of debate,
| including right here on HN, about what the takeaway is
| supposed to be.
| YeGoblynQueenne wrote:
| GPT-3 can't do arithmetic very well at all. There is a big,
| fat, extraordinary claim in the GPT-3 paper [1] that it can,
| but it's only based on perfect accuracy on two-digit addition
| and subtraction, ~90% accuracy on three-digit addition and
| subtraction, and... around 20% accuracy on addition and
| subtraction of three- to five-digit numbers and
| multiplication of two measly digits. Note: no division at all
| and no arithmetic with more than five digits. And very poor
| testing to ensure that the solved problems don't just happen
| to be in the model's training dataset to begin with, which is
| the simplest explanation of the reported results, given that
| the arithmetic problems GPT-3 solves correctly are the ones
| that are most likely to be found in a corpus of natural
| language (i.e. two- and three-digit addition and
| subtraction).
|
| tl;dr, GPT-3 can't do basic math for problems it has not
| been directly trained on.
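|
| (One could sanity-check the memorisation hypothesis with
| something like the sketch below - purely illustrative Python,
| where "corpus.txt" is a stand-in file name, not GPT-3's actual
| training data: scan the text for literal "a + b = c" strings
| and see how coverage falls off with digit count.)
|
|     import re
|
|     # Hypothetical memorisation check: collect every correct
|     # "a + b = c" equation that occurs verbatim in a corpus.
|     # "corpus.txt" is a placeholder, invented for this sketch.
|     pattern = re.compile(r"\b(\d+) \+ (\d+) = (\d+)\b")
|     seen = set()
|     with open("corpus.txt") as f:
|         for line in f:
|             for a, b, c in pattern.findall(line):
|                 if int(a) + int(b) == int(c):
|                     seen.add((int(a), int(b)))
|
|     # What fraction of all two-digit addition problems appear
|     # verbatim? One would expect this to be far higher than
|     # the same figure for five-digit problems, mirroring the
|     # accuracy curve reported in the paper.
|     covered = sum((a, b) in seen
|                   for a in range(10, 100)
|                   for b in range(10, 100))
|     print(covered / 90 ** 2)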
|
| ____________
|
| [1] https://arxiv.org/abs/2005.14165
|
| See section 3.9.1 and Figure 3.10. There is an additional
| category of problems: combinations of addition, subtraction
| and multiplication of three single-digit numbers. Performance
| is poor.
| avmich wrote:
| > but it's only based on perfect accuracy on two-digit
| addition and subtraction, ~90% accuracy on three-digit
| addition and subtraction, and...
|
| Which grade of school does that correspond to? It would have
| been astonishing to have an AI system of this capability
| level a mere few years ago.
| YeGoblynQueenne wrote:
| Performing arithmetic per se is not impressive. A
| calculator can do it, and the rules of arithmetic are not
| so complex that they can't be hand-coded, as they are,
| routinely. The extraordinary claim in the GPT-3 paper is
| that a language model is capable of performing arithmetic
| operations, rather than simply memorising their results
| [1]. Language models compute the probability of a token
| following a sequence of tokens, and in particular have no
| known ability to perform any arithmetic, so if GPT-3, which
| is a language model, were capable of doing something it is
| not designed to do, then that would be very interesting
| indeed. Unfortunately, such an extraordinary claim is
| backed up with very poor evidence, and so amounts to
| nothing more than invoking magick.
|
| __________
|
| [1] From the paper linked above:
|
| _In addition, inspection of incorrect answers reveals
| that the model often makes mistakes such as not carrying
| a "1", suggesting it is actually attempting to perform
| the relevant computation rather than memorizing a table._
|
| Now that I re-read this, I'm struck by the extent to which
| the authors are willing to pull their results this way and
| that to force their preferred interpretation on them. Their
| model answers two-digit addition problems correctly? It's
| learned addition! Their model is making mistakes? It's
| because it's actually trying to compute the answer and
| failing! The much simpler explanation, that their model has
| memorised solutions to a few problems but there are many
| more it hasn't even seen, seems to be assigned a very, very
| low prior. Who cares about such a boring interpretation?
| Language models are few-shot learners! (Once trained with a
| few billion examples, that is.)
| gwern wrote:
| How ironic that you claim the paper overstates it, when
| you very carefully leave out every single qualifier about
| BPEs and how GPT-3's arithmetic improves massively when
| numbers are reformatted to avoid BPE problems. Pot,
| kettle.
| ximeng wrote:
| "Byte pair encoding": more discussion at
| https://www.gwern.net/GPT-3#bpes
| Davidzheng wrote:
| I question the premise here. Can young children (say <5) who
| don't know arithmetic at all learn the axioms from just a
| bunch of examples? This isn't how children are taught
| arithmetic; they are taught the rules/algorithm, not just a
| bunch of data points. Do you have a source for the claim?
| joe_the_user wrote:
| The funny thing with GPT-3 is that it comes up with sentences
| that seem to make correct deductions using arithmetic or
| other submodels of reality, but it will blithely generate
| further sentences that contradict these apparent
| understandings. It's impressive and incoherent all at the
| same time.
| cscurmudgeon wrote:
| But you claim to understand, right in this comment, how
| humans understand other things.
|
| Isn't that self-contradictory?
| p1esk wrote:
| He's just not very good at constructing logical-sounding
| (plausible, testable) stories about things :)
| xdavidliu wrote:
| No, he did not claim that. His claim is just as much a lie as
| all other claims. Lies can be valuable, even if they are not
| strictly true.
| soulofmischief wrote:
| Mental models are not lies.
|
| "The car is red" is not a lie just because I didn't phrase it
| internally as "The car reflects photons with a wavelength of
| around 700nm".
|
| We have to be able to simplify, and to internalize simplified
| models, in order to make any sense of anything. It's the same
| reason your eyes only have a focal point in the dead center:
| attention requires vast amounts of processing power.
|
| To reiterate, a simplification is not a lie. Especially not a
| flat-out lie.
| Turing_Machine wrote:
| Yes. Bridges are designed using old-school Newtonian
| mechanics, without (in general) considering relativistic or
| quantum effects at all, but they stand up nonetheless.
|
| That wouldn't be the case if they were built based on "flat-
| out lies".
|
| Edit: the sticking point here is the use of the word "lie", I
| think.
|
| A lie is a falsehood told with intent to deceive.
|
| Thus, novels are not "lies", despite being works of the
| imagination. Everyone knows that the people in Moby-Dick or
| Wuthering Heights were made up.
|
| Neither is a simplified model a "lie" if it is close enough.
|
| All engineers know that the steel and concrete components of
| a bridge have quantum and relativistic effects going on
| inside them, so ignoring those effects isn't a "lie" in any
| meaningful sense.
|
| It just doesn't matter for the purpose at hand.
| bananaface wrote:
| I get what you're saying, but I think this is a side-effect
| of how we save thinking by using abstractions. We use
| abstractions to plug holes, to avoid having to drill down on
| every concept we're exposed to.
|
| That can look like a surface-level linguistic understanding,
| but it's not; it's a surface-level abstraction. It's not
| arbitrary, it has structure, and when you flesh it out you're
| fleshing it out with actual abstract structure, _not_ just
| painting over the gaps with arbitrary language.
| skybrian wrote:
| Darn, based on the title, I was hoping for an overview of
| recent research.
|
| Lots of people are having fun playing with GPT-3 or AI
| Dungeon, myself included, but it seems like there is other
| interesting research going on, like the REALM paper [1][2].
| What should I be reading? Why aren't people talking about
| REALM more? I'm no expert, but it seems like keeping the
| knowledge base outside the language model has a lot going for
| it?
|
| [1] https://ai.googleblog.com/2020/08/realm-integrating-retrieva...
| [2] https://arxiv.org/abs/2002.08909
| plutonorm wrote:
| I think the standard responses you see about GPT-3 -
|
| Hur durr, can't have intelligence from statistics. Hur durr,
| Chinese room. Hur durr, doesn't understand semantics, it's
| just dumb. Etc. etc. etc., ad nauseam.
|
| They go to show not how dumb GPT-3 is, but how unthinking most
| people are. Just pulling from a bag of generally agreed-upon
| notions, matching them up in simplistic ways, and
| regurgitating.
___________________________________________________________________
(page generated 2020-08-16 23:00 UTC)