[HN Gopher] Recent Advances in Natural Language Processing
       ___________________________________________________________________
        
       Recent Advances in Natural Language Processing
        
       Author : saadalem
       Score  : 139 points
       Date   : 2020-08-16 18:03 UTC (4 hours ago)
        
 (HTM) web link (deponysum.com)
 (TXT) w3m dump (deponysum.com)
        
       | YeGoblynQueenne wrote:
       | Separate comment because separate point.
       | 
       | To my mind, all this benchmark-beating is just pointless faffing
       | about and not a little frustrating to see that this is where the
       | current state of the art is at. NLP systems keep getting better
       | and better at beating benchmarks and yet we still don't have
       | artificial agents that can hold an intelligent conversation, even
       | in a restricted context.
       | 
        | Or rather, we have had such programs since the late 1960s, with
        | none other than Terry Winograd's own doctoral thesis system,
        | SHRDLU [1], but it seems that in the race to build powerful
        | classifiers, we forgot how to build systems that can interact
        | intelligently with a human.
       | 
        | Here's a small excerpt from SHRDLU (taken from the wikipedia page
        | on SHRDLU, originally in an appendix in Terry Winograd's thesis):
        | 
        |     Person: Is there a large block behind a pyramid?
        |     Computer: YES, THREE OF THEM: A LARGE RED ONE, A LARGE
        |     GREEN CUBE, AND THE BLUE ONE.
        |     Person: Put a small one onto the green cube which
        |     supports a pyramid.
        |     Computer: OK.
        |     Person: Put the littlest pyramid on top of it.
        |     Computer: OK.
        |     Person: Does the shortest thing the tallest pyramid's
        |     support supports support anything green?
        |     Computer: YES, THE GREEN PYRAMID.
        |     Person: What color is it?
        |     Computer: BY "IT", I ASSUME YOU MEAN THE SHORTEST THING
        |     THE TALLEST PYRAMID'S SUPPORT SUPPORTS.
        |     Computer: RED
       | 
       | Note that this is not just mindless text generation. The
       | conversation is held in the context of a "blocks world" where a
       | robot with a single hand and eye ("Computer" in the excerpt
       | above) moves blocks of various shapes and colours around, as
       | directed by a human user in free-form natural language. When the
       | Computer says "OK" after it's directed to "put the littlest
       | pyramid on top of it" it's because it really has grabbed the
       | smallest pyramid in the blocks world and placed it on top of the
       | small block in an earlier sentence, as the Person asked. The
       | program has a memory module to keep track of what ellipses like
       | "it", "one" etc refer to throughout the conversation.
       | 
        | SHRDLU was a traditional program hand-crafted by a single PhD
        | student: no machine learning, no statistical techniques. It
        | included, among other things, a context-free grammar (!) of
        | natural English and a planner (to control the robot's hand), all
        | written in Lisp and PLANNER. In its limited domain, it was
        | smarter than anything ever created with statistical NLP methods.
       | 
       | ______________________
       | 
       | [1] https://en.wikipedia.org/wiki/SHRDLU
        
         | quotemstr wrote:
         | Why is it surprising that a CFG can approximate a subset of
         | English grammar?
        
           | trott wrote:
           | > Why is it surprising that a CFG can approximate a subset of
           | English grammar?
           | 
           | "Colorless green ideas sleep furiously" is a famous example
           | of a sentence that is grammatical, but meaningless. The goal
           | of SHRDLU was far more ambitious than approximating English
           | grammar.
        
         | liuliu wrote:
          | We have known for a long time that hand-crafted programs in
          | limited domains can work for NLP, computer vision and voice
          | recognition. The challenge is always that the limited domain can
          | be extremely limited, and getting anything practically
          | interesting requires a lot of human involvement to encode the
          | world (expert systems).
          | 
          | Statistical methods trade that away. With data, some labelled,
          | some unlabelled and some weakly-labelled, we can generate these
          | models with much more efficient human involvement (refining the
          | statistical models and labelling data).
          | 
          | I honestly don't see the frustration. Yes, current NLP models
          | may not yet be the "intelligent agent" everyone is looking for,
          | to any extent. But claiming it is all faffing and no better than
          | the 1960s is quite a stretch.
        
           | joe_the_user wrote:
            | The thing that qualifies as "faffing" in my opinion isn't the
            | statistical NLP programs, which "are what they are", but
            | _claims of progress based primarily on benchmarks_, as
            | YeGoblynQueenne rightly states.
            | 
            | And "limited domain" is relative. A program that gets many
            | aspects of language right talking about a small world might
            | be said to have a larger domain than a program that outputs a
            | stream of semi-plausible, semi-gibberish text involving the
            | whole of the English language. Which again isn't saying modern
            | NLP is nothing, but rather that we should have a somewhat
            | better way to talk about it (and machine learning generally)
            | than "hitting benchmarks".
        
           | YeGoblynQueenne wrote:
            | >> We have known for a long time that hand-crafted programs
            | in limited domains can work for NLP, computer vision and
            | voice recognition.
           | 
           | Yes, we did. So- where are the natural language interfaces by
           | which we can communicate with artificial agents in such
           | limited domains? Where are the applications, today, that
           | exhibit behaviour as seemingly intelligent as SHRDLU in the
           | '60s? I mean, have you personally seen and interacted with
           | one? Can you show me an example of such a modern system?
           | Edit: Note again that SHRDLU was created by a single PhD
           | student with all the resources of ... a single PhD student.
           | It's no stretch to imagine that an entity of the size of
           | Google or Facebook could achieve something considerably more
           | useful, still in a limited domain. But this has never even
           | been attempted.
           | 
           | Yes, it is faffing about. Basically NLP gave up on figuring
           | out how language works and switched to a massive attempt to
           | model large datasets evaluated by contrived benchmarks that
           | serve no other purpose than to show how well modern
           | techniques can model large datasets.
        
             | joe_the_user wrote:
             | Good question!
             | 
              | As you say, SHRDLU was the best program of its time.
              | 
              | I've read Winograd's book and looked at the SHRDLU source
              | code (not that I'm much of a Lisp hacker, but I gave it some
              | time). It's built on a parser and a planner (a logic
              | program, pre-Prolog). And it's built the old-fashioned way:
              | rewriting the source code, with the parser rewriting input
              | and then re-running, and other hairy things. I think this
              | achieves the parsing of idiomatic constructions at a high
              | level. I believe the "raw Lisp" of the day was both
              | incredibly powerful, since you could do anything, and
              | incredibly hard to scale... because you could do anything.
             | 
              | Winograd wrote it himself, but I think that's because he
              | had to write it himself. In a sense, a programmer is always
              | most productive when they are writing something by
              | themselves, because they don't have to explain anything they
              | are doing (until the fix-ups and complexity overwhelm them,
              | but the base fact remains). And in the case of SHRDLU,
              | Winograd would have had an especially hard time explaining
              | what he was doing. I mean, there was a theory behind it:
              | I've read Winograd's book. But there was lots and lots of
              | brilliant spaghetti/glue code to actually make it work,
              | code that jumped module and function boundaries. And the
              | final program had a reputation for being very buggy:
              | sometimes it worked and sometimes it didn't. And Winograd
              | was a brilliant programmer and widely read in linguistics
              | and other fields.
             | 
              | The software industry is an industry. No company wants to
              | depend on the brilliance of its workers. A company needs
              | to produce based on some sort of average, and a person
              | working with just average skills isn't going to do SHRDLU.
              | 
              | So, yeah, I think that's why actual commercial programs
              | never reached the level of SHRDLU.
        
               | _emacsomancer_ wrote:
               | So what you're saying is that we really shouldn't be
               | relying on industry for good AI?
        
               | joe_the_user wrote:
               | Well, an "industrial model" is a model of a factory and
               | it seems unlikely you could product a fully-functional,
               | from scratch GFAI program like SHRDLU in something like a
               | factory.
               | 
               | Perhaps one could create a different kind of enterprise
               | for this but it's kind of an open problem.
        
               | YeGoblynQueenne wrote:
               | Winograd made changes to the Lisp assembly to make SHRDLU
               | work and he never back-ported them to his SHRDLU code,
               | but his original version worked fine and was stable. The
               | experience of breaking refers to later versions that were
               | expanded by his doctoral students and others and to ports
               | to java and I think C. The original code was written in
               | 1969 but inevitably suffered from bit rot in the
               | intervening years so it's true that there is no stable
               | version today that can reliably do what Winograd's
               | original code could do... but Winograd's original code
               | was rock solid, according to the people who saw it
               | working.
               | 
               | There's some information about all that here:
               | 
               | http://maf.directory/misc/shrdlu.html
               | 
               |  _[Dave McDonald] (davidmcdonald@alum.mit.edu) was Terry
                | Winograd's first research student at MIT. Dave reports
               | rewriting "a lot" of SHRDLU ("a combination of clean up
               | and a couple of new ideas") along with Andee Rubin, Stu
               | Card, and Jeff Hill. Some of Dave's interesting
               | recollections are: "In the rush to get [SHRDLU] ready for
               | his thesis defense [Terry] made some direct patches to
               | the Lisp assembly code and never back propagated them to
               | his Lisp source... We kept around the very program image
               | that Terry constructed and used it whenever we could. As
               | an image, [SHRDLU] couldn't keep up with the periodic
               | changes to the ITS, and gradually more and more bit rot
               | set in. One of the last times we used it we only got it
               | to display a couple of lines. In the early days... that
               | original image ran like a top and never broke. Our
               | rewrite was equally so... The version we assembled circa
               | 1972/1973 was utterly robust... Certainly a couple of
               | dozen [copies of SHRDLU were distributed]. Somewhere in
               | my basement is a file with all the request letters...
               | I've got hard copy of all of the original that was Lisp
               | source and of all our rewrites... SHRDLU was a special
               | program. Even today its parser would be competitive as an
               | architecture. For a recursive descent algorithm it had
               | some clever means of jumping to anticipated alternative
               | analyses rather than doing a standard backup. It defined
               | the whole notion of procedural semantics (though Bill
               | Woods tends to get the credit), and its grammar was the
               | first instance of Systemic Functional Linguistics applied
               | to language understanding and quite well done." Dave
               | believes the hardest part of getting a complete SHRDLU to
               | run again will be to fix the code in MicroPlanner since
               | "the original MicroPlanner could not be maintained
               | because it had hardwired some direct pointers into the
               | state of ITS (as actual numbers!) and these 'magic
               | numbers' were impossible to recreate circa 1977 when we
               | approached Gerry Sussman about rewriting MicroPlanner in
               | Conniver." _
               | 
               | Regarding the advantage of a lone programmer- that's
               | real, but large teams have built successful software
               | projects before, very often. I guess you don't even need
               | a big team, just a dozen people who all know what they're
               | doing. That shouldn't be hard to put together given FANG-
               | level resources. Hell, that shouldn't be hard to do given
               | a pool of doctoral students from a top university... but
               | nowadays even AI PhD students would have no idea how to
               | recreate something like SHRDLU.
               | 
               | Edit: I got interested in SHRDLU recently (hence the
               | comments in this thread) and I had a look at Winograd's
               | thesis to see if there was any chance to recreate it. The
                | article above includes a link to a bunch of flowcharts of
                | SHRDLU's CFG, but even deciphering those hand-drawn and
                | occasionally vague plans would take a month or two of
                | doing nothing else, something for which I absolutely do
                | not have the time. And that's only the grammar; the rest
                | of the program would have to be reverse-engineered from
               | Winograd's thesis, examples of output from the original
               | code or later clones, etc. That's a project for a team of
               | digital archeologists, not software developers.
        
               | joe_the_user wrote:
               | _"... his original version worked fine and was stable.
               | The experience of breaking refers to later versions that
               | were expanded by his doctoral students and others and to
               | ports to java and I think C. "_
               | 
               | I can believe this but I think your details overall
               | reinforce my points above.
               | 
               |  _For a recursive descent algorithm it had some clever
               | means of jumping to anticipated alternative analyses
               | rather than doing a standard backup._
               | 
                | Yeah, fabulous, but extremely hard to extend or reproduce.
                | The aim of companies was to _scale_ something like this.
                | It seems like the fundamental problem was that only a few
                | really smart people could program to this level and no
                | one could take them beyond it (the old saw that a person
                | has to be twice as smart to debug a program as to write
                | it comes in, etc).
        
             | lacker wrote:
             | The modern natural language interfaces with limited domains
             | are Alexa and Siri. Yes, they're limited. But they are far
             | more impressive and useful than SHRDLU.
        
               | YeGoblynQueenne wrote:
                | Alexa and Siri (and friends) are completely incapable of
                | interacting with a user with the precision of SHRDLU. You
                | can ask them to retrieve information from a Google search,
                | but e.g. they have no memory of the anaphora in earlier
                | sentences of the same conversation. If you say "it" a few
                | times to refer to different entities, they completely lose
                | the plot.
               | 
               | They are also completely incapable of reasoning about
               | their environment, not least because they don't have any
               | concept of an "environment" - which was represented by
               | the planner and the PROGRAMMAR language in SHRDLU.
               | 
               | And of course, systems like Siri and Alexa can't do
               | anything even remotely like correctly disambiguating the
               | "support support supports" show-off sentence in the
               | excerpt above. Not even close.
               | 
               | Edit: Sorry, there's a misunderstanding about "limited
               | domain" in your comment. Alexa and Siri don't operate in
               | a limited domain. A "limited domain" would be something
               | like being in charge of your music collection and nothing
                | else. Alexa and Siri etc. are supposed to be general-use
               | agents. I mean, they are, it's just that they suck at
               | it... and would still suck in a limited domain also.
        
             | rvense wrote:
             | My impression is that the systems never progressed much
             | after SHRDLU even though there were attempts at larger
             | scale "expert systems". But adding more advanced rules and
             | patterns proved extremely difficult and did not always have
             | the expected effect of making the systems more general.
             | 
             | There was the whole AI winter thing, of course, but that
             | was as much a result of things not living up to the hype as
             | a cause.
        
               | YeGoblynQueenne wrote:
               | This doesn't directly address your question, though it
               | perhaps can give you some pointers if you want to read
               | about the history of AI and the AI winter of the '80s,
               | but in a way SHRDLU featured prominently in the AI
               | winter, at least in Europe, particularly in the UK.
               | 
               | So, in the UK at least the AI winter was precipitated by
               | the Lighthill Report, a report published in 1973,
               | compiled by a Sir James Lighthill and commissioned by the
               | British Research Council, i.e. the people who held all
               | the research money at the time in the UK. The report was
               | furiously damning of AI research of the time, mostly
               | because of grave misunderstandings e.g. with respect to
               | combinatorial explosion and basically accused researchers
               | of, well, faffing about and not doing anything useful
               | with their grant money. The only exception to this was
               | SHRDLU, that Lighthill praised as an example of how AI
               | should be done.
               | 
               | Anyway, if you have time, you can watch the televised
               | debate between Lighthill and three luminaries of AI, John
               | McCarthy (the man who named the field, created Lisp and
               | did a few other notable things), Donald Michie (known for
               | his MENACE reinforcement-learning program running on...
               | matchboxes, and basically setting up AI research in the
                | UK) and Richard Gregory (a cognitive scientist about whom
                | I confess I don't know much). The (short)
                | Wikipedia article on the Lighthill Report has links to
                | all the YouTube videos:
               | 
               | https://en.wikipedia.org/wiki/Lighthill_report
               | 
                | It's interesting to see in the videos the demonstration
                | of the Freddy robot from Edinburgh, which was capable of
                | constructing objects by detecting their components with
                | early machine vision techniques. In the 1960s.
                | Incidentally:
               | 
                |  _Even with today's knowledge, methodology, software
               | tools, and so on, getting a robot to do this kind of
               | thing would be a fairly complex and ambitious project._
               | 
               | http://www.aiai.ed.ac.uk/project/freddy/
               | 
                | The above was written sometime in the '90s, I reckon, but
               | it is still true today. Unfortunately, Lighthill's report
               | killed the budding robotics research sector in the UK and
               | it has literally never recovered since. This is typical
               | of the AI winter of the '80s. Promising avenues of
               | research were abandoned not because of any scientific
               | reasons, as is sometimes assumed ("expert systems didn't
               | scale" etc) but, rather, because pencil pushers in charge
               | of disbursing public money didn't get the science.
               | 
               | Edit: A couple more pointers. John McCarthy's review of
               | the Lighthill Report:
               | 
               | http://www-
               | formal.stanford.edu/jmc/reviews/lighthill/lighthi...
               | 
               | An article on the AI winter of the '80s by the editor of
               | IEEE Intelligent Systems:
               | 
               | https://www.computer.org/csdl/magazine/ex/2008/02/mex2008
               | 020...
        
               | rvense wrote:
               | Interesting, thank you for the clarifications.
        
           | ericbarrett wrote:
           | I read grandparent's post as saying that despite all the
           | research and untold amounts of compute power poured into NLP
           | over the decades, its practitioners have yet to address the
           | original real-world goals that led us to study it in the
           | first place. Missing the forest for the trees, if you will.
           | 
            | (I don't assert that it has or hasn't. I know just enough
            | about the topic to see how little I know. But it seems, from
            | the outside, a valid criticism, and not one unique to the
            | field.)
        
           | tmalsburg2 wrote:
           | Since this post is receiving downvotes, I'd like to know
           | which part of the HN guidelines it is violating.
        
       | mqus wrote:
        | Not a single mention of whether this is only applicable to
        | English or to other natural languages. Afaict this mostly lists
        | advancements in ELP (English language processing). Especially the
        | Winograd schema (or at least the given example) seems to be
        | heavily focused on English.
       | 
       | Relevant article for this problem:
       | https://news.ycombinator.com/item?id=24026511
        
         | MiroF wrote:
         | But there's no reason the models are english specific...
        
           | woodson wrote:
           | In a certain way, they could be, simply because their
           | structure may work better on the type of language that
           | English is. It may not work as well for languages that
           | exhibit other grammatical patterns (e.g. morphologically rich
           | languages, or those that exhibit superficially more flexible
           | word order which nonetheless conveys information about
           | topic/focus).
        
             | dvduval wrote:
             | Some languages like Chinese have a large corpus of
             | information that can be studied, and there has been a
              | continuous effort to standardize the language. It wouldn't
             | surprise me if it turns out to be easier than English. I
             | would expect it would be more difficult when you're talking
             | about languages that have regional differences in usage of
             | words, perhaps northern Vietnamese vs. Southern Vietnamese,
             | or the Spanish of Spain versus the Spanish of Mexico.
        
               | east2west wrote:
                | Chinese has large regional differences, too. Cantonese
                | has different grammar, not just word usage. Modern
                | Chinese grammar is somewhat similar to English, while
                | Cantonese follows ancient Chinese, which places the verb
                | at the end of the sentence.
        
               | dvduval wrote:
               | True, and Mandarin would be the better word to use here.
               | Mandarin and Cantonese are for all intents and purposes
               | separate languages. They are more different than English
               | and Spanish.
        
               | yorwba wrote:
               | If we're going down that path, then it should more
               | specifically be Standard Mandarin, since Standard
               | Mandarin and e.g. Southwestern Mandarin are only
               | partially mutually intelligible like French and Spanish
               | are partially mutually intelligible.
        
               | DonaldFisk wrote:
               | Cantonese is, like English and Mandarin, SVO.
               | 
               | E.g. I bought this book: ngoh(S) maaihjo(V) nibun syu(O).
               | 
               | Contrast with Japanese: watashi wa(S) kono hon o(O)
               | kaimashita(V).
               | 
               | Differences from Mandarin Grammar are listed here: https:
               | //en.wikipedia.org/wiki/Cantonese_grammar#Differences_...
               | 
               | Classical Chinese is also SVO:
               | https://en.wikipedia.org/wiki/Classical_Chinese_grammar
        
               | kingkawn wrote:
               | All language has significant regional difference if it
               | covers enough regions
        
           | akerro wrote:
            | I'm 100% sure NLP will make the same mistakes with non-English
            | languages as face recognition did with the faces of black
            | people.
            | 
            | I'm taking bets.
        
           | m0zg wrote:
           | For some of them, there is. Russian (my mother tongue), for
           | example, has the kind of morphology that makes linguists'
           | hair stand on end. Also, verbs and nouns are gendered (and
           | therefore they must agree, and you must know what everything
           | refers to at all times, often pretty far away from where the
           | noun was first mentioned), and there are declensions on
           | various parts of speech, and they must agree as well. And
           | words can be freely formed out of several roots, prefixes,
           | and suffixes. For any of this to be understood or generated,
           | your model has to be able to model all of that, which is much
           | harder than modeling English.
        
           | YeGoblynQueenne wrote:
            | Yes, there is. A "model" is a set of parameters optimised by
            | some algorithm or system trained on the data in a specific
            | dataset. Thus, a language model trained on a dataset of
            | English language examples is only capable of representing
            | English language utterances, not, e.g., French, or Greek, or
            | Gujarati utterances. Diagrammatically:
            | 
            |     data --> system --> model
            | 
            | What is not necessarily English-specific are the systems used
            | to train different language models, at least in theory. In
            | practice, systems are typically hand-crafted and fine-tuned
            | to specific datasets, to such a degree that most of the work
            | has to be done anew to train on a different dataset.
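            | 
            | (To make the data --> system --> model picture concrete, here
            | is a toy sketch with made-up sentences, where the "system" is
            | just bigram counting and the "model" is literally the counts
            | extracted from the training data:)
            | 
            |     from collections import Counter
            | 
            |     def train(corpus):
            |         """The 'system': turn training data into a
            |         'model' (here, just bigram counts)."""
            |         model = Counter()
            |         for sentence in corpus:
            |             toks = ['<s>'] + sentence.split() + ['</s>']
            |             model.update(zip(toks, toks[1:]))
            |         return model
            | 
            |     english = ["the cat sat", "the dog sat", "a cat ran"]
            |     model = train(english)
            | 
            |     # The model has statistics for English-like strings...
            |     print(model[('the', 'cat')])   # 1
            |     # ...but knows nothing about, e.g., French:
            |     print(model[('le', 'chat')])   # 0
            | 
            | The same training system run on a French dataset would produce
            | a different model; nothing about the counting code itself is
            | English-specific.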
        
       | ascavalcante80 wrote:
        | NLP is great for many things but, from my own experience as an
        | NLP developer, machines are not even close to understanding human
        | language. They can interpret some kinds of written text well, but
        | they will struggle to grasp two humans speaking to each other.
        | The progress we are making on building chatbots and voice
        | assistants is mainly due to the fact that we are learning how to
        | speak to the machines, and not the contrary.
        
       | p1esk wrote:
       | Note this is pre-GPT-3. In fact I expect GPT-4 will be where
       | interesting things start happening in NLP.
        
         | curiousgal wrote:
         | I honestly don't get where the big deal is with NLP. So far the
         | most useful application has been customer support chatbots and
         | those still don't rise to the level of having an actual human
         | that can understand the intricacies of your special request.
        
           | feral wrote:
           | I work on such a bot @Intercom.
           | 
           | A lot of support requests aren't actually intricate and
           | special: there's almost always loads of simple requests that
           | come in again and again.
           | 
           | When you ask a request like that and get the answer
           | instantly, that really is ML delivering value.
           | 
           | You mightn't think it, but a _lot_ of people work in customer
           | support, and spend a lot of time answering rote questions
           | again and again.
           | 
           | People talk about how much hype there is, or the chance of
           | another AI winter. Yes there's a ton of hype - but I think
           | they aren't considering the real value already being
           | delivered this time around.
           | 
            | Everyone is excited about GPT-3, but there's already been
            | amazing progress in practical NLP over the last few
            | years.
        
           | ben_w wrote:
           | Current NLP is bad. Still useful (Google search increasingly
           | feels like it is doing NLP to change what I asked for into
           | what it thinks I meant) but bad. A hypothetical future
           | "perfect" NLP can demonstrate any skill that a human could
           | learn by reading, and computers can read so much more than
           | any given human.
        
             | plafl wrote:
             | Is reading enough to understand the real world without
             | direct experience of the real world? Is there any research
             | that tries to answer this question?
        
               | p1esk wrote:
               | _Is there any research that tries to answer this
               | question?_
               | 
               | That's the whole point of the experiment called GPT-3.
        
               | rvense wrote:
               | As of about ten years ago when I received my degree in
               | linguistics, I understood there were two schools on this
               | issue:
               | 
                | A: Of course not, let's do something else.
                | 
                | B: What? You put text in the maths and stuff comes out.
        
           | MiroF wrote:
           | That's just not even close to the only or most useful
           | application.
           | 
            | I use NLP and associated seq2seq techniques every day. I
            | struggle to see how so many people don't see the obvious
            | benefits deep inference is bringing to stuff all around them.
        
           | p1esk wrote:
            |  _those still don't rise to the level of having an actual
           | human_
           | 
           | What if they did? Do you see where the big deal is now?
        
           | claudeganon wrote:
           | This past week, I used this new edit distance library to
           | identify quasi-duplicates in a large dataset:
           | 
           | https://github.com/Bergvca/string_grouper
           | 
           | Saved me hours of work because all the Levenshtein
           | implementations are pretty slow and I'm going to need to
           | rerun the analysis as the dataset grows.
           | 
           | I don't know about consumer-facing tools, but NLP stuff has
           | helped me solve all kinds of tedious data problems at work.
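            | 
            | (Not the library's code, but the general trick such tools use
            | is to compare TF-IDF vectors of character n-grams with sparse
            | matrix math instead of running a per-pair Levenshtein dynamic
            | program over every pair of strings. A minimal sketch with
            | scikit-learn and made-up strings; the real library is much
            | more optimised:)
            | 
            |     import pandas as pd
            |     from sklearn.feature_extraction.text import (
            |         TfidfVectorizer)
            |     from sklearn.metrics.pairwise import cosine_similarity
            | 
            |     names = pd.Series(["Acme Corp", "ACME Corporation",
            |                        "Globex LLC", "Acme Corp."])
            | 
            |     # one vectorised similarity matrix instead of per-pair
            |     # edit-distance computations
            |     vec = TfidfVectorizer(analyzer="char_wb",
            |                           ngram_range=(3, 3))
            |     sim = cosine_similarity(vec.fit_transform(names))
            | 
            |     # pairs above a (tunable) threshold are quasi-duplicates
            |     for i in range(len(names)):
            |         for j in range(i + 1, len(names)):
            |             if sim[i, j] > 0.3:
            |                 print(names[i], "~", names[j])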
        
           | thekyle wrote:
           | If a chatbot was as good as a human then would you notice it
           | was a chatbot?
        
       | narag wrote:
        | Has it ever happened that a "thought experiment" has become a
        | real experiment?
        
         | exo-pla-net wrote:
         | Not sure exactly what you mean, but Einstein developed
         | relativity largely through thought experiments. And relativity
         | has been verified by real experiments.
        
         | dane-pgp wrote:
         | Most historians think that this was actually a thought
         | experiment:
         | 
         | https://en.wikipedia.org/wiki/Galileo's_Leaning_Tower_of_Pis...
         | 
         | An equivalent experiment was famously carried out for real in
         | 1971 on the surface of the Moon.
        
         | simonh wrote:
         | In a sense every experiment starts as a thought experiment.
        
       | FiberBundle wrote:
       | I found the science exams results interesting and skimmed the
       | paper [1]. They report an accuracy of >90% on the questions. What
       | I found puzzling was that they have a section in the experimental
       | results part where they test the robustness of the results using
        | adversarial answer options; more specifically, they used some
       | simple heuristic to choose 4 additional answer options from the
       | set of other questions which maximized 'confusion' for the model.
       | This resulted in a drop of more than 40 percentage points in the
       | accuracy of the model. I find this extremely puzzling, what do
       | these models actually learn? Clearly they don't actually learn
       | any scientific principles.
       | 
       | [1] https://arxiv.org/pdf/1909.01958.pdf
        
         | teej wrote:
         | That's the thing. Machine Learning is a misnomer, the models
         | don't "learn" anything about the domain they operate in. It's
         | just statistical inference.
         | 
         | A dog can learn to turn left or to turn right for treats. But
         | they don't understand the concept of "direction", their brain
         | isn't wired that way.
         | 
         | Machine learning models perform tricks for treats. The tricks
         | they do get more impressive by the day. But don't be deceived,
         | they aren't wired to gain knowledge.
        
           | 0-_-0 wrote:
           | Doesn't your brain derive knowledge through statistical
           | inference? How else? I can imagine the AGI of the future
           | (having perfected knowledge gathering through statistical
           | inference) some day saying: "Humans perform tricks for
           | treats. The tricks they do are impressive. But don't be
           | deceived, they aren't wired to gain knowledge."
        
             | teej wrote:
             | I didn't claim that the way brains work didn't include
             | statistical inference. That's why the dog metaphor works.
             | Both the dog and the machine learning model use statistical
             | inference to perform tricks.
             | 
             | Dogs haven't been to the moon, however. There's more to the
             | brain we don't understand.
        
           | plutonorm wrote:
           | My god if I hear this argument one more time I'm going to
           | pop. What on earth gives you the idea that you are something
           | more than statistical inference?
        
             | teej wrote:
             | Yes, it's widely believed that statistical inference is a
             | part of how the brain operates. But we have barely
             | scratched the surface in our understanding how the human
             | brain works.
             | 
             | Do you honestly believe statistical inference completely
             | explains a human's ability to learn?
        
         | wrs wrote:
         | I would be interested in hearing the results from _humans_
         | presented with adversarial answer options. You may say that a
         | machine learning correlations between words isn't really
         | learning science, but I wonder how many human students aren't
         | either, just pretty much learning correlations between words to
         | pass tests...
        
           | FiberBundle wrote:
           | They do give an example of a question, in which the model
           | chose an incorrect answer in the adversarial setting:
           | 
           | "The condition of the air outdoors at a certain time ofday is
           | known as (A) friction (B) light (C) force
           | (D)weather[correct](Q) joule (R) gradient[selected](S)trench
           | (T) add heat"
           | 
           | I assume this might be characteristic for other questions as
           | well, although I don't know anything about the Regents
           | Science Exam and whether there are multiple questions about
           | closely related topics.
        
           | jacobwilliamroy wrote:
           | Most multiple choice math problems can be completely
           | circumvented by simply finding the digital root of the
           | expression in the problem. I was surprised to find this to be
           | true, even on college entrance exams.
        
             | flir wrote:
             | Seriously? Can you explain the mechanism? 'cos that sounds
             | like - well, numerology, to be honest.
        
         | 0-_-0 wrote:
          | Adversarial training means that they specifically search for
          | answers that the network would misunderstand. If this only
          | leads to a 40 percentage point loss (the network still answers
          | correctly about 50% of the time), I still consider that
          | remarkable.
         | 
         | Choosing the best from 8 answers where some of them were
         | adversarially derived should be equivalent to choosing the best
         | from all possible answers, of which there could be tens of
         | thousands. How would a human do in that situation?
         | 
          | Although the kinds of mistakes the network makes seem like
          | mistakes you would never make (i.e. you wouldn't call the
          | condition of air outdoors _gradient_), the opposite could also
          | be true: it might easily answer questions you would have a
          | problem with.
        
       | _emacsomancer_ wrote:
       | A bit I found rather strange, on the language-side:
       | 
       | > This is to say the patterns in language use mirror the patterns
       | of how things are(1).
       | 
       | > (1)- Strictly of course only the patterns in true sentences
       | mirror, or are isomorphic to, the arrangement of the world, but
       | most sentences people utter are at least approximately true.
       | 
       | Presumably this should really say something like "...but most
       | sentences people utter are at least approximately true _of their
       | mental representation of the world_. "
        
       | rvense wrote:
       | I think the point about language being a model of reality was
       | interesting. I have an MA in linguistics including some NLP from
       | about a decade ago and was looking at a career in academic NLP. I
       | ultimately left to become a programmer because (of life
       | circumstances and the fact that) I didn't see much of a future
       | for the field, precisely because it was ignoring the (to me)
       | obvious issues of written language bias, ignorance of multi-
       | modality and situatedness etc. that are brought up in this post.
       | 
       | All of these results are very interesting, but I'm not really
       | feeling like we've been proved wrong yet. There is a big question
       | of scalability here, at least as far as the goal of AGI goes,
       | which the author also admits:
       | 
       | > Of course everyday language stands in a woolier relation to
       | sheep, pine cones, desire and quarks than the formal language of
       | chess moves stands in relation to chess moves, and the patterns
       | are far more complex. Modality, uncertainty, vagueness and other
       | complexities enter but the isomorphism between world and language
       | is there, even if inexact.
       | 
       | This woolly relation between language and reality is well-known.
       | It has been studied in various ways in linguistics and the
       | philosophy of language, for instance by Frege and not least
       | Foucault and everything after. I also think many modern
       | linguistic schools take a very different view of "uncertainty and
       | vagueness" than I sense in the author here, but they are
        | obviously writing for a non-specialist audience and trying not to
       | dwell on this subject here.
       | 
       | My point is, when making and evaluating these NLP methods and the
       | tools they are used to construct, it is extremely important to
       | understand that language models social realities rather than any
       | single physical one. It seems to me all too easy, coming from
       | formal grammar or pure stats or computer science, to rush into
       | these things with naive assumptions about what words are or how
       | they mean things to people. I dread to think what will happen if
       | we base our future society on tools made in that way.
        
         | skybrian wrote:
         | These tools seem to be getting pretty good at fiction. In
         | particular, playing around with AI Dungeon, it doesn't believe
         | anything, or alternately you could say it believes everything
         | at once. It's similar to a library that contains books by
         | different authors, some fact, some fiction. Contradictions
         | don't matter. Only local consistency matters, and even then,
         | not that much.
         | 
         | Unfortunately, many people want to believe that they are being
         | understood. But, on the bright side, this stuff seems
         | artistically useful? Entertainment is a big business.
        
           | rvense wrote:
           | > many people want to believe that they are being understood
           | 
           | Not only that, they also want to understand. I read some
           | computer poetry a few years ago that _worked_. The computer
           | did not intend any meaning, but I found it anyway.
           | 
           | And of course I did. This is how language works. We assume
           | that people (and by extension, the texts they write) follow a
           | set of rules which structures how we engage with them in a
           | successful manner. Paul Grice formulated this as the
           | cooperative principle in his study of everyday language, but
           | I believe it is the exact thing that is at play when people
           | meet NLP:
           | 
           | https://en.wikipedia.org/wiki/Cooperative_principle
        
         | joe_the_user wrote:
         | _My point is, when making and evaluating these NLP methods and
         | the tools they are used to construct, it is extremely important
         | to understand that language models social realities rather than
         | any single physical one._
         | 
          | I'd claim actual language involves both a general shared
          | reality and quite concrete and specific discussions of single
          | physical and logical facts/models. Some portion of language
          | certainly looks mostly like a stream of associations. But
          | within it there are also references to physical reality and to
          | a world model, and the two are complexly intertwined (making
          | logical sense is akin to a "parity check": you can go for a
          | while without it, but then you have to look at the whole to get
          | it). I believe one can see this in a GPT paragraph, where the
          | first two sentences seem intelligent and well written but the
          | third sentence contradicts the first two sufficiently that
          | one's mind isn't sure "what's being said" (and even here, our
          | "need for logic" might be loose enough that we only notice the
          | "senselessness" after the third logical error).
        
           | rvense wrote:
           | In human language I think physical reality is always a few
           | layers out. Language is social, first and foremost, and
           | naming something is not neutral. We can hardly refer to
           | single objects directly, we mostly do it through their class
           | membership, which always, always include a whole range of
           | associations, reductions, metaphor, etc. that are cultural.
        
             | joe_the_user wrote:
             | _In human language I think physical reality is always a few
             | layers out._
             | 
              | Yes, the point is you can neglect physical (and logical)
              | reality for a while in a stream of sentences. But not
              | forever, and that's where current NLP's output has its
              | limits. At a simple level, a stream of glittering
              | adjectives can be added to a thing and just add up to
              | "desirable", unless those adjectives "go over threshold"
              | and contradict each other, and then the description can
              | get tagged by the brain as a bit senseless.
        
       | bloaf wrote:
       | I know that there are allegedly NLP algorithms for generating
       | things like articles about sports games. I assume they have
       | something more like the type signature (timeline of events) ->
       | (narrative about said events)
       | 
       | What this article is about is more (question/prompt) ->
       | (answer/continuation of prompt)
       | 
       | Does anyone know if there is progress in the (timeline of events)
       | -> (narrative about said events) space?
        
       | YeGoblynQueenne wrote:
       | >> The Winograd schema test was originally intended to be a more
       | rigorous replacement for the Turing test, because it seems to
       | require deep knowledge of how things fit together in the world,
       | and the ability to reason about that knowledge in a linguistic
       | context. Recent advances in NLP have allowed computers to achieve
       | near human scores:(https://gluebenchmark.com/leaderboard/).
       | 
       | The "Winograd schema" in Glue/SuperGlue refers to the Winograd-
       | NLI benchmark which is simplified with respect to the original
       | Winograd Schema Challenge [1], on which the state-of-the-art
       | still significantly lags human performance:
       | 
       |  _The Winograd Schema Challenge is a dataset for common sense
       | reasoning. It employs Winograd Schema questions that require the
       | resolution of anaphora: the system must identify the antecedent
       | of an ambiguous pronoun in a statement. Models are evaluated
       | based on accuracy._
       | 
       |  _WNLI is a relaxation of the Winograd Schema Challenge proposed
       | as part of the GLUE benchmark and a conversion to the natural
       | language inference (NLI) format. The task is to predict if the
       | sentence with the pronoun substituted is entailed by the original
       | sentence. While the training set is balanced between two classes
       | (entailment and not entailment), the test set is imbalanced
       | between them (35% entailment, 65% not entailment). The majority
       | baseline is thus 65%, while for the Winograd Schema Challenge it
       | is 50% (Liu et al., 2017). The latter is more challenging._
       | 
       | https://nlpprogress.com/english/common_sense.html
       | 
       | There is also a more recent adversarial version of the Winograd
       | Schema Challenge called Winogrande. I can't say I'm on top of the
       | various results and so I don't know the state of the art, but
       | it's not yet "near human", not without caveats (for example,
        | Wikipedia reports 70% accuracy on 70 problems manually selected
        | from the original WSC).
       | 
       | __________
       | 
       | [1] https://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492
        
       | jvanderbot wrote:
       | I'd go one step further: Humans themselves don't understand
       | anything, we are just good at constructing logical-sounding
       | (plausible, testable) stories about things. These are mental
       | models, and it's the only way we can make reasonable predictions
       | to within error tolerances of our day-to-day experience, but they
       | are flat-out lies and stories we tell ourselves not based on a
       | high-fidelity understanding of anything.
       | 
       | Rumination, deep thinking, etc is simply actor-critic learning of
       | these mental models for story-telling.
        
         | runT1ME wrote:
          | Do current NLP systems understand arithmetic, and can they do
          | it with unfamiliar numbers they've never seen? If not, I'd
          | think that your theory is demonstrably false, as a child can
          | extrapolate mathematical axioms from just a few example
          | problems, whereas NLP models are not able to do so.
        
           | glenstein wrote:
            | > Do current NLP systems understand arithmetic, and can they
            | do it with unfamiliar numbers they've never seen?
           | 
           | I don't know if that question is rhetorical or not, but GPT-3
           | can do basic math for problems it has not been directly
           | trained on, and there's been a fair amount of debate,
           | including right here at hn, about what the takeaway is
           | supposed to be.
        
             | YeGoblynQueenne wrote:
              | GPT-3 can't do arithmetic very well at all. There is a big,
              | fat, extraordinary claim that it can in the GPT-3 paper,
              | but it's only based on perfect accuracy on two-digit
              | addition and subtraction, ~90% accuracy on three-digit
              | addition and subtraction, and... around 20% accuracy on
              | addition and subtraction with four and five digits and on
              | multiplication between two measly digits. Note: no
              | division at all and no arithmetic with more than five
              | digits. And very poor testing to ensure that the solved
              | problems don't just happen to be in the model's training
              | dataset to begin with, which is the simplest explanation of
              | the reported results, given that the arithmetic problems
              | GPT-3 solves correctly are the ones that are most likely to
              | be found in a corpus of natural language (i.e. two- and
              | three-digit addition and subtraction).
             | 
             | tl;dr, GPT-3 can't do basic math for problems it has not
             | been directly trained on.
             | 
             | ____________
             | 
             | [1] https://arxiv.org/abs/2005.14165
             | 
             | See section 3.9.1 and Figure 3.10. There is an additional
             | category of problems of combinations of addition,
             | subtraction and multiplication between three single-digit
             | numbers. Performance is poor.
        
               | avmich wrote:
               | >but it's only based on perfect accuracy on two-digit
               | addition and subtraction, ~90% accuracy on three digit
               | addition and subtraction and ...
               | 
                | Which grade of school does that correspond to? It would
                | have been astonishing to have an AI system of this
                | capability level a mere few years ago.
        
               | YeGoblynQueenne wrote:
               | Performing arithmetic per se is not impressive. A
               | calculator can do it and the rules of arithmetic are not
               | so complex that they can't be hand-coded, as they are,
               | routinely. The extraordinary claim in the GPT-3 paper is
               | that a language model is capable of performing arithmetic
               | operations, rather than simply memorising their results
                | [1]. Language models compute the probability of a token
                | following a sequence of tokens and in particular
                | have no known ability to perform any arithmetic, so if
               | GPT-3, which is a language model, were capable of doing
               | something that it is not designed to do, then that would
               | be very interesting indeed. Unfortunately, such an
               | extraordinary claim is backed up with very poor evidence,
               | and so amounts to nothing more than invoking magick.
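                | 
                | (For concreteness, the only thing such a model does is
                | repeatedly pick a next token from a predicted
                | distribution; a minimal sketch of that loop, where
                | next_token_probs is a hypothetical stand-in for the
                | trained network:)
                | 
                |     import random
                | 
                |     def generate(next_token_probs, prompt, n=20):
                |         # next_token_probs(tokens) -> {token: prob}
                |         tokens = list(prompt)
                |         for _ in range(n):
                |             probs = next_token_probs(tokens)
                |             toks, ps = zip(*probs.items())
                |             tokens.append(
                |                 random.choices(toks, ps)[0])
                |         return tokens
                | 
                | Nothing in this loop knows what addition is; any
                | arithmetic has to fall out of the learned probabilities.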
               | 
               | __________
               | 
               | [1] From the paper linked above:
               | 
               |  _In addition, inspection of incorrect answers reveals
               | that the model often makes mistakes such as not carrying
               | a "1", suggesting it is actually attempting to perform
               | the relevant computation rather than memorizing a table._
               | 
               | Now that I re-read this I'm struck by the extent to which
               | the authors are willing to pull their results this way
               | and that to force their preferred interpretation on them.
               | Their model answers two-digit addition problems
               | correctly? It's learned addition! Their model is making
               | mistakes? It's because it's actually trying to compute
               | the answer and failing! The much simpler explanation,
               | that their model has memorised solutions to a few
               | problems but there are many more it hasn't even seen,
               | seems to be assigned a very, very low prior. Who cares
                | about such a boring interpretation? Language models are
               | shot learners! (once trained with a few billion examples
               | that is).
        
               | gwern wrote:
               | How ironic you claim that the paper overstates it, when
               | you very carefully leave out every single qualifier about
               | BPEs and how GPT-3's arithmetic improves massively when
               | numbers are reformatted to avoid BPE problems. Pot,
               | kettle.
        
               | ximeng wrote:
               | "Byte pair encoding": more discussion at
               | https://www.gwern.net/GPT-3#bpes
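                | 
                | (Rough sketch of the BPE merge step, after the well-known
                | Sennrich et al. pseudocode, showing why digit strings get
                | chunked unevenly; the toy vocabulary below is made up:)
                | 
                |     import re, collections
                | 
                |     def get_stats(vocab):
                |         # count adjacent symbol pairs, weighted by freq
                |         pairs = collections.defaultdict(int)
                |         for word, freq in vocab.items():
                |             syms = word.split()
                |             for a, b in zip(syms, syms[1:]):
                |                 pairs[(a, b)] += freq
                |         return pairs
                | 
                |     def merge(pair, vocab):
                |         # merge the chosen pair everywhere it occurs
                |         pat = re.compile(r'(?<!\S)' +
                |                          re.escape(' '.join(pair)) +
                |                          r'(?!\S)')
                |         return {pat.sub(''.join(pair), w): f
                |                 for w, f in vocab.items()}
                | 
                |     # frequent numbers merge, rare ones stay as digits
                |     vocab = {'2 3 5 0': 10, '2 3 9 9': 8, '2 4 1 7': 1}
                |     for _ in range(3):
                |         stats = get_stats(vocab)
                |         best = max(stats, key=stats.get)
                |         vocab = merge(best, vocab)
                |     print(vocab)
                |     # {'2350': 10, '23 9 9': 8, '2 4 1 7': 1}
                | 
                | So "2350" ends up as one token, "2399" as three, and
                | "2417" as four digits, which makes digit-by-digit
                | arithmetic much harder for the model.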
        
           | Davidzheng wrote:
           | I question the premise here. Can young children (say <5) who
           | don't know arithmetic at all learn the axioms from just a
           | bunch of examples? This isn't how children are taught
           | arithmetic; they are taught the rules/algorithm not just a
           | bunch of data-points. Do you have a source on the claim?
        
           | joe_the_user wrote:
           | The funny thing with GPT-3 is that it comes up with sentences
           | that seem to make correct deductions using arithmetic or
           | other submodels of reality but it will blithely generate
           | further sentences that contradict these apparent
           | understandings. It's impressive and incoherent all at the
           | same time.
        
         | cscurmudgeon wrote:
         | But you claim to understand right in this comment how humans
         | understand other things.
         | 
         | Isn't that self-contradictory?
        
           | p1esk wrote:
           | He's just not very good at constructing logical-sounding
           | (plausible, testable) stories about things :)
        
           | xdavidliu wrote:
           | no he did not claim so. His claim is just as much a lie as
           | all other claims. Lies can be valuable, even if they are not
           | strictly true.
        
         | soulofmischief wrote:
         | Mental models are not lies.
         | 
         | "The car is red" is not a lie just because I didn't phrase it
         | internally as "The car reflects photons of a frequency of
         | around 700nm".
         | 
         | We have to be able to simplify and internalize simplified
         | models in order to make any sense of anything. It's the same
         | reason your eyes only have a focal point in the dead center:
         | attention requires vast amounts of processing power.
         | 
         | To reiterate, a simplification is not a lie. Especially not a
         | flat-out lie.
        
           | Turing_Machine wrote:
           | Yes. Bridges are designed using old-school Newtonian
           | mechanics, without (in general) considering relativistic or
           | quantum effects at all, but they stand up nonetheless.
           | 
           | That wouldn't be the case if they were built based on "flat-
           | out lies".
           | 
           | Edit: the sticking point here is the use of the word "lie", I
           | think.
           | 
           | A lie is a falsehood told with intent to deceive.
           | 
           | Thus, novels are not "lies", despite being works of the
           | imagination. Everyone knows that the people in Moby-Dick or
           | Wuthering Heights were made up.
           | 
           | Neither is a simplified model a "lie" if it is close enough.
           | 
           | All engineers know that the steel and concrete components of
           | a bridge have quantum and relativistic effects going on
           | inside them, so ignoring those effects isn't a "lie" in any
           | meaningful sense.
           | 
           | It just doesn't matter for the purpose at hand.
        
         | bananaface wrote:
         | I get what you're saying, but I think this is a side-effect of
         | saving thinking in abstractions. We use abstractions to plug
         | holes to avoid having to drill down on every concept we're
         | exposed to.
         | 
         | That can look like a surface-level linguistic understanding,
         | but it's not, it's a surface-level abstraction. It's not
         | arbitrary, it has structure, and when you flesh it out you're
         | fleshing it out with actual abstract structure, _not_ just
         | painting over the gaps with arbitrary language.
        
       | skybrian wrote:
       | Darn, based on the title, I was hoping for an overview of recent
       | research.
       | 
       | Lots of people are having fun playing with GPT-3 or AI Dungeon,
       | myself included, but it seems like there is other interesting
       | research going on like the REALM paper [1], [2]. What should I be
       | reading? Why aren't people talking about REALM more? I'm no
       | expert, but it seems like keeping the knowledge base outside the
       | language model has a lot going for it?
       | 
       | [1] https://ai.googleblog.com/2020/08/realm-integrating-
       | retrieva... [2] https://arxiv.org/abs/2002.08909
        
       | plutonorm wrote:
       | I think the standard responses you see about gpt-3 -
       | 
        | Hur durr, can't have intelligence from statistics. Hur durr,
        | Chinese room. Hur durr, doesn't understand semantics, it's just
        | dumb. Etc etc etc, ad nauseam.
        | 
        | They go to show not how dumb GPT-3 is but how unthinking most
        | people are: just pulling from a bag of generally agreed-upon
        | notions, matching them up in simplistic ways and regurgitating.
        
       ___________________________________________________________________
       (page generated 2020-08-16 23:00 UTC)