[HN Gopher] Learning to think critically about machine learning
       ___________________________________________________________________
        
       Learning to think critically about machine learning
        
       Author : SleekEagle
       Score  : 99 points
       Date   : 2022-05-02 15:13 UTC (7 hours ago)
        
 (HTM) web link (news.mit.edu)
 (TXT) w3m dump (news.mit.edu)
        
       | xyzzy21 wrote:
        | It would also be nice to remove the "magical thinking" around
        | machine learning. It's mathematically related to all prior signal
        | processing techniques (mostly a proper superset), but it also has
        | fundamental limits that no one talks about seriously. ML et al.
        | are NOT MAGIC, but they are treated as if they were.
       | 
       | And that is in itself a dangerous moral and ethical lapse.
        
         | vincentmarle wrote:
          | When you have a complex system that produces nth-order effects,
          | the only approach is to treat it as an empirical phenomenon
          | (aka black-box magic), and that is what most research papers in
          | this field do.
        
           | throwawaygh wrote:
           | In the 80s and 90s it was really common to anthropomorphize
           | spaghetti code.
           | 
           | Just because something is difficult to analyze doesn't mean
           | it has limitless power.
        
         | ravi-delia wrote:
          | Maybe this is just my soft, theory-laden pure math brain
          | talking, but I'd be a lot less impressed with machine learning
          | if we had a decent formal understanding of these models. As
          | is, they're
         | way weirder than I think most engineering types give them
         | credit for. But then again, that's how I feel about a lot of
         | applied stuff, it all feels a little magic. I can read the
         | papers, I can mess around with it, but somehow it's still
         | surprising how well it can work.
        
           | SleekEagle wrote:
            | Ultimately it comes down to gradient descent (which is
            | pretty magical in its own right), but what's most surprising
            | to me is that the loss landscape is actually organized enough
            | to yield impressive results. Obviously the difficulties of
            | training large NNs are well documented, but I'm surprised
            | it's even that easy.
        
         | nonrandomstring wrote:
          | > It would also be nice to remove the "magical thinking"
          | around machine learning.
          | 
          | To be honest, it would be a morally and ethically less
          | dangerous world if we could get our feet back on the ground in
          | relation to digital technologies in general.
         | 
         | > fundamental limits that no one talks about seriously.
         | 
         | I am starting to touch and stumble into the invisible cultural
         | walls that I think make people "afraid" to talk about
         | limitations. I am not yet done analysing that, but suspect it
         | has something to do with the maxim that people are reluctant to
         | question things on which their salary depends. That seems to be
         | a difference between "scientists" and "hackers" in some way.
         | 
         | Going back to Hal Abelson's philosophy, "magic" _is_ a
         | legitimate mechanism in coding, because we _suppose_ that
          | something is possible, and by an inductive/deductive interplay
         | (abduction) we create the conditions for the magic to be true.
         | 
         | The danger comes when that "trick" (which is really one of
         | Faith) is mixed with ignorance and monomaniacal fervour, and so
         | inflated to a general philosophy about technology.
        
           | time_to_smile wrote:
           | > suspect it has something to do with the maxim that people
           | are reluctant to question things on which their salary
           | depends.
           | 
            | I once worked on a team that spent a lot of time building
            | models to optimize parts of the app for user behavior
            | (intentionally staying vague for anonymity reasons). Through
            | an easy experiment I ran, I ended up (accidentally)
            | demonstrating that the majority of the DS work was adding no
            | more than minimal improvements, and so little monetary value
            | that it did not justify any of the time spent on it.
           | 
            | I was let go not long after this, despite having helped lead
            | the team to record revenues by using a simple model (which
            | ultimately was what proved the futility of much of the work
            | the team did).
           | 
           | Just a word of caution as you
           | 
           | > start to touch and stumble into the invisible cultural
           | walls that I think make people "afraid" to talk about
           | limitations
        
             | hotpotamus wrote:
              | It's long been my suspicion that much of tech is just
              | throwing more and more effort into ever-diminishing
              | returns, and I think a lot of us at least feel that too.
              | But the pay is good and you don't have to dig ditches, so
              | what are you going to do?
        
             | nonrandomstring wrote:
              | Good story. I guess you were done with your work there.
             | Sometimes teams/places have a way of naturally helping us
             | move to the next stage.
             | 
             | Competences work at multiple levels, visible and invisible.
             | Being good at your job. Showing you're good at your job.
             | Believing in your job. Getting other people to believe in
             | your job. Getting other people to believe that you believe
             | in your job... and so on _ad absurdum_. Once one part of
             | that slips the whole game can unravel fast.
        
         | arcticfox wrote:
         | > ML et al. are NOT MAGIC but they are treated as if they were.
         | 
         | They're not magic - nothing is, but what _are_ they?
         | 
          | > but it also has fundamental limits that no one talks about
          | seriously
         | 
         | What are these fundamental limits? 20 years ago I imagine
         | skeptics in your camp would have set these "fundamental" limits
         | at lower than DALL-E 2, GPT-3, AlphaStar etc. Or are you
         | talking about limits today? In which case, sure, but I think
         | "fundamental" is the wrong word to use there given they change
         | continuously.
         | 
         | > It's mathematically related to all prior signal processing
         | techniques (mostly a proper superset)
         | 
         | And human brains are what if not signal processing machines?
        
           | amelius wrote:
           | > They're not magic - nothing is, but what are they?
           | 
           | Emergent magic.
        
         | woopwoop wrote:
         | Honestly at this point it kind of is magic. These things are
         | knocking out astonishing novel tasks every month, but the state
         | of our knowledge is "why does sgd even work lol". There is no
         | coherent theory.
        
           | srean wrote:
           | > "why does sgd even work lol"
           | 
            | I find this hand a little overplayed.
           | 
           | It depends on the degree of fidelity we demand of the answer
           | and how deep we want to go questioning the layers of answers.
           | However, if one is happy with a LOL CATS fidelity, which
           | suffices in many cases, we do have a good enough
           | understanding of SGD -- change the parameters slightly in the
           | direction that makes the system work a little bit better,
           | rinse and repeat.
           | 
            | No one would be astonished that using such a system leads to
            | better parameter settings than one's starting point, or at
            | least not significantly worse.
           | 
            | It's only when we ask more, and deeper, questions that we
            | get to "we do not understand why SGD works so astonishingly
            | well".
        
             | woopwoop wrote:
             | Yeah I didn't mean to imply "Why does SGD result in lower
             | training loss than the initial weights" is an open
             | question. But I don't think even lolcatz would call that a
             | sufficient explanation. After all if the only criterion is
             | "improves on initial training loss" you could just try
             | random weights and pick the best one. The non-convexity
             | makes sgd already pretty mysterious, and that is without
             | even getting into the generalization performance, which
             | seems to imply that somehow sgd is implicitly regularizing.
        
               | srean wrote:
                | I don't disagree, except perhaps with the lolcatz's
                | demand for rigour. _Improve with small and simple steps
                | till you can't_ is not a bad idea after all.
                | 
                | BTW, your randomized algorithm with a minor tweak is
                | surprisingly (unbelievably) effective -- randomize the
                | weights of the hidden layers, then do gradient descent
                | on just the final layer. Note the loss is even convex
                | in the last-layer weights if a matching/canonical
                | activation function is used. In fact you don't even
                | have to try different random choices, though of course
                | that would help. The "random kitchen sinks" line of
                | results is a more recent heir to this line of work.
                | 
                | I suspect that you already know this, and also that the
                | noise in SGD does indeed regularize (the way it does so
                | for convex functions has been well understood since the
                | 70s), so I am leaving this tidbit for others who are
                | new to this area.
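                | 
                | A sketch of that recipe for anyone new to it (numpy;
                | random ReLU features, with the last layer fit by a
                | closed-form ridge solve instead of gradient descent,
                | which is fine since that subproblem is convex; the
                | constants are arbitrary):
                | 
                |   import numpy as np
                | 
                |   rng = np.random.default_rng(1)
                |   X = rng.uniform(-3, 3, size=(500, 1))
                |   y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
                | 
                |   # hidden layer: random weights, never trained
                |   W = rng.normal(size=(1, 200))
                |   b = rng.uniform(-3, 3, size=200)
                |   H = np.maximum(0.0, X @ W + b)   # random features
                | 
                |   # "train" only the last layer (convex problem)
                |   A = H.T @ H + 1e-3 * np.eye(200)
                |   w_out = np.linalg.solve(A, H.T @ y)
                |   print(np.mean((H @ w_out - y) ** 2))  # small MSE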
        
             | Filligree wrote:
             | Why are there so few local minima, you mean?
             | 
             | I think it'd have to be related to the huge number of
             | dimensions it works on. But I have no idea how I'd even
             | begin to prove that.
        
               | srean wrote:
                | It's not even certain that they are few. What's rather
                | unsettling is that, with these local moves, SGD settles
                | on a good enough local minimum in spite of the fact
                | that we know many local minima exist that have zero or
                | near-zero training loss yet generalize poorly. There
                | are glimmers of insight here and there, but the thing
                | is yet to be fully understood.
        
           | SemanticStrengh wrote:
            | No, neural networks are stagnant on most key NLP tasks.
            | While there have been some advances on flashy tasks, the
            | tasks needed for NLU are firmly in a winter.
        
           | 300bps wrote:
           | _Honestly at this point it kind of is magic._
           | 
            | How much of that magic is smoke and mirrors? For example, the
            | First Tech Challenge (from FIRST Robotics) used TensorFlow
            | to train a model to detect the difference between a white
            | sphere and a golden cube using a mobile phone's on-board
            | camera.
            | 
            | The first time I saw it, it did seem pretty magical. Then in
            | testing I realized it was basically a glorified color sensor.
           | 
            | I think these things make for great and astonishing demos but
            | don't live up to their promise. Happy to hear real-world
            | examples that I can look into, though.
        
             | woopwoop wrote:
             | Even if it were practically useless (which it is not,
             | although the practical applications are less impressive
             | than the research achievements at this point), it would be
             | magical. Deep learning has dominated imagenet for a decade
             | now, for example. One reason this is magical is because the
             | sota models are extremely over parametrized. There exist
             | weights that perform perfectly on the training data but
             | give random answers on the test data [0]. But in practice
             | these degenerate weights are not found during sgd. What's
             | going on there? As far as I know there is no satisfying
             | explanation.
             | 
             | [0] https://arxiv.org/abs/1611.03530
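              | 
              | The randomization test from [0] is easy to reproduce at toy
              | scale (an sklearn sketch, nowhere near the ImageNet setting
              | of the paper): an over-parameterized net happily drives
              | training error to ~0 on pure-noise labels while of course
              | staying at chance on fresh data. The weights that memorize
              | are clearly reachable; the puzzle is why training on real
              | labels doesn't end up in ones like them.
              | 
              |   import numpy as np
              |   from sklearn.neural_network import MLPClassifier
              | 
              |   rng = np.random.default_rng(0)
              |   X = rng.normal(size=(500, 20))
              |   y = rng.integers(0, 2, size=500)  # labels = pure noise
              | 
              |   # heavily over-parameterized for 500 points
              |   clf = MLPClassifier(hidden_layer_sizes=(512, 512),
              |       alpha=0.0, tol=0.0, max_iter=4000)
              |   clf.fit(X, y)
              |   print("train:", clf.score(X, y))  # ~1.0, memorized
              | 
              |   X2 = rng.normal(size=(500, 20))
              |   y2 = rng.integers(0, 2, size=500)
              |   print("fresh:", clf.score(X2, y2))  # ~0.5, chance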
        
             | wgd wrote:
             | I mentored an FTC team that was using the vision system
             | this year, and my overall impression was that the
             | TensorFlow model was absolute garbage and probably
             | performed worse than a simple "identify blobs by color"
             | algorithm would have.
             | 
             | The vision model was tolerably decent at tracking
             | incremental updates to object positioning, but for some
             | reason would take 2+ seconds to notice that a valid object
             | was now in view (which is quite a lot, in the context of a
             | 30s autonomous period), and frequently identified the back
             | walls of the game field as giant cubes.
        
             | dekhn wrote:
             | there's a big difference between a glorified color sensor
             | and a well trained deep learning library (I can say this
             | with authority because I hired an intern at Google to help
             | build one of those detectors). It's still not magic, but a
             | well-trained network is robust and generalizable in a way
             | that a color sensor cannot be.
        
         | SemanticStrengh wrote:
          | NNs are just glorified logistic regression. People should
          | simply understand that neural networks cannot emulate a dumb
          | calculator accurately; this simple fact is enough to realize
          | that being a universal approximator is in practice a fallacy,
          | and that true Causal NLU or AGI is essentially out of reach of
          | neural networks, by design. Only a brain-faithful architecture
          | would have hope; however, C. elegans reverse engineering is
          | underfunded and spiking neural networks are untrainable.
        
           | mpfundstein wrote:
            | knock knock. some critic from the 70s arrived. how's gofai
            | going?
        
             | SemanticStrengh wrote:
              | Oh yes, it's not GOFAI that has won the ARC challenge, it's
              | neural networks, right? Right?
              | https://www.kaggle.com/c/abstraction-and-reasoning-challenge
             | 
             | I have more expertise in deep learning than anyone else
             | here and the delusions of the incoming transformer winter
             | will be painful to watch. In the meantime, enjoy your echo
             | chamber.
        
               | nuclearnice1 wrote:
               | > the delusions of the incoming transformer winter will
               | be painful to watch
               | 
               | Meaning?
        
               | SemanticStrengh wrote:
               | Meaning that HN in ten years will mock current HN
        
           | Der_Einzige wrote:
            | Using gradient-based techniques does a LOT to force neural
            | network weights onto surfaces that they do not at all
            | resemble when you optimize them with global, gradient-free
            | techniques instead.
            | 
            | Most of the stupid examples people give of degenerate cases
            | where deep learning doesn't work (CartPole in reinforcement
            | learning, sine or unbounded functions) are showcasing how
            | bad gradient-based training is - not how bad deep learning
            | is at solving these problems. I can solve CartPole with
            | neural networks within seconds using neuroevolution of the
            | weights....
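            | 
            | For concreteness, a stripped-down sketch of the kind of
            | thing that works (a toy (1+8) evolution strategy over a
            | linear policy; it assumes the gymnasium package, and unlike
            | real neuroevolution setups such as NEAT it only mutates
            | weights, not structure):
            | 
            |   import numpy as np
            |   import gymnasium as gym
            | 
            |   def ep_return(env, w):
            |       obs, _ = env.reset()
            |       total = 0.0
            |       while True:
            |           a = int(obs @ w > 0)    # no gradients anywhere
            |           obs, r, term, trunc, _ = env.step(a)
            |           total += r
            |           if term or trunc:
            |               return total
            | 
            |   def fitness(env, w):            # average a few episodes
            |       return np.mean([ep_return(env, w) for _ in range(3)])
            | 
            |   env = gym.make("CartPole-v1")
            |   rng = np.random.default_rng(0)
            |   best = rng.normal(size=4)
            |   best_f = fitness(env, best)
            |   for gen in range(200):
            |       for _ in range(8):          # mutate, keep the best
            |           cand = best + 0.5 * rng.normal(size=4)
            |           f = fitness(env, cand)
            |           if f > best_f:
            |               best, best_f = cand, f
            |       if best_f >= 475:           # CartPole-v1 threshold
            |           break
            |   print(gen, best_f)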
        
           | [deleted]
        
           | visarga wrote:
           | > NNs are just glorified logistic regression.
           | 
            | 2015 called, they want you back! Now seriously, "just" does
            | an amazing amount of work for you. How do you "just" make
            | logistic regression write articles on politics, convert
            | queries into SQL statements, or draw a daikon radish in a
            | tutu?
           | 
           | Humans are "just" chemistry and electricity, and the whole
           | universe just a few types of forces and particles. But that
           | doesn't explain our complexity at all.
        
             | SemanticStrengh wrote:
              | Neural networks do achieve impressive things, but they also
              | fail at essential things that preclude them from any AGI or
              | Causal NLU ambition, such as approximating a dumb
              | calculator without significant accuracy loss.
        
               | Filligree wrote:
               | _I_ can't approximate a dumb calculator without
               | significant accuracy loss. Not without emulating symbolic
               | computation, which current AI is perfectly capable of
               | doing if you ask it the right way.
               | 
               | Whatever makes you think it's necessary for AGI, when we
               | don't have it?
        
               | SemanticStrengh wrote:
                | NNs fail at anything algorithmic, like pathfinding,
                | sorting, etc. The point is not that you have it, it's
                | that you can acquire it by learning and by using pen
                | and paper. Natural language understanding requires both
                | neural-network-like pattern recognition abilities and
                | advanced algorithmic computation. Since neural networks
                | are pathetically bad at algorithms, we need
                | neuro-symbolic software. However, the symbolic part is
                | rigid and program synthesis is exponential. Therefore
                | the brain is the only technology on earth able to
                | dynamically code algorithmic solutions. Neural networks
                | have only solved a subset of the class of automatable
                | programs.
        
               | visarga wrote:
               | There are about 3,610 results for "neural network
               | pathfinding" in Google Scholar since 2021. Try a search.
        
               | SemanticStrengh wrote:
                | And as you can trivially see, it outputs nonsense
                | values: https://www.lovebirb.com/Projects/ANN-
                | Pathfinder?pgid=kqe249... (see last slide). At least in
                | this implementation.
                | 
                | Even if it had 80% accuracy (optimistic), it would
                | still be too mediocre to be used at any serious scale.
        
               | visarga wrote:
               | It's a model mismatch, not an inherent impossibility. A
               | calculator needs to have an adaptive number of
               | intermediate steps. Usually our models have fixed depth,
               | but in auto-regressive modelling the tape can become
                | longer as needed by the stepwise algorithm. Recent work
                | shows LMs can do arithmetic, symbolic math and
                | common-sense reasoning with step-by-step chains of
                | thought, and reach much higher accuracies.
               | 
               | In other words, we too can't do three digit
               | multiplication in our heads reliably, but can do it much
               | better on paper, step by step. The problem you were
               | mentioning is caused by the bad approach - LMs need
               | intermediate reasoning steps to get from problem to
               | solution, like us. We just need to ask them to produce
               | the whole reasoning chain.
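                | 
                | A toy illustration of the pattern (just prompt strings,
                | no model call; the worked example is what elicits the
                | intermediate steps from the model):
                | 
                |   q1 = ("Q: Ann has 23 apples, gives 9 away,"
                |         " then buys 14 more. How many now?")
                |   q2 = ("Q: A library has 120 books, lends 45,"
                |         " gets 30 donated. How many now?")
                | 
                |   direct = q1 + "\nA:"
                | 
                |   cot = (q1 + "\nA: Start with 23."
                |          " 23 - 9 = 14. 14 + 14 = 28."
                |          " The answer is 28.\n"
                |          + q2 + "\nA:")  # model adds its own steps
                |   print(cot)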
               | 
               | - Chain of Thought Prompting Elicits Reasoning in Large
               | Language Models https://arxiv.org/abs/2201.11903
               | 
               | - Deep Learning for Symbolic Mathematics
               | https://arxiv.org/abs/1912.01412
        
           | drdeca wrote:
           | Do you mean that a network _trained_ to imitate a calculator
           | won't do so accurately or that there is no combination of
           | weights which would produce the behaviors of a calculator?
           | 
            | Because, with ReLU activations, I'm fairly confident that the
            | latter, at least, is possible.
           | 
           | (Where inputs are given using digits (where each digit could
           | be represented with one floating point input), and the output
           | is also represented with digits)
           | 
           | Like, you can implement a lookup table with neural net
           | architecture. That's not an issue.
           | 
           | And composing a lookup table with itself a number of times
           | lets one do addition, etc.
           | 
           | ... ok, I suppose for multiplication you would have to like,
           | use more working space than what would effectively be a
           | convolution, and one might complain that this extra structure
           | of the network is "what is really doing the work", but, I
           | don't think it is more complicated than the existing NN
           | architectures?
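            | 
            | To make the "exists by construction" half concrete, here is a
            | small sketch of exact binary gates built from nothing but
            | ReLU units, composed into a ripple-carry adder (plain Python
            | with numpy standing in for the activations; the weights are
            | written by hand rather than learned, which is exactly the
            | distinction above):
            | 
            |   import numpy as np
            | 
            |   relu = lambda z: np.maximum(0.0, z)
            | 
            |   # exact on inputs in {0, 1}
            |   def XOR(a, b): return relu(a + b) - 2 * relu(a + b - 1)
            |   def AND(a, b): return relu(a + b - 1)
            |   def OR(a, b):  return relu(a + b) - relu(a + b - 1)
            | 
            |   def add_bits(xb, yb):           # ripple-carry, LSB first
            |       carry, out = 0.0, []
            |       for a, b in zip(xb, yb):
            |           out.append(XOR(XOR(a, b), carry))
            |           carry = OR(AND(a, b), AND(carry, XOR(a, b)))
            |       return out + [carry]
            | 
            |   def to_bits(n, k=8):
            |       return [(n >> i) & 1 for i in range(k)]
            | 
            |   def from_bits(bits):
            |       return sum(int(round(b)) << i
            |                  for i, b in enumerate(bits))
            | 
            |   print(from_bits(add_bits(to_bits(99), to_bits(58))))  # 157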
        
             | SemanticStrengh wrote:
              | I am talking about training a neural network to perform
              | calculations. And yes, look-up tables might be fit for
              | addition but not for multiplication. The accuracy would be
              | <90%, which is a joke for any serious use.
        
         | wolverine876 wrote:
         | People will (and I'm sure do) use this magical thinking
         | politically, persuading people to trust the computer and
         | therefore, unwittingly, trust the persons who control the
         | computer. That, to me, is the greatest threat - it is an
         | obvious way to grab power, and most people I know don't even
         | question it. It's a major consequence of mass public
         | surveillance.
        
           | heavyset_go wrote:
            | Bureaucracies would love a black box to delegate all of
            | their decisions and responsibilities to, in an effort to
            | shift liability away from themselves.
           | 
           | You can't be liable for anything, you were just doing what
           | the computer told you to do, and computers aren't fallible
           | like people are.
        
         | godelski wrote:
          | I think this is a common problem, and it comes from how much
          | we stressed that these models are not interpretable. It is
          | kinda like talking about Schrodinger's cat: after a game of
          | telephone, people think the cat is both alive and dead, rather
          | than that our models can't predict definite outcomes, only
          | probabilities. Similarly with ML, people do not understand
          | that "not interpretable" doesn't mean we can't know anything
          | about the model's decision making, but that we can't know
          | everything the model is choosing to do. Worse though, I think
          | a lot of ML folks themselves don't know a lot of stats and
          | signal processing. These just aren't things that are taught in
          | undergrad, and frequently not in grad school either.
        
           | mirntyfirty wrote:
            | Along with that, it becomes remarkably more difficult to
            | distinguish causation from correlation, although I'm sure
            | that point is heavily debated.
        
             | godelski wrote:
             | > difficult to distinguish causation vs correlation
             | 
             | I mean this is an extremely difficult thing to disentangle
             | in the first place. It is very common for people in one
             | breath to recite that correlation does not equate to
             | causation and then in the next breath propose causation.
             | Cliches are cliches because people keep making the error.
             | People really need to understand that developing causal
             | graphs is really difficult, and that there's almost always
             | more than one causal factor (a big sticking point for
             | politics and the politicization of science, to me, is that
             | people think there are one and only one causal factor).
             | 
             | Developing causal models is fucking hard. But there is work
             | in that area in ML. It just isn't as "sexy" because they
             | aren't as good. The barrier to entry is A LOT higher than
             | other type of learning, so this prevents a lot of people
             | from pursuing this area. But still, it is an necessary
             | condition if we're ever going to develop AGI. It's probably
             | better to judge how close we are to AGI with causal
             | learning than it is for something like Dall-E. But most
             | people aren't aware of this because they aren't in the
             | weeds.
             | 
             | I should also mention that causal learning doesn't
             | necessitate that we can understand the causal relationships
             | within our model, just the data. So our model wouldn't be
             | interpretable although it could interpret the data and form
             | causal DAGs.
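              | 
              | The classic confounder case is worth simulating once, just
              | to see it (a tiny sketch of the simplest causal DAG,
              | Z -> X and Z -> Y, with made-up coefficients):
              | 
              |   import numpy as np
              | 
              |   rng = np.random.default_rng(0)
              |   n = 100_000
              |   z = rng.normal(size=n)            # hidden common cause
              |   x = z + 0.5 * rng.normal(size=n)  # Z -> X
              |   y = z + 0.5 * rng.normal(size=n)  # Z -> Y, not X -> Y
              | 
              |   print(np.corrcoef(x, y)[0, 1])    # ~0.8, correlated
              | 
              |   # intervene: set X ourselves, cutting the Z -> X edge
              |   x_do = rng.normal(size=n)
              |   y_do = z + 0.5 * rng.normal(size=n)
              |   print(np.corrcoef(x_do, y_do)[0, 1])  # ~0, no effect
              | 
              | Correlation without causation, and you can only tell the
              | difference by knowing (or intervening on) the graph.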
        
         | bell-cot wrote:
         | In my wishful thinking, by far the best way to do that would be
         | for the courts to stick companies with full legal liability for
          | the shortcomings of their "machine learning" systems. And if
          | it can be fairly easily demonstrated that GiantCo's ML
          | decision-making system is sexist, racist, ageist... then
          | GiantCo is not just guilty, but also presumed to have known in
          | advance that
         | they were systematically and deliberately on the wrong side of
         | the law.
        
         | Jenk wrote:
         | Supplant "magic" with "not understood"
         | 
         | Suddenly it all becomes a lot more palatable that many don't
         | know how it works.
        
       | photochemsyn wrote:
       | ML is really cool technology with incredible applications and
       | potential. For example, biological genomes can now be sequenced
       | relatively easily but finding the actual protein-coding sequences
       | hidden in these massive genome sequences can be difficult. ML
       | provides some novel approaches to this problem, and can
       | potentially markup these genomes and even classify the probable
       | structure/function of the resulting proteins. Amazing stuff,
       | really.
       | 
       | However, anyone who thinks this tech couldn't go off the rails in
       | the hands of nefarious actors should go read Ed Black's "IBM and
       | the Holocaust" or Josef Teboho Ansorge's "Identify and Sort".
        
       | noasaservice wrote:
       | Perhaps this is a hot take, but when I see ML being used (for the
       | most part), it's because people do not fundamentally understand
       | OR know how to solve the problem in question.
       | 
       | And instead of understanding what's going on, and solving the
       | problem efficiently, instead it's "throw GB's or TB's of data at
       | an algo that we tweak and hope for the best".
       | 
        | Sure, it gets results, at the cost of massive processing time and
        | data. And sure, it's "fuzzy" and can fail on really stupid stuff
        | and provide bad answers confidently. But it'll get the next round
        | of VC funding, won't it?
        
         | axg11 wrote:
         | Let's look at a few of the most impressive applications of
         | machine learning:
         | 
         | - Classification in computer vision
         | 
         | - Protein folding (AlphaFold)
         | 
         | - Image generation (Dall-E 2)
         | 
         | - Answering general language queries (GPT-3)
         | 
         | It's unclear how any of these applications could have been
         | tackled _without_ machine learning. We had no solid grasp on
         | these problems before ML, with the exception of protein
         | folding. I think you are being cynical. Of course not every ML
         | project will result in a home run success. Should that make us
         | sceptical of the entire field?
        
           | Jensson wrote:
           | But most people using neural nets in the industry aren't
            | working on those things. They just slap it on any data
            | inference task, even though in most cases traditional
            | statistical/data science models work much better.
        
         | PartiallyTyped wrote:
         | How would you solve problems like Q/A? How would you solve
         | problems in RL scenarios? How would you solve image recognition
         | problems?
         | 
          | In the end, ML is not that different from what we do with all
          | models, i.e. use a set of data to create a crude model of the
          | process that we are trying to learn/figure out.
        
       | jimbokun wrote:
       | I'm not sure engineering types will be satisfied with these kinds
       | of conclusions:
       | 
       | > "It is not someone else's job to figure out the why or what
       | happens when things go wrong. It is all of our responsibility and
       | we can all be equipped to do it. Let's get used to that. Let's
       | build up that muscle of being able to pause and ask those tough
       | questions, even if we can't identify a single answer at the end
       | of a problem set," Kaiser says.
       | 
       | Hopefully the actual course content is more concrete than this.
       | But this kind of language strikes me as encouraging people to
       | feel the "correct" way about a problem, but not really
       | emphasizing coming up with concrete, actionable solutions.
       | 
       | And without actionable solutions, I feel like the value of this
       | content would be very limited.
        
       | SemanticStrengh wrote:
       | I would prefer machine learning to learn critical thinking
        
       | oxff wrote:
        | It is just super-massive graph optimization; don't get confused
        | about it like the OpenAI guys, or whoever thinks matrix
        | multiplication is achieving consciousness.
        
         | SleekEagle wrote:
          | Why are the two necessarily unrelated? Can human beings just be
          | considered to be learning via optimization, and perhaps
          | consciousness is an emergent property of an agent with a large
          | enough world model, or a world model that includes the agent
          | itself?
         | 
         | While I don't think a majority really thinks current systems
         | are conscious, SOTA results are absolutely astounding (check
         | out DALL-E 2 if you haven't seen it already). Whether or not an
          | agent is conscious doesn't really matter from a practical
          | standpoint in the long run (though it obviously does from a
          | moral one) - it is intelligence that matters with these agents,
          | and they're getting absurdly more intelligent by the
          | half-decade.
        
           | SemanticStrengh wrote:
            | We are in an AGI winter in NLU. Cool OpenAI demos are cool
            | and irrelevant. The HN crowd should really learn to go look
            | at the leaderboards for themselves on paperswithcode.com;
            | then they would realize the reality that we are stagnant on
            | the key basic tasks (e.g. coreference resolution). While
            | GPT-3 is just a subtle bullshit generator that pushes to its
            | paro(t)xysm the illusion of understanding that mere
            | amalgamation of collocation statistics provides, DALL-E 2 on
            | the other hand is very impressive, but it does not advance
            | the key NLU tasks and just shows how far a smart trick
            | (contrastive learning) can go before plateauing.
            | 
            | The idea that consciousness emerges in proportion to the
            | accuracy of your isomorphic mental world representation is
            | cute; however, we don't become more conscious by becoming
            | more erudite, and the most intense magical qualia, such as
            | orgasms, are accessible to the simplest mammals and are
            | unrelated to activity in the higher cognitive regions of the
            | brain. Even a newborn that has no understanding of its
            | surroundings experiences qualia.
        
             | johnsimer wrote:
              | You can make the argument that apparently stagnant
              | progress isn't actually a lack of progress, when it comes
              | to AI. Kilcher and Karpathy recently had a video where
              | they discussed how some new model (PaLM or DALL-E 2, I
              | forget which) showed zero progress during X thousand
              | training cycles, and then suddenly rapid progress after
              | those training cycles. It was as if the model was
              | spending thousands of training cycles grokking the
              | concept, and then finally grokked it. It could simply be
              | that as we continue to increase the number of parameters
              | and the data quality of these models, we will continue to
              | see progress on the route to AGI as a whole, but only in
              | step changes that require many training cycles.
        
               | iMage wrote:
               | Out of curiosity, what was the video?
        
               | SemanticStrengh wrote:
                | How many more parameters do you need? PaLM has 530
                | billion and underperforms on NLP tasks versus XLNet
                | (300 million); as such, very large language models are
                | extreme failures. They do not improve the state of the
                | art once you have proper datasets and do full-shot
                | learning, and I'm not even talking about fine-tuning.
                | 
                | Very large language models hide from the layman that
                | they are the most gigantic failure in NLP ever by
                | showing that they improve the state of the art in zero-
                | or few-shot learning. Who cares, this is so cringe.
                | Full-size learning is what matters the most, and even
                | full-size learning does not yield satisfying accuracy
                | on most NLP tasks (but close enough). Therefore the
                | only use of PaLM is to have mediocre (70-80%) accuracy,
                | which is better than the previous SOTA, only for tasks
                | that have no good-quality existing datasets. And 530
                | billion is close to the max we can realistically
                | achieve; it already costs ~10 million in hardware and
                | underperforms a 300-million-parameter model in
                | full-size learning (e.g. dependency parsing, word sense
                | disambiguation, coreference resolution, NER, etc.).
                | 
                | It's crazy that people don't realize this gigantic
                | failure, but as always it's because they don't care
                | enough.
        
           | oneoff786 wrote:
            | Feelings aren't an emergent property of intelligence, with
            | intelligence being defined as some nth derivative of
            | optimization capability. So I think no.
           | 
           | Human consciousness isn't just a brain. It's a system of
           | which the brain is a part, occurring through time.
        
       | jasfi wrote:
       | For those interested in NLU/AGI, see LxAGI: https://lxagi.com. No
       | demo available just yet.
        
       | lacker wrote:
       | It's interesting to look at the actual course content here.
       | 
       | https://ocw.mit.edu/courses/res-tll-008-social-and-ethical-r...
       | 
       |  _More generally, what does it mean for a model to be "fair"?_
       | 
       |  _LIT Company's Definition of Fairness (Group Unaware): The
       | company believes that a fair process and, therefore, a fair
       | model, would not account for gender or race at all._
       | 
        |  _Advocacy Group's Definition (Demographic parity): An advocacy
       | group believes that a model is fair if the distribution of
       | outcomes for each demographic, gender, or other subgroup is the
       | same among those that applied and those that were accepted. For
       | example, in the example above, 30% of the applicants for loan
       | applications come from women. In the demographic parity
       | definition of fairness, this means 30% of the approved loan
       | applications should come from women._
       | 
       | I feel like the course content is somewhat slanted here. It is
       | missing the definition of "fairness" in which you treat race and
       | gender just like any other feature. Many systems work this way in
       | practice - for example car insurance charges you differently by
       | gender, because the statistics for genders are different. Ad-
        | matching by gender and race is a longstanding practice. And any
        | new system that you just train from scratch will, by default,
        | not know to treat gender or race differently from anything else.
       | 
       | It is an interesting question, though. The main problems, I
       | think, are practical ones - large enough AI models cannot be
       | "race-blind" because if you remove race as a feature, they will
        | be able to infer it anyway from proxy features. Whereas the only
        | real way to enforce that a system achieves the same percentage
        | results for different groups is to add a "quota system" where you
        | explicitly use different thresholds for different groups. So the
        | practical alternatives often become "quota" or "nothing".
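        | 
        | The two definitions are easy to state as checks on a model's
        | outputs. A toy sketch with made-up numbers (a single "group
        | unaware" score threshold, plus a demographic-parity audit of the
        | result):
        | 
        |   import numpy as np
        | 
        |   rng = np.random.default_rng(0)
        |   n = 10_000
        |   group = rng.choice(["women", "men"], size=n, p=[0.3, 0.7])
        |   # proxy features leak group information into the score
        |   score = rng.normal(size=n) + 0.2 * (group == "men")
        |   approved = score > 0.5   # group unaware: one threshold
        | 
        |   for g in ["women", "men"]:
        |       applied = np.mean(group == g)
        |       of_approved = np.mean(group[approved] == g)
        |       print(g, round(applied, 2), round(of_approved, 2))
        | 
        |   # demographic parity asks of_approved == applied for each
        |   # group; a group-unaware threshold generally won't give it
        | 
        | Which is why, in practice, you end up choosing between something
        | like a quota and nothing.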
        
         | pfortuny wrote:
         | The second definition assumes that the law of large numbers
         | applies to any instance. It is impossible to satisfy each and
          | every time. And it may also be blind to inherent inequalities
         | (as insurance companies know).
        
         | andersource wrote:
         | Overall agree, although regarding
         | 
         | > large enough AI models cannot be "race-blind" because if you
         | remove race as a feature, they will be able to infer it anyways
         | from proxy features
         | 
         | In theory using a gradient reversal layer and an adversarial
         | classifier you could do just that, to an extent. It could hurt
          | the model's performance, which is exactly your point (should we
          | ignore features with signal because they can be used to
          | discriminate?).
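          | 
          | For reference, the gradient reversal trick is only a few lines
          | in PyTorch (a sketch of the usual DANN-style pattern; layer
          | sizes and the protected-attribute head are made up here):
          | 
          |   import torch
          |   from torch import nn
          | 
          |   class GradReverse(torch.autograd.Function):
          |       @staticmethod
          |       def forward(ctx, x, lambd):
          |           ctx.lambd = lambd
          |           return x.view_as(x)    # identity going forward
          | 
          |       @staticmethod
          |       def backward(ctx, grad_out):
          |           # flip the sign (and scale) going backward
          |           return -ctx.lambd * grad_out, None
          | 
          |   encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
          |   task_head = nn.Linear(32, 1)   # what we actually want
          |   adv_head = nn.Linear(32, 2)    # predicts protected attr
          | 
          |   h = encoder(torch.randn(8, 16))
          |   task_out = task_head(h)
          |   adv_out = adv_head(GradReverse.apply(h, 1.0))
          |   # minimizing task loss + adversary loss now pushes the
          |   # encoder to strip whatever the adversary could use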
        
         | wolverine876 wrote:
         | > It is missing the definition of "fairness" in which you treat
         | race and gender just like any other feature.
         | 
         | The real issue here, of course, is whether they are just like
         | every other feature:
         | 
         | Certainly in our society they are not perceived that way.
         | People perceive very serious issues and have very strong
         | feelings around race and gender. We see that right here on HN,
         | of course.
         | 
         | There is also, of course, a lot of discrimination by humans
         | based on race and gender. If we want an unbiased, fair (and
         | accurate) system, we have to correct for that. And the
         | discrimination creates higher order effects: If there is
         | discrimination against group X in K-12 education funding, then
         | fewer of X will go to college, and fewer will have higher-
         | paying jobs. If we then select blindly for income, we
         | incorporate that bias (which might be appropriate if studying
         | income by group, but not if we use it as a proxy for
         | intelligence or effort).
         | 
         | > the practical alternatives often become "quota" or "nothing".
         | 
          | Those aren't practical alternatives, they are logical extremes
          | creating a Manichean choice. Those are alternatives for a
          | political debate, not for practical problem-solving.
        
         | RangerScience wrote:
          | I'm pretty fascinated by all of this, although I have barely
          | dipped my toes in. IMHO -
         | 
         | "Fairness" is a technical term-of-art meaning "the outcome
         | _should not_ be effected by inputs X, Y and Z", and the
         | collected science around making a system behave that way. It
         | closely but not quite matches the colloquial meaning, ie
         | "that's not fair!" - kinda like how "a fair coin" has a
         | specific meaning that mostly tracks with how people use the
         | word, but not quite, and with _a lot_ more specificity.
         | 
          | It's typically applied when you want to correct for real-world
          | "unfair biases" in the training data, which in practical
          | application typically means race, gender and the other legally
          | protected categories - but AFAIK it is just whichever inputs
          | you decide you want to not have an impact on the outcome.
         | 
         | AFAIK what you get out of the AI/ML "fairness" science is a way
         | to measure, and correct for, dependency on the inputs that you
         | (exterior to the system) have decided that you want to _not_
         | impact the outcome.
        
       ___________________________________________________________________
       (page generated 2022-05-02 23:01 UTC)