hngopher.com

       [HN Gopher] New study disavows marshmallow test's predictive powers
       ___________________________________________________________________
        
       New study disavows marshmallow test's predictive powers
        
       Author : npalli
       Score  : 73 points
       Date   : 2022-02-21 20:36 UTC (2 hours ago)
        
 (HTM) web link (anderson-review.ucla.edu)
 (TXT) w3m dump (anderson-review.ucla.edu)
        
       | karaterobot wrote:
       | A good test for whether a psychological or sociological study may
       | turn out to be hard to replicate is: does it make a sweeping
       | claim about something as complex as human beings? If it's not
       | tentative, incremental, wrapped in caveats and conditionals, I
       | don't put much weight in it anymore.
        
         | goatlover wrote:
         | Pretty much this.
        
       | dang wrote:
       | All: if you're going to post here, can you please make sure you're
       | not posting a shallow dismissal? Those are the quickest and
       | easiest reactions to post, but they're repetitive and boring.
       | This site is supposed to be for _interesting_ conversation, and
       | that requires new information--not things we've all heard
       | before.
       | 
       | Hint: if you're making a strong, large statement--e.g. an
       | emphatic claim about an entire category of things--then it's most
       | likely a shallow comment.
       | 
       | https://news.ycombinator.com/newsguidelines.html
        
       | awb wrote:
       | Here's a 2011 meta analysis that reports that DRD (delayed reward
       | discounting -- basically putting lower importance on delayed
       | gratification and instead putting greater importance on immediate
       | rewards) is highly associated with addictive personalities:
       | 
       | https://addictions.psych.ucla.edu/wp-content/uploads/sites/1...
       | 
       | > _Conclusions_ These results provide strong evidence of greater
       | DRD in individuals exhibiting addictive behavior in general and
       | particularly in individuals who meet criteria for an addictive
       | disorder.
        
       | erichocean wrote:
       | The article, and especially the headline, are extremely
       | misleading.
       | 
       | The actual result: measures of self-control either weakly or
       | strongly predict positive life outcomes, depending on the measure
       | and how much adjusting was done, e.g.
       | 
       | > _[The study] created a new measure of the time each original
       | preschooler waited before taking a bite (or getting the reward)
       | to adjust for variables such as age, gender and experiment
       | conditions._
       | 
       | This study found that the "marshmallow test"--as a single measure
       | --is no more or less predictive than a basket of other measures
       | of self-control the study tested, or any of those other measures
       | of self-control taken alone.
       | 
       | Despite the misleading article and headline, the study itself
       | seems well-designed (e.g. pre-registered), but the conclusion in
       | the headline is utterly wrong as that is not what the study
       | found: self-control matters, can be measured, and those measures
       | weakly or strongly predict positive life outcomes.
       | 
       | Here's an accurate headline: Self-control still predicts positive
       | life outcomes, Marshmallow Test creator finds.
        
         | antonfire wrote:
         | > This study found that the "marshmallow test"--as a single
         | measure--is no more or less predictive than a basket of other
         | measures of self-control the study tested, or any of those
         | other measures of self-control taken alone.
         | 
         | Are we looking at the same study? I don't see where "no more or
         | less predictive than a basket" comes from, specifically where
         | "no less predictive" comes from.
         | 
         | My reading of the abstract is that the study found that a
         | measure based on the "marshmallow test" ("preschool delay of
         | gratification", RND in the article body), is not predictive of
         | the outcomes they measured (11 capital formation outcomes).
         | 
         | It also found that a basket of measures of self-control
         | (collected at various ages, RNSRI/RNCCQ in the article body)
         | _is_ predictive of the outcomes, whether you include the
         | preschool measurement or not.
         | 
         | So from skimming the study without even reading the article, it
         | sounds to me like they found that the preschool measure doesn't
         | predict the outcomes they're measuring by itself, and it
         | doesn't contribute predictive power when it's used as part of
         | an index of self-control measured at a variety of ages.
        
         | oraphalous wrote:
         | Headline seems accurate to me. The headline:
         | 
         | New Study Disavows Marshmallow Test's Predictive Powers.
         | 
         | And it does. The marshmallow test as a single measure is
         | referred to as RND. Which is a test which measures
         | gratification wait times and is applied in pre-school. Their
         | hypothesis regarding RND:
         | 
         | hyp2: On its own, RND (measured around age four) will have only
         | a very small correlation with the measures of mid-life capital
         | formation.
         | 
         | And they report a confirmation of this hypothesis.
         | 
         | The other hypothesis refers to RNSRI rank-normalized self-
         | regulatory index - which is 4 different components measured at
         | different ages - each component is RND + 86 other measures!
         | This is reported as having a "modest" impact on outcomes - not
         | "strong" as you say. So your reporting seems the more
         | inaccurate to me.
         | 
         | But this is irrelevant anyway with respect to your claim about
         | the headline, which is only referring to the paper's disavowal
         | of RND.
         | 
         | Further evidence of the headline's accuracy and the paper's
         | disavowal of RND is that they also looked at RNCCQ - which is
         | RNSRI minus the inclusion of the RND test from each of the four
         | components. They found that including RND did not improve RNSRI
         | over RNCCQ in terms of their predictive power.
        
         | KerrAvon wrote:
         | The original headline isn't inaccurate, though it is clumsily
         | worded and neither it nor your proposed headline fully describe
         | the results of the study. The study itself says the following
         | (quoting verbatim). Note that the second point is more or less
         | what the headline says.
         | 
         | - Self-regulation composite (preschool & ages 17-37) predicts
         | capital formation at 46.
         | 
         | - Preschool delay of gratification alone does not predict
         | capital formation at 46.
         | 
         | - The composite is more predictive partly because it consists
         | of many items.
         | 
         | - No evidence of more predictive power for self-regulation
         | reported later in life.
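
The third point above, that a composite predicts better partly because it
consists of many items, can be illustrated with a small simulation. This is
a minimal sketch under assumed numbers, not the study's data or method: the
noise levels, sample size, item count, and the helper names `pearson` and
`simulate` are all hypothetical. Each item is modeled as the same latent
trait plus heavy independent noise, so averaging many items cancels noise
that a single item cannot.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_people=2000, n_items=20, seed=0):
    """Return (r_single_item, r_composite) against a simulated outcome.

    All parameters are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    # Latent self-regulation trait, one value per person.
    trait = [rng.gauss(0, 1) for _ in range(n_people)]
    # Outcome depends on the trait plus unrelated noise.
    outcome = [0.5 * t + rng.gauss(0, 1) for t in trait]
    # Every item measures the trait with a lot of independent noise.
    items = [[t + rng.gauss(0, 3) for _ in range(n_items)] for t in trait]
    single = [row[0] for row in items]
    composite = [sum(row) / n_items for row in items]
    return pearson(single, outcome), pearson(composite, outcome)


if __name__ == "__main__":
    r_single, r_composite = simulate()
    print(f"single item r = {r_single:.3f}, composite r = {r_composite:.3f}")
```

In this toy setup the composite correlates more strongly with the outcome
than any one item does, even though every item measures the same thing
equally badly. It says nothing about whether the real preschool measure
behaves like one of these items.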
        
         | feanaro wrote:
         | Yes, but how would you then inanely riff on psychology, which
         | all the cool kids are doing nowadays?
        
           | [deleted]
        
         | nefitty wrote:
         | Thank you for the clarification. You've thought about this a
         | lot. If a friend asked you for advice on how to help their
         | struggling teenage son improve his self-control, what do you
         | think you would say?
        
       | renewiltord wrote:
       | Oh interesting. Self-regulation is correlated with good outcomes
       | but the marshmallow test is a poor test of self regulation. Okay,
       | interesting.
       | 
       | What I would enjoy, I think, is taking a monthly test battery and
       | uploading that to a central database with other self-researchers
       | and then looking at that in a historic sense to derive ideas to
       | study. Obviously, since one is post-hoc slicing one will find
       | many spurious correlations but perhaps these correlations will
       | yield interesting areas to search around. Does anyone know of
       | anything like this?
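
The caveat about post-hoc slicing can be made concrete with a short
simulation. This is a rough sketch with made-up parameters: 200 measures
that are pure noise are compared against an unrelated outcome for 30 people,
and the code counts how many clear a cutoff of |r| > 0.361, which is roughly
the two-sided 5% critical value for a sample of 30. The function names and
all numbers are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def count_spurious(n_people=30, n_measures=200, threshold=0.361, seed=0):
    """Count noise-only measures whose |r| with the outcome exceeds threshold."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    hits = 0
    for _ in range(n_measures):
        # Each measure is independent noise, unrelated to the outcome.
        measure = [rng.gauss(0, 1) for _ in range(n_people)]
        if abs(pearson(measure, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print("spurious 'significant' correlations:", count_spurious())
```

With a 5% cutoff and 200 unrelated measures, around ten such hits would be
expected by chance alone, which is why correlations found this way are
better treated as leads to test on fresh data than as findings.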
        
       | acchow wrote:
       | From the Journal Article: "They included a total of 550 students
       | from Stanford's Bing Nursery School, aged about 4 years old
       | (ranging from 2 to 6). Many of the participants are children of
       | Stanford faculty and staff."
       | 
       | https://www.sciencedirect.com/science/article/pii/S016726811...
       | 
       | How can we conclude anything at all about the general population
       | using a sample of Stanford kids?
        
         | gkop wrote:
         | This isn't my field, but I think this is par for the course
         | unfortunately and a manifestation of a larger issue. Eg
         | https://journals.plos.org/plosone/article/file?id=10.1371/jo...
        
         | dahart wrote:
         | I like to ask this whenever Dunning Kruger comes up; DK was a
         | sample of Cornell undergrads, and the study was one tenth the
         | size of the Stanford study. DK participants were volunteers of
         | a psych class who got extra credit. Presumably they needed and
         | could use extra credit, which may have excluded the A and the F
         | students. It's hard to imagine ways to start with more bias, or
         | how we can possibly accept this sample as representative of
         | humanity.
        
         | ren_engineer wrote:
         | Social sciences are all a sham, now think about the trillions
         | of dollars of government spending that are based on that same
         | sham science as justification. People wonder why so many
         | government programs fail, it's because they are built on a
         | rotten foundation
         | 
         | https://en.wikipedia.org/wiki/Replication_crisis
         | 
         | >How can we conclude anything at all about the general
         | population using a sample of Stanford kids?
         | 
         | it's a well known problem that is rarely brought up, WEIRD
         | bias. Most social science research participants are college
         | students being bribed with extra credit or gift cards
         | 
         | https://en.wikipedia.org/wiki/Psychology#WEIRD_bias
        
           | brimble wrote:
           | Good social science is possible, but is difficult and
           | (sometimes) expensive. If you can get the same _personal_
           | outcome by doing something cheap and easy instead, of course
           | that's what most people are going to do. Fixing that seems
           | to be the Big Problem for most of science, for at least the
           | last few decades (though, yes, particularly social science).
           | 
           | > People wonder why so many government programs fail
           | 
           | This, though, I'm not so sure about. Do government programs
           | fail at a rate greater than those undertaken by other large
           | organizations, like corporations or non-profits?
        
       | scotuswroteus wrote:
       | David Brooks in shambles
        
       | learn_more wrote:
       | Sounds like it is predictive. Just not when:
       | 
       | > Controlling for differences such as household income and
       | cognitive abilities ...
       | 
       | So it's a (predictive) IQ test.
       | 
       | Perhaps it disavows the prior assumed basis of "deferred
       | gratification", but not the predictive power of the test.
        
       | api wrote:
       | Am I overreacting to consider these kinds of psychometric studies
       | to be not much better than phrenology?
       | 
       | "Behavioral phrenology" maybe?
        
         | lr4444lr wrote:
         | Delayed gratification AFAIK has solid research as a trait
         | predictive of many things. That a child's ability at 4 or 5 to
         | do it being predictive of their adult self is something else,
         | though.
        
           | TrinaryWorksToo wrote:
           | There could easily be confounders to that though. Like people
           | who are wealthy might be able to delay satisfaction better
           | than poor, because their needs are more satisfied.
        
             | dahart wrote:
             | Indeed, and the article mentions this. "The Watts study
             | findings support a common criticism of the marshmallow
             | test: that waiting out temptation for a later reward is
             | largely a middle or upper class behavior. If you come from
             | a place of shortages and broken promises, eating the treat
             | in front of you now might be the better bet than trusting
             | there will be more later."
        
               | fancifalmanima wrote:
               | To say this more explicitly, even the idea of waiting for
               | the second marshmallow being the "preferred" behavior is
               | somewhat classist.
               | 
               | Sounds more like the test is just testing for an
               | adaptation that happens to be well suited to living in an
               | upper-middle class to wealthy environment. If resources
               | are scarce, the kid that takes what they can get now
               | rather than trusting other people will do better in the
               | long run.
        
             | nostrademons wrote:
             | A lot of observable phenomena function as positive feedback
             | loops, simply because positive feedback loops are usually
             | needed to generate effect sizes that become "observable"
             | beyond individual variation. It's very likely that being
             | able to delay gratification makes you wealthy, which makes
             | you better able to delay gratification, which makes you
             | wealthier, and so on. And that's why we have discernible
             | social classes, where mobility from one to another becomes
             | very difficult.
             | 
             | Breaking the feedback loop usually involves doing something
             | farsighted, risky, and irrational - for example, risking
             | getting fired from your retail job by studying programming
             | and applying to software engineering jobs in your downtime,
             | or quitting your stable corporate job to found a startup.
        
           | dahart wrote:
           | Predictive is synonymous with correlated in a research
           | setting, but lay use of that word seems like it runs the risk
           | of implying causation. This may be the primary problem with
           | the Stanford Marshmallow experiment, right? - that delayed
           | gratification is highly correlated with socioeconomic status,
           | which is well known to be an excellent predictor of future
           | socioeconomic status.
        
         | civilized wrote:
         | It's an interesting idea. But to me the marshmallow test at
         | least had some plausible connection to personality.
         | 
         | But maybe in the days of phrenology, people thought a hooked
         | nose* had a plausible connection to personality as well?
         | 
         | Weird to think about.
         | 
         | *Sorry, this is physiognomy not phrenology. The same basic
         | point stands though.
        
           | frgtpsswrdlame wrote:
           | >But maybe in the days of phrenology, people thought a hooked
           | nose had a plausible connection to personality as well?
           | 
           | Phrenology is bumps on the head right? I think hook nose
           | would be physiognomy. But yes, the idea was that your
           | behavior was due to your brain and your brain was composed of
           | many different parts that each controlled different
           | propensities or abilities. Then it was just a matter of
           | identifying where those propensities lived, in relation to
           | the head and then you could feel for the differences from
           | person to person across the surface of the head. From the
           | naive viewpoint it _is_ plausible, oh, you say the back right
           | section of my head, above the ear is a bit larger so the
           | self-control portion of my brain is well developed? I've
           | always thought so!
           | 
           | Setting aside the marshmallow test, you can easily see how
           | scientific theories about this sort of thing, both right and
           | wrong, easily integrate.
        
           | well_i_guess wrote:
           | I think that the issue is that there is no true metric for
           | "highly marketable talents/traits." One generation's genius
           | could be another generation's average worker, solely because
           | market forces eliminate the competitive advantage of certain
           | things. Many, many authors seem to lament the distractibility
           | of the current generations yet I would bet you many of the
           | most famous people to Gen Z are incredibly attention-fickle.
           | Whereas, 20 years ago, focus would probably be an essential
           | skill for key performers.
        
             | fancifalmanima wrote:
             | Focus is almost surely an essential skill for key
             | performers. Even among the most famous Gen Z -- you don't
             | think they focus on their social media presence and what
             | they do? What is an 8 hour photo shoot if not focusing? A
             | lot of work goes into what social media influencers post,
             | it's not all done on a whim. There's also plenty of Gen Z
             | doing other more traditional work (almost everyone of that
             | generation, really). If anything, they've probably had to
             | develop coping mechanisms from an extremely early age to
             | deal with distraction, compared to prior generations.
        
               | brimble wrote:
               | I'm reminded of the SlateStarCodex post that mulls over
               | the difference between "real" ADD and just having totally
               | ordinary (but pretty great) difficulty focusing on the
               | exact same boring crap on a computer screen day, after
               | day, after day--especially if, in the latter case, a lot
               | of the people these folks are comparing themselves to,
               | when deciding that they might have ADD, are _already_ on
               | ADD meds (or coke...) for exactly that reason.
               | 
               | If our society needs 1% of the population to be
               | accountants (to pick an example) but only 0.1% of the
               | population either have incredible focus abilities or
               | don't find accounting brain-meltingly dull, then at least
               | 90% of accountants are going to feel like they have a lot
               | of trouble focusing at work. Once enough start medicating
               | (legally or otherwise) it's gonna feel to others like
               | they really do have a condition that most don't, but they
               | both kinda do (in a practical sense, they _do_ need to
               | focus better to keep up with their peers) and kinda
               | don't (in that it's sort of our society that's sick, not
               | them--they're just acting like _most_ people would, in
               | that situation).
        
             | jrumbut wrote:
             | My poorly informed impression is that the key challenge of
             | any data driven investigation is striking the balance
             | between how hard something is to measure and how close it
             | is to what you really want to know.
             | 
             | The marshmallow test was so appealing because it was
             | incredibly easy to perform and seemed like it was pretty
             | close to a measure of the kind of self-control and
             | discipline that's needed to succeed in a variety of life's
             | most important challenges.
        
         | DeusExMachina wrote:
         | Given the current replication crisis, I would say, not much.
         | 
         | https://en.wikipedia.org/wiki/Replication_crisis
        
         | gumby wrote:
         | Yes, I think it's better, for the reason this article explains:
         | people are following up and revisiting the conclusions.
         | 
         | Nutrition studies are more like phrenology in that they are so
         | hard to do with so many confounding factors that you can't
         | really trust any macro conclusion.
        
           | yboris wrote:
           | I think your distrust in nutrition studies might stem from
           | the fact there are nefarious entities publishing things.
           | Various industries have a financial interest in making it
           | look like their product, because it contains substance X, is
           | beneficial to people. So they can design the most flimsy
           | experiment with no pre-registration, and re-run it numerous
           | times until they get the result they want.
           | 
           | Lots of conclusions from nutrition studies (especially meta
           | analyses) are robust _and_ useful to follow.
           | 
           | Consider the _NutritionFacts_ website as a good starting
           | point: a non-profit which has no ads, no industry
           | "partners", etc - focused on distilling well-designed studies
           | to see what everyday people can use from them.
           | 
           | https://nutritionfacts.org/
        
             | gumby wrote:
             | I was not even considering the issue of bad actors. Simply
             | that longitudinal, multi-variate studies of sufficient
             | scale are essentially impossible to conduct.
             | 
             | Even though nutrition is one of the very oldest, and
             | perhaps _the_ very oldest, fields of human study, it still
             | remains in the "butterfly collecting" phase of development
             | as a science. It's very very hard. I'm glad some people
             | try.
        
       | brimble wrote:
       | I've got some pretty good predictive powers, myself.
       | 
       | I predict that in twenty years, no matter how thoroughly this is
       | debunked, I'll still see this treated as true _constantly_, and,
       | even when what's under discussion is _taking action_ based on its
       | being true, I'll only get eye-rolls and head-pats and plain
       | disapproval/loss-of-face for bringing up that it's questionable
       | at best, then everyone will go on treating it as true.
        
       | suzzer99 wrote:
       | I've never understood the marshmallow test. I'm supposed to sit
       | there and stare at a delicious marshmallow for some indeterminate
       | amount of time in order to get _one extra marshmallow_? Offer me
       | a whole bag and we'll talk.
       | 
       | I've always wondered if this test measures more of the child's
       | willingness to please the researcher, and not so much their
       | capacity for delayed gratification.
        
       | mansoon wrote:
       | This deserves better study.
        
       | jl2718 wrote:
       | "Adding the marshmallow test results to the index does virtually
       | nothing to the prognosis, the study finds."
       | 
       | This does not mean that the test is not predictive. It means that
       | the index (a bunch of measurements) contains statistical
       | dependencies. From a practical view, the marshmallow test result
       | depends on many cognitive factors unrelated to self-control. The
       | child must understand the instructions, remember them for the
       | duration of the test, trust the provider, value the second
       | marshmallow, and then make a decision. To be of any value, it
       | should have been tested against a standard cognitive battery,
       | which it almost certainly would have failed to improve upon.
       | Cognitive tests have worked extremely well to predict life
       | outcomes for decades now if not centuries.
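
The distinction drawn above, between a measure being predictive on its own
and adding nothing once an index is already in the model, can be sketched
with simulated data. Everything here is an assumption for illustration: a
single hidden factor drives the outcome, the index measures that factor
with little noise, and the single test measures it with a lot of noise.
Nothing in this sketch reflects the study's actual data or findings. The
two-predictor R^2 is computed from the three pairwise correlations using
the standard formula for ordinary least squares with two predictors.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of a two-predictor linear regression from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def simulate(n=5000, seed=1):
    """Return (R^2 test alone, R^2 index alone, R^2 index plus test)."""
    rng = random.Random(seed)
    factor = [rng.gauss(0, 1) for _ in range(n)]         # hidden common cause
    index = [f + rng.gauss(0, 0.3) for f in factor]      # precise measure
    test = [f + rng.gauss(0, 2.0) for f in factor]       # noisy single test
    outcome = [f + rng.gauss(0, 1.0) for f in factor]
    r_test = pearson(test, outcome)
    r_index = pearson(index, outcome)
    r_between = pearson(test, index)
    return (r_test ** 2,
            r_index ** 2,
            r2_two_predictors(r_index, r_test, r_between))


if __name__ == "__main__":
    r2_test, r2_index, r2_joint = simulate()
    print(f"test alone:   R^2 = {r2_test:.3f}")
    print(f"index alone:  R^2 = {r2_index:.3f}")
    print(f"index + test: R^2 = {r2_joint:.3f}")
```

In this setup the test has a clearly nonzero R^2 on its own, yet adding it
to the index barely changes the joint R^2, because nearly everything the
test knows is already carried by the index.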
        
       | ramesh31 wrote:
       | Sure. And fifty years' worth of other studies have shown its
       | effectiveness. This is meaningless noise in the absence of meta
       | analysis.
        
         | awb wrote:
         | Related meta analysis:
         | 
         | https://addictions.psych.ucla.edu/wp-content/uploads/sites/1...
         | 
         | > _Conclusions_ These results provide strong evidence of
         | greater DRD in individuals exhibiting addictive behavior in
         | general and particularly in individuals who meet criteria for
         | an addictive disorder.
         | 
         | They don't draw conclusions about causative success, just a
         | correlation with addiction.
        
       | rilezg wrote:
       | I think the 'golden goose award' page from 2015 (linked in the
       | article) gives a better overview of the original research than
       | the article:
       | https://www.goldengooseaward.org/01awardees/marshmallowtest
       | 
       | A small quote: "But this is not a story of fate - of children's
       | long-term success being determined by their self-control as four-
       | year-olds. It is a story about how children can change: those who
       | are "low delayers" can in fact learn to be "high delayers," and
       | gain the life benefits that self-control imparts."
       | 
       | So this is more olds than news, but perhaps it is good to be
       | reminded that we all have room to grow (or shrink) from who we
       | were at age 4. I personally would bet high-delayers can also
       | learn to become low-delayers, and I also would bet there are
       | times in life when you would be better off eating the marshmallow
       | now instead of investing it for another 30 years at 5% because
       | the man in the suit told you to.
        
       ___________________________________________________________________
       (page generated 2022-02-21 23:00 UTC)