[HN Gopher] Propagation of mistakes in papers
       ___________________________________________________________________
        
       Propagation of mistakes in papers
        
       Author : greghn
       Score  : 82 points
       Date   : 2022-07-26 16:03 UTC (6 hours ago)
        
 (HTM) web link (databasearchitects.blogspot.com)
 (TXT) w3m dump (databasearchitects.blogspot.com)
        
       | woliveirajr wrote:
       | > Judging by publication date the source seems to be this paper
       | (also it did not cite any other papers with the incorrect value,
       | as far as I know). And everybody else just copied the constant
       | from somewhere else, propagating it from paper to paper.
       | 
        | And the Scheuermann and Mauve paper mentions that they picked
        | the value (0.775351) from the Philippe Flajolet paper, which
        | only gives it without the extra 5. It was never recalculated or
        | reviewed; it was simply copied over and typed wrong.
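        | 
        | To put a number on it: a minimal Python sketch, assuming the
        | constant is used the way Flajolet-Martin-style probabilistic
        | counting uses it, as a correction factor the raw estimate is
        | divided by, so any error in it shifts every estimate by the
        | same factor:
        | 
        |     # Hypothetical illustration of how much the extra digit
        |     # matters when the constant scales the final estimate.
        |     PHI_CORRECT = 0.77351    # value in Flajolet's paper
        |     PHI_TYPO    = 0.775351   # value propagated with the extra 5
        | 
        |     def fm_estimate(max_rank: int, phi: float) -> float:
        |         """Cardinality estimate from the largest rank observed."""
        |         return (2 ** max_rank) / phi
        | 
        |     r = 20  # say a sketch observed a maximum rank of 20
        |     good = fm_estimate(r, PHI_CORRECT)
        |     bad = fm_estimate(r, PHI_TYPO)
        |     print(f"relative error: {bad / good - 1:+.4%}")  # ~ -0.24%
        | 
        | A systematic bias of roughly a quarter of a percent is small
        | enough to hide inside the estimator's own variance, which may
        | be part of why the typo went unnoticed for so long.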
        
       | sebastianconcpt wrote:
        | Have you thought about what it would take, and what it would
        | imply, to have a kind of CI/CD pipeline for unit-testing the
        | assertions in papers?
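        | 
        | For numeric constants, at least, the cheapest version might
        | just be a pinned-reference test. A toy pytest-style sketch in
        | Python (the reference value and tolerance are made up for
        | illustration, reusing the constant from this thread):
        | 
        |     import math
        | 
        |     # High-precision reference, ideally recomputed from the
        |     # constant's definition rather than copied from a paper.
        |     REFERENCE_PHI = 0.77351
        | 
        |     def test_cited_constant_matches_reference():
        |         manuscript_value = 0.775351  # value as typed in paper
        |         assert math.isclose(manuscript_value, REFERENCE_PHI,
        |                             rel_tol=1e-4), \
        |             "constant disagrees with reference: possible typo"
        | 
        | Run under pytest, this check fails on the mistyped value,
        | which is exactly the kind of propagation the thread is about.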
        
         | radus wrote:
         | How do you CI/CD assertions in papers using animal models in
         | experiments that take months to years?
        
         | _Algernon_ wrote:
         | How would that work? You can't automate testing of papers.
         | However flawed the process is, this is what peer review is
         | intended to do.
        
       | tinalumfoil wrote:
       | Reminds me of https://en.wikiquote.org/wiki/Oil_drop_experiment,
       | famously described by Feynman,
       | 
       | > Millikan measured the charge on an electron by an experiment
       | with falling oil drops, and got an answer which we now know not
       | to be quite right. It's a little bit off because he had the
       | incorrect value for the viscosity of air. It's interesting to
       | look at the history of measurements of the charge of an electron,
       | after Millikan. If you plot them as a function of time, you find
       | that one is a little bit bigger than Millikan's, and the next
       | one's a little bit bigger than that, and the next one's a little
       | bit bigger than that, until finally they settle down to a number
       | which is higher.
        
         | m-watson wrote:
          | In a lot of experimental classes, that experiment is also
          | used to teach about selective data exclusion (potential
          | scientific fraud), as well as about the resistance to
          | challenging an already published value (later experiments
          | sought to verify Millikan's value rather than show it was
          | off).
         | 
         | https://en.wikipedia.org/wiki/Oil_drop_experiment
        
         | hinkley wrote:
         | Is that part of the genesis for these conversations about how
         | perhaps the physical constants of the universe are slowly
         | changing over time? If you look at the 'right' experiments, the
         | speed of light slowly crept up over time too, IIRC. When the
         | movement is all in one direction it's easy to speculate that
         | maybe that's because the target keeps moving.
        
         | antognini wrote:
         | Another example I recently came across was the early
          | measurements of the AU using radar. The first two experiments
          | that tried to bounce radar off of Venus had very noisy data,
          | but they seemed to have a detection that implied a distance
          | pretty close to the earlier measurements that had been done
          | using parallax.
         | detections went away and it turned out that they had just been
         | noise. Later on an even more powerful radar system was able to
         | successfully bounce a radar signal off of Venus and it turned
         | out that the AU was quite a bit different from its earlier
         | value.
        
       | btrettel wrote:
        | Researchers as a whole need to do more checking. While I agree
        | that errors like the one identified in the link are rare, they
        | are not so rare that one can skip looking for them or assume
        | that everything was done properly.
        | 
        | I've speculated before that peer review gives researchers false
        | confidence in published results [0]. A lot of academics seem to
        | believe that peer review is much better at finding errors than
        | it actually is. (Here's one example of a conversation I had on
        | HN that unfortunately was not productive: [1].) To be clear, I
        | think getting through peer review _is_ evidence that a paper is
        | good, albeit weak evidence. I would give the fact that a paper
        | is peer reviewed little weight compared with my own evaluation
        | of the paper.
       | 
       | [0] https://news.ycombinator.com/item?id=22290907
       | 
       | [1] https://news.ycombinator.com/item?id=31485701
        
         | 11101010001100 wrote:
          | I just completed a paper review as a reviewer. After, I
          | think, four rounds, the author finally ran the calculation I
          | had asked for in the initial review and admitted I was right.
          | We got there in the end, but I had to sit on my hands.
        
         | pcrh wrote:
         | Peer review can help improve a paper (and it has improved some
         | of mine); however, contrary to some popular notions, it doesn't
         | lend "truth" to a paper.
         | 
         | Peer reviewers are not monitoring how experiments were
          | conducted; they only have access to a data set that is by
         | necessity already highly selected from all the work that went
         | into producing the final manuscript. The authors thus bear
         | ultimate responsibility.
         | 
         | When considering published work close to mine, I use my own
         | judgement of the work, regardless of peer review or which
         | journal it is published in (for example it may be in a PhD
         | thesis). For work where I am not so familiar with the
         | methodologies, I prefer to wait for independent
         | verification/replication (direct or indirect) from a different
         | research group, which ideally used different methods.
        
           | throwawaymaths wrote:
            | Well, to be fair, there _is_ the journal "Organic Syntheses":
           | 
           | https://en.m.wikipedia.org/wiki/Organic_Syntheses
        
         | NegativeLatency wrote:
          | Somewhere near 100% of my shipped bugs have been peer
          | reviewed, so that makes a lot of sense to me.
        
         | michaelmior wrote:
         | > To be clear, I think getting through peer review is evidence
         | that a paper is good
         | 
          | I think this depends on how you define _good_. I'm sure
         | there's some variation across fields, but peer review generally
         | seeks to establish that what is presented in the paper is
         | plausible, logically consistent, well-presented, meaningful,
         | and novel. That list is non-exhaustive, but _correct_ is very
         | hard to establish in a peer review process. In my experience,
         | it would be rare for a reviewer to repeat calculations in a
         | paper unless something seems fairly obviously off.
         | 
         | As a computer scientist, it would be even more rare for a peer
         | reviewer to examine the code written for a paper (if it is
          | available) to check for bugs. Point being, there are a lot of
          | reasons a paper that appears good may be completely
          | incorrect, though typically for reasons that I, as a casual
          | reader, would be even less likely to spot than a reviewer who
          | is particularly knowledgeable about that field.
        
         | magicalhippo wrote:
          | I've got a close relative who reviews papers all the time in
          | their field (not CS). Based on that, my take is that a paper
          | passing peer review is a good indicator that there's nothing
          | egregiously wrong with what's written.
        
       | hinkley wrote:
        | I wonder if there's a trick we're missing, a holdover from the
        | dead-tree history of papers, that we could address.
       | 
       | Namely, paper references always reach back in time. Papers don't
       | reference papers that were written after they were written. And
       | if that sounds stupid, bear with me a second.
       | 
        | We've talked a lot about the reproducibility problem, and
        | that's part of how errors propagate in papers ("I didn't prove
        | this value, I just cribbed it from [5]"). If we had a habit of
        | peer reviewing papers and then attaching those reviews
        | retroactively to the original paper, for both positive and
        | negative results, would we slow this merry-go-round down a
        | little and reduce the head-rush? Would that help prevent people
        | from citing papers that have been debunked?
        
         | renewiltord wrote:
          | Solid point. The paper is a delta-mapper: it provides a p -->
          | Δp prior-to-posterior change. However, it does not tell you
          | anything about p or p' = p + Δp itself. To get the true value
          | of p^n, we sum over all the Δp in some way (affected by the
          | path we take through the papers addressing it).
          | 
          | You're modifying the scheme so that future Δp^{i+k} are added
          | back to the delta-mapper, so that the original Δp is
          | appropriately adjusted to account for each Δp^{i+k}. It's
          | like path compression in a union-find structure.
          | 
          | It is interesting as a helpful approach but does suffer from
          | the pingback spam problem, right? And I have a sneaking
          | suspicion that it is not an accidental oversight in science
          | that leads to these problems.
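          | 
          | For readers who don't know the analogy: path compression in a
          | union-find structure re-points every node it visits directly
          | at the root, so later lookups skip the intermediate hops,
          | much as a retroactively annotated paper would let readers
          | skip the chain of stale citations. A minimal sketch in
          | Python:
          | 
          |     class UnionFind:
          |         def __init__(self, n: int):
          |             self.parent = list(range(n))
          | 
          |         def find(self, x: int) -> int:
          |             # Path compression: point x straight at the root
          |             # so the next find(x) resolves in one hop instead
          |             # of re-walking the chain of parents.
          |             if self.parent[x] != x:
          |                 self.parent[x] = self.find(self.parent[x])
          |             return self.parent[x]
          | 
          |         def union(self, a: int, b: int) -> None:
          |             self.parent[self.find(a)] = self.find(b)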
        
       | jdougan wrote:
       | A different kind of replication crisis.
        
       | bluenose69 wrote:
       | I don't think this sort of thing is all that unusual.
       | 
       | I once did a web-of-science search for citations to a
       | foundational paper in my field. It was published in volume 13 of
        | a particular journal, and that was listed in a little over 90%
        | of the citations, but the other citations all listed the volume
        | as 113. My assumption is that somebody cited it in error, and
        | that others were basically copying the citation from another
        | paper's bibliography, rather than going back to the original
        | paper to get the original metadata.
       | 
       | Does this mean that about 10% of writers were basically lying
       | about having read the original paper? Well, maybe. But I fear
       | that the number might be higher than 10%, because the correct
       | citations might also have resulted from just copying from a
       | bibliography.
       | 
       | I tell this story to my students, in hopes that they will
       | actually _read_ the original papers. Quite a few take my advice
       | to heart. Alas, not all do.
        
         | marcosdumay wrote:
          | Or maybe that's because somebody published a BibTeX entry for
          | that paper with the volume number wrong, and those people
          | just copied and pasted the entry without reading the
          | original.
        
         | RC_ITR wrote:
          | This is somewhat of a criticism of how contemporary citations
          | work, though.
         | 
         | Primitive science (or even pre-publishing science) doesn't get
         | cited because humanity figured it out before our current system
         | was in place.
         | 
         | It may sound silly, but no one feels the need to cite
         | Eratosthenes when implying the world is round.
         | 
          | But many people _do_ feel the need to cite the colorimetric
          | determination of phosphorus (an SCI top-100 paper) even
          | though it was published 100 years ago and is generally
          | considered "base-level science."
         | 
         | It is certainly an interesting paper to read, but I'm not sure
         | I need every scientist to read it in order to believe they know
         | how to do colorimetric analysis.
        
         | actuallyalys wrote:
         | I'd be curious to know whether the percentage of incorrect
         | citations varies over time. I would guess more recent authors
         | would be more likely to search by title in Google Scholar or
         | SciHub (or use the DOI link, if available) rather than actually
         | use the volume and page number, which could result in more
         | authors who _did_ read the article nonetheless getting the
         | volume number wrong.
        
         | gwern wrote:
         | There's a semi-famous line of research by Simkin which uses
         | citation copying errors as 'radioactive tracers' to estimate
         | the rate of copying & nonreading, under the logic that (in a
          | pre-digital age) you could not possibly have repeated the
         | '113' error if you got an ILL copy or physically consulted
         | volume '13' (if only because you would be pissed at wasting
         | your time either checking volume 113 first or verifying there's
         | no such thing as volume 113):
         | 
         | https://www.gwern.net/Leprechauns#citogenesis-how-often-do-r...
         | 
         | Your 10% isn't far off from the 10-30% estimates people get, so
         | not bad.
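          | 
          | The inference is easy to play with in a toy Monte Carlo
          | simulation (Python; an illustration, not Simkin's actual
          | model): each new citer either reads the original and cites it
          | correctly, or copies a random earlier citation, misprint and
          | all. The observed misprint rate then understates the copying
          | fraction, since copiers can also copy clean entries:
          | 
          |     import random
          | 
          |     def misprint_rate(n_citers, p_copy, rng):
          |         """Fraction of citations carrying the misprint,
          |         starting from a single misprinted citation."""
          |         citations = [True]  # True = carries the misprint
          |         for _ in range(n_citers):
          |             if rng.random() < p_copy:
          |                 # Copier: reuse a random earlier entry.
          |                 citations.append(rng.choice(citations))
          |             else:
          |                 # Reader: cites the original correctly.
          |                 citations.append(False)
          |         return sum(citations) / len(citations)
          | 
          |     rng = random.Random(0)
          |     for p_copy in (0.3, 0.5, 0.8):
          |         runs = [misprint_rate(500, p_copy, rng)
          |                 for _ in range(200)]
          |         print(f"copying {p_copy:.0%}: "
          |               f"misprint rate {sum(runs) / len(runs):.1%}")
          | 
          | Under this toy model a 10% misprint rate is consistent with a
          | copying fraction well above 10%, which is the direction of
          | Simkin's conclusion.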
        
       | hunglee2 wrote:
        | Same thing happens in the news - we assume due diligence has
        | been satisfactorily (and honestly) conducted by publishers we
        | hold in high esteem, and happily propagate without scrutiny, so
        | long as it fits our preferred narrative.
        
       ___________________________________________________________________
       (page generated 2022-07-26 23:00 UTC)