       # 2024-08-12 - How To Spot The Truth
       
       ## 1 INTRODUCTION
       
       'Truth' is under attack, more so now than ever before, and for
       many reasons, one of which is social media. We hear and read
       remarkable, often preposterous, claims from many sources: in
       political debate, the presentation of new products, or new
       health-enhancing exercises ranging from hot-water pools to
       cold-water swimming. These frequently claim to be 'scientific
       findings', often reported as 'new studies have shown' stories
       underpinned by 'expert' opinion, and they are amplified in the
       media until the next fad comes along.
       
       This pervasive form of persuasion is a war of beliefs, which in
       many cases may contradict accepted knowledge. It is always
       possible, in fact likely, that those making some of the more
       absurd claims do not engage with, or are not even properly aware
       of, current scientific understanding; their claims may then be
       internally logical, but based on incorrect assumptions or
       understanding. Flat-earthers have a consistent world view, which
       is probably logical to them; it just is not compatible with other
       known facts. But truth is the first casualty of war, and now more
       than ever we must equip ourselves and others with the skills
       needed to judge the validity of the information we are presented
       with.
       
       This is not as simple as it might appear. The context is
       all-important. Interestingly, there are far fewer exact rules,
       firm guidelines and hard cut-off levels for establishing the
       truth than people might imagine. Scientific knowledge is rarely
       expressed in terms of utter validity; rather, a finding 'fits',
       or 'is not inconsistent with', what we know already, or is
       'suitable for predicting performance'. For example, we now know
       that light can be bent by gravity; but Newton's simple
       straight-line approximation has taken astronauts to the moon and
       back (sorry, flat-earthers). In addition, although statisticians
       use words consistently and exactly, they do not use words such as
       'population' and 'sample' in the way they are used in general
       parlance. Nor is the logic of statistics straightforward. For
       example, the most commonly used tests of likelihood reason along
       the lines of 'if these random samples were drawn from a single
       population, then…'. Logical and consistent, yes, but not well
       understood, even by some scientists. In one study, trainee
       doctors, who should be reading this sort of material all the
       time, were given a simple statement using such a test; when asked
       to choose the correct conclusion out of four possibilities,
       almost half made a wrong choice (Windish et al., 2007).
       
 (HTM) https://jamanetwork.com/journals/jama/fullarticle/208638
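
       To make that conditional logic concrete, here is a minimal
       sketch in Python (our illustration, not from the cited study;
       SciPy is assumed to be available). The p-value reports how often
       a difference at least this large would arise if both samples
       really did come from a single population; it is not the
       probability that the 'no difference' hypothesis is true.

       ```python
       import random

       from scipy import stats  # SciPy assumed available

       random.seed(0)

       # Two samples deliberately drawn from the SAME population
       # (mean 0, standard deviation 1).
       group_a = [random.gauss(0, 1) for _ in range(30)]
       group_b = [random.gauss(0, 1) for _ in range(30)]

       t, p = stats.ttest_ind(group_a, group_b)

       # Correct reading: IF these were random samples from a single
       # population, THEN a difference at least this large would occur
       # with probability p. It is not the probability that the
       # 'no difference' hypothesis is true.
       print(f"t = {t:.2f}, p = {p:.3f}")
       ```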
       
       ## 2 WHY IS GETTING AS CLOSE AS POSSIBLE TO THE TRUTH IMPORTANT?
       
       The truth helps you make 'adequately correct' decisions and act
       accordingly. What counts as adequate depends on the situation,
       and on the risks of deciding correctly or incorrectly.
       Uncertainty doesn't mean we know nothing, or that anything could
       be true: it just means you don't bet your house on an outsider.
       
       Some years ago, a district court decided that a particular
       vaccine was responsible for an adverse outcome, a conclusion that
       was scientifically doubtful. This triggered a disastrous decrease
       in child vaccinations for a whole range of diseases. Later
       analysis also showed convincingly that transmission of the faulty
       conclusion was related to internet broadband access: the more
       broadband, the greater the decrease in vaccinations (Carrieri et
       al., 2019).
       
 (HTM) https://onlinelibrary.wiley.com/doi/10.1002/hec.3937
       
       In another case, however, a US court rejected a manufacturer's
       defence that the data were insufficient to meet the usual
       scientific criteria for demonstrating a causal link between a
       drug and a serious, but rare, adverse event, and that this was
       why the drug had been marketed without a warning. The court was
       unwilling to accept this statistical threshold, preferring to
       heed the reports of infrequent, but important, adverse events
       after use of the drug, and awarded damages (Matrixx Initiatives,
       Inc. v. Siracusano, 2011).
       
 (HTM) https://supreme.justia.com/cases/federal/us/563/27/
       
       Here, we shall try to show the reader the processes applied in
       scientific evaluation, in the hope that you can apply them in your
       day-to-day decision-making. Facts don't speak for themselves--context
       is vital. An experienced scientist, who 'knows the ropes', is more
       likely to use their knowledge, experience and judgement to tease out
       the full story. The central question is not 'can we be certain?', but
       rather 'can we process this information and adjust our ideas?'
       Uncertainty is always present, but we may be able to be 'confidently
       uncertain'.
       
       ## 3 A CHECKLIST FOR TRUTH
       
       (ELEMENTS OF THE CONTEXT AND QUESTIONS THAT SHOULD BE ASKED OF ANY
       CLAIM)
       
       * Who is making the statement, and what is their qualification for
         making it?
       
       * What was the original question? Has it been correctly framed?
       
       * What is the underpinning evidence for the statement? What is
         the provenance of the supporting data? Where has it been
         published? Are there alternative explanations, have they been
         explored, and how plausible are they?
       
       * Has the best measure been used? The best way to express
         'typical' is the median value, as used by the Office for
         National Statistics. However, many reports use the average,
         which can be far from the same thing and can make, for
         example, the 'typical' person apparently better off. If we put
         incomes in order of size, from the least to the greatest, the
         median is the one closest to the halfway point in that order;
         many more incomes are small, and only a few are whopping, so
         the median sits closer to the bottom. The 'average', or
         'mean', is the sum of all the incomes (lots of paltry ones,
         some whopping ones) divided by the number of incomes in the
         sample. For example, median UK household disposable income in
         the financial year ending 2022 was about £32K, while the
         average was £40K (see the first sketch after this list).
       
       * Have basic scientific principles been used: for example, how
         was the sample of people tested obtained? A 'random' sample,
         scientifically, is one in which every member of the target
         population has an equal chance of selection, so that it
         contains people from all walks of life, ages and states of
         health, and the results can be applied to that population. If
         we study healthy students, then the answer may only apply to
         healthy students.
       
       * Were sufficient people tested to find an effect reliably and
         confidently? The most reliable and most frequent (but rather
         clumsy) study design is the 'randomised controlled trial',
         often used to test new drugs against old ones. Such studies
         often need hundreds of participants if the drugs do not differ
         much in effect. Smaller studies may not reliably find an
         effect; if they do find one, by chance, then chance will have
         exaggerated the benefit. This is known as the 'winner's curse'
         (Sidebotham & Barlow, 2024): attempts to verify or replicate
         the first observed effect often fail (see the second sketch
         after this list).
 (HTM) https://associationofanaesthetists-publications.onlinelibrary.wiley.com/doi/10.1111/anae.16161
       
       * It is not easy to prove that something does not exist, and a
         large study is needed to reach valid conclusions. This matters
         if you are investigating a rare but serious complication or a
         new technique. For example, if a new surgical procedure is
         carried out 20 times without a problem, it is not necessarily
         safe. If the same procedure were carried out 100 times, and
         the death risk were randomly distributed in the same way as
         for the first 20, there is a 95% chance that the number of
         deaths would be between 0 and 16 (see the third sketch after
         this list). And it is likely that fitter patients were
         selected first in the original study--see 'bias' below.
       
       * Was there a 'control group'? If an intervention is being assessed
         (e.g., the health benefits of cold-water swimming), then a control
         group is needed that will carry out the same activities but without
         the hypothesised 'active ingredient' (e.g., cold). The control
         group should include all other factors that could be at work, such
         as similar locations, similar companions, same food, same exercise,
         same bedtime and sleep profile, etc.
       
       * Humans vary a great deal, so experiments comparing human
         participants are difficult. This is particularly obvious in
         responses to medication, and can lead to unexpectedly different
         results. An elegant way of getting around this is to 'cross-over' a
         treatment and compare the same individuals, each given both the
         'control' and the 'active' treatment. However, without care this
         can also lead to complexities. Ideally half the participants should
         start with the active treatment, and half with a 'neutral'
         (control) treatment, but how can we be sure that the active
         treatment has worn off ('washed out') before testing the control
         treatment? For example, hormones may have effects that last long
         after the actual drug has left the body, and some
         psychophysiological changes can be long-lasting. Indeed, some would
         argue that, in some studies, with some people, wash out may never
         fully occur (Tipton & Mekjavic, 2000).
 (HTM) https://link.springer.com/article/10.1007/s004210000255
       
       * What measurements are made? Are they physical or biochemical
         measurements, like blood pressure or blood levels of hormones,
         or are they questionnaires? What questions get asked? It is
         very easy to ask leading questions, particularly if the person
         taking part believes something is doing them good. A far
         better (but far less likely) outcome measure would be a health
         assessment a year after the intervention! Do the scientists
         making the measurements know which treatment was given, and
         what do they expect to find? In one study of a pain-killer,
         the testers (who were kept unaware of the drug being tested)
         found different effects depending on their expectations of the
         drug's effects (Gracely et al., 1985).
 (HTM) https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(85)90984-5/fulltext
       
       * Are tests being used as 'proxy' or 'surrogate' measurements for
         something that is more important but not as easy to measure?
         Examples include using exam scores as an index of ability, or body
         mass index (BMI) for health assessments. How reliable, and exact,
         are such surrogate assessments?
       
       * Does the proponent have any conflict of interest? Does what they
         argue benefit them?
       
       * Is there any 'bias'? Bias can creep in at many stages in the
         process of gathering information and presenting it. Scientific
         publications are very varied: papers in highly regarded
         journals have met demanding acceptance standards, with
         stringent peer assessment; in some 'open access' journals
         papers are also assessed, but the author pays; and in 'vanity
         journals' the author only has to pay to get published!
         Moreover, all journals are looking to attract readers and
         citations, and there is nothing better than controversy to
         boost both. Additionally, presentations at conferences often
         turn up as 'publications' but have had virtually no peer
         assessment, and such conferences can be international,
         national or local.
       
       * The funding of research affects what gets published. Published
         research papers funded by companies and dealing with available
         products are more likely to give a 'positive' result than
         independently funded studies (Bourgeois et al., 2010). Product
         evaluation can be designed to be flattering in the variables
         assessed, in avoiding observation of later adverse effects,
         and in selecting those tested (by age, sex or race). It is now
         necessary to register clinical studies before they start, yet
         many studies funded by drug companies are never published.
         Even trivial effects can be 'statistically significant' if the
         study is large enough. Regulatory oversight of large-scale,
         urgent studies can be limited, and poor practice can be
         concealed (Powell-Smith & Goldacre, 2016).
 (HTM) https://www.acpjournals.org/doi/10.7326/0003-4819-153-3-201008030-00006
 (HTM) https://f1000research.com/articles/5-2629/v1
       
       * Survivorship bias is also relevant: are the data already
         selected? A salutary example was the analysis of damage found
         on aircraft returning to base after combat. A returning
         aircraft had clearly taken damage in areas it could survive
         and still fly well enough to get home. Thus it would be best,
         if possible, to protect the areas that were not seen to be
         damaged in returning aircraft: hits there were presumably the
         crippling ones (Mangel & Samaniego, 1984).
 (HTM) https://www.tandfonline.com/doi/abs/10.1080/01621459.1984.10478038
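
       The first sketch promised in the checklist: a minimal
       illustration in Python of how the median and the mean of the
       same incomes can diverge. The figures are invented for
       illustration; they are not ONS data.

       ```python
       from statistics import mean, median

       # Ten hypothetical household incomes in £K: many modest, one
       # whopping outlier at the top.
       incomes = [18, 21, 24, 27, 30, 34, 36, 38, 41, 131]

       print(f"median: £{median(incomes):.0f}K")  # £32K, the 'typical' one
       print(f"mean:   £{mean(incomes):.0f}K")    # £40K, pulled up by 131
       ```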
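
       The second sketch: a small simulation of the 'winner's curse'.
       This is our own illustration, not an analysis from Sidebotham &
       Barlow (2024). Many small trials of a treatment with a modest
       true benefit are simulated, and only the 'statistically
       significant' ones are kept.

       ```python
       import random
       import statistics

       random.seed(1)

       TRUE_EFFECT = 0.2   # true benefit, in standard-deviation units
       N_PER_ARM = 25      # a small study
       N_TRIALS = 10_000   # many such studies

       winners = []
       for _ in range(N_TRIALS):
           control = [random.gauss(0, 1) for _ in range(N_PER_ARM)]
           treated = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_ARM)]
           diff = statistics.mean(treated) - statistics.mean(control)
           se = (2 / N_PER_ARM) ** 0.5   # standard error of the difference
           if diff / se > 1.96:          # crude one-sided 'significance'
               winners.append(diff)

       print(f"true effect: {TRUE_EFFECT}")
       # The surviving 'significant' estimates are roughly three times
       # too large: the winner's curse.
       print(f"mean 'significant' effect: {statistics.mean(winners):.2f}")
       ```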
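
       The third sketch: the surgical-safety arithmetic. Seeing no
       deaths in 20 operations still leaves quite high death rates
       plausible, which is where the '0 to 16 deaths in 100' figure
       comes from.

       ```python
       # 0 deaths observed in 20 procedures: what underlying death rate
       # is still consistent with that at the 95% (two-sided) level?
       n, alpha = 20, 0.05

       # Solve (1 - p)**n = alpha/2 for p: the exact binomial upper bound.
       p_upper = 1 - (alpha / 2) ** (1 / n)
       print(f"upper bound on death rate: {p_upper:.1%}")  # about 16.8%

       # Scaled to 100 future procedures with the same underlying risk:
       print(f"plausible deaths in 100 cases: 0 to {int(100 * p_upper)}")
       ```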
       
       Overall, as a result of failures to meet the requirements listed
       above, about half of published medical papers are unlikely to be
       true (Ioannidis, 2005). In 2023, the number of retractions of
       research articles internationally reached a new record of over
       10,000 (Van Noorden, 2023), owing to an increase in sham papers
       and peer-review fraud. Furthermore, despite a requirement for
       disclosure, a lot of government research is never released, or
       is delayed until interest in the topic has declined.
       
 (HTM) https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
       
 (HTM) https://www.nature.com/articles/d41586-023-03974-8
       
       A recent study (Briganti et al., 2023) reviewed the papers
       published on the health and recovery benefits of cold-water
       exposure. The authors found 931 articles, carefully weeded out
       the irrelevant ones, and were left with 24 papers; of these, the
       risk of bias was 'high' in 15 and 'of some concern' in four.
       Thus only five papers had a 'low' risk of bias: three looked at
       cold-water immersion after exercise and two at cognitive
       function. So a very small percentage of the studies examined had
       anything really useful to say.
       
 (HTM) https://onlinelibrary.wiley.com/doi/10.1111/apha.14056
       
       ## 4 WHAT ABOUT THE 'FINDINGS' YOU ARE PRESENTED WITH?
       
       Watch out for percentages (Bolton, 2023). A simple change is easily
       understood as a percentage, but 'scientific' studies involving
       comparisons between groups can require more careful consideration.
       These comparisons should always trigger the question 'percentage of
       what, exactly?' The headline, 'New drug/product/intervention cuts
       mortality by 50%' sounds impressive, and attracts attention, but the
       reality could be less spectacular. Perhaps using the old drug, the
       death rate was 20 per 1000 patients, and when the new drug was first
       used, the rate became 10 per 1000 patients: a 50% reduction. But the
       absolute risk reduction in death rate was 10 per 1000, or 1%, a less
       impressive headline.
       
 (HTM) https://commonslibrary.parliament.uk/research-briefings/sn04446/
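
       The arithmetic behind that headline, as a minimal sketch in
       Python using the invented figures above:

       ```python
       # Deaths per 1000 patients, before and after the new drug.
       old_deaths, new_deaths, patients = 20, 10, 1000

       relative_reduction = (old_deaths - new_deaths) / old_deaths
       absolute_reduction = (old_deaths - new_deaths) / patients

       print(f"relative risk reduction: {relative_reduction:.0%}")  # 50%
       print(f"absolute risk reduction: {absolute_reduction:.0%}")  # 1%
       ```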
       
       Also, beware of correlations. Just because two things relate to
       each other, for example a diet and a sense of well-being, does
       not mean that one causes the other. The world is full of
       accidental (spurious) correlations (Van Cauwenberge, 2016). One
       of our favourites is the high correlation between the divorce
       rate in Maine, USA and the per capita consumption of margarine!
       Also ask the question 'how many false positives and negatives
       will I get if I use this correlation to make a decision?'
       (Tipton et al., 2012).
       
 (HTM) https://www.datasciencecentral.com/spurious-correlations-15-examples/
       
 (HTM) https://link.springer.com/article/10.1007/s004210000255
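
       A minimal sketch of how easily strong correlations arise by
       chance (our own illustration; statistics.correlation needs
       Python 3.10 or later):

       ```python
       import random
       import statistics

       random.seed(0)

       # 50 completely unrelated random series of 10 points each,
       # standing in for margarine, divorces, diets, well-being, ...
       series = [[random.random() for _ in range(10)] for _ in range(50)]

       # Check all 1225 pairs and keep the strongest Pearson correlation.
       strongest = max(
           abs(statistics.correlation(series[i], series[j]))
           for i in range(len(series))
           for j in range(i + 1, len(series))
       )
       # A strong 'relationship' is almost guaranteed, despite there
       # being nothing real to find.
       print(f"strongest chance correlation: |r| = {strongest:.2f}")
       ```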
       
       For the moment at least, artificial intelligence cannot quantify
       uncertainty very well. Generally, AI treats material from 'out
       there' as if it were true, so a high proportion of garbage in
       will give you garbage out (which then increases the proportion
       of garbage that AI draws on next time round)!
       
       We hope that, armed with the above checklist, you can challenge
       and interrogate polarising information, from 'spin' to the
       outright falsehoods presented to you daily. We are at risk of
       being overwhelmed by an increasing number of dubious,
       unregulated and disparate sources. The next time you hear
       phrases like 'they say this is great' or 'this is scientifically
       proven', start by asking 'who are they?' and 'which scientists,
       using which methods?' Be cautious and questioning; snake oil and
       its vendors still exist, and they come in many guises.
       
 (HTM) From: https://physoc.onlinelibrary.wiley.com/doi/10.1113/EP092160
       
       See also:
       
 (HTM) Carl Sagan's Baloney Detection Toolkit
       
       tags: article,science
       