[HN Gopher] AI models that predict disease are not as accurate a...
       ___________________________________________________________________
        
       AI models that predict disease are not as accurate as reports might
       suggest
        
       Author : arkj
       Score  : 118 points
       Date   : 2022-10-21 18:07 UTC (4 hours ago)
        
 (HTM) web link (www.scientificamerican.com)
 (TXT) w3m dump (www.scientificamerican.com)
        
       | deltree7 wrote:
       | Yet.
       | 
       | One thing media consistently gets wrong is the rate of innovation
        | that is happening. Media also doesn't have access to state-of-
        | the-art models, only to the half-baked versions that trigger-
        | happy startups are too eager to release.
        | 
        | It's akin to downloading Image Generation tools from the App
        | Store and concluding that's the state of the art.
        
         | bpodgursky wrote:
         | It baffles me that people can watch the trendline of
         | 
         | "Job X can be automated in 40 years" (5 years ago)
         | 
         | "Job X can be automated in 10 years" (2 years ago)
         | 
         | "Job X can be automated in 5 years" (1 week ago)
         | 
          | And feel comfortable poking holes in the AI models, pointing
          | out where they fail. Obviously? But nobody 3 years ago
          | thought that graphic design or creative writing was on death
          | row either.
          | 
          | You have to spend a modicum of effort looking at how
          | predictions have _evolved_ over the past couple of years,
          | but once you do, it's very clear that mocking current AI
          | systems makes you look like a clown.
        
           | ckemere wrote:
           | There's also the timeline that:
           | 
           | "Radiology will be automatized in 5 years" (10 years ago)
           | "Radiology will be automatized in 5 years" (5 years ago)
           | "Radiology will be automatized in 5 years" (last year)
           | 
           | or
           | 
           | "Full self driving will arrive within 5 years" (5 years ago)
           | "Full self driving is still a ways off" (last year)
           | 
            | Assuming you're referring to generative models, I don't
            | think that anyone (knowledgeable) thinks that graphic
            | design or creative writing is on death's door. They might
            | change with new tools, but skilled practitioners are still
            | required.
           | That's basically the point of the article.
        
             | chatterhead wrote:
             | We are 18 years from the DARPA Grand Challenge and none of
             | the vehicles finished.
             | 
             | Do you think a self-driving car can make it from LA to NYC
             | by itself now?
             | 
             | What do you think 2040 AI will look like?
        
             | lostlogin wrote:
             | Having seen some of the automation available in radiology,
             | I'm a bit baffled as to why I still have a job as an MRI
             | tech.
             | 
             | 5 years ago I watched automated cardiac MRI, and it worked
              | well. I was told about a site that was having good results
             | with fetal cardiac MRI via a related bit of software.
             | 
             | These scans are hard to do, and the machines did well. In
             | some cases they got confused and did a good functional
             | analysis but of the stomach, not the heart. Oops, but
             | easily fixed by almost anyone after a few minutes of
             | explanation.
             | 
             | Why are basic MSK scans still done by a tech with years of
             | training?
             | 
              | I don't know the answer to that, as it's basic stuff. If
              | I end my career without machines having taken over the
              | basic stuff, I'll be a bit disappointed.
        
               | esel2k wrote:
                | But even if all these analyses were done automatically,
                | I guess you won't be out of a job soon (good news, I
                | guess), just a different one. I worked on the
                | automation of the diagnostic lab, and what happened is
                | that a detective-style job turned into running a
                | factory: a business running 24 hours a day, turnaround
                | times, and less and less qualified personnel to feed
                | the machines...
        
       | dr_dshiv wrote:
       | I'm shocked. SHOCKED.
        
       | bitL wrote:
       | OK, AI is bad but compare it to human doctors/radiologists that
       | are often worse. I still remember stats from some X-ray detection
       | where AI diagnosed with 40% accuracy and the best human doctors
       | with 38% accuracy (and median human doctors with 32% accuracy).
       | Now what are we supposed to do?
        
         | pcurve wrote:
          | Can you cite the source? Is it not possible to improve the
          | AI's 40% rate? Obviously someone eventually figured out the
          | 100%.
        
           | dmurray wrote:
           | They might have "figured it out" by cutting the patient open.
        
         | yrgulation wrote:
          | Oh god (or science, for some of us), this is the same kind
          | of logic used to defend Tesla's FSD system. Both crappy and
          | dangerous, but with a cult-like following.
        
       | srvmshr wrote:
        | I worked in healthcare ML solutions, as part of my PhD & also
        | as a consultant to a telemedicine company.
       | 
        | My experience in dealing with data (we had sufficient, and
        | somewhat well labeled) & methods made me realize that a lot of
        | the predictions human doctors make are multimodal - and that
        | is something deep learning will struggle with for the time
        | being. For example, say in detection of a disease _X_,
        | physicians factor in blood work, family history, imaging,
        | racial genealogy, general symptoms (like hoarseness, gait,
        | sweating etc), even texture & palpation of affected regions,
        | sometimes before narrowing down on a set of assessments &
        | making diagnostic decisions.
       | 
        | If we just add in more dimensions of data to the model, it
        | just makes the search space sparser, not easier. Throwing in
        | more data will likely just fit the more common patterns &
        | classes well, whereas a large number of symptoms may be
        | treated as outliers and mispredicted.
       | 
        | We humans are incredibly good at elimination of factors &
        | differential diagnosis. The findings don't surprise me. There
        | is much more work still to be done. For straightforward
        | conditions with limited, clear-cut symptoms the models are
        | showing promising advances, but they cannot be trusted across
        | wide arrays of diagnoses - especially when models don't know
        | what 'they do not know'.
        
         | dekhn wrote:
         | are you really sure the doctors are doing a better job when
         | they go through the motions of incorporating a wide range of
         | data? Or do we just convince ourselves they're better?
         | 
         | I suspect we massively underestimate the amount of misdiagnosis
         | due to incorrect analysis of data using fairly naive medical
         | mental models of disease.
        
           | majormajor wrote:
           | My view on this is framed a bit differently but probably a
           | similar ultimate perspective:
           | 
           | I think it's probably going to be a long time before models
           | only using quantifiable measurements can even meet the
           | performance of top doctors. I can't recommend enough that
           | someone experiencing issues doctor-shop if they haven't
           | gotten a well-explained diagnosis from their current doctor.
           | 
           | But I'm very curious how good one has to be in order to be
           | better than a below-average doctor, or a 50th-percentile
           | doctor, or a 75th...
           | 
           | But I also think there may be weird failure modes similar to
           | today's not-fully-self-driving cars along the lines of "if
           | even the 75th-percentile-doctor uses the tool and sees an
           | output that stops them from asking a question they otherwise
           | might have, can it hurt things too?"
        
             | srvmshr wrote:
             | > But I'm very curious how good one has to be in order to
             | be better than a below-average doctor, or a 50th-percentile
             | doctor, or a 75th.
             | 
              | In dermatology, which I was working on, models were
              | better (at detecting skin cancers) than 52% of the GPs,
              | going by just images. In a famous Nature paper by Esteva
              | et al., the TPR was 74% for detecting melanomas. There is
              | a catch which probably got underreported (the skin cancer
              | positivity rate was strongly correlated with clinical
              | markings in the photos; their models didn't do quite as
              | well when 'clean' holdout data were used).
              | 
              | But the nature of the information in all these models was
              | skin deep (pun intended). They were designed with a
              | calibrated objective in place, unlike how we approach
              | clinical diagnostics as open-ended problems for doctors.
        
           | srvmshr wrote:
           | > _Are you really sure the doctors are doing a better job
           | when they go through the motions of incorporating a wide
           | range of data? Or do we just convince ourselves they 're
           | better?_
           | 
            | Personal story: I was diagnosed with a rare genetic disease
            | in 2019. If I ran the symptoms through an ML gauntlet, I
            | would be sure they would cancel each other out or make
            | little sense. Chest CT (clean), fever (high), TB test
            | (negative), latent TB marker (positive), vision difficulty
            | (nothing unusual yet), edema in eye socket (yes), WBC count
            | (normal), tumors (none), hormones (normal) & retina images
            | (severely abnormal).
           | 
            | My condition was zeroed in on within 5 minutes of a visit
            | to a top retina specialist, after regular ophthalmologists
            | were torn between two conflicting conditions. This was
            | based on differential diagnosis, even though the genetic
            | assay hadn't returned yet (it later came back in
            | agreement). I cannot overemphasize how good the human brain
            | is at recalling information & connecting the sparse dots
            | into logical conclusions.
           | 
            | (I am one of the 0.003% unlucky ones among all
            | ophthalmological cases & the only active patient with that
            | affliction in one of the busiest hospitals in the country.
            | My data is part of a 36-person NIH study, & ophthalmology
            | residents are routinely called in to see me as a case study
            | when I go for quarterly follow-ups.)
        
             | pixl97 wrote:
             | How many specialists did you go to before it was
             | identified?
             | 
             | How many other people with the condition were
             | misidentified?
             | 
             | I only say this because of a family member with a rare
             | genetic condition. For years they were told it was
             | something else, or told 'it was in their head'. The family
              | member started a detailed journal of their medical
              | conditions and experiences, then brought it to their
              | PCP, who sent them to a specialist; this specialist
              | wasn't sure and sent them to another specialist with a
              | 3-month wait. After 5+ years of living with increasing
              | severity of the condition, it was identified.
             | 
              | So, just saying, it's just as likely that the condition
              | was identified because you kept a detailed list (on
              | paper or in your mind) of the ailments and presented
              | them in a manner that helped with the final diagnosis.
        
               | srvmshr wrote:
               | > How many specialists did you go to before it was
               | identified?
               | 
                | 2 ophthalmologists, 1 internal medicine, 1 retina
                | super-specialist & finally someone from USC Davey
               | 
               | > How many other people with the condition were
               | misidentified?
               | 
                | Historical data: I don't know. It is fairly divided
                | between two types, one being zoonotic & the other
                | linked to the IL2 gene. I am told this distinction of
                | pathways was identified in 2007.
               | 
                | > [..] you kept a detailed list (on paper or in your
                | mind) of the ailments and presented them in a manner
                | that helped with the final diagnosis.
               | 
               | I might have been a better informed patient but I went
               | with a complaint of pink eye, flu & mild light
               | sensitivity. Never imagined that visit would change my
               | life forever. Thank you though, for expressing your
               | concern & support
        
         | nostromo wrote:
          | I'm confused by your comment, because these are exactly the
          | types of problems that humans generally do a really poor job
          | of classifying.
        
           | jeffreyrogers wrote:
           | Most modern ML techniques do a poor job on these types of
           | problems too unless they have a lot of data (hence the
           | reference to sparsity) or assume structure that requires
           | domain specific modeling to capture.
        
         | fallingknife wrote:
         | The system itself should be built around these capabilities,
          | not the other way around. Instead of collecting data at
          | regular intervals, we wait until symptoms appear to go to
          | the doctor. This is why the dataset is so sparse.
        
           | IdealeZahlen wrote:
            | Exactly this. The features (or limitations) of medical
            | data are inherent in the process of clinical practice, but
            | this is often overlooked.
        
       | planetsprite wrote:
        | The solution to failures of AI in healthcare is transparency
        | of data. OpenAI's models work because they have virtually
        | unlimited data to train on. The scale of training data for
        | doctor bots is one millionth the size. Different countries,
        | organizations, and universities need to be as open as possible
        | in sharing and collaborating, realizing that improvements in
        | medicine benefit all of humanity with almost no downsides.
        
         | dirheist wrote:
         | There should be a standardization committee tasked with
         | standardizing the collection of anonymized, semi-synthetic
          | medical data from hospitals/hospital networks. It seems like
          | so much research is just locked up in the IMS systems the
          | hospitals use for their patients and never sees the light of
          | day.
        
           | dekhn wrote:
           | You cannot imagine just how deep the medical data rabbit hole
           | goes.
           | 
            | Plenty of institutions have already semi-standardized
            | their collection and do multi-hospital (typically research
            | hospitals) aggregation. Whether this data is any good as
            | training data for supervised or unsupervised algorithms is
            | really questionable.
        
       | tomrod wrote:
        | Technical (Honest) Solution: two holdouts
        | 
        | 1. One involved in the build process
        | 
        | 2. One never touched until the paper's metrics are being
        | written, and only run once
        | 
        | Realistically, however, this is unlikely to occur due to the
        | incentives causing publication bias.
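        | 
        | A minimal sketch of that split, assuming a generic scikit-
        | learn workflow on synthetic stand-in data (names and numbers
        | here are illustrative, not anyone's actual pipeline):
        | 
        |     from sklearn.datasets import make_classification
        |     from sklearn.linear_model import LogisticRegression
        |     from sklearn.model_selection import train_test_split
        | 
        |     X, y = make_classification(n_samples=2000, random_state=0)
        | 
        |     # Holdout 2 is carved out first and locked away.
        |     X_work, X_final, y_work, y_final = train_test_split(
        |         X, y, test_size=0.2, random_state=0)
        |     # Holdout 1 is consulted while building the model.
        |     X_train, X_dev, y_train, y_dev = train_test_split(
        |         X_work, y_work, test_size=0.25, random_state=0)
        | 
        |     model = LogisticRegression(max_iter=1000)
        |     model.fit(X_train, y_train)
        |     print("dev score:", model.score(X_dev, y_dev))
        |     # Run exactly once, after the model is frozen:
        |     print("final holdout score:", model.score(X_final, y_final))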
        
         | jerpint wrote:
          | Third (better) option: have a regulating body maintain a
          | separate, undisclosed test set. If you can't beat it, you
          | can't deploy your model. If you can beat it, you still need
          | to have your models peer reviewed and scrutinized.
        
           | tomrod wrote:
           | This sounds simple yet I expect data governance will be the
           | bottleneck.
        
         | cm2187 wrote:
          | So the models that fail this one test never get published
          | and the models that succeed do. All you have done is publish
          | a model that predicts that particular history - in other
          | words, data fitting.
        
       | chatterhead wrote:
       | 'Brunelleschi had just the solution. To get around the issue, the
       | contest contender proposed building two domes instead of one --
       | one nested inside the other. "The inner dome was built with four
       | horizontal stone and chain hoops which reinforced the octagonal
       | dome and resisted the outward spreading force that is common to
       | domes, eliminating the need for buttresses," Wildman says. "A
       | fifth chain made of wood was utilized as well. This technique had
       | never been utilized in dome construction before and to this day
       | is still regarded as a remarkable engineering achievement.'
       | 
        | Brunelleschi was not an engineer; he was a goldsmith. AI will
        | advance in the same way architecture did during the
        | Renaissance: by those with the winning ideas, not those with
        | the right credentials.
       | 
       | https://science.howstuffworks.com/engineering/architecture/b...
        
       | dm319 wrote:
       | As someone who works in healthcare, so much of what I read about
       | AI makes me think that the people who are enthusiastic about
       | healthcare AI don't have much experience doing it.
       | 
       | The scenarios rarely seem to fit with what I'm actually
       | practicing. Most of medicine is boring, it is largely routine,
       | and if we don't know what's going on, it's because we're not the
       | right person to be managing the patient. Most of my time is spent
       | talking to people - patients, colleagues, family. I explain the
       | diagnosis, I talk about the plan, I am getting ideas of what the
       | patient wants and values, and then actioning it. I spend very
       | little of my time like Dr House pondering what the next most
       | important test to perform is for a patient who is confounding us.
        
         | lostlogin wrote:
         | I work in radiology with MRI as a tech. We use AI slightly
         | differently to the examples here, but it's changing a lot of
         | what we do. It's more about enhancing images than directly
         | about diagnosing.
         | 
         | The image is denoised 'intelligently' in k-space and then the
         | resolution is doubled via another AI process in the image
         | domain (or maybe the resolution is quadrupled, as it depends on
         | how you measure it. Our pixel count doubles in x and y
         | dimensions).
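          | 
          | Very roughly, the two stages live in different domains; a
          | toy sketch with simple filters standing in for the learned
          | models (this only illustrates k-space vs. image domain,
          | nothing like the vendor's actual algorithms):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     img = rng.random((128, 128))      # stand-in noisy image
          | 
          |     # Stage 1: "denoise" in k-space (here: drop weak coefficients)
          |     k = np.fft.fft2(img)
          |     k[np.abs(k) < np.percentile(np.abs(k), 60)] = 0
          |     denoised = np.abs(np.fft.ifft2(k))
          | 
          |     # Stage 2: "super-resolve" in the image domain
          |     # (here: naive 2x upsampling, doubling pixels in x and y)
          |     upscaled = np.kron(denoised, np.ones((2, 2)))
          |     print(denoised.shape, "->", upscaled.shape)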
         | 
          | These are 2 distinct processes which we can turn on or off,
          | and there are some parameters with which we can alter the
          | process.
         | 
         | The result is amazing and image quality has gone up a lot.
         | 
          | We haven't got a full grasp of it yet and have a few
          | theories. The vendors are also still getting to grips with
          | it.
         | 
          | We think the training data set turns out to have some weird
          | influences on the required acquisition parameters. For
          | example, parallel imaging factor 4 works well, 3 and 2 less
          | so, which is not intuitive. More acceleration being better
          | for image quality is not how MRI used to work (except in a
          | few edge cases).
         | 
         | Bandwidth, averages, square pixel, turbo factor and appropriate
         | TE matter a bit more than they did pre-AI.
         | 
         | Images are now acquired faster, look better and sequence
         | selection can be better tailored to the patient as we have less
         | of a time pressure.
         | 
         | I'd put our images up against almost anything I've seen before
         | as examples of good work. We are seeing anatomy and pathology
         | that we didn't previously appreciate. Sceptics ask if the
         | things we see are really there, but after some time with the
         | images the concern goes away and the pre-AI images just look
         | broken.
         | 
          | In the link below, ignore Gain (it isn't that great); Boost
          | and Sharp are the vendor names for the good stuff. The
          | brochure undersells it.
         | 
         | https://www.siemens-healthineers.com/magnetic-resonance-imag...
        
           | srvmshr wrote:
            | I did my Masters in NMR. Can confirm a lot of ML-based
            | plug-and-play solutions are helping denoise k-space.
            | 
            | Trivia: I am also one of the pulse sequence developers
            | affiliated with the Siemens LiverLab package on the Syngo
            | platform :) [Specifically the multiecho Dixon fat-water
            | sequence]. SNR improvement was a big headache for rapid
            | Dixon echoes.
        
             | lostlogin wrote:
             | Ha, small world. Thanks for your work, I used to use this
             | daily until a year ago, now my usage is less frequent.
             | 
              | I guess Dixons are still a headache with their new
              | k-space stuff, as Boost (the denoising) isn't compatible
              | with it yet. Gain is, but looks distinctly lame when you
              | compare it to Boost.
             | 
             | We are yet to see the tech applied to breath hold sequences
             | (haste, vibe etc), Dixon, 3D, gradient sequences and
             | probably others.
             | 
             | I'm looking forward to seeing it on haste and 3D T2s
             | (space) in particular. MRI looks very different today
             | compared to how it looked just 6 months ago.
             | 
             | I'd compare it to the change we saw going from 1.5T to 3T,
             | just accelerated in how quickly progress is being made.
        
               | srvmshr wrote:
                | I have long since left the collaboration with the team
                | at Cary, NC. But all I can say is there was a great
                | deal of interest in 3D sequence improvement by
                | interpolation with known k-space patterns, like in the
                | GRASE or PROPELLER sequences, for example. They also
                | learned a good deal from working with NYU's fastMRI.
        
           | ggm wrote:
            | My partner had a clinician review her paperwork and say
            | "why are you here?", explaining that the enhanced imaging
            | was leading to tentative concerns being raised about
            | structural change so small it was below the threshold for
            | safe surgical treatment.
            | 
            | Moral of the story: the imaging has got so good that
            | diagnostics is now on the fringe of overdiagnosing, and
            | the stats need to catch up.
        
             | lostlogin wrote:
             | This has been a thing for a long time, with MRI in
             | particular.
             | 
             | It gets quite philosophical. To diagnose something you need
             | some pattern on the images. As resolution and tissue
             | contrast improves you see more things, and the radiologist
             | gets to decide if the appearance is something.
             | 
             | When a clinician says there is a problem in some area of
             | anatomy and there is something on the scan, the radiologist
             | has to make a call.
             | 
             | The great thing about being a tech is that making the call
             | isn't my job. I have noticed that keeping the field of view
             | small tends to make me more friends.
             | 
              | A half-imaged liver haemangioma, a thyroid nodule or a
              | white-matter brain lesion as an incidental finding is at
              | least a daily occurrence.
        
         | esel2k wrote:
          | So much this. I just interviewed about 10 doctors in the
          | space of neurology and radiology to start some new projects.
          | The truth is most of the headaches come from insurance
          | coverage checks or, for radiologists, from filling out
          | correct reports. The fancy AI stuff is, with maybe a few
          | exceptions thanks to the great advances in imaging, still
          | far away from validation - and I haven't even started on its
          | usage and go-to-market.
          | 
          | Most of the cases the doctors see are boring/regular cases -
          | and problems like access to medical history are more basic
          | but more prevalent.
        
         | [deleted]
        
         | ericmcer wrote:
         | That scenario sounds like it lends itself more to AI automation
         | than a Dr. House type one.
        
           | kbenson wrote:
            | I don't know; compassion and a nuanced understanding of
            | individual desires when talking to someone is not what I
            | associate with AI in my mind. But being able to assess
            | sociological and cultural taboos and tease out what a
            | patient actually wants, rather than what they might
            | initially express, seems like something a good doctor
            | would get to through explorative conversation.
        
             | junipertea wrote:
              | Maybe removing a human from the equation would lead to a
              | more honest outcome? E.g. people google all sorts of
              | issues more earnestly than they would describe them to
              | their doctors. The bottleneck would be properly
              | understanding what the user intends, which might be out
              | of reach.
        
               | ben_w wrote:
               | Indeed. Language has been historically difficult for AI,
               | but I think it's even tougher here -- language is less
               | and less reliable the further we get from a shared
               | experience, and this is a problem when describing our
               | experiences of our own bodies, and much worse when
               | describing our own minds.
               | 
               | For example, when I was coming off an SSRI, I was
               | forewarned that I might get a sensation of "electric
               | shocks"; the actual experience wasn't like that, though I
               | could tell why they chose to describe it like that.
               | 
               | How different is the tightness in the chest during a
               | heart attack from the tightness in the chest from
               | exercising chest muscles?
               | 
               | I have no idea how doctors, GPs, and nurses manage this,
               | though they seem to have relatively little trouble.
        
             | dm319 wrote:
             | My experience of chatting with an internet chat-bot when
             | trying to get some help with a product gives me little
             | confidence we are close here.
             | 
             | Edit: wording
        
         | [deleted]
        
       | rvz wrote:
        | Of course. No surprise there. Especially the ones made with
        | 'Deep Learning'.
        | 
        | At this point, each time AI and 'Deep Learning' are applied
        | and then scrutinised, it almost always turns out to be pure
        | hype generated by investors, with garbage, unexplainable
        | results output from broken models. The exact same goes for the
        | self-driving scam.
        | 
        | 'AI' is slowly starting to get outed as an exit scam.
        
       | johannes_ne wrote:
        | I recently published a paper where we explain how an FDA-
        | approved prediction model, built into a widely used cardiac
        | monitor, was developed with an incredibly biased method.
       | 
       | https://doi.org/10.1097/ALN.0000000000004320
       | 
       | Basically, the training and validation data was engineered so an
       | important range for one of the predictor variables was only
       | present in one of the outcomes, making perfect prediction
       | possible for these cases.
       | 
       | I summarize the paper in this Twitter thread:
       | https://twitter.com/JohsEnevoldsen/status/156164115389992960...
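        | 
        | A toy version of that failure mode (entirely synthetic data
        | and made-up ranges, not the monitor's actual model): if one
        | predictor's low range only ever appears with the positive
        | outcome, a model trained and validated on data built that way
        | looks nearly perfect.
        | 
        |     import numpy as np
        |     from sklearn.linear_model import LogisticRegression
        | 
        |     rng = np.random.default_rng(0)
        |     n = 2000
        |     y = rng.integers(0, 2, n)   # outcome: hypotension yes/no
        |     # the key predictor's ranges never overlap across outcomes
        |     x = np.where(y == 1, rng.uniform(40, 70, n),
        |                          rng.uniform(75, 110, n))
        |     X = np.column_stack([x, rng.normal(size=(n, 3))])
        | 
        |     model = LogisticRegression(max_iter=1000)
        |     model.fit(X[:1500], y[:1500])
        |     print(model.score(X[1500:], y[1500:]))   # ~1.0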
        
         | baxtr wrote:
         | Sorry for asking, but how is this relevant to the article?
        
           | NovemberWhiskey wrote:
           | Sorry for asking, but how is it _not_?
        
             | baxtr wrote:
             | Do you agree that it's ok to pose a question whenever you
             | don't understand?
        
               | csallen wrote:
               | Ironically, that's exactly what NovemberWhiskey is doing
               | here :)
        
           | johannes_ne wrote:
            | Fair question. The model we comment on suffers both from
            | the problem described in the article and from a more
            | severe one:
            | 
            | The developers sampled obvious cases of hypotension and
            | non-hypotension, and trained the model to distinguish
            | those. They also validated it on data that was similarly
            | dichotomous. In reality the outcome is often somewhere
            | between these two scenarios.
            | 
            | But worse, they also introduce a more severe problem,
            | where a range of an important predictor is only present in
            | the hypotension outcome.
        
             | baxtr wrote:
             | Thanks for explaining!
        
         | roflyear wrote:
          | This has to be intentional, no?
        
       | yrgulation wrote:
        | So many in AI are chasing software solutions when the problem
        | is hardware. Limited power means limited learning. Mix lab-
        | grown neurons with software and you have a winning
        | proposition.
        
       | bjt2n3904 wrote:
       | This is what freaks me out about AI.
       | 
       | People will use it for years in various fields, and one by one,
       | after a decade or so of use, they'll come to find it was complete
       | garbage information, and they were just putting their trust in a
       | magic 8 ball.
       | 
       | But the damage is already done.
        
       | hdhdhsjsbdh wrote:
        | While the notion of treating these systems as "sociotechnical"
       | rather than purely technical is probably a good move wrt actually
       | improving people's lives, I can say from my own experience in
       | academia that there are still way too many academics working in
       | this field who don't think it's their problem. I've personally
       | raised these types of issues before and been told "we're computer
       | scientists, not social scientists", as if "social scientist" is a
       | derogatory term. The biggest impediment here is, in my opinion,
       | overcoming the bloated egos of the people who think the social
       | impacts of their work are somehow out of scope. All is well as
       | long as you can continue to publish.
        
         | JHonaker wrote:
         | Preach.
         | 
          | There are way too many people who treat MSE or other
          | abstract technical measurements of model performance as if
          | they actually represented the impact a model has on a
          | problem. Even if we could somehow perfectly predict an
          | actual realization instead of a conditional expectation,
          | that still fails to ask why we predicted it. Are we
          | exploiting systemic biases, like historically racist
          | policies? Almost definitely (unless we've consciously tried
          | to adjust for them, and even then we've probably done that
          | incorrectly). I've become much less interested in models
          | that basically just interpolate (very well, I might add),
          | and more in frameworks that attempt to answer why we see
          | particular patterns.
        
       | drtgh wrote:
        | My humble opinion: AI is supposed to be the acronym for
        | artificial intelligence, but marketing has usurped it to refer
        | to machine learning, which is nothing more than a neo-language
        | for defining statistical equations in a semi-automated way. An
        | attempt to dispense with mathematicians to develop models.
        | 
        | What amount of energy is necessary for an event to be
        | reflected in a statistic? You have a 2x2 meter box with balls
        | of data, and a string with a diameter of 1 meter with which to
        | surround the highest concentration of balls possible, and
        | those that remain outside, there they stay. Statistics and
        | lack of precision are concepts that go hand in hand (some even
        | say it is not a science).
        
         | [deleted]
        
         | spywaregorilla wrote:
          | > My humble opinion: AI is supposed to be the acronym for
         | artificial intelligence, but marketing has usurped it to refer
         | to machine learning, which is nothing more than a neo-language
         | for defining statistical equations in a semi-automated way.
         | 
         | Sure. Hardly controversial.
         | 
         | > An attempt to dispense with mathematicians to develop models.
         | 
         | What...? No. Definitely not.
         | 
         | > What amount of energy is necessary for an event to be
          | reflected in a statistic? You have a 2x2 meter box with
          | balls of data, and a string with a diameter of 1 meter with
          | which to surround the highest concentration of balls possible,
          | and those that remain outside, there they stay. Statistics and
          | lack of precision are concepts that go hand in hand (some
          | even say it is not a science).
         | 
         | I have no idea what this is saying. It sounds like you're
         | shitting on statistics all of a sudden, which is weird, given
         | that you seemed to favor mathematicians in the first part.
        
           | drtgh wrote:
           | >I have no idea what this is saying. It sounds like you're
           | shitting on statistics all of a sudden, which is weird, given
           | that you seemed to favor mathematicians in the first part.
           | 
            | Mathematicians are specialized in problem solving, and as
            | humans, their ability to predict and analyze data makes
            | them more reliable at developing models than a statistical
            | equation. They have many more tools than just statistics.
            | 
            | In some way, it is as if using the acronym AI for
            | statistical algorithms leads to a false sense of greater
            | reliability than such human review provides, or even to
            | the sense that a deep human review is not needed. ML
            | statistics takes algorithms out of the oven long before
            | mathematicians would, at the expense of a big difference
            | in accuracy.
            | 
            | The problem, I think, is that people may make important
            | decisions based on the results of such statistical
            | algorithms without questioning them.
        
             | spywaregorilla wrote:
             | I don't think most mathematicians have spent a great deal
             | of time analyzing data tbh. Unless you mean statisticians.
        
         | naniwaduni wrote:
         | It's not that AI has been conflated with machine learning--
         | those are words that are _supposed_ to refer to the same thing.
         | The confusion is conflating either with slapdash applied
         | statistics.
        
         | dekhn wrote:
         | Statistics is not science- it's an application of probability
         | theory and some other forms of math to hypothesis selection
         | (among other things).
         | 
         | It's scientific. We only use stats because that's the best
         | method for dealing with imprecise and noisy data.
         | 
         | Statistical thermodynamics contains all the necessary tools you
         | need to answer your balls in a box question.
        
           | drtgh wrote:
           | >Statistical thermodynamics contains all the necessary tools
           | you need to answer your balls in a box question
           | 
            | The balls-in-a-box example shows how ML statistics works.
            | The string is adjustable; it can be adapted to different
            | contours, but you have to discard data.
            | 
            | How do you compensate for including some data in the model
            | without discarding other data? The string has a limit in
            | diameter by design, and you need to know the content of
            | most of the data to make good decisions.
        
         | charcircuit wrote:
         | >which is nothing more than a neo-language for defining
         | statistical equations in a semi-automated way.
         | 
         | That's why it's called artificial intelligence.
        
         | jfghi wrote:
         | Having built models, I'd claim that it's art based upon
         | science, perhaps not too different than engineering a building.
         | At every stage there are decisions to be made with tradeoffs.
         | Over time, the resulting model could be invalidated or perhaps
         | perform better. It's remarkably difficult to approach or even
         | define a "best" model.
         | 
          | What's most peculiar to me is that somehow AI is becoming
          | more distinct from math or stats, and that there's a notion
          | that by running pytorch one is able to play god and create
          | sentience.
        
         | ben_w wrote:
         | > Statistics and lack of precision are concepts that go hand in
         | hand (someones say even it is not an science).
         | 
         | Statistics is the mathematics of being precise about your level
         | of imprecision. It's fairly fundamental to all science, and has
         | been for a while now.
        
       | Grothendank wrote:
        | Color _me_, personally, surprised. Between publication bias
        | and the general public's ignorance of AI and its evolving
        | capabilities, and over a decade of results in AI health being
        | overblown before transformers, how could we have predicted
        | that post-transformer results in AI health would _continue_ to
        | be overblown?
        
       | fasthands9 wrote:
       | It is still unclear to me exactly what data they were looking
       | at/referring to in this article.
       | 
       | If you take into account bloodwork, family history, demographics,
       | etc. then it seems like you are still only getting a few dozen
       | data points. At this scale it seems like traditional statistics
       | or human checks for abnormalities are going to be about as good.
       | 
        | Although I personally know very little (apologies for
        | conjecturing), it does seem like there could be a lot of uses
        | for AI in specific diagnoses. For example, when they take your blood
       | pressure/heartbeat they only get data for one particular moment
       | where you are sitting in a controlled environment. I would think
       | if you had a year's worth of data (along with activity data from
       | an apple watch) you might be able to diagnose/predict things that
       | traditional doctors/human analysis could not.
       | 
        | I would also imagine anything that deals with image analysis
        | (like looking for tumors in scans) will be vastly better with
        | computer AI systems than with humans.
        
       | bmh100 wrote:
       | The issue with data leakage can be handled through k-fold cross-
       | validation, in which all of the data takes turns as either
       | training data or test data.
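        | 
        | One caveat: if the leakage comes from the same patient showing
        | up in both training and test data, the folds also have to be
        | grouped by patient. A minimal scikit-learn sketch with
        | synthetic data (illustrative only):
        | 
        |     import numpy as np
        |     from sklearn.datasets import make_classification
        |     from sklearn.linear_model import LogisticRegression
        |     from sklearn.model_selection import GroupKFold, cross_val_score
        | 
        |     X, y = make_classification(n_samples=600, random_state=0)
        |     patients = np.repeat(np.arange(200), 3)  # 3 records/patient
        | 
        |     # all of a patient's records land in the same fold
        |     scores = cross_val_score(LogisticRegression(max_iter=1000),
        |                              X, y, groups=patients,
        |                              cv=GroupKFold(n_splits=5))
        |     print(scores.mean())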
        
       | photochemsyn wrote:
       | Not that surprising. AI learning seems to do best with fairly
       | predictable systems, and when it comes to individual outcomes in
       | medicine, there's a lot of mystery involved. A group of people
       | with similar genetic makeup and exposure history to carcinogens
       | or pathogens won't all respond identically - some get persistent
       | cancers, some get nasty infections, and some don't.
       | 
       | For example, training an AI on historical tidal data would likely
       | lead to very good future tide timing and height predictions,
       | without any explicit mechanistic model needed. Tides have high
       | predictability and relatively low variability (things like
       | unusual wind patterns accounting for most of that).
       | 
       | In contrast, there are some current efforts to forecast
       | earthquakes by training an AI on historical seismograph data, but
       | whether or not these will be of much use is similarly
       | questionable.
       | 
       | https://sciencetrends.com/ai-algorithms-being-used-to-improv...
        
       | tensor wrote:
       | This is entirely unsurprising and has a very simple solution:
       | keep adding more data. Our measurements of the accuracy of AI
       | systems are only as good as the test data, and if the test data
       | is too small, then the reported accuracies won't reflect the true
       | accuracies of the model applied to wild data.
       | 
       | Basically, we need an accurate measure of whether the test data
       | set is statistically representative of wild data. In healthcare,
        | this means that the individuals that make up the test dataset
        | must be statistically representative of the actual population
        | (and that there are enough samples).
       | 
        | An easy solution here is that any research that doesn't pass a
        | population statistics test must be declared up front to be
        | "not representative of real world usage" or something.
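        | 
        | A crude version of such a check, comparing the test set's mix
        | on one attribute against known population proportions with a
        | chi-square test (all numbers made up):
        | 
        |     import numpy as np
        |     from scipy.stats import chisquare
        | 
        |     test_counts = np.array([120, 310, 70])   # e.g. age bands
        |     population_props = np.array([0.30, 0.45, 0.25])
        |     expected = population_props * test_counts.sum()
        | 
        |     stat, p = chisquare(test_counts, f_exp=expected)
        |     print(p)  # small p: test set likely not representative
        | 
        | In practice you'd want this across every attribute that
        | plausibly matters, plus a sample-size requirement.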
        
         | blackbear_ wrote:
         | From the article:
         | 
         | > Here's why: As researchers feed data into AI models, the
         | models are expected to become more accurate, or at least not
         | get worse. However, our work and the work of others has
         | identified the opposite, where the reported accuracy in
         | published models decreases with increasing data set size.
        
           | spywaregorilla wrote:
           | That's not a contradiction per se. It's easier to get
            | spuriously high test scores with smaller datasets. It does
           | not clearly demonstrate that the models are actually getting
           | worse.
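            | 
            | A quick simulation of that: assume a model whose true
            | accuracy is fixed at 75% and see what gets "reported" from
            | test sets of different sizes (purely synthetic):
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     for n_test in (25, 100, 1000):
            |         # 5000 simulated studies, each with its own test set
            |         acc = rng.binomial(n_test, 0.75, 5000) / n_test
            |         print(n_test, (acc > 0.85).mean())
            | 
            | With 25 test cases roughly 1 in 10 "studies" reports 85%+
            | accuracy purely by luck; with 1000 cases essentially none
            | do.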
        
             | dirheist wrote:
              | But if diagnoses are multimodal and rely upon large,
             | multidimensional analysis of symptoms/bloodwork/past
             | medical history, wouldn't adding more dimensions just
             | increase dimensional sparsity and decrease the useful
             | amount of conclusions you are able to draw from your
             | variables?
             | 
             | It's been a long time since I remember learning about the
             | curse of dimensionality but if you increase the amount of
             | datapoints you collect by half you would have to quadruple
             | the amount of samples you have to retrieve any meaningful
             | benefit, no?
        
               | tappio wrote:
                | You are right, but I feel you misunderstood OP.
                | 
                | I understood OP to mean increasing the number of
                | samples, not variables.
        
               | spywaregorilla wrote:
                | I did mean samples (n size), not the number of
                | features. But also, no, your point isn't right. If you
                | have a ton of variables, you'll be better able to
                | overfit your models to a training set (which is bad).
                | However, that's not to say that a fairly basic toolkit
                | can't help you avoid doing that even with a ton of
                | variables. What really matters is the effect size of
                | the variables you're adding. That is, whether or not
                | they can actually help you predict the answer,
                | distinctly from the other variables you have.
               | 
               | Stupid example: imagine trying to predict the answer of a
               | function that is just the sum of 1,000,000 random
               | variables. Obviously having all 1,000,000 variables will
               | be helpful here, and the model will learn to sum them up.
               | 
               | In the real world, a lot of your variables either don't
               | matter or are basically saying the same thing as some of
               | your other variables so you don't actually get a lot of
               | value from trying to expand your feature set mindlessly.
               | 
               | > if you increase the amount of datapoints you collect by
               | half you would have to quadruple the amount of samples
               | you have to retrieve any meaningful benefit, no?
               | 
                | I think you might be thinking of the standard error,
                | where you divide the standard deviation of your data
                | by the sqrt of the number of samples. So quadrupling
                | your sample size will cut the error in half?
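                | 
                | A quick check of that arithmetic (standard error of a
                | mean):
                | 
                |     import numpy as np
                | 
                |     sd = 10.0
                |     for n in (100, 400):
                |         print(n, sd / np.sqrt(n))  # 1.0, then 0.5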
        
         | rscho wrote:
         | > This is entirely unsurprising and has a very simple solution:
         | keep adding more data
         | 
          | Nope. Won't work. Biased data made bigger only results in
          | bias confirmation. Which is the real problem.
        
         | rjdagost wrote:
         | If there's one thing I learned with biomedical data modeling
         | and machine learning, it's that "it's complicated". For
         | biomedical scenarios, getting more data is often not simple at
         | all. This is especially the case for rare diseases. For areas
         | like drug discovery, getting a single new data point (for
         | example, the effect of a drug candidate in human clinical
         | settings) may require a huge expenditure of time and money.
         | Biomedical results are often plagued with confounding
         | variables, hidden and invisible, and simply adding in more data
         | without detection and consideration of these bias sources can
         | be disastrous. For example, measurements from lab #1 may show
         | persistent errors not present in lab #2, and simply adding in
         | more data blindly from lab #1 can make for worse models.
         | 
         | My conclusion is that you really need domain knowledge to know
         | if you're fooling yourself with your great-looking modeling
         | results. There's no simple statistical test to tell you if your
         | data is acceptable or not.
        
         | dm319 wrote:
         | I think this is a key point - the training set is very
         | important, because biases, over-curation, or wrong contexts
         | will mean the model may perform very poorly for particular
         | scenarios or demographics.
         | 
         | I can't find the reference now of a radiology AI system which
         | had a good diagnosis rate of finding a pneumothorax on a chest
         | x ray (air in the lining of the lung). This can be quite a
         | serious condition, but is easy to miss. Turns out that the
         | training set had a lot of 'treated' pneumothorax. The outcome
         | was correct - they did indeed have a pneumothorax, but they
         | also had a chest drain in, which was helping the prediction.
         | 
          | Similar to asking what the demographics of the training set
          | are is asking what the recorded outcome was, and how the
          | diagnosis was made.
         | There is often no 'gold standard' of diagnosis, and some are
         | made with varying degrees of confidence. Even a post-mortem
         | can't find everything...
        
       | cm2187 wrote:
       | So a model calibrated on a backtest says nothing about its
       | predictive capacity. Who would have thought? Well, at least I
       | think anyone who worked even a little bit in quantitative
       | finance. The only way to validate a model is to make predictions
       | and test if those predictions actually happen in a repeatable
        | way, which in certain circles is referred to as "experiment".
       | 
       | That's why I distrust any model built purely on backtested data
        | unless it can be shown to predict something other than
        | history. And AI is not the only area that blindly trusts
        | those kinds of models.
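        | 
        | A minimal sketch of the gap, using made-up time-ordered data:
        | fit on the past, score on the future, and compare with the
        | flattering in-sample "backtest" number (illustrative only):
        | 
        |     import numpy as np
        |     from sklearn.linear_model import LinearRegression
        | 
        |     rng = np.random.default_rng(0)
        |     t = np.arange(300)
        |     # series whose behaviour shifts partway through
        |     y = 0.02 * t + np.sin(t / 10) + (t > 200) * 1.5
        |     y = y + rng.normal(0, 0.3, 300)
        |     X = np.column_stack([t, np.sin(t / 10)])
        | 
        |     model = LinearRegression().fit(X[:200], y[:200])
        |     print("backtest R^2:", model.score(X[:200], y[:200]))
        |     print("out-of-time R^2:", model.score(X[200:], y[200:]))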
        
       | rscho wrote:
       | Surprise, surprise. People hugely overestimate the data retrieval
       | capabilities of healthcare systems. And if you really put
       | clinical 'AI' systems to the test in day-to-day settings (which
       | is in fact never done), results would be much, much worse.
       | 
       | Shit data in, shit prediction out.
        
       ___________________________________________________________________
       (page generated 2022-10-21 23:00 UTC)