[HN Gopher] AI models that predict disease are not as accurate a... ___________________________________________________________________ AI models that predict disease are not as accurate as reports might suggest Author : arkj Score : 118 points Date : 2022-10-21 18:07 UTC (4 hours ago) (HTM) web link (www.scientificamerican.com) (TXT) w3m dump (www.scientificamerican.com) | deltree7 wrote: | Yet. | | One thing media consistently gets wrong is the rate of innovation that is happening. Media also doesn't have access to state-of-the-art models, only to those from the trigger-happy startups too eager to release half-baked versions. | | It's akin to downloading image generation tools from the App Store and concluding that's the state of the art. | bpodgursky wrote: | It baffles me that people can watch the trendline of | | "Job X can be automated in 40 years" (5 years ago) | | "Job X can be automated in 10 years" (2 years ago) | | "Job X can be automated in 5 years" (1 week ago) | | and feel comfortable poking holes in the AI models, pointing out where they fail. Obviously? But nobody 3 years ago thought that graphic design or creative writing was on death row either. | | You have to spend a modicum of effort looking at how predictions have _evolved_ over the past couple of years, but once you do, it's very clear that mocking current AI systems makes you look like a clown. | ckemere wrote: | There's also the timeline that: | | "Radiology will be automated in 5 years" (10 years ago) "Radiology will be automated in 5 years" (5 years ago) "Radiology will be automated in 5 years" (last year) | | or | | "Full self-driving will arrive within 5 years" (5 years ago) "Full self-driving is still a ways off" (last year) | | Assuming you're referring to generative models, I don't think that anyone (knowledgeable) thinks that graphic design or creative writing is on death's door. They might change with new tools, but skilled practitioners are still required.
| That's basically the point of the article. | chatterhead wrote: | We are 18 years from the DARPA Grand Challenge, and none of the vehicles finished. | | Do you think a self-driving car can make it from LA to NYC by itself now? | | What do you think 2040 AI will look like? | lostlogin wrote: | Having seen some of the automation available in radiology, I'm a bit baffled as to why I still have a job as an MRI tech. | | 5 years ago I watched automated cardiac MRI, and it worked well. I was told about a site that was having good results with fetal cardiac MRI via a related bit of software. | | These scans are hard to do, and the machines did well. In some cases they got confused and did a good functional analysis, but of the stomach, not the heart. Oops, but easily fixed by almost anyone after a few minutes of explanation. | | Why are basic MSK scans still done by a tech with years of training? | | I don't know the answer to that, as it's basic stuff, and if I end my career without machines having taken over the basic stuff, I'll be a bit disappointed. | esel2k wrote: | But even if all these analyses were done automatically, I guess you won't be out of a job soon (good news I guess). It would just be different: I worked on the automation of the diagnostic lab, and what happened is that it went from a detective-style job to being more about running a factory: a business running 24h, turnaround times, and less and less qualified personnel to fill the machines... | dr_dshiv wrote: | I'm shocked. SHOCKED. | bitL wrote: | OK, AI is bad, but compare it to human doctors/radiologists, who are often worse. I still remember stats from some X-ray detection task where AI diagnosed with 40% accuracy and the best human doctors with 38% accuracy (and median human doctors with 32% accuracy). Now what are we supposed to do? | pcurve wrote: | Can you cite the source? Is it not possible to improve the 40% rate by AI?
Obviously someone eventually figured out the 100% correct diagnosis. | dmurray wrote: | They might have "figured it out" by cutting the patient open. | yrgulation wrote: | Oh god (science for some of us), it's the same kind of logic used for defending Tesla's FSD system. Both crappy and dangerous, but with a cult-like following. | srvmshr wrote: | I worked in healthcare ML solutions, as part of my PhD & also as a consultant to a telemedicine company. | | My experience in dealing with data (we had sufficient, and somewhat well labeled, data) & methods made me realize that a lot of the predictions human doctors make are multimodal - and that is something deep learning will struggle with for the time being. For example, say in detection of a disease _X_ , physicians factor in blood work, family history, imaging, racial genealogy, general symptoms (like hoarseness, gait, sweating etc.), even texture & palpation of affected regions, sometimes before narrowing down on a set of assessments & making diagnostic decisions. | | If we just add in more dimensions of data to the model, it just makes the search space sparser, not easier. Throwing in more data will likely just fit the more common patterns & classes well, whereas a large number of symptoms may be treated as outliers and mispredicted. | | We humans are incredibly good at elimination of factors & differential diagnosis. The findings don't surprise me. There is much more work that needs to be done. For straightforward conditions with limited, clear-cut symptoms the models are showing promising advancements, but they cannot be trusted with wide arrays of diagnoses - especially when models don't know what 'they do not know'. | dekhn wrote: | Are you really sure the doctors are doing a better job when they go through the motions of incorporating a wide range of data? Or do we just convince ourselves they're better?
| | I suspect we massively underestimate the amount of misdiagnosis due to incorrect analysis of data using fairly naive medical mental models of disease. | majormajor wrote: | My view on this is framed a bit differently but probably amounts to a similar ultimate perspective: | | I think it's probably going to be a long time before models only using quantifiable measurements can even meet the performance of top doctors. I can't recommend enough that someone experiencing issues doctor-shop if they haven't gotten a well-explained diagnosis from their current doctor. | | But I'm very curious how good one has to be in order to be better than a below-average doctor, or a 50th-percentile doctor, or a 75th... | | But I also think there may be weird failure modes similar to today's not-fully-self-driving cars, along the lines of "if even the 75th-percentile doctor uses the tool and sees an output that stops them from asking a question they otherwise might have, can it hurt things too?" | srvmshr wrote: | > But I'm very curious how good one has to be in order to be better than a below-average doctor, or a 50th-percentile doctor, or a 75th. | | In dermatology, on which I was working, models were better (at detecting skin cancers) than 52% of the GPs, going by just images. In a famous Nature paper by Esteva et al., the TPR was at 74% for detecting melanomas. There is a catch which probably got underreported (the skin cancer positivity rate was strongly correlated with clinical markings in the photos; their models didn't do quite as well when 'clean' holdout data were used). | | But the nature of the information in all these models was skin deep (pun intended). They were designed with a calibrated objective in place, unlike how we approach clinical diagnostics as open-ended problems for the doctors.
| srvmshr wrote: | > _Are you really sure the doctors are doing a better job when they go through the motions of incorporating a wide range of data? Or do we just convince ourselves they're better?_ | | Personal story: I was diagnosed with a rare genetic disease in 2019. If I ran the symptoms through an ML gauntlet, I would be sure they would cancel each other out or make little sense. Chest CT (clean), fever (high), TB test (negative), latent TB marker (positive), vision difficulty (nothing unusual yet), edema in eye socket (yes), WBC count (normal), tumors (none), hormones (normal) & retina images (severely abnormal) | | My condition was zeroed in on within 5 minutes of a visit to a top retina specialist, after regular ophthalmologists were in a fix about two conflicting conditions. This was based on differential diagnosis, even though the genetic assay hadn't returned yet; it later came back in favor of the diagnosis too. I cannot emphasize enough how good the human brain is at recalling information & connecting the sparse dots into logical conclusions | | (I am one of the 0.003% unlucky ones among all ophthalmological cases & the only active patient with that affliction in one of the busiest hospitals in the country. My data is part of the 36 people in an NIH study & ophthalmo residents are routinely called in to see me as a case study when I go for follow-up quarterly). | pixl97 wrote: | How many specialists did you go to before it was identified? | | How many other people with the condition were misidentified? | | I only say this because of a family member with a rare genetic condition. For years they were told it was something else, or told 'it was in their head'. The family member started a detailed journal of their medical conditions and experiences, then brought that to their PCP, who sent them to a specialist; this specialist wasn't sure and sent them to another specialist that had a 3-month wait.
After 5+ years of living with increasing severity of the condition, it was identified. | | So, just saying, it's just as likely that the condition was identified because you kept a detailed list (on paper or in your mind) of the ailments and presented them in a manner that helped with the final diagnosis. | srvmshr wrote: | > How many specialists did you go to before it was identified? | | 2 ophthalmo, 1 internal medicine, 1 retina super-specialist & finally someone from USC Davey | | > How many other people with the condition were misidentified? | | Historical data: I don't know. It is fairly divided between two types, one being zoonotic & the other tied to the IL2 gene. I am told this distinction of pathways was identified in 2007. | | > [..] you kept a detailed list (on paper or in your mind) of the ailments and presented them in a manner that helped with the final diagnosis. | | I might have been a better-informed patient, but I went in with a complaint of pink eye, flu & mild light sensitivity. Never imagined that visit would change my life forever. Thank you though, for expressing your concern & support | nostromo wrote: | I'm confused by your comment, because these are exactly the type of problems that humans generally do a really poor job of classifying. | jeffreyrogers wrote: | Most modern ML techniques do a poor job on these types of problems too, unless they have a lot of data (hence the reference to sparsity) or assume structure that requires domain-specific modeling to capture. | fallingknife wrote: | The system itself should be built around these capabilities, not the other way around. Instead of collecting data at regular intervals, we wait until symptoms appear to go to the doctor. This is why the dataset is so sparse. | IdealeZahlen wrote: | Exactly this. The features (or limitations) of medical data are inherent in the process of clinical practice, but this seems to be oftentimes overlooked.
| planetsprite wrote: | The solution to failures of AI in healthcare is transparency of data. OpenAI's models work because they have virtually unlimited data to train on. The scale of training data for doctor bots is one millionth the size. Different countries, organizations, and universities need to be as open as possible in sharing and collaborating, realizing that improvements in medicine benefit all of humanity with almost no downsides. | dirheist wrote: | There should be a standardization committee tasked with standardizing the collection of anonymized, semi-synthetic medical data from hospitals/hospital networks. It seems like so much research is just locked up in the IMS systems the hospitals use for their patients and never sees the light of day. | dekhn wrote: | You cannot imagine just how deep the medical data rabbit hole goes. | | Plenty of institutions have already semi-standardized their collection and do multi-hospital (typically research hospitals) aggregation. Whether this data is any good as training data for supervised or unsupervised algorithms is really questionable. | tomrod wrote: | Technical (honest) solution: two holdouts | | 1. One involved in the build process | | 2. One never touched until the paper metrics are being written, and only run once | | Realistically, this is unlikely to occur, however, due to the incentives causing publication bias. | jerpint wrote: | Third (better) option: have a regulating body keep a separate, undisclosed test set. If you can't beat it, you can't deploy your model. If you can beat it, you still need to have your models peer reviewed and scrutinized | tomrod wrote: | This sounds simple, yet I expect data governance will be the bottleneck. | cm2187 wrote: | So the models that fail this one test never get published, and the models that succeed get published. And all you have done is publish a model that predicts that particular history - in other words, data fitting.
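The two-holdout setup described above can be sketched in a few lines. This is a hypothetical Python illustration (the function name and split fractions are my own, not from the thread): one holdout is available during the build process, the other is scored exactly once when the reported metrics are written.

```python
import random

def split_two_holdouts(data, val_frac=0.15, test_frac=0.15, seed=0):
    """Split data into train / validation (used during the build) /
    final test (touched only once, when the paper metrics are written)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]               # holdout 2: evaluated exactly once
    val = shuffled[n_test:n_test + n_val]  # holdout 1: used in the build loop
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_two_holdouts(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

The discipline, not the split itself, is the hard part: as noted above, nothing technical stops a team from re-running the final holdout until the numbers look good.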
| chatterhead wrote: | 'Brunelleschi had just the solution. To get around the issue, the contest contender proposed building two domes instead of one -- one nested inside the other. "The inner dome was built with four horizontal stone and chain hoops which reinforced the octagonal dome and resisted the outward spreading force that is common to domes, eliminating the need for buttresses," Wildman says. "A fifth chain made of wood was utilized as well. This technique had never been utilized in dome construction before and to this day is still regarded as a remarkable engineering achievement.' | | Brunelleschi was not an engineer; he was a goldsmith. AI will advance in the same way architecture did during the Renaissance: by those with the winning ideas, not those with the right credentials. | | https://science.howstuffworks.com/engineering/architecture/b... | dm319 wrote: | As someone who works in healthcare, so much of what I read about AI makes me think that the people who are enthusiastic about healthcare AI don't have much experience doing it. | | The scenarios rarely seem to fit with what I'm actually practicing. Most of medicine is boring, it is largely routine, and if we don't know what's going on, it's because we're not the right person to be managing the patient. Most of my time is spent talking to people - patients, colleagues, family. I explain the diagnosis, I talk about the plan, I am getting ideas of what the patient wants and values, and then actioning it. I spend very little of my time like Dr House pondering what the next most important test to perform is for a patient who is confounding us. | lostlogin wrote: | I work in radiology with MRI as a tech. We use AI slightly differently to the examples here, but it's changing a lot of what we do. It's more about enhancing images than directly about diagnosing.
| | The image is denoised 'intelligently' in k-space and then the resolution is doubled via another AI process in the image domain (or maybe the resolution is quadrupled, as it depends on how you measure it. Our pixel count doubles in x and y dimensions). | | These are 2 distinct processes which we can turn on or off, with some parameters with which we can alter the process. | | The result is amazing and image quality has gone up a lot. | | We haven't got a full grasp yet and have a few theories. The vendors are also still getting to grips with it. | | We think the training data set turns out to have some weird influences on required acquisition parameters. For example, parallel imaging factor 4 works well, 3 and 2 less so, which is not intuitive. More acceleration being better for image quality is not how MRI used to work (except in a few edge cases). | | Bandwidth, averages, square pixel, turbo factor and appropriate TE matter a bit more than they did pre-AI. | | Images are now acquired faster, look better and sequence selection can be better tailored to the patient as we have less time pressure. | | I'd put our images up against almost anything I've seen before as examples of good work. We are seeing anatomy and pathology that we didn't previously appreciate. Sceptics ask if the things we see are really there, but after some time with the images the concern goes away and the pre-AI images just look broken. | | In the link below, ignore Gain (it isn't that great); Boost and Sharp are the vendor names for the good stuff. The brochure undersells it. | | https://www.siemens-healthineers.com/magnetic-resonance-imag... | srvmshr wrote: | I did my Masters in NMR. Can confirm a lot of ML-based plug-and-play solutions are helping denoise k-space. | | Trivia: I am also one of the pulse sequence developers affiliated with the Siemens LiverLab package on the Syngo platform :) [Specifically the multiecho Dixon fat-water sequence].
SNR improvement was a big headache for rapid Dixon echoes. | lostlogin wrote: | Ha, small world. Thanks for your work, I used to use this daily until a year ago; now my usage is less frequent. | | I guess Dixons are still a headache with their new k-space stuff, as Boost (the denoising) isn't compatible with it yet. Gain is, but looks distinctly lame when you compare it to Boost. | | We are yet to see the tech applied to breath-hold sequences (HASTE, VIBE etc.), Dixon, 3D, gradient sequences and probably others. | | I'm looking forward to seeing it on HASTE and 3D T2s (SPACE) in particular. MRI looks very different today compared to how it looked just 6 months ago. | | I'd compare it to the change we saw going from 1.5T to 3T, just accelerated in how quickly progress is being made. | srvmshr wrote: | I have long since left collaboration with the team at Cary, NC. But all I can say is there was a great deal of interest in 3D sequence improvement by interpolation with known k-space patterns, like in the GRASE or PROPELLER sequences for example. They also learned a good deal from working with NYU's fastMRI | ggm wrote: | My partner had a clinician review her paperwork and say "why are you here?", explaining that the enhanced imaging was leading to tentative concerns being raised about structural change so small it was below the threshold for safe surgical treatment. | | Moral of the story: the imaging has got so good that diagnostics is now on the fringe of overdiagnosing, and the stats need to catch up | lostlogin wrote: | This has been a thing for a long time, with MRI in particular. | | It gets quite philosophical. To diagnose something you need some pattern on the images. As resolution and tissue contrast improve, you see more things, and the radiologist gets to decide if the appearance is something. | | When a clinician says there is a problem in some area of anatomy and there is something on the scan, the radiologist has to make a call.
| | The great thing about being a tech is that making the call isn't my job. I have noticed that keeping the field of view small tends to make me more friends. | | A half-imaged liver haemangioma, a thyroid nodule or a white matter brain lesion as an incidental finding is a daily occurrence, at least. | esel2k wrote: | So much this. I just interviewed about 10 doctors in the space of neurology and radiology to start some new projects. The truth is most of the headaches are from insurance coverage checks or, for radiologists, filling out correct reports. The fancy AI stuff is, with maybe a few exceptions thanks to the great advancements in imaging, still far away from validation - and I haven't even started on its usage and go-to-market. | | Most of the cases the doctors see are boring / regular cases - and problems like access to medical history are way more basic but more prevalent. | [deleted] | ericmcer wrote: | That scenario sounds like it lends itself more to AI automation than a Dr. House type one. | kbenson wrote: | I don't know, compassion and a nuanced understanding of individual desires when talking to someone is not what I associate AI with in my mind, but being able to assess sociological and cultural taboos and try to discern what a patient actually wants, rather than what they might initially express, seems like something a good doctor would get to through explorative conversation. | junipertea wrote: | Maybe removing a human from the equation would lead to more honest outcomes? E.g. people google all sorts of issues more earnestly than they would describe them to doctors. The bottleneck would be properly understanding what the user intends, which might be out of reach. | ben_w wrote: | Indeed.
Language has been historically difficult for AI, but I think it's even tougher here -- language is less and less reliable the further we get from a shared experience, and this is a problem when describing our experiences of our own bodies, and much worse when describing our own minds. | | For example, when I was coming off an SSRI, I was forewarned that I might get a sensation of "electric shocks"; the actual experience wasn't like that, though I could tell why they chose to describe it like that. | | How different is the tightness in the chest during a heart attack from the tightness in the chest from exercising chest muscles? | | I have no idea how doctors, GPs, and nurses manage this, though they seem to have relatively little trouble. | dm319 wrote: | My experience of chatting with an internet chat-bot when trying to get some help with a product gives me little confidence we are close here. | | Edit: wording | [deleted] | rvz wrote: | Of course. No surprise there. Especially the ones made with 'Deep Learning'. | | At this point, each time AI and 'Deep Learning' are applied and then scrutinised, they almost always turn out to be pure hype generated by investors, outputting garbage, unexplainable results from broken models. The exact same goes for the self-driving scam. | | 'AI' is slowly starting to get outed as an exit scam. | johannes_ne wrote: | I recently published a paper where we explain how an FDA-approved prediction model, built into a widely used cardiac monitor, was developed with an incredibly biased method. | | https://doi.org/10.1097/ALN.0000000000004320 | | Basically, the training and validation data were engineered so that an important range of one of the predictor variables was only present in one of the outcomes, making perfect prediction possible for these cases. | | I summarize the paper in this Twitter thread: https://twitter.com/JohsEnevoldsen/status/156164115389992960...
| baxtr wrote: | Sorry for asking, but how is this relevant to the article? | NovemberWhiskey wrote: | Sorry for asking, but how is it _not_? | baxtr wrote: | Do you agree that it's ok to pose a question whenever you don't understand? | csallen wrote: | Ironically, that's exactly what NovemberWhiskey is doing here :) | johannes_ne wrote: | Fair question. The model we comment on suffers both from the problem described in the article and from a more severe one: | | The developers sampled obvious cases of hypotension and non-hypotension, and trained the model to distinguish those. They also validated it on data that was similarly dichotomous. In reality the outcome is often between these two scenarios. | | But worse, they also introduced a more severe problem, where a range of an important predictor is only present in the hypotension outcome. | baxtr wrote: | Thanks for explaining! | roflyear wrote: | This has to be intentional, no? | yrgulation wrote: | So many in AI are chasing software solutions when the problem is hardware. Limited power means limited learning. Mix lab-grown neurons with software and you have a winning proposition. | bjt2n3904 wrote: | This is what freaks me out about AI. | | People will use it for years in various fields, and one by one, after a decade or so of use, they'll come to find it was complete garbage information, and they were just putting their trust in a magic 8-ball. | | But the damage is already done. | hdhdhsjsbdh wrote: | While the notion of treating these systems as "sociotechnical" rather than purely technical is probably a good move wrt actually improving people's lives, I can say from my own experience in academia that there are still way too many academics working in this field who don't think it's their problem. I've personally raised these types of issues before and been told "we're computer scientists, not social scientists", as if "social scientist" were a derogatory term.
The biggest impediment here is, in my opinion, overcoming the bloated egos of the people who think the social impacts of their work are somehow out of scope. All is well as long as you can continue to publish. | JHonaker wrote: | Preach. | | There are way too many people who treat MSE or other abstract technical measurements of model performance as if they actually represented the impact a model has on a problem. Even if we could somehow perfectly predict an actual realization instead of a conditional expectation, that still forgets to ask the question of why we predicted that. Are we exploiting systemic biases, like historically racist policies? Almost definitely (unless we've consciously tried to adjust for them, and still we've probably done that incorrectly). I've become much less interested in models that basically just interpolate (very well, I might add), and more in frameworks that attempt to answer why we see particular patterns. | drtgh wrote: | My humble opinion; AI is supposed to be the acronym for artificial intelligence, but marketing has usurped it to refer to machine learning, which is nothing more than a neo-language for defining statistical equations in a semi-automated way. An attempt to dispense with mathematicians to develop models. | | What amount of energy is necessary for an event to be reflected in a statistic? You have a box of 2x2 meters with balls of data, and a string with a diameter of 1 meter with which to surround the highest concentration of balls possible, and those that remain outside, there they stay. Statistics and lack of precision are concepts that go hand in hand (someones say even it is not an science).
| [deleted] | spywaregorilla wrote: | > My humble opinion; AI is supposed to be the acronym for artificial intelligence, but marketing has usurped it to refer to machine learning, which is nothing more than a neo-language for defining statistical equations in a semi-automated way. | | Sure. Hardly controversial. | | > An attempt to dispense with mathematicians to develop models. | | What...? No. Definitely not. | | > What amount of energy is necessary for an event to be reflected in a statistic? You have a box of 2x2 meters with balls of data, and a string with a diameter of 1 meter with which to surround the highest concentration of balls possible, and those that remain outside, there they stay. Statistics and lack of precision are concepts that go hand in hand (someones say even it is not an science). | | I have no idea what this is saying. It sounds like you're shitting on statistics all of a sudden, which is weird, given that you seemed to favor mathematicians in the first part. | drtgh wrote: | >I have no idea what this is saying. It sounds like you're shitting on statistics all of a sudden, which is weird, given that you seemed to favor mathematicians in the first part. | | Mathematicians are specialized in problem solving, and as humans, their ability to predict and analyze data makes them more reliable at developing models than a statistical equation. They have many more tools than statistics alone. | | In a way, it is as if using the acronym AI to define statistical algorithms leads to a false sense of greater reliability than such human review, or even a sense that a deep human review is not needed. ML statistics takes algorithms out of the oven long before mathematicians do, at the expense of a big difference in accuracy.
| | The problem, I think, is that people may make important decisions based on the results of such statistical algorithms without questioning them | spywaregorilla wrote: | I don't think most mathematicians have spent a great deal of time analyzing data, tbh. Unless you mean statisticians. | naniwaduni wrote: | It's not that AI has been conflated with machine learning-- those are words that are _supposed_ to refer to the same thing. The confusion is conflating either with slapdash applied statistics. | dekhn wrote: | Statistics is not science - it's an application of probability theory and some other forms of math to hypothesis selection (among other things). | | It's scientific. We only use stats because that's the best method for dealing with imprecise and noisy data. | | Statistical thermodynamics contains all the necessary tools you need to answer your balls in a box question. | drtgh wrote: | >Statistical thermodynamics contains all the necessary tools you need to answer your balls in a box question | | The balls in a box example shows how ML statistics works. The string is adjustable, it can be adapted to different contours, but you have to discard data. | | How do you compensate for the inclusion of some data in the model without discarding other data? The string has a limited diameter by design, and you need to know the content of most of the data to make good decisions. | charcircuit wrote: | >which is nothing more than a neo-language for defining statistical equations in a semi-automated way. | | That's why it's called artificial intelligence. | jfghi wrote: | Having built models, I'd claim that it's art based upon science, perhaps not too different from engineering a building. At every stage there are decisions to be made with tradeoffs. Over time, the resulting model could be invalidated or perhaps perform better. It's remarkably difficult to approach or even define a "best" model.
| | What's most peculiar to me is that somehow AI is becoming more distinct from math or stats, and that there's a notion that by running pytorch one is able to play god and create sentience. | ben_w wrote: | > Statistics and lack of precision are concepts that go hand in hand (someones say even it is not an science). | | Statistics is the mathematics of being precise about your level of imprecision. It's fairly fundamental to all science, and has been for a while now. | Grothendank wrote: | Color _me_ , personally, surprised. Between publication bias, the general public's ignorance of AI and its evolving capabilities, and over a decade of results in AI health being overblown before transformers, how could we have predicted that post-transformer results in AI health would _continue_ to be overblown? | fasthands9 wrote: | It is still unclear to me exactly what data they were looking at/referring to in this article. | | If you take into account bloodwork, family history, demographics, etc., then it seems like you are still only getting a few dozen data points. At this scale it seems like traditional statistics or human checks for abnormalities are going to be about as good. | | Although I personally know very little (apologies for conjecturing), it does seem like there could be a lot of uses for AI in specific diagnoses. For example, when they take your blood pressure/heartbeat they only get data for one particular moment where you are sitting in a controlled environment. I would think if you had a year's worth of data (along with activity data from an Apple Watch) you might be able to diagnose/predict things that traditional doctors/human analysis could not. | | I would also imagine anything that deals with image analysis (like looking for tumors in scans) will be vastly better with computer AI systems than humans.
| bmh100 wrote: | The issue with data leakage can be handled through k-fold | cross-validation, in which all of the data takes turns as either | training data or test data. | photochemsyn wrote: | Not that surprising. AI learning seems to do best with fairly | predictable systems, and when it comes to individual outcomes in | medicine, there's a lot of mystery involved. A group of people | with similar genetic makeup and exposure history to carcinogens | or pathogens won't all respond identically - some get persistent | cancers, some get nasty infections, and some don't. | | For example, training an AI on historical tidal data would likely | lead to very good future tide timing and height predictions, | without any explicit mechanistic model needed. Tides have high | predictability and relatively low variability (things like | unusual wind patterns accounting for most of that). | | In contrast, there are some current efforts to forecast | earthquakes by training an AI on historical seismograph data, but | whether or not these will be of much use is similarly | questionable. | | https://sciencetrends.com/ai-algorithms-being-used-to-improv... | tensor wrote: | This is entirely unsurprising and has a very simple solution: | keep adding more data. Our measurements of the accuracy of AI | systems are only as good as the test data, and if the test data | is too small, then the reported accuracies won't reflect the true | accuracies of the model applied to wild data. | | Basically, we need an accurate measure of whether the test data | set is statistically representative of wild data. In healthcare, | this means that the individuals that make up the test dataset | must be statistically representative of the actual population | (and also have enough samples). | | An easy solution here is that any research that doesn't pass a | population statistics test must be up-front declared to be "not | representative of real-world usage" or something.
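The k-fold cross-validation mentioned up-thread can be sketched in a few lines (a hypothetical pure-Python helper; in practice a library routine such as scikit-learn's KFold does the same job):

```python
def kfold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs: each sample lands in
    exactly one test fold and in the training set of the other k-1 folds."""
    indices = list(range(n_samples))
    # Spread any remainder across the first n_samples % k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

if __name__ == "__main__":
    # Every index appears exactly once across the test folds.
    seen = []
    for train, test in kfold_splits(10, 3):
        seen.extend(test)
    print(sorted(seen))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

One caveat worth noting: k-fold only gives honest estimates if any preprocessing fitted to the data (scalers, feature selectors, etc.) happens inside each fold; fitting on the full dataset still leaks test information into training.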
| blackbear_ wrote: | From the article: | | > Here's why: As researchers feed data into AI models, the | models are expected to become more accurate, or at least not | get worse. However, our work and the work of others has | identified the opposite, where the reported accuracy in | published models decreases with increasing data set size. | spywaregorilla wrote: | That's not a contradiction per se. It's easier to get | spuriously high test scores with smaller datasets. It does | not clearly demonstrate that the models are actually getting | worse. | dirheist wrote: | But if diagnoses are multimodal and rely upon large, | multidimensional analysis of symptoms/bloodwork/past | medical history, wouldn't adding more dimensions just | increase dimensional sparsity and decrease the useful | amount of conclusions you are able to draw from your | variables? | | It's been a long time since I remember learning about the | curse of dimensionality, but if you increase the number of | datapoints you collect by half you would have to quadruple | the number of samples you have to retrieve any meaningful | benefit, no? | tappio wrote: | You are right, but I feel you misunderstood op. | | I understood that op meant increasing the number of samples, | not variables. | spywaregorilla wrote: | I did mean samples (n size), not the number of features. | But also, no, your point isn't right. If you have a ton of | variables, you'll be better able to overfit your models | to a training set (which is bad). However, that's not to | say that a fairly basic toolkit can't help you avoid | doing that even with a ton of variables. What really | matters is the effect size of the variables you're | adding. That is, whether or not they can actually help | you predict the answer, distinctly from the other | variables you have. | | Stupid example: imagine trying to predict the answer of a | function that is just the sum of 1,000,000 random | variables.
Obviously having all 1,000,000 variables will | be helpful here, and the model will learn to sum them up. | | In the real world, a lot of your variables either don't | matter or are basically saying the same thing as some of | your other variables, so you don't actually get a lot of | value from trying to expand your feature set mindlessly. | | > if you increase the number of datapoints you collect by | half you would have to quadruple the number of samples | you have to retrieve any meaningful benefit, no? | | I think you might be thinking of standard error, where | you divide the standard deviation of your data by the square | root of the number of samples. So quadrupling your sample size | will cut the error in half. | rscho wrote: | > This is entirely unsurprising and has a very simple solution: | keep adding more data | | Nope. Won't work. Making biased data bigger only results in bias | confirmation. Which is the real problem. | rjdagost wrote: | If there's one thing I learned from biomedical data modeling | and machine learning, it's that "it's complicated". For | biomedical scenarios, getting more data is often not simple at | all. This is especially the case for rare diseases. For areas | like drug discovery, getting a single new data point (for | example, the effect of a drug candidate in human clinical | settings) may require a huge expenditure of time and money. | Biomedical results are often plagued with confounding | variables, hidden and invisible, and simply adding in more data | without detection and consideration of these bias sources can | be disastrous. For example, measurements from lab #1 may show | persistent errors not present in lab #2, and simply adding in | more data blindly from lab #1 can make for worse models. | | My conclusion is that you really need domain knowledge to know | if you're fooling yourself with your great-looking modeling | results. There's no simple statistical test to tell you if your | data is acceptable or not.
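The sqrt-of-n scaling described above is easy to check numerically (a sketch on synthetic Gaussian data; the sample sizes and trial count are arbitrary choices):

```python
import random
import statistics

# Empirically check that the standard error of the mean shrinks like
# 1/sqrt(n): quadrupling the sample size should roughly halve it.
random.seed(0)

def std_error_of_mean(n, trials=2000):
    """Std. dev. of the sample mean over many repeated draws of size n."""
    means = [statistics.fmean(random.gauss(0, 1) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

se_small = std_error_of_mean(100)   # theory: 1/sqrt(100) = 0.10
se_large = std_error_of_mean(400)   # theory: 1/sqrt(400) = 0.05
print(round(se_small / se_large, 1))  # ratio should come out close to 2
```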
| dm319 wrote: | I think this is a key point - the training set is very | important, because biases, over-curation, or wrong contexts | will mean the model may perform very poorly for particular | scenarios or demographics. | | I can't find the reference now for a radiology AI system which | had a good diagnosis rate for finding a pneumothorax on a chest | X-ray (air in the lining of the lung). This can be quite a | serious condition, but is easy to miss. Turns out that the | training set had a lot of 'treated' pneumothoraces. The outcome | was correct - they did indeed have a pneumothorax, but they | also had a chest drain in, which was helping the prediction. | | Just as important as asking what the demographics of the training | set are is asking what the recorded outcome was, and how the | diagnosis was made. There is often no 'gold standard' of | diagnosis, and some are made with varying degrees of confidence. | Even a post-mortem can't find everything... | cm2187 wrote: | So a model calibrated on a backtest says nothing about its | predictive capacity. Who would have thought? Well, at least I | think anyone who has worked even a little bit in quantitative | finance would have. The only way to validate a model is to make | predictions and test whether those predictions actually happen in | a repeatable way, which in certain circles is referred to as an | "experiment". | | That's why I distrust any model built purely on backtested data | unless it can be shown to predict something other than history. | And AI is not the only area that blindly trusts those kinds of | models. | rscho wrote: | Surprise, surprise. People hugely overestimate the data retrieval | capabilities of healthcare systems. And if you really put | clinical 'AI' systems to the test in day-to-day settings (which | is in fact never done), results would be much, much worse. | | Shit data in, shit prediction out. ___________________________________________________________________ (page generated 2022-10-21 23:00 UTC)