Credibility assessment of patient-specific computational modeling using patient-specific cardiac modeling as an exemplar

Suran Galappaththige, Richard A. Gray, Caroline Mendonca Costa, S. Niederer, P. Pathmanathan (affiliations include the Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland, United States of America)

Date: 2022-11

Abstract

Reliable and robust simulation of individual patients using patient-specific models (PSMs) is one of the next frontiers for modeling and simulation (M&S) in healthcare. PSMs, which form the basis of digital twins, can be employed as clinical tools to, for example, assess disease state, predict response to therapy, or optimize therapy. They may also be used to construct virtual cohorts of patients for in silico evaluation of medical product safety and/or performance. Methods and frameworks have recently been proposed for evaluating the credibility of M&S in healthcare applications. However, such efforts have generally been motivated by models of medical devices or generic patient models; how best to evaluate the credibility of PSMs has largely been unexplored. The aim of this paper is to understand and demonstrate the credibility assessment process for PSMs using patient-specific cardiac electrophysiological (EP) modeling as an exemplar. We first review approaches used to generate cardiac PSMs and consider how verification, validation, and uncertainty quantification (VVUQ) apply to cardiac PSMs. Next, we execute two simulation studies using a publicly available virtual cohort of 24 patient-specific ventricular models: the first is a multi-patient verification study; the second investigates the impact of uncertainty in personalized and non-personalized inputs in a virtual cohort. We then use the findings from our analyses to identify how important characteristics of PSMs can be considered when assessing credibility with the approach of the ASME V&V40 Standard, accounting for PSM concepts such as inter- and intra-user variability, multi-patient and "every-patient" error estimation, uncertainty quantification in personalized vs non-personalized inputs, clinical validation, and others. The results of this paper will be useful to developers of cardiac and other medical image based PSMs when assessing PSM credibility.

Author summary

Patient-specific models are computational models that have been personalized using data from a patient. After decades of research, recent computational, data science and healthcare advances have opened the door to the fulfillment of the enormous potential of such models, from truly personalized medicine to efficient and cost-effective testing of new medical products. However, reliability (credibility) of patient-specific models is key to their success, and there are currently no general guidelines for evaluating the credibility of patient-specific models. Here, we consider how frameworks and model evaluation activities that have been developed for generic (not patient-specific) computational models can be extended to patient-specific models. We achieve this through a detailed analysis of the activities required to evaluate cardiac electrophysiological models, chosen as an exemplar field due to its maturity and the complexity of such models. This is the first paper on the topic of reliability of patient-specific models and will help pave the way to reliable and trusted patient-specific modeling across healthcare applications.
Citation: Galappaththige S, Gray RA, Costa CM, Niederer S, Pathmanathan P (2022) Credibility assessment of patient-specific computational modeling using patient-specific cardiac modeling as an exemplar. PLoS Comput Biol 18(10): e1010541. https://doi.org/10.1371/journal.pcbi.1010541

Editor: Daniel A. Beard, University of Michigan, United States

Received: April 21, 2022; Accepted: September 2, 2022; Published: October 10, 2022

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: There are no primary data in the paper; code used in the simulation studies is available at https://doi.org/10.5281/zenodo.6476245.

Funding: PP received funding for this study from FDA's Critical Path Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Patient-specific computational models (PSMs) are computational human models that have been personalized to specific patients, rather than representing a generic, synthetic, or average patient. For example, over the course of many decades the field of computational heart modeling has transitioned from modeling electrical activity in single cells only, to modeling propagation of electrical waves in slabs of tissue, and then to modeling the whole organ. Previously, generating a new model of a unique individual was a very time-consuming process. Now, it is possible to generate a personalized model from clinical imaging data rapidly and automatically, and then simulate the patient's heart activity, all over the course of a few hours. Patient-specific modeling has also reached maturity in the field of fractional flow reserve estimation; for example, software devices that generate personalized CT image-based models of coronary flow for functional evaluation of coronary arterial disease have been cleared in the US [1]. Other applications of PSMs with devices on the market include preoperative planning and sizing for neurovascular surgery [2], and non-invasive mapping of heart surface electrical activity [3]; for more information see [4]. The technological advances that underlie these devices and others have opened the door for patient-specific models to be used as standalone medical device software, within medical devices, or as tools for evaluating medical products.

Patient-specific modeling also forms the basis of digital twins. In healthcare applications, the term 'digital twin' is sometimes used synonymously with patient-specific model. Alternatively, it has been defined more precisely as a "comprehensive, virtual tool that integrates coherently and dynamically the clinical data acquired over time for an individual using mechanistic and statistical models" [5]. The potential healthcare benefits of PSMs across all of these applications are enormous. However, using PSMs in safety-critical applications requires careful evaluation of model credibility, defined as the trust, based on available evidence, in the predictive capability of a computational model [6]. Credibility of computational models for medical products has been the subject of considerable recent interest for the medical device industry.
Credibility assessment involves several activities, including verification (the process of determining whether a computational model is an accurate implementation of the underlying mathematical model), validation (the process of determining the extent to which a computational model is an accurate representation of the real-world system being modeled) and uncertainty quantification (UQ; the process of characterizing uncertainties in the model, e.g., in model parameter values due to measurement error or population variability, and then computing the resultant uncertainty in model outputs). Verification can be broken down into code verification, which tests for potential software bugs, and calculation verification, which estimates numerical errors due to spatial or temporal discretization.

A milestone event was the publication of the ASME V&V40 Standard in 2018 [6], which was the culmination of a multi-year collaboration involving modeling experts across the medical device industry and the Center for Devices and Radiological Health (CDRH) at the US Food and Drug Administration (FDA). This Standard provides a risk-based framework for evaluating the credibility of a computational model across device applications and was the first (and remains the only) such Standard in the medical product space. Briefly, the workflow in V&V40 is as follows. First, there are three preliminary steps: (i) defining the 'question of interest', that is, the specific question about the real world (e.g., regarding the medical device or a patient) that a model will be used to address; (ii) defining the 'context of use' (COU), that is, how exactly the model will be used to address the question of interest; and (iii) performing a risk assessment to characterize the risk to patients in using the model to address the question of interest. V&V40 then defines a number of 'credibility factors', which are factors to be considered when planning verification, validation and uncertainty quantification (VVUQ) activities. The categories of credibility factors are:

- Code verification credibility factors: software quality assurance and numerical code verification.
- Calculation verification credibility factors: discretization error, numerical solver error and use error.
- Validation credibility factors regarding the model: model form and model inputs (both broken down into sub-factors).
- Validation credibility factors regarding the comparator (i.e., the real-world data the model is compared to): test samples and test conditions (both broken down into sub-factors).
- Validation credibility factors regarding the comparison process: equivalency of inputs and output comparison (the latter broken down into sub-factors).
- Applicability credibility factors, which assess the relevance of the validation results to the COU: relevance of the quantities of interest and relevance of the validation activities to the COU.

For each credibility factor, V&V40 describes how users should define a 'gradation' of activities of increasing level of investigation. For example, for the 'software quality assurance' (SQA) credibility factor, a simple gradation is: (a) no SQA performed; (b) unit testing performed; (c) full SQA adhering to SQA Standards. V&V40 provides example gradations for each credibility factor. After defining a gradation, V&V40 describes how users should select a target level from the gradation based on the risk assessment. See [6] for full details.
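To make the gradation concept concrete, the following is a hypothetical sketch of how a credibility factor, its gradation, and a risk-informed target level might be recorded in a credibility assessment plan. The factor name and gradation wording follow the example above; the data structure and selection logic are illustrative assumptions only and are not part of the Standard.

```python
# Hypothetical sketch (not part of ASME V&V40) of recording a credibility
# factor gradation and a risk-informed target level in an assessment plan.
from dataclasses import dataclass

@dataclass
class CredibilityFactor:
    name: str
    gradation: list[str]      # activities ordered by increasing rigor
    target_level: int         # index into gradation, chosen from the risk assessment
    achieved_level: int = 0   # updated as VVUQ evidence is collected

    def gap(self) -> int:
        """Remaining rigor gap between planned and demonstrated activities."""
        return max(0, self.target_level - self.achieved_level)

sqa = CredibilityFactor(
    name="Software quality assurance",
    gradation=[
        "No SQA performed",
        "Unit testing performed",
        "Full SQA adhering to SQA standards",
    ],
    # For a higher-risk context of use, a higher level would typically be targeted
    target_level=2,
)
print(sqa.name, "->", sqa.gradation[sqa.target_level], "| gap:", sqa.gap())
```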
A wide range of medical device models were considered during the development of V&V40; however, while it is clear how to apply the preliminary steps of credibility assessment approaches such as V&V40, regarding the question of interest, risk and COU, to PSMs, it may not be clear what the unique characteristics of PSMs are when performing credibility assessments and how they can be considered in the subsequent stages of V&V40. In fact, how best to evaluate the credibility of PSMs has largely been unexplored; we are not aware of any article in the literature that identifies and discusses the unique considerations that arise when evaluating PSMs.

The aim of this paper is to understand the unique considerations of credibility assessment of medical image based PSMs. By medical image based PSMs, we refer to a class of commonly developed PSMs that use medical imaging data (and potentially other patient data) as input, generate a patient-specific geometry, and simulate a physical or physiological process on that geometry, typically using partial differential equations. For the remainder of this document, 'PSMs' refers to medical image based PSMs. The following applications are in scope: PSMs developed as software tools that can be applied to any new patient, and PSMs developed to create a 'virtual cohort' of patients for in silico medical device testing. However, virtual cohorts that have been extended by generating new 'synthetic patients' (e.g., by varying parameter values within ranges observed in the real patient population) are not within the scope of this paper, since in this case the synthetic patients do not correspond to any real patient (i.e., are not technically PSMs).

We use cardiac electrophysiological (EP) modeling as an exemplar. Cardiac EP modeling is a mature field with applications in clinical tools, medical device evaluation and drug safety evaluation, and it requires processing of disparate sources of data and solving complex multiscale models. We believe that many of the challenges and nuances in assessing other medical image based PSMs will also arise for cardiac PSMs, justifying the use of this field as an exemplar.

Fig 1 illustrates the components of a model of cardiac EP. A fundamental component is the cell model, which is generally a system of ordinary differential equations that predicts the time course of the transmembrane voltage (the action potential) and other cellular quantities, sometimes in response to external electrical stimuli (i.e., pacing or defibrillation). Many cell models have been developed, differing in which sub-cellular processes are included [7]. Cell models may have dozens of state variables and hundreds of parameters. To simulate electrical activity in tissue, the cell model is coupled to partial differential equations which govern electrical propagation in excitable tissue, typically the monodomain or bidomain equations [8]. To simulate electrical activity in the organ (atria, ventricles, or entire heart), the following are usually specified: (i) a computational mesh of the anatomy, typically generated from imaging data; (ii) regions of non-excitable infarct scar, also determined from imaging; (iii) regions of border zone (BZ) tissue, i.e., transitional regions between scar and healthy tissue that are excitable but have different properties to healthy tissue; (iv) fiber and sheet directions, i.e., orthogonal vector fields on the geometry indicating principal directions of conductivity; (v) tissue conductivities in the fiber, sheet and normal-to-sheet directions; and (vi) a stimulus protocol, such as apical pacing, pacing to replicate normal sinus rhythm, or pacing at cardiac resynchronization therapy lead locations. Other factors such as regional heterogeneities may also be included in the model.
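As a concrete illustration of the cell-plus-tissue coupling just described, the following minimal sketch (which is not one of the models, meshes, or solvers used in this study, and whose parameter values are assumptions chosen only so that the example runs) couples a simple FitzHugh-Nagumo-style two-variable cell model to one-dimensional diffusion, a toy analogue of the monodomain equation, and records local activation times, a typical output of organ-level simulations.

```python
# Minimal, illustrative 1D "monodomain-like" sketch: a two-variable
# FitzHugh-Nagumo-style cell model coupled to 1D diffusion, solved with
# explicit finite differences. All values are illustrative assumptions.
import numpy as np

L, dx = 20.0, 0.1            # cable length and node spacing (mm)
T, dt = 400.0, 0.02          # total time and time step (ms); dt < dx^2/(2D)
D = 0.1                      # effective diffusion coefficient (mm^2/ms)
nx, nt = int(L / dx) + 1, int(T / dt)
a, eps, gamma = 0.1, 0.01, 0.5   # nondimensional kinetics parameters (assumed)

v = np.zeros(nx)             # normalized transmembrane potential
w = np.zeros(nx)             # recovery variable
activation = np.full(nx, np.nan)   # first time v crosses 0.5 at each node

for step in range(nt):
    t = step * dt
    # Central-difference Laplacian with no-flux (Neumann) boundaries
    lap = np.zeros(nx)
    lap[1:-1] = (v[2:] - 2.0 * v[1:-1] + v[:-2]) / dx**2
    lap[0] = 2.0 * (v[1] - v[0]) / dx**2
    lap[-1] = 2.0 * (v[-2] - v[-1]) / dx**2

    # Brief stimulus at the left end of the cable (analogous to apical pacing)
    i_stim = np.zeros(nx)
    if t < 2.0:
        i_stim[:10] = 1.0

    # Explicit Euler update of the reaction-diffusion system
    dv = D * lap + v * (v - a) * (1.0 - v) - w + i_stim
    dw = eps * (v - gamma * w)
    v += dt * dv
    w += dt * dw

    # Record local activation times (a typical simulation output)
    newly_active = np.isnan(activation) & (v > 0.5)
    activation[newly_active] = t

# Rough conduction velocity estimate from activation times along the cable
x = np.arange(nx) * dx
mask = ~np.isnan(activation) & (x > 5.0) & (x < 15.0)
cv = np.polyfit(activation[mask], x[mask], 1)[0]
print(f"Estimated conduction velocity: {cv:.3f} mm/ms")
```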
Previously, we considered VVUQ of generic cardiac models [9–12]; here we extend these works to cardiac PSMs.

Fig 1. Overview of cardiac electrophysiological models. https://doi.org/10.1371/journal.pcbi.1010541.g001

We begin in Section 2 by reviewing cardiac EP PSMs, identifying modeling options that are unique to PSMs and not applicable to generic models. We then consider how VVUQ applies to PSMs. In Section 3 we perform two simulation studies, motivated by the results of Section 2, using a virtual cohort of 24 models incorporating patient-specific left ventricular geometry. The first is related to verification and assesses whether discretization error varies across patients; the second investigates how uncertainty in personalized and non-personalized inputs impacts the conclusions of a virtual cohort simulation study. The survey and study results are then used in Section 4 to identify how important characteristics of PSMs can be considered when assessing credibility with the approach of the ASME V&V40 Standard.

4. Evaluating patient-specific models using ASME V&V40

In this section we use the results of Sections 2 and 3 to identify how important characteristics of PSMs can be considered when assessing credibility with the approach of ASME V&V40. We consider how PSM characteristics apply to ASME V&V40 credibility factors and provide example gradations for PSMs (recall that ASME V&V40 requires the user to define gradations; all gradations we provide are examples only). Given the differences between PSM-CTs and PSM-VCs discussed in Section 2, we consider the two cases separately. The discussion below is not specific to cardiac EP, and we expect it to be applicable to PSMs in many other disciplines, especially other medical image based PSMs. However, since we have only considered cardiac PSMs, there may be missing considerations for other disciplines. Table 5 summarizes our observations, which are discussed below.

Table 5. Summary of observations on considerations for PSM credibility assessment in relation to V&V40 credibility factors and gradations. PSMs as clinical tools (PSM-CT) or PSM virtual cohort studies (PSM-VC) are considered separately. https://doi.org/10.1371/journal.pcbi.1010541.t005

For the code verification credibility factors, software quality assurance and numerical code verification, our review in Section 2.2.1 does not raise any unique PSM considerations. However, for calculation verification, we discussed in Section 2.2.1 and demonstrated in Section 3.1 how calculation verification results can vary across patients, which means it may be important to consider the number and range of patients when assessing discretization error and numerical solver error. Example gradations that account for this are provided for the discretization error credibility factor in Table 6 and for the numerical solver error credibility factor in S1 Text, Section S2.
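To make the multi-patient discretization-error assessment concrete, the following minimal sketch estimates a per-patient discretization error using Richardson extrapolation and the grid convergence index (GCI), a standard approach that is not necessarily the estimator used in the studies reported above. The output values are invented placeholders standing in for a per-patient quantity of interest (e.g., total activation time) computed on coarse, medium, and fine meshes with a constant refinement ratio.

```python
# Hypothetical sketch of a multi-patient discretization-error check using
# Richardson extrapolation and the grid convergence index (GCI).
import math

r = 2.0        # assumed mesh refinement ratio (edge length halved each time)
Fs = 1.25      # safety factor commonly used with the GCI

# patient id -> (coarse, medium, fine) values of the quantity of interest
# (all numbers below are invented placeholders)
cohort_outputs = {
    "patient_01": (128.4, 121.9, 119.8),
    "patient_02": (142.0, 137.5, 136.1),
    "patient_03": (115.7, 112.2, 111.0),
}

for patient, (q_coarse, q_medium, q_fine) in cohort_outputs.items():
    # Observed order of convergence from the three solutions
    p = math.log((q_coarse - q_medium) / (q_medium - q_fine)) / math.log(r)
    # Richardson-extrapolated estimate of the mesh-independent value
    q_extrap = q_fine + (q_fine - q_medium) / (r**p - 1.0)
    # GCI on the fine mesh: an estimated relative discretization-error band
    gci_fine = Fs * abs((q_fine - q_medium) / q_fine) / (r**p - 1.0)
    print(f"{patient}: p = {p:.2f}, extrapolated = {q_extrap:.1f}, "
          f"GCI(fine) = {100 * gci_fine:.2f}%")
```

Comparing the per-patient error estimates across the cohort indicates whether a single-patient mesh resolution study would have been representative.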
Table 6. Original V&V40 gradation for the 'Discretization Error' credibility factor, and example possible gradations for PSM-CT and PSM-VC; changes are highlighted in bold. [Original V&V40 example gradations reprinted from ASME V&V 40-2018, by permission of The American Society of Mechanical Engineers. All rights reserved.] https://doi.org/10.1371/journal.pcbi.1010541.t006

Regarding the use error credibility factor, we discussed in Section 2.2.1 how, for PSM-CTs with manual stages, the user (either a clinician or a remote operator) may need to make subjective decisions for some inputs (e.g., in the image segmentation stage), in which case there is potential for both intra- and inter-user variability. Therefore, for PSM-CTs with manual stages, this is a potential source of unreliability. To ensure that this is accounted for, a use error gradation could include assessment of both intra- and inter-user variability at the higher levels of rigor. Alternatively, the factor could be broken down into two sub-factors, for example 'use error: objective inputs' and 'use error: subjectively chosen inputs'. An example gradation for the latter is provided in S1 Text, Section S2 (a minimal illustrative sketch of summarizing such variability is given below, after the comparator discussion).

Next, we consider the 'validation: model' credibility factors. For the model form credibility factor, our review in Section 2.2.3 does not raise any unique PSM considerations. However, we observed in our review in Section 2.2.3 the myriad possibilities for performing sensitivity analysis (SA) and uncertainty quantification (UQ) for PSMs, and discussed the importance of distinguishing between SA/UQ for personalized and non-personalized inputs. We performed a virtual cohort UQ study in Section 3.2, which illustrates how the level of rigor in SA/UQ depends on the number of inputs analyzed, the rigor with which the input uncertainty is quantified, the number of patients considered, and the outputs considered. One option is to define a single gradation that covers all of these; an alternative is to define a single SA/UQ credibility factor with multiple sub-factors. S1 Text, Section S2 provides example gradations for the latter option.

The next credibility factors are those related to the comparator. ASME V&V40 defines comparator credibility factors related to 'test samples' and 'test conditions', each broken into four sub-factors, listed in Table 5. In Section 2.2 we discussed how there are a variety of approaches that can be taken to validate a PSM-CT or a PSM-VC, with three examples provided for each. In situations such as cases 1, 2, 4 and 5 in Section 2.2, the test samples sub-factors could be interpreted as listed in Table 5. For example, the 'quantity of test samples' sub-factor corresponds to the number of validation subjects used. Example gradations for these sub-factors are provided in S1 Text, Section S2. However, in general, how to interpret the comparator sub-factors, and what gradations are appropriate, will depend on the specific validation activities chosen. The same applies to the test conditions sub-factors. For medical device models, greater credibility is possible when the validation experiments subject the device under test to a wide range of external conditions (e.g., external loading or heating). In clinical studies, the feasibility of controlling test conditions will vary significantly between studies; for many studies it may not be possible or ethical to vary the imposed conditions. Consequently, we have not attempted to provide example gradations for the test conditions sub-factors, because we expect appropriate gradations to be heavily dependent on the specific validation activities performed.
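Returning to the use error credibility factor discussed above, the following minimal sketch shows one simple way that intra- and inter-user variability in a subjectively chosen input might be summarized; the input (a segmented scar volume), the users, and all values are hypothetical and are used only to illustrate the kind of analysis a use error gradation could call for.

```python
# Hypothetical sketch of summarizing intra- and inter-user variability for a
# subjectively chosen input (here, a scar volume in mL segmented by several
# users, each repeating the segmentation on the same images).
import numpy as np

# Rows: users; columns: repeated segmentations by the same user (invented data)
scar_volume = np.array([
    [12.1, 12.4, 11.9],   # user 1
    [13.0, 12.8, 13.3],   # user 2
    [11.5, 11.8, 11.6],   # user 3
])

user_means = scar_volume.mean(axis=1)
grand_mean = scar_volume.mean()

# Intra-user (repeatability) variability: spread of repeats around each user's mean
intra_sd = np.sqrt(((scar_volume - user_means[:, None]) ** 2).mean())
# Inter-user (reproducibility) variability: spread of user means around the grand mean
inter_sd = user_means.std(ddof=1)

print(f"Grand mean scar volume: {grand_mean:.2f} mL")
print(f"Intra-user SD: {intra_sd:.2f} mL, inter-user SD: {inter_sd:.2f} mL")
```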
For the remaining validation credibility factors ('validation: comparison'), no unique PSM considerations were identified from Sections 2 and 3. Finally, considering the applicability credibility factors, for the factor 'relevance of validation activities to the COU' there may be unique PSM-specific considerations, though this depends on the specific validation approach chosen. When validating against clinical data, an applicability question is 'how representative are the validation subjects of the full patient population?' (PSM-CT; cases 1–3 in Section 2.2) or 'how representative are the validation subjects of the full virtual cohort?' (PSM-VC; cases 4–5 in Section 2.2). A gradation for the relevance of validation activities to the COU factor could be defined to account for this. Alternatively, a new applicability sub-factor, 'representativeness of validation subjects', could be defined.

5. Discussion

Patient-specific modeling is a new frontier in computational modeling for which the topic of credibility assessment has largely been unexplored. While there are many publications evaluating the predictive capability of a particular patient-specific model [56–59], we are unaware of any previous work on the general topic of PSM credibility. Here we address this gap by providing the first, to our knowledge, general treatment of credibility assessment of PSMs, within our scope of medical image based PSMs. Capitalizing on the maturity of cardiac modeling, we used this field as an exemplar to understand the nuances and complexities of evaluating PSMs.

First, we reviewed methods utilized in the development of cardiac PSMs, and applications of these models. We determined that the differences between PSM workflows developed as clinical tools (PSM-CTs) and sets of pre-computed PSMs forming a virtual cohort (PSM-VCs) are so fundamental that all the subsequent analysis and discussion should distinguish between these two cases. The two cases are not comprehensive; other applications of PSMs that do not fit neatly into either category are possible. However, all publications reviewed fell into one of these categories, and important PSMs from other fields do so as well; for example, Heartflow [1] is a PSM-CT and the Virtual Family [60] is a PSM-VC. Our review illuminated the range of approaches possible for generating PSMs.

We then considered what each of verification, validation and uncertainty quantification means for PSMs. Verification was relatively straightforward, although we identified the importance of assessing error arising from inter- and intra-user variability for manual stages of PSM workflows. It is more complicated to characterize the validation and UQ process for PSMs. There are many potential approaches that could be taken to validate a PSM (some examples are provided in Section 2), and even categorizing these approaches is challenging, let alone defining general rules of good practice. Similarly, there is a range of options for performing UQ for a patient-specific model, and numerous choices need to be made, such as which inputs to explore (anatomical inputs, material parameters, functional parameters; personalized or non-personalized); how to estimate the input uncertainty; how many patients to consider; and which outputs to analyze. We emphasized the importance of ensuring that any PSM-CT-derived clinical recommendation (e.g., implant an ICD vs do not implant an ICD) is insensitive to uncertainty in personalized inputs, and that any PSM-VC-derived conclusion is insensitive to input uncertainties.
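As a minimal sketch of the kind of robustness check implied for PSM-CTs, the following example samples a personalized input from an assumed measurement-uncertainty distribution and reports how often the binary clinical recommendation changes. The function predict_risk_score, the decision threshold, and all numerical values are hypothetical placeholders standing in for a full patient-specific simulation and its personalized input (e.g., BZ extent).

```python
# Illustrative check that a binary, PSM-CT-derived recommendation is
# insensitive to uncertainty in a personalized input (all values hypothetical).
import numpy as np

rng = np.random.default_rng(1)
THRESHOLD = 0.5          # assumed decision threshold (e.g., recommend ICD if above)

def predict_risk_score(bz_extent):
    """Hypothetical stand-in for a patient-specific simulation mapping a
    personalized input to a scalar risk score."""
    return 0.35 + 0.4 * bz_extent

bz_nominal = 0.45        # personalized estimate derived from the patient's images
bz_samples = rng.normal(loc=bz_nominal, scale=0.08, size=500)  # measurement uncertainty

nominal_decision = predict_risk_score(bz_nominal) > THRESHOLD
sample_decisions = predict_risk_score(bz_samples) > THRESHOLD
agreement = np.mean(sample_decisions == nominal_decision)

print(f"Nominal recommendation: {'implant' if nominal_decision else 'no implant'}")
print(f"Recommendation unchanged in {100 * agreement:.1f}% of uncertainty samples")
```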
In Section 3.1 we used a set of 24 personalized ventricular models to investigate how discretization errors vary across patients. Large differences in errors between patients would have indicated that single-patient mesh resolution studies are not sufficient to ensure credibility of PSMs (especially for high-risk applications). While we did not observe large differences in errors between patients on the lower resolution meshes, there was more variability for the higher resolution meshes. Of course, conclusions may be different for different models or different outputs; similar analyses for other cardiac EP outputs of interest, and for other PSMs, need to be performed before general conclusions can be drawn about the importance of multi-patient verification with PSMs. In general, our results support the need for multi-patient verification for higher-risk applications of PSMs, due to the variability observed using the higher-resolution meshes. Still, conducting mesh resolution studies with multiple patients is computationally expensive, and for lower-risk applications there is a trade-off between the insight gained and the computational expense.

In Section 3.2 we explored the process of demonstrating that scientific results obtained using a PSM-VC are not impacted by input uncertainty. We considered two inputs, one personalized (border zone region) and one non-personalized (conductivity). We first reproduced the finding of [22] that pacing in the vicinity of scar increases repolarization dispersion relative to distant pacing. (Note that we used the same geometries and pacing sites as [22], but different models, meshes and numerical solvers (see S1 Text, Section S1); being able to reproduce the findings of [22] therefore demonstrates that, with public data sets, it is possible to reproduce simulation study results across platforms.) In our statistical analysis of the results, we emphasized that performing a statistical test with only a perturbed set of results (e.g., only with the expanded BZ results) implicitly assumes that the BZ measurement error is biased in the same direction for all patients. A perhaps more likely case is unbiased errors (e.g., BZ too large for some patients, too small for others), which requires a statistical test that accounts for uncertainty in the measurements. We used a sampling approach to achieve this (a minimal illustrative sketch is given below), but other methods are possible; see [61] and references therein for a discussion.

It is interesting to observe that the uncertainty in the non-personalized input (conductivity) led to similar or greater output uncertainty compared to the personalized input (BZ extent). For PSMs, the ideal scenario is to choose which parameters to personalize based on sensitivity analysis during early-stage model development: parameters that do not impact the output, when varied across their population range, do not need to be personalized; other parameters should be personalized. However, this is typically not feasible; instead, the choice of which parameters to personalize is at least partially driven by data-collection constraints. Therefore, post-development UQ, as performed here, may reveal whether uncertainty in the fixed (non-personalized) inputs impacts predictions. That was indeed the case in this example. The gradations developed in Section 4 account for this observation.
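The following minimal sketch illustrates the sampling-based idea referred to above: unbiased, per-patient BZ errors are sampled, the cohort-level comparison is repeated for each sample, and the fraction of samples in which the conclusion holds is reported. The simulate_dispersion function and the uncertainty model are hypothetical placeholders; in practice each sample would correspond to pre-computed simulation outputs obtained with a perturbed BZ segmentation.

```python
# Illustrative sampling-based statistical test that propagates unbiased,
# per-patient input uncertainty into the cohort-level hypothesis test,
# rather than testing only a single perturbed scenario.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_patients, n_samples = 24, 200

def simulate_dispersion(patient, bz_scale, pacing_site):
    """Hypothetical stand-in for the repolarization-dispersion output of a
    patient-specific simulation with the BZ extent scaled by bz_scale."""
    base = 40.0 + 2.0 * patient
    effect = 8.0 if pacing_site == "near_scar" else 0.0
    return base + effect + 5.0 * (bz_scale - 1.0) + rng.normal(0.0, 1.0)

significant = 0
for _ in range(n_samples):
    # Unbiased BZ error: independently too large or too small for each patient
    bz_scales = rng.normal(loc=1.0, scale=0.15, size=n_patients)
    near = np.array([simulate_dispersion(p, s, "near_scar")
                     for p, s in enumerate(bz_scales)])
    far = np.array([simulate_dispersion(p, s, "distant")
                    for p, s in enumerate(bz_scales)])
    # Paired test of near-scar vs distant pacing across the cohort
    _, p_value = wilcoxon(near, far)
    significant += p_value < 0.05

print(f"Conclusion held in {significant}/{n_samples} uncertainty samples")
```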
In Section 4 we identified how important characteristics of PSMs can be considered when assessing credibility with the approach of ASME V&V40. This section was based on our findings for cardiac models in Sections 2 and 3, but we expect it to be relevant to other medical image based PSMs. We considered which ASME credibility factors have unique PSM-related considerations, and provided example gradations for some factors that cover a range of activities that could be performed when evaluating a PSM. General observations were mostly constrained to the verification and UQ related credibility factors; PSM considerations for validation will be heavily dependent on the specific validation approach taken. Since Section 4 was based on our review of cardiac PSMs in Section 2 and the two case studies in Section 3, there is potential for improving or refining the recommendations of Section 4 based on reviews of other modeling fields and results from further case studies, cardiac and otherwise. We hope such efforts are pursued in the future so that the relative importance of different credibility activities for PSMs continues to be uncovered.

Overall, we believe the results of this paper will be useful to developers of cardiac and other medical image based PSMs when assessing PSM credibility, and will thereby contribute to increased reliability and confidence in PSMs across medical specialties and for a wide range of PSM applications, from evaluating medical products to serving as clinical decision-making tools.

Supporting information

S1 Text. Supplementary material. The supplementary material document contains further details on the simulation studies and potential gradations for applying ASME V&V40 with patient-specific models. https://doi.org/10.1371/journal.pcbi.1010541.s001 (PDF)

Acknowledgments

The authors would like to thank Martin Bishop (King's College London), Brent Craven (FDA) and Kenneth Aycock (FDA) for the information and feedback provided.

Disclaimer

The mention of commercial products, their sources, or their use in connection with material reported herein is not to be construed as either an actual or implied endorsement of such products by the Department of Health and Human Services.