Automated reporting of cervical biopsies using artificial intelligence

5.1 Patch level results discussion

Comparing the patch level confusion matrices for the balanced and imbalanced datasets with patch size 256 × 256, in Figs 7 and 8, the model trained on the larger (imbalanced) dataset has a higher overall accuracy on both the training and validation sets. However, looking at the percentage of correctly classified patches in each category, the model trained on the balanced dataset performed better at classifying the low grade and high grade categories. This is expected, as the number of patches in each category is similar in the balanced case and the model is therefore not biased towards a specific category. However, because we want to use all of the malignant patches in the training procedure, and there are not enough patches in the low grade and high grade categories compared with the malignant and normal categories, we make use of focal loss to handle the imbalanced class training by down-weighting easily classified examples so that hard, misclassified examples carry more weight (a minimal sketch of such a loss is given below).

Although cell characteristics can be extracted from individual patches, higher level structural information, such as the shape or extent of a tumour, can only be captured when analysing larger regions. Because of the limits on the image size that can be fed to a CNN, optimization challenges and GPU memory constraints, the whole WSI cannot be passed to the network. To make use of higher level structural information in training, we decided to train the model on the largest patch size that could be extracted from the WSIs. Hence, patches of 1024 × 1024 pixels, the maximum size we could pass as input to the model given the memory limitations, were extracted and used for training. Comparing the results of training the model on the two different patch sizes, in Figs 8 and 9, shows an improvement in overall accuracy on the validation set for the larger patch size. The model trained on the larger patches is also better able to distinguish between normal and malignant patches, while there is still considerable confusion between neighbouring categories, specifically the low grade and high grade categories. The high grade and low grade categories are clinically very close and have a great deal of histological overlap, so confusion in distinguishing patches of these categories was expected.

In summary, some of the reasons for the difficulty in classifying the categories accurately can be outlined as follows:
- The histological overlap of features at patch level makes it difficult for the AI algorithm to distinguish the categories and sub-categories from each other, which results in misclassified patches and in turn affects the final slide level prediction.
- In the case of the imbalanced dataset, far more normal and malignant patches were extracted and used for training than low grade and high grade patches, so the trained model has learned the distinguishing features of those categories better.
- Each category contains sub-categories, and the patches extracted for a category can be imbalanced between its sub-categories as well. Such sub-category imbalance can cause the model to learn one sub-category better than another that has fewer patches in the dataset.
- Annotation precision also affects the patch level results. Because of the fragmented nature of cervical biopsies, it is difficult to annotate them precisely at pixel level, and annotations for one category often contain other sub-categories, or even other categories, within them. This adds complexity to distinguishing between patches, especially at smaller patch sizes.
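As noted above, focal loss is used to handle the class imbalance between categories during patch level training. The following is a minimal sketch of one common multi-class formulation of such a loss in PyTorch; the focusing parameter gamma, the optional per-class weights and the usage lines are illustrative assumptions rather than the configuration used in this study.

```python
import torch
import torch.nn.functional as F


class FocalLoss(torch.nn.Module):
    """Cross-entropy with a focusing term that down-weights patches the model
    already classifies easily, so hard or misclassified patches (often those
    from the under-represented low grade / high grade categories) contribute
    more to the gradient."""

    def __init__(self, gamma=2.0, alpha=None):
        super().__init__()
        self.gamma = gamma  # focusing parameter; 2.0 is a common default, not necessarily the study's value
        self.alpha = alpha  # optional per-class weights, a tensor of shape (num_classes,)

    def forward(self, logits, targets):
        # logits: (N, C) raw patch scores; targets: (N,) integer class labels
        log_probs = F.log_softmax(logits, dim=1)
        log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
        pt = log_pt.exp()
        loss = -((1.0 - pt) ** self.gamma) * log_pt  # (1 - p_t)^gamma shrinks the loss of easy examples
        if self.alpha is not None:
            loss = loss * self.alpha.to(logits.device)[targets]
        return loss.mean()


# Hypothetical usage during patch-level training:
# criterion = FocalLoss(gamma=2.0)
# loss = criterion(model(patch_batch), patch_labels)
```

Down-weighting well-classified patches in this way allows all malignant patches to be kept in training while preventing the abundant normal patches from dominating the loss.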
Heatmaps generated at patch level can support the explainability of deep learning predictions in medical image analysis and provide clinicians with crucial visual cues that can make it easier to accept or reject a deep-learning-based diagnosis. All of the cases shown in subsection 2.3.2 are examples of slides from different categories that have been classified correctly. Comparing the ground truth labels with the prediction heatmaps for these cases shows that a high percentage of patches are classified correctly at patch level for slides containing only one sub-category (i.e. Figs 2, 3 and 4). This is evidence of how well the patch classifier distinguishes each of the categories from the normal category. Figs 5 and 6 are multi-label cases containing more than one category. Comparing the ground truth and prediction heatmaps for these cases shows how the patch level classifier has confused some patches from neighbouring classes. These confusions are most likely due to the histological overlap of neighbouring classes; in addition, with fewer training patches for some categories, the patch level classifier has not been able to learn the distinguishing features of patches from neighbouring categories. The slide level classifier has been able to resolve these confusions and make the correct final slide level classification for these cases. (A sketch of how such heatmaps and slide level inputs can be assembled from patch predictions is given at the end of this subsection.)

The slides in Fig 10 are malignant slides that were misclassified as high grade. High grade glandular abnormality (cervical glandular intraepithelial neoplasia, CGIN) was placed in the malignant category because it is usually treated more aggressively, but it can overlap histologically with some well differentiated adenocarcinomas, which makes it challenging to differentiate. One possible solution was to treat CGIN and adenocarcinoma as separate categories, but because of the small number of cases of these two types, doing so did not improve the accuracy of our classification, and we therefore kept the categories and sub-categories as defined in Table 1. The first and third rows in Fig 10 show heatmaps for malignant slides (CGIN) misclassified as high grade. CGIN is an in-situ/pre-invasive lesion and, at patch level, has some nuclear features similar to high grade areas. The red box in Fig 22 shows a fragmented CGIN area on the surface of the cervix; its features at patch level overlap histologically with high grade CIN. The blue box shows a CGIN area which is not fragmented and is correctly classified as malignant at patch level. There are malignant areas wrongly classified as high grade, shown in blue in Fig 10f in the first and third rows. The histological overlap of features at patch level, mentioned at the beginning of this paragraph, is the reason for these predictions and the probable reason the slide level classifier classified these slides wrongly at slide level.
The second case (second row) in Fig 10 is also a malignant case (squamous carcinoma), the most common type of cervical cancer. Comparing the ground truth label (Fig 10a, second row) with the prediction heatmap (Fig 10f, second row) shows that the patch level classifier has performed well in picking up the high grade and malignant abnormalities. There is a small invasion in the malignant area, indicated by arrows in the red box in Fig 23. The low grade areas predicted (green areas in Fig 10f, second row) are simply normal squamous tissue with reactive changes, which can overlap histologically with low grade lesions at patch level.

The fourth case (fourth row) is a malignant (CGIN) slide. The predicted heatmap in Fig 10f, fourth row, shows that the malignant abnormality has been picked up correctly at patch level (the area shown by the red box in Fig 24). The squamous area underneath the malignant area is low grade/viral. The area annotated as high grade in Fig 10a, fourth row, contains some low grade CIN/HPV (the area shown by the blue box in Fig 24), which is predicted correctly in Fig 10f, fourth row. The other parts of the high grade area wrongly classified as low grade may be the result of histological overlap between low grade and high grade at patch level.

Fig 11 shows examples of malignant slides that were misclassified as normal. Most of these slides contain small fragments of tumour, which are picked up correctly by the patch level classifier (small tumour fragments within blood and fibrinoid material, red boxes in Fig 25). There are other areas picked up by the classifier as malignant tissue that are non-diagnostic in isolation (the blue box with necrosis in Fig 25). The proportion of malignant patches on these slides is much smaller than the proportion of normal patches, and even though the patch classifier was able to identify the malignant patches, the slide level classifier was not able to reach the correct final decision for these slides.
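To make the heatmap and slide level discussion above more concrete, the sketch below shows one simple way of arranging patch predictions into a slide heatmap and summarising them as inputs for a slide level classifier. The grid coordinate convention, the category order and the fraction/maximum-probability aggregation are illustrative assumptions only; they are not a reproduction of the slide level classifier used in this work.

```python
import numpy as np

# Illustrative category order; the study's actual label encoding is not given here.
CATEGORIES = ["normal", "low_grade", "high_grade", "malignant"]


def build_heatmap(patch_probs, coords, grid_shape):
    """Place each patch's predicted class on the slide grid.

    patch_probs : (N, C) array of softmax outputs from the patch classifier
    coords      : (N, 2) array of (row, col) grid positions of the patches
    grid_shape  : (rows, cols) number of patch positions covering the slide
    Returns a (rows, cols) array of class indices, with -1 marking background
    positions where no tissue patch was extracted.
    """
    heatmap = np.full(grid_shape, -1, dtype=np.int32)
    predictions = patch_probs.argmax(axis=1)
    for (row, col), cls in zip(coords, predictions):
        heatmap[row, col] = cls
    return heatmap


def slide_level_features(patch_probs):
    """Summarise a slide for a downstream slide level classifier as the
    fraction of patches predicted per category plus the maximum probability
    seen for each category (an illustrative aggregation only)."""
    predictions = patch_probs.argmax(axis=1)
    fractions = np.bincount(predictions, minlength=len(CATEGORIES)) / len(predictions)
    max_probs = patch_probs.max(axis=0)
    return np.concatenate([fractions, max_probs])
```

Under an aggregation of this kind, a slide whose tumour is confined to a few small fragments yields only a tiny malignant fraction among many normal patches, which is consistent with the slide level misclassifications of small-fragment malignant slides described above.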