1. Introduction
Cervical cancer accounted for 605,000 new cases in 2020, resulting in approximately 342,000 deaths worldwide. It is the fourth most frequently diagnosed cancer and the fourth leading cause of cancer death in women [
1,
2]. Over the past years, cytology screening tests have enabled a strong decrease in cervical cancer deaths, contributing to reducing its incidence by 60–90% and the death rate by 90% [
3]. Nevertheless, difficulties experienced by health facilities due to a shortage of specialized staff and equipment are increasing the interest in developing computer-aided diagnosis (CADx) systems for cervical screening.
A recent review article [
4] analyzed the approaches used for the tasks associated with examining microscopic images from cervical cytology smears, namely focus and adequacy assessment, region of interest segmentation, and lesion classification. Regarding segmentation and classification tasks, the authors point out that, despite the relatively good performance exhibited by binary or low-class classification approaches, the slow processing times and the considerable quantity of misclassifications or false positives reported for multi-class problems can make the algorithms unusable in a clinical environment. Additionally, the authors concluded that most works disregard adequacy assessment, while others only implement some techniques to detect and remove unwanted objects such as inflammatory cells or blood, with this topic being scarcely addressed in the literature.
Considering the limitations identified in the existing literature, the authors of the current work proposed a nucleus-based approach for the automated adequacy assessment of cervical cytology smears [
5]. In that work, the main focus was the cellularity evaluation of the cytological samples, since low squamous cellularity is the most common reason for classifying specimens as unsatisfactory. In particular, the proposed approach automatically detects and counts squamous nuclei in images from liquid-based cytology (LBC) samples, calculates the average number of nuclei per image, and consequently classifies the sample as adequate or inadequate based on the cellularity threshold established in The Bethesda System (TBS): a minimum of 5000 well-preserved squamous nuclei (3.8 per microscopic field at 40× magnification) to consider a specimen as adequate for diagnosis [
5].
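The adequacy rule described above can be illustrated with a minimal sketch. It assumes that the per-field squamous nucleus counts have already been produced by a detector; the function name `assess_adequacy`, the use of a "greater than or equal" comparison, and the example counts are illustrative assumptions, not the implementation from [5].

```python
from statistics import mean

# TBS cellularity threshold quoted in the text: an average of 3.8
# well-preserved squamous nuclei per microscopic field at 40x magnification.
MIN_MEAN_NUCLEI_PER_FIELD = 3.8


def assess_adequacy(nuclei_per_field, threshold=MIN_MEAN_NUCLEI_PER_FIELD):
    """Return (mean count, label) for a list of per-field squamous nucleus counts.

    Hypothetical helper: treating "a minimum of" as >= is an assumption.
    """
    if not nuclei_per_field:
        raise ValueError("at least one microscopic field is required")
    average = mean(nuclei_per_field)
    label = "adequate" if average >= threshold else "inadequate"
    return average, label


# Hypothetical counts, one entry per microscopic field of a single sample.
example_counts = [5, 2, 7, 4, 3]
print(assess_adequacy(example_counts))
```

Averaging over fields rather than summing keeps the decision independent of how many fields were acquired, which matches the per-field form of the threshold.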
Therefore, this paper aims to study the impact and feasibility of using a nucleus-based deep learning approach to detect different TBS classes of cervical lesions in mobile-acquired microscopic images of LBC samples. In particular, this work starts by contributing a new annotated dataset of images from LBC samples digitalized with a portable smartphone-based microscope, which supported the development of a novel approach to detect the nuclei of cervical lesions in mobile-acquired microscopic images of cytology samples. Several experiments were conducted to optimize the performance of the developed lesion detection network: (i) hyperparameter optimizations, namely learning rate (LR) and batch size (BS); (ii) transfer learning optimizations through weight initialization from networks pre-trained on closer and distant application domains; (iii) detected class optimizations through the inclusion of normal squamous nuclei as a class detected by the model; and (iv) per-class tuning of post-processing parameters, such as the score threshold. A comparison between the performance achieved by the proposed nucleus-based methodology and a previous region-based work (which considered entire cells and cell aggregates as regions to detect) [
6] that used the same dataset for cervical lesion detection is provided, thus supporting the contributions of this work.
This paper is structured as follows:
Section 1 summarizes the motivation and objectives of the work;
Section 2 outlines the relevant related work present in the literature; in
Section 3, the datasets used are described;
Section 4 presents the methodology, including the system overview and the proposed approach to expand its capabilities from sample adequacy assessment to nucleus-based detection and classification of cervical lesions; and throughout
Section 5, results are drawn alongside the discussion. Finally,
Section 6 summarizes the developed work and presents the conclusions and future work.
2. Related Work
Cell detection, segmentation, and counting are computer vision tasks well addressed in the literature. While all these tasks allow us to obtain the number of cells, the most suitable approach for a specific problem depends on the target goal. In particular, the detection task provides the localization in the form of a bounding box and the respective class, while density estimation only gives the final number of objects. Alternatively, segmentation approaches allow for obtaining the mask and respective class of the detected objects.
For such tasks, the state-of-the-art approaches proposed in the literature mostly rely on machine learning and deep learning methods. Works such as [
7,
8,
9] propose deep learning approaches, such as U-Net and feature pyramid networks (FPN), to perform cell detection and segmentation. A single-shot detector (SSD) paired with a convolutional neural network (CNN) to localize and count different blood cell types was proposed in [
10]. Furthermore, microscopy cell counting based on density estimation employing fully convolutional regression networks was proposed in [
11]. In all approaches mentioned above, the authors reported results with performances comparable with human specialists.
Until [
5], a less explored field in the literature was related to automated smear adequacy assessment, i.e., the development of computational methods to ensure that cervical samples are adequate for further analysis. In [
12], the authors describe an AI-assistive diagnostic solution to improve cervical liquid-based thin-layer cell-smear diagnosis according to clinical TBS criteria. The developed system consists of five AI models which are employed to detect and classify the lesions. A You-Only-Look-Once (YOLO)v3 model was used for target detection, Xception and patch-based models were used to cope with the high number of false positives detected, and U-Net was used for nucleus segmentation. The final classification was performed via two ensembled XGBoost models, and the system was developed and evaluated using a dataset of 80,000 LBC samples collected from five medical institutes. Regarding quality assessment, the procedure is applied to the entire sample and comprises focus, contrast, and quantitative cell evaluations. For this task, a simpler approach was followed: the Otsu thresholding method is initially used to separate the cells from the background, and the cell-to-overall-area ratio is then used to obtain a rough number of cells in the sample. An average accuracy of 99.11% was reported on the task of classifying samples as satisfactory or unsatisfactory. However, it must be noted that TBS guidelines were not strictly followed, since this method estimates the total number of cells in the sample, including other types of nuclei, aside from squamous nuclei, that should not be considered to assess sample cellularity.
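To make the thresholding-based cellularity estimate concrete, the following is a minimal pure-Python sketch of Otsu's method followed by a cell-to-overall-area ratio. It is not the code from [12]: the flat list of 8-bit grayscale values, the assumption that stained cells are darker than the background, and the function names are all illustrative.

```python
def otsu_threshold(pixels):
    """Return the 8-bit level t that maximizes the between-class variance
    when pixels are split into the classes {<= t} and {> t}."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for level in range(256):
        weight_low += histogram[level]
        sum_low += level * histogram[level]
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # every pixel is already in the lower class
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def cell_area_ratio(pixels):
    """Fraction of pixels at or below the Otsu threshold.

    Assumption: cells are darker than the background in the grayscale image.
    """
    threshold = otsu_threshold(pixels)
    foreground = sum(1 for value in pixels if value <= threshold)
    return foreground / len(pixels)


# Hypothetical image: 50 dark (cell) pixels and 50 bright (background) pixels.
example_pixels = [40] * 30 + [50] * 20 + [200] * 50
print(otsu_threshold(example_pixels), cell_area_ratio(example_pixels))
```

Because the ratio counts every dark pixel, it cannot distinguish squamous nuclei from other cell types or debris, which is the limitation with respect to TBS noted above.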
Still on the topic of using deep-learning-based approaches to detect and classify cervical lesions, several recent works proved its feasibility to support cervical cancer screening, with proposed approaches that explored the usage of different deep convolutional neural networks ([
13,
14]) and architectures, such as MobileNet [
15,
16], EfficientNet [
15,
17], as well as newly proposed networks, like the series-parallel fusion network (SPFNet) [
18], Cervical Ensemble Network (CEENET) [
19], or EfficientNet Fuzzy Extreme-Learning Machine (EN-FELM) [
20]. Despite the promising results of these previous works, it should be noted that the vast majority do not take into account limitations like restricted computational resources to run the models. Additionally, most of them require high-end digital pathology whole-slide imaging (WSI) scanners, equipment that is not generally accessible in areas with limited access and restricted financial resources. As an alternative to regular microscopes and WSI scanners, low-cost, portable microscopes that enable microscopy-based diagnosis have also emerged in the literature. In particular, leveraged by the impressive evolution in camera quality, processing power, and memory, smartphone-based solutions are being explored to implement cost-effective platforms for microscopic inspection of samples [
21]. A wide range of applications has also been used to test the feasibility of affordable approaches based on smartphones, including the screening of blood smears [
22] or the detection of parasites [
23,
24,
25] and viruses [
26].
Regarding the cervical cytology use case, a device called µSmartScope [
27,
28] was adapted for the digitalization of cervical cytology samples [
29,
30] (
Figure 1). This device is a fully automated, 3D-printed smartphone microscope tailored to support microscopy-based diagnosis in areas with limited access. The device aims to decrease the burden of manual microscopy examination by being fully powered and controlled by a smartphone, in addition to the motorized stage.
Supported by such devices for cervical cytology, a recent work [
6] proposed a region-based approach for the mobile detection of cervical lesions. This work used the public dataset SIPaKMeD and a new private dataset acquired with the µSmartScope (hereinafter referred to as the Region-Based Cervical Lesion Dataset), depicted in
Figure 1. Promising results for cervical cancer screening have been achieved using a Faster R-CNN model with a ResNet50 backbone while also focusing on being a cost-effective Internet of Things (IoT)-based solution. Nevertheless, the authors identified the low data volume and high structure variability of the region-based dataset as the major bottlenecks of the study. Thus, this work follows this research stream and explores the development of a nucleus-based approach for automated adequacy assessment and cervical lesion detection using LBC samples digitalized with the µSmartScope device that strictly follows the TBS guidelines on both tasks.
3. Dataset
Although there are some publicly available datasets with cervical cell annotations, such as Herlev [
31], SIPaKMeD [
32], Cervix93 [
33], and ISBI Challenges [
34,
35], they are not adequate for the tasks of nucleus and lesion detection. On the one hand, Herlev only contains isolated images of cells, annotated by cell abnormality. On the other hand, datasets such as Cervix93, the ISBI Challenges, and SIPaKMeD comprise images of microscopic fields with information regarding the nucleus regions but do not provide annotations regarding the cervical lesions in those fields. In contrast, the more recent CRIC [
36] dataset includes images of microscopic fields with annotated cervical lesions, yet with no information concerning the nuclei structures.
In view of the shortcomings of the existing public datasets, recent works presented two additional datasets acquired with the µSmartScope device: (i) the Adequacy Assessment Dataset [
5], which consists of 41 samples with 42,387 manually annotated nuclei in terms of cell type, and (ii) the Region-Based Cervical Lesion Dataset [
6], which consists of 21 samples with 927 manually annotated regions in terms of cervical lesions (single cells and cell aggregates). Given that this work aims to develop a nucleus-based approach for cervical lesion detection, neither of these datasets entirely fulfilled that purpose. Thus, a new dataset was created, hereinafter referred to as the Nucleus-Based Cervical Lesion Dataset.
3.1. Nucleus-Based Cervical Lesion Dataset
To create a new dataset annotated in terms of nuclei with cervical lesions, the following TBS classes were considered: atypical squamous cell of undetermined significance (ASC-US); low-grade squamous intraepithelial lesion (LSIL); atypical squamous cell, cannot rule out high-grade lesion (ASC-H); high-grade squamous intraepithelial lesion (HSIL); and squamous cell carcinoma (SCC). It should be noted that the Region-Based Cervical Lesion Dataset already provides annotations for these same classes, although not for nuclei but for entire cells and cell aggregates. Conversely, the Adequacy Assessment Dataset provides nucleus annotations but in terms of cell types, not cervical lesions.
Given that around 30% of the images in the Region-Based Cervical Lesion Dataset are also present in the Adequacy Assessment Dataset, the overlap between the annotations of these two datasets was explored. Only squamous nucleus annotations from the Adequacy Assessment Dataset were considered, since the previously mentioned cervical lesion classes are only present in this type of cell. In the new dataset, the cervical lesion class of each squamous nucleus annotation inside an annotated cervical lesion region was considered equal to the class of the region annotation that encompassed it, as shown in
Figure 2.
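The label-transfer rule described above can be sketched as follows. The text does not state how "inside" is decided, so this sketch assumes that a nucleus belongs to a region when the center of its bounding box falls within the region's bounding box, and that the first matching region wins; the box format `(x_min, y_min, x_max, y_max)`, the default label, and all names are hypothetical.

```python
def box_center(box):
    """Center point of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2, (y_min + y_max) / 2


def contains(region_box, point):
    """True if the point lies inside (or on the border of) the region box."""
    x_min, y_min, x_max, y_max = region_box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


def transfer_labels(nucleus_boxes, regions, default_label="normal"):
    """Give each nucleus the class of the lesion region that encloses it.

    `regions` is a list of (box, lesion_class) pairs. Nuclei outside every
    region keep `default_label` (an assumption made for this sketch).
    """
    labelled = []
    for nucleus in nucleus_boxes:
        label = default_label
        for region_box, region_label in regions:
            if contains(region_box, box_center(nucleus)):
                label = region_label
                break  # assumption: the first enclosing region wins
        labelled.append((nucleus, label))
    return labelled


# Hypothetical annotations: one LSIL region and two squamous nuclei.
regions = [((100, 100, 300, 300), "LSIL")]
nuclei = [(150, 150, 180, 180), (400, 400, 430, 430)]
print(transfer_labels(nuclei, regions))
```

The same function applies unchanged whether the nucleus boxes come from manual annotation or from an automatic detector, since only the box geometry is used.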
To take advantage of the full extent of cervical lesion annotations from the Region-Based Cervical Lesion Dataset, an automatic annotation strategy was applied to the subset of images without overlap with the Adequacy Assessment Dataset. In particular, the best-performing model for squamous nucleus detection proposed in [
5] was used to detect the squamous nuclei on that subset, with the same process mentioned above being further applied to attribute the cervical lesion label to all nucleus detections inside annotated cervical lesion regions.
Figure 3 provides examples of nucleus and region annotations for each TBS class.
Table 1 depicts the final number of region and nucleus annotations per TBS lesion class for the Nucleus-Based Cervical Lesion Dataset. The higher number of nucleus annotations is justified by annotated regions in the Region-Based Cervical Lesion Dataset that encompass more than one cell, leading to several nucleus lesion annotations per region. Nevertheless, it can be observed that this dataset suffers from class imbalance for both nucleus and region annotations.
Regarding the dataset split, the train/test division is the same as that previously reported for the Region-Based Cervical Lesion Dataset [
6], including the usage of a patch-slicing operation (i.e., images are sliced into patches of fixed dimensions). Similarly to that previous work, the SCC and HSIL lesion types were merged into a single class (HSIL-SCC) due to the marked under-representation of the SCC class and the similar clinical diagnosis flow for both classes. The number of empty patches (i.e., patches with no annotations or only normal annotations) used for training was balanced by downsampling these patches in the training data, and
Table 2 shows the final data distribution for the nucleus annotations. Even though the annotation type was refactored from regions to nuclei, the test set images are exactly the same, which allows us to make a fair comparison between the performance of the region-based approach from that previous work [
6] and the nucleus-based approach proposed in this work.
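As an illustration of the patch-slicing and empty-patch downsampling steps mentioned above, the sketch below uses several assumptions that are not stated in the text: patches are square and non-overlapping, partial border patches are dropped, an annotation belongs to the patch that contains its box center, and one empty patch is kept per non-empty patch. All names and values are hypothetical.

```python
import random


def patch_grid(width, height, patch_size):
    """Non-overlapping patches as (x_min, y_min, x_max, y_max); partial
    patches at the right and bottom borders are dropped (assumption)."""
    return [
        (x, y, x + patch_size, y + patch_size)
        for y in range(0, height - patch_size + 1, patch_size)
        for x in range(0, width - patch_size + 1, patch_size)
    ]


def is_empty(patch, annotations):
    """A patch is empty if it holds no annotations or only normal ones."""
    x_min, y_min, x_max, y_max = patch
    for (bx_min, by_min, bx_max, by_max), label in annotations:
        cx, cy = (bx_min + bx_max) / 2, (by_min + by_max) / 2
        if label != "normal" and x_min <= cx < x_max and y_min <= cy < y_max:
            return False
    return True


def downsample_empty(patches, annotations, empty_per_non_empty=1.0, seed=0):
    """Keep all non-empty patches plus a random subset of the empty ones."""
    non_empty = [p for p in patches if not is_empty(p, annotations)]
    empty = [p for p in patches if is_empty(p, annotations)]
    keep = min(len(empty), round(len(non_empty) * empty_per_non_empty))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return non_empty + rng.sample(empty, keep)


# Hypothetical 1024x512 image with one lesion nucleus and one normal nucleus.
annotations = [((10, 10, 40, 40), "ASC-US"), ((600, 300, 630, 330), "normal")]
patches = patch_grid(1024, 512, 256)
print(len(patches), downsample_empty(patches, annotations))
```

Applying the downsampling only to training patches, as described in the text, leaves the test distribution untouched.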
5. Results and Discussion
The results obtained during model training at the patch level on the validation set are depicted in
Figure 5. This figure merges the results achieved through three-fold cross-validation for the different optimization steps, namely hyperparameter tuning, transfer learning strategy, and detected classes adjustments. The results for each optimization step are separately discussed in the following sub-sections.
5.1. Training Optimizations
Considering the five different LR-BS combinations tested (see
Table 3), the hyperparameter combinations with indexes 1, 2, and 5 clearly provided the best results, with minor mAP@0.5 differences between them. Nevertheless, it was considered that the best and most consistent performance was achieved by combination 5 (LR = 4.862 × 10⁻⁵ and BS = 16).
5.2. Transfer Learning Optimizations
Regarding the results of the transfer learning experiments through weight initialization, the fine-tuning of the network pre-trained on the Adequacy Assessment Dataset (NUCLEI) yielded better results when compared to the COCO experiments. With these experiments, it was concluded that transferring knowledge from a smaller dataset of a closer application domain (i.e., a similar cervical cytology context) brought more benefits than using a large-scale public dataset with categories distant from the target application domain. It should be noted that the COCO Dataset, despite its myriad object classes, does not include any microscopy images or cellular structures.
5.3. Detected Class Optimizations
To assess the impact of including and excluding normal squamous nuclei as a detected class, the per-class
mAP@0.5 results were also examined (see
Figure 6), given the marked data imbalance imposed by the inclusion of this class (see
Table 1).
The experiments with the normal squamous nucleus class (+Sqm.) achieved slightly better
mAP@0.5 for the ASC-US and ASC-H classes while simultaneously providing a lower overall standard deviation between cross-validation folds. Thus, the inclusion of the normal squamous nucleus class, which seems to help the learning process toward more robust discrimination of lesions’ nuclei, was selected.
5.4. Data Augmentation
As depicted in
Table 2, the Nucleus-Based Cervical Lesion Dataset is highly imbalanced, and a positive correlation can be observed between the number of annotations for each class and the respective detection performance (see
Figure 6) being observable. Therefore, the underrepresented classes were augmented via random basic image transformations, such as 90-degree rotation, horizontal and vertical flips, blur, and sharpening.
Table 4 shows the number of annotations before and after the data augmentation procedure.
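For detection datasets, geometric augmentations such as flips and 90-degree rotations must be applied to the bounding boxes as well as to the pixels. The sketch below shows only the box arithmetic, assuming boxes in `(x_min, y_min, x_max, y_max)` pixel-edge coordinates and a clockwise rotation; the function names and the example box are hypothetical, and blur and sharpening are omitted because they leave boxes unchanged.

```python
def hflip_box(box, image_width):
    """Mirror a box around the vertical axis of the image."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def vflip_box(box, image_height):
    """Mirror a box around the horizontal axis of the image."""
    x_min, y_min, x_max, y_max = box
    return (x_min, image_height - y_max, x_max, image_height - y_min)


def rotate90_cw_box(box, image_height):
    """Rotate a box by 90 degrees clockwise.

    A point (x, y) maps to (image_height - y, x); the rotated image has
    width equal to the original height and height equal to the original width.
    """
    x_min, y_min, x_max, y_max = box
    return (image_height - y_max, x_min, image_height - y_min, x_max)


# Hypothetical 100x50 (width x height) patch with one annotated nucleus.
width, height = 100, 50
nucleus = (10, 5, 30, 20)
print(hflip_box(nucleus, width))
print(vflip_box(nucleus, height))
print(rotate90_cw_box(nucleus, height))
```

Each transform preserves the box area, so augmented copies carry the same nucleus content in a different orientation, which is consistent with the concern that such copies add limited new information.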
Using the augmented version of the dataset, a new set of experiments was performed by using the best transfer learning approach previously found (NUCLEI pre-trained model) and training with the five different LR-BS combinations detailed in
Table 3. The results depicted in
Figure 7 indicate that the data augmentation strategy did not improve detection performance.
One possible cause for this outcome might be the large volume of instances generated through basic image manipulation that was added to the highly underrepresented classes, which probably provided mostly redundant information. This suggests that data augmentation via basic image manipulations in the target scenario may negatively affect the model’s generalization capability by potentially introducing bias during model training. Nevertheless, alternative data augmentation approaches that promote higher and more reliable per-class variability should be explored in the future, for instance, generative deep learning methods such as generative adversarial networks (GANs) or latent diffusion models.
5.5. Nucleus-Based versus Region-Based Approaches
This section provides a comparative analysis between the performance achieved by the proposed nucleus-based methodology and the previously reported region-based approach [
6]. To allow the benchmarking of these two deep learning strategies for mobile-based cervical lesion detection, the test set used is the same for both approaches, thus allowing a fairer comparison. Nevertheless, it should be noted that the patch size and the number of annotations per class are not the same due to the annotation refactor detailed in
Section 3.1. The results regarding
mAP@0.5 and AR@10 can be observed in
Figure 8.
Analyzing the relative performance gain for each metric, it is possible to see that the proposed nucleus-based approach allows an
mAP@0.5 increase ranging from 53.6% to 1216.2%, except for the HSIL class, with a decrease of 20.6%. Regarding AR@10, the nucleus-based approach also brought clear performance improvements, from 95.8% to 267.2% for the different classes.
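The relative gains quoted above are assumed here to follow the usual definition, i.e., the difference between the new and the baseline metric divided by the baseline. The short sketch below uses hypothetical metric values rather than the values from Figure 8.

```python
def relative_gain_percent(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new - baseline) / baseline * 100


# Hypothetical per-class metric values (not taken from the paper).
print(relative_gain_percent(0.25, 0.5))   # metric doubled
print(relative_gain_percent(0.5, 0.25))   # metric halved
```

Under this definition, a gain above 1000% means the nucleus-based metric is more than eleven times the region-based one, which typically happens when the baseline value is very small.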
5.6. Post-Processing Optimizations
The results reported in the previous sections allowed us to select the best combination of optimization steps during model training, which maximized the cervical lesion detection performance at the patch-level on the validation set. Using the best-performing model achieved after training, the class-wise optimization of the prediction score threshold was further explored to improve the model’s image-level performance on the test set. The confusion matrices obtained for the best-performing model before and after the score threshold optimization are depicted in
Figure 9.
To support the critical analysis of these results, additional performance metrics were extracted from both confusion matrices, namely accuracy, specificity, F1 score, and Youden’s index (see
Table 5). Due to the class imbalance of the dataset, the F1 score was used as the primary criterion for selecting the optimal score threshold for each class. This led to F1 score improvements in all classes, except for LSIL, with still no TPs detected. In particular, the F1 score of the ASC-US class increased by 47% due to a favorable decrease of 887 FPs, but with the compromise of losing 22 TPs. On the other hand, the F1 score of normal squamous nuclei (Sqm. Normal) increased by 5% due to a favorable increase of 1376 TPs, but with the shortcoming of detecting 871 additional FPs. A similar behavior was observed in the ASC-H and HSIL classes, with only residual increases in both TPs and FPs.
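The class-wise threshold selection can be sketched as a simple search over candidate thresholds, scoring each one with the F1 score computed from one-vs-rest confusion counts. This is an illustrative sketch rather than the procedure used in this work: it assumes one predicted score and one binary ground-truth flag per image for the class of interest, and the candidate grid, names, and data are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, specificity, F1 score, and Youden's index from confusion counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "specificity": specificity,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
        "youden": recall + specificity - 1,
    }


def best_f1_threshold(scores, labels, candidates):
    """Return (threshold, metrics) maximizing F1 for one class.

    An image is predicted positive when its score is >= the threshold.
    Ties are resolved in favor of the first candidate tested.
    """
    best = None
    for threshold in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and not y)
        metrics = binary_metrics(tp, fp, fn, tn)
        if best is None or metrics["f1"] > best[1]["f1"]:
            best = (threshold, metrics)
    return best


# Hypothetical image-level scores and ground truth for a single lesion class.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [True, True, False, True, False, False]
print(best_f1_threshold(scores, labels, [0.25, 0.5, 0.75]))
```

Running the search independently for each class lets a class with many false positives receive a stricter threshold, while a class with many missed detections receives a more permissive one. To avoid an optimistic estimate, the thresholds would ideally be chosen on data separate from the data used for the final report.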
From the clinical point of view, the described trade-offs between TPs and FPs seem to benefit the usage of the model after optimization for screening purposes. First, because Sqm. Normal detections are not cervical lesions, they do not represent priority findings that must be reviewed by the screening cytopathologists. For this reason, the significant increase of FPs for the Sqm. Normal class on the model after optimization does not necessarily cause overhead in the clinical flow. Second, the marked decrease in ASC-US FPs on the model after optimization could have a relevant impact on its suitability to support clinical decisions: these are abnormal findings that need to be reviewed by cytopathologists, and constantly reviewing large numbers of FPs can lead to an unfeasible overhead in the screening process. Thus, the performance improvements verified for the lesion classes demonstrated the benefits of the data and model-centric optimizations applied during training and post-processing steps, resulting in a more robust lesion detection system with potential to streamline the clinical workflow of cervical screening processes. Some illustrative examples of correctly and incorrectly classified images on the test set are shown in
Figure 10.
In summary, this work contributes to the advancement of mobile-based cervical cytology screening by demonstrating the potential of nucleus-based approaches, which can serve as a crucial complementary tool to broaden the reach of cervical cytology screening and support the early and accurate diagnosis of cervical dysplasias. By expanding the coverage of screening programs in underserved areas, these tools can contribute to decreasing the extent of excisional treatments, which are responsible for adverse effects like increased risk of preterm delivery, lower birth weight, or preterm premature rupture of membranes before 37 weeks of pregnancy [
40]. These tools can also be coupled to the analysis of risk factors like HPV persistence and the positivity of surgical resection to leverage a more widespread early detection of recurrence after surgical treatment [
41]. At the same time, the authors acknowledge that future work needs to be carried out to improve the performance of the proposed method. Currently, the state of the art for mobile-based solutions for cervical cancer screening is still far from reaching clinical usage due to the particular limitations of this scenario, like the lack of large publicly available mobile-acquired datasets, limited mobile-acquired image quality, and the restricted computational power of mobile-based solutions for portable microscopic screening. In particular, the requirement of locally executing the detection models on mobile devices dramatically limits the selection of suitable meta-architecture/backbone combinations, which must be simultaneously lightweight and compatible with mobile devices. These limitations should be considered when comparing the performance and maturity of such mobile-based solutions with solutions that operate under optimal, well-controlled laboratory conditions, which usually have access to images acquired with high-end microscopic equipment and unrestricted computational resources.
6. Conclusions and Future Work
In this paper, a new nucleus-based deep learning approach was proposed to detect different TBS classes of cervical lesions on mobile-acquired microscopic images of LBC samples. A RetinaNet model with a ResNet50 backbone was used, and several experiments were conducted to optimize the detection model’s performance, starting by optimizing the learning rate and batch size hyperparameters. In terms of transfer learning, transferring knowledge from networks pre-trained on a smaller dataset closer to the target application domain brought more benefits when compared with an experiment with a large-scale public dataset with categories distant from the target application domain. Detected class optimizations were also explored by including normal squamous nuclei as a class detected by the model, which improved the learning process toward more robust discrimination of lesions’ nuclei. Finally, the per-class tuning of the score threshold in the post-processing step also allowed us to obtain a model more suitable to support screening procedures, allowing performance improvements in terms of the F1 score in most of the considered classes.
A comparison between the performance achieved by the proposed nucleus-based methodology and a region-based approach previously proposed was also provided, achieving clear performance improvements regarding both
mAP@0.5 and AR@10 metrics on the same dataset. Despite the apparent success of the proposed approach, it should be noted that the Nucleus-Based Cervical Lesion Dataset created in the scope of this work still has clear limitations. While the reported experiments regarding data augmentation through basic image manipulations did not improve the detection performance, alternative strategies should be explored in the future, like state-of-the-art generative deep learning approaches such as GANs and latent diffusion models, to promote higher and more reliable per-class variability. The authors also aim to experiment with additional meta-architecture/backbone combinations, such as object detection models from the YOLO series.
In summary, the proposed nucleus-based strategy for cervical lesion detection presents a step further in developing a cost-effective mobile framework for cervical cancer screening. Although further improvements are still required to embed the proposed approach in a reliable and robust decision support system for cervical cancer screening, this work reinforces the potential of using AI-powered portable solutions to automatically scan and analyze LBC samples. Such solutions can significantly impact screening programs worldwide, particularly in areas with limited access and restricted healthcare resources.