1 Introduction
In 2018, an estimated 4.8 million people were diagnosed with cancer in the gastrointestinal tract worldwide, representing
Projections based on current trends predict an increase of
During this process, a medical linear accelerator (LINAC) delivers high doses of radiation to kill cancer cells, which in the worst case can also damage nearby healthy cells.
Damage to healthy cells causes side effects such as hearing loss, vomiting, and extreme tiredness, among others [18]. To reduce this collateral damage, oncologists try to direct the X-rays at the tumors while avoiding the organs at risk.
A Magnetic Resonance Imaging-guided linear accelerator (MR-Linac) allows tumors and organs to be observed in real time so that the radiation direction can be adjusted; however, oncologists must segment the organs manually, which extends treatment sessions to up to an hour, during which the patient must remain immobile.
In recent years, Artificial Intelligence techniques such as convolutional neural networks have performed auto-segmentation for brain tumors [6], neck cancer [11], and prostate cancer [9, 8], halving the duration of treatment sessions [3]. However, there has been little progress in segmenting gastrointestinal (GI) tract organs, mainly because abdominal organs are surrounded by soft tissue and can change shape and location throughout the day due to digestive and respiratory movements [10].
In this work, we propose a deep learning methodology for pre-processing and segmenting magnetic resonance images of the digestive tract. Our approach is a weighted ensemble of U-Net models and two-dimensional Hidden Markov Models (2D-HMM) that performs semantic segmentation of the stomach and the small and large bowels.
The proposed methodology has the potential to help implement more effective and efficient treatments for patients by speeding up the segmentation process.
We evaluated the proposed methodology, which does not compromise the run-time and memory requirements of the segmentation process, on a dataset of images from the UW-Madison Carbone Cancer Center, made publicly available on the Kaggle platform as part of the UW-Madison GI Tract Image Segmentation Competition. The rest of this paper is organized as follows. Section 2 reviews the related literature.
Section 3 describes the proposed methodology and its stages. Section 4 discusses the results obtained with the generated models. Finally, Section 5 presents our conclusions.
2 Related Work
Recent studies in Biomedical Engineering use deep learning techniques, in particular variants of the U-Net architecture, to assist in segmenting medical images for diagnosis and treatment [16].
Deep learning models perform well in medical image segmentation because they can combine high- and low-level information simultaneously to extract complex image features.
However, segmenting GI tract organs remains challenging [7], because these organs deform considerably with body movement and breathing.
For these reasons, there are few studies on the successful and extensive use of MR-Linac for stomach cancer [21], and few on applying U-Net architectures to this type of imaging; most of the latter rely on complex models such as 3D U-Net.
In [12], the authors proposed a U-Net to segment the liver, stomach, duodenum, and kidney in computed tomography (CT) images using 3D patches. Their results were promising for the stomach, with a Dice Similarity Coefficient (DSC) of 0.813, but weaker for the duodenum, with a DSC of 0.595.
A similar approach to segmenting GI tract organs was proposed in 2022 [19]. In this preliminary report, the authors compare different encoders for a classical U-Net architecture and obtain the best results with a Resnet34 encoder.
Another work [5] applies a U-Net and a Mask Region-based Convolutional Neural Network (Mask R-CNN) to segment GI tract organs on the same UW-Madison dataset used in this study.
The authors report that their Mask R-CNN model achieved a DSC of 0.73 on their validation data. Vision Transformers have also been used to segment the same UW-Madison images [15].
That model is hybrid: it uses a LeViT architecture as the encoder and a U-Net++ as the decoder, and it obtains a DSC of 0.79 and an IoU of 0.72.
In [7], the authors describe an automatic contour refinement (ACR) method, based on probability maps, for correcting auto-segmented contours in magnetic resonance-guided radiation therapy.
The initial auto-segmentation was generated by a 3D deep CNN (a modified 3D-ResUNet); with the refinement, the DSC increased from 0.44 to 0.56, from 0.33 to 0.55, and from 0.34 to 0.54 for the stomach, small bowel, and large bowel, respectively.
Furthermore, some works explore Hidden Markov Models (HMMs) for multi-class image segmentation [17], in which the hidden states of a Markov model represent the true segmentation of the image.
In [2], the authors used two-dimensional HMMs (2D-HMMs) to effectively segment radiographs, multispectral images, and synthetic images. Despite the potential of HMMs, there are no comprehensive studies of their application to magnetic resonance image segmentation.
Recent works combine convolutional operators with adaptive HMMs to segment brain images [14, 20].
However, to the best of our knowledge, no method incorporates HMMs into the segmentation of GI tract images as we propose in this work. In summary, deep learning approaches, especially U-Net variants, are the most explored methods in the literature for analyzing biomedical images [16].
The application of these methods to segment images of the gastrointestinal tract remains a challenge and an open area of research.
3 Methodology
In this section, we present a methodology that consists of three phases.
The first phase is the pre-processing of the dataset images (Section 3.1), the second is the design and construction of the segmentation models (Section 3.2), and the third is the validation of the models through experimentation and analysis of results (Section 4). Fig. 1 shows the general stages of the proposed methodology.
3.1 Data Pre-Processing
As shown in Fig. 1, the first phase of the methodology consists of preparing the data. The dataset used in this research is public and was provided by the UW-Madison Carbone Cancer Center.
The data repository consists of 272 MRI scans in 16-bit grayscale PNG format from 85 cancer patients undergoing radiation treatment. Each scan has 144 slices, giving a total of 39,168 images (272 × 144).
The training annotations are run-length encoded (RLE) masks for three organs of the GI tract: the stomach, the large bowel, and the small bowel.
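To make the mask format concrete, the sketch below decodes and re-encodes a space-separated "start length" RLE string. It assumes 1-based start positions over the row-major (C-order) flattened image, a convention widely used in Kaggle segmentation competitions; this should be checked against the competition's data description. The function names are illustrative, not the authors' code.

```python
from typing import Tuple

import numpy as np


def rle_decode(rle, shape: Tuple[int, int]) -> np.ndarray:
    """Decode a space-separated 'start length' RLE string into a binary mask.

    Assumption: start positions are 1-based and index the row-major (C-order)
    flattened image.
    """
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    if not isinstance(rle, str) or not rle.strip():
        return mask.reshape(shape)  # no annotation: organ absent from this slice
    values = np.array([int(v) for v in rle.split()])
    starts, lengths = values[0::2] - 1, values[1::2]  # convert starts to 0-based
    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1
    return mask.reshape(shape)


def rle_encode(mask: np.ndarray) -> str:
    """Inverse of rle_decode for a binary (0/1) mask: 1-based 'start length' pairs."""
    pixels = np.concatenate([[0], mask.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1  # run boundaries (1-based)
    runs[1::2] -= runs[0::2]                            # end positions -> run lengths
    return " ".join(str(x) for x in runs)
```

For example, `rle_decode("2 2 5 1", (1, 5))` marks pixels 2, 3, and 5 (1-based) of a five-pixel row.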
Because the images have different dimensions, they had to be standardized; we therefore resized them and their corresponding masks to 128 × 128 px.
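A minimal NumPy sketch of this standardization step follows. Two choices are assumptions rather than details stated in the text: intensities are scaled to [0, 1] by each slice's maximum, and both image and mask use nearest-neighbour resizing. Nearest-neighbour resizing is required for masks so that no new label values appear. For images, a smoother interpolation such as bilinear is a common alternative.

```python
import numpy as np


def resize_nearest(arr: np.ndarray, size=(128, 128)) -> np.ndarray:
    """Nearest-neighbour resize of a 2D array; never invents new label values."""
    h, w = arr.shape
    rows = (np.arange(size[0]) * h / size[0]).astype(int)
    cols = (np.arange(size[1]) * w / size[1]).astype(int)
    return arr[rows[:, None], cols[None, :]]


def preprocess(image16: np.ndarray, mask: np.ndarray, size=(128, 128)):
    """Scale a 16-bit slice to [0, 1] and resize it together with its mask."""
    img = image16.astype(np.float32)
    peak = img.max()
    if peak > 0:
        img /= peak  # per-slice max scaling (an assumed normalization choice)
    return resize_nearest(img, size), resize_nearest(mask, size)
```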
Furthermore, to visualize how the organs are distributed across the sample, we plotted a heatmap for each organ (see Fig. 2).
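One simple way to build such heatmaps is to average the binary masks over all slices, so that each pixel holds the fraction of slices in which it belongs to a given organ. The sketch assumes masks stored as one binary channel per organ with shape (N, C, H, W). The same maps can also serve as the spatial prior that Section 3.2.2 refers to.

```python
import numpy as np


def organ_heatmaps(masks: np.ndarray) -> np.ndarray:
    """Fraction of slices in which each pixel belongs to each organ.

    masks: binary array of shape (N, C, H, W), one channel per organ
    (e.g. stomach, large bowel, small bowel). Returns (C, H, W) in [0, 1].
    """
    return masks.astype(np.float64).mean(axis=0)
```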
3.2 Design and Construction of Segmentation Models
As previously mentioned, our methodology involves building two organ segmentation models and an ensemble that integrates them. The first model uses a U-Net++ architecture, while the second is a two-dimensional HMM (2D-HMM).
The individual processes for the construction and training of both models are described below, as well as the process of their integration for the ensemble.
3.2.1 U-Net++ Model
This architecture was designed to overcome limitations of the base U-Net model in medical image segmentation [22]. It adds a series of connections to the original U-Net to recover fine-grained object details more effectively, and it includes deep supervision, which allows different configurations of the network.
The additional connections of U-Net++ follow a pyramid rule: the U shape is filled with convolutional blocks, and the number of layers in each block varies with the node's position in the network. Fig. 3 shows the original U-Net++ diagram from [22].
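The nested connection pattern can be made explicit by listing the inputs of every node X[i,j]. Here i is the down-sampling level and j is the position along the skip pathway, following the node rule described in [22]. The pure-Python sketch below only enumerates the graph and builds no layers. The string notation (`down`, `up`) is illustrative. With deep supervision, the outputs are taken from the top-row nodes X[0,1] to X[0,depth-1].

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (i, j): i = down-sampling level, j = position on the skip pathway


def unetpp_inputs(depth: int) -> Dict[Node, List[str]]:
    """List the inputs of every node X[i,j] in a U-Net++ with `depth` levels.

    j == 0: X[i,0] = H(down(X[i-1,0]))                      (plain encoder)
    j >  0: X[i,j] = H([X[i,0], ..., X[i,j-1], up(X[i+1,j-1])])
    """
    inputs: Dict[Node, List[str]] = {}
    for j in range(depth):
        for i in range(depth - j):
            if j == 0:
                inputs[(i, 0)] = ["input"] if i == 0 else [f"down(X[{i - 1},0])"]
            else:
                dense = [f"X[{i},{k}]" for k in range(j)]  # dense skip connections
                inputs[(i, j)] = dense + [f"up(X[{i + 1},{j - 1}])"]
    return inputs
```

For a five-level network this yields 15 nodes, and the top-right node X[0,4] concatenates X[0,0] to X[0,3] with the up-sampled X[1,3].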
In this work, the network was implemented in Python 3.8 following the version proposed in [22].
The hyper-parameters of the model were adjusted with the Keras API grid search, selecting relu as the activation function in the hidden layers, 0.1 as dropout rate,
Finally, we used a sigmoid activation in the last layer instead of softmax, so that each class receives its own independent probability rather than a share of a distribution that sums to one.
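The difference between the two activations can be seen on a single pixel. The logits below are hypothetical scores for the three organs. With sigmoid, the stomach and the small bowel can both exceed 0.5. With softmax, the probabilities compete and must sum to one.

```python
import numpy as np

logits = np.array([2.0, 1.5, -3.0])  # hypothetical scores: stomach, small bowel, large bowel

sigmoid = 1.0 / (1.0 + np.exp(-logits))  # independent per-class probabilities
softmax = np.exp(logits - logits.max())  # shift by the max for numerical stability
softmax /= softmax.sum()                 # probabilities forced to sum to 1

print(sigmoid.round(3), sigmoid.sum().round(3))
print(softmax.round(3), softmax.sum().round(3))
```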
To determine the hyper-parameters, we split the images into 80% for training and 20% for validation.
As the loss function, we used the Dice coefficient computed per organ, combined as a weighted sum to account for class imbalance.
Let
Therefore
Finally, let
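As an illustration of a per-organ weighted Dice loss, the NumPy sketch below computes one soft Dice coefficient per organ channel. It combines them as a weighted mean and returns one minus that value. This is one common formulation, not necessarily the authors' exact definition. The organ weights are hypothetical inputs, and a Keras training loss would use TensorFlow ops instead of NumPy.

```python
import numpy as np


def weighted_dice_loss(y_true, y_pred, weights, eps=1e-6):
    """1 - weighted mean of per-organ soft Dice coefficients.

    y_true, y_pred: arrays of shape (N, H, W, C) with values in [0, 1]
    weights: length-C organ weights (hypothetical, e.g. larger for rarer organs)
    """
    axes = (0, 1, 2)                               # sum over batch and pixels
    inter = np.sum(y_true * y_pred, axis=axes)
    denom = np.sum(y_true, axis=axes) + np.sum(y_pred, axis=axes)
    dice = (2.0 * inter + eps) / (denom + eps)     # one soft Dice per organ
    w = np.asarray(weights, dtype=float)
    return 1.0 - np.sum(w * dice) / np.sum(w)
```

With weights (3, 1), a perfect first organ and a completely missed second organ give a loss of about 0.25. The imbalance weights therefore decide how much each organ's error counts.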
3.2.2 Two-Dimensional Hidden Markov Model (2D-HMM)
Hidden Markov Models are a statistical technique for building probabilistic models in which hidden events act as causal factors of the observed events.
An HMM consists of two stochastic processes, a hidden state process and an observable symbol process: the hidden states form a Markov chain, and the probability distribution of each observed symbol depends on the underlying state. In image segmentation, the intuition is that each pixel depends on the pixels surrounding it; that is, neighboring pixels share common characteristics such as color and spatial location.
Therefore, it is possible to treat this pixel dependency as a Markov Random Field, which relates two main probabilities: transition
Intuition indicates that pixels
That is,
where:
For all
On the other hand, the likelihood that the neighboring pixels correspond to the other organs is practically null.
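Neighbour transition probabilities of this kind can be estimated directly from training masks by counting label pairs between 4-connected neighbours and normalizing each row. The sketch assumes a single-label coding of the masks (0 = background, 1 and up = organs), which is a simplification. Organ pairs that rarely touch receive near-zero probability, which matches the observation above.

```python
import numpy as np


def transition_matrix(labels: np.ndarray, n_states: int) -> np.ndarray:
    """Estimate T[a, b] = P(neighbour state b | pixel state a) from label maps.

    labels: (N, H, W) integer maps with values in [0, n_states)
    (hypothetical single-label coding). Counts 4-connected neighbour pairs.
    """
    counts = np.zeros((n_states, n_states), dtype=float)
    pairs = [
        (labels[:, :, :-1], labels[:, :, 1:]),  # horizontal neighbours
        (labels[:, :-1, :], labels[:, 1:, :]),  # vertical neighbours
    ]
    for a, b in pairs:
        np.add.at(counts, (a.ravel(), b.ravel()), 1)
        np.add.at(counts, (b.ravel(), a.ravel()), 1)  # count both directions
    counts += 1e-6                                    # avoid log(0) later on
    return counts / counts.sum(axis=1, keepdims=True)
```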
The observation probability is calculated based on the color of each pixel
Let
As can be seen, a maximum likelihood estimate based on these probabilities alone would be imprecise, because the observation probabilities of the stomach and the small bowel are similar; more information needs to be integrated.
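A simple way to obtain such observation probabilities from grayscale MR slices is to build a normalized intensity histogram per state. The sketch below assumes intensities scaled to [0, 1], a hypothetical bin count, and add-one smoothing. When two states produce overlapping histograms, as described for the stomach and the small bowel, their rows of the table are similar and intensity alone cannot separate them.

```python
import numpy as np


def intensity_likelihoods(images, labels, n_states, n_bins=32):
    """Estimate P(intensity bin | state) with per-state normalized histograms.

    images: (N, H, W) floats in [0, 1]; labels: (N, H, W) ints in [0, n_states).
    Returns an (n_states, n_bins) table; add-one smoothing keeps entries > 0.
    """
    bins = np.minimum((images * n_bins).astype(int), n_bins - 1)
    table = np.ones((n_states, n_bins))  # add-one (Laplace) smoothing
    np.add.at(table, (labels.ravel(), bins.ravel()), 1)
    return table / table.sum(axis=1, keepdims=True)
```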
In this work, the state of a pixel is conditioned not only by the probabilities described above but also by its spatial position in the image; that is, there are high-probability zones in which each organ can appear, as shown in the heatmaps (see Fig. 2).
Thus, the probability
Therefore, the final calculation of the probability that the state of the pixel
The proposed calculation takes into account the spatial, observational, and transition factors of the pixels. Algorithm 1 shows how these calculations are applied to segment a new image.
In this work, we replaced the product of probabilities with the sum of their logarithms to avoid numerical underflow.
Note that the purpose of the 2D-HMM is not segmentation itself, but the computation of useful probabilities that improve the performance of U-Net++. For this reason, we do not include a 2D adaptation of the Viterbi algorithm.
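The sketch below combines the three factors for a new image in log space, in the spirit of the procedure above. It is not a reproduction of Algorithm 1, whose exact formulas are not given here. Because there is no Viterbi decoding, the transition term uses a single-pass approximation: each of the four neighbours votes with a provisional label obtained from the spatial and observation terms. The inputs are hypothetical: a per-pixel spatial prior over states that includes a background state, an intensity likelihood table, and a neighbour transition matrix.

```python
import numpy as np


def hmm_pixel_probs(image, prior, obs_table, trans, n_bins=32):
    """Per-pixel state probabilities from spatial, observation and transition terms.

    image: (H, W) floats in [0, 1]; prior: (S, H, W) spatial prior per state;
    obs_table: (S, n_bins) P(intensity bin | state);
    trans: (S, S) with trans[a, b] = P(state b | neighbour state a).
    Products of probabilities are computed as sums of logarithms.
    """
    eps = 1e-12
    bins = np.minimum((image * n_bins).astype(int), n_bins - 1)
    log_score = np.log(prior + eps) + np.log(obs_table[:, bins] + eps)  # (S, H, W)

    # Transition term: single pass in which neighbours vote with provisional labels.
    provisional = log_score.argmax(axis=0)
    padded = np.pad(provisional, 1, mode="edge")
    neighbours = [padded[:-2, 1:-1], padded[2:, 1:-1],   # up, down
                  padded[1:-1, :-2], padded[1:-1, 2:]]   # left, right
    for nb in neighbours:
        log_score += np.log(trans[nb].transpose(2, 0, 1) + eps)  # (H, W, S) -> (S, H, W)

    log_score -= log_score.max(axis=0, keepdims=True)  # stable exponentiation
    probs = np.exp(log_score)
    return probs / probs.sum(axis=0, keepdims=True)
```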
3.2.3 Ensemble
To integrate the information from the U-Net++ and 2D-HMM models, a weighted ensemble layer is proposed, which uses the probabilities given by both models to enhance the classification.
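A weighted ensemble of two probability maps can be sketched as a convex combination followed by a threshold. Both the convex form, in which a hypothetical weight `w` scales the 2D-HMM term, and the 0.5 threshold are assumptions; the paper's exact definition may differ. Table 1 explores weight values from 0.05 to 0.40.

```python
import numpy as np


def weighted_ensemble(p_unet, p_hmm, w, threshold=0.5):
    """Blend U-Net++ and 2D-HMM per-pixel probabilities.

    Assumes p = (1 - w) * p_unet + w * p_hmm, with both inputs arrays of
    probabilities in [0, 1] of the same shape (e.g. (C, H, W)).
    Returns the blended probabilities and the thresholded binary masks.
    """
    p = (1.0 - w) * np.asarray(p_unet) + w * np.asarray(p_hmm)
    return p, (p >= threshold).astype(np.uint8)
```

A pixel that U-Net++ scores just below the threshold (0.48) but the 2D-HMM scores confidently (0.9) is pushed to 0.522 with `w = 0.1`, so it is recovered in the final mask.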
Let
The value of
4 Results and Analysis
The experimentation was carried out on the Google Colab platform using a Colab notebook with an Intel(R) Xeon(R) 6-core CPU @ 2.20GHz, NVIDIA A100-SXM GPU, and 12 GB of RAM.
For training and validating the models, we followed the Leave-One-Out Cross-Validation method, one of the methods recommended in the biomedical sciences for reliably assessing the predictive performance of models in clinical studies [4].
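The splitting logic of leave-one-out cross-validation fits in a few lines. Each unit is held out exactly once, and the model is trained on all the others. The sketch treats the units generically. Holding out whole patients (or scans) rather than single slices is an assumption made here to keep slices of the same patient from appearing in both sets. scikit-learn's `LeaveOneOut` and `LeaveOneGroupOut` in `sklearn.model_selection` produce equivalent splits.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_out(units: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training units, held-out unit) pairs, one fold per unit."""
    for k, held_out in enumerate(units):
        yield [u for i, u in enumerate(units) if i != k], held_out
```

The reported metric is then the average of the per-fold scores, at the cost of training one model per unit.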
The method consists of testing the model on a set of
One of the first tasks of the evaluation phase was to adjust the weight parameter
| Metric | Organ | Weight 0.05 | Weight 0.10 | Weight 0.20 | Weight 0.30 | Weight 0.40 |
|---|---|---|---|---|---|---|
| Dice | General | 0.811 | 0.799 | 0.788 | 0.771 | 0.742 |
| Dice | Stomach | 0.888 | 0.885 | 0.872 | 0.844 | 0.749 |
| Dice | Small Bowel | 0.812 | 0.791 | 0.759 | 0.701 | 0.601 |
| Dice | Large Bowel | 0.817 | 0.814 | 0.804 | 0.786 | 0.747 |
| IoU | General | 0.777 | 0.770 | 0.748 | 0.709 | 0.628 |
Table 2 shows the results of all the proposed models. Notably, the U-Net models that incorporate information from the Markov process achieve better results in both evaluation metrics, which supports our intuition for integrating them.
| Metric | Organ | 2D-HMM.U-Net++ | 2D-HMM.U-Net | U-Net++ | U-Net |
|---|---|---|---|---|---|
| Dice | General | 0.811 (32%) | 0.723 (34%) | 0.610 | 0.538 |
| Dice | Stomach | 0.888 (10%) | 0.803 (26%) | 0.808 | 0.635 |
| Dice | Small Bowel | 0.812 (38%) | 0.711 (29%) | 0.585 | 0.548 |
| Dice | Large Bowel | 0.817 (5.6%) | 0.774 (43%) | 0.773 | 0.538 |
| IoU | General | 0.777 (18%) | 0.696 (36%) | 0.657 | 0.511 |

Percentages in parentheses give the improvement of each ensemble over its individual model.
For example, in general Dice, the U-Net++ ensemble improves on U-Net++ by 32%, while the U-Net ensemble improves on its individual model by 34%. For IoU, the improvements are 18% for the U-Net++ ensemble and 36% for the U-Net ensemble.
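These improvement percentages are relative gains, (ensemble − base) / base. The short check below reproduces the two IoU figures from Table 2.

```python
def relative_improvement(ensemble: float, base: float) -> float:
    """Percentage improvement of an ensemble score over its individual model."""
    return 100.0 * (ensemble - base) / base


print(round(relative_improvement(0.777, 0.657), 1))  # IoU, 2D-HMM.U-Net++ vs U-Net++: 18.3
print(round(relative_improvement(0.696, 0.511), 1))  # IoU, 2D-HMM.U-Net vs U-Net: 36.2
```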
Finally, Table 3 compares the proposed 2D-HMM U-Net++ model with the recent works on segmenting biomedical images of the GI tract discussed in Section 2.
Metric | Models in the Literature | |||||||
2D-HMM U-Net++ | U-Net | Mask R-CNN | Resnet34 | LeViT384-UNet++ | 3D-ResUnet* | 3D U-Net* | ||
Dice | General | 0.81 | 0.51 | 0.72 | 0.79 | 0.79 | ||
Stomach | 0.88 | 0.77 | 0.81 | |||||
Small Bowel | 0.81 | 0.75 | ||||||
Large Bowel | 0.81 | 0.76 | ||||||
IoU | 0.77 | 0.85 | 0.72 |
Note that not every approach reports per-organ segmentation results as we do, even though segmenting the bowels is harder because of their physiology.
In addition, two of the models (marked with *) were evaluated on a different GI dataset. We nevertheless consider it important to include their results, since they pursue the same overall goal.
Our approach outperforms most of the other works in the evaluation metrics, except for the Resnet34 model, which reports only IoU; however, that work was evaluated with a traditional 80-20 partition, which can make the result highly dependent on the particular partition used.
Figures 6-8 and 7-9 show 2D and 3D visual examples of segmentation for a specific slice of an MR scan. In these examples, the ensemble improved the U-Net++ predictions by up to 19%. For example, the U-Net++ prediction, shown in the fourth image of Fig. 7, misses multiple organ details compared to the ground truth.
However, the weighted ensemble restores these details, as can be seen in the last column of the same figure set. In general, the proposed ensemble noticeably improves the quality of the segmentation. In summary, although U-Net++ has proven to be an effective architecture for organ segmentation, it struggles in regions of the GI tract where two or more organ classes have high likelihood, owing to how strongly GI organs deform with body movement and respiratory function.
This work integrates the probabilities of Hidden Markov Models to resolve the cases in which the base model fails to segment. Considering spatial and transition probabilities is the main difference between our work and related work.
5 Conclusions
Organ segmentation for the treatment of gastrointestinal tract cancer is an important task that requires precision and speed. It is vital to have algorithms that can help automate the process of segmentation, as support for medical specialists, to reduce collateral damage to healthy cells without increasing treatment times. However, segmenting GI tract organs remains complex due to the deformations they undergo from body movement and respiratory function.
This paper proposes a deep learning methodology that builds a weighted ensemble integrating U-Net++ and 2D-HMM models for semantic segmentation of the stomach and bowels. Although the 2D-HMM does not provide highly accurate segmentation by itself, it boosts U-Net++ predictions by up to 32% in general Dice and by up to 18% in IoU. The final general Dice score of 0.811 obtained by the ensemble is higher than the Dice results reported in the literature reviewed.
Furthermore, because we used Leave-One-Out cross-validation, the reported metrics are highly reliable for the dataset used. The proposed architecture has the potential to help implement more effective and efficient treatments for cancer patients by speeding up segmentation and minimizing risks.
Future work will consider integrating automatic contour refinement techniques or additional recurrent layers into the networks, which we believe could further improve the quality provided by the spatial and transition probabilities of the proposed ensemble. We also plan to replicate the proposed methodology on other datasets to evaluate its generalization.