1 Introduction
The estrous cycle is the period between two ovulations and determines the reproductive receptivity of the female. The short duration of the cycle in rats makes them an ideal model for investigating the changes that occur during the reproductive cycle [24,17,25,7]. Knowledge of the cycle is used to control rodent reproduction: obtaining high fertility rates, scheduling production for specific dates, determining the time of gestation, and obtaining embryos at specific ages. The estrous cycle in rats lasts four days and comprises four stages: Proestrus, Estrus, Metestrus, and Diestrus, which can be determined from the cell types observed in a vaginal smear [3].
The objective of this work is to provide a tool for the automatic detection of estrous cycle stages through image processing and neural networks. For this, we use manual and automatic feature extraction techniques, as well as different classifiers such as radial kernel Support Vector Machines (SVMs), Multilayer Perceptron networks (MLPs), and Convolutional Neural Networks (CNNs). The dataset, the feature extraction algorithms, and the neural network models are available in [10].
The rest of the paper is organized as follows. Section 2 provides a review of classical methods used for classifying the estrous cycle and shows the similarities between Papanicolaou (PAP) cells and cells of the estrous cycle. In Section 3 we review the feature extraction and the neural networks used. In Section 4 we show our approach for classifying the estrous cycle. We present several experiments and results, as well as a short discussion to demonstrate the effectiveness of our proposal. Finally, in Section 5 the conclusions and directions for future research are discussed.
2 Previous Work
Shannon L. Byers [3] describes a variety of methodologies used for classifying the estrous cycle, all of which require trained personnel (Table 1). The same paper presents a graphical tool for classifying the estrous cycle (Figure 1).
Methodology | Advantages | Disadvantages |
---|---|---|
Cytology | Identifies all stages. | The sample needs to be stained. |
Impedance | The cycle can be measured using a probe. | The method can classify only one stage. |
Observation | No extra equipment is required. | The method can classify only one stage. |
Claudia Caligioni [4] describes the procedure for classifying the estrous cycle from a vaginal smear.
Her work shows that it is not necessary to stain the sample in order to classify it. According to her work and that of Marcondes [18], the cell types that characterize each stage in the vaginal smear are:
— Proestrus: Predominance of nucleated epithelial cells (Figure 2-a). These cells may appear in clusters or individually [4].
— Estrus: Characterized by cornified squamous epithelial cells, which occur in clusters (Figure 2-b). There is no visible nucleus; the cytoplasm is granular, and the shape is irregular [4].
— Metestrus: A mix of cell types with a predominance of leukocytes and a few nucleated epithelial and/or cornified squamous epithelial cells (Figure 2-c) [4].
— Diestrus: This stage consists predominantly of leukocytes (Figure 2-d) [4].
At present, there is no automatic method for classifying the estrous cycle. Nevertheless, PAP cells are similar to estrous cycle cells (Figure 3), and a wide variety of algorithms exists for PAP cell analysis.
Mariana E. Plissiti [22] reports 99.39% accuracy in segmenting nuclei in PAP cells using the watershed transform. In the same year, she [21] presented a methodology for segmenting nuclei and overlapping cells using the H transform, SVM and C-Means. In 2015 [23] she presented a method for segmenting overlapping cytoplasms, based on RGB intensities obtained from super-pixels. Dabashree Kashyap [12] proposes a method for classifying PAP cells using geometric and texture features; it uses GLCM metrics and three SVMs, with the best result obtained using a polynomial kernel. Ling Zhang [32] presents a graph-based method for segmenting cells: it takes an image in the CIELAB color space (channel A), applies the three-threshold Otsu algorithm and, finally, uses a graph-cut approach to refine the segmentation. The results obtained are robust to non-bimodal distributions in the image histograms. Meng Zhao [33] proposes a classification method that analyzes the image in blocks slightly larger than a cell, using texture and histogram metrics; it reaches 98.98% accuracy and 95% sensitivity.
The detection of the estrous cycle is similar to the classification of PAP cells. The main difference is that the classification of the estrous cycle is based on the quantity of each kind of cell present in the sample [4,18], whereas PAP classification is based on the morphology of the cells. We took the PAP work as a starting point. Our proposal eliminates the bias of manually designed feature extractors by using automatic feature extraction, and achieves a completely automatic classification through state-of-the-art neural networks.
3 Methodology
This section describes the different approaches used for classifying the four aforementioned stages of the estrous cycle. As a first approach, SVMs and MLPs are trained on the manually extracted features described in Section 3.2. Subsequently, two Convolutional Neural Network architectures are trained: the first, the LeNet-5 network, from scratch, and the second, the VGG16 network, starting from a model pre-trained on the ImageNet dataset [8].
3.1 Pre-Processed Features
The cell image is converted to the CIELAB color space and to greyscale. From CIELAB, the A and B channels are taken, which represent the variation between red and green and between blue and yellow, respectively. A median filter with a 5x5 window is applied to remove noise. Otsu's method [20] is then applied to segment the regions of interest.
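The filtering and thresholding steps can be sketched in plain Python as follows. This is a minimal illustration, not the code published in [10]: the function names, the clamped-window border handling, and the 8-bit grey-level range are assumptions, and the CIELAB conversion is left out.

```python
from statistics import median


def median_filter(img, k=5):
    """Median filter with a k x k window; windows are clipped at the borders (assumption)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = int(median(window))
    return out


def otsu_threshold(pixels, levels=256):
    """Return the grey level that maximizes the between-class variance (Otsu's method)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the values of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```

Pixels with a value greater than the returned threshold would then form one class of the binary mask and the remaining pixels the other.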
3.2 Manual Feature Extraction
For the extraction of texture features, we used the GLCM algorithm [5,9] with steps δ ∈ {1, 5} and angles θ ∈ {0°, 45°, 90°, 135°}. The GLCM metrics used were:
Contrast:
Dissimilarity:
Homogeneity:
Energy:
where $P_{i,j}$ (the co-occurrence probability between grey levels $i$ and $j$) is defined as:

$$P_{i,j} = \frac{G_{i,j}}{\sum_{m}\sum_{n} G_{m,n}}$$

where $G_{i,j}$ represents the number of occurrences of the grey-level pair $(i, j)$ for a given pair (δ, θ). The matrix $G$ is calculated by counting how many times a pixel with grey level $i$ has a pixel with grey level $j$ at step δ in the direction θ. Each element of the matrix therefore represents the number of transitions from one intensity to another (Figure 4).
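As an illustration, the co-occurrence matrix and the four metrics can be computed as in the sketch below. The formulas in the code are the commonly used definitions (energy is taken as the square root of the sum of squared probabilities, as in scikit-image); they are assumed here rather than taken from the original equations. The non-symmetric counting and the function names are also assumptions.

```python
import math

# Pixel offsets (dx, dy) for each angle, scaled by the step delta.
# Image rows grow downwards, so "up" is a negative dy.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def glcm(img, delta, theta, levels):
    """Count G[i][j]: grey level i at (x, y) and grey level j at the offset pixel."""
    dx, dy = (delta * c for c in OFFSETS[theta])
    g = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                g[img[y][x]][img[y2][x2]] += 1
    return g


def glcm_metrics(g):
    """Normalize G into P and evaluate the four texture metrics."""
    total = sum(map(sum, g))
    n = len(g)
    cells = [(i, j, g[i][j] / total) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p * (i - j) ** 2 for i, j, p in cells),
        "dissimilarity": sum(p * abs(i - j) for i, j, p in cells),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "energy": math.sqrt(sum(p * p for _, _, p in cells)),
    }


# Small 4-level example image.
example = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 2, 2, 2],
           [2, 2, 3, 3]]
metrics = glcm_metrics(glcm(example, delta=1, theta=0, levels=4))
```

For the example image, 12 horizontal pairs are counted, and the contrast evaluates to 7/12.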
The morphological metrics obtained were the number of elements, the average compactness (Equation 8) and the total compactness (Equation 7). This gives a total of 105 features: 35 for each of the A and B channels of CIELAB and 35 for the greyscale image.
Compactness factor:
Total compactness:
Average compactness:
where Nl is the number of connected elements in the binary image.
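The sketch below illustrates how the three morphological features and the feature count fit together. The compactness formula used, 4πA/P² (equal to 1 for a perfect circle), is a common definition and is only an assumption about the one used here; the region measurements are passed in as hypothetical (area, perimeter) pairs. The count of 35 features per channel is consistent with 4 metrics, 2 steps and 4 angles plus the 3 morphological features.

```python
import math


def compactness(area, perimeter):
    """Assumed compactness factor: 4*pi*A / P^2 (1.0 for a perfect circle)."""
    return 4 * math.pi * area / perimeter ** 2


def morphological_features(regions):
    """regions: list of (area, perimeter) pairs, one per connected element."""
    values = [compactness(a, p) for a, p in regions]
    n = len(values)
    total = sum(values)
    return {"count": n, "total": total, "average": total / n if n else 0.0}


# Feature bookkeeping: 4 GLCM metrics x 2 steps x 4 angles, plus 3 shape features.
TEXTURE_FEATURES = 4 * 2 * 4
FEATURES_PER_CHANNEL = TEXTURE_FEATURES + 3
TOTAL_FEATURES = FEATURES_PER_CHANNEL * 3  # channels A, B and greyscale
```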
3.3 Classification
Nowadays, there is a great diversity of classifiers. These are divided into two large branches, supervised and unsupervised. Among the unsupervised methods we find algorithms such as k-Means, Self-organizing maps, and Hidden Markov models.
Supervised algorithms are more widely used for classification because, when labeled data are available, they generally achieve higher classification performance. Within this branch, we find the Multi-Layer Perceptron (MLP), the Support Vector Machine (SVM), Deep Neural Networks (DNNs) and the Convolutional Neural Network (CNN).
These classifiers differ in nature. SVMs excel at classifying patterns (feature vectors), but they perform poorly when classifying raw images. MLPs and DNNs can classify both patterns and single-channel images; however, these models tend to be large and difficult to train.
Finally, CNNs have proved in recent years to be a very effective tool for image classification with automatic feature extraction.
3.3.1 Deep Neural Networks
Deep Neural Networks (DNNs) are built from a computation unit called the Perceptron, defined by Equation 9. Its objective is to separate two classes through a hyperplane.

$$F\left(\sum_{i} w_i x_i + b\right) \qquad (9)$$

where $x_i$ represents the $i$th element of the input vector $x$, $w_i$ are the trainable synaptic weights, $b$ is the bias and $F$ represents a non-linear activation function [30,27,26]. In our case, $F$ is the ReLU function, $\mathrm{ReLU}(y) = \max(0, y)$ [19].
A DNN usually consists of more than three intermediate layers, each with a variable number of Perceptrons, as shown in Figure 5. These networks are trained by stochastic gradient descent.
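The forward pass of such a network can be sketched as follows: each Perceptron computes a weighted sum of its inputs plus a bias and applies the ReLU activation, and each layer feeds the next. The function names and the list-based weight layout are illustrative assumptions, and training is not shown.

```python
def relu(y):
    """ReLU activation: max(0, y)."""
    return max(0.0, y)


def perceptron(x, w, b, activation=relu):
    """Single unit: activation of the weighted sum of the inputs plus the bias."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


def layer(x, weights, biases, activation=relu):
    """One layer: a list of Perceptrons that all receive the same input vector."""
    return [perceptron(x, w, b, activation) for w, b in zip(weights, biases)]


def forward(x, layers):
    """Propagate x through a list of (weights, biases) layers."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x


# Hypothetical two-layer network with hand-picked weights.
network = [
    ([[0.5, 1.0], [-1.0, 0.5]], [0.25, 0.0]),  # 2 inputs -> 2 hidden units
    ([[1.0, 1.0]], [0.0]),                      # 2 hidden units -> 1 output
]
output = forward([1.0, 2.0], network)
```

With the input [1.0, 2.0], the hidden units give 2.75 and 0.0, so the output unit returns 2.75.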
3.3.2 Support Vector Machine (SVM)
An SVM is a discriminative classifier whose objective is to find a separating hyperplane between two classes, defined by Equation 11. The hyperplane is chosen so as to maximize the separation distance between the two classes. For this, the SVM uses transformation functions called kernels. The two most common kernel functions are the linear kernel (Kl), Equation 12, and the Gaussian-like radial basis kernel (Kg), Equation 13 [6,2].
where αk are the parameters to be adjusted, xk is the training pattern. The K function is a predefined kernel and x is the support pattern.
where x is the training pattern, μ is the center of the Gaussian, σ the variance, and • is the dot product.
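For illustration, the two kernels and a kernel-based decision function can be written as below. These are the textbook forms and may differ in detail from Equations 11 to 13: in particular, the 2σ² scaling of the radial kernel (σ treated as a standard deviation) and the presence of a bias term in the decision function are assumptions.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: dot product of the two patterns."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, mu, sigma):
    """Gaussian radial basis kernel centred at mu with width sigma (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, mu))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def decision(x, patterns, alphas, bias, kernel):
    """Weighted sum of kernel evaluations; the sign gives the predicted class."""
    return sum(a * kernel(x, p) for a, p in zip(alphas, patterns)) + bias
```

A radial kernel with a fixed width can be passed to `decision` as `lambda x, p: rbf_kernel(x, p, sigma=1.0)`.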
3.3.3 Convolutional Neural Networks (CNNs)
CNNs are a relatively new branch of machine learning, introduced by LeCun in the 1990s [15,16,14]. This type of architecture has become the default option for image analysis, classification and processing problems [29,28,31,13,11].
The success of these architectures is due to their feature extractors, which are trained automatically. These feature extractors are convolutional filters which, by means of stochastic gradient descent training, "learn" the characteristic features of each class. The layers of convolutional filters are called feature extraction layers. These layers also use a sampling-based discretization element, called max-pooling, which helps to reduce the dimensionality of the elements to be classified while preserving the most prominent features.
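The two building blocks of a feature extraction layer can be illustrated with a minimal sketch. The "convolution" below is the unflipped sliding-window product used by most deep learning libraries, without padding; the function names and the example filter are assumptions made for illustration.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(img[y + j][x + i] * kernel[j][i]
                 for j in range(kh) for i in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]


def max_pool(fmap, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[y + j][x + i] for j in range(size) for i in range(size))
             for x in range(0, w - size + 1, stride)]
            for y in range(0, h - size + 1, stride)]


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
pooled = max_pool(feature_map)                            # 4x4 -> 2x2
edges = conv2d_valid(feature_map, [[1, 0], [0, -1]])      # 4x4 -> 3x3
```

Here `pooled` keeps only the largest value of each 2x2 window, which is how max-pooling halves each spatial dimension.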
The second element that constitutes these networks is the classification layer; usually, this layer is implemented by a Multi-Layer Perceptron (MLP) of two or three layers.
The trainable parameters of this type of network are the synaptic weights of the MLP classification layer and the weights of the convolutional filters in the feature extraction layers. All these parameters are trained end-to-end by gradient-based optimization, with the gradients computed through back-propagation. The general architecture of this type of network is shown in Figure 6.
4 Experiments and Results
In this section, we describe the results of evaluating the different classifiers on two sets: the first is composed of the four stages of the estrous cycle (Figure 7) and the second of two classes. In the second set, the first class is composed of the Proestrus and Estrus stages (Figures 7-a and 7-b) and the second class of the Metestrus and Diestrus stages (Figures 7-c and 7-d). This second set groups the stages according to the reproductive receptivity of the female rat [3]. For implementation and classification details, consult [10].
4.1 Dataset
The dataset consists of 32 images of the estrous cycle, 8 of each stage. The images were taken with a Logitech C170 camera through a microscope at 400x magnification, obtained with a 10X ocular and a 40X objective, which helped provide good image quality [18]. Each image was divided into 16 sub-images. Each sub-image was labeled by the staff of BUAP's Claude Bernard vivarium. The sub-images that did not contain an adequate number of cells were discarded, resulting in a total of 89 images of the Diestrus stage, 125 of the Metestrus stage, 112 of the Estrus stage and 86 of the Proestrus stage, with dimensions of 100x90 pixels (Figure 7).
4.2 Classification Methodology
The classification process is divided into two stages. In the first, we use the 105 manual features resulting from the texture and shape analysis described in Section 3.2. To classify these features, we used a DNN and an SVM with a radial basis kernel, which are well suited to classifying this type of information.
In the second classification stage, the LeNet-5 CNN (from scratch) and the pre-trained VGG16 are trained. For these networks, the cell images are used in RGB format, resized to 150x150 and normalized by dividing each pixel value by 255.
Since we are using three different types of classifiers (DNN, SVM and CNN), and in order to have a valid point of comparison between the different architectures, the original dataset was treated in the same way for all tests. From the total of 412 images, three mutually exclusive sets were generated at random: 280 images for training, 70 for validation, and 62 for testing. These three sets were used without modification for the DNN and the SVM.
Because of the small number of images (412 in total and 280 for training), we chose to artificially expand the training set to a total of 5600 images, generating 19 artificial images for each original image by applying the following transformations: rotation in the range [0, 180] degrees, and horizontal and vertical flips. In summary, only the training set for the CNNs was expanded; the validation and test sets are the same for all the trials.
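The split sizes and the size of the augmented training set can be checked with a short sketch. The seed, the function names and the list-of-rows image representation are illustrative assumptions; rotation is omitted because it requires interpolation, so only the two flips are shown.

```python
import random


def split_indices(n, n_train, n_val, seed=0):
    """Shuffle the indices 0..n-1 and cut them into three disjoint sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def horizontal_flip(img):
    """Mirror each row of an image stored as a list of rows."""
    return [row[::-1] for row in img]


def vertical_flip(img):
    """Reverse the order of the rows."""
    return img[::-1]


train, val, test = split_indices(412, 280, 70)
# Each original training image is kept and 19 artificial variants are added.
augmented_size = len(train) * (1 + 19)
```

With 280 training images, `augmented_size` is 280 x 20 = 5600, which matches the figure reported above.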
4.2.1 SVM Classification
For the classification of the manual features, a random search was generated over the μ and σ parameters of the radial basis kernel [1]. The samples follow a uniform distribution on a logarithmic scale in the range [-10, 10], with a density of 10,000 samples. The classification result of this architecture is shown and compared in Section 4.3.
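One way to draw such samples is to pick the exponent uniformly and then exponentiate, as sketched below. The base 10, the seed and the function names are assumptions; the two sampled values stand for the two kernel parameters named in the text.

```python
import random


def sample_log_uniform(rng, low_exp=-10, high_exp=10):
    """Draw the exponent uniformly, so the value is uniform on a log scale."""
    return 10 ** rng.uniform(low_exp, high_exp)


def svm_search_samples(n=10_000, seed=0):
    """Generate n random pairs of kernel parameters."""
    rng = random.Random(seed)
    return [(sample_log_uniform(rng), sample_log_uniform(rng)) for _ in range(n)]


samples = svm_search_samples()
```

Each pair would then be used to train one SVM, keeping the pair with the best validation score.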
4.2.2 DNN Classification
As for the SVM, a random hyper-parameter search was generated for the DNN. This search covers the number of intermediate layers in the range [1, 6], the number of perceptrons per layer in the range [1, 250], and the learning rate in the range [0.00001, 0.1]. The values were sampled from a uniform distribution. The density of the search is 10,000 elements, that is, 10,000 architectures. The classification result of this architecture is discussed in Section 4.3.
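A single draw from this search space might look like the sketch below. The dictionary layout and the choice of sampling the width of each layer independently are assumptions; the ranges are those given above.

```python
import random


def sample_dnn_config(rng):
    """Draw one architecture: depth, width of each layer, and learning rate."""
    n_layers = rng.randint(1, 6)  # both ends inclusive
    return {
        "hidden_units": [rng.randint(1, 250) for _ in range(n_layers)],
        "learning_rate": rng.uniform(0.00001, 0.1),
    }


rng = random.Random(0)
configs = [sample_dnn_config(rng) for _ in range(10_000)]
```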
4.2.3 LeNet-5 CNN Classification
As an alternative to the two previous classification methods, we decided to use an architecture better suited to image classification: a Convolutional Neural Network, which performs both automatic feature extraction and classification. This architecture (LeNet-5) is composed of a first block of 6 convolutional filters of 5x5, followed by a (2, 2) max-pooling with a stride of 2. The second block consists of 16 convolutional filters of 5x5, followed by a (2, 2) max-pooling with a stride of 2. The third block consists of 120 convolutional filters of 5x5. All the activations are ReLU functions.
The classification layer consists of an MLP of two layers: a first layer of 84 neurons and an output layer of 4 neurons for the four-class problem, or a single output neuron for the two-class problem. The results of this architecture are shown and discussed in Section 4.3.
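The size of the feature maps after each block can be worked out with a short calculation. The sketch assumes unpadded convolutions with a stride of 1 and the 150x150 RGB input described in Section 4.2; neither the padding nor the convolution stride is stated in the text, so the resulting sizes are indicative only.

```python
def conv_out(size, kernel=5, stride=1):
    """Output side of an unpadded convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    """Output side of a max-pooling layer."""
    return (size - pool) // stride + 1


def lenet5_shapes(side=150, channels=3):
    """Return (name, side, channels) after each block of the network."""
    shapes = [("input", side, channels)]
    blocks = [("block1", 6, True), ("block2", 16, True), ("block3", 120, False)]
    for name, filters, pooled in blocks:
        side = conv_out(side)
        if pooled:
            side = pool_out(side)
        shapes.append((name, side, filters))
    return shapes


shapes = lenet5_shapes()
flattened = shapes[-1][1] ** 2 * shapes[-1][2]
```

Under these assumptions the side length goes 150, 73, 34, 30, so the classifier would receive 30 x 30 x 120 = 108,000 values.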
4.2.4 VGG16 Classification
At this stage, and in view of the results of the previous classifiers, we decided to use a VGG16 network [29] pre-trained on the ImageNet dataset [8]. This choice rests on two premises: first, to take advantage of the prior knowledge of the pre-trained convolutional filters and, second, that transfer learning and artificial data augmentation can reduce the high degree of overfitting shown by the previous classifiers.
Two tests were carried out with the VGG16 architecture. The first used all the pre-trained feature extraction blocks (convolutional and max-pooling layers) and trained only the classifier at the end of the network. The second removed the last feature extraction block, which consists of three convolutional layers and one max-pooling layer.
This was done to discard the high-level feature abstraction of the last convolutional layers and take advantage of the lower-level features of the earlier convolutional layers, since these features are more suitable for classifying basic elements such as cells. The MLP classifier added to this architecture consists of two layers: 100 and 4 neurons for the four-class problem, and 100 and 1 neuron for the two-class problem.
4.3 Results
In this section, we present the results of classifying the two datasets (two and four classes) with the classifiers described above.
4.3.1 Classification of 4 (Estrous Cycle) Classes
This problem is the most interesting and the most difficult one: each class corresponds to one stage of the estrous cycle (Proestrus, Estrus, Metestrus, or Diestrus). The results of each classifier are presented in Table 2, which shows the classification percentages for MLP, SVM, LeNet-5, VGG-16 and Modified VGG-16. In summary, for this specific problem, the convolutional neural networks show a greater capacity for automatic feature abstraction, and these features are more representative than the manually selected features (see Section 3.2).
4.3.2 Two-Class Classification
In this section, we present the classification results for the two-class problem, in which the Proestrus and Estrus stages form the first class and the Metestrus and Diestrus stages form the second.
Table 3 shows the Micro and Macro F1 scores for each model. The classification percentages for this set are higher than those shown in Table 2, and the model with the best classification percentage is again the modified VGG-16.
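For reference, the two scores can be computed from the true and predicted labels as in the sketch below. The labels in the example are invented for illustration and are not results from this work.

```python
def f1(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred, labels):
    """Micro F1 pools the counts of all classes; macro F1 averages per-class F1."""
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical labels: P = Proestrus, E = Estrus, M = Metestrus, D = Diestrus.
y_true = ["P", "P", "E", "E", "M", "D"]
y_pred = ["P", "E", "E", "E", "M", "M"]
micro, macro = micro_macro_f1(y_true, y_pred, ["P", "E", "M", "D"])
```

In single-label classification the micro F1 equals the accuracy (4/6 in this example), while the macro F1 weights every class equally and is therefore pulled down by the class that is never predicted correctly.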
4.4 Generalization Analysis
In this section, we analyze which models generalize best. This is necessary because, owing to the shortage of training data, all the aforementioned models overfit to some degree during training. In addition, although the Micro and Macro F1 scores are reliable metrics for measuring the performance of a network, here they are skewed, since they were calculated on a reduced set of only 62 images. We therefore discuss which model is the best one to use for classifying the estrous cycle, which is not the model that obtained the highest classification percentage.
From Figure 8, we can conclude that the architecture with the lowest overfitting is the LeNet-5 network (shown in red), because it has the smallest difference between the validation and test percentages, with a classification percentage greater than 80%. The Modified VGG-16 network (VGG-16-M) also turns out to be a good option, with a difference of 0.022% between the test and validation percentages and 82% in test.
Notice that these percentages differ from those shown in Table 2. The latter are the highest classification rates obtained, but they come with up to 15% of overfitting, whereas the percentages shown in Figure 8 are those of the architectures with the lowest degree of overfitting.
Regarding the classification of the second dataset (2 classes), Figure 9 shows that, as far as overfitting is concerned, any of the architectures used is a good option, because the difference between the maximum and minimum percentage of overfitting is only 0.04 percent. This leads us to focus on the neural network with the highest classification percentage, which in this case is the Modified VGG-16 network with 98.38%; it turns out to be the best option for the 2-class dataset.
5 Conclusion and Future Work
The contribution of this work is the automatic classification of the estrous cycle through automatic feature extraction. We presented a methodology for the automatic classification of the estrous cycle into its four stages (Proestrus, Estrus, Metestrus, and Diestrus), as well as for the classification of the stages with high hormonal levels (Estrus and Proestrus), which are used for population control. In the first case, an accuracy of 82% was reached, while in the second case it was 98.38%.
The results obtained are considered sufficient to solve the problem of population control. However, the classifiers have problems separating the Metestrus and Diestrus stages, so it is necessary to improve the features for these stages by increasing the number of images. In this direction, we show that the automatic feature extraction of the CNNs is more robust than the proposed manual feature extraction in terms of generalization, which translates into greater reliability of the classification. As future work, we intend to expand the dataset in order to provide a reliable classification of the estrous cycle using convolutional neural networks, as well as to design an expert system based on the graphical tool proposed by Byers [3].