1 Introduction
As a delta state, Bangladesh has a large number of rivers spread across the country. Fish production supports the everyday lives of millions of people in Bangladesh, and fish is considered the second most valuable agricultural crop in the country [13]. Of the roughly 32,000 fish species worldwide, almost 40% live in freshwater. Bangladesh alone has 401 marine species and 251 inland (freshwater and brackish-water) species [7]. Fish therefore plays an important role in the lives of our people.
Hence, fish processing companies need a better classification system that can identify fish on a conveyor belt. In addition, shape information must be collected for each fish class so that the processing system can process or pack fish of the same size together. Backgrounds pose a further problem, because varying backgrounds make classification more difficult.
Spectral characteristics have been used [23] to overcome underwater turbulence and other noise. Underwater imaging suffers from variations in luminosity, fish camouflage, dynamic backgrounds, water murkiness, low resolution, shape deformations of swimming fish, and subtle differences between some fish species [16]. A background-independent fish classification (FC) therefore requires segmentation [24, 1] based on shape, texture, and color. Some techniques divide a fish into several parts, such as the head, body, and tail [6, 2]. The Salp Swarm Algorithm (SSA) and Otsu's thresholding method also produce satisfactory FC results [15].
A system based on the median-cut algorithm has been proposed to improve the boundaries of segmented regions and contour extraction [22]. Many recent papers, however, build feature descriptors through transfer learning with pre-trained models such as VGG16 [3, 9] and AlexNet [2].
After feature extraction, various neural network classifiers are applied, such as Artificial Neural Networks (ANN) [24, 17, 12], Convolutional Neural Networks (CNN) [3, 18, 21, 8, 30, 27, 11], and Deep Learning Networks (DLN) [2]. Many recent papers also perform FC with machine learning techniques: Support Vector Machine (SVM) [28], which has also been used as the classifier for the Hybrid Linear Binary Pattern (HLBP) feature descriptor [29], K-Nearest Neighbors (KNN) [19], Decision Tree (DT) [28], and Naive Bayes as a fusion layer of a DLN [2].
Segmentation removes most of the unnecessary features and keeps only the regions of interest. For this reason, the proposed model uses U2-Net to remove the background and the HOG feature extractor to extract features from the selected region. A classification model then assigns each fish to its class.
This paper makes the following contributions: (i) efficient fish classification after removing varying backgrounds with a transfer learning technique (U2-Net), (ii) HOG feature extraction from both shape (binary) and color images, and (iii) a new fish dataset containing 2,678 samples of five classes.
The rest of the article is organized as follows. Section 2 describes the methodology and discusses the proposed classifier in detail, Sections 3 and 4 present the dataset preparation and the result analysis respectively, and Section 5 concludes the article.
2 Methodology
The proposed technique has a four-layer structure, shown in fig. 1. The preprocessing stage prepares the images and produces two versions of each: (i) a shape-focused image and (ii) a color image, both with the background removed. In fig. 1 these are named the background-removed binary image and the background-removed color image, respectively.
HOG generates a feature array for each of the two image types, as detailed in the following sections. Because there are two feature arrays, two ensemble stacking classifiers are needed, one per array (fig. 1). Finally, the decision-making layer combines the outputs of the two ensembles and selects the class with the highest predicted probability.
2.1 Preprocessing
First, an original fish image is taken from our dataset (fig. 2(a)). The original image is very large (4624x2136), so it is resized to 140x300 for computational efficiency, approximately preserving the aspect ratio (fig. 2(b)).
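As a quick check on the resizing step, the sketch below computes the height that exactly preserves the source aspect ratio when the width is scaled to 300 pixels. It assumes that 4624x2136 is width x height and that 140x300 is height x width; the helper name `target_height` is hypothetical.

```python
def target_height(src_width: int, src_height: int, target_width: int) -> int:
    """Height (in pixels) that preserves the source aspect ratio at target_width."""
    return round(target_width * src_height / src_width)


src_w, src_h = 4624, 2136  # reported camera resolution, assumed width x height
h = target_height(src_w, src_h, 300)
print(h)  # 139
print(f"{src_w / src_h:.3f} vs {300 / 140:.3f}")  # 2.165 vs 2.143
```

The exact aspect-preserving height is 139 rows, so 140x300 matches the source ratio to within about 1%. If OpenCV is used for the actual resize, note that `cv2.resize` takes the destination size as `(width, height)`, i.e. `(300, 140)`.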
Varying backgrounds are a problem for FC: most of the image area plays no role in classification, so the classifier cannot identify the true region of interest.
To overcome this issue, the proposed method adopts U2-Net [26], a deep learning model for salient object detection, as a transfer learning approach. U2-Net captures contextual information from an image at multiple scales.
Applying U2-Net produces a mask image (fig. 2(d)), which is then used to remove the background from the original image (fig. 2(c)).
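U2-Net itself needs pre-trained weights, so the sketch below covers only the step after it: turning a saliency map into the binary mask (as in fig. 2(d)) and the background-removed color image (as in fig. 2(c)). It assumes the network output is an H x W map scaled to [0, 1]; the 0.5 threshold and the name `split_with_mask` are hypothetical choices, not taken from the paper.

```python
import numpy as np


def split_with_mask(image: np.ndarray, saliency: np.ndarray, threshold: float = 0.5):
    """Return (binary mask image, background-removed color image).

    image:     H x W x 3 uint8 RGB image
    saliency:  H x W float saliency map in [0, 1] (assumed U2-Net-style output)
    threshold: hypothetical cut-off used to binarize the saliency map
    """
    mask = (saliency >= threshold).astype(np.uint8)  # 1 = fish, 0 = background
    removed = image * mask[..., np.newaxis]          # zero out background pixels in every channel
    return mask * 255, removed                       # mask scaled to a 0/255 black-and-white image


# Synthetic example: a 4 x 4 image whose centre 2 x 2 block is "salient"
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
sal = np.zeros((4, 4))
sal[1:3, 1:3] = 0.9
binary_img, color_img = split_with_mask(img, sal)
print(binary_img)
```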
2.2 Feature Extraction
The HOG [10] descriptor then extracts features from fig. 2(c) and fig. 2(d). HOG computes the magnitude and direction of the image gradient at each pixel, divides the image into small cells, and accumulates the gradient magnitudes of each cell into a histogram of orientations, which is normalized over overlapping blocks of cells. It therefore encodes not only edge strength but also edge direction. For this reason, the proposed method applies HOG to both fig. 2(c) and fig. 2(d).
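The HOG steps above can be sketched in NumPy as a simplified HOG: centred gradients, unsigned orientations binned into per-cell histograms weighted by gradient magnitude, and L2 normalization over overlapping 2x2-cell blocks. The paper does not state its HOG parameters; 8-pixel cells, 9 bins, and 2x2 blocks are assumed (they follow the original HOG formulation), the input is assumed to be grayscale, and `simple_hog` is a hypothetical name. A complete implementation is available as `skimage.feature.hog`.

```python
import numpy as np


def simple_hog(gray: np.ndarray, cell: int = 8, bins: int = 9, block: int = 2) -> np.ndarray:
    """Simplified HOG for a 2-D grayscale image (parameters are assumptions)."""
    g = gray.astype(np.float64)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]  # centred horizontal derivative [-1, 0, 1]
    gy[1:-1, :] = g[2:, :] - g[:-2, :]  # centred vertical derivative
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)

    n_cy, n_cx = g.shape[0] // cell, g.shape[1] // cell
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            # each pixel votes into its orientation bin, weighted by gradient magnitude
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(), weights=mag[sl].ravel(), minlength=bins)

    feats = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + 1e-12))  # L2 block normalization
    return np.concatenate(feats) if feats else np.zeros(0)


# A 16 x 16 vertical step edge: all gradient energy falls into orientation bin 0
step = np.zeros((16, 16))
step[:, 8:] = 255.0
f = simple_hog(step)
print(f.shape)  # (36,)
```

For the 140x300 images used here, this configuration gives 17 x 37 cells and 16 x 36 blocks of 36 values each, i.e. a feature vector of length 20,736.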
Applying HOG to fig. 2(c) yields fig. 2(e), which shows both inner and outer edges: since the source is an RGB image, the output contains the edges of the fish body (inner edges) as well as its shape (outer edges). In contrast, fig. 2(f), obtained from fig. 2(d), captures only the outer edges, because fig. 2(d) is a black-and-white image and no inner edges remain. The feature extraction stage thus produces the two types of features illustrated above, named Feature array 1 and Feature array 2 (fig. 1), which are used to classify the fish separately.
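Why the binary mask yields only outer edges can be checked with plain gradients on synthetic data. The sketch below builds a hypothetical 12x12 scene with a square "fish" region, once as a binary mask and once with a horizontal intensity ramp standing in for body texture, and tests whether pixels away from the boundary have a non-zero gradient. It uses `np.gradient` rather than full HOG, so it illustrates the principle only; `edge_pixels` is an illustrative name.

```python
import numpy as np


def edge_pixels(gray: np.ndarray) -> np.ndarray:
    """Boolean map of pixels whose central-difference gradient is non-zero."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy) > 0


# Hypothetical 12 x 12 scene: a 6 x 6 "fish" region on an empty background
mask = np.zeros((12, 12))
mask[3:9, 3:9] = 1.0                                              # binary image, like fig. 2(d)
ramp = np.tile(np.arange(12, dtype=np.float64) * 10.0, (12, 1))   # stand-in body texture
textured = mask * ramp                                            # background removed, like fig. 2(c)

inner = np.zeros((12, 12), dtype=bool)
inner[5:7, 5:7] = True  # pixels away from the fish boundary

print(edge_pixels(mask)[inner].any())      # False: the mask has no inner edges
print(edge_pixels(textured)[inner].any())  # True: texture produces inner edges
```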
2.3 Ensemble for Classification
In the preprocessing stage, most of the unnecessary parts of an image are removed by the salient object detection technique. We therefore obtain feature arrays of selected regions 1 and 2, shown in fig. 2(e) and fig. 2(f) respectively, with minimal extra area. These regions satisfy our feature-selection goal because the features contain only fish details. The proposed model therefore uses shallow machine learning approaches, namely SVM, KNN, Logistic Regression, and Decision Tree, to build the ensemble classifier.
An ensemble classifier combines multiple individual classifiers, which may be homogeneous or heterogeneous. One such ensemble model is the stacking classifier, which has two parts: (i) base learners, which form the training layer, and (ii) a meta learner, which forms the decision layer. The proposed stacking ensemble uses SVM, KNN, Logistic Regression, and Decision Tree as base learners, which produce four separate predictions on the dataset, and a Logistic Regression model as the meta learner, which decides the final result from those predictions.
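A stacking ensemble of this shape can be sketched with scikit-learn's `StackingClassifier`. The paper does not report hyperparameters, so the sketch uses library defaults (plus `probability=True` for SVM and a larger `max_iter` for Logistic Regression), assumes the base learners' class probabilities are stacked, and trains on synthetic data from `make_classification` in place of a real HOG feature array. `build_stacking_model` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stacking_model(seed: int = 0) -> StackingClassifier:
    """Stacking ensemble: SVM, KNN, LR and DT base learners, LR meta learner."""
    base_learners = [
        ("svm", SVC(probability=True, random_state=seed)),  # probability=True enables predict_proba
        ("knn", KNeighborsClassifier()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=seed)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # meta learner
        stack_method="predict_proba",  # meta learner sees class probabilities
        cv=5,  # out-of-fold predictions are used to train the meta learner
    )


# Synthetic 5-class data standing in for one HOG feature array (not the paper's data)
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, stratify=y, random_state=0)

model_1 = build_stacking_model().fit(X_tr, y_tr)
proba = model_1.predict_proba(X_te)  # shape (n_test, 5), passed on to the decision layer
test_acc = np.mean(model_1.predict(X_te) == y_te)
print(proba.shape, round(test_acc, 3))
```

With five classes and `stack_method="predict_proba"`, the meta learner receives 4 x 5 = 20 probability features per sample. Those features come from internal cross-validation (`cv=5`), so the meta learner is not trained on the base learners' in-sample predictions.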
Because there are two types of selected regions, two stacking ensembles are trained and tested, one on each set of regions. As illustrated in fig. 1, models 1 and 2 are generated by this technique from feature arrays 1 and 2, respectively.
2.4 Decision Making
Models 1 and 2 feed the final decision layer. Each model outputs a probability for each of the five fish classes, giving 10 probability values in total. This layer applies a max voting approach over these 10 values: the class with the maximum probability is chosen as the final output class.
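The max-probability rule can be sketched in NumPy: concatenate the two (n, 5) probability arrays into an (n, 10) array, take the argmax per row, and map the column index back to a class with modulo 5. The class order and the example probabilities are hypothetical, and `decide` is an illustrative name.

```python
import numpy as np

CLASSES = ["Chitol", "Mrigel", "Rui", "Silver carp", "Telapia"]  # class order assumed


def decide(proba_model1: np.ndarray, proba_model2: np.ndarray) -> np.ndarray:
    """Per sample, pick the class with the highest of the 10 probability values."""
    stacked = np.concatenate([proba_model1, proba_model2], axis=1)  # shape (n, 10)
    n_classes = proba_model1.shape[1]
    return np.argmax(stacked, axis=1) % n_classes  # map column index back to class index


p1 = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
               [0.20, 0.40, 0.20, 0.10, 0.10]])
p2 = np.array([[0.60, 0.20, 0.10, 0.05, 0.05],
               [0.05, 0.05, 0.85, 0.03, 0.02]])
pred = decide(p1, p2)
print([CLASSES[i] for i in pred])  # ['Chitol', 'Rui']
```

Up to tie-breaking, `np.maximum(p1, p2).argmax(axis=1)` selects the same class. Because `np.argmax` returns the first maximum in the concatenated array, ties are resolved in favor of model 1.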
3 Dataset Preparation
The dataset consists of 2,678 images of five fish species: Rohu, Mrigal carp, Silver carp, Clown knifefish, and Tilapia (table 1). The image resolution is 4624x2136. Images were captured with a camera using the Samsung S5KGW1 sensor. Fish of every class were collected from local ponds in Bangladesh and photographed in different positions. Fig. 3 shows a sample of the images collected for training and testing.
Table 1: Dataset description and per-class train/test split.
Local Name | Eng. Name | # Images | Training | Testing |
Chitol | Clown knifefish | 610 | 408 | 202 |
Mrigel | Mrigal carp | 616 | 412 | 204 |
Rui | Rohu | 642 | 430 | 212 |
Silver carp | Silver carp | 596 | 399 | 197 |
Telapia | Tilapia | 214 | 143 | 71 |
Total | | 2678 | 1792 | 886 |
The dataset is then divided into training and testing sets in a 67% : 33% ratio, with images picked at random using sklearn.model_selection.train_test_split. Splitting the whole dataset at once, however, risks an imbalanced number of images per fish class in the training and testing sets. To avoid this, each fish class is split separately into training and testing sets, as described in table 1.
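The per-class split can be sketched by applying `train_test_split` to each class in turn. With a float `test_size=0.33`, scikit-learn rounds the test count up, which reproduces the counts in table 1 exactly. The file names and the seed are hypothetical, and `split_per_class` is an illustrative helper.

```python
from sklearn.model_selection import train_test_split

# Per-class image counts from table 1
CLASS_COUNTS = {"Chitol": 610, "Mrigel": 616, "Rui": 642, "Silver carp": 596, "Telapia": 214}


def split_per_class(counts: dict, test_size: float = 0.33, seed: int = 42) -> dict:
    """Split every class separately so each class keeps the same train/test ratio."""
    splits = {}
    for name, n in counts.items():
        files = [f"{name}_{i:04d}.jpg" for i in range(n)]  # hypothetical file names
        train, test = train_test_split(files, test_size=test_size, random_state=seed)
        splits[name] = (train, test)
    return splits


splits = split_per_class(CLASS_COUNTS)
for name, (train, test) in splits.items():
    print(f"{name}: {len(train)} train / {len(test)} test")
```

A single call with `stratify=y` on the whole dataset is the usual one-step alternative. It also preserves class proportions, although its rounding may leave individual class counts slightly different from the table.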
4 Result Analysis
Table 3 presents the system environment used to evaluate the models. The following subsections describe the analysis.
Table 2: Training and testing accuracy of the base learners.
Input Data | Classifier Name | Training Accuracy | Testing Accuracy |
Feature array 1 | SVM | 99.94% | 99.66% |
Feature array 1 | KNN | 99.83% | 99.77% |
Feature array 1 | Logistic Regression | 100% | 99.66% |
Feature array 1 | Decision Tree | 100% | 91.094% |
Feature array 2 | SVM | 100% | 99.66% |
Feature array 2 | KNN | 99.83% | 99.66% |
Feature array 2 | Logistic Regression | 100% | 99.66% |
Feature array 2 | Decision Tree | 100% | 93.80% |
Table 3: System environment used for evaluation.
System | Type | Details |
CPU | Model name | Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz |
CPU | Architecture | x86_64 |
RAM | | 16GB |
OS | | Ubuntu 20.04.3 LTS |
VGA compatible controller | | Intel Corporation HD Graphics 530 (rev 06) |
GPU | Model name | GeForce GTX 960M (rev a2) |
4.1 Base Learners Accuracy
Each of the proposed models (models 1 and 2) has two layers: the base learners and the meta learner. To examine the inner structure of the stacking models, the base learners are first evaluated individually (table 2). SVM, KNN, and Logistic Regression reach at least 99.66% testing accuracy on both feature arrays, while Decision Tree reaches 91.094% and 93.80%. We attribute this strong performance on the created dataset to the additional background-removal layer, which discards irrelevant features before classification.
4.2 Final Model Accuracy
Table 2 shows that the individual classifiers already perform well. The meta learner, a Logistic Regression classifier, then combines the base learners' predictions to give a consistent final accuracy for each feature array, so there is one final accuracy for model 1 and one for model 2. These final models achieve higher accuracy, as reported in table 4. In addition, confusion-matrix-based evaluation of models 1 and 2 also shows good results (fig. 4).
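Confusion-matrix evaluation of the kind shown in fig. 4 can be sketched with `sklearn.metrics.confusion_matrix`. The labels below are made up for illustration and are not the paper's predictions. The sketch also shows that accuracy equals the trace of the confusion matrix divided by the number of samples, and that per-class recall is the diagonal divided by the row sums.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for five classes (illustration only, not results from the paper)
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 3, 4, 4])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3, 4])  # rows: true, columns: predicted
acc = accuracy_score(y_true, y_pred)
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(acc)  # 0.9
print(per_class_recall)
```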
4.3 Comparison Analysis
Table 5 compares the performance of the proposed classifier with the classifiers of other published models. Recent works achieve high accuracy, and the proposed model performs comparably to or better than them.
Table 5: Comparison with other fish classification methods.
Author Name | FC Algorithm Name | Accuracy |
KAYA [17] | ANN | 98.88% |
Alsmadi [4] | HGAGD-BPC | 96% |
Hnin [14] | SVM | 100% |
Qin [25] | linear SVM classifier | 98.57% |
Matai [20] | PCA algorithm | 100% |
Ali-Gombe [3] | Deep CNN | 97.20% |
Kutlu [19] | Nearest neighbour | 99% |
Taheri-Garavand [30] | Deep CNN | 98.21% |
Abinaya [2] | NBC and DLN | 98.60% |
This article | Proposed model | 99.77% and 100% |
5 Conclusion
This paper proposes a fish classification technique based on salient object detection to overcome the background-variation issue in FC. The technique first applies several preprocessing steps, such as image resizing and background removal, followed by a feature descriptor layer.
Because the preprocessing already separates the features by shape and color gradient, the ensemble layers built from SVM, KNN, Logistic Regression, and Decision Tree classify the fish effectively. As a result, the proposed methodology achieves high accuracy.
Our dataset consists of five classes of fish. The testing accuracy is 99.77% for model 1 and 100% for model 2. The final decision is made from these two ensembles by selecting the class with the highest predicted probability. Finally, our test results were compared with those of the techniques reported in [5], and the proposed model ranks among the best of them.
Future enhancements are as follows: (i) adding a transfer learning approach on the HOG images for more robust feature selection and dimensionality reduction, for example a DLN layer after the HOG feature extraction layer; (ii) morphometric analysis of the actual fish from the images: since this paper uses shape features for classification, the shape could also be used to estimate height, width, and weight, leading to an automatic system that derives these characteristics from an image; and (iii) dataset improvement: future versions of the dataset will include more than five classes of fish to strengthen the proposed models.