1 Introduction
In recent years, class imbalance has emerged as one of the challenges in the data mining community [28]. Imbalanced data appear in many real-world classification problems, such as fault diagnosis [29], anomaly detection [24], medical diagnosis [19] and circuit breaker maintenance diagnosis [21], among others. In binary classification, this problem occurs when the number of instances of one class is much lower than the number of instances of the other class. The overrepresented class is called the majority or negative class, and the other class is called the minority or positive class. Traditional classifiers generally tend to classify almost all instances as negative (i.e., as the majority class) [22].
Many techniques for dealing with class imbalance have emerged: those that modify the data distribution by preprocessing techniques (data level solutions) [2, 6, 18, 23], those at the level of the learning algorithm which adapt a base classifier to deal with class imbalance (algorithm level solutions) [4, 16, 22], those that apply different costs to misclassification of positive and negative samples (cost-sensitive solutions) [8, 18, 25, 27], and ensemble based solutions that combine the previous solutions by means of an ensemble [10].
In this paper, we present an ensemble solution to classify imbalanced data using the Imbalanced Fuzzy-Rough Ordered Weighted Average Nearest Neighbor (IFROWANN) algorithm [22]. This classifier combines fuzzy rough set theory and ordered weighted average aggregation to take the imbalance of the classes into account. Each combination of weight strategy and fuzzy-rough indiscernibility relation produces different classification results, and it is difficult to know in advance which configuration gives the best classification result on a given dataset.
The ensemble generates several instances of the same classifier (IFROWANN), each with a different weight strategy and a different fuzzy relation. The final classification is given by one of the selection or fusion strategies described in Section 4.1. To evaluate the quality of our model, we have carried out an extensive experimental analysis on a collection of 66 imbalanced datasets with different imbalance ratios (IR) (between 1.82 and 129.44), originating from the SCI2S site (sci2s.ugr.es). In the experiments, we have compared our algorithm with the base IFROWANN proposal to show that it performs similarly without the need to test every weight and fuzzy relation strategy. We have also compared the ensemble with a set of 16 state-of-the-art methods designed for imbalanced classification: it is better positioned than 14 of them and shows no significant differences with the rest. To assess the classification performance, we have used the Area Under the Curve (AUC) metric [22], and the significance of the results is supported by the Friedman test and Holm's post hoc procedure.
The remainder of this paper is organized as follows. In Section 2, we introduce the imbalanced classification problem, including an overview of the state-of-the-art methods for solving it. In Section 3, we recall the standard IFROWANN algorithm. In Section 4, we introduce the WIFROWANN algorithm and present the proposed strategies for instance classification. In Section 5, we discuss the setup of the experimental study. In Section 6, we present and discuss the results. In Section 7, we draw some conclusions and outline future work.
2 Classification in Imbalanced Datasets
In this section, we first introduce the problem of imbalanced datasets in classification. Then, several techniques to address the class imbalance problem are presented. In binary classification, we consider a set of data samples 𝑈, characterized by their values for the set 𝐴 = {𝑎1, ..., 𝑎𝑚} of attributes. Moreover, 𝑈 = 𝑃 ∪ 𝑁, where 𝑃 represents the positive class and 𝑁 represents the negative class. We denote 𝑝 = |𝑃|, 𝑛 = |𝑁| and 𝑡 = |𝑈| = 𝑝 + 𝑛. The imbalance ratio (𝐼𝑅) is then defined as 𝑛/𝑝. The imbalanced classification problem can be tackled using four main types of solutions:
1) Sampling (solutions at the data level) [18]: This kind of solution consists of balancing the class distribution by means of a preprocessing strategy. Techniques at data level are undersampling, oversampling and hybrid methods. Some examples of this technique are Synthetic Minority Oversampling Technique (SMOTE) algorithm [6], SMOTE-ENN [2] and SMOTE-RSB* [23].
2) Design of specific algorithms (solutions at the algorithmic level) [4, 16]: A traditional classifier is adapted to deal directly with the imbalance between the classes. This is the case of the Imbalanced Fuzzy-Rough Ordered Weighted Average Nearest Neighbor (IFROWANN) algorithm [22].
3) Cost-sensitive solutions [8, 18]: This kind of method incorporates solutions at the data level, at the algorithmic level, or at both levels together. They try to minimize the higher-cost errors, where the cost of misclassifying a positive instance should be higher than the cost of misclassifying a negative one. The main examples are the cost-sensitive C4.5 decision tree (CS-C4.5) [25] and the cost-sensitive support vector machine (CS-SVM) [27].
4) Ensemble solutions [11]: These usually combine an ensemble learning algorithm with one of the techniques above, specifically data level or cost-sensitive ones. An example is the EUSBOOST algorithm [10], which uses evolutionary undersampling.
Next, we will discuss the evaluation of machine-learning algorithms in imbalanced domains.
3 Imbalanced Fuzzy-Rough Ordered Weighted Average Nearest Neighbor (IFROWANN)
In this section, we introduce the IFROWANN classification algorithm proposed in [22]. This algorithm is a variation of the Fuzzy-Rough Nearest Neighbor (FRNN) algorithm [17]. In order to predict the class of a new instance 𝑥, the IFROWANN algorithm calculates the membership degrees of 𝑥 to the fuzzy-rough lower and upper approximations of each class and assigns the instance to the class with the higher degree.
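More precisely, let 𝑈 be the universe, I an implicator and T a t-norm, defined by I(𝑎, 𝑏) = 𝑚𝑎𝑥(1 − 𝑎, 𝑏) and T(𝑎, 𝑏) = 𝑚𝑖𝑛(𝑎, 𝑏) for 𝑎, 𝑏 in [0,1], and 𝑅 a fuzzy relation that represents approximate indiscernibility between instances. The fuzzy-rough lower and upper approximations of a class 𝐶 under 𝑅 are then given, for 𝑥 in 𝑈, by

$$(R \downarrow C)(x) = \inf_{y \in U} I(R(x, y), C(y)) \qquad (1)$$

$$(R \uparrow C)(x) = \sup_{y \in U} T(R(x, y), C(y)) \qquad (2)$$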
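An implicator I is a [0,1]² → [0,1] mapping that is decreasing in its first argument and increasing in its second argument, and that satisfies I(0,0) = I(0,1) = I(1,1) = 1 and I(1,0) = 0. The membership degrees to the positive class 𝜇𝑃(𝑥) and negative class 𝜇𝑁(𝑥) are defined by:

$$\mu_P(x) = \mathrm{OWA}_{W_P}\big(\langle I(R(x, y), P(y)) \rangle_{y \in U}\big) \qquad (3)$$

$$\mu_N(x) = \mathrm{OWA}_{W_N}\big(\langle I(R(x, y), N(y)) \rangle_{y \in U}\big) \qquad (4)$$

where 𝑃(𝑦) and 𝑁(𝑦) are the membership degrees of 𝑦 to the positive and negative class, 𝑊𝑃 and 𝑊𝑁 are the corresponding weight vectors, and OWA is the ordered weighted average operator, used to take the imbalance into account. Given a sequence 𝐴 of 𝑡 real values 𝐴 = ⟨𝑎1, …, 𝑎𝑡⟩ and a weight vector 𝑊 = ⟨𝑤1, …, 𝑤𝑡⟩ such that 𝑤𝑖 ∈ [0,1] and $\sum_{i=1}^{t} w_i = 1$, the OWA aggregation of 𝐴 by 𝑊 is

$$\mathrm{OWA}_W(A) = \sum_{i=1}^{t} w_i c_i$$

where 𝑐𝑖 is the 𝑖-th largest value in 𝐴.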
The IFROWANN algorithm has two fundamental factors: the fuzzy indiscernibility relation and the weight vector. No single combination of weight strategy and fuzzy relation obtains the best results for every 𝐼𝑅, so finding the best classification result requires trying several combinations of weight vectors and fuzzy relations. For this reason, this paper proposes an ensemble based on the IFROWANN algorithm that combines different weight strategies and fuzzy relations.
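The OWA aggregation and the OWA-based lower approximation used by IFROWANN can be illustrated with a minimal Python sketch. The function names, the toy similarity values and the weight vector below are assumptions made for illustration only; in particular, the weight vector is not one of the vectors proposed in [22], it simply puts all the weight on the smallest value so that OWA reduces to the minimum.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weight i is applied to the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def implicator(a: float, b: float) -> float:
    """Implicator I(a, b) = max(1 - a, b) from Section 3."""
    return max(1.0 - a, b)


def membership(similarities: Sequence[float],
               in_class: Sequence[float],
               weights: Sequence[float]) -> float:
    """OWA-based lower approximation of a class, evaluated at a new instance x.

    similarities[i] is R(x, y_i); in_class[i] is 1.0 if y_i belongs to the
    class and 0.0 otherwise.
    """
    implications = [implicator(r, c) for r, c in zip(similarities, in_class)]
    return owa(implications, weights)


# Hypothetical toy data: similarity of x to four training instances,
# the first two positive and the last two negative.
similarities = [0.9, 0.8, 0.2, 0.1]
is_positive = [1.0, 1.0, 0.0, 0.0]
is_negative = [0.0, 0.0, 1.0, 1.0]

# Illustrative weights only (assumption): OWA becomes the plain minimum.
min_weights = [0.0, 0.0, 0.0, 1.0]

mu_p = membership(similarities, is_positive, min_weights)
mu_n = membership(similarities, is_negative, min_weights)
predicted = "positive" if mu_p >= mu_n else "negative"
```

In this toy case the new instance is close to the positive instances and far from the negative ones, so its membership to the positive class is the larger of the two. Softer weight vectors spread the weight over several of the smallest values instead of only the last one.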
4 Extending the IFROWANN Algorithm
The Wrapper Imbalanced Fuzzy-Rough Ordered Weighted Average Nearest Neighbor (WIFROWANN) is an ensemble that uses the IFROWANN algorithm as base classifier, with different configurations of OWA weight vectors and fuzzy relations. The description of the algorithm is divided into two main parts: first, we discuss the configurations used to run the classifier and second, in Section 4.1, the different strategies to output the ensemble classification of a new instance.
To predict the class of a new instance 𝑥, the ensemble creates a set of classifiers 𝐶 = {𝐶1, …, 𝐶𝐿}, each one with its own weight strategy and fuzzy indiscernibility relation. Given a vector of weight strategies 𝑊 = [𝑊1, …, 𝑊𝑛] and a vector of fuzzy indiscernibility relations 𝑅 = [𝑅1, …, 𝑅𝑚], the ensemble builds one classifier for each pair in 𝑊 × 𝑅. Each classifier 𝐶𝑖 computes the membership degrees to the positive and negative class, and the final result is given by the chosen fusion or selection strategy; these strategies are explained in Section 4.1. The weight strategy defines which weight vectors the classifier 𝐶𝑖 uses to calculate the membership degrees to the lower approximations of the positive and negative class in equations 3 and 4, respectively. The different weight vectors proposed in [22] are:
where 𝑝 = |𝑃| and 𝑛 = |𝑁|, are the number of instances of the Positive 𝑃 and Negative 𝑁 classes.
A variation of the
where 𝑟 = [𝑝 + 𝛾(𝑛 − 𝑝)].
In particular, the first 𝑝 positions in
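We consider the following three alternatives for defining the fuzzy relation 𝑅, namely the Average t-norm, the Łukasiewicz t-norm and the Minimum t-norm, denoted 𝑅𝐴𝑣, 𝑅𝑇𝐿 and 𝑅𝑀𝑖𝑛:

$$R_{Av}(x, y) = \frac{1}{|A|} \sum_{a \in A} R_a(x, y) \qquad (10)$$

$$R_{T_L}(x, y) = T_L\big(R_{a_1}(x, y), \ldots, R_{a_m}(x, y)\big) \qquad (11)$$

$$R_{Min}(x, y) = \min_{a \in A} R_a(x, y) \qquad (12)$$

where the Łukasiewicz t-norm is defined, for 𝑢1, 𝑢2, ..., 𝑢𝑚 in [0,1], by

$$T_L(u_1, \ldots, u_m) = \max\Big(0, \sum_{i=1}^{m} u_i - m + 1\Big) \qquad (13)$$

and 𝑅𝑎 is the similarity between the instances 𝑥 and 𝑦 on the attribute 𝑎.
For a quantitative attribute and a nominal attribute we use, respectively:

$$R_a(x, y) = 1 - \frac{|a(x) - a(y)|}{\mathrm{range}(a)} \qquad (14)$$

$$R_a(x, y) = \begin{cases} 1 & \text{if } a(x) = a(y) \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$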
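The three fuzzy relations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-attribute similarity is taken to be one minus the range-normalized absolute difference for quantitative attributes and an exact-match indicator for nominal ones, and the function names, the `ranges` argument and the example instances are hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def attribute_similarity(x_a: Value, y_a: Value,
                         value_range: Optional[float]) -> float:
    """Per-attribute similarity R_a(x, y).

    value_range is the range of a quantitative attribute, or None for a
    nominal attribute (assumed convention for this sketch).
    """
    if value_range is None:
        return 1.0 if x_a == y_a else 0.0
    if value_range == 0:
        return 1.0  # constant attribute: all instances are indiscernible on it
    return 1.0 - abs(x_a - y_a) / value_range


def lukasiewicz(us: Sequence[float]) -> float:
    """Lukasiewicz t-norm extended to m arguments: max(0, sum(u) - m + 1)."""
    return max(0.0, sum(us) - len(us) + 1.0)


def fuzzy_relation(x: Sequence[Value], y: Sequence[Value],
                   ranges: Sequence[Optional[float]], kind: str) -> float:
    """Indiscernibility R(x, y) for kind in {'AV', 'TL', 'MIN'}."""
    sims = [attribute_similarity(a, b, r) for a, b, r in zip(x, y, ranges)]
    if kind == "AV":
        return sum(sims) / len(sims)
    if kind == "TL":
        return lukasiewicz(sims)
    if kind == "MIN":
        return min(sims)
    raise ValueError(f"unknown relation: {kind}")


# Hypothetical instances with two quantitative attributes and one nominal one.
x = (0.2, 5.0, "red")
y = (0.4, 3.0, "red")
ranges = (1.0, 10.0, None)

r_av = fuzzy_relation(x, y, ranges, "AV")
r_tl = fuzzy_relation(x, y, ranges, "TL")
r_min = fuzzy_relation(x, y, ranges, "MIN")
```

For any pair of instances the three values are ordered as TL ≤ MIN ≤ AV, so the choice of relation controls how strictly two instances must agree on every attribute to be considered similar.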
We explain below in detail the strategies followed to build the ensemble.
4.1 Strategies for Fusing Results
Strategy I: Select the Classifier with Best AUC in Training
This strategy selects the classifier with the best AUC in training, and the classification is made by the selected classifier. In other words, each classifier 𝐶𝑘 classifies the training set 𝑈𝑡𝑟 and the AUC is calculated for this training set. The ensemble selects the classifier 𝐷 with the best AUC:

$$D = \arg\max_{C_k \in C} \mathrm{AUC}\big(C_k(W_i, R_j), U_{tr}\big)$$

where 𝑊𝑖 is the weight strategy followed by the classifier 𝐶𝑘 and 𝑅𝑗 is the fuzzy relation strategy used by the classifier 𝐶𝑘. The new instance 𝑥 takes the class given by the classifier 𝐷.
Strategy IIa: Select the classifiers of maximum AUC and average the membership degrees to the positive and negative class
In this strategy, the ensemble selects the set of classifiers with the maximum AUC value.
Each selected classifier computes the membership degrees of the instance 𝑥 to the positive and negative class, and the calculated membership degrees are then averaged to obtain 𝜇𝑃′(𝑥) and 𝜇𝑁′(𝑥). Finally, 𝑥 is classified as positive if the averaged membership degree to the positive class is at least that of the negative class, 𝜇𝑃′(𝑥) ≥ 𝜇𝑁′(𝑥), and as negative otherwise.
Strategy IIb: Fuse the membership degrees to the positive and negative class
First, each classifier 𝐶𝑖 calculates the membership degrees of 𝑥 to the positive class, 𝜇𝑃𝑖(𝑥), and to the negative class, 𝜇𝑁𝑖(𝑥). The outputs of the ensemble are the averages of these degrees over all 𝐿 classifiers:

$$\mu_P'(x) = \frac{1}{L} \sum_{i=1}^{L} \mu_{P_i}(x), \qquad \mu_N'(x) = \frac{1}{L} \sum_{i=1}^{L} \mu_{N_i}(x)$$

Finally, 𝑥 is classified as positive if 𝜇𝑃′(𝑥) ≥ 𝜇𝑁′(𝑥), and as negative otherwise.
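The three strategies of this section can be summarized in a short Python sketch. It assumes that the AUC of every configuration (taken here to be the training AUC, as in Strategy I) and the membership degrees of the new instance have already been computed; the dictionaries, function names and numeric values are hypothetical. Ties in Strategy I are broken by insertion order, which is an assumption of this sketch rather than something stated in the text.

```python
from typing import Dict, List, Tuple

Config = Tuple[str, str]          # (weight strategy, fuzzy relation)
Degrees = Tuple[float, float]     # (mu_P(x), mu_N(x))


def strategy_one(train_auc: Dict[Config, float]) -> Config:
    """Strategy I: pick the configuration with the best training AUC."""
    return max(train_auc, key=train_auc.get)


def average_degrees(degrees: List[Degrees]) -> Degrees:
    """Average the positive and negative membership degrees separately."""
    count = len(degrees)
    mu_p = sum(p for p, _ in degrees) / count
    mu_n = sum(n for _, n in degrees) / count
    return mu_p, mu_n


def decide(mu_p: float, mu_n: float) -> str:
    """Positive if mu_P' >= mu_N', negative otherwise."""
    return "positive" if mu_p >= mu_n else "negative"


def strategy_two_a(train_auc: Dict[Config, float],
                   degrees: Dict[Config, Degrees],
                   tol: float = 1e-12) -> str:
    """Strategy IIa: average only the classifiers that reach the maximum AUC."""
    best = max(train_auc.values())
    selected = [degrees[c] for c, auc in train_auc.items() if best - auc <= tol]
    return decide(*average_degrees(selected))


def strategy_two_b(degrees: Dict[Config, Degrees]) -> str:
    """Strategy IIb: average the degrees of all classifiers."""
    return decide(*average_degrees(list(degrees.values())))


# Hypothetical values for three configurations and one new instance x.
train_auc = {("W4", "AV"): 0.95, ("W6", "AV"): 0.95, ("W6", "TL"): 0.90}
degrees = {("W4", "AV"): (0.6, 0.5),
           ("W6", "AV"): (0.5, 0.4),
           ("W6", "TL"): (0.1, 0.9)}

selected_config = strategy_one(train_auc)
label_one = decide(*degrees[selected_config])
label_two_a = strategy_two_a(train_auc, degrees)
label_two_b = strategy_two_b(degrees)
```

With these toy values the strategies disagree: Strategies I and IIa rely only on the best configurations and label the instance as positive, whereas Strategy IIb also averages in the third classifier and labels it as negative.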
5 Experimental Setup
In this section, we describe the experimental framework used to validate our proposal, including the benchmark datasets, the state-of-the-art methods, and the statistical tests used in order to carry out the performance comparison.
5.1 Datasets
We consider 66 datasets with different 𝐼𝑅 (between 1.82 and 129.44) to evaluate our proposal. The datasets were obtained from the SCI2S site and are available online as part of the KEEL dataset repository [1]. Their characteristics are given in Table 1, which shows the imbalance ratio (𝐼𝑅), the number of instances (Inst) and the number of attributes (Attr) of each dataset.
Table 1. Description of the datasets: imbalance ratio (IR), number of instances (Inst) and number of attributes (Attr).
Dataset | IR | Inst | Attr | Dataset | IR | Inst | Attr |
glass1 | 1.82 | 214 | 9 | glass-0-4vs5 | 9.22 | 92 | 9 |
ecoli0vs1 | 1.86 | 220 | 9 | ecoli-0-3-4-6vs5 | 9.25 | 205 | 7 |
wisconsin | 1.86 | 683 | 7 | ecoli-0-3-4-7vs5-6 | 9.28 | 257 | 7 |
pima | 1.90 | 768 | 8 | yeast-0-5-6-7-9vs4 | 9.35 | 528 | 8 |
iris0 | 2.00 | 150 | 4 | ecoli-0-6-7vs5 | 10 | 220 | 6 |
glass0 | 2.06 | 214 | 9 | vowel0 | 10.1 | 988 | 13 |
yeast1 | 2.46 | 1484 | 8 | glass-0-1-6vs2 | 10.29 | 192 | 9 |
vehicle1 | 2.52 | 846 | 18 | glass2 | 10.39 | 214 | 9 |
vehicle2 | 2.52 | 846 | 18 | ecoli-0147vs2356 | 10.59 | 336 | 7 |
vehicle3 | 2.52 | 846 | 18 | led7digit02456789vs1 | 10.97 | 443 | 7 |
haberman | 2.68 | 306 | 3 | glass-0-6vs5 | 11 | 108 | 9 |
glass-0123vs456 | 3.19 | 214 | 9 | ecoli-0-1vs5 | 11 | 240 | 6 |
vehicle0 | 3.23 | 846 | 18 | glass-0-1-4-6vs2 | 11.06 | 205 | 9 |
ecoli1 | 3.36 | 336 | 7 | ecoli-0-1-4-7vs5-6 | 12.28 | 332 | 7 |
new-thyroid2 | 4.92 | 215 | 5 | cleveland-0vs4 | 12.62 | 173 | 13 |
newthyroid1 | 5.14 | 215 | 5 | ecoli-0-1-4-6vs5 | 13 | 280 | 6 |
ecoli2 | 5.46 | 336 | 7 | ecoli4 | 13.84 | 336 | 7 |
segment0 | 6.01 | 2308 | 19 | yeast-1vs7 | 13.87 | 459 | 7 |
glass6 | 6.38 | 214 | 9 | shuttle-0-vs-4 | 13.87 | 1829 | 9 |
yeast3 | 8.11 | 1484 | 8 | glass4 | 15.47 | 214 | 9 |
ecoli3 | 8.19 | 336 | 7 | page-blocks-1-3vs4 | 15.85 | 472 | 10 |
page-blocks0 | 8.77 | 5472 | 10 | abalone9-18 | 16.68 | 731 | 8 |
ecoli-0-3-4vs5 | 9 | 200 | 7 | glass-0-1-6vs5 | 19.44 | 184 | 9 |
yeast-2vs4 | 9.08 | 515 | 7 | shuttle-2-vs-4 | 20.5 | 129 | 9 |
ecoli-0-6-7vs3-5 | 9.09 | 222 | 7 | yeast-1-4-5-8vs7 | 22.1 | 693 | 8 |
ecoli-0-2-3-4vs5 | 9.1 | 202 | 7 | glass5 | 22.81 | 214 | 9 |
glass-0-1-5vs2 | 9.12 | 172 | 9 | yeast-2vs8 | 23.1 | 482 | 8 |
yeast-0-3-5-9vs7-8 | 9.12 | 506 | 8 | yeast4 | 28.41 | 1484 | 8 |
yeast-02579vs368 | 9.14 | 1004 | 8 | yeast-1-2-8-9vs7 | 30.56 | 947 | 8 |
yeast-0256vs3789 | 9.14 | 1004 | 8 | yeast5 | 32.78 | 1484 | 8 |
ecoli-0-4-6vs5 | 9.15 | 203 | 6 | ecoli-0-1-3-7vs2-6 | 39.15 | 281 | 7 |
ecoli-0-1vs2-3-5 | 9.17 | 244 | 7 | yeast6 | 39.15 | 1484 | 8 |
ecoli-0-2-6-7vs3-5 | 9.18 | 224 | 7 | abalone19 | 129.44 | 4174 | 8 |
In our experimental study, we have also considered two subsets of the collection based on their 𝐼𝑅:
𝐼𝑅 < 9 (low imbalance): This group contains 22 datasets, all with 𝐼𝑅 lower than 9.
𝐼𝑅 ≥ 9 (high imbalance): This group contains 44 datasets, all with 𝐼𝑅 of at least 9.
Furthermore, each dataset is partitioned in order to perform a five-fold cross-validation.
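As a small illustration of how the two blocks are formed, the imbalance ratio of a binary dataset and its block can be computed as follows. The function names and the label lists are hypothetical; only the definition 𝐼𝑅 = 𝑛/𝑝 and the threshold of 9 come from the text.

```python
from collections import Counter
from typing import Hashable, Sequence


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """IR = n / p: size of the majority class over size of the minority class."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    majority = max(counts.values())
    minority = min(counts.values())
    return majority / minority


def imbalance_block(ir: float, threshold: float = 9.0) -> str:
    """Assign a dataset to the low (IR < 9) or high (IR >= 9) imbalance block."""
    return "low" if ir < threshold else "high"


# Hypothetical label vectors.
labels_a = ["neg"] * 90 + ["pos"] * 10
labels_b = ["neg"] * 60 + ["pos"] * 40

ir_a = imbalance_ratio(labels_a)
ir_b = imbalance_ratio(labels_b)
block_a = imbalance_block(ir_a)
block_b = imbalance_block(ir_b)
```

A dataset with exactly 𝐼𝑅 = 9 falls in the high imbalance block, in line with the definition of the two groups above.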
5.2 Algorithms Analyzed in the Experimental Study
For the experimental study, we consider the principal state-of-the-art methods: the IFROWANN algorithm with its most competitive variants (AV-W6, AV-W4 and TL-W6) and the FRNN algorithm [22], as well as preprocessing techniques, cost-sensitive methods and ensemble methods combined with a base classifier. As base classifiers we chose the tree-based method C4.5 [20], support vector machines (SVM) [27] and the lazy learner 1NN (kNN with k = 1) [5]. The preprocessing techniques are SMOTE, SMOTE+RSB*, SMOTE+ENN, Borderline-SMOTE, SafeLevel-SMOTE and DBSMOTE. The cost-sensitive algorithms are CS, MetaCost, CostSensitiveClassifier and CSWeighted. The ensemble methods are AdaB-M1, AdaC2, RUSB, SBAG, Easy and EUSBOOST.
In order to compare the different algorithms appropriately, we will conduct a statistical analysis using nonparametric tests as suggested in [12, 14]. We first use Friedman’s aligned-ranks test [9] and then Holm’s post hoc test [15]. The post hoc procedure allows us to decide whether a hypothesis of comparison can be rejected at a specified level of significance 𝛼. In this paper, we set 𝛼 = 0.05. The KEEL tool was used to perform the tests.
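Holm's step-down procedure can be sketched as follows. The sketch assumes that each reported p-value comes from comparing one method against the control method and is checked against the step-down thresholds 𝛼/𝑘, 𝛼/(𝑘 − 1), …, 𝛼; the function name is hypothetical, and the p-values are the ones listed in Section 6.1 for the 𝐼𝑅 < 9 block, used here only as example input.

```python
from typing import Dict, List, Optional, Tuple


def holm(p_values: Dict[str, float],
         alpha: float = 0.05) -> Tuple[List[str], Optional[float]]:
    """Holm's step-down procedure.

    Returns the names of the rejected hypotheses and the threshold at which
    the procedure stopped (None if every hypothesis was rejected).
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    k = len(ordered)
    rejected: List[str] = []
    for i, (name, p) in enumerate(ordered):
        threshold = alpha / (k - i)
        if p > threshold:
            # First non-rejection: this and all larger p-values are retained.
            return rejected, threshold
        rejected.append(name)
    return rejected, None


# p-values of the comparisons against W-F2 for the IR < 9 block.
low_ir_p_values = {
    "W-All": 0.000001, "MIN-W4": 0.000012, "FRNN-AV": 0.000018,
    "TL-W6": 0.000273, "W-Weights": 0.000618, "AV-W6": 0.001614,
    "W-SF": 0.001717, "TL-W4": 0.013565, "AV-W4": 0.279642,
    "W-W6W4W5": 0.330547,
}

rejected, stop_threshold = holm(low_ir_p_values, alpha=0.05)
retained = [name for name in low_ir_p_values if name not in rejected]
```

Under these assumptions the procedure stops at the threshold 0.05/2 = 0.025 and retains only the hypotheses for AV-W4 and W-W6W4W5, which is consistent with the rejection level quoted for this block in Section 6.1.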
5.3 Parameters
The parameters of the ensemble and of the IFROWANN algorithm are the fuzzy relations Average t-norm, Łukasiewicz t-norm and Minimum t-norm (see equations 10, 11 and 12), the similarity functions for quantitative and nominal attributes (see equations 14 and 15), and the six weight vector combinations recommended in [22], which are described below:
For Strategy I we test three variants: (1) W-All, where the ensemble uses the [𝑁𝑜𝑛𝑒, 𝑊1, 𝑊2, 𝑊3, 𝑊4, 𝑊5, 𝑊6] weight strategies and the [𝐴𝑉, 𝑇𝐿, 𝑀𝐼𝑁] fuzzy relations; (2) W-Weights, where all classifiers have weight vectors; and (3) W-W6W4W5, which builds classifiers with the [𝑊6, 𝑊4, 𝑊5] weight strategies and the [𝐴𝑉, 𝑇𝐿] fuzzy relations.
For Strategy IIa we form one variant, W-SF, where the ensemble uses the [𝑁𝑜𝑛𝑒, 𝑊1, 𝑊2, 𝑊3, 𝑊4, 𝑊5, 𝑊6] weight strategies and the three fuzzy relations. For Strategy IIb we form the variant W-F2, where the ensemble uses the [𝑊6, 𝑊4, 𝑊5] weight strategies and the [𝐴𝑉, 𝑇𝐿] fuzzy relations.
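The configurations behind each variant can be enumerated as the Cartesian product of its weight strategies and fuzzy relations. The following Python sketch does so for the variants whose two lists are fully specified above; W-Weights is omitted because its fuzzy relations are not stated explicitly. The dictionary layout and the function name are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

ALL_WEIGHTS = ["None", "W1", "W2", "W3", "W4", "W5", "W6"]
ALL_RELATIONS = ["AV", "TL", "MIN"]

# Variant name -> (weight strategies, fuzzy relations), as listed in Section 5.3.
VARIANTS: Dict[str, Tuple[List[str], List[str]]] = {
    "W-All": (ALL_WEIGHTS, ALL_RELATIONS),
    "W-W6W4W5": (["W6", "W4", "W5"], ["AV", "TL"]),
    "W-SF": (ALL_WEIGHTS, ALL_RELATIONS),
    "W-F2": (["W6", "W4", "W5"], ["AV", "TL"]),
}


def build_configurations(variant: str) -> List[Tuple[str, str]]:
    """One (weight strategy, fuzzy relation) pair per base classifier."""
    weights, relations = VARIANTS[variant]
    return list(product(weights, relations))


ensemble_sizes = {name: len(build_configurations(name)) for name in VARIANTS}
```

Under this reading, W-All and W-SF each train 21 base classifiers, while W-W6W4W5 and W-F2 train 6; the variants that share a configuration set differ only in the strategy used to combine the outputs.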
6 Experimental Results
In this section, we present the results of our experimental analysis.
In Section 6.1, we compare our proposal with IFROWANN baseline methods and its best configurations. Next, in Section 6.2, we compare the algorithms with the state-of-the-art methods for imbalanced classification.
6.1 Comparative Analysis of WIFROWANN with IFROWANN
Table 2 shows the mean AUC obtained for each variant of the ensemble and the best IFROWANN configurations with each block of datasets. It can be noticed that for the high imbalance datasets (IR ≥ 9), AV-W6 obtains the highest average AUC. However, for low imbalance datasets (𝐼𝑅 < 9), the ensemble with W-F2 reaches the highest value.
Table 2. Mean AUC of the WIFROWANN variants and the IFROWANN configurations for each block of datasets.
Method | 𝐼𝑅 < 9 | 𝐼𝑅 ≥ 9 |
W-All | 0.9045 | 0.9038 |
W-Weights | 0.9121 | 0.9115 |
W-W6W4W5 | 0.9202 | 0.9213 |
W-SF | 0.9114 | 0.9126 |
W-F2 | 0.9255 | 0.9253 |
TL-W4 | 0.9141 | 0.9073 |
TL-W6 | 0.9086 | 0.9232 |
AV-W4 | 0.9204 | 0.9174 |
AV-W6 | 0.9110 | 0.9343 |
MIN-W4 | 0.9076 | 0.8833 |
MIN-W6 | 0.8902 | 0.8986 |
FRNN-MIN | 0.8948 | 0.8716 |
FRNN-AV | 0.9061 | 0.9089 |
FRNN-TL | 0.9030 | 0.9020 |
Next, we carry out a statistical analysis of our results for each block of datasets.
Statistical Analysis for Low Imbalance Ratio Datasets: Table 3 shows the Friedman test and Holm's procedure for the 𝐼𝑅 < 9 datasets. For low imbalance, the best ranking is obtained by W-F2, and Holm's procedure rejects those hypotheses that have a p-value ≤ 0.025. Holm's post hoc test allows us to conclude that W-F2 is not significantly better than AV-W4 or W-W6W4W5, but is significantly better than the remaining variants of both algorithms. It is worth noting that the W-F2 and W-W6W4W5 variants of WIFROWANN are better positioned in the ranking than every IFROWANN configuration, and that W-F2 differs significantly from AV-W6.
Statistical Analysis for High Imbalance Ratio Datasets: In this case, Holm's procedure rejects those hypotheses that have a p-value ≤ 0.016667. As can be observed in Table 4, the best ranking is obtained by IFROWANN with AV-W6. The adjusted p-values indicate that AV-W6 does not significantly outperform the TL-W6 IFROWANN configuration or the WIFROWANN variants W-F2 and W-W6W4W5.
Table 3. Average Friedman rankings and adjusted p-values (Holm's procedure) of the WIFROWANN and IFROWANN variants for the 𝐼𝑅 < 9 datasets.
Algorithm | Average Friedman Ranking | Adjusted p-value |
W-All | 9.2955 | 0.000001 |
MIN-W4 | 8.5455 | 0.000012 |
FRNN-AV | 8.4318 | 0.000018 |
TL-W6 | 7.6136 | 0.000273 |
W-Weights | 7.3409 | 0.000618 |
AV-W6 | 7 | 0.001614 |
W-SF | 6.9773 | 0.001717 |
TL-W4 | 6.1364 | 0.013565 |
AV-W4 | 4.3864 | 0.279642 |
W-W6W4W5 | 4.25 | 0.330547 |
W-F2 | 3.0227 | - |
6.2 Comparative Analysis with the State-of-the-art Methods
This section compares the ensemble variants with the state-of-the-art methods. The mean AUC results for the state-of-the-art methods are shown in Table 5. For every technique and every IR level, the highest AUC value is marked in bold.
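Table 5. Mean AUC of the state-of-the-art methods for each block of datasets.
Technique | Base classifier | Method | 𝐼𝑅 < 9 | 𝐼𝑅 ≥ 9 |
Preprocessing techniques | C4.5 | SMOTE+ENN | 0.8640 | 0.8164 |
Preprocessing techniques | C4.5 | SMOTE+RSB* | 0.87 | 0.8232 |
Preprocessing techniques | C4.5 | Borderline-SMOTE | 0.8564 | 0.7998 |
Preprocessing techniques | C4.5 | SafeLevel-SMOTE | 0.8643 | 0.8106 |
Preprocessing techniques | C4.5 | ADASYN | 0.8604 | 0.8035 |
Preprocessing techniques | C4.5 | SPIDER2 | 0.8499 | 0.7778 |
Preprocessing techniques | C4.5 | DBSMOTE | 0.8357 | 0.7638 |
Preprocessing techniques | SMO (SVM) | SMOTE | 0.8574 | 0.8418 |
Preprocessing techniques | SMO (SVM) | SMOTE+ENN | 0.8560 | 0.8412 |
Preprocessing techniques | SMO (SVM) | SMOTE+RSB* | 0.91 | 0.8815 |
Preprocessing techniques | SMO (SVM) | Borderline-SMOTE | 0.8556 | 0.8317 |
Preprocessing techniques | SMO (SVM) | SafeLevel-SMOTE | 0.8565 | 0.8358 |
Preprocessing techniques | SMO (SVM) | ADASYN | 0.8545 | 0.8212 |
Preprocessing techniques | SMO (SVM) | SPIDER2 | 0.8269 | 0.6923 |
Preprocessing techniques | SMO (SVM) | DBSMOTE | 0.8252 | 0.7192 |
Preprocessing techniques | 1NN (kNN with k=1) | SMOTE | 0.8478 | 0.8272 |
Preprocessing techniques | 1NN (kNN with k=1) | SMOTE+ENN | 0.8645 | 0.8342 |
Preprocessing techniques | 1NN (kNN with k=1) | SMOTE+RSB* | 0.92 | 0.9046 |
Preprocessing techniques | 1NN (kNN with k=1) | Borderline-SMOTE | 0.8518 | 0.8007 |
Preprocessing techniques | 1NN (kNN with k=1) | SafeLevel-SMOTE | 0.8365 | 0.7861 |
Preprocessing techniques | 1NN (kNN with k=1) | ADASYN | 0.8526 | 0.8270 |
Cost-sensitive algorithms | C4.5 | CS | 0.8578 | 0.8137 |
Cost-sensitive algorithms | C4.5 | MetaCost | 0.8617 | 0.8246 |
Cost-sensitive algorithms | C4.5 | CostSensitiveClassifier | 0.8487 | 0.7931 |
Cost-sensitive algorithms | SMO | CSWeighted | 0.8597 | 0.8397 |
Cost-sensitive algorithms | SMO | MetaCost | 0.7289 | 0.6559 |
Cost-sensitive algorithms | SMO | CostSensitiveClassifier | 0.8565 | 0.8304 |
Cost-sensitive algorithms | 1NN | CSWeighted | 0.8559 | 0.8416 |
Cost-sensitive algorithms | 1NN | MetaCost | 0.8455 | 0.8147 |
Cost-sensitive algorithms | 1NN | CostSensitiveClassifier | 0.8367 | 0.7943 |
Ensemble methods | C4.5 | AdaB-M1 | 0.8463 | 0.7877 |
Ensemble methods | C4.5 | AdaC2 | 0.8649 | 0.7958 |
Ensemble methods | C4.5 | RUSB | 0.8747 | 0.8405 |
Ensemble methods | C4.5 | SBAG | 0.8771 | 0.8431 |
Ensemble methods | C4.5 | Easy | 0.8711 | 0.8243 |
Ensemble methods | SMO | AdaB-M1 | 0.8059 | 0.7392 |
Ensemble methods | SMO | AdaC2 | 0.6487 | 0.6163 |
Ensemble methods | SMO | RUSB | 0.8270 | 0.7141 |
Ensemble methods | SMO | SBAG | 0.8556 | 0.8406 |
Ensemble methods | SMO | Easy | 0.8501 | 0.8304 |
Ensemble methods | 1NN | AdaB-M1 | 0.8375 | 0.7948 |
Ensemble methods | 1NN | AdaC2 | 0.8370 | 0.7935 |
Ensemble methods | 1NN | RUSB | 0.8562 | 0.8416 |
Ensemble methods | 1NN | SBAG | 0.8599 | 0.8427 |
Ensemble methods | 1NN | Easy | 0.8589 | 0.8365 |
Ensemble methods | - | EUSBOOST | 0.93 | 0.9071 |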
The mean AUC results for the ensemble variants are shown in Table 2 (the first five rows). From these results, we can observe that the best AUC values in all blocks are obtained by SMOTE+RSB*, EUSBOOST and all the variants of WIFROWANN. W-F2 obtains the highest AUC values, except for the low 𝐼𝑅 datasets, for which EUSBOOST gets the highest score. We carry out a statistical analysis of our results for each technique and each block of datasets. In these cases, we show per block only the methods that obtain a p-value > 0; the rest of the state-of-the-art methods have a p-value = 0.
Statistical analysis for preprocessing techniques: Tables 6 and 7 show the Friedman test and Holm's procedure for 𝐼𝑅 < 9 and 𝐼𝑅 ≥ 9. In both cases, the best ranking is obtained by W-F2. For 𝐼𝑅 < 9, Holm's procedure rejects those hypotheses that have a p-value ≤ 0.008333. W-F2 differs significantly from all preprocessing techniques, except for RSB-kNN and RSB-SVM. For 𝐼𝑅 ≥ 9, Holm's procedure rejects those hypotheses that have a p-value ≤ 0.01. W-F2 again differs significantly from all preprocessing techniques, except for RSB-kNN.
Statistical analysis for cost-sensitive algorithms: Tables 8 and 9 show the results of the Friedman test and Holm's procedure for 𝐼𝑅 < 9 and 𝐼𝑅 ≥ 9, respectively. For low imbalance, Holm's procedure rejects those hypotheses that have a p-value ≤ 0.0125, which leads us to conclude that there are statistically significant differences between the cost-sensitive methods and W-F2. For 𝐼𝑅 ≥ 9, the table shows only the Friedman ranking because the p-values of all cost-sensitive algorithms were 0.
Statistical analysis for ensemble techniques: Tables 10 and 11 show the Friedman test and Holm's procedure for low and high imbalance datasets. The lowest Friedman rank is obtained by W-F2; however, there are no significant differences with EUSBOOST. W-F2 statistically outperforms the rest of the ensemble methods. The same conclusions hold for high imbalance.
Table 6. Average Friedman rankings and adjusted p-values (Holm's procedure) for the preprocessing techniques on the 𝐼𝑅 < 9 datasets.
Algorithm | Average Friedman Ranking | Adjusted p-value |
RSB-C4.5 | 15.3864 | 0.000001 |
SMO(SMOTE) | 15.1818 | 0.000001 |
W-All | 7.1591 | 0.079631 |
RSB-kNN | 6.5455 | 0.130075 |
RSB-SVM | 6.2045 | 0.167274 |
W-Weights | 5.5682 | 0.257154 |
W-SF | 5.4091 | 0.284091 |
W-W6W4W5 | 3.7273 | 0.677354 |
W-F2 | 2.6591 | - |
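Table 7. Average Friedman rankings and adjusted p-values (Holm's procedure) for the preprocessing techniques on the 𝐼𝑅 ≥ 9 datasets.
Algorithm | Average Friedman Ranking | Adjusted p-value |
RSB-SVM | 9.5682 | 0.000121 |
RSB-kNN | 6.3636 | 0.037686 |
W-All | 5.875 | 0.070439 |
W-Weights | 4.9659 | 0.190773 |
W-SF | 4.4318 | 0.310542 |
W-W6W4W5 | 3.2727 | 0.707224 |
W-F2 | 2.5909 | - |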
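Table 8. Average Friedman rankings and adjusted p-values (Holm's procedure) for the cost-sensitive algorithms on the 𝐼𝑅 < 9 datasets.
Algorithm | Average Friedman Ranking | Adjusted p-value |
SMOCSWeighted | 7.7045 | 0.000001 |
W-All | 4.7273 | 0.016393 |
W-SF | 3.8409 | 0.099929 |
W-Weights | 3.7727 | 0.112483 |
W-W6W4W5 | 2.5682 | 0.574592 |
W-F2 | 1.9091 | - |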
Table 9. Average Friedman rankings for the cost-sensitive algorithms on the 𝐼𝑅 ≥ 9 datasets.
Algorithm | Average Friedman Ranking |
SMO(MetaCost) | 12.2273 |
1NN(CostSensitiveClassifier) | 10.4545 |
1NN(MetaCost) | 9.2841 |
C4.5CS | 9.1477 |
SMO(CostSensitiveClassifier) | 8.6364 |
C4.5(MetaCost) | 8.4886 |
SMOCSWeighted | 8.1591 |
1NNCSWeighted | 7.9091 |
W-All | 4.4545 |
W-Weights | 3.8295 |
W-SF | 3.6364 |
W-W6W4W5 | 2.6818 |
W-F2 | 2.0909 |
Algorithm | Average Friedman Ranking | Adjusted p-value |
C4.5(RUSB) | 11.5 | 0.000001 |
SMO(SBAG) | 11.2955 | 0.000002 |
C4.5(SBAG) | 10.5227 | 0.000012 |
W-All | 5.8864 | 0.058076 |
W-Weights | 4.9545 | 0.162399 |
W-SF | 4.5455 | 0.238646 |
EUSBOOST | 4.2727 | 0.301791 |
W-W6W4W5 | 3.0455 | 0.706474 |
W-F2 | 2.3409 | - |
7 Conclusions
In this paper, we have presented the WIFROWANN method, a new ensemble-level solution for two-class imbalanced classification problems that is based on the IFROWANN algorithm. In particular, we have studied the W-All, W-Weights, W-SF, W-W6W4W5 and W-F2 variants of WIFROWANN, which consider six weighting strategies and the no-weighting strategy combined with three different fuzzy indiscernibility relations. Our experimental results and statistical analysis have shown that:
W-F2 obtains a better mean AUC than IFROWANN and the best position in the Friedman ranking for low imbalance datasets. Holm's procedure shows that this variant differs significantly from AV-W6 for low imbalance and shows no significant difference from the same IFROWANN configuration for high imbalance.
WIFROWANN outperforms 14 representative state-of-the-art algorithms covering preprocessing, cost-sensitive and ensemble solutions specifically designed for imbalanced learning, and behaves similarly to the IFROWANN, SMOTE+RSB* and EUSBOOST methods.
For future work, we will consider extending the WIFROWANN method to multiclass and multi-label classification problems.