Ward_p HMM: A Shilling Attack Detection Technique Using Ward_p Method and Hidden Markov Model

Chowdhury, Keya; Majumder, Abhishek; Sarkar, Joy Lal; Chakraborty, Sukanta; Roy, Sudipta

doi:10.13053/cys-25-3-3907

Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 25 no. 3, Ciudad de México, Jul./Sep. 2021. Epub 13-Dec-2021

https://doi.org/10.13053/cys-25-3-3907

Articles

Ward_p HMM: A Shilling Attack Detection Technique Using Ward_p Method and Hidden Markov Model

Keya Chowdhury¹

Abhishek Majumder¹,*

Joy Lal Sarkar¹

Sukanta Chakraborty¹

Sudipta Roy²

¹ Tripura University, Department of Computer Science & Engineering, India, ckeya28@gmail.com, abhi2012@gmail.com, joylalsarkar@gmail.com, visitsukanta@gmail.com

² Assam University, Department of Computer Science and Engineering, India, sudipta.it@gmail.com

Abstract:

Collaborative Filtering Recommender Systems (CFRSs) are widely employed in many applications because of their satisfactory performance in personalized recommendation. Recent studies show that CFRSs are at risk of shilling attacks (SAs), in which attackers inject shilling profiles into the system. Ratings injected by malicious users severely affect not only the genuineness of recommendations but also users' trust in recommendation systems. The existing unsupervised clustering technique uses the Ward method, an iterative method with low scalability. To address this issue, this work proposes an unsupervised SA detection technique named Ward_pHMM, which uses the Ward_p method and a Hidden Markov Model (HMM). In the proposed method, the HMM is used to measure differences in users' rating behavior. It generates the User Suspicious Degree (USD) of each user by analyzing the user's Suspicious Degree Range of Items (SDRI) and User Matching Degree (UMD). The Ward_p method is then applied to merge users based on USD and to obtain the group of Attack Users (AUs). The Amazon-ratings sample dataset was used for performance analysis. The comparison shows that the proposed Ward_pHMM technique outperforms the baseline technique with respect to precision, recall and F1-score.

Keywords: Profile injection attack; Hidden Markov model; user matching degree; user suspicious degree; Ward_p method

1 Introduction

Collaborative Filtering (CF) based recommendation systems are developed to filter out irrelevant resources [¹]. Among recommender systems, the Collaborative Filtering Recommender System (CFRS) is considered a popular and productive technique. A CFRS works on the principle that similar users have similar tastes.

Because of its open and interactive nature, CF has been a source of vulnerability for recommender frameworks. A user-based CF algorithm usually makes recommendations by finding similar user patterns, which are obtained from the preferences of many different people [¹]. If profiles containing biased information are taken for real users, they eventually lead to biased recommendations, and relevant data is buried under a good deal of irrelevant information. Since the filtering procedure of CF depends on the profiles of different clients, the use of CF in a recommender framework is vulnerable to Shilling Attacks (SAs), in which attackers use biased rating profiles for their own benefit [²]. SAs can be classified as Push Attacks (PAs) and Nuke Attacks (NAs). A PA promotes the attacker's own items by giving them maximum ratings, and an NA demotes rivals' products by giving them minimum ratings [³].

Increasingly prevalent shilling attackers inject biased rating profiles into the frameworks to control item recommendation. This not only lowers recommendation accuracy but also harms the reliability of intermediated exchange platforms and their members. For example, on an online site, if both the retailer and a customer give high positive ratings to a product, then that customer can be recognized as a genuine purchaser.

Otherwise, if a purchaser gives only low negative ratings to every item, then he or she can be recognized as a fake user or an attacker performing SAs. In some cases, however, a genuine purchaser also gives moderate ratings, and it becomes difficult to tell the two apart without proper investigation. Thus, detection of SAs is a major challenge in recommender systems (RSs). SA detection has drawn the attention of many researchers, and many detection techniques have been proposed. A brief study of existing SA detection techniques is presented in section 2. Of these, UD-HMM [⁴] is a promising technique that performs very well in detecting most types of SAs. However, its detection performance is poor when an obfuscated attack takes place. The obfuscated attack uses the Standard Average Attack (SAA), with every filler item selected from the top x% of most popular items with equal probability. To address this problem, an unsupervised SA detection method named Ward_pHMM is proposed in this paper. It uses the Ward_p [⁵] method and a Hidden Markov Model (HMM).

The proposed method concentrates on distinguishing genuine users from attack users.

The contributions of the paper are:

A shilling attack detection technique, namely Ward_pHMM, is proposed. Here, an HMM is used to calculate the User Preference Sequence (UPS).
The proposed technique uses the Ward_p clustering method to obtain the group of attack users, because Ward_p uses feature weights and thereby produces results superior to those of the Ward method.
To analyze the performance of the Ward_pHMM technique, extensive experiments have been carried out and the results compared with those of the UD-HMM technique.

The paper is arranged as follows. Section 2 discusses background and related work. The proposed Shilling Attack (SA) detection technique is presented in section 3. Section 4 discusses the performance analysis and comparison of UD-HMM and Ward_pHMM. Finally, conclusions and future work are discussed in section 5.

2 Background and Related Work

This section briefly discusses the profile injection attack. It also presents the existing profile injection attack detection techniques.

2.1 Profile Injection Attack

An attack against a CFRS requires a group of Attack Profiles (APs), which the attacker injects. Attackers may inject shilling profiles with the highest or lowest rating for the target items to be promoted or demoted, respectively [⁶]. An AP contains many biased ratings [⁷].

Figure 1 illustrates the generalized form of an attack profile [³]. i^S represents the selected items, which are mainly used to characterize the attack, and 𝜑(i^S) represents their ratings. i^F represents the filler items, which are normally rated randomly so that the profile looks like a normal profile and is difficult to detect; 𝜑(i^F) represents the ratings of the filler items. i^∅ represents the unrated items and 𝜑(i^∅) represents the ratings of the unrated items. i^T represents the target items, which are given the highest ratings for promotion or the lowest ratings for demotion by the attackers; 𝛾(i^T) represents the ratings of the target items.

Fig. 1 Attack profile’s generalized form [3]

2.2 Shilling or Profile Injection Attack Detection techniques

The detection of profile injection attacks in CFRSs has attracted considerable attention from the research community [⁶, ⁷]. Over the past few decades, many shilling attack detection techniques have been proposed. There are three types of detection techniques: (i) unsupervised, (ii) supervised and (iii) semi-supervised detection methods.

Supervised classification techniques need labeled sample information to train classifiers. They can appropriately detect attacks of a known kind. An example of a supervised classification technique is RAdaBoost [⁸], proposed by Yang et al. It detects different types of attacks based on 18 statistical features of malicious users.

However, because of the large number of features, this method is computationally intensive. Burke et al. [⁹] proposed length variance (lengthVar), a generic attribute. The length of a user profile is its number of ratings, and the attribute measures how far the length of a given user's profile deviates from the average length. To find fake profiles that are correlated with a subset of items, Bryan et al. [¹⁰] proposed a variance-adjusted Hv value, built on the expectation that fake profiles will have a maximum Hv value.

Unsupervised clustering techniques normally require no prior knowledge of attacks, such as labeled candidate attack users or the range of injected attack profiles. Lee et al. [¹¹] proposed a hybrid two-phase detection method utilizing multidimensional scaling. It is an unsupervised clustering technique. Its detection performance is very good with a high filler size when detecting the Average Attack (AA). With a small filler size, its detection performance is poor when detecting the Random Attack (RA).

Zhang et al. proposed an HMM and Hierarchical Clustering (HC) based technique named UD-HMM [⁴] to identify profile injection attacks in CFRSs. This method first calculates the User Suspicious Degree (USD) by utilizing the HMM and then uses HC to detect the group of attack users.

The method outperforms the baseline methods in detecting different kinds of profile injection attack. However, its detection performance is poor for the obfuscated attack based on the standard average attack. To detect attackers based on the beta distribution, Yang et al. [¹²] used a novel Beta-Protection (βP) method. This method does not require prior information about the rating distribution of each product. Beta-Protection (βP) is used to provide immunity to missing values.

In most semi-supervised detection techniques, a small number of users are labeled while a massive number of users are unlabeled. Some existing works therefore emphasize modeling both unlabeled and labeled user profiles. Zhang et al. [¹³] developed a semi-supervised SA detection method that detects malicious users from product reviews. The method performs well when known types of attacks are detected.

However, it requires some labeled profiles to train the classifiers. Wu et al. [¹⁴] proposed the hPSD (semi-supervised hybrid learning) model for detecting SAs. This model uses both user-item relations and user features to maximize the SA detection rate. Cao et al. [¹⁵] and Wu et al. [¹⁶] proposed the Semi-SAD (Semi-supervised learning based SA Detection) method, which takes advantage of both unlabeled and labeled user profiles for detecting SAs.

3 Proposed Technique

This section presents the proposed technique, named Ward_pHMM. Ward_pHMM is an unsupervised SA detection technique that uses an HMM and the Ward_p method. The Ward_p [⁵] method creates feature weights by utilizing the L_p norm, and these weights can be seen as feature rescaling factors. The clusters formed by Ward_p depend on p. The proposed scheme has two parts.

The first part is the measurement of differences in the rating behavior of users and the second part is the detection of attack profiles.

3.1 Prerequisite

This section presents the prerequisites for the working of the Ward_pHMM method.

3.1.1 User Preference Sequences (UPS)

In a CFRS, authentic clients ordinarily rate objects according to their real preferences, whereas Attack Clients (ACs) rate objects to bias the framework's output according to their specific requirements. The rankings based on the ratings given by ACs do not reflect genuine preferences. A large difference therefore exists between attack and genuine clients with respect to rating patterns. This difference can be analyzed based on the User Rating Item Sequence (URIS), which is given as:

$$URIS_v = \{\, i_{m_1}^{s_1,v},\ i_{m_2}^{s_2,v},\ \dots,\ i_{m_n}^{s_n,v} \,\} \tag{1}$$

where i_{m_1}, i_{m_2}, …, i_{m_n} represent the items, v represents the user and s_1, s_2, …, s_n represent the rating times.

The process of obtaining User Preference Sequences (UPS) is:

Based on the rating information of the user, an observation sequence is constructed. The observation sequence represents the item rating sequence of each user.
To obtain the Hidden Markov Model λ0, the parameters of the HMM λ = {M, A, B} are first set to small arbitrary values. Here, M represents the initial probability distribution, A is the transition probability matrix and B is the emission probability matrix. The observation sequence OB = {OB_1, OB_2, …, OB_S} is used to train the HMM, where S denotes the length of the observation sequence, i.e., the number of items rated by user v. The Baum-Welch algorithm [¹⁷] is then used to re-estimate the HMM parameters.
Finally, from the re-estimated HMM parameters, the UPS and State Transition Sequence of each user are obtained using the Viterbi algorithm [¹⁷].
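
The decoding in the last step can be sketched as follows. This is a minimal pure-Python Viterbi decoder, assuming the HMM parameters have already been re-estimated (for example by Baum-Welch) and that observations and states are encoded as 0-based indices. The function name `viterbi` and its argument layout are illustrative assumptions, not taken from the paper.

```python
import math


def viterbi(obs, M, A, B):
    """Return the most likely hidden-state sequence for `obs`.

    M[k]    : initial probability of state k
    A[j][k] : transition probability from state j to state k
    B[k][o] : probability of emitting observation o from state k
    Log-probabilities are used to avoid underflow on long sequences.
    """
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    n_states = len(M)
    # score[k] = best log-probability of any path ending in state k
    score = [log(M[k]) + log(B[k][obs[0]]) for k in range(n_states)]
    back = []  # one list of best predecessors per later time step
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for k in range(n_states):
            best = max(range(n_states), key=lambda j: prev[j] + log(A[j][k]))
            ptr.append(best)
            score.append(prev[best] + log(A[best][k]) + log(B[k][o]))
        back.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

The returned list plays the role of the preference sequence of one user; a rating scale of 1 to 5 would need to be shifted to indices 0 to 4 before calling it.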

3.1.2 User Matching Degree

For a user v ∈ V, let the observation sequence be OB_v = {OB_1, OB_2, …, OB_S} and the preference sequence (i.e., the equivalent hidden state sequence of user v) be Q_v = {q_1, q_2, …, q_S}. Then the Matching Degree of user v (UMD_v) can be calculated as:

$$UMD_v = \hat{M}(q_1)\,\hat{B}(q_1, OB_1) \prod_{j=2}^{S} \hat{A}(q_{j-1}, q_j)\,\hat{B}(q_{j-1}, OB_j) \tag{2}$$

where the initial State Probability (SP) matrix is represented by M̂, the Observation Probability (OP) matrix is represented by B̂ and the State Transition Probability (STP) matrix is represented by Â.
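
As a rough illustration, the product in eq. 2 can be evaluated with a short loop. The sketch below follows the equation as it is printed in the text, including the emission factor indexed by the previous state; the function name `user_matching_degree` and the 0-based index encoding are assumptions made for the example.

```python
def user_matching_degree(obs, states, M_hat, A_hat, B_hat):
    """Matching degree of one user, following eq. 2 as printed.

    obs    : observation sequence OB_1..OB_S (0-based symbol indices)
    states : decoded preference sequence q_1..q_S (0-based state indices)
    Note: the emission factor inside the product is indexed by the
    previous state q_{j-1}, exactly as eq. 2 is written in the text.
    """
    if len(obs) != len(states) or not obs:
        raise ValueError("obs and states must be non-empty and equally long")
    umd = M_hat[states[0]] * B_hat[states[0]][obs[0]]
    for j in range(1, len(obs)):
        umd *= A_hat[states[j - 1]][states[j]] * B_hat[states[j - 1]][obs[j]]
    return umd
```

For long sequences the product becomes very small, so an implementation might prefer to accumulate logarithms instead; that choice is not discussed in the text.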

3.1.3 User Suspicious Degree (USD)

The ACs inject a specific number of APs into the CFRS to create the intended attack impact. Since the ACs must all rate the same target object, the ratings of the target object are very important.

3.1.3.1 Item Rating Sequence

Let i be an item and J be the set of all items in the dataset. The rating sequence of item i ∈ J is known as the Item Rating Sequence (IRS). It refers to the series of ratings $rt_{v_1,i}^{s_1}, rt_{v_2,i}^{s_2}, \dots, rt_{v_n,i}^{s_n}$ of item i provided by users v₁, v₂, ..., v_n at times s₁, s₂, ..., s_n. The IRS of item i (IRS_i) can be written as:

$$IRS_i = \{\, rt_{v_1,i}^{s_1},\ rt_{v_2,i}^{s_2},\ \dots,\ rt_{v_n,i}^{s_n} \,\} \tag{3}$$

3.1.3.2 Item Entropy

The uncertainty of a random variable is often measured by entropy. Let G be the set of different rating values provided by users in the CFRS, and let P_{i,e} represent the probability that users have given e points to item i ∈ J. Then the Item Entropy of i (IE_i) can be calculated as:

$$IE_i = -\sum_{e \in G} P_{i,e} \log_2 P_{i,e} \tag{4}$$

$$P_{i,e} = \frac{\sum_{rt_{v_k,i}^{s_k} \in IRS_i} \Gamma\!\left(rt_{v_k,i}^{s_k}, e\right)}{\sum_{e \in G} \sum_{rt_{v_k,i}^{s_k} \in IRS_i} \Gamma\!\left(rt_{v_k,i}^{s_k}, e\right)} \tag{5}$$

where $\Gamma(rt_{v_k,i}^{s_k}, e)$ denotes a discriminator function: if $rt_{v_k,i}^{s_k} = e$, then $\Gamma(rt_{v_k,i}^{s_k}, e) = 1$; otherwise $\Gamma(rt_{v_k,i}^{s_k}, e) = 0$. For the dataset used here, G = {1, 2, 3, 4, 5}.
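
A minimal sketch of eqs. 4 and 5 is given below, assuming an item's rating sequence is available as a plain list of rating values. The function name `item_entropy` is an illustrative choice, and terms with zero probability are skipped, following the usual convention that 0 · log₂ 0 = 0.

```python
import math
from collections import Counter


def item_entropy(ratings, rating_values=(1, 2, 3, 4, 5)):
    """Shannon entropy (base 2) of one item's rating sequence, eqs. 4-5."""
    counts = Counter(r for r in ratings if r in rating_values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("item has no ratings in the rating scale")
    entropy = 0.0
    for e in rating_values:
        p = counts[e] / total          # P_{i,e} from eq. 5
        if p > 0:                      # 0 * log2(0) is taken as 0
            entropy -= p * math.log2(p)
    return entropy
```

An item rated identically by everyone has entropy 0, while ratings spread evenly over all five values give the maximum, log₂ 5.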

The more concentrated an item's rating distribution is, the lower the item's entropy will be. If the probability of a single rating value for that item is high, it suggests that the ACs have rated that item and that the item is a target object.

3.1.3.3 Item Suspicious Degree (ISD)

Let ζ_i, given in eq. 6, represent the normalized value of the reciprocal entropy of item i, and let φ_v, given in eq. 7, represent the normalized value of the reciprocal matching degree of user v. For item i ∈ J, the Item Suspicious Degree (ISD_i) is given in eq. 8:

$$\zeta_i = \frac{1/IE_i - 1/IE_{\max}}{1/IE_{\min} - 1/IE_{\max}} \tag{6}$$

$$\varphi_v = \frac{1/UMD_v - 1/UMD_{\max}}{1/UMD_{\min} - 1/UMD_{\max}} \tag{7}$$

$$ISD_i = \frac{\sum_{v \in V_G} \varphi_v}{|V_G|} \times \zeta_i \tag{8}$$

where V_G represents the group of users who gave item i high ratings. For the dataset, a rating above 3 is regarded as a high rating. UMD_min and UMD_max are the lowest and highest values of the User Matching Degree (UMD), and UMD_v is the matching degree of user v. IE_min and IE_max are the lowest and highest values of item entropy, respectively, and IE_i is the entropy of item i.

Here, a larger φ_v (that is, a lower matching degree) marks user v as more suspicious, and a larger ζ_i (that is, a lower item entropy) marks item i as more suspicious.
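
Eqs. 6 to 8 can be sketched as two small helpers. Both names (`normalized_reciprocal`, `item_suspicious_degree`) are hypothetical, all inputs are assumed to be strictly positive so that the reciprocals exist, and returning 0 for an item with no high raters is an assumption of this sketch that the text does not specify.

```python
def normalized_reciprocal(value, lowest, highest):
    """Min-max normalisation of 1/value, the shared form of eqs. 6 and 7."""
    if lowest == highest:
        raise ValueError("lowest and highest must differ")
    return (1 / value - 1 / highest) / (1 / lowest - 1 / highest)


def item_suspicious_degree(phi_of_high_raters, zeta_i):
    """Eq. 8: mean phi_v over the users in V_G, scaled by zeta_i."""
    if not phi_of_high_raters:
        return 0.0  # assumption: an item with no high ratings gets ISD 0
    return sum(phi_of_high_raters) / len(phi_of_high_raters) * zeta_i
```

With this form, the item with the lowest entropy gets ζ_i = 1 and the item with the highest entropy gets ζ_i = 0; the same holds for φ_v with respect to the matching degree.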

3.1.3.4 Suspicious Degree Range of Items

The Suspicious Degree Range of Items (SDRI) rated by user v ∈ V, SDRI_v, can be calculated as:

$$SDRI_v = ISD_{\max}^{v} - ISD_{\min}^{v} \tag{9}$$

where $ISD_{\min}^{v}$ and $ISD_{\max}^{v}$ represent the minimum and maximum ISD values of the items rated by user v, respectively.

3.1.3.5 User Suspicious Degree

For user v ∈ V, let USD_v be the User Suspicious Degree (USD) of user v. φ_v, given in eq. 7, represents the normalized value of the reciprocal matching degree of user v. ϕ_v, given in eq. 10, represents the normalized value of the SDRI of user v. Using a linear weighted combination of ϕ_v and φ_v, USD_v can be calculated as in eq. 11:

$$\phi_v = \frac{SDRI_v - SDRI_{\min}}{SDRI_{\max} - SDRI_{\min}} \tag{10}$$

$$USD_v = f \times \varphi_v + (1 - f) \times \phi_v \tag{11}$$

where f represents the weight factor, SDRI_v represents the Suspicious Degree Range of Items (SDRI) rated by user v, and SDRI_max and SDRI_min represent the highest and lowest values of SDRI over all users, respectively.
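
Eqs. 9 to 11 amount to a range, a min-max normalisation and a weighted sum, which the following sketch illustrates. The function names and the default weight `f=0.5` are assumptions made for the example; the text does not state the value of f used in the experiments.

```python
def sdri(isd_of_rated_items):
    """Eq. 9: spread between the largest and smallest ISD a user rated."""
    return max(isd_of_rated_items) - min(isd_of_rated_items)


def user_suspicious_degree(phi_v, sdri_v, sdri_min, sdri_max, f=0.5):
    """Eqs. 10-11: weighted mix of phi_v and the normalised SDRI.

    f=0.5 is a placeholder default, not a value taken from the paper.
    """
    if sdri_max == sdri_min:
        raise ValueError("sdri_max and sdri_min must differ")
    normalised_sdri = (sdri_v - sdri_min) / (sdri_max - sdri_min)  # eq. 10
    return f * phi_v + (1 - f) * normalised_sdri                   # eq. 11
```

Because both inputs to eq. 11 lie in [0, 1] after normalisation, the resulting USD also lies in [0, 1] for any f between 0 and 1.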

3.1.4 Ward_p Method

Ward_p [⁵] is an agglomerative hierarchical clustering technique. This method creates feature weights by utilizing the L_p norm [¹⁸, ¹⁹], which is used to transform the weights into feature rescaling factors. The clusters created by Ward_p depend on p. The Ward_p distance, given in eq. 12, can be calculated as:

$$Ward_p(Z_i, Z_j) = \frac{M_{Z_i} M_{Z_j}}{M_{Z_i} + M_{Z_j}} \sum_{r \in R} wt_{lr}^{\,p}\, \left| c_{Z_i,r} - c_{Z_j,r} \right|^{p} \tag{12}$$

$$wt_{lr} = \frac{1}{\sum_{o \in R} \left[ D_{lrp} / D_{lop} \right]^{\frac{1}{p-1}}} \tag{13}$$

$$D_{lrp} = \sum_{n \in Z_l} \left| y_{nr} - c_{lr} \right|^{p} \tag{14}$$

$$c_{lr} = \frac{1}{|Z_l|} \sum_{y_n \in Z_l} y_{nr} \tag{15}$$

$$y_{nr} = \frac{x_{nr} - \bar{x}_r}{\operatorname{range}(x_r)} \tag{16}$$

where M_{Z_i} and c_{Z_i,r} represent the cardinality of cluster Z_i and the centroid of cluster Z_i with respect to feature r, respectively.

M_{Z_j} and c_{Z_j,r} represent the same for cluster Z_j. L = |USD| and l ∈ L, where L represents the length of the User Suspicious Degree. wt_lr is the weight of feature r ∈ R in the cluster with centroid c_l. D_lrp is the dispersion of feature r in cluster Z_l, with centroid c_l, with respect to p. p represents the optimal exponent (1 ≤ p ≤ 5). c_lr is the centroid of feature r in cluster Z_l. y_nr represents the rescaled feature value. x̄_r is the average of feature r over the whole set of user suspicious degrees and x_nr denotes the value of feature r for entity n in that set.
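
The sketch below illustrates eqs. 12 to 15 under stated assumptions: features are already rescaled as in eq. 16, the weight vector is passed explicitly because eq. 12 does not say which cluster's weights are used for a pair, and the cases p = 1 and zero dispersion (where eq. 13 is undefined) are rejected instead of handled. The function names are hypothetical.

```python
def feature_weights(cluster, p):
    """Eqs. 13-15: per-feature weights of one cluster for exponent p > 1.

    cluster : list of entities, each a list of already-rescaled feature values
    """
    if p <= 1:
        raise ValueError("this sketch needs p > 1 because of the 1/(p-1) exponent")
    n_features = len(cluster[0])
    # Eq. 15: centroid of each feature
    centroid = [sum(y[r] for y in cluster) / len(cluster) for r in range(n_features)]
    # Eq. 14: dispersion of each feature around its centroid
    dispersion = [sum(abs(y[r] - centroid[r]) ** p for y in cluster)
                  for r in range(n_features)]
    if any(d == 0 for d in dispersion):
        raise ValueError("zero dispersion is not handled in this sketch")
    # Eq. 13: features with smaller dispersion receive larger weights
    return [1 / sum((dispersion[r] / d_o) ** (1 / (p - 1)) for d_o in dispersion)
            for r in range(n_features)]


def ward_p_distance(size_i, centroid_i, size_j, centroid_j, weights, p):
    """Eq. 12: weighted Ward_p merge cost between two clusters."""
    scale = size_i * size_j / (size_i + size_j)
    return scale * sum((w ** p) * abs(ci - cj) ** p
                       for w, ci, cj in zip(weights, centroid_i, centroid_j))
```

The weights produced by eq. 13 sum to 1 within a cluster, so a feature that varies little inside the cluster contributes more to the merge cost than a noisy one.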

3.3 Proposed Ward_pHMM Algorithm

To detect shilling attackers, the proposed algorithm performs three main tasks: (i) calculation of User Preference Sequences (UPS), (ii) generation of the User Suspicious Degree (USD) and (iii) shilling attacker detection based on the Ward_p method. Algorithm 1 presents the proposed Ward_pHMM technique.

Algorithm 1 Algorithm of the Ward_pHMM technique

To calculate the UPS, the model first generates a test set from the rating database and the attack profiles (Line 2). Secondly, the URIS of each user is constructed using eq. 1 (Lines 4-13). Thirdly, the Hidden Markov Model parameters are initialized (Line 14) and the HMM is trained (Lines 15-17). Finally, the trained HMM is used to generate the UPS (Lines 18-21).

To generate the USD, the entropy of each item is first calculated using eq. 4 (Lines 22-24). Then, the UMD of each user is calculated from the UPS using eq. 2 (Lines 25-27). Next, the ISD of each item (Lines 28-30) and the SDRI of each user (Lines 31-33) are calculated using eq. 8 and eq. 9, respectively.

Finally, the USD of each user is generated using eq. 10 and eq. 11 to obtain the set of USD values (Lines 34-37).

To detect shilling attackers based on the Ward_p method, L and wt_lr are set first (Line 38). The Ward_p method is then used to group the set of USD values into two clusters using eq. 12 (Lines 39-44). Finally, the group of Attack Users (AUs) is generated (Lines 45-52). The cluster with the higher mean USD is designated as the group of AUs. The workflow of the proposed Ward_pHMM technique is shown in figure 2.

Fig. 2 Workflow of the Ward_pHMM method
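
The final detection task can be sketched as a naive agglomerative loop over the USD values. This is only an illustration under simplifying assumptions: USD is treated as a single feature, so its Ward_p weight is taken as 1, the default `p=2` is a placeholder, and the pairwise search is quadratic per merge, which would be too slow for large user sets. The function name `detect_attack_users` is hypothetical.

```python
def detect_attack_users(usd, p=2):
    """Merge users by USD until two clusters remain; return the suspicious one.

    usd : dict mapping user id -> User Suspicious Degree
    Assumption: USD is the only feature, so its Ward_p weight is taken as 1.
    """
    if not usd:
        return set()
    clusters = [[u] for u in usd]          # start with one cluster per user

    def mean(cluster):
        return sum(usd[u] for u in cluster) / len(cluster)

    def cost(a, b):                        # eq. 12 with a single unit weight
        return len(a) * len(b) / (len(a) + len(b)) * abs(mean(a) - mean(b)) ** p

    while len(clusters) > 2:
        # Find and merge the cheapest pair (naive search over all pairs)
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # The cluster with the higher mean USD is reported as the attack group
    return set(max(clusters, key=mean))
```

Note that with fewer than two users the single remaining cluster is returned as is, so a caller would need its own rule for such degenerate inputs.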

4 Performance Analysis and Comparison

This section analyzes the performance of the proposed Ward_pHMM technique.

It also presents a comparison between the performance of Ward_pHMM and UD-HMM. The Amazon-ratings dataset [²⁰] is used for this experiment. The dataset contains 1210271 User-Ids, 249274 Product-Ids, 2023070 Ratings and 4231 Timestamps.

Ratings vary from 1 to 5, where 5 indicates most liked and 1 indicates disliked. The Amazon-ratings dataset is sampled randomly, giving a sample containing 5000 User-Ids, 757 Product-Ids, 5255 Ratings and 1613 Timestamps. Shilling Profiles (SPs) have been constructed based on the obfuscated attack model [²¹] with different filler sizes and attack sizes and injected into the dataset. Here, the UD-HMM parameters α and N are set to 0.7 and 5, respectively.

4.1 Performance metrics used

To analyze the performance of the proposed Ward_pHMM technique, it has been compared with UD-HMM with respect to precision, recall and F1-score, which are defined as:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{17}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{18}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{19}$$

where True Positive (TP) is the number of APs correctly detected, True Negative (TN) is the number of authentic users correctly detected, False Positive (FP) is the number of authentic users misclassified as attack users, and False Negative (FN) is the number of APs misclassified as authentic users.
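
The three metrics can be computed from the set of injected attack users and the set of detected users, as in the sketch below. The function name `detection_metrics` is illustrative, and returning 0.0 when a denominator is zero is a convention chosen for the example, not something stated in the text.

```python
def detection_metrics(true_attackers, detected_attackers):
    """Precision, recall and F1-score of a detected attack-user set, eqs. 17-19."""
    true_attackers = set(true_attackers)
    detected_attackers = set(detected_attackers)
    tp = len(true_attackers & detected_attackers)   # attackers correctly flagged
    fp = len(detected_attackers - true_attackers)   # genuine users flagged
    fn = len(true_attackers - detected_attackers)   # attackers missed
    # Assumption: a metric with a zero denominator is reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

TN does not appear in any of the three formulas, which is why the large number of genuine users in the sample does not inflate the scores.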

4.2 Performance Comparison

To compare the precision, recall and F1-score values of the Ward_pHMM and UD-HMM methods, experiments have been performed on the sampled Amazon-ratings dataset with different attack sizes and filler sizes.

Figure 3 and figure 4 present the effect of attack size on precision when the filler size is 3% and 5%, respectively. The precision values are captured for different Filler Sizes (FS) and Attack Sizes (AS) under the obfuscated attack based on the standard average attack. When the filler size is set to 3%, the precision of Ward_pHMM ranges from 0.8 to 0.9, whereas that of UD-HMM ranges from 0.34 to 0.41. When the filler size is set to 5%, the precision of Ward_pHMM is between 0.85 and 0.91, whereas that of UD-HMM ranges between 0.19 and 0.29. This indicates that the proposed method detects attack users more correctly than the UD-HMM method.

Fig. 3 Effect of attack size on precision when filler size is 3%

Fig. 4 Effect of attack size on precision when filler size is 5%

Figure 5 and figure 6 present the effect of attack size on recall when the filler size is 3% and 5%, respectively. The recall values are recorded for various filler sizes and attack sizes under the obfuscated attack on the sampled Amazon-ratings dataset. When the filler size is set to 3%, the recall of Ward_pHMM ranges from 0.88 to 0.93, whereas that of UD-HMM ranges from 0.37 to 0.95. Although the highest recall of UD-HMM is slightly above that of Ward_pHMM, in most cases Ward_pHMM performs better than UD-HMM. When the filler size is set to 5%, the recall of Ward_pHMM is between 0.86 and 0.94, whereas that of UD-HMM ranges between 0.55 and 0.94. Overall, the recall of the Ward_pHMM method is higher than that of the existing UD-HMM method.

Fig. 5 Effect of attack size on recall when filler size is 3%

Fig. 6 Effect of attack size on recall when filler size is 5%

This signifies that the detection performance of the proposed method is better than that of the UD-HMM method.

Figure 7 and figure 8 present the effect of attack size on F1-score when the filler size is 3% and 5%, respectively. When the filler size is 3%, the F1-score of the proposed Ward_pHMM method ranges from 0.84 to 0.91, whereas that of UD-HMM ranges from 0.35 to 0.52. When the filler size is 5%, the F1-score of the Ward_pHMM method varies between 0.85 and 0.92, whereas that of UD-HMM varies from 0.35 to 0.43. The Ward_pHMM method thus has a higher F1-score than the existing UD-HMM method and distinguishes genuine users from attack users more accurately. This signifies that the proposed method outperforms the UD-HMM method with respect to detection performance.

Fig. 7 Effect of attack size on F1-score when filler size is 3%

Fig. 8 Effect of attack size on F1-score when filler size is 5%

5 Conclusions and Future Work

A CFRS is a very efficient way of handling the problem of information overload. However, CFRSs are highly vulnerable to shilling attacks, in which a variety of malicious user profiles are inserted into the system.

These malicious user profiles affect user recommendations. To address this issue, a shilling attack detection technique named Ward_pHMM has been proposed in this paper. To overcome the problem of the Ward method during clustering, the proposed scheme uses the Ward_p method. The performance of the Ward_pHMM method has been analyzed using the Amazon-ratings sample dataset.

It has been observed that the Ward_pHMM method outperforms the UD-HMM method with respect to precision, recall and F1-score. The Ward_p method still has some scope for improvement: it requires the calculation of centroids, which makes the proposed technique considerably computation-intensive. Therefore, the development of a lightweight shilling attack detection technique remains future work.

Acknowledgement

The authors are thankful to the Mobile Computing Laboratory, Department of Computer Science & Engineering, Tripura University, for providing the necessary infrastructure.

References

1. Si, M., Li, Q. (2020). Shilling attacks against collaborative recommender systems: A review. Artificial Intelligence Review, Vol. 53, No. 1, pp. 291–319. DOI: 10.1007/s10462-018-9655-x.

2. Wei, R., Shen, H. (2016). An improved collaborative filtering recommendation algorithm against shilling attacks. Proceedings of 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp. 330–335. DOI: 10.1109/PDCAT.2016.077.

3. Mobasher, B., Burke, R., Bhaumik, R., Williams, C. (2007). Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness. ACM Transactions on Internet Technology (TOIT), Vol. 7, No. 4. DOI: 10.1145/1278366.1278372.

4. Zhang, F., Zhang, Z., Zhang, P., Wang, S. (2018). UD-HMM: An unsupervised method for shilling attack detection based on hidden Markov model and hierarchical clustering. Knowledge-Based Systems, Vol. 148, pp. 146–166. DOI: 10.1016/j.knosys.2018.02.032.

5. Cordeiro de Amorim, R. (2015). Feature relevance in Ward’s hierarchical clustering using the Lp norm. Journal of Classification, Vol. 32, pp. 46–62. DOI: 10.1007/s00357-015-9167-1.

6. Lam, S.K., Riedl, J. (2004). Shilling recommender systems for fun and profit. Proceedings of the 13th International Conference on World Wide Web, pp. 393–402. DOI: 10.1145/988672.988726.

7. Gunes, I., Kaleli, C., Bilge, A., Polat, H. (2014). Shilling attacks against recommender systems: A comprehensive survey. Artificial Intelligence Review, Vol. 42, No. 4, pp. 767–799. DOI: 10.1007/s10462-012-9364-9.

8. Yang, Z., Xu, L., Cai, Z., Xu, Z. (2016). Re-scale AdaBoost for attack detection in collaborative filtering recommender systems. Knowledge-Based Systems, Vol. 100, pp. 74–88. DOI: 10.1016/j.knosys.2016.02.008.

9. Burke, R., Mobasher, B., Williams, C., Bhaumik, R. (2006). Detecting profile injection attacks in collaborative recommender systems. Proceedings of 8th IEEE International Conference on E-Commerce Technology and 3rd IEEE International Conference on Enterprise Computing, E-Commerce, and E-Services (CEC/EEE'06). DOI: 10.1109/CEC-EEE.2006.34.

10. Bryan, K., O'Mahony, M., Cunningham, P. (2008). Unsupervised retrieval of attack profiles in collaborative recommender systems. Proceedings of the 2008 ACM Conference on Recommender Systems, pp. 155–162. DOI: 10.1145/1454008.1454034.

11. Lee, J.S., Zhu, D. (2012). Shilling attack detection—A new approach for a trustworthy recommender system. INFORMS Journal on Computing, Vol. 24, No. 1, pp. 117–131. DOI: 10.1287/ijoc.1100.0440.

12. Yang, Z., Cai, Z., Guan, X. (2016). Estimating user behavior toward detecting anomalous ratings in rating systems. Knowledge-Based Systems, Vol. 111, pp. 144–158. DOI: 10.1016/j.knosys.2016.08.011.

13. Zhang, L., Wu, Z., Cao, J. (2017). Detecting spammer groups from product reviews: A partially supervised learning model. IEEE Access, Vol. 6, pp. 2559–2568. DOI: 10.1109/ACCESS.2017.2784370.

14. Wu, Z., Wang, Y., Wang, Y., Wu, J., Cao, J., Zhang, L. (2015). Spammers detection from product reviews: A hybrid model. Proceedings of IEEE International Conference on Data Mining, pp. 1039–1044. DOI: 10.1109/ICDM.2015.73.

15. Cao, J., Wu, Z., Mao, B., Zhang, Y. (2013). Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system. World Wide Web, Vol. 16, pp. 729–748. DOI: 10.1007/s11280-012-0164-6.

16. Wu, Z., Cao, J., Mao, B., Wang, Y. (2011). Semi-SAD: Applying semi-supervised learning to shilling attack detection. Proceedings of Fifth ACM Conference on Recommender Systems, pp. 289–292. DOI: 10.1145/2043932.2043985.

17. Rabiner, L.R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, Vol. 77, No. 2, pp. 257–286. DOI: 10.1109/5.18626.

18. Chan, E.Y., Ching, W.K., Ng, M.K., Huang, J.Z. (2004). An optimization algorithm for clustering using weighted dissimilarity measures. Pattern Recognition, Vol. 37, No. 5, pp. 943–952. DOI: 10.1016/j.patcog.2003.11.003.

19. Huang, J.Z., Ng, M.K., Rong, H., Li, Z. (2005). Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 5, pp. 657–668. DOI: 10.1109/TPAMI.2005.95.

20. Kaggle (2021). https://www.kaggle.com/skillsmuggler/amazon-ratings/data.

21. Hurley, N., Cheng, Z., Zhang, M. (2009). Statistical attack detection. Proceedings of Third ACM Conference on Recommender Systems, pp. 149–156. DOI: 10.1145/1639714.1639740.

Received: March 07, 2021; Accepted: June 07, 2021

* Corresponding author: Abhishek Majumder, e-mail: abhi2012@gmail.com

This is an open-access article distributed under the terms of the Creative Commons Attribution License