1 Introduction
Recommender systems aim to suggest the items a customer might like based on information about his/her preferences and ratings. A recommender system can be viewed as an extension of association/pattern mining: if an item B is associated with an item A, then any user who buys item A is recommended item B, and vice versa [1, 2].
Recommender systems are useful for both buyers and sellers since they reduce the buyer’s effort and increase sales. These systems are used in many fields such as e-commerce websites, news filtering, web search, online dating and social networking sites [3, 4, 5]. Movie recommendation, or movie rating prediction, is a popular use case of recommender systems [6, 7]. An analysis of the state-of-the-art methods shows that they are slow and non-scalable, and that their accuracy needs improvement. In this paper, a fast and scalable method to generate recommendations is proposed, which is optimized by an improved firefly algorithm.
Traditional clustering methods such as the k-means algorithm are slow, so the firefly optimization algorithm is used to create clusters. The firefly clustering algorithm is made scalable by parallelizing it with Apache Spark.
Firefly is a population-based algorithm, which gives it some advantages over single-point search algorithms. Its most important application areas include optimization in dynamic and noisy environments, constrained optimization, combinatorial optimization and multi-objective optimization. Apart from optimization, it is also capable of solving classification problems that arise in neural networks, data mining and machine learning. Clustering techniques group similar items or objects together through unsupervised learning. In this paper, the data set is first divided based on random cluster heads, and the cluster heads are then re-calculated iteratively using the firefly algorithm. The detailed working methodology and mathematical foundation are given in Section 3.
The remainder of the paper is organized as follows. Section 2 surveys the various recommender algorithms. Section 3 introduces the vanilla version of the firefly algorithm. Section 4 explains the working of the proposed algorithm, an improved firefly algorithm that generates optimal recommendations. Section 5 provides experimental results and analysis. Section 6 presents the conclusion along with future directions.
2 Recommender Algorithms
Recommender algorithms fall into three categories [8, 9]: content-based, collaborative and hybrid, as shown in Figure-1. Collaborative filtering is based on user ratings: the ratings given to the products by every user are stored, the users whose rating pattern is similar to that of a user X are identified, and the products rated highly by this group are recommended to X.
In addition to collaborative filtering, recommender systems also use content-based approaches built on information retrieval, Bayesian inference and case-based reasoning methods [10, 11]. These methods use the actual content or attributes of the items to make recommendations (instead of, or in addition to, user rating patterns). Content-based algorithms recommend to a customer those items which are similar to items that the same customer has bought or searched for in the past. Hybrid recommender systems [12] combine content-based and collaborative algorithms into composite systems that build on the strengths of their algorithmic components.
Content-Based Filtering systems recommend an item based on the contents of that item. If a user has previously searched for, or looked at some items with attribute ‘A’ then more items with attribute A will be recommended. Thus recommendations are made by comparing the contents of an item with the profile of the target user. The profile of a user is built from his history of interaction with the system by modeling the user’s preferences. The attributes can be assigned automatically or manually. The attributes have to be represented such that the user profile and the items can be compared to extract meaningful relations. A learning algorithm which can create the user profile based on items bought/viewed is also needed [13].
Collaborative Filtering (CF) is a process in which ratings are obtained from the users and recommendations to a new user are given based on opinions of other users with similar taste. The items that are recommended to a user are based upon his/her similarity to other users. For example, if two users X and Y have shown similar preferences in the past then the items which are liked by X in the future will be recommended to Y and vice versa. So basically, this algorithm assumes that if some user A has the same view as user B on an issue then A is more likely to have the same view as B on any other issue as well [14]. Collaborative Filtering based approach can be further divided into two categories: Memory-based and Model-based.
Memory-Based approach is a simple approach which makes use of a similarity measure to find users/items which are related to the active user or to the items bought/viewed by the active user. The methods that can be used to find out the similarity between two users are Euclidean distance, cosine similarity, correlation, etc.
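The three similarity measures named above can be illustrated with a minimal Python sketch. The function names and the two rating vectors are hypothetical and chosen only for illustration; the vectors are assumed to be of equal length and fully rated.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation of two rating vectors (0.0 if one has no variance)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


# Hypothetical ratings of two users for the same four items.
user_x = [5, 3, 4, 4]
user_y = [3, 1, 2, 3]
print(euclidean_distance(user_x, user_y))
print(cosine_similarity(user_x, user_y))
print(pearson_correlation(user_x, user_y))
```

A distance is small for similar users, whereas cosine similarity and correlation are large, so the ranking direction has to be chosen accordingly.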
User-based approach: User–user collaborative filtering was the first automated CF approach. It was first introduced in the GroupLens Usenet article recommender [15]. In this approach, the users whose ratings are most similar to those of the active user are identified first. Then the items which have been highly rated by most of these users are identified and recommended to the active user [16, 17, 18].
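A minimal sketch of the user-based idea is given below. It assumes cosine similarity as the similarity measure, a rating of 0 for an unrated item, and a small hypothetical rating table; the function `recommend_user_based` and its parameters `k` and `top_n` are illustrative names, not part of any cited system.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two rating vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_user_based(ratings, active, k=2, top_n=2):
    """Recommend unrated items of `active` from the k most similar users.

    ratings: dict mapping user name -> list of ratings (0 means unrated).
    Returns item indices ordered by the neighbours' average rating.
    """
    target = ratings[active]
    neighbours = sorted(
        (user for user in ratings if user != active),
        key=lambda user: cosine_similarity(target, ratings[user]),
        reverse=True,
    )[:k]
    scores = {}
    for item, own_rating in enumerate(target):
        if own_rating != 0:
            continue  # the active user has already rated this item
        given = [ratings[u][item] for u in neighbours if ratings[u][item] != 0]
        if given:
            scores[item] = sum(given) / len(given)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical rating table: four users, four items.
ratings = {
    "X": [5, 4, 0, 0],
    "Y": [5, 5, 4, 1],
    "Z": [4, 5, 5, 2],
    "W": [1, 0, 1, 5],
}
print(recommend_user_based(ratings, "X"))  # item indices, best first
```

In this example users Y and Z are the nearest neighbours of X, so item 2 (average 4.5) is ranked ahead of item 3 (average 1.5).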
Item-based approach: Item-based collaborative filtering models the ratings item-wise and not user-wise. An item-item matrix is built to determine relationship between every pair of items. A similarity measure like correlation is used to build this matrix. When a customer rates an item A then this matrix is looked up to find the items which have highest similarity with A in the matrix. Slope [19, 20]
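The item-item matrix described above can be sketched as follows. The sketch assumes a small, fully rated hypothetical rating table and Pearson correlation as the similarity measure; `item_similarity_matrix` and `most_similar_items` are illustrative names.

```python
import math


def pearson_correlation(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if no variance)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def item_similarity_matrix(ratings):
    """Build the item-item correlation matrix from a user x item rating table."""
    n_items = len(ratings[0])
    columns = [[row[j] for row in ratings] for j in range(n_items)]
    return [
        [pearson_correlation(columns[a], columns[b]) for b in range(n_items)]
        for a in range(n_items)
    ]


def most_similar_items(similarity, item, top_n=1):
    """Look up the items with the highest similarity to `item`."""
    others = [j for j in range(len(similarity)) if j != item]
    return sorted(others, key=lambda j: similarity[item][j], reverse=True)[:top_n]


# Hypothetical table: rows are users, columns are items.
ratings = [
    [5, 4, 1],
    [4, 5, 2],
    [1, 2, 5],
    [2, 1, 4],
]
similarity = item_similarity_matrix(ratings)
print(most_similar_items(similarity, 0))
```

Because the matrix depends only on items, it can be built offline and merely looked up when a customer rates an item.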
Model-Based approach relies on machine learning and data mining algorithms to make predictions. In this approach the aim is not to find the most similar users/items, but to develop a model that classifies the user and recommends highly rated items of other users belonging to the same class. The machine learning algorithms used in the model-based approach include clustering, Bayesian networks, singular value decomposition (SVD), etc. Compared with the memory-based approach, it provides faster recommendations, handles sparsity better, scales with the dataset and has better prediction performance.
Many recommender systems combine the memory-based and model-based collaborative filtering algorithms, which can be called hybrid collaborative filtering. This type overcomes some of the common problems that occur when either of the other two approaches is used independently, but it increases complexity and is expensive. It has been observed that the hybrid approach provides more accurate recommendations than either of the two approaches [21]. A popular example of the hybrid approach is content-boosted collaborative filtering [22]. Apart from these popular techniques, there are some other recommendation techniques as well.
Knowledge-Based Recommenders. These recommender systems are a specific kind of recommender system based on prior knowledge about all the available items and about user preferences [23].
Demographic recommenders. As the name suggests, these recommender systems provide recommendations based on a demographic profile of the user. The ratings given by users in a particular demographic segment are used to provide recommendations to a user of that segment. Recommender systems also encounter a few problems such as cold start, sparsity, trust and privacy [24].
3 Firefly Algorithm
The firefly algorithm, developed by Yang in 2008 [25], is a meta-heuristic algorithm used to solve optimization problems. It is a stochastic algorithm, i.e. it follows a randomization approach to search for the solution in the data set. In this section, the biological and mathematical foundations and the behavior of fireflies are presented. The section also explains the intuition and foundation of clustering with the firefly algorithm.
3.1 Biological Foundation and Behavior
Fireflies are distinguished by their flashing light, which is produced by a biochemical process known as bioluminescence [26, 27, 28]. The rhythmic flashes are used as signals for mating [29, 30]. Apart from attracting mating partners, these bright lights serve as warning signals to potential predators.
The firefly algorithm uses the following assumptions [25]:
− The brightness of a firefly corresponds to the objective function.
− All fireflies are unisex, so each firefly is attracted to all other fireflies.
− A brighter firefly is more attractive, and a less bright firefly will move towards a brighter one. The attractiveness/brightness decreases as the distance between the fireflies increases.
3.2 Mathematical Formulation
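The light intensity $I(r)$ varies according to the inverse square law:
$$I(r) = \frac{I_s}{r^2} \tag{1}$$
where $I_s$ is the light intensity at the source and $r$ is the distance from the source.
The light intensity $I$ varies with distance $r$ in a medium with a fixed light absorption coefficient $\gamma$ as:
$$I(r) = I_0 e^{-\gamma r} \tag{2}$$
where $I_0$ is the original light intensity.
Eq-1 and Eq-2 can be combined to give the following equation, which avoids the singularity at $r = 0$:
$$I(r) = I_0 e^{-\gamma r^2} \tag{3}$$
Taking the above equations into consideration, the attractiveness $\beta$ of a firefly is given by:
$$\beta(r) = \beta_0 e^{-\gamma r^2} \tag{4}$$
where $\beta_0$ is the attractiveness at $r = 0$.
In the real time environment, the attractiveness function of the firefly, i.e. $\beta(r)$, can be any monotonically decreasing function such as:
$$\beta(r) = \beta_0 e^{-\gamma r^m}, \quad m \ge 1 \tag{5}$$
The Cartesian distance $r_{ij}$ is the distance between any two fireflies $i$ and $j$ at locations $x_i$ and $x_j$:
$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2} \tag{6}$$
where $x_{i,k}$ is the $k$th component of the spatial coordinate $x_i$ of the $i$th firefly and $d$ is the number of dimensions.
In the 2-dimensional case, we have:
$$r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{7}$$
Firefly $i$ is attracted to another more attractive firefly $j$ according to equation (8):
$$x_i \leftarrow x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \left( \mathrm{rand} - \tfrac{1}{2} \right) \tag{8}$$
where the second term is due to the attraction, the third term is the randomization with randomization parameter $\alpha$, and $\mathrm{rand}$ is a random number uniformly distributed in $[0, 1]$.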
The value of the parameters
The pseudo-code for firefly algorithm is given below in Figure-2.
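As an illustration only, the standard firefly loop of [25] can be sketched in Python as below: every firefly moves towards each brighter firefly with attractiveness $\beta_0 e^{-\gamma r^2}$ plus a uniform random step scaled by $\alpha$. The parameter defaults, the search bounds, the sphere test function and the names `firefly_minimize` and `history` are assumptions made for this sketch, not values taken from the paper. Brightness is taken as the negative of the cost, so a lower cost means a brighter firefly.

```python
import math
import random


def firefly_minimize(objective, dim, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2,
                     lower=-5.0, upper=5.0, seed=0):
    """Minimise `objective` with the basic firefly update (illustrative defaults)."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lower, upper) for _ in range(dim)]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    history = [min(cost)]  # best cost before the first iteration
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # attraction towards j plus a random step, clipped to the bounds
                    swarm[i] = [
                        min(upper, max(lower,
                                       a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
        history.append(min(cost))
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best], cost[best], history


def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_cost, history = firefly_minimize(sphere, dim=2)
print(best_cost)
```

In this form the firefly with the lowest cost never moves, because no other firefly is brighter, so the best cost recorded in `history` cannot increase from one iteration to the next.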
The above algorithm is only for exploitation part (finding the local best solution). For exploration part (to find global solution), we make use of the Levy flight instead of the traditional method. To make the process of exploitation faster, the less bright fireflies are moved towards brightest firefly only instead of all the brighter fireflies [31, 32].
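The modified move can be sketched as follows. The paper does not state how the Levy steps are drawn, so this sketch assumes Mantegna's algorithm with exponent 1.5, which is one common choice; `levy_step`, `move_towards_brightest` and all parameter values are hypothetical.

```python
import math
import random


def levy_step(rng, lam=1.5):
    """Draw one heavy-tailed step with Mantegna's algorithm (assumed generator)."""
    sigma_u = (
        math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
        / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))
    ) ** (1 / lam)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid a division by zero in the rare degenerate draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / lam)


def move_towards_brightest(swarm, cost, beta0=1.0, gamma=1.0, alpha=0.01, rng=None):
    """Move every firefly towards the single brightest (lowest-cost) firefly."""
    rng = rng or random.Random(0)
    best = min(range(len(swarm)), key=cost.__getitem__)
    moved = []
    for i, position in enumerate(swarm):
        if i == best:
            moved.append(list(position))  # the brightest firefly stays in place here
            continue
        r2 = sum((a - b) ** 2 for a, b in zip(position, swarm[best]))
        beta = beta0 * math.exp(-gamma * r2)
        moved.append([
            a + beta * (b - a) + alpha * levy_step(rng)
            for a, b in zip(position, swarm[best])
        ])
    return moved


# Hypothetical swarm of three fireflies in two dimensions with their costs.
swarm = [[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]
cost = [0.0, 8.0, 16.0]
print(move_towards_brightest(swarm, cost, rng=random.Random(1)))
```

Each firefly now needs one distance computation per iteration instead of one per brighter firefly, which is where the speed-up comes from; the heavy-tailed steps occasionally produce long jumps that help the swarm leave local optima.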
3.3 Clustering Using Firefly Algorithm
In this section, the main aim is to calculate cluster heads by minimizing the sum of the distances of the patterns from their cluster heads [34, 35]. The function to be minimized during the clustering process is given in Eq-9:
$$J = \sum_{k=1}^{M} \sum_{x_i \in C_k} d(x_i, c_k) \tag{9}$$
where $M$ is the number of clusters, $c_k$ is the cluster head of the $k$th cluster $C_k$, $x_i$ is a data point belonging to that cluster, and $d(x_i, c_k)$ is the distance between them.
The cluster head of a cluster is the centroid of the cluster. The centroid of a cluster with $n_k$ points can be calculated by Eq-10:
$$c_k = \frac{1}{n_k} \sum_{x_i \in C_k} x_i \tag{10}$$
where $n_k$ is the number of points in the $k$th cluster.
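The clustering objective and the centroid computation can be illustrated with a short Python sketch. It assumes Euclidean distance as the distance measure and that each point belongs to the cluster of its nearest head; the function names and the four sample points are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between a data point and a cluster head."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign(points, heads):
    """Label every point with the index of its nearest cluster head."""
    return [min(range(len(heads)), key=lambda k: distance(p, heads[k]))
            for p in points]


def clustering_objective(points, heads):
    """Sum of the distances of all points to their own cluster head."""
    labels = assign(points, heads)
    return sum(distance(p, heads[k]) for p, k in zip(points, labels))


def centroids(points, labels, heads):
    """Recompute each head as the mean of its points (unchanged if the cluster is empty)."""
    new_heads = []
    for k, head in enumerate(heads):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_heads.append(list(head))
            continue
        dim = len(members[0])
        new_heads.append([sum(m[d] for m in members) / len(members)
                          for d in range(dim)])
    return new_heads


# Hypothetical data: two well-separated groups of two points each.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
heads = [[0, 1], [10, 1]]
labels = assign(points, heads)
print(labels, clustering_objective(points, heads), centroids(points, labels, heads))
```

Here every point lies at distance 1 from its head, so the objective is 4, and the recomputed centroids coincide with the heads.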
By performing clustering, we can divide a dataset into different groups based on some similarity measure. The most widely used similarity measures are based on the distance between the data points and the cluster heads [36].
The cluster heads are calculated by minimizing the Euclidean distance between each data instance Xi and the cluster center ck. The cost function for the pattern i is given by Eq-11:
where D is the count of data instances, and
4 Proposed Firefly Recommendation System (FRS)
The proposed Firefly Recommendation System (FRS) works in two phases: a training phase and a recommendation phase. Phase I is an offline process in which the rating matrix is produced from the collected data and clusters are obtained using the firefly clustering algorithm. Phase II is a real-time process in which the recommendations for the current user are generated.
In Phase II, the active user is assigned a recommendation cluster and recommendations are generated.
Phase I: Training Phase
The MovieLens dataset has 100,000 ratings by 943 users on 1682 movies. The movies are classified into 19 genres such as action, comedy and horror. The dataset is divided into two parts, 80% as training data and 20% as test data.
The data is converted into a 943 × 1682 matrix. The dataset need not be normalized as all ratings are on a scale of 1-5. However, the dataset is sparse (only 100,000 ratings out of a possible 1,586,126), so the missing values are replaced by 0.
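The construction of the rating matrix can be sketched as below, assuming the ratings arrive as (user id, item id, rating) triples with 1-based ids; the function names and the toy triples are hypothetical. For the full dataset, 100,000 of the 943 × 1682 = 1,586,126 cells are filled, i.e. a density of roughly 6.3%.

```python
def build_rating_matrix(triples, n_users, n_items):
    """Build a user x item matrix; cells without a rating stay 0."""
    matrix = [[0] * n_items for _ in range(n_users)]
    for user_id, item_id, rating in triples:
        matrix[user_id - 1][item_id - 1] = rating  # ids are assumed to be 1-based
    return matrix


def density(matrix):
    """Fraction of cells that hold an actual (non-zero) rating."""
    rated = sum(1 for row in matrix for value in row if value != 0)
    return rated / (len(matrix) * len(matrix[0]))


# Hypothetical toy data: 3 users, 3 items, 4 ratings.
triples = [(1, 1, 5), (1, 2, 3), (2, 1, 4), (3, 2, 5)]
matrix = build_rating_matrix(triples, n_users=3, n_items=3)
print(matrix)
print(density(matrix))
```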
The rating matrix is divided into K clusters using firefly clustering technique. N fireflies are initially generated, each having K cluster-heads.
Each cluster-head has 1682 dimensions having values in the range 1-5, which are generated at random.
For each firefly, K clusters are generated by assigning each point in the dataset to the nearest cluster-head in the firefly (similarity measure used is Euclidean distance), and the WCSS (within-cluster sum of squares) is calculated.
The firefly with the lowest WCSS is considered to be the brightest firefly, and the less bright fireflies are moved towards the brightest firefly.
The brightest firefly is also moved at random to a position which further increases its intensity.
This process is repeated for a fixed number of iterations, and the fittest firefly after all these iterations is taken as the final solution (clusters).
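The training steps above can be sketched in Python as follows. This is a simplified, single-machine illustration under several assumptions: a uniform random step is used in place of the Levy flight, a random move of the brightest firefly is kept only if it lowers the WCSS, cluster heads of different fireflies are matched by position, and all parameter values and the toy data are hypothetical.

```python
import math
import random


def wcss(points, heads):
    """Within-cluster sum of squares for one firefly (a list of cluster heads)."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, h)) for h in heads)
        for p in points
    )


def firefly_clustering(points, k, n_fireflies=20, n_iter=30, beta0=1.0,
                       gamma=0.01, alpha=0.1, low=1.0, high=5.0, seed=0):
    """Search for k cluster heads; returns (heads, wcss, best-cost history)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # each firefly holds k cluster heads with random values in [low, high]
    swarm = [[[rng.uniform(low, high) for _ in range(dim)] for _ in range(k)]
             for _ in range(n_fireflies)]
    cost = [wcss(points, firefly) for firefly in swarm]
    history = [min(cost)]
    for _ in range(n_iter):
        best = min(range(n_fireflies), key=cost.__getitem__)
        for i in range(n_fireflies):
            if i == best:
                continue
            r2 = sum((a - b) ** 2
                     for hi, hb in zip(swarm[i], swarm[best])
                     for a, b in zip(hi, hb))
            beta = beta0 * math.exp(-gamma * r2)
            # move the less bright firefly towards the brightest one
            swarm[i] = [[a + beta * (b - a) + alpha * (rng.random() - 0.5)
                         for a, b in zip(hi, hb)]
                        for hi, hb in zip(swarm[i], swarm[best])]
            cost[i] = wcss(points, swarm[i])
        # random move of the brightest firefly, kept only if it improves
        trial = [[a + alpha * (rng.random() - 0.5) for a in head]
                 for head in swarm[best]]
        trial_cost = wcss(points, trial)
        if trial_cost < cost[best]:
            swarm[best], cost[best] = trial, trial_cost
        history.append(min(cost))
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best], cost[best], history


# Hypothetical toy "rating" rows forming two groups.
points = [[1.0, 1.0], [1.2, 1.1], [1.1, 1.3], [4.8, 5.0], [5.0, 4.7], [4.9, 4.9]]
heads, best_wcss, history = firefly_clustering(points, k=2)
print(best_wcss)
```

In the actual system each cluster head would have 1682 dimensions and the rows of the rating matrix would take the place of `points`.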
Phase II: Process of recommendation for active users
To generate recommendations for an active user, a cluster (among the K clusters) has to be selected. A simple approach is to select the cluster whose centroid has the highest similarity with the active user, e.g. the centroid with the lowest Euclidean distance from the active user. If there is a large number of clusters, then multiple clusters can also be used for better results. In such a case, the probability that a cluster is chosen for generating recommendations is given by Eq-12:
where,
where
The recommendations are provided from the cluster with the highest probability or from multiple clusters which lie in a particular probability range. The latter approach may provide the active user with more varied recommendations and make him/her interested in trying something new.
After selecting the clusters for recommendation, the next step is to predict the ratings for the un-rated items of the active user and to recommend the items whose predicted rating is high. If there is only one chosen cluster, then the predicted rating of an unrated item is simply the average of the ratings given to that item by all the users in the cluster.
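The single-cluster case can be sketched as below. The sketch assumes that a rating of 0 marks an unrated item and that the average is taken only over the cluster members who actually rated the item, which is one possible reading of the averaging step; the function names and the sample data are hypothetical.

```python
import math


def nearest_cluster(user, cluster_centroids):
    """Index of the centroid with the lowest Euclidean distance to the user."""
    def dist(c):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(user, c)))
    return min(range(len(cluster_centroids)), key=lambda k: dist(cluster_centroids[k]))


def predict_unrated(user, members):
    """Predict each unrated item as the mean rating given by the cluster members."""
    predictions = {}
    for item, rating in enumerate(user):
        if rating != 0:
            continue  # already rated by the active user
        given = [m[item] for m in members if m[item] != 0]
        if given:
            predictions[item] = sum(given) / len(given)
    return predictions


# Hypothetical active user, centroids and members of the first cluster.
active_user = [5, 0, 4]
cluster_centroids = [[4.5, 4.0, 4.0], [1.0, 2.0, 1.5]]
cluster_members = [[5, 4, 4], [4, 0, 5], [5, 5, 3]]

chosen = nearest_cluster(active_user, cluster_centroids)
print(chosen, predict_unrated(active_user, cluster_members))
```

The items are then recommended in decreasing order of their predicted rating.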
But if multiple clusters have been selected, then we also consider the quality of ratings in each chosen cluster. A criterion for the rating quality of a cluster is the number of ratings available for each item in the cluster; the higher the density of ratings, the better the quality of the cluster:
where,
5 Experimental Results and Analysis
For performance analysis of our recommendation system framework we calculate various metrics like MAE, SD, RMSE and t-value. Various graphs and tables of the calculated results are shown for better understanding of the framework.
MAE: Mean Absolute Error
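We calculated the mean absolute error on the MovieLens dataset using Eq-15:
$$\mathrm{MAE} = \frac{1}{M} \sum_{j=1}^{M} \lvert p_{ij} - t_{ij} \rvert \tag{15}$$
where $M$ is the number of movies in the dataset, $p_{ij}$ is the predicted rating of user $i$ for item $j$, and $t_{ij}$ is the true rating.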
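A minimal Python sketch of the MAE computation is given below; the function name and the three sample ratings are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, actual)) / len(predicted)


# Hypothetical predictions and true ratings for three movies.
predicted = [4, 3, 5]
actual = [5, 3, 3]
print(mean_absolute_error(predicted, actual))  # (1 + 0 + 2) / 3 = 1.0
```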
The results are shown in Table 4 for the calculated MAE for different values of K. The outcome as observed from this table is that as we increase the number of clusters, MAE values gradually decrease.
| | Movie #1 | Movie #2 | … | Movie #1682 |
|---|---|---|---|---|
| User#1 | 5 | 3 | … | 0 |
| User#2 | 4 | 0 | … | 0 |
| . | . | . | … | . |
| User#943 | 0 | 5 | … | 0 |

| Firefly | Cluster head | Movie #1 | Movie #2 | … | Movie #1682 |
|---|---|---|---|---|---|
| Firefly#1 | Cluster-Head #1 | 2 | 3 | … | 3 |
| | Cluster-Head #2 | 3 | 1 | … | 5 |
| | Cluster-Head #3 | 4 | 1 | … | 2 |
| | … | … | … | … | … |
| . | . | . | . | . | . |
| Firefly#20 | Cluster-Head #1 | 5 | 1 | … | 1 |
| | Cluster-Head #2 | 1 | 2 | … | 2 |
| | Cluster-Head #3 | 2 | 4 | … | 4 |
SD: Standard Deviation
Using Eq-16, we calculate the SD on the MovieLens dataset:
The results of SD for different cluster counts are shown in Table-4. As the number of clusters increases, the SD value decreases.
RMSE: Root Mean Square Error
We calculated the RMSE on the MovieLens dataset using Eq-17:
$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{j=1}^{M} (p_{ij} - t_{ij})^2} \tag{17}$$
where $M$, $p_{ij}$ and $t_{ij}$ have the same meaning as in Eq-15.
The RMSE results for different cluster counts are given in Table-4. Like MAE and SD, the RMSE value gradually decreases as the number of clusters increases.
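A minimal Python sketch of the RMSE computation is given below; the function name and the sample ratings are hypothetical. Compared with MAE, squaring the differences penalizes large prediction errors more heavily.

```python
import math


def root_mean_square_error(predicted, actual):
    """Square root of the mean squared difference between predictions and truth."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical predictions and true ratings for three movies.
predicted = [4, 3, 5]
actual = [5, 3, 3]
print(root_mean_square_error(predicted, actual))  # sqrt((1 + 0 + 4) / 3)
```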
t-value
This t-value basically depends on the values of mean obtained for different clusters and their calculated SD values. We calculate t-value (for significance level of 5%) of the dataset by using Eq 18:
Similar to the other metrics, the t-value also decreases as the number of clusters increases. Results are shown in Table 4.
The performance of the proposed firefly-based recommendation system was also compared with other popular clustering-based techniques: k-means, PSO (Particle Swarm Optimization), ACO (Ant Colony Optimization), the Bat algorithm and Cuckoo search. All the algorithms were run for 100 iterations. The firefly-based recommendation system performed better than all the other techniques on every reported metric (for example, an MAE of 0.58 against 0.67 for the next best technique, the Bat algorithm), as can be seen in Table 5 and Figure-4.
| Metric | k-means | PSO | ACO | Bat | Cuckoo | Firefly |
|---|---|---|---|---|---|---|
| MAE | 0.69 | 0.7 | 0.7 | 0.67 | 0.71 | 0.58 |
| SD | 0.113 | 0.113 | 0.112 | 0.107 | 0.114 | 0.102 |
| RMSE | 1.23 | 1.23 | 1.22 | 1.19 | 1.23 | 1.15 |
| t-value | 2.81 | 2.81 | 2.81 | 2.76 | 2.81 | 2.75 |
| Precision | 0.53 | 0.52 | 0.51 | 0.54 | 0.52 | 0.58 |
| Recall | 0.43 | 0.41 | 0.41 | 0.44 | 0.41 | 0.47 |
6 Conclusion and Future Work
This paper proposed an improved firefly meta-heuristic based clustering approach for recommendation systems. A clustering-based recommender system should be able to generate optimal clusters, hence the firefly algorithm was utilized. The original firefly algorithm has been made faster by moving each less bright firefly towards only the brightest firefly instead of towards all the brighter fireflies.
For exploration, Levy flight has been used instead of a random function. For fast results, the algorithm is parallelized using map-reduce so that it can be executed in a scalable environment.
The performance of the proposed approach is evaluated using various metrics, and the results indicate that the approach generates highly relevant recommendations. In future work, other swarm optimization methods like whale optimization, shark smell optimization, etc. can be utilized.
Optimization methods other than swarm optimization, such as neural networks, can also be utilized. The various ways in which recommendations can be generated from the optimal clusters can also be explored.