1 Introduction
Service-Oriented Architecture (SOA) and its primary implementation technology, Web Services (WS), have transformed how software vendors design and develop corporate applications. Acting as fundamental building blocks that can transmit and transform data, services interconnect to form new composite, value-added services that can be consumed on demand.
The most crucial step of the Web services’ consumption cycle was, and remains, service discovery. This is due to their sheer number, which keeps growing rapidly, and to the fact that better services and mashups can be produced when discovery is done well. Research on this subject can be grouped into three (03) categories: syntactic, semantic, and social. The first covers syntactic techniques that measure the degree of similarity between services, mostly based on their WSDL descriptions, which are published in and processed word-for-word by a UDDI directory [21]. Semantic techniques fall within the second category. Several formalisms have been suggested in this context to incorporate a semantic dimension, starting with the straightforward annotation of WSDL descriptions (WSDL-S) [1], moving on to a high-level WS ontology (OWL-S) [14], and finally giving birth to a new conceptual model (WSMO) [18]. The third category includes methods that build a social network of web services to capture and exploit the history of their interactions in order to improve their discovery [13].
Several studies have addressed the complexity of this task, and research has shown that Web service classification/clustering is an effective approach not just for the discovery process but also for recommendation, selection, and composition [22, 4, 24, 20]. In this work, we propose an approach for the classification of web services that falls under the semantic category but does not resort to a domain ontology.
This choice is not specific to our work; it follows the remarkable success of Deep Learning methods, and more precisely of word embedding techniques such as Word2Vec [19], GloVe [17], and BERT [3], which capture semantics without relying on ontologies. Ontology-based proposals, by contrast, have slowed down considerably, given the extensive work involved in developing high-level ontologies, annotating Web services, and performing the related inference processes.
In our approach, we start by analyzing the textual descriptions of web services in order to extract, with an algorithm, what we call service actions. We then extend a selection of state-of-the-art text classifiers and accentuate their learning with these actions. Finally, we conducted extensive experiments on more than 8,400 real-world web services from ProgrammableWeb to evaluate the effectiveness of our proposal, showing that our technique achieves higher classification precision and outperforms the most advanced methods currently available. The remainder of this paper is structured as follows.
Section 2 reviews relevant research, with a specific emphasis on the application of Deep Learning methods to the classification of Web services. Section 3 presents our methodology. Section 4 examines and interprets the experimental findings. Section 5 concludes the paper and discusses potential future research.
2 Related Work
The emergence of web APIs has captured the attention of the academic community and generated substantial research activity. The main sources of data for Web service categorization using Deep Learning techniques are the services' description documents and data collected from their environment [26].
For instance, in [28] Cao et al. proposed a Wide & Bi-LSTM model: a Wide component combines the discrete features found in Web service description documents to perform a breadth prediction of the service category, while a Bi-LSTM architecture with a topical attention mechanism mines the word order and context of the words in the descriptions to perform the deep classification prediction. In [32], the authors propose DeepWSC, a heuristics-based framework for web service clustering.
Within this method, a signed graph convolutional network extracts service composability characteristics from service invocation associations, while an upgraded recurrent convolutional neural network (RCNN) [10] extracts deep semantic features from service descriptions; the two are combined to cluster web services effectively.
This work extends [31]; the key difference is that the new DeepWSC exploits the composability characteristics of services, feeding them into the deep neural network together with the deep semantic features of web services and combining them to generate integrated implicit representations of web services.
The paper [26] introduces ServeNet, a deep neural network that can automatically extract high-level features from service names and descriptions without the need for feature engineering or length restrictions. ServeNet is capable of predicting the classification of services into 50 different categories.
In their study, Kang et al. [9] utilize the attention mechanism to combine the local implicit state vector of a Bidirectional Long Short-Term Memory Network (BiLSTM) with the global Hierarchical Dirichlet Process (HDP) topic vector. They propose a topical attention-based BiLSTM technique for classifying Web services.
The BiLSTM model automatically learns the keyword feature representations of Web services, while the topic vectors of Web service documents are acquired using HDP during offline training. A topical attention mechanism then improves the feature representation by weighing the significance of the different keywords in Web service documents.
An attention-based BiLSTM model is also coupled with Information Gain (IG) theory to build a Web service classifier in [29]. The technique captures intricate properties of Web services, such as the significance of individual words and the sequential semantic connections between words, through the combination of IG theory and the attention-based BiLSTM model.
The authors of [23] developed a new deep neural network that combines a Graph Convolutional Network (GCN) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network to automatically extract features from function description documents by capturing and exploiting different relationships in graphs. The GCN extracts global spatial features, while the Bi-LSTM network learns sequential features.
Also, Peng et al. [16] present a Web services classification approach based on a graph attention network. It first capitalizes on description documents, Web service tags, and the call relationships between mashups and services to build a service relationship network based on Web service composition and shared annotations. A self-attention mechanism then determines the attention coefficient of each service node in the network, and the service nodes in neighboring areas are assigned different weights to classify Web services. The graph attention network thus combines a Web service's content characteristics with structural information.
Yong et al. introduce LDNM [25], a comprehensive framework for classifying web services that uses a deep fusion technique to combine structured and unstructured features. The first step transforms each service document into a feature vector using two different document representation techniques: topic distributions based on LDA (Latent Dirichlet Allocation) [2] and Doc2vec [11], a neural document embedding model.
Then, using Node2vec [6], structured representation vectors are extracted from the service invocation and tagging graphs. Finally, an MLP neural network fuses these features and trains a service classifier.
The methods we just discussed classify web services based on their textual descriptions, alone or combined with other features. On closer examination, we found that they treat these descriptions as bags of words rather than distinguishing the significant words and their positions within the descriptions.
Furthermore, some techniques simply use the stems of words, which can distort the meaning. In this light, we extract the actions of web services from their textual descriptions using an algorithm that we describe in more detail in the following section.
Then, a number of models from the text classification domain [8, 10, 3, 30, 12] are extended and adjusted to better fit the classification of these services, using these actions as additional features.
3 The Classification Process of the Proposed Approach
The process of our approach, which we named DeepLAB-WSC (Deep Learning Actions Based Web Service Classification), is illustrated in Figure 1. It consists of four (04) steps.
3.1 Step 1: Pre-processing of Web Services Descriptions
Web Services are characterized by their capacity to execute actions and perform tasks. Nevertheless, in the majority of situations, the only information available about them is a concise textual description.
To effectively identify them from this material, the key question to ask is: “Which specific words or phrases in these descriptions accurately convey the essence of a service?” Our initial assumption was that the actions described in a text are inherently carried by its verbs, so we began by extracting all possible verbs from the web service descriptions using pre-trained natural language models.
At this point, we encountered two predicaments. The first is that these models cannot precisely identify all verbs: in the statement “Service X searches for all worldwide airlines that operate in a given country”, the term ‘searches’ is tagged as a noun. The second predicament concerns what happens when each word is isolated and its context disregarded.
Many words lose their original tags throughout the text cleaning and pre-processing steps, especially after stemming; verbs are then often misidentified as nouns, adjectives, and so on. Consider the statement “Service X is used to validate monetary transactions.”: the words ‘Service’, ‘used’, and ‘validate’ are tagged as follows:

| Original Tags | Stem Tags |
|---|---|
| (‘Service’, ‘proper noun’) | (‘servic’, ‘adjective’) |
| (‘used’, ‘verb’) | (‘use’, ‘adjective’) |
| (‘validate’, ‘verb’) | (‘valid’, ‘adjective’) |
In most cases, this changes the semantic meaning of individual words and, as a result, the overall meaning of the description. For this reason, our approach addresses the problem from the very beginning: we lightly pre-process the service descriptions so as to retain as many words as possible with their original tags.
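To make this distortion concrete, the following minimal sketch (assuming the NLTK toolkit with its tokenizer and tagger resources; the exact tagger we used may differ) reproduces the effect on the example sentence:

```python
# Tag the example sentence before and after Porter stemming to show how
# stems lose their original part-of-speech tags.
import nltk
from nltk.stem import PorterStemmer

sentence = "Service X is used to validate monetary transactions."
tokens = nltk.word_tokenize(sentence)

stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

print(nltk.pos_tag(tokens))  # original tags, e.g. ('validate', 'VB')
print(nltk.pos_tag(stems))   # stems such as 'servic' and 'valid' typically
                             # lose their verb/proper-noun tags
```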
3.2 Step 2: Obtain Actions Via Web Service Descriptions
Another problem we faced was the scarcity of verbs in the majority of the brief descriptions, which makes it difficult to rely on these models alone. To tackle this, at this phase we employ our ‘Extract-Actions’ algorithm to extract actions as per our definition, where xcomp denotes an open clausal complement and dobj the direct object of a verb. An illustrative sketch of this kind of extraction is shown below.
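The sketch below approximates the extraction with spaCy's dependency parser; it is a simplified rendition of 'Extract-Actions', not the algorithm itself, and the model name and output format are assumptions:

```python
# Pair each verb with its direct object (dobj) and follow open clausal
# complements (xcomp) to reach the verb that carries the action.
# Assumes spaCy with the en_core_web_sm model installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_actions(description):
    actions = []
    for token in nlp(description):
        if token.pos_ != "VERB":
            continue
        for child in token.children:
            if child.dep_ == "dobj":      # e.g. "validate transactions"
                actions.append(f"{token.lemma_} {child.lemma_}")
            elif child.dep_ == "xcomp":   # e.g. "used to validate ..."
                actions.append(f"{token.lemma_} {child.lemma_}")
    return actions

print(extract_actions("Service X is used to validate monetary transactions."))
# Possible output (parser-dependent): ['use validate', 'validate transaction']
```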
3.3 Step 3: Actions and Descriptions Embedding
For the five (05) classification models we trained (see Section 3.4), we used two methods for the word embedding process: GloVe and BERT.
– GloVe (Global Vectors for word representation) [17] is an unsupervised learning method developed by Stanford University researchers to generate word embeddings by aggregating global word-word co-occurrence statistics from a corpus. We used the 100-dimensional variant trained on 6B tokens with a 400K-word vocabulary. This variant is used in the WSC2RCNN and TextING-based [30] classification models.
– BERT (Bidirectional Encoder Representations from Transformers) [3] is a state-of-the-art language representation model. It is a large, deep bidirectional Transformer encoder with more than 110 million parameters, pre-trained on massive text corpora. BERT is used as the word embedding model for our BERT and BERT-GCN based [12] classification models.
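To make the embedding step concrete, the following sketch (with assumed file names and the HuggingFace transformers library; not our exact pipeline) shows both routes: a static GloVe lookup and contextual BERT vectors:

```python
import numpy as np
import torch
from transformers import BertModel, BertTokenizer

# GloVe: load the 100-dimensional vectors into a dictionary.
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

# BERT: one contextual vector per token of a service description.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Service X validates monetary transactions.",
                   return_tensors="pt")
with torch.no_grad():
    token_vectors = model(**inputs).last_hidden_state  # (1, seq_len, 768)
```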
3.4 Step 4: Classification based on Services’ Actions and Descriptions Embedding
The learning process of our approach consists of training our classifiers on embedding representations of web service descriptions, then accentuating their learning with the actions extracted in Step 2 and embedded in Step 3. Connecting the work reviewed in Section 2 with the results obtained by [15] and [5], the tested models are the following.
3.4.1 WSC2RCNN
In our previous work [7], we built a Web Service Classifier with two (02) RCNN deep neural networks. We fed pre-trained GloVe representations of both the descriptions and the extracted actions into a pair of Recurrent Convolutional Neural Networks (RCNN) to be trained.
For both networks, we employed the original version of RCNN [10], which seeks to capture textual semantics by taking word order into account: a recurrent structure captures the contextual information of each token, and a convolutional max-pooling layer extracts the key latent features. For the first RCNN network, the left and right contexts of a word $w_i$, represented by $c_l(w_i)$ and $c_r(w_i)$ respectively, are computed with the following equations:

$$c_l(w_i) = f\left(W^{(l)} c_l(w_{i-1}) + W^{(sl)} e(w_{i-1})\right) \tag{1}$$

$$c_r(w_i) = f\left(W^{(r)} c_r(w_{i+1}) + W^{(sr)} e(w_{i+1})\right) \tag{2}$$

where the activation function $f$ is non-linear, $e(w)$ denotes the embedding of word $w$, and $W^{(l)}$, $W^{(r)}$, $W^{(sl)}$, $W^{(sr)}$ are learned weight matrices. Equation (3) is used to obtain the latent semantic representation $y_i$ of the word $w_i$:

$$y_i = \tanh\left(W^{(2)} x_i + b^{(2)}\right) \tag{3}$$

where a word $w_i$ is represented by the concatenation $x_i = [c_l(w_i); e(w_i); c_r(w_i)]$ of its left context, its embedding, and its right context. A similar calculation is made for the second RCNN network, which processes the extracted actions.
A softmax activation layer is used to normalize the final output $M_{output}$, since we have a multi-class problem. We present our model in Figure 2.
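A condensed PyTorch sketch of this two-branch architecture is given below; the dimension sizes, the LSTM-based context encoder, and the fusion layer are simplifying assumptions rather than our exact configuration:

```python
import torch
import torch.nn as nn

class RCNNBranch(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, ctx_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # initialized with GloVe
        # A bidirectional RNN provides the left/right contexts (Eqs. 1-2).
        self.rnn = nn.LSTM(emb_dim, ctx_dim, bidirectional=True, batch_first=True)
        # Eq. 3: latent representation y_i from [c_l; e(w_i); c_r].
        self.proj = nn.Linear(2 * ctx_dim + emb_dim, ctx_dim)

    def forward(self, token_ids):
        e = self.embed(token_ids)            # (B, T, emb_dim)
        ctx, _ = self.rnn(e)                 # (B, T, 2*ctx_dim)
        y = torch.tanh(self.proj(torch.cat([ctx, e], dim=-1)))
        return y.max(dim=1).values           # max-pooling over time

class WSC2RCNNSketch(nn.Module):
    def __init__(self, vocab_size, num_classes=20):
        super().__init__()
        self.desc_branch = RCNNBranch(vocab_size)
        self.act_branch = RCNNBranch(vocab_size)
        self.out = nn.Linear(2 * 100, num_classes)  # fuse the two branches

    def forward(self, desc_ids, action_ids):
        h = torch.cat([self.desc_branch(desc_ids),
                       self.act_branch(action_ids)], dim=-1)
        return torch.softmax(self.out(h), dim=-1)   # M_output
```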
3.4.2 TextING Classifier
We conducted experiments on WS descriptions using TextING, the graph neural network for text categorization proposed by the authors of [30]. In this network, each document is treated as a separate graph, allowing word interactions to be learned at the text level: a distinct graph is generated for each text, and Graph Neural Networks (GNN) then learn detailed word representations from their local structures. The GNN can also produce embeddings for words never seen before in a new document. Finally, the word nodes are aggregated into the document embedding.
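The per-document graph construction can be sketched as follows (our simplified reading of [30]; the window size and edge weighting are assumptions):

```python
# Build one graph per description: nodes are the distinct words, and an
# undirected edge links two words that co-occur within a sliding window.
from collections import defaultdict

def build_document_graph(tokens, window=3):
    nodes = sorted(set(tokens))
    index = {w: i for i, w in enumerate(nodes)}
    edges = defaultdict(int)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                a, b = index[w], index[tokens[j]]
                edges[(min(a, b), max(a, b))] += 1  # co-occurrence count
    return nodes, dict(edges)

nodes, edges = build_document_graph(
    "service x searches for all worldwide airlines".split())
```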
3.4.3 BERT Classifier
A fine-tuned BERT, with a number of added layers, forms our BERT Web Service classifier, shown in Figure 4. The model used is bert-base-uncased, which consists of 12 layers, 768 hidden units, 12 attention heads, and a total of 110 million parameters, topped with a softmax activation layer to normalize the final output.
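A minimal sketch of such a classifier, assuming the HuggingFace transformers library and a simple linear head (the exact added layers of Figure 4 may differ):

```python
import torch.nn as nn
from transformers import BertModel

class BertWSClassifier(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(768, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.pooler_output              # pooled [CLS] representation
        logits = self.classifier(self.dropout(cls))
        return logits                        # softmax applied at the output/loss
```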
3.4.4 BERT_With_Actions Classifier
We made multiple attempts to incorporate the extracted actions efficiently while trying to increase the performance of the “BERT Classifier” and validate our approach. The solution found consists of concatenating the sentences of the web service descriptions with the actions repeated “Rep” times, while not exceeding the BERT variant's maximum sequence length (see Figure 5), as sketched below:
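A sketch of this input construction (the helper name and the token-counting strategy are illustrative assumptions):

```python
# Append the extracted actions, repeated up to `rep` times, to the
# description without exceeding the model's maximum sequence length.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
MAX_LEN = tokenizer.model_max_length  # 512 for bert-base-uncased

def build_input(description, actions, rep=3):
    text = description
    suffix = " " + " ".join(actions)
    for _ in range(rep):
        candidate = text + suffix
        if len(tokenizer.tokenize(candidate)) > MAX_LEN - 2:  # [CLS]/[SEP]
            break
        text = candidate
    return text
```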
3.4.5 Classifiers based on BERT-GCN
We also put the proposition of Yuxiao Lin et al. (BERT-GCN) to the test. In [12], BERT, with its large-scale pretraining, and GCN [27], with its transductive learning, are trained jointly.
First, a graph with both word nodes and document nodes is built. The weight of the edge connecting two nodes $i$ and $j$ is defined as:

$$A_{ij} = \begin{cases} \mathrm{PPMI}(i, j), & i, j \text{ are words and } i \neq j \\ \mathrm{TF\text{-}IDF}(i, j), & i \text{ is a document}, j \text{ is a word} \\ 1, & i = j \\ 0, & \text{otherwise} \end{cases}$$

that is, the word-document edges and word-word edges are weighted using term frequency-inverse document frequency (TF-IDF) and positive point-wise mutual information (PPMI), respectively.
Then, the GCN layers iteratively update the document nodes after initializing them with BERT-style embeddings, and the main training objective is a linear interpolation between the BertGCN prediction and the BERT prediction, which may be expressed as:

$$Z = \lambda\, Z_{GCN} + (1 - \lambda)\, Z_{BERT}$$

where $Z_{GCN}$ and $Z_{BERT}$ are the softmax predictions of the GCN and BERT modules, respectively, and $\lambda \in [0, 1]$ controls the trade-off between the two objectives.
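In code, the interpolation itself is a one-liner; the sketch below assumes raw logits from each module and a default value of lambda chosen for illustration:

```python
import torch

def bertgcn_predict(logits_gcn, logits_bert, lam=0.7):
    # Linear interpolation of the two softmax predictions, following [12];
    # lam (lambda) is a hyper-parameter balancing the two modules.
    z_gcn = torch.softmax(logits_gcn, dim=-1)
    z_bert = torch.softmax(logits_bert, dim=-1)
    return lam * z_gcn + (1.0 - lam) * z_bert
```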
4 Experiments
4.1 Evaluation Metrics
The evaluation of the selected models was conducted using four widely used measures: Purity, NMI, Recall, and F1-measure. Purity is a metric used for external cluster validation. It is calculated with the following formula:

$$\mathrm{Purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} |\omega_k \cap c_j|$$

where $\Omega = \{\omega_1, \ldots, \omega_K\}$ is the set of predicted clusters, $C = \{c_1, \ldots, c_J\}$ the set of true classes, and $N$ the total number of services. NMI (Normalized Mutual Information) is defined as:

$$\mathrm{NMI}(\Omega, C) = \frac{I(\Omega; C)}{[H(\Omega) + H(C)]/2}$$

where we obtain the mutual information $I$ using:

$$I(\Omega; C) = \sum_{k} \sum_{j} P(\omega_k \cap c_j) \log \frac{P(\omega_k \cap c_j)}{P(\omega_k)\, P(c_j)}$$

and the entropy $H$ using:

$$H(\Omega) = -\sum_{k} P(\omega_k) \log P(\omega_k)$$

Recall measures the proportion of properly predicted instances of the real class. It is computed as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where TP represents the count of services properly allocated to their respective class, and FN the count of services whose positive class the model wrongly forecasts as negative. The last used measure is the F1-score, the harmonic mean of Precision and Recall, where FP represents the count of APIs whose negative classes the model wrongly forecasts as positive:

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}$$
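These four measures can be computed as in the sketch below (scikit-learn based; the 'weighted' averaging for multi-class Recall/F1 is an assumption):

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score,
                             normalized_mutual_info_score, recall_score)

def purity(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    # For each predicted cluster (column), count its majority true class.
    return cm.max(axis=0).sum() / cm.sum()

y_true = np.array([0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 2, 2])
print(purity(y_true, y_pred))
print(normalized_mutual_info_score(y_true, y_pred))
print(recall_score(y_true, y_pred, average="weighted"))
print(f1_score(y_true, y_pred, average="weighted"))
```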
4.2 Experimental Environment
We performed tests to assess and showcase the efficiency of our methodology. All experiments were executed on a platform equipped with an Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz (32 cores) and 256GB of RAM.
Using the dataset of [31, 32], crawled from ProgrammableWeb, the leading online registry of Web services with a vast collection of over 23,000 APIs, we obtained a compilation of 17,923 authentic web services. The experimental data comprises the 8,459 high-quality WS of the top 20 classes (see the table below).
4.3 Experimental Results and Discussions
To interpret our results properly, we compared them to those of [31, 32]. Table 2 summarizes all of the results obtained by the various studied methods. We first ran tests with the RCNN, TextING (Section 3.4.2), BERT (Section 3.4.3), and BERT-GCN (Section 3.4.5) classifiers. The results were satisfactory, starting with the RCNN classifier, which alone exceeded those of [32] on all metrics.
| ID | Primary Cat | # of services | ID | Primary Cat | # of services |
|---|---|---|---|---|---|
| 0 | Tools | 887 | 10 | Telephony | 342 |
| 1 | Financial | 757 | 11 | Security | 312 |
| 2 | Messaging | 591 | 12 | Reference | 304 |
| 3 | eCommerce | 553 | 13 | Email | 299 |
| 4 | Payments | 553 | 14 | Search | 290 |
| 5 | Social | 510 | 15 | Travel | 294 |
| 6 | Enterprise | 509 | 16 | Video | 281 |
| 7 | Mapping | 429 | 17 | Education | 277 |
| 8 | Government | 371 | 18 | Advertising | 274 |
| 9 | Science | 357 | 19 | Transportation | 269 |
Table 2: Results obtained by the studied methods.

| Models | Purity | NMI | Recall | F1-Score |
|---|---|---|---|---|
| DeepWSC | 0.5708 | 0.4856 | 0.3821 | 0.3969 |
| DeepWSC + Heuristics | 0.6379 | 0.5273 | 0.4186 | 0.4356 |
| RCNN* | 0.6438 | 0.5704 | 0.6438 | 0.6247 |
| WSC2RCNN | 0.6595 | 0.5795 | 0.6588 | 0.6512 |
| TextING | 0.6887 | 0.5986 | 0.6864 | 0.6714 |
| BERT | 0.7746 | 0.6924 | 0.7746 | 0.7712 |
| BERT + Actions | 0.7785 | 0.6931 | 0.7786 | 0.7759 |
| BERT-GCN | 0.7788 | 0.6935 | 0.7788 | 0.7717 |
| BERT-GCN + Actions | 0.7810 | 0.6972 | 0.7811 | 0.7745 |
This is because the authors of [31] tried to get closer to the semantics of WS descriptions by capturing the context of each word with an RCNN deep neural network; however, by applying strict stemming, the meaning of many words is lost (see Section 3.1), as we could observe in their pre-processed dataset. Having fixed this issue, the results of our RCNN classifier exceeded the best of [32] by an average of 26.57% across all the evaluation metrics.
The first tested method of the GNN class (TextING) achieves an average improvement of 34.89%, while BERT and BERT-GCN show average advantages of 53.7% and 54.2% over [32], respectively. To complete the evaluation of our approach's effectiveness, each of the proposed classifiers went through an extra training process in which we included the actions retrieved by our “Extract-Actions” algorithm (see Section 3.2).
The results demonstrate that the classifiers' performance improves each time these actions are added. Our WSC2RCNN classifier outperforms our RCNN with an average advantage of 5.51% across all the evaluation metrics. Furthermore, our BERT+Actions classifier achieves an average advantage of 0.43% over the BERT classifier. Finally, BERT-GCN+Actions outperforms all the classifiers seen in this work, with a 0.37% improvement over BERT-GCN.
5 Conclusion and Future Research
This work introduced DeepLAB-WSC, a novel method for classifying web services that relies on their concise textual descriptions. Our methodology stands out by prioritizing the actions executed by web services, which are derived from their descriptions, and by using deep learning text classification techniques to categorize the services.
Our comparative experiments have demonstrated that DeepLAB-WSC surpasses current state-of-the-art web service classification techniques on all performance metrics. A key benefit of our technique is that it specifically targets the most crucial aspect of the description, namely the actions, to enhance the accuracy of categorization.
Looking ahead, we plan to extend this work by leveraging the power of BERT and GNN to build social networks of web services based on the similarities (BERT-based classifiers) and complementarities (next sentence/document prediction) of their descriptions. Exploiting this social dimension will allow us to enhance both the discovery and the composition of web services.