Domain-Independent Intent Extraction from Online Texts

Luong, Thai-Le; Tran, Nhu-Thuat; Dang, Tien-Son; Tran, Quoc-Long; Phan, Xuan-Hieu

doi:10.13053/cys-24-1-3158

Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Comp. y Sist. vol.24 n.1 Ciudad de México Jan./Mar. 2020 Epub Sep 27, 2021

https://doi.org/10.13053/cys-24-1-3158

Articles

Domain-Independent Intent Extraction from Online Texts

Thai-Le Luong¹,²,*

Nhu-Thuat Tran¹

Tien-Son Dang¹

Quoc-Long Tran¹

Xuan-Hieu Phan¹

¹ VNU University of Engineering and Technology, Faculty of Information Technology, Vietnam. luongthaile80@utc.edu.vn, hieupx@vnu.edu.vn

² University of Transport and Communications, Faculty of Information Technology, Vietnam

Abstract

Identifying users' intents from texts on online channels has a wide range of applications, from entrepreneurship and banking to e-commerce. However, intent identification is not a simple task because intents and their attributes vary widely and depend strongly on the domain of the data. As the number of intent domains increases, the number of intent attributes grows, and the complexity of the intent extraction task grows significantly with it. Additionally, when a new domain arrives, considerable manual effort is required to define specific labels for the intents and attributes of that domain. Hence, it is desirable to have a method for extracting users' intents that does not depend on a specific domain. In this research, we study the problem of domain-independent intent identification from posts and comments crawled from social networks and discussion forums. We present ten general labels, i.e. labels that do not depend on a specific domain, and use them to extract intents and their related information. We also propose a mapping between general labels and domain-specific labels. We conduct extensive experiments to compare the effectiveness of general labels and specific labels in extracting users' intents as the number of intent domains increases. Our study is conducted on a medium-sized dataset from three selected domains: Tourism, Real Estate and Transportation. In terms of accuracy, when the number of domains grows, our proposal achieves significantly better results than the domain-specific method in identifying users' intents.

Keywords: Information extraction; intent identification; intent mining; domain-independent

1 Introduction

With the explosion of the Internet, more and more people have accounts on social networks and discussion forums. These accounts generate a huge amount of valuable data every day. For example, a user posted on webtretho.com: "Our family is going to Da Nang from 14/6 to 18/6, we have 5 adults and 1 child (1-year-old), could you recommend us the hotel, the best places to visit there and the total cost is about 20 million dong. Tks. Phone number: 0913 456 233". If a travel agent could analyze this post and extract the user's intent, going, together with its related information, such as destination (Da Nang), agenda (from 14/6 to 18/6) and number of people (5 adults and a 1-year-old child), the agent could target this user with tailored advertising. Such advertising is likely to be effective because it addresses the specific needs of the user.

According to Luong et al. (2016) ^[¹⁹^], the process of fully understanding a user's intent includes three major stages: user filtering, intent domain identification, and intent parsing and extraction. However, this approach faces some challenges. Firstly, a user's intent and its attributes depend greatly on the domain they come from. For instance, in the Transportation domain, users who are going to buy a car are likely to share a post mentioning brand, price, color, model, etc. In the Real Estate domain, number of floors, number of bedrooms, direction of façade and location are the most frequently mentioned attributes when people want to buy a house. Clearly, the number of intent attributes increases dramatically as the number of intent domains increases, and the complexity of intent identification grows sharply with it. Additionally, the intent parsing and extraction stage is hard to scale to other domains because it is domain-specific. When a new domain arrives, exhaustive work is needed to define a new set of labels for the intents and attributes of that domain.

To the best of our knowledge, no prior study has attempted to overcome the variety of intent structures in order to achieve better results in extracting users' intents. "Intent extraction from social media texts using sequential segmentation and deep learning models" by Luong et al. (2017) ^[²¹^] is probably the study most closely related to our work. In that study, the authors extracted users' intents and related information from texts generated on social networks, but their method is domain-specific and does not generalize to other domains.

In this paper, we aim to build an intent extraction method that can deal with the variety of intent attributes and also scale to data coming from various domains. The underlying idea of this work is to define, for the intent extraction task, a set of ten general labels, i.e. labels that do not depend on a specific domain. This idea comes from the fact that some attributes (also called labels), such as intent, brand, contact and price, appear frequently in posts from almost all domains, and we therefore take them as general labels.

Other attributes are mentioned only in posts from a specific domain, for example, color in the Transportation domain, number of floors in the Real Estate domain, and time period in the Tourism domain. We group these under the description label.

In our work, we conduct extensive experiments to compare the effectiveness of the proposed general labels with that of specific labels in the intent extraction task as the number of intent domains increases. We extract the labels using various machine learning methods, notably Conditional Random Fields (2003) ^[¹⁶^], Bidirectional Long Short-Term Memory, and Bidirectional Long Short-Term Memory - Conditional Random Fields (2016) ^[¹⁷^].

Although we attempted to make our models flexible, we still face some challenges. One of the greatest difficulties is the ambiguity of natural language. Such a challenge is illustrated by the post: "If anyone want to liquidate your own LX motorbike, please call me. My phone number is 0983256999".

The intent keyword of this post is hidden and takes some effort to deduce. While the user wants to buy a used LX motorbike, the prediction model may extract liquidate as the intent keyword. Therefore, in the scope of this work, we focus only on extracting users' intents from posts containing explicit intents as described in ^[¹⁹^]. Overall, the main contributions of our study are:

— We proposed a domain-independent method for the intent extraction task based on a set of general labels. We also proposed a mapping between general labels and specific labels for data from the Tourism, Transportation and Real Estate domains;
— We built an annotated medium-sized dataset containing more than 8500 Vietnamese posts from social networks and discussion forums. These data can be used for later research on Vietnamese intent identification;
— We conducted comparative experiments with multiple powerful machine learning models to verify the effectiveness of the proposed general labels.

The remainder of our paper is organized as follows. Section 2 reviews the studies most closely related to our work. In Section 3, we present our domain-independent approach to the intent extraction task. Section 4 shows the experimental results and some discussion. Finally, Section 5 summarizes the main points of our work.

2 Related Work

To the best of our knowledge, no domain-independent approach has been proposed for the problem of intent extraction. The study most closely related to ours is Luong et al. (2017) ^[²¹^]. In that research, the authors approached the intent extraction task in a domain-specific way, i.e. they built sets of specific labels for two domains, Real Estate and Cosmetics & Beauty, and used these specific labels to extract users' intents. Another related study is the domain-adaptation approach of Xiao Ding (2015) ^[⁴^].

The authors used datasets from specific domains to identify consumption intention. They then attempted to transfer the CNN mid-level sentence representation learned from one domain to another domain by adding an adaptation layer. They also proposed a method to extract intention words from sentences with consumption intent. An intention word is the word that best indicates the user's needs. In our work, in addition to intent keyword extraction, we also extract the necessary information related to the intent.

Prior to recent research on intent extraction, most studies on detecting users' intentions were based on classification. In the early 2000s, researchers tended to classify a user's intention into three pre-defined classes: navigation, information and transaction ^[²⁶^,¹³^,¹⁴^]. Those classes do not seem clear enough to reveal someone's intent. Besides, the authors focused only on analyzing search engine queries to understand users' intentions. Zhiyuan Chen et al. (2013) ^[³^] claimed that their solution was the first to identify user intents in posts, i.e. in the context of text documents from discussion forums.

Since then, researchers have turned their attention to online texts and have also tried to define intent identification more precisely. V. Gupta et al. (2014) ^[⁶^] attempted to identify only purchase intent from social posts by categorizing the posts into two classes, namely PI (purchase intent) and non-PI. This was done by extracting features at two different levels of text granularity: word and phrase based features, and grammatical dependency based features.

Purohit et al. (2015) ^[²⁴^] attributed intentions to everyday behaviors, from a user query issued to a search engine, to buying a laptop, to a user participating in a conversation. The authors define the problem of mining "relevant social intent from an ambiguous, unconstrained natural language short-text document" as a classification task with 3 classes: seeking, offering, and none. Luong et al. (2016) ^[²⁰^] followed this approach to identify the intent domain.

The authors utilized two classification models, Maximum Entropy Classifier (MEC) and Support Vector Machine (SVM), to classify the intent posts into 12 major domains such as electronic device, fashion & accessory, finance, food service, furnishing & grocery, travel & hotel, property, job & education, transportation, health & beauty, sport & entertainment, and pet & tree.

In recent years, supervised learning has shown a critical drawback: it requires a vast amount of manually annotated training data. To overcome this drawback, recent studies focus on domain-adaptation, transfer learning and semi-supervised learning approaches. Such approaches have been successfully applied to user intent detection. Z. Chen et al. (2013) ^[³^] leveraged labeled data from other domains to train a classifier for the target domain. They proposed a new transfer learning method to classify posts into two classes: intent posts (positive class) and non-intent posts (negative class).

J. Wang et al. (2015) ^[²⁷^] proposed a graph-based semi-supervised approach to sort tweets into six intent categories, namely food & drink, travel, career & education, goods & services, event & activities and trifle. With effective information propagation via graph regularization, only a small set of tweets with category labels is needed as the supervised information. Ngo et al. (2017) ^[²²^] proposed a new method for intention detection, which leveraged labeled data in multi-source domains to improve performance of classification in the target domain. Specifically, they used stochastic gradient descent (SGD) to optimize the aggregation process of source and target data in a Naive Bayesian framework.

The method showed an improvement in the intention detection task on the same benchmark dataset that was used in ^[³^].

3 Domain-Independent Proposal for Intent Identification

3.1 Domain-Independent Intent Extraction

In the research presented by Luong et al. ^[¹⁹^], the authors defined a user's explicit intent as a 5-tuple:

$$I_{ue} = \langle u, c, d, w, p \rangle, \qquad (1)$$

in which:

— u is the user identifier on social media services;
— c is the current context or condition around this intent. For example, a user may currently be pregnant, sick, or have a baby. Context c also includes the time at which the intent was expressed or posted online;
— d is the intent domain such as Real-Estate, Finance, Tourism, etc;
— w is the intent name, i.e., a keyword or phrase representing the intent. It may be the name of a thing or an action of interest. For example, w can be rent (house), borrow (loan), or book (tour), etc;
— p is a list of properties or constraints associated with an intent. It is a list of property-value pairs related to the intent. For example, p can be {location="near Yen Pho industrial zone", acreage ="90-120m2", ...}.
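
As an illustration of the 5-tuple in Equation (1), the following minimal Python sketch stores one explicit intent as a record. The class name `ExplicitIntent`, its field names and the sample user identifier are hypothetical choices made for this example; only the five components u, c, d, w, p and the sample property values come from the definition above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExplicitIntent:
    """Record mirroring the 5-tuple <u, c, d, w, p> of Equation (1)."""
    user: str      # u: user identifier on the social media service
    context: str   # c: context or condition around the intent
    domain: str    # d: intent domain
    name: str      # w: intent name (keyword or phrase)
    properties: Dict[str, str] = field(default_factory=dict)  # p: property-value pairs


# Hypothetical instance built from the examples given in the text.
example = ExplicitIntent(
    user="user_123",  # made-up identifier, for illustration only
    context="posted on a discussion forum",
    domain="Real-Estate",
    name="rent (house)",
    properties={
        "location": "near Yen Pho industrial zone",
        "acreage": "90-120m2",
    },
)
```

In this representation, the domain-specific setting described next extracts `name` and `properties` only after `domain` is known.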

Luong et al. (2017) ^[²¹^] proposed a domain-specific intent extraction model, in which they tried to extract w and p under the assumption that d had already been identified. This means that the intent information can be extracted only if the domain of the intent is known. However, in real applications, users' intents are very diverse: they may "want to buy a house", "be going to travel", or even "need to borrow some money".

This leads to a large number of intent domains, such as Tourism, Transportation, Finance, Real Estate and Education. As mentioned above, the more intent domains there are, the more complicated it is to extract the intent information. Firstly, one needs to identify the intent domain d. Secondly, for each domain d, one has to build a set of specific labels to identify the necessary attributes. For example, one could arrive at 15 specific labels for Tourism, 18 specific labels for Real Estate and 17 specific labels for Transportation (see tables 1, 2 and 3, respectively). Finally, after combining these three sets of specific labels, one obtains a set of 33 specific labels (see table 9).

Table 1 The 15 specific labels of Tourism domain

Tourism Label	Abbreviation	Description
Intent	int	User intent (travel, look for (hotel), book...)
Brand	brd	The object’s brand (Vietnam Airlines, VietTran...)
Contact	ctt	User’s email or phone number
Context	ctx	User’s condition that affects his/her intent (pregnant, with baby along...)
Description of Object	obj-des	More about the object’s characteristics (sea view...)
Destination	dest	The place where user is going to
Name of Accommodation	accom-name	Name of hotel, resort (Sealink, Sunwah, Ana Mandara)
Number of Objects	obj-num	The quantity of mentioned object
Number of People	ppl-num	The number of people in the journey
Object	obj	The object which user mentions
Point of Departure	dpt	The place where user’s journey starts
Point of Time	time-pnt	When user’s journey starts or finishes
Price	prc	The price of mentioned object
Time Period	time-prd	How long user’s journey takes
Transport	trp	Means of transportation

Table 2 The 18 specific labels of Real Estate domain

Real Estate Label	Abbreviation	Description
Intent	int	User intent (sell, buy, for rent...)
Acreage	acr	Object’s acreage
Brand	brd	The object’s brand
Contact	ctt	User’s email or phone number
Context	ctx	User’s condition that affects his/her intent
Description of Object	obj-des	More about the object’s characteristics (residential land, agricultural land)
Equipment	eqm	The equipment in the house or flat
Facade Direction	face-dir	The direction of facade
Facade Size	face-size	Facade’s width
Location	loc	Object’s location or user’s location
Number of Bedrooms	bed-num	The number of bedrooms
Number of Bathrooms	bath-num	The number of bathrooms
Number of Facades	face-num	The number of object’s facades
Number of Floors	fnum	The number of floors
Number of Objects	obj-num	The quantity of mentioned object
Object	obj	The object which user mentions
Owner	own	The seller is the head-owner of object or not
Price	prc	The price of mentioned object

Table 3 The 17 specific labels of Transportation domain

Transportation Label	Abbreviation	Description
Intent	int	User intent (sell, buy, hire...)
Brand	brd	The object’s brand (Honda, Yamaha, Toyota...)
Color	clr	Object’s color
Contact	ctt	User’s email or phone number
Context	ctx	User’s condition that affects his/her intent
Description of Object	obj-des	More about the object’s characteristics
Location	loc	Object’s location or user’s location
License Plate	lpe	The license plate of object
Model	mdl	The model of the object (corola 1.6, wave rsx)
Number of Objects	obj-num	The quantity of mentioned object
Object	obj	The object which user mentions
Origin	orig	The place where object is manufactured
Owner	own	The seller is the head-owner of object or not
Price	prc	The price of mentioned object
Registration	reg	The object has legal documents or not
Registration Year	reg-year	When object is registered
State	stt	Object is old or new

Therefore, as the number of intent domains gets bigger, the number of labels grows sharply, and the more labels there are, the harder the prediction task becomes for the model. In this paper, we propose a new method for identifying user intention that does not depend on the domain d. We still formulate our work as a sequence labeling problem, but the main improvement is the idea of generalizing a new set of domain-independent labels, which we describe in detail in the next subsection. Instead of building a set of specific labels for each intent domain, we try to build the most suitable set of labels for all intent domains.

We then designed a careful experimental strategy to verify our assumption that the set of general labels is more effective than the set of specific labels for the intent extraction problem when the number of intent domains is scaled up. This assumption leads us to a novel approach to identifying users' intents, which we call the domain-independent method. This approach allows us to extract w and p without having to identify d.

This removes one difficulty. Moreover, we no longer need to worry about the number of labels increasing when a new domain arrives, because we need only a few general labels for every domain. In the next subsection, we present the three sets of specific labels that we proposed for the Tourism, Transportation and Real Estate domains. In particular, we explain how we map the set of general labels to these three sets of specific labels.

3.2 Domain-Independent Labels Versus Domain-Specific Labels

For the three domains from which we crawled data for model training (Tourism, Transportation, Real Estate), we built three specific sets of labels. We have 15 labels for Tourism, 18 labels for Real Estate, and 17 labels for Transportation, described in detail in table 1, table 2 and table 3 respectively.

After carefully surveying all the crawled data from the three domains (Tourism, Transportation and Real Estate), as well as some online data from other domains, we decided to propose a set of 10 general labels. Table 4 presents these ten general labels and shows the mapping from the set of general labels to the three sets of domain-specific labels. Some properties, such as intent, object and price, exist in almost every intent domain, so they are treated as general labels. Other properties are specific to one intent domain, for example time period in the Tourism domain, acreage in the Real Estate domain or color in the Transportation domain. These are aggregated under the label description in the set of general labels.

Table 4 The domain-independent labels

Domain-Independent Label	Abbreviation	Tourism Specific Label	Real Estate Specific Label	Transportation Specific Label
Intent	int	Intent	Intent	Intent
Brand	brd	Brand	Brand	Brand
Contact	ctt	Contact	Contact	Contact
Context	ctx	Context	Context	Context
Description	des	Description of Object; Point of Time; Time Period	Acreage; Description of Object; Equipment; Facade Direction; Facade Size; Number of Bathrooms; Number of Bedrooms; Number of Facades; Number of Floors	Color; Description of Object; License Plate; Model; Origin; Registration Year; State
Location	loc	Destination; Point of Departure	Location	Location
Number of Objects	obj-num	Number of Objects	Number of Objects	Number of Objects
Object	obj	Object	Object	Object
Other	oth	Name of Accommodation; Number of People; Transport	Owner	Owner; Registration
Price	prc	Price	Price	Price
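
The mapping in Table 4 can be written down as a small lookup structure. The sketch below is one possible encoding, not the authors' implementation: the names `SHARED`, `GROUPED`, `build_specific_to_general` and `to_general` are introduced here for illustration, while the label names themselves are taken from Tables 1 to 4.

```python
# Labels that keep the same name in the general and the specific label sets.
SHARED = ["Intent", "Brand", "Contact", "Context",
          "Number of Objects", "Object", "Price"]

# Specific labels that Table 4 merges into Description, Location or Other.
GROUPED = {
    "Tourism": {
        "Description": ["Description of Object", "Point of Time", "Time Period"],
        "Location": ["Destination", "Point of Departure"],
        "Other": ["Name of Accommodation", "Number of People", "Transport"],
    },
    "Real Estate": {
        "Description": ["Acreage", "Description of Object", "Equipment",
                        "Facade Direction", "Facade Size", "Number of Bathrooms",
                        "Number of Bedrooms", "Number of Facades", "Number of Floors"],
        "Location": ["Location"],
        "Other": ["Owner"],
    },
    "Transportation": {
        "Description": ["Color", "Description of Object", "License Plate",
                        "Model", "Origin", "Registration Year", "State"],
        "Location": ["Location"],
        "Other": ["Owner", "Registration"],
    },
}


def build_specific_to_general(domain):
    """Return a dict mapping each specific label of `domain` to its general label."""
    mapping = {label: label for label in SHARED}
    for general, specifics in GROUPED[domain].items():
        for specific in specifics:
            mapping[specific] = general
    return mapping


def to_general(domain, specific_label):
    """Map one domain-specific label to its domain-independent label."""
    return build_specific_to_general(domain)[specific_label]
```

For example, `to_general("Tourism", "Point of Time")` returns `"Description"`. Converting an annotation from specific to general labels is a pure relabeling, so a corpus tagged with specific labels can be mapped to general labels without re-annotation under this encoding.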

3.3 Intent Extraction Models

Given a post that contains a user's intent from any intent domain, our model aims to extract the intent keyword (such as buy, sell, hire...) and all the necessary information related to the user's intention. To this end, we built our models on two advanced machine learning methods, namely CRFs and LSTM.

3.3.1 Conditional Random Fields

Conditional random fields ^[¹⁶^] are probabilistic models that have shown great success in segmenting and labeling sequence data.

Given $o = (o_1, o_2, \ldots, o_T)$ as the input observation sequence, a CRF defines the probability of a state sequence $s = (s_1, s_2, \ldots, s_T)$, in which each state $s_t$ is associated with a label from the finite label set $L = \{y_1, y_2, \ldots, y_M\}$, as:

$$p_\theta(s \mid o) = \frac{1}{Z_\theta(o)} \exp\left( \sum_{t=1}^{T} F(s, o, t) \right), \qquad (2)$$

where $Z_\theta(o) = \sum_{s'} \exp\left( \sum_{t=1}^{T} F(s', o, t) \right)$ is the normalizing factor that ensures that $p_\theta(s \mid o)$ is a probability distribution, and $F(s, o, t) = \sum_i \lambda_i f_i(s, o, t)$ is the sum of the CRF features $f_i$ weighted by their corresponding feature weights $\lambda_i$. A CRF is trained by searching for the set of weights $\theta^* = \{\lambda_1^*, \lambda_2^*, \ldots, \lambda_n^*\}$ that maximizes the log-likelihood function. When the labels make the state sequence unambiguous, the negative log-likelihood of exponential models such as CRFs is convex, so the search is guaranteed to find the global optimum.
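
To make Equation (2) concrete, the following sketch computes $p_\theta(s \mid o)$ for a toy linear-chain model by enumerating every state sequence. It is only an illustration under stated assumptions: the label set, the scores in `EMISSION` and `TRANSITION`, and the function names are invented here, and the per-position score stands in for the weighted feature sum $F(s, o, t)$. Real CRF toolkits compute $Z_\theta(o)$ with dynamic programming rather than enumeration.

```python
import itertools
import math

LABELS = ["int", "obj", "oth"]  # toy label set, for illustration only

# Toy scores standing in for the weighted feature sums F(s, o, t).
# EMISSION[t][label] scores a label at position t; TRANSITION scores label pairs.
EMISSION = [
    {"int": 2.0, "obj": 0.0, "oth": 0.0},
    {"int": 0.0, "obj": 1.5, "oth": 0.5},
]
TRANSITION = {("int", "obj"): 1.0}  # unlisted pairs score 0.0


def sequence_score(states, emission, transition):
    """Return the sum over t of F(s, o, t) for one state sequence."""
    total = 0.0
    for t, state in enumerate(states):
        total += emission[t][state]
        if t > 0:
            total += transition.get((states[t - 1], state), 0.0)
    return total


def crf_probability(states, emission, transition):
    """Return p(s | o) as in Equation (2), with Z computed by brute force."""
    length = len(emission)
    z = sum(
        math.exp(sequence_score(candidate, emission, transition))
        for candidate in itertools.product(LABELS, repeat=length)
    )
    return math.exp(sequence_score(states, emission, transition)) / z


p_best = crf_probability(("int", "obj"), EMISSION, TRANSITION)
```

Because $Z_\theta(o)$ sums over all candidate sequences, the probabilities of all $3^2 = 9$ sequences in this toy case add up to one.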

It has been shown that quasi-Newton methods, such as L-BFGS ^[¹⁸^], are the most efficient for this optimization. In our work, we used pycrfsuite (https://python-crfsuite.readthedocs.io/en/), a fast Python implementation of Conditional Random Fields. We chose a linear-chain CRF architecture because of its faster training time. The state features used in our model were as follows:

— N-gram features: we used unigrams, bigrams and trigrams to capture the context of each word in the posts;
— Part-of-speech (POS) tags were used to enrich the linguistic features of each word, e.g. a user's intent is typically a verb and a location is typically a noun. We used each single word (separated by spaces) as the word segmentation unit. Then, we used pyvi, a Python-based implementation of Vietnamese POS tagging (https://pypi.org/project/pyvi/). After manually inspecting the output of this POS tagger on social texts, we found it appropriate for our data;
— Word format features: some entities in our data have special forms, so we used word format features to improve the accuracy of recognizing them. For example, a word containing digits tends to be a point of time or a price, and a word starting with a capital letter tends to be a location;
— Dictionary look-up features: we built a dictionary to support the learning task and used dictionary look-up features for unigrams, bigrams and trigrams. In this dictionary, we built lists of unigrams, bigrams and trigrams that belong to certain labels. For the label Brand, for example, we created a list containing words or phrases such as Hon da, Vietnam Airline, VinGroup... If a unigram, bigram or trigram appears in one of those lists, the corresponding feature of the current word is set. For example, if w₀w₁ is in list_brand, the feature function returns the predicate w₀:w₁:in_dictionary=brand.
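
The four feature groups above can be sketched as a single function that builds a feature dictionary per token. This is a simplified illustration, not the feature template used in the experiments: the function name `token_features`, the feature keys other than `w0:w1:in_dictionary`, the contents of `BRAND_LIST` beyond the brands named in the text, and the placeholder POS tags are all assumptions. Trigram dictionary look-up is omitted for brevity, and POS tags are passed in rather than produced by a tagger.

```python
# Hypothetical brand dictionary: lowercased n-grams stored as tuples.
BRAND_LIST = {("hon", "da"), ("vietnam", "airline"), ("vingroup",)}


def token_features(tokens, pos_tags, i):
    """Build a feature dict for tokens[i] from n-gram, POS, format and dictionary cues."""
    word = tokens[i]
    lower = [t.lower() for t in tokens]
    feats = {
        "w0": lower[i],                                   # unigram
        "pos0": pos_tags[i],                              # POS tag of the word
        "has_digit": any(ch.isdigit() for ch in word),    # word format: digits
        "is_capitalized": word[:1].isupper(),             # word format: capital
    }
    if (lower[i],) in BRAND_LIST:
        feats["w0:in_dictionary"] = "brand"
    if i + 1 < len(tokens):
        feats["w0:w1"] = lower[i] + " " + lower[i + 1]    # bigram
        if (lower[i], lower[i + 1]) in BRAND_LIST:
            feats["w0:w1:in_dictionary"] = "brand"
    if 0 < i < len(tokens) - 1:
        feats["w-1:w0:w1"] = " ".join(lower[i - 1:i + 2])  # trigram
    return feats


tokens = ["Bán", "xe", "Hon", "da", "giá", "20tr"]
pos_tags = ["V", "N", "Np", "Np", "N", "M"]  # placeholder tags, not real tagger output
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

Here the token "Hon" receives the predicate `w0:w1:in_dictionary=brand` because the bigram "Hon da" is in the brand list, and "20tr" receives `has_digit=True`.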

3.3.2 Bidirectional Long Short-Term Memory (Bi-LSTM)

LSTM was developed on the basis of the recurrent neural network (RNN) architecture by Hochreiter and Schmidhuber (1997) ^[⁹^], and it is widely regarded as one of the most effective deep learning models for natural language processing problems. Given the input (x₁,x₂,...,x_n), an LSTM model computes the state sequence (h₁,h₂,...,h_n) by iteratively applying the following updates:

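$$\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i), \\
c_t &= (1 - i_t) \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c), \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o), \\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}$$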

where σ is the element-wise sigmoid function and ⊙ is the element-wise product. LSTMs were designed to address the difficulty that plain RNNs have in capturing long-range dependencies by incorporating a memory cell. They do so using gates: the input gate i_t controls the proportion of the new input that is written to the memory cell c_t, and 1 - i_t is the proportion of the previous cell state c_{t-1} that is retained.
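
The update equations can be traced with a short sketch. To stay dependency-free, the version below uses a hidden size of one, so every weight is a scalar and the element-wise product reduces to ordinary multiplication; the function names and the parameter dictionary layout are assumptions made for this example, and a real model would use matrices and vectors.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """Apply one LSTM update with coupled input and forget gates (hidden size 1).

    `p` maps weight names such as "W_xi" to scalar values.
    """
    i_t = sigmoid(p["W_xi"] * x_t + p["W_hi"] * h_prev + p["W_ci"] * c_prev + p["b_i"])
    c_t = (1.0 - i_t) * c_prev + i_t * math.tanh(
        p["W_xc"] * x_t + p["W_hc"] * h_prev + p["b_c"]
    )
    o_t = sigmoid(p["W_xo"] * x_t + p["W_ho"] * h_prev + p["W_co"] * c_t + p["b_o"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


def run_lstm(xs, p):
    """Run the LSTM over a sequence of scalar inputs, starting from zero states."""
    h, c = 0.0, 0.0
    states = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        states.append(h)
    return states
```

A Bi-LSTM runs one such recurrence left to right and a second one right to left, then combines the two hidden states for each position.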

In a sequence tagging task, we have to take into account both past and future input features at a given time step, so we chose the Bi-LSTM network ^[⁵^] for our second experiment. Figure 1 illustrates the structure of the Bi-LSTM model that we used.

Fig. 1 The Bi-LSTM Model

In this model, l_i represents word i together with its left context and r_i represents word i together with its right context. These two vectors are then combined to create the vector c_i, which represents word i in its context.

Following the Bi-LSTM architecture in ^[¹⁷^], we trained our Bi-LSTM model with the following set up:

— Because our data contains words written in both formal and informal conventions, it is very hard to use pre-trained word embeddings as input to the Bi-LSTM model. Instead, we used embeddings learned by our network.
— We combined word embedding features and character embedding features as input to the Bi-LSTM to reduce the effect of out-of-vocabulary words.

Specifically, the size of the character embeddings and the number of character-level LSTM units in our model are both 25. The size of the word embeddings and the number of word-level LSTM units are both 100.

We also used the dropout technique to reduce over-fitting. Our optimization method was Adam, with the learning rate, learning rate decay and gradient clipping threshold initialized to 0.001, 0.9 and 5.0 respectively. All of these hyper-parameters were tuned during the training phase.

3.3.3 Bidirectional Long Short-Term Memory - CRFs (Bi-LSTM-CRFs)

Instead of tagging each position independently, a CRF layer is added on top of the Bi-LSTM model. The output of the Bi-LSTM layer is used as the input of the CRF layer, and the output of the CRF layer gives the final tags. Based on the model described in ^[¹⁷^], we used the Bi-LSTM-CRFs model shown in figure 2 for our problem. This model was initialized in the same way as the Bi-LSTM model described above.

Fig. 2 The Bi-LSTM-CRFs Model

4 Experimental Evaluation

4.1 Experimental Data

In our work, we used data from online forums, social media networks and other websites. We collected data for the Tourism domain from two main sources: https://www.webtretho.com/forum/f110/ and https://dulich.vnexpress.net/. In the Real Estate domain, data was mostly crawled from https://batdongsan.com.vn/. Some public Facebook groups, such as https://www.facebook.com/groups/xemay-cuhanoi, were used to collect data for our last selected domain, Transportation. We used only posts with a length from 30 to 800 characters in order to reduce the noisy data that comes from advertisement posts.

Overall, our dataset contains about 3000 posts for each domain. We then had a group of 5 students tag the data with the labels that we had built. We carefully cross-checked the students' work to choose the most suitable annotation. As a result, each post is tagged with two sets of labels: the first is the set of domain-specific labels for the post, and the second is the set of general labels. Figure 3 presents an example post tagged with domain-specific labels and with domain-independent labels in turn.

Fig. 3 Tagging posts with specific labels and general labels

After that, we carried out the experiments with both types of tagging; the results and discussion are presented below. We used 60% of the data to train our model and 20% of the data to tune the hyper-parameters. Finally, we used the remaining 20% of our collected data to evaluate our model.
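
The length filter and the 60/20/20 split described in this subsection can be sketched as follows. The function names, the fixed random seed and the decision to shuffle before splitting are assumptions made for this example; the text does not state how posts were assigned to the three portions.

```python
import random


def filter_by_length(posts, min_chars=30, max_chars=800):
    """Keep only posts whose length is between min_chars and max_chars characters."""
    return [post for post in posts if min_chars <= len(post) <= max_chars]


def split_60_20_20(items, seed=0):
    """Shuffle the items and split them into train, development and test portions."""
    items = list(items)
    random.Random(seed).shuffle(items)  # assumed: random assignment with a fixed seed
    n_train = len(items) * 6 // 10
    n_dev = len(items) * 2 // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test
```

With 100 items, for instance, this yields 60 training, 20 development and 20 test items, with no item shared between portions.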

4.1.1 Results on a Specific Domain

4.2 Evaluation Measures

For all experiments, precision, recall and F1-score at the segment (or chunk) level are used as the evaluation measures. Specifically, assume that the true segment sequence of an instance is $s = (s_1, s_2, \ldots, s_N)$ and the decoded segment sequence is $s' = (s'_1, s'_2, \ldots, s'_K)$. A decoded segment $s'_k$ is called a true positive if $s'_k \in s$. Precision is the number of true positives divided by the total number of decoded segments, and recall is the number of true positives divided by the total number of true segments. We report the F1-score, which is computed as 2 · precision · recall / (precision + recall). In addition, the support of a label is the number of true segments with that label in the test set.
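
A minimal sketch of these segment-level measures is given below. It assumes that each segment is represented as a (label, start, end) tuple, so that a decoded segment counts as a true positive only when its label and both boundaries match a true segment exactly; this representation and the function name `segment_prf` are choices made for the example.

```python
def segment_prf(true_segments, decoded_segments):
    """Return segment-level (precision, recall, F1) for one set of predictions.

    Each segment is a (label, start, end) tuple; a match requires all three to agree.
    """
    true_set = set(true_segments)
    decoded_set = set(decoded_segments)
    true_positives = len(true_set & decoded_set)
    precision = true_positives / len(decoded_set) if decoded_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the "loc" segment has a wrong right boundary and "ctt" is spurious.
true_segments = [("int", 0, 1), ("loc", 3, 5), ("prc", 7, 8)]
decoded_segments = [("int", 0, 1), ("loc", 3, 4), ("prc", 7, 8), ("ctt", 9, 10)]
precision, recall, f1 = segment_prf(true_segments, decoded_segments)
```

In this toy case two of the four decoded segments are correct, so precision is 2/4, recall is 2/3 and the F1-score is 4/7. A partially overlapping segment earns no credit under this measure.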

4.3 Experimental Results and Discussions

We conducted the experiments with the three models described above, namely CRFs, Bi-LSTM, and Bi-LSTM-CRFs. To test our assumption that using general labels is more effective than using specific ones as the number of intent domains increases, we conducted a total of 42 experiments, including:

— For each individual intent domain (Tourism, Real Estate, Transportation), extract intent with both the set of specific labels and the set of general ones;
— For each combination of 2 domains (Tourism vs. Real Estate, Tourism vs. Transportation, Real Estate vs. Transportation), extract intent with both specific labels and general ones;
— For the combination of all 3 intent domains, extract intent with both specific labels and general ones.
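
The total of 42 is consistent with running every domain combination with both label sets and all three models. The short sketch below enumerates that grid; reading the count this way is an interpretation of the list above, and the variable names are introduced only for this example.

```python
from itertools import combinations, product

DOMAINS = ["Tourism", "Real Estate", "Transportation"]
LABEL_SETS = ["general", "specific"]
MODELS = ["CRFs", "Bi-LSTM", "Bi-LSTM-CRFs"]

# 3 single domains + 3 pairs + 1 triple = 7 domain groups.
domain_groups = [
    group
    for size in (1, 2, 3)
    for group in combinations(DOMAINS, size)
]

# Every (domain group, label set, model) triple is one experiment.
experiments = list(product(domain_groups, LABEL_SETS, MODELS))
```

Under this reading there are 7 domain groups, 2 label sets and 3 models, giving 7 × 2 × 3 = 42 experiments.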

In the following subsections, we present some of the most interesting results and discuss them.

As mentioned above, we ran the experiments for each of the three specific domains (Tourism, Transportation and Real Estate) with both the set of general labels and the set of specific labels. Table 7 presents the overall results of these experiments with CRFs, Bi-LSTM, and Bi-LSTM-CRFs respectively. Among the single-domain experiments, the highest F1-scores were obtained on the Tourism domain: 83.33% for extracting general labels and 82.01% for specific labels.

Table 7 The average F1-score for each specific domain with general labels and specific labels


General label	Tourism	Transportation	Real estate
CRFs	80.08	79.69	71.24
Bi-LSTM	81.71	77.43	72.51
Bi-LSTM-CRFs	83.33	79.75	74.21
Specific label	Tourism	Transportation	Real estate
CRFs	79.34	79.78	71.29
Bi-LSTM	80.89	78.00	71.70
Bi-LSTM-CRFs	82.01	79.76	74.85

As described above, the Tourism domain has the fewest labels compared to Real Estate and Transportation (15, 18 and 17 labels respectively). Moreover, after carefully analyzing data from the three domains, we found that the Tourism domain contains less noisy data, such as improper abbreviations and emoticons, than the two remaining domains. These two factors may explain why Tourism obtains the highest scores. Table 7 also shows that Bi-LSTM-CRFs achieves the best or near-best result in every setting of our experiments (CRFs is marginally ahead only for Transportation with specific labels, 79.78 versus 79.76), although no hand-crafted features were used in Bi-LSTM-CRFs. This suggests that Bi-LSTM-CRFs is the most suitable model for our problem.

In more detail, table 6 shows the best chunk-based results when applying the Bi-LSTM-CRFs model to the Tourism data using the set of general labels, and table 5 shows the best chunk-based results when applying the Bi-LSTM-CRFs model to the Tourism data using the set of specific labels. We conclude that it is better to use the set of specific labels when extracting users' intents in an individual domain. The reason is that, for a specific domain, the differences in accuracy between general labels and specific labels are small (see table 7), while specific labels describe the entities in greater detail.

Table 5 The best chunk-based result with the set of specific labels for a specific domain - Tourism domain

Specific Label	Precision	Recall	F1-score	Support
Intent	86.65	86.38	86.52	661
Brand	0.00	0.00	0.00	14
Contact	89.91	92.45	91.16	106
Context	64.71	51.76	57.52	85
Description of Object	39.47	40.91	40.18	110
Destination	86.46	85.32	85.89	756
Name of Accommodation	51.09	54.65	52.81	86
Number of Objects	93.33	86.42	89.74	81
Number of People	89.23	82.39	85.67	352
Object	81.48	76.92	79.14	143
Point of Departure	72.84	72.84	72.84	81
Point of Time	86.04	89.29	87.64	794
Price	74.12	76.83	75.45	164
Time Period	84.88	85.71	85.29	203
Transport	56.14	58.18	57.12	55
avg/total	82.29	81.82	82.01	3691
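
For readers unfamiliar with chunk-based scoring, the sketch below shows one common way such precision, recall and F1 values are computed: a predicted chunk counts as correct only if both its label and its exact token span match a gold chunk. It assumes BIO-style tags (for example `B-Price`, `I-Price`, `O`), which this section does not spell out; the function names and the toy tag sequences are hypothetical and are not taken from the dataset.

```python
def extract_chunks(tags):
    """Collect (label, start, end) spans from a BIO tag sequence (end is exclusive).

    Simplification: an I- tag that does not continue an open chunk of the
    same label is ignored rather than treated as the start of a new chunk.
    """
    chunks = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            chunks.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return chunks


def chunk_scores(gold_tags, pred_tags):
    """Return chunk-level (precision, recall, F1) for one tag sequence."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)  # exact match on label and span
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Toy example: the predicted Destination chunk is one token too short,
# so it does not count as correct even though it overlaps the gold chunk.
gold = ["B-Intent", "I-Intent", "O", "B-Destination", "I-Destination", "O", "B-Price"]
pred = ["B-Intent", "I-Intent", "O", "B-Destination", "O", "O", "B-Price"]

precision, recall, f1 = chunk_scores(gold, pred)
print(f"P={precision:.2%} R={recall:.2%} F1={f1:.2%}")
```

Under this exact-match criterion, labels whose values have fuzzy boundaries are penalized more heavily than labels with short, well-delimited values.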

Table 6 The best chunk-based result with the set of general labels for a specific domain - Tourism domain

General Label	Precision	Recall	F1-score	Support
Intent	91.43	83.96	87.54	661
Brand	50.00	14.29	22.22	14
Contact	95.16	92.45	93.78	106
Context	72.06	57.65	64.05	85
Description	83.72	85.00	84.36	1107
Location	91.98	79.45	85.26	837
Number of Objects	95.77	83.95	89.47	81
Object	85.04	75.52	80.00	143
Other	82.03	76.88	79.37	493
Price	69.94	69.51	69.72	164
avg/total	86.38	80.71	83.33	3691
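
The avg/total row is not defined explicitly in this section. Its F1 value is numerically consistent with a support-weighted average of the per-label F1-scores, which the sketch below reproduces from the table above. This reading is an assumption on our part, and the names `TABLE_6`, `weighted_f1` and `macro_f1` are illustrative.

```python
# (F1-score, support) pairs copied from Table 6 (Tourism, general labels).
TABLE_6 = {
    "Intent": (87.54, 661),
    "Brand": (22.22, 14),
    "Contact": (93.78, 106),
    "Context": (64.05, 85),
    "Description": (84.36, 1107),
    "Location": (85.26, 837),
    "Number of Objects": (89.47, 81),
    "Object": (80.00, 143),
    "Other": (79.37, 493),
    "Price": (69.72, 164),
}


def weighted_f1(rows):
    """Average the per-label F1-scores, weighting each label by its support."""
    total_support = sum(support for _, support in rows.values())
    weighted_sum = sum(f1 * support for f1, support in rows.values())
    return weighted_sum / total_support


def macro_f1(rows):
    """Unweighted mean of the per-label F1-scores, shown for comparison."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)


print(f"support-weighted F1: {weighted_f1(TABLE_6):.2f}")
print(f"macro F1:            {macro_f1(TABLE_6):.2f}")
```

The weighted value comes to about 83.33, matching the avg/total row, whereas the unweighted mean is noticeably lower (about 75.6) because rare labels such as Brand then count as much as frequent ones.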

We then present the best chunk-based results for the Transportation and Real Estate domains with the set of specific labels in Table 8. We find that labels with many instances and easily recognizable value forms, such as Intent, Price and Contact, usually achieve high accuracy in all three domains. However, some labels with fairly many instances, such as Location, Description, Equipment and Context, still do not achieve high accuracy. This can be explained by their complicated and barely recognizable value forms.

Table 8 The best chunk-based F1-score result with the set of specific labels for Transportation domain and Real Estate domain

Transportation Label	F1-score	Support	Real Estate Label	F1-score	Support
Intent	90.03	661	Intent	93.37	569
Brand	87.26	192	Brand	25.00	10
Contact	94.63	458	Contact	93.23	402
Context	52.75	57	Context	40.32	51
Color	63.27	109	Facade Direction	60.91	96
Description of object	60.78	239	Acreage	83.56	575
License Plate	71.90	124	Description of Object	50.00	131
Location	78.76	403	Location	56.83	1052
Model	74.23	663	Number of Bathroom	93.33	70
Number of Objects	53.61	54	Number of Objects	51.28	39
Object	76.13	426	Object	76.80	553
Origin	81.55	111	Number of Floor	72.22	139
Owner	84.09	135	Facade Size	57.68	137
Price	88.16	501	Price	92.44	452
Registration	71.58	106	Equipment	58.17	85
Registration Year	86.90	90	Number of Bedroom	88.21	104
State	53.88	148	Number of Façade	41.18	32
			Owner	60.10	182
avg/total	79.78	4477	avg/total	74.85	4679

4.3.1 Using General Labels or Specific Labels when Scaling up Intention Domains

We now compare the set of general labels with the set of specific labels. Figure 5 presents the average F1-score of the CRFs model on one domain with domain-independent (general) labels and with domain-specific (specific) labels, together with the corresponding results when the number of domains increases to two and to three. It also shows the same results for the Bi-LSTM and Bi-LSTM-CRFs models.

Fig. 5 The average F1-score when applying CRFs, Bi-LSTM and Bi-LSTM-CRFs models in experiments with 1 domain, 2 domains and 3 domains using general labels and specific labels correspondingly

We observe that the set of general labels usually gives better results than the set of specific labels. We therefore conclude that it is better to use general labels when identifying users' intents from data collections that combine various domains. As mentioned above, a further reason for this conclusion is that general labels remove the need to build a new label set whenever a new intent domain arrives.
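
To illustrate what collapsing specific labels into general ones involves, the sketch below relabels a BIO tag sequence with a specific-to-general mapping for the Tourism domain. The mapping shown is an assumption: it is inferred from the support counts in Tables 5 and 6 (for example, Destination 756 + Point of Departure 81 = Location 837) and may differ from the mapping the authors define earlier in the paper. The names `TOURISM_TO_GENERAL`, `relabel` and `general_support`, and the BIO tag format, are likewise illustrative.

```python
from collections import Counter

# ASSUMPTION: mapping inferred from support counts, not quoted from the paper.
TOURISM_TO_GENERAL = {
    "Intent": "Intent", "Brand": "Brand", "Contact": "Contact",
    "Context": "Context", "Object": "Object", "Price": "Price",
    "Number of Objects": "Number of Objects",
    "Destination": "Location", "Point of Departure": "Location",
    "Description of Object": "Description",
    "Point of Time": "Description", "Time Period": "Description",
    "Number of People": "Other", "Name of Accommodation": "Other",
    "Transport": "Other",
}

# Support of each specific label, copied from Table 5.
TOURISM_SUPPORT = {
    "Intent": 661, "Brand": 14, "Contact": 106, "Context": 85,
    "Description of Object": 110, "Destination": 756,
    "Name of Accommodation": 86, "Number of Objects": 81,
    "Number of People": 352, "Object": 143, "Point of Departure": 81,
    "Point of Time": 794, "Price": 164, "Time Period": 203, "Transport": 55,
}


def relabel(tags, mapping):
    """Rewrite BIO tags so that each specific label becomes its general label."""
    relabeled = []
    for tag in tags:
        if tag == "O":
            relabeled.append(tag)
        else:
            prefix, label = tag.split("-", 1)  # keep the B-/I- prefix
            relabeled.append(f"{prefix}-{mapping[label]}")
    return relabeled


def general_support(specific_support, mapping):
    """Sum the specific-label supports that fall under each general label."""
    totals = Counter()
    for label, count in specific_support.items():
        totals[mapping[label]] += count
    return totals


totals = general_support(TOURISM_SUPPORT, TOURISM_TO_GENERAL)
print(totals["Location"], totals["Description"], totals["Other"])
print(relabel(["B-Destination", "I-Destination", "O", "B-Point of Time"],
              TOURISM_TO_GENERAL))
```

Under this assumed mapping the aggregated supports agree with Table 6 (Location 837, Description 1107, Other 493). Adding a new domain would then only require a new mapping dictionary, while the set of general labels stays unchanged.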

4.3.2 The Best Result for the Combination of Three Domains

Figure 4 shows the results when we apply the CRFs, Bi-LSTM and Bi-LSTM-CRFs models to extract users' intents from the combined data of the three selected domains. For each model, we conducted experiments with the set of general labels and with the set of specific labels. In this setting, the Bi-LSTM-CRFs model still achieves a higher average F1-score than the other two models, and the experiments with general labels always give higher results than those with specific labels.

Fig. 4 The average F1-score for the combined data of the three domains with the Bi-LSTM-CRFs, Bi-LSTM and CRFs models

Table 9 and Table 10 below show the best chunk-based results with the set of 32 specific labels and the set of 10 general labels, respectively, for the combined data of the three selected domains. These results were obtained with the Bi-LSTM-CRFs model. With the set of general labels, the accuracies of most labels are quite stable: nearly all are above 70%, the exception being Context. This can be explained by the small number of Context instances and by the fact that the values of the Context label in this problem are very diverse and complicated, as can be seen in Tables 1, 2 and 3. Moreover, with the set of general labels, Intent and Object, which are the most important labels for identifying users' intents, achieve higher F1-scores than they do with the set of specific labels.

Table 9 The best chunk-based result with the set of specific labels for the combination of 3 domains

Specific Label (32)	Precision	Recall	F1-score	Support
Intent	90.94	89.69	90.31	1891
Object	75.80	79.86	77.78	1122
Acreage	83.64	80.00	81.78	575
Brand	74.66	76.39	75.51	216
Color	81.00	74.31	77.51	109
Contact	94.14	94.72	94.43	966
Context	58.22	44.04	50.15	193
Description	67.13	40.00	50.13	480
Destination	83.70	84.92	84.31	756
Equipment	77.97	54.12	63.89	85
Facade Direction	58.82	62.50	60.61	96
Facade Size	61.11	56.20	58.56	137
License Plate	75.00	75.00	75.00	124
Location	61.82	62.54	62.18	1455
Model	71.30	74.21	72.73	663
Name of Accommodation	45.95	59.30	51.78	86
Number of Bathroom	95.45	90.00	92.65	70
Number of Bedrooms	92.08	89.42	90.73	104
Number of Facades	50.00	50.00	50.00	32
Number of Floors	69.23	64.75	66.91	139
Number of Objects	75.30	71.84	73.53	174
Number of People	82.04	86.93	84.41	352
Time Period	91.01	84.73	87.76	203
Price	86.10	83.71	84.88	1117
Origin	76.32	78.38	77.33	111
Owner	72.58	68.45	70.45	317
Point of Departure	72.00	66.67	69.23	81
Point of Time	86.08	88.04	87.05	794
Registration	83.15	69.81	75.90	106
Registration Year	94.67	78.89	86.06	90
State	60.87	47.30	53.23	148
Transport	58.93	60.00	59.46	55
avg/total	79.26	77.57	78.21	12847

Table 10 The best chunk-based result with the set of general labels for the combination of 3 domains

General Label (10)	Precision	Recall	F1-score	Support
Intent	90.35	91.06	90.70	1891
Object	80.78	77.18	78.94	1122
Brand	85.96	70.83	77.66	216
Contact	94.17	95.34	94.75	966
Context	56.05	45.60	50.29	193
Description	76.58	70.10	73.20	3960
Location	69.69	71.12	70.40	2292
Number of Objects	72.84	67.82	70.24	174
Other	75.45	72.82	74.11	916
Price	87.38	86.12	86.74	1117
avg/total	79.72	77.08	78.33	12847
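
The per-label comparison between the two tables can be made explicit for the seven labels that appear under the same name in both label sets. The sketch below simply lines up the F1-scores copied from Tables 9 and 10; the variable names are illustrative.

```python
# F1-scores of the labels shared by both label sets, for the three combined domains.
SPECIFIC_F1 = {  # from Table 9
    "Intent": 90.31, "Object": 77.78, "Brand": 75.51, "Contact": 94.43,
    "Context": 50.15, "Number of Objects": 73.53, "Price": 84.88,
}
GENERAL_F1 = {  # from Table 10
    "Intent": 90.70, "Object": 78.94, "Brand": 77.66, "Contact": 94.75,
    "Context": 50.29, "Number of Objects": 70.24, "Price": 86.74,
}

# Labels whose F1-score is higher with general labels, and those where it is not.
improved = [label for label in SPECIFIC_F1 if GENERAL_F1[label] > SPECIFIC_F1[label]]
not_improved = [label for label in SPECIFIC_F1 if label not in improved]

for label in SPECIFIC_F1:
    diff = GENERAL_F1[label] - SPECIFIC_F1[label]
    print(f"{label:18s} specific={SPECIFIC_F1[label]:.2f} "
          f"general={GENERAL_F1[label]:.2f} diff={diff:+.2f}")
print("higher with general labels:", improved)
print("not higher with general labels:", not_improved)
```

On these figures, six of the seven shared labels, including Intent and Object, score higher with the general label set; Number of Objects is the one shared label that scores lower.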

All in all, these results reconfirm that Bi-LSTM-CRFs and the set of general labels are suitable for identifying users' intents in a combination of various intent domains.

5 Conclusion

In this work, we present a novel method for intent parsing and extraction, which we call the domain-independent intent extraction model. In this model, we propose a set of 10 general labels derived mainly from three domains (Tourism, Transportation and Real Estate), together with data from some other domains.

We carefully conducted more than 40 experiments to verify our assumption that the set of general labels is more effective than the set of specific labels in the user intent identification task, especially when the number of intent domains is scaled up. Most of the experimental results show that our proposed general labels achieve higher accuracy than specific labels. The average accuracies with the set of general labels are stable and mostly above 74%.

Although these accuracies are not very high, they reconfirm that our approach is sensible. We also recognize that we should improve both our models and our data to achieve better results.

Acknowledgements

This work was supported by the Ministry of Education and Training under grant number B2019-SKH-01.

References

1. Ashkan, A., Clarke, C.L., Agichtein, E., & Guo, Q. (2009). Classifying and characterizing query intent. Proceeding of ECIR'31, pp. 578-586. DOI: 10.1007/978-3-642-00958-7_53.

2. Bratman, M. (1987). Intention, plans, and practical reason. Harvard University Press.

3. Chen, Z., Liu, B., Hsu, M., Castellanos, M., & Ghosh, R. (2013). Identifying intention posts in discussion forums. Proceeding of the HLT-NAACL, pp. 1041-1050.

4. Ding, X., Liu, T., Duan, J., & Nie, J.Y. (2015). Mining user consumption intention from social media using domain adaptive convolutional neural network. Proceedings of the 29th AAAI Conference on Artificial Intelligence, pp. 2389-2395.

5. Graves, A. & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM networks. Proceeding of (IJCNN'05), Vol. 4.

6. Gupta, V., Varshney, D., Jhamtani, H., Kedia, D., & Karwa, S. (2014). Identifying purchase intent from social posts. Proceeding of (ICWSM).

7. Haque, R., Hasanuzzaman, M., & Way, A. (2019). Mining Purchase Intent in Twitter. Computación y Sistemas, Vol. 23, No. 3.

8. Hashemi, H.B., Asiaee, A., & Kraft, R. (2016). Query intent detection using convolutional neural networks. Proceeding of WSDM QRUMS Workshop.

9. Hochreiter, S. & Schmidhuber, J. (1997). Long short-term memory. Neural computation, pp. 1735-1780.

10. Hollerit, B., Kroll, M., & Strohmaier, M. (2013). Towards linking buyers and sellers: detecting commercial intent on twitter. Proceeding of WWW.

11. Hu, J., Wang, G., Lochovsky, F., Sun, J.T., & Chen, Z. (2009). Understanding user's query intent with wikipedia. Proceeding of the WWW.

12. Hu, D.H., Shen, D., Sun, J.T., Yang, Q., & Chen, Z. (2009). Context-aware online commercial intention detection. Proceeding of The ACML.

13. Jansen, B.J., Booth, D.L., & Spink, A. (2007). Determining the User Intent of Web Search Engine Queries. Proceeding of The 16th ACM, pp. 1149-1150.

14. Kathuria, A., Jansen, B.J., Hafernik, C., & Spink, A. (2010). Classifying the user intent of web queries using k-means clustering. The Emerald Group Journal, Vol. 20, No. 5, pp. 563-581.

15. Kim, J.K., Tur, G., Celikyilmaz, A., Cao, B., & Wang, Y.Y. (2016). Intent detection using semantically enriched word embeddings. Proceeding of SLT Workshop, IEEE.

16. Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proceeding of ICML.

17. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural architectures for named entity recognition. arXiv:1603.01360.

18. Liu, D.C. & Nocedal, J. (1989). On the limited memory BFGS method for large-scale optimization. Mathematical Programming, Vol. 45, pp. 503-528.

19. Luong, Th.L., Tran, Th.H., Truong, Qu.T., Truong, Th.M.Ng., Phi, Th., & Phan, X.H. (2016). Learning to filter user explicit intents in online Vietnamese social media texts. Proceeding of ACIID, pp. 13-24.

20. Luong, Th.L., Truong, Qu.T., Dang, H.Tr., & Phan, X.H. (2016). Domain identification for intention posts on online social media. Proceeding of SoICT, pp. 52-57.

21. Luong, Th.L., Cao, M.S., Le, D.T., & Phan, X.H. (2017). Intent extraction from social media texts using sequential segmentation and deep learning models. Proceeding of the 9th KSE, pp. 215-220.

22. Ngo, X.B., Le, C.L., & Tu, M.Ph. (2017). Cross-Domain Intention Detection in Discussion Forums. Proceeding of the Eighth SoICT, pp. 173-180.

23. Nobari, G.H. & Chua, T.S. (2014). User intent identification from online discussions using a joint aspect-action topic model. Proceeding of AAAI.

24. Purohit, H., Dong, G., Shalin, V., Thirunarayan, K., & Sheth, A. (2015). Intent classification of short-text on social media. IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), pp. 222-228.

25. Ren, X., Wang, Y., Yu, X., Yan, J., Chen, Z., & Han, J. (2014). Heterogeneous graph-based intent learning with queries, web pages and wikipedia concepts. Proceeding of ICWSDM, pp. 23-32.

26. Rose, D.E. & Levinson, D. (2004). Understanding user goals in web search. Proceeding of the 13th ACM, pp. 13-19.

27. Wang, J., Cong, G., Zhao, W.X., & Li, X. (2015). Mining user intents in Twitter: a semi-supervised approach to inferring intent categories for tweets. Proceeding of the 29th AAAI.

28. Xu, J., Zhang, Q., & Huang, X. (2013). Understanding the semantic intent of domain-specific natural language query. Proceeding of International Joint Conference on Natural Language Processing, pp. 552-560.

Received: December 25, 2018; Accepted: March 04, 2019

^* Corresponding author is Thai-Le Luong. luongthaile80@utc.edu.vn

This is an open-access article distributed under the terms of the Creative Commons Attribution License