1 Introduction
Question answering systems have recently been dominated by neural network approaches that achieve state-of-the-art results across different NLP tasks. Open-domain question answering covers tasks such as answer sentence selection, reading comprehension, and multi-hop reasoning and reading. An example of a question-answer pair from a dataset:
Q: How a water pump works?
A: pumps operate by some mechanism … to perform mechanical work by moving the fluid.
An answer sentence selection model would retrieve the entire sentence from a paragraph as the answer. A common goal of neural network models is to build end-to-end approaches that do not rely on intermediate tools or data provided by other systems. Recent works such as BERT [3] and ELMo [11] pre-train language models with large neural network architectures and fine-tune them on downstream NLP tasks. These methods outperform the previous state of the art for reading comprehension as well as many other tasks. However, training such models on large datasets requires large-scale computational resources, which is not always feasible.
Other state-of-the-art models, such as QANet [19] on SQUAD, and other end-to-end approaches try to implicitly learn information such as entity types, part-of-speech tags, named entities and syntactic dependencies while performing downstream tasks. The challenge remains in understanding whether they actually exploit such information implicitly or simply overfit the datasets and their unintended biases. A feasible yet challenging approach is to combine the power of neural networks with explicit information such as entity types, dependencies and tags. The Expected Answer Type (referred to as EAT hereafter) is one such piece of information: it tells a question answering system which type of answer a question requires. Some examples of questions with their EAT are listed below:
Question: Which NFL team represented the AFC at Super Bowl 50?
Expected Answer Type: HUM.
Question: Where was franz kafka born ?
Expected Answer Type: LOC.
[15] refer to this information as Question Classes and show a significant improvement on the TrecQA dataset over a previous state-of-the-art DNN model that uses only word-level information.
Our contributions in this article are as follows. We introduce two different ways of using Question Classes, referred to hereafter as EAT or Expected Answer Types, and experiment on several datasets in addition to TrecQA to determine whether this information helps on a wider range of large-scale datasets, using a simple recurrent neural network model with a pre-attention mechanism. To annotate datasets other than TrecQA with EAT information, we propose a multiclass classifier trained on a dataset built with an existing rule-based system that predicts the EAT of questions.
We report our findings on the WikiQA, SQUAD-Sent and TrecQA datasets and show that we outperform the state-of-the-art results on the TrecQA dataset1 with the two different ways of highlighting Expected Answer Types in the data.
The answer sentence selection task has been extensively studied with approaches ranging from n-gram models to neural network models. In earlier feature-based QA systems, the Expected Answer Type (EAT) was shown to be a very important feature [7].
The EAT corresponds to an entity type organized in an answer type taxonomy, as in [8] for the open domain, or to semantic types in the biomedical domain, as in [5].
Recent works on this task focus mainly on convolutional neural network approaches. [14] propose a CNN model with a learning-to-rank approach, which computes a representation of both inputs, the candidate passage and the question, and a similarity between these two representations using a pooling layer followed by a similarity matrix computation. In [18], the similarity of the two inputs is evaluated by computing interactions between the words of the two texts with an attention layer. [4] propose a Multi-Perspective CNN for this task, which was further used by [13] with a triplet ranking loss function to learn pairwise ranking from both positive and negative samples.
[15] use the same model but use Question Classes to enhance the dataset by highlighting entities in it. Entities are highlighted in two main ways, called Bracketing (appending a special token before and after the entity occurrence) and Replacement (replacing the entity word with a special token). Our work uses a similar technique, replacing entity words with special tokens, but allows the model to learn them according to the expected types. The TrecQA evaluation leaderboard1 reports the state-of-the-art scores obtained by the methods of several articles.
2 Answer Sentence Selection
Answer sentence selection is a question answering task that is also sometimes referred to as a sentence reranking task. The task involves reranking a set of candidate sentences for a given question so that the sentences containing the correct answer are ranked highest.
We model this task as a pairwise similarity scoring task: for each candidate sentence related to a question, we compute a similarity score between the question and the candidate answer sentence, and rank the candidates by this score, as illustrated by the short sketch below.
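The following is a minimal illustrative sketch of this reranking step; the function names are ours and the scoring function is a placeholder for a trained model.

```python
# Illustrative sketch (names are ours, not the authors'): given a scoring
# function, candidate sentences are reranked by their predicted similarity
# to the question, and the top-ranked sentence is returned as the answer.
def rerank(question, sentences, score_fn):
    """Sort candidate sentences by descending similarity to the question."""
    return sorted(sentences, key=lambda s: score_fn(question, s), reverse=True)
```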
2.1 RNN-Similarity
Recurrent neural networks such as LSTMs and GRUs are widely used in several NLP tasks like machine translation, sequence tagging, and question answering tasks such as reading comprehension and answer sentence selection.
We propose a simple model with recurrent neural networks and an attention mechanism to capture sequential semantic information of words in both questions and sentences and predict similarity scores between them. We refer to this model further in this article as RNN-Similarity model. Figure 1 shows the architecture of the model.
A pre-attention mechanism captures the similarity between sentence words and question words in the same layer. For this purpose, a feature $f_{align}$, shown in Equation 3, is added as a feature to the LSTM layer:

$$f_{align}(s_i) = \sum_j a_{i,j} \, E(q_j) \qquad (3)$$

where $a_{i,j}$ is

$$a_{i,j} = \frac{\exp\big(\alpha(E(s_i)) \cdot \alpha(E(q_j))\big)}{\sum_{j'} \exp\big(\alpha(E(s_i)) \cdot \alpha(E(q_{j'}))\big)}$$

which computes the dot products between nonlinear mappings $\alpha(\cdot)$ of the word embeddings $E(\cdot)$ of question and sentence words.
The above process is similar to [1], who use LSTMs to encode question and paragraph words for the reading comprehension task. We use 3-layer bidirectional LSTMs for both the question and sentence encodings.
The LSTM output states are connected to a linear layer, and a sigmoid activation is applied to its output, producing a score between 0 and 1 that signifies the similarity between the question and the answer sentence. A minimal sketch of such a scorer is given below.
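The following PyTorch sketch illustrates this architecture under stated assumptions: the class names (`AlignedAttention`, `RNNSimilarity`), the hidden dimensions, the ReLU projection and the max-pooling of the LSTM states are illustrative choices of ours, not the authors' exact implementation.

```python
# Minimal PyTorch sketch of an RNN-Similarity-style scorer (hedged; dimensions,
# pooling and class names are assumptions, not the paper's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignedAttention(nn.Module):
    """Pre-attention: f_align for each sentence word is a weighted sum of question
    word embeddings, with weights from dot products of nonlinear projections."""
    def __init__(self, embed_dim):
        super().__init__()
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, s_emb, q_emb):
        # s_emb: (batch, len_s, d), q_emb: (batch, len_q, d)
        s_proj = F.relu(self.proj(s_emb))
        q_proj = F.relu(self.proj(q_emb))
        scores = s_proj.bmm(q_proj.transpose(1, 2))   # (batch, len_s, len_q)
        alpha = F.softmax(scores, dim=-1)             # a_{i,j}
        return alpha.bmm(q_emb)                       # f_align per sentence word

class RNNSimilarity(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.align = AlignedAttention(embed_dim)
        self.q_rnn = nn.LSTM(embed_dim, hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.s_rnn = nn.LSTM(2 * embed_dim, hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(4 * hidden, 1)

    def forward(self, q_ids, s_ids):
        q_emb, s_emb = self.embed(q_ids), self.embed(s_ids)
        f_align = self.align(s_emb, q_emb)
        s_in = torch.cat([s_emb, f_align], dim=-1)    # word embedding + f_align feature
        q_enc, _ = self.q_rnn(q_emb)
        s_enc, _ = self.s_rnn(s_in)
        # Pool the encodings (max over time, an assumed choice) and score the pair
        # with a sigmoid so the output lies in [0, 1].
        pair = torch.cat([q_enc.max(dim=1).values, s_enc.max(dim=1).values], dim=-1)
        return torch.sigmoid(self.out(pair)).squeeze(-1)
```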
For the Expected Answer Type (EAT) versions of questions and sentences, we create special tokens for the entity types, which are used when encoding the question Q and each sentence S.
2.2 Highlighting Single Entity and Multiple Entity Types
The authors of [15] propose a method of replacing words with special token embeddings to highlight the entities in sentences that match the EAT of the question. This method is referred to as "EAT (single type)" in the following experiments. The entities belong to one of the classes HUM, LOC, ABBR, DESC, NUM or ENTY. HUM covers description, group, individual and title; LOC covers city, country, mountain and state; ABBR covers abbreviation and expansion; DESC covers definition, description, manner and reason; NUM covers numerical values such as code, count, date, distance, money, order, etc.; ENTY covers numerous entity types such as animal, body, color, creation, currency, disease, etc. More details regarding the taxonomy can be found in [9].
In this method, entities are treated identically irrespective of the class they belong to: they are replaced by one of two special tokens, entity_left for ordinary entity occurrences and max_entity_left for the maximum occurring entity, i.e. an entity that occurs at least twice as often as the second most frequent entity. Entity types are recognized using a named entity recognition tool. When an entity type in a sentence matches the EAT of the question, the entity_left token replaces the entity mentions in the sentence; the same applies to the maximum occurring entity token max_entity_left.
Our proposition is to replace an entity according to the type it belongs to, following the taxonomy used in the original work, instead of replacing all kinds of entities by a single token entity_left. The intuition is that, in a model with an attention mechanism, relations between question words and specific entity type tokens are easier to learn than relations between question words and one generic entity token shared by all entities.
This way, the model can, for example, learn a different behaviour for an entity denoting a location than for an entity denoting a person.
Line 3 of Table 1 shows an example whose EAT is "HUM", so the matching entities are replaced by entity_hum. We do the same for the other expected answer types: entity_loc for "LOC", entity_enty for "ENTY", entity_num for "NUM", entity_desc for "DESC" and entity_abbr for "ABBR". Only the entity mentions whose types match the EAT of the question are replaced (a sketch of this replacement procedure is given after Table 1).
Table 1: Examples of the EAT highlighting methods.

| # | Method | Question | Sentence |
|---|--------|----------|----------|
| 1 | Original text | Who is the author of the book, ‘The Iron Lady: a biography of Margaret Thatcher’ | in ‘The Iron Lady,’ Young traces ...... the greatest woman political leader since Catherine the Great. |
| 2 | Replacement - [15] (EAT single type) | Who is the author of the book, ‘The Iron Lady: a biography of Margaret Thatcher’ max_entity_left entity_left | in ‘The Iron Lady,’ max_entity_left traces ...... the greatest woman political leader since entity_left. |
| 3 | EAT (different types) | Who is the author of the book, ‘The Iron Lady: a biography of Margaret Thatcher’ max_entity_left entity_hum | in ‘The Iron Lady,’ max_entity_left traces ...... the greatest woman political leader since entity_hum. |
| 4 | EAT (MAX + different types) | Who is the author of the book, ‘The Iron Lady: a biography of Margaret Thatcher’ max_entity_hum entity_hum | in ‘The Iron Lady,’ max_entity_hum traces ...... the greatest woman political leader since entity_hum. |
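The sketch below illustrates the "EAT (MAX + different types)" replacement on a single sentence, assuming the entity mentions and their types have already been detected and matched against the question EAT (that step is described in Section 3.2); the function and variable names are ours.

```python
# Hedged sketch of EAT highlighting by type. `entities` is assumed to be a list
# of (surface_form, eat_type) pairs already typed by the annotation pipeline.
from collections import Counter

def highlight(sentence, entities, eat_type):
    """Replace entity mentions whose type matches the question EAT with per-type
    tokens, and the clearly dominant entity with a MAX token."""
    matching = [surface for surface, etype in entities if etype == eat_type]
    counts = Counter(matching)
    max_entity = None
    if counts:
        ranked = counts.most_common(2)
        # "Maximum occurring" entity: at least twice as frequent as the runner-up.
        if len(ranked) == 1 or ranked[0][1] >= 2 * ranked[1][1]:
            max_entity = ranked[0][0]
    for surface in set(matching):
        token = ("max_entity_" if surface == max_entity else "entity_") + eat_type.lower()
        sentence = sentence.replace(surface, token)
    return sentence

# Example: with [("Young", "HUM"), ("Catherine the Great", "HUM")] and EAT "HUM",
# neither entity dominates, so both mentions become entity_hum.
```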
We also experiment with a variant in which the max_entity_left token also carries the entity type, like the other entity tokens. If the maximum occurring entity is of type "HUM", it is replaced by max_entity_hum. This method is referred to as "EAT (MAX + different types)" in the following experiments. For each EAT token we create a random word embedding of dimension D with values in (-0.5, 0.5), and this embedding is used whenever the token appears, in all our experiments, as sketched below.
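A small sketch of this initialization, assuming the interval is (-0.5, 0.5) and D matches the dimension of the pretrained word embeddings; the token list is illustrative.

```python
import numpy as np

# Illustrative: one random embedding per EAT special token, uniform in (-0.5, 0.5),
# with the same dimension D as the pretrained word embeddings (an assumption).
D = 300
EAT_TOKENS = ["entity_hum", "entity_loc", "entity_enty", "entity_num",
              "entity_desc", "entity_abbr",
              "max_entity_hum", "max_entity_loc", "max_entity_enty",
              "max_entity_num", "max_entity_desc", "max_entity_abbr",
              "entity_left", "max_entity_left"]
eat_embeddings = {tok: np.random.uniform(-0.5, 0.5, size=D) for tok in EAT_TOKENS}
```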
3 Experiments
We perform experiments on three datasets, namely 1) TrecQA, 2) WikiQA and 3) SQUAD-Sent, with and without EAT annotations. Since not all of these datasets come with EAT annotations, we had to develop our own annotation tools.
3.1 Annotation of the EAT
Since SQUAD-EAT (see Section 3.3) is the result of a rule-based method with a high accuracy (97.2% as reported in [9]), we use it to train a multiclass classifier based on the CNN model for text classification2 of [6], modifying the output layer into a multi-class setting. We refer to this model as the EAT Classifier. We use 300-dimensional GloVe embeddings [10].
The output classes of the classifier are the coarse types of the taxonomy, namely ABBR, DESC, ENTY, HUM, LOC, NUM, plus a "NO_EAT" class for questions whose EAT is not in this list. We do not use the fine-grained taxonomy in this work because it would result in a large number of classes with a sparse distribution of samples in the dataset. An example from SQUAD-EAT with the class HUM is the question "Which NFL team represented the AFC at Super Bowl 50?" shown in the introduction.
We train the multi-class classifier on the SQUAD-EAT dataset; in our experiments it reaches an accuracy of 95.17% on the SQUAD-EAT dev set, taking the annotations of [15] as reference. A sketch of such a classifier is given below.
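The following Keras sketch shows a CNN text classifier of this kind over GloVe embeddings with a softmax over the seven coarse classes. The hyperparameters (filter widths, number of filters, dropout, maximum question length, vocabulary size) are illustrative assumptions, not the values used in the paper.

```python
# Hedged sketch of an EAT classifier: a CNN over GloVe embeddings with a softmax
# over 7 classes (6 coarse EAT types + NO_EAT). Hyperparameters are assumptions.
from tensorflow.keras import layers, models

NUM_CLASSES = 7      # ABBR, DESC, ENTY, HUM, LOC, NUM, NO_EAT
MAX_LEN = 40         # assumed maximum question length
VOCAB_SIZE = 50_000  # assumed vocabulary size
EMBED_DIM = 300      # GloVe dimension used in the paper

def build_eat_classifier(embedding_matrix=None):
    inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
    emb = layers.Embedding(
        VOCAB_SIZE, EMBED_DIM,
        weights=[embedding_matrix] if embedding_matrix is not None else None)(inp)
    # Parallel convolutions with different window sizes, each max-pooled over time.
    pooled = []
    for width in (3, 4, 5):
        conv = layers.Conv1D(100, width, activation="relu")(emb)
        pooled.append(layers.GlobalMaxPooling1D()(conv))
    merged = layers.Dropout(0.5)(layers.concatenate(pooled))
    out = layers.Dense(NUM_CLASSES, activation="softmax")(merged)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```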
3.2 Annotation of the Entities
We detect the entities in the sentences using the DBpedia Spotlight tool [2]. The entities detected by Spotlight are then typed with the spaCy NER tool, and the NER types are mapped to EAT classes using the mapping shown in Table 2. Only the entities matching the question EAT are highlighted; the others are discarded. We additionally use the dedicated special token for the maximum occurring entity, as described in Section 2.2. A sketch of this annotation step follows.
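The sketch below illustrates this step with the public DBpedia Spotlight REST endpoint and spaCy. The SPACY_TO_EAT mapping shown here is only an illustrative stand-in for the actual mapping defined in Table 2, and the confidence threshold is an assumption.

```python
# Hedged sketch of the entity annotation step: DBpedia Spotlight detects entity
# mentions, spaCy NER provides a type that is mapped to a coarse EAT class.
import requests
import spacy

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"
# Illustrative mapping only; the paper's mapping is given in Table 2.
SPACY_TO_EAT = {"PERSON": "HUM", "ORG": "HUM", "GPE": "LOC", "LOC": "LOC",
                "DATE": "NUM", "CARDINAL": "NUM", "MONEY": "NUM", "QUANTITY": "NUM"}

nlp = spacy.load("en_core_web_sm")

def annotate_entities(sentence, confidence=0.5):
    """Return (surface_form, eat_type) pairs for mentions detected by Spotlight
    whose spaCy NER type maps to an EAT class."""
    resp = requests.get(SPOTLIGHT_URL,
                        params={"text": sentence, "confidence": confidence},
                        headers={"Accept": "application/json"})
    surfaces = {r["@surfaceForm"] for r in resp.json().get("Resources", [])}
    pairs = []
    for ent in nlp(sentence).ents:
        if ent.text in surfaces and ent.label_ in SPACY_TO_EAT:
            pairs.append((ent.text, SPACY_TO_EAT[ent.label_]))
    return pairs
```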
3.3 The Data
The TrecQA dataset is a standard benchmark for the answer sentence selection task. The authors of [9,15] provide EAT annotations for TrecQA based on their rule-based approach.
We transform the SQUAD dataset [12], designed for machine comprehension, into an answer sentence selection dataset that provides the answers in their original context. We name it SQUAD-Sent. Each original example, a triple of Question, Paragraph and Answer span (text and answer start offset in the paragraph), is converted into triples of Question, Sentence and Sentence label, where the label is 1 if the answer is present in the sentence and 0 otherwise. We perform sentence tokenization on the SQUAD paragraphs using the spacy toolkit3 and check for an exact match of the answer string in each sentence, as sketched below.
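A minimal sketch of this conversion for one SQuAD example, using spaCy sentence segmentation and the exact-string check described above; the function name is ours.

```python
# Hedged sketch of the SQUAD -> SQUAD-Sent conversion: split each paragraph into
# sentences and label a sentence 1 if it contains the answer string, else 0.
import spacy

nlp = spacy.load("en_core_web_sm")

def squad_to_sent(question, paragraph, answer_text):
    """Yield (question, sentence, label) triples for one SQuAD example."""
    for sent in nlp(paragraph).sents:
        label = 1 if answer_text in sent.text else 0
        yield question, sent.text, label
```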
SQUAD-Sent is a special case in that there is exactly one positive sentence per question, the other sentences being negative examples. Our motivation for building it is the large scale of this dataset, compared to the others, together with its human-generated questions. For the expected answer types of SQUAD questions, we use SQUAD-EAT, a dataset of EAT-annotated SQUAD v1 questions produced by the authors of [9,15] at our request. The WikiQA dataset [17] is another answer sentence selection dataset, built from Bing search engine query logs. To make our scores comparable, we use the preprocessed version of [13], which removes examples without any positive answer and questions with more than 40 tokens. The questions and answer sentences are annotated with EAT information as described in Section 3.1.
Table 3 shows the statistics of the datasets with EAT-annotated questions and with plain word-level questions (the regular datasets), as well as the number of entities annotated in each set. The EAT version of the TrecQA dataset is the one reported in [15] and available through this link4.
3.4 Implementation
We implement the RNN-Similarity model in PyTorch and use MSELoss (mean squared error loss) to minimize the prediction error on the relevance scores. We use the Adamax optimizer and set missing word embeddings to zero vectors. We implement the EAT Classifier in Keras, based on the CNN model available online5, with GloVe embeddings as input. The code for both models, along with the default hyperparameters, is publicly available on Github6. A minimal training-step sketch is given below.
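A minimal training-step sketch under the stated setup (MSE loss on the predicted relevance score, Adamax optimizer); batching, padding and the model definition (e.g. the RNNSimilarity sketch of Section 2.1) are assumed to exist elsewhere.

```python
# Hedged sketch of one training step for the pairwise similarity scorer.
import torch

criterion = torch.nn.MSELoss()

def make_optimizer(model):
    return torch.optim.Adamax(model.parameters())

def train_step(model, optimizer, q_ids, s_ids, labels):
    """One gradient step on a batch of (question, sentence, relevance) examples."""
    model.train()
    optimizer.zero_grad()
    scores = model(q_ids, s_ids)            # sigmoid scores in [0, 1]
    loss = criterion(scores, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```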
3.5 Results
Table 4 shows the results on the different versions of the datasets. Note that the experiments of Table 4 use all the questions of each dataset, both questions highlighted with EAT and questions that are not. Also note that we test our systems on the Raw version of the TrecQA test set.
Table 4: Results on TrecQA, WikiQA and SQUAD-Sent (all questions).

| Dataset | Method | Acc.@1 | MAP | MRR |
|---|---|---|---|---|
| TrecQA | Plain words - [13] | — | 78 | 83.4 |
| TrecQA | EAT words - [15] | — | 83.6 | 86.2 |
| TrecQA | Plain words - RNN-S | 78.95 | 80.24 | 84.81 |
| TrecQA | EAT words (single type) - RNN-S | 85.26 | 85.28 | 89.16 |
| TrecQA | EAT words (different types) - RNN-S | 85.26 | 85.48 | 88.11 |
| TrecQA | EAT words (MAX+different types) - RNN-S | 86.32 | 85.42 | 88.86 |
| WikiQA | Plain words - [13] | — | 70.9 | 72.3 |
| WikiQA | Plain words - [16] | — | 75.59 | 77.00 |
| WikiQA | Plain words - RNN-S | 56.79 | 69.07 | 70.55 |
| WikiQA | EAT words (single type) - RNN-S | 56.38 | 68.63 | 70.59 |
| WikiQA | EAT words (different types) - RNN-S | 58.4 | 70.04 | 71.56 |
| WikiQA | EAT words (MAX+different types) - RNN-S | 57.20 | 69.17 | 70.89 |
| SQUAD-Sent | Plain words - Implementation7 of model by [13] | — | — | 58.08 |
| SQUAD-Sent | Plain words - RNN-S | 83.94 | — | 90.5 |
| SQUAD-Sent | EAT words (single type) - RNN-S | 84.21 | — | 90.65 |
| SQUAD-Sent | EAT words (different types) - RNN-S | 84.26 | — | 90.70 |
| SQUAD-Sent | EAT words (MAX+different types) - RNN-S | 84.24 | — | 90.69 |
3.5.1 TrecQA
The current state-of-the-art system is that of [15], which applies EAT to the word-level model of [13]; we therefore present both results. Our RNN-Similarity model on plain word-level data obtains better results than the model of [13], by 2.24% on MAP and 1.41% on MRR. Our EAT words (single type), EAT words (different types) and EAT words (MAX + different types) models outperform the previous state-of-the-art model of [15] on both MAP (+1.68%) and MRR (+2.96%), and also rank the correct sentence first more often (Acc.@1).
3.5.2 WikiQA
Although a recent model by [16] based on kernel methods outperforms all of our scores, we note that our EAT-level models perform better than the plain word-level ones. As shown in Table 3, Spotlight annotates only a small number of entities on WikiQA compared to the other datasets. To annotate more entities, we experimented with using spaCy NER types directly, which indeed produced more annotated entities but reduced the performance below the word-level scores.
3.5.3 SQUAD-Sent
The official SQUAD test set is not publicly available. Although the difference between the word-level and EAT word-level models is small, it shows that replacing entity words in the sentences does not hurt performance; instead it improves it slightly. Note that the MAP and MRR values are identical because there is exactly one positive sentence per question among the negatives, so we only report MRR on this dataset. The Plain words - [13] performance is obtained with the implementation available online8, which we ran on the SQUAD-Sent dataset.
One aspect worth highlighting is that the implementation8 of the word-level model of [13], originally built for the TrecQA dataset, performs poorly (58.05%) on the SQUAD-Sent dataset (possibly because SQUAD-Sent has only one positive answer sentence per question whereas the other datasets have several). This motivated us to build a model (RNN-Similarity) that works robustly on all three datasets without any dataset-specific hyperparameter changes. Table 5 shows the results when only the questions annotated with EAT information are kept in the train and test sets.
Table 5: Results when training and testing only on questions annotated with EAT information.

| Dataset | Method | Acc.@1 | MAP | MRR |
|---|---|---|---|---|
| TrecQA (EAT) | EAT words (single type) | 84.15 | 84.81 | 87.17 |
| TrecQA (EAT) | EAT words (different types) | 85.37 | 85.45 | 88.18 |
| TrecQA (EAT) | EAT words (MAX+different types) | 85.37 | 85.06 | 89.20 |
| WikiQA (EAT) | EAT words (single type) | 58.02 | 68.91 | 70.99 |
| WikiQA (EAT) | EAT words (different types) | 55.14 | 67.70 | 69.52 |
| WikiQA (EAT) | EAT words (MAX+different types) | 56.38 | 68.16 | 69.83 |
| SQUAD-Sent (EAT) | EAT words (single type) | 83.81 | — | 90.53 |
| SQUAD-Sent (EAT) | EAT words (different types) | 84.04 | — | 90.61 |
| SQUAD-Sent (EAT) | EAT words (MAX+different types) | 84.16 | — | 90.73 |
In these experiments, the training datasets contain only the questions with EAT information; if a question does not have an EAT value, it is discarded from the dataset. The experiments and results are as follows:
— TrecQA (EAT): Apart from the EAT words (MAX + different types) version, the other two methods outperform the word-level models and the EAT word-level model of [15], where the statistics of this dataset version can also be found.
— SQUAD-Sent (EAT): This version has 8,800 fewer questions than the full SQUAD-Sent dataset, which is a considerable number of missing questions. Yet the results do not decrease much, and the EAT (different types) setting still performs better than SQUAD-Sent's plain word-level model.
— WikiQA (EAT): We remove the questions with the 'NO_EAT' class, 23 questions overall. The results are best with EAT (single type), which shows that in certain cases this method works better than the different-types variants.
The results reported in Table 5 show that there is no significant improvement across the different methods when training only on questions with EAT information. It is therefore better to train models on the entire dataset and highlight EAT information only when a question actually has an EAT.
4 Conclusion and Future work
Expected Answer Types are a useful piece of information that used to be extensively exploited in traditional QA systems. Using them in current state-of-the-art DNN systems improves performance. We propose a simple recurrent neural network model that works robustly on three different datasets without any hyperparameter tuning, and we annotate the entities belonging to the expected answer type of the question. Our model outperforms the previous state-of-the-art systems on the answer sentence selection task. We also propose a model that predicts the expected answer type from the question words, using a multiclass classifier trained on the output of a rule-based system over a large-scale QA dataset.
Future work involves using expected answer type information in other downstream tasks, such as reading comprehension or multi-hop reading systems, for extracting short answer spans.