Community Question Answering

Community Question Answering (CQA) aims to automatically search for relevant answers among the many responses provided for a given question (Answer Selection), and to search for relevant questions in order to reuse their existing answers (Question Retrieval).
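
For Answer Selection, a system scores each candidate answer against the question and returns a ranking. A minimal sketch using a non-stopword-overlap scorer (a toy lexical baseline for illustration only, not any particular published model; the stopword list is an ad-hoc assumption):

```python
import re

def rank_answers(question, candidates):
    """Rank candidate answers by non-stopword overlap with the question.

    A toy lexical baseline; the neural models benchmarked below learn
    far richer matching signals than raw word overlap.
    """
    # Ad-hoc stopword list, for illustration only.
    stopwords = {"the", "a", "an", "is", "of", "to", "in", "and", "what", "how"}

    def content_words(text):
        return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords}

    q_words = content_words(question)

    def score(answer):
        a_words = content_words(answer)
        # Fraction of the answer's content words shared with the question.
        return len(q_words & a_words) / (len(a_words) or 1)

    return sorted(candidates, key=score, reverse=True)

ranked = rank_answers(
    "What is the capital of France?",
    ["Berlin is the capital of Germany.", "Paris is the capital of France."],
)
```

This overlap-counting idea is essentially how the TRECQA candidate pool was originally filtered, though modern systems replace it with learned matching models.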

Classic Datasets

| Dataset | Domain | #Question | #Answer |
| --- | --- | --- | --- |
| TRECQA | Open-domain | 1,229 | 53,417 |
| WikiQA | Open-domain | 3,047 | 29,258 |
| InsuranceQA | Insurance | 12,889 | 21,325 |
| FiQA | Financial | 6,648 | 57,641 |
| Yahoo! Answers | Open-domain | 50,112 | 253,440 |
| SemEval-2015 Task 3 | Open-domain | 2,600 | 16,541 |
| SemEval-2016 Task 3 | Open-domain | 4,879 | 36,198 |
| SemEval-2017 Task 3 | Open-domain | 4,879 | 36,198 |
  • The TRECQA dataset was created by Wang et al. from TREC QA track 8–13 data, with candidate answers automatically selected from each question’s document pool using a combination of overlapping non-stopword counts and pattern matching. It is one of the most widely used benchmarks for answer sentence selection.

  • WikiQA is a publicly available set of question and sentence pairs, collected and annotated by Microsoft Research for research on open-domain question answering.

  • InsuranceQA is a non-factoid QA dataset from the insurance domain. Questions may have multiple correct answers, and the questions are typically much shorter than the answers: the average lengths of questions and answers are 7 and 95 tokens, respectively. Each question in the development and test sets comes with a pool of 500 candidate answers.

  • FiQA is a non-factoid QA dataset from the financial domain, released for the WWW 2018 Challenges. It was built by crawling StackExchange, Reddit, and StockTwits; part of the questions are opinionated, targeting mined opinions and their respective entities, aspects, sentiment polarity, and opinion holders.

  • Yahoo! Answers is a website where people post questions and answers, all of which are public to any web user willing to browse or download them. The corpus used here is a snapshot of Yahoo! Answers as of 10/25/2007 and serves as a benchmark for community-based question answering. Its answers are considerably longer than those in TRECQA and WikiQA.

  • SemEval-2015 Task 3 consists of two sub-tasks. In Subtask A, given a question (short title + extended description) and several community answers, classify each answer as definitely relevant (good), potentially useful (potential), or bad/irrelevant (bad, dialog, non-English, other). In Subtask B, given a YES/NO question (short title + extended description) and a list of community answers, decide whether the global answer to the question should be yes, no, or unsure.

  • SemEval-2016 Task 3 consists of two sub-tasks, namely Question–Comment Similarity and Question–Question Similarity. In the Question–Comment Similarity task, given a question from a question–comment thread, rank the comments according to their relevance to the question. In the Question–Question Similarity task, given a new question, rerank all similar questions retrieved by a search engine.

  • SemEval-2017 Task 3 contains two sub-tasks, namely Question Similarity and Relevance Classification. Given a new question and a set of related questions from the collection, the Question Similarity task is to rank the related questions according to their similarity to the original question, while the Relevance Classification task is to rank the answer posts in a question–answer thread according to their relevance to the question.
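
For the question–question similarity setting above (rerank questions retrieved by a search engine against a new question), a minimal sketch using TF-IDF cosine similarity — an illustrative lexical baseline, not the official task baseline:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank_questions(new_question, retrieved):
    """Rerank retrieved questions by TF-IDF cosine similarity to the new question."""
    docs = [new_question] + retrieved
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency over this tiny collection.
    df = Counter(t for doc in tokenized for t in set(doc))

    def vec(tokens):
        tf = Counter(tokens)
        # Smoothed TF-IDF weight: tf * (log(n / df) + 1).
        return {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}

    qv = vec(tokenized[0])
    scored = [(cosine(qv, vec(tok)), q) for q, tok in zip(retrieved, tokenized[1:])]
    return [q for _, q in sorted(scored, key=lambda p: p[0], reverse=True)]

reranked = rerank_questions(
    "how do I reset my password",
    ["what is the weather today", "how to reset a password"],
)
```

Participating systems typically replace this lexical scorer with learned semantic-matching models, but the reranking setup is the same.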

Performance
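
The leaderboards below report Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR), averaged over per-question ranked candidate lists. A minimal sketch of how the two metrics are computed from binary relevance labels:

```python
def average_precision(rels):
    """rels: binary relevance labels of one question's candidates, in ranked order."""
    hits, precisions = 0, []
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            precisions.append(hits / i)  # precision at each relevant hit
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(rels):
    """1 / rank of the first relevant candidate (0 if none is relevant)."""
    for i, r in enumerate(rels, start=1):
        if r:
            return 1.0 / i
    return 0.0

def map_mrr(per_question_rels):
    """MAP and MRR averaged over all questions."""
    n = len(per_question_rels)
    return (sum(average_precision(r) for r in per_question_rels) / n,
            sum(reciprocal_rank(r) for r in per_question_rels) / n)

map_val, mrr_val = map_mrr([[1, 0, 1], [0, 1, 0]])  # -> (0.666..., 0.75)
```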

TREC QA (Raw Version)

| Model | Code | MAP | MRR | Paper |
| --- | --- | --- | --- | --- |
| Punyakanok (2004) | | 0.419 | 0.494 | Mapping Dependencies Trees: An Application to Question Answering, ISAIM 2004 |
| Cui (2005) | | 0.427 | 0.526 | Question Answering Passage Retrieval Using Dependency Relations, SIGIR 2005 |
| Wang (2007) | | 0.603 | 0.685 | What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, EMNLP 2007 |
| H&S (2010) | | 0.609 | 0.692 | Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions, NAACL 2010 |
| W&M (2010) | | 0.595 | 0.695 | Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering, COLING 2010 |
| Yao (2013) | | 0.631 | 0.748 | Answer Extraction as Sequence Tagging with Tree Edit Distance, NAACL 2013 |
| S&M (2013) | | 0.678 | 0.736 | Automatic Feature Engineering for Answer Selection and Extraction, EMNLP 2013 |
| Backward (Shnarch et al., 2013) | | 0.686 | 0.754 | Probabilistic Models for Lexical Inference, Ph.D. thesis 2013 |
| LCLR (Yih et al., 2013) | | 0.709 | 0.770 | Question Answering Using Enhanced Lexical Semantic Models, ACL 2013 |
| bigram+count (Yu et al., 2014) | | 0.711 | 0.785 | Deep Learning for Answer Sentence Selection, NIPS 2014 |
| BLSTM (W&N et al., 2015) | | 0.713 | 0.791 | A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering, ACL 2015 |
| Architecture-II (Feng et al., 2015) | | 0.711 | 0.800 | Applying Deep Learning to Answer Selection: A Study and an Open Task, ASRU 2015 |
| PairCNN (Severyn et al., 2015) | official | 0.746 | 0.808 | Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks, SIGIR 2015 |
| aNMM (Yang et al., 2016) | official, MatchZoo | 0.750 | 0.811 | aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model, CIKM 2016 |
| HDLA (Tay et al., 2017) | official | 0.750 | 0.815 | Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture, SIGIR 2017 |
| PWIM (He et al., 2016) | official | 0.758 | 0.822 | Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement, NAACL 2016 |
| MP-CNN (He et al., 2015) | official | 0.762 | 0.830 | Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, EMNLP 2015 |
| HyperQA (Tay et al., 2017) | official | 0.770 | 0.825 | Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018 |
| MP-CNN (Rao et al., 2016) | official | 0.780 | 0.834 | Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016 |
| HCAN (Rao et al., 2019) | | 0.774 | 0.843 | Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling, EMNLP 2019 |
| MP-CNN (Tayyar et al., 2018) | | 0.836 | 0.863 | Integrating Question Classification and Deep Learning for Improved Answer Selection, COLING 2018 |
| Pre-Attention (Kamath et al., 2019) | | 0.852 | 0.891 | Predicting and Integrating Expected Answer Types into a Simple Recurrent Neural Network Model for Answer Sentence Selection, CICLing 2019 |
| CETE (Laskar et al., 2020) | | 0.950 | 0.980 | Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task, LREC 2020 |

TREC QA (Clean Version)

| Model | Code | MAP | MRR | Paper |
| --- | --- | --- | --- | --- |
| W&I (2015) | | 0.746 | 0.820 | FAQ-based Question Answering via Word Alignment, arXiv 2015 |
| LSTM (Tan et al., 2015) | official | 0.728 | 0.832 | LSTM-Based Deep Learning Models for Nonfactoid Answer Selection, arXiv 2015 |
| AP-CNN (dos Santos et al., 2016) | | 0.753 | 0.851 | Attentive Pooling Networks, arXiv 2016 |
| L.D.C Model (Wang et al., 2016) | | 0.771 | 0.845 | Sentence Similarity Learning by Lexical Decomposition and Composition, COLING 2016 |
| MP-CNN (He et al., 2015) | official | 0.777 | 0.836 | Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, EMNLP 2015 |
| HyperQA (Tay et al., 2017) | official | 0.784 | 0.865 | Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018 |
| MP-CNN (Rao et al., 2016) | official | 0.801 | 0.877 | Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016 |
| BiMPM (Wang et al., 2017) | official, MatchZoo | 0.802 | 0.875 | Bilateral Multi-Perspective Matching for Natural Language Sentences, arXiv 2017 |
| CA (Bian et al., 2017) | official, MatchZoo | 0.821 | 0.899 | A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection, CIKM 2017 |
| IWAN (Shen et al., 2017) | | 0.822 | 0.889 | Inter-Weighted Alignment Network for Sentence Pair Modeling, EMNLP 2017 |
| sCARNN (Tran et al., 2018) | | 0.829 | 0.875 | The Context-dependent Additive Recurrent Neural Net, NAACL 2018 |
| MCAN (Tay et al., 2018) | | 0.838 | 0.904 | Multi-Cast Attention Networks, KDD 2018 |
| MP-CNN (Tayyar et al., 2018) | | 0.865 | 0.904 | Integrating Question Classification and Deep Learning for Improved Answer Selection, COLING 2018 |
| CA + LM + LC (Yoon et al., 2019) | | 0.868 | 0.928 | A Compare-Aggregate Model with Latent Clustering for Answer Selection, CIKM 2019 |
| GSAMN (Lai et al., 2019) | official | 0.914 | 0.957 | A Gated Self-attention Memory Network for Answer Selection, EMNLP 2019 |
| TANDA (Garg et al., 2019) | official | 0.943 | 0.974 | TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, AAAI 2020 |
| CETE (Laskar et al., 2020) | | 0.936 | 0.978 | Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task, LREC 2020 |

WikiQA

| Model | Code | MAP | MRR | Paper |
| --- | --- | --- | --- | --- |
| ABCNN (Yin et al., 2016) | official | 0.6921 | 0.7108 | ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs, TACL 2016 |
| Multi-Perspective CNN (Rao et al., 2016) | official | 0.701 | 0.718 | Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks, CIKM 2016 |
| HyperQA (Tay et al., 2017) | official | 0.705 | 0.720 | Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks, WSDM 2018 |
| KVMN (Miller et al., 2016) | official | 0.7069 | 0.7265 | Key-Value Memory Networks for Directly Reading Documents, ACL 2016 |
| BiMPM (Wang et al., 2017) | official, MatchZoo | 0.718 | 0.731 | Bilateral Multi-Perspective Matching for Natural Language Sentences, IJCAI 2017 |
| IWAN (Shen et al., 2017) | | 0.733 | 0.750 | Inter-Weighted Alignment Network for Sentence Pair Modeling, EMNLP 2017 |
| CA (Wang and Jiang, 2017) | official | 0.7433 | 0.7545 | A Compare-Aggregate Model for Matching Text Sequences, ICLR 2017 |
| HCRN (Tay et al., 2018c) | MatchZoo | 0.7430 | 0.7560 | Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains, IJCAI 2018 |
| Compare-Aggregate (Bian et al., 2017) | official, MatchZoo | 0.748 | 0.758 | A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection, CIKM 2017 |
| RE2 (Yang et al., 2019) | official, MatchZoo | 0.7452 | 0.7618 | Simple and Effective Text Matching with Richer Alignment Features, ACL 2019 |
| GSAMN (Lai et al., 2019) | official | 0.857 | 0.872 | A Gated Self-attention Memory Network for Answer Selection, EMNLP 2019 |
| TANDA (Garg et al., 2019) | official | 0.920 | 0.933 | TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, AAAI 2020 |

Updated: