Natural Language Inference

Natural Language Inference is the task of determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
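The three-way labeling described above can be sketched as a small data structure. This is a minimal illustration only: the names `NLILabel` and `NLIExample` are hypothetical, and the sentence pairs are made up for this sketch rather than taken from any of the datasets below.

```python
from dataclasses import dataclass
from enum import Enum


class NLILabel(Enum):
    """The three possible relations between a premise and a hypothesis."""
    ENTAILMENT = "entailment"        # hypothesis is true given the premise
    CONTRADICTION = "contradiction"  # hypothesis is false given the premise
    NEUTRAL = "neutral"              # truth of the hypothesis is undetermined


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with its label (hypothetical container)."""
    premise: str
    hypothesis: str
    label: NLILabel


# Illustrative pairs sharing one premise; not drawn from SNLI, MultiNLI, or SciTail.
PREMISE = "A man is playing a guitar on stage."
examples = [
    NLIExample(PREMISE, "A man is performing music.", NLILabel.ENTAILMENT),
    NLIExample(PREMISE, "The man is asleep at home.", NLILabel.CONTRADICTION),
    NLIExample(PREMISE, "The man is a famous musician.", NLILabel.NEUTRAL),
]

for ex in examples:
    print(f"{ex.label.value:>13}: {ex.hypothesis}")
```

A model for this task takes the premise and hypothesis as input and predicts one of the three labels.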

Classic Datasets

| Dataset | # sentence pairs |
| --- | --- |
| SNLI | 570K |
| MultiNLI | 433K |
| SciTail | 27K |
  • SNLI is short for Stanford Natural Language Inference, which has 570k human-annotated sentence pairs. The premise data is drawn from the captions of the Flickr30k corpus, and the hypothesis data is manually composed.
  • MultiNLI is short for Multi-Genre NLI, which has 433k sentence pairs; its collection process and task details are modeled closely on SNLI. The premise data is collected from a maximally broad range of genres of American English, such as non-fiction genres (SLATE, OUP, GOVERNMENT, VERBATIM, TRAVEL), spoken genres (TELEPHONE, FACE-TO-FACE), less formal written genres (FICTION, LETTERS), and a specialized genre covering 9/11.
  • The SciTail entailment dataset consists of 27k sentence pairs. In contrast to SNLI and MultiNLI, it was not crowd-sourced but created from sentences that already exist “in the wild”. Hypotheses were created from science questions and the corresponding answer candidates, while relevant web sentences from a large corpus were used as premises.

Performance

SNLI

| Model | Code | Accuracy | Paper |
| --- | --- | --- | --- |
| Match-LSTM (Wang et al., 2016) | MatchZoo | 86.1 | Learning Natural Language Inference with LSTM |
| Decomposable (Parikh et al., 2016) | | 86.3/86.8 (intra-sentence attention) | A Decomposable Attention Model for Natural Language Inference |
| BiMPM (Wang et al., 2017) | official, MatchZoo | 86.9 | Bilateral Multi-Perspective Matching for Natural Language Sentences |
| Shortcut-Stacked BiLSTM (Nie et al., 2017) | official | 86.1 | Shortcut-Stacked Sentence Encoders for Multi-Domain Inference |
| ESIM (Chen et al., 2017) | official, MatchZoo | 88.0/88.6 (Tree-LSTM) | Enhanced LSTM for Natural Language Inference |
| DIIN (Gong et al., 2018) | official, MatchZoo | 88.0 | Natural Language Inference over Interaction Space |
| SAN (Liu et al., 2018) | | 88.7 | Stochastic Answer Networks for Natural Language Inference |
| AF-DMN (Duan et al., 2018) | | 88.6 | Attention-Fused Deep Matching Network for Natural Language Inference |
| MwAN (Tan et al., 2018) | | 88.3 | Multiway Attention Networks for Modeling Sentence Pairs |
| HBMP (Talman et al., 2018) | official, MatchZoo | 86.6 | Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture |
| CAFE (Tay et al., 2018) | | 88.5 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
| DSA (Yoon et al., 2018) | | 86.8 | Dynamic Self-Attention: Computing Attention over Words Dynamically for Sentence Embedding |
| Generalized Pooling (Chen et al., 2018) | official | 86.6 | Enhancing Sentence Embedding with Generalized Pooling |
| ReSAN (Shen et al., 2018) | official | 86.3 | Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling |
| DMAN (Pan et al., 2018) | | 88.8 | Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference |
| DRCN (Kim et al., 2018) | | 90.1 | Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information |
| RE2 (Yang et al., 2019) | official, MatchZoo | 88.9 | Simple and Effective Text Matching with Richer Alignment Features |
| MT-DNN (Liu et al., 2019) | official | 91.1 (base)/91.6 (large) | Multi-Task Deep Neural Networks for Natural Language Understanding |

MNLI

| Model | Code | Matched Accuracy | Mismatched Accuracy | Paper |
| --- | --- | --- | --- | --- |
| ESIM (Chen et al., 2017) | official, MatchZoo | 76.8 | 75.8 | Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference |
| Shortcut-Stacked BiLSTM (Nie et al., 2017) | official | 74.6 | 73.6 | Shortcut-Stacked Sentence Encoders for Multi-Domain Inference |
| HBMP (Talman et al., 2018) | official, MatchZoo | 73.7 | 73.0 | Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture |
| Generalized Pooling (Chen et al., 2018) | official | 73.8 | 74.0 | Enhancing Sentence Embedding with Generalized Pooling |
| AF-DMN (Duan et al., 2018) | | 76.9 | 76.3 | Attention-Fused Deep Matching Network for Natural Language Inference |
| DIIN (Gong et al., 2018) | official, MatchZoo | 78.8 | 77.8 | Natural Language Inference over Interaction Space |
| SAN (Liu et al., 2018) | | 79.3 | 78.7 | Stochastic Answer Networks for Natural Language Inference |
| MwAN (Tan et al., 2018) | | 78.5 | 77.7 | Multiway Attention Networks for Modeling Sentence Pairs |
| CAFE (Tay et al., 2018) | | 78.7 | 77.9 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
| DRCN (Kim et al., 2018) | | 79.1 | 78.4 | Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information |
| DMAN (Pan et al., 2018) | | 78.9 | 78.2 | Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference |
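
The accuracy figures in these tables are the percentage of sentence pairs for which a model predicts the correct label. For MNLI the score is reported twice: on a matched set drawn from the same genres as the training data, and on a mismatched set drawn from genres not seen in training. A minimal sketch of that computation follows; the function name `accuracy` and the toy label lists are assumptions made for illustration, not part of any benchmark's official evaluation script.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the percentage of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Toy labels standing in for the two MNLI evaluation sets (illustrative only).
matched_gold = ["entailment", "neutral", "contradiction", "neutral"]
matched_pred = ["entailment", "neutral", "neutral", "neutral"]
mismatched_gold = ["contradiction", "entailment"]
mismatched_pred = ["contradiction", "neutral"]

matched_acc = accuracy(matched_gold, matched_pred)
mismatched_acc = accuracy(mismatched_gold, mismatched_pred)
print(f"matched: {matched_acc:.1f}  mismatched: {mismatched_acc:.1f}")
```

Scoring the two sets separately shows how much a model's performance depends on having seen the genre during training.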

SciTail

| Model | Code | Accuracy | Paper |
| --- | --- | --- | --- |
| SAN (Liu et al., 2018) | | 88.4 | Stochastic Answer Networks for Natural Language Inference |
| HCRN (Tay et al., 2018) | | 80.0 | Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains |
| HBMP (Talman et al., 2018) | official, MatchZoo | 86.0 | Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture |
| CAFE (Tay et al., 2018) | | 83.3 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
| RE2 (Yang et al., 2019) | official, MatchZoo | 86.0 | Simple and Effective Text Matching with Richer Alignment Features |
| MT-DNN (Liu et al., 2019) | official | 94.1 (base)/95.0 (large) | Multi-Task Deep Neural Networks for Natural Language Understanding |
