Natural Language Inference
Natural Language Inference is the task of determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
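As a concrete illustration, the following is a minimal sketch that scores a premise–hypothesis pair with an off-the-shelf MNLI-trained classifier. The `roberta-large-mnli` checkpoint and the example sentences are assumptions made for illustration, not part of the results tabulated below.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint: a RoBERTa model fine-tuned on MultiNLI
name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the pair jointly; the tokenizer inserts the separator tokens itself
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# id2label maps class ids to CONTRADICTION / NEUTRAL / ENTAILMENT
print(model.config.id2label[logits.argmax(dim=-1).item()])  # ENTAILMENT
```

Swapping the hypothesis to "The men are sleeping." should flip the prediction to CONTRADICTION, while an unrelated claim such as "The players are professionals." should come out NEUTRAL.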
Classic Datasets
- SNLI is short for the Stanford Natural Language Inference corpus, which contains 570k human-annotated sentence pairs. The premises are drawn from the captions of the Flickr30k corpus, and the hypotheses are manually composed.
- MultiNLI is short for the Multi-Genre Natural Language Inference corpus, which contains 433k sentence pairs; its collection process and task details closely follow SNLI. The premises are drawn from a maximally broad range of genres of American English, including non-fiction genres (SLATE, OUP, GOVERNMENT, VERBATIM, TRAVEL), spoken genres (TELEPHONE, FACE-TO-FACE), less formal written genres (FICTION, LETTERS), and a specialized genre about 9/11.
- The SciTail entailment dataset consists of 27k sentence pairs. In contrast to SNLI and MultiNLI, it was not crowd-sourced but created from sentences that already exist “in the wild”: hypotheses were created from science questions and the corresponding answer candidates, while relevant web sentences from a large corpus were used as premises. A sketch of loading all three datasets follows this list.
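For quick experimentation, all three corpora are available via the Hugging Face hub. Below is a minimal loading sketch; the dataset IDs `snli`, `multi_nli`, and `scitail` (with its `snli_format` configuration) are assumptions about the hub naming, not something prescribed by the datasets' authors.

```python
from datasets import load_dataset

snli = load_dataset("snli")                       # 570k pairs; Flickr30k captions as premises
mnli = load_dataset("multi_nli")                  # 433k pairs across ten genres
scitail = load_dataset("scitail", "snli_format")  # 27k pairs from science QA

# SNLI/MultiNLI label ids: 0 = entailment, 1 = neutral, 2 = contradiction
# (-1 marks pairs without annotator consensus and is usually filtered out).
# SciTail has only two labels: entails / neutral.
ex = snli["train"][0]
print(ex["premise"], "=>", ex["hypothesis"], "| label:", ex["label"])
```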
SNLI
| Model | Code | Accuracy | Paper |
| --- | --- | --- | --- |
| Match-LSTM (Wang et al., 2016) |  | 86.1 | Learning Natural Language Inference with LSTM |
| Decomposable (Parikh et al., 2016) | — | 86.3 / 86.8 (intra-sentence attention) | A Decomposable Attention Model for Natural Language Inference |
| BiMPM (Wang et al., 2017) |  | 86.9 | Bilateral Multi-Perspective Matching for Natural Language Sentences |
| Shortcut-Stacked BiLSTM (Nie et al., 2017) |  | 86.1 | Shortcut-Stacked Sentence Encoders for Multi-Domain Inference |
| ESIM (Chen et al., 2017) |  | 88.0 / 88.6 (Tree-LSTM) | Enhanced LSTM for Natural Language Inference |
| DIIN (Gong et al., 2018) |  | 88.0 | Natural Language Inference over Interaction Space |
| SAN (Liu et al., 2018) | — | 88.7 | Stochastic Answer Networks for Natural Language Inference |
| AF-DMN (Duan et al., 2018) | — | 88.6 | Attention-Fused Deep Matching Network for Natural Language Inference |
| MwAN (Tan et al., 2018) | — | 88.3 | Multiway Attention Networks for Modeling Sentence Pairs |
| HBMP (Talman et al., 2018) |  | 86.6 | Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture |
| CAFE (Tay et al., 2018) | — | 88.5 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
| DSA (Yoon et al., 2018) | — | 86.8 | Dynamic Self-Attention: Computing Attention over Words Dynamically for Sentence Embedding |
| Generalized Pooling (Chen et al., 2018) |  | 86.6 | Enhancing Sentence Embedding with Generalized Pooling |
| ReSAN (Shen et al., 2018) |  | 86.3 | Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling |
| DMAN (Pan et al., 2018) | — | 88.8 | Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference |
| DRCN (Kim et al., 2018) | — | 90.1 | Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information |
| RE2 (Yang et al., 2019) |  | 88.9 | Simple and Effective Text Matching with Richer Alignment Features |
| MT-DNN (Liu et al., 2019) |  | 91.1 (base) / 91.6 (large) | Multi-Task Deep Neural Networks for Natural Language Understanding |
MNLI
Matched accuracy is measured on test examples drawn from the same genres as the training data, while mismatched accuracy is measured on genres unseen during training (see the evaluation sketch after the table).
| Model | Code | Matched Accuracy | Mismatched Accuracy | Paper |
| --- | --- | --- | --- | --- |
| ESIM (Chen et al., 2017) |  | 76.8 | 75.8 | Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference |
| Shortcut-Stacked BiLSTM (Nie et al., 2017) |  | 74.6 | 73.6 | Shortcut-Stacked Sentence Encoders for Multi-Domain Inference |
| HBMP (Talman et al., 2018) |  | 73.7 | 73.0 | Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture |
| Generalized Pooling (Chen et al., 2018) |  | 73.8 | 74.0 | Enhancing Sentence Embedding with Generalized Pooling |
| AF-DMN (Duan et al., 2018) | — | 76.9 | 76.3 | Attention-Fused Deep Matching Network for Natural Language Inference |
| DIIN (Gong et al., 2018) |  | 78.8 | 77.8 | Natural Language Inference over Interaction Space |
| SAN (Liu et al., 2018) | — | 79.3 | 78.7 | Stochastic Answer Networks for Natural Language Inference |
| MwAN (Tan et al., 2018) | — | 78.5 | 77.7 | Multiway Attention Networks for Modeling Sentence Pairs |
| CAFE (Tay et al., 2018) | — | 78.7 | 77.9 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
| DRCN (Kim et al., 2018) | — | 79.1 | 78.4 | Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information |
| DMAN (Pan et al., 2018) | — | 78.9 | 78.2 | Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference |
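To make the two accuracy columns concrete, here is a minimal evaluation sketch. It assumes the Hugging Face `multi_nli` dataset with its `validation_matched` and `validation_mismatched` splits, and uses a dummy constant classifier as a stand-in for a real model.

```python
from datasets import load_dataset

# MultiNLI ships separate matched (in-genre) and mismatched (out-of-genre)
# validation splits; the test-set labels are hidden behind a leaderboard.
mnli = load_dataset("multi_nli")

def accuracy(split, predict):
    # Fraction of pairs where predict(premise, hypothesis) matches the gold label
    hits = sum(predict(ex["premise"], ex["hypothesis"]) == ex["label"]
               for ex in split)
    return hits / len(split)

# Dummy baseline that always predicts "entailment" (label 0); since the three
# classes are roughly balanced, it should score near 1/3 on both splits.
def always_entail(premise, hypothesis):
    return 0

print("matched:   ", accuracy(mnli["validation_matched"], always_entail))
print("mismatched:", accuracy(mnli["validation_mismatched"], always_entail))
```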
SciTail