Natural Language Inference
Natural Language Inference is the task of determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
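As a concrete illustration, the following is a minimal sketch that scores a premise–hypothesis pair with an off-the-shelf MNLI-trained classifier. The `roberta-large-mnli` checkpoint and the example sentences are assumptions made for illustration, not part of the results tabulated below.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint: a RoBERTa model fine-tuned on MultiNLI
name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the pair jointly; the tokenizer inserts the separator tokens itself
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# id2label maps class ids to CONTRADICTION / NEUTRAL / ENTAILMENT
print(model.config.id2label[logits.argmax(dim=-1).item()])  # ENTAILMENT
```

Swapping the hypothesis to "The men are sleeping." should flip the prediction to CONTRADICTION, while an unrelated claim such as "The players are professionals." should come out NEUTRAL.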
Classic Datasets
- SNLI is short for the Stanford Natural Language Inference corpus, which contains 570k human-annotated sentence pairs. The premises are drawn from the captions of the Flickr30k corpus, and the hypotheses are manually composed.
- MultiNLI is short for the Multi-Genre Natural Language Inference corpus, which contains 433k sentence pairs; its collection process and task details closely follow SNLI. The premises are drawn from a maximally broad range of genres of American English, including non-fiction genres (SLATE, OUP, GOVERNMENT, VERBATIM, TRAVEL), spoken genres (TELEPHONE, FACE-TO-FACE), less formal written genres (FICTION, LETTERS), and a specialized genre about 9/11.
- The SciTail entailment dataset consists of 27k sentence pairs. In contrast to SNLI and MultiNLI, it was not crowd-sourced but created from sentences that already exist “in the wild”: hypotheses were created from science questions and the corresponding answer candidates, while relevant web sentences from a large corpus were used as premises. A sketch of loading all three datasets follows this list.
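For quick experimentation, all three corpora are available via the Hugging Face hub. Below is a minimal loading sketch; the dataset IDs `snli`, `multi_nli`, and `scitail` (with its `snli_format` configuration) are assumptions about the hub naming, not something prescribed by the datasets' authors.

```python
from datasets import load_dataset

snli = load_dataset("snli")                       # 570k pairs; Flickr30k captions as premises
mnli = load_dataset("multi_nli")                  # 433k pairs across ten genres
scitail = load_dataset("scitail", "snli_format")  # 27k pairs from science QA

# SNLI/MultiNLI label ids: 0 = entailment, 1 = neutral, 2 = contradiction
# (-1 marks pairs without annotator consensus and is usually filtered out).
# SciTail has only two labels: entails / neutral.
ex = snli["train"][0]
print(ex["premise"], "=>", ex["hypothesis"], "| label:", ex["label"])
```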
SNLI
| Model | Code | Accuracy | Paper |
| --- | --- | --- | --- |
| Match-LSTM (Wang et al., 2016) |  | 86.1 | Learning Natural Language Inference with LSTM |
| Decomposable (Parikh et al., 2016) | — | 86.3 / 86.8 (intra-sentence attention) | A Decomposable Attention Model for Natural Language Inference |
| BiMPM (Wang et al., 2017) |  | 86.9 | Bilateral Multi-Perspective Matching for Natural Language Sentences |
| Shortcut-Stacked BiLSTM (Nie et al., 2017) |  | 86.1 | Shortcut-Stacked Sentence Encoders for Multi-Domain Inference |
| ESIM (Chen et al., 2017) |  | 88.0 / 88.6 (Tree-LSTM) | Enhanced LSTM for Natural Language Inference |
| DIIN (Gong et al., 2018) |  | 88.0 | Natural Language Inference over Interaction Space |
| SAN (Liu et al., 2018) | — | 88.7 | Stochastic Answer Networks for Natural Language Inference |
| AF-DMN (Duan et al., 2018) | — | 88.6 | Attention-Fused Deep Matching Network for Natural Language Inference |
| MwAN (Tan et al., 2018) | — | 88.3 | Multiway Attention Networks for Modeling Sentence Pairs |
| HBMP (Talman et al., 2018) |  | 86.6 | Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture |
| CAFE (Tay et al., 2018) | — | 88.5 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
| DSA (Yoon et al., 2018) | — | 86.8 | Dynamic Self-Attention: Computing Attention over Words Dynamically for Sentence Embedding |
| Generalized Pooling (Chen et al., 2018) |  | 86.6 | Enhancing Sentence Embedding with Generalized Pooling |
| ReSAN (Shen et al., 2018) |  | 86.3 | Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling |
| DMAN (Pan et al., 2018) | — | 88.8 | Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference |
| DRCN (Kim et al., 2018) | — | 90.1 | Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information |
| RE2 (Yang et al., 2019) |  | 88.9 | Simple and Effective Text Matching with Richer Alignment Features |
| MT-DNN (Liu et al., 2019) |  | 91.1 (base) / 91.6 (large) | Multi-Task Deep Neural Networks for Natural Language Understanding |
MNLI
Matched accuracy is measured on test examples drawn from the same genres as the training data, while mismatched accuracy is measured on genres unseen during training (see the evaluation sketch after the table).
| Model | Code | Matched Accuracy | Mismatched Accuracy | Paper |
| --- | --- | --- | --- | --- |
| ESIM (Chen et al., 2017) |  | 76.8 | 75.8 | Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference |
| Shortcut-Stacked BiLSTM (Nie et al., 2017) |  | 74.6 | 73.6 | Shortcut-Stacked Sentence Encoders for Multi-Domain Inference |
| HBMP (Talman et al., 2018) |  | 73.7 | 73.0 | Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture |
| Generalized Pooling (Chen et al., 2018) |  | 73.8 | 74.0 | Enhancing Sentence Embedding with Generalized Pooling |
| AF-DMN (Duan et al., 2018) | — | 76.9 | 76.3 | Attention-Fused Deep Matching Network for Natural Language Inference |
| DIIN (Gong et al., 2018) |  | 78.8 | 77.8 | Natural Language Inference over Interaction Space |
| SAN (Liu et al., 2018) | — | 79.3 | 78.7 | Stochastic Answer Networks for Natural Language Inference |
| MwAN (Tan et al., 2018) | — | 78.5 | 77.7 | Multiway Attention Networks for Modeling Sentence Pairs |
| CAFE (Tay et al., 2018) | — | 78.7 | 77.9 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference |
| DRCN (Kim et al., 2018) | — | 79.1 | 78.4 | Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information |
| DMAN (Pan et al., 2018) | — | 78.9 | 78.2 | Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference |
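To make the two accuracy columns concrete, here is a minimal evaluation sketch. It assumes the Hugging Face `multi_nli` dataset with its `validation_matched` and `validation_mismatched` splits, and uses a dummy constant classifier as a stand-in for a real model.

```python
from datasets import load_dataset

# MultiNLI ships separate matched (in-genre) and mismatched (out-of-genre)
# validation splits; the test-set labels are hidden behind a leaderboard.
mnli = load_dataset("multi_nli")

def accuracy(split, predict):
    # Fraction of pairs where predict(premise, hypothesis) matches the gold label
    hits = sum(predict(ex["premise"], ex["hypothesis"]) == ex["label"]
               for ex in split)
    return hits / len(split)

# Dummy baseline that always predicts "entailment" (label 0); since the three
# classes are roughly balanced, it should score near 1/3 on both splits.
def always_entail(premise, hypothesis):
    return 0

print("matched:   ", accuracy(mnli["validation_matched"], always_entail))
print("mismatched:", accuracy(mnli["validation_mismatched"], always_entail))
```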
SciTail