Response retrieval
Response retrieval/selection aims to rank/select a proper response from a dialog repository.
Automatic conversation (AC) aims to build an automatic human-computer dialog process for question answering, task completion, and social chat (i.e., chit-chat). In general, AC can be formulated either as an IR problem, which ranks or selects a proper response from a dialog repository, or as a generation problem, which generates an appropriate response to the input utterance. Here, we refer to the IR-based approach to AC as response retrieval.
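To make the IR formulation concrete, here is a minimal sketch of ranking stored responses for an input utterance. It is not one of the models listed on this page: the bag-of-words cosine scorer, the `rank_responses` function, and the three-entry `REPOSITORY` are all illustrative assumptions standing in for a learned matching model and a real dialog repository.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector stored as a Counter."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_responses(utterance, repository):
    """Score each stored (context, response) pair against the utterance.

    Returns (score, response) tuples sorted from best to worst match.
    """
    query = bow(utterance)
    scored = [(cosine(query, bow(context)), response)
              for context, response in repository]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical toy repository of (context, response) pairs.
REPOSITORY = [
    ("how do i install a package on ubuntu",
     "Try sudo apt install followed by the package name."),
    ("my wifi keeps disconnecting",
     "Check whether power management is enabled for the adapter."),
    ("how do i update the system",
     "Run sudo apt update and then sudo apt upgrade."),
]

ranked = rank_responses("how can i install a new package", REPOSITORY)
best_score, best_response = ranked[0]
print(best_response)
```

The neural models in the leaderboards below replace the lexical `cosine` scorer with a learned context-response matching function, but the surrounding rank-then-select loop has the same shape.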
Classic Datasets

| Dataset | Partition | #Context-response pairs | #Candidates per context | Positive:Negative | Avg #turns per context |
| --- | --- | --- | --- | --- | --- |
| UDC | train/validation/test | 1M/500k/500k | 2/10/10 | 1:1/1:9/1:9 | 10.13/10.11/10.11 |
| Douban | train/validation/test | 1M/50k/10k | 2/2/10 | 1:1/1:1/1.18:8.82 | 6.69/6.75/6.45 |
| MSDialog | train/validation/test | 173k/37k/35k | 10/10/10 | 1:9/1:9/1:9 | 5.0/4.9/4.4 |
| EDC | train/validation/test | 1M/10k/10k | 2/2/10 | 1:1/1:1/1:9 | 5.51/5.48/5.64 |
| Persona-Chat dataset | train/validation/test | 8939/1000/968 | 20/20/20 | 1:19/1:19/1:19 | 7.35/7.80/7.76 |
| CMUDoG dataset | train/validation/test | 2881/196/537 | 20/20/20 | 1:19/1:19/1:19 | 12.55/12.37/12.36 |

- Ubuntu Dialog Corpus (UDC) contains multi-turn dialogues collected from chat logs of the Ubuntu Forum. The dataset consists of 1 million context-response pairs for training, 0.5 million pairs for validation, and 0.5 million pairs for testing. Positive responses are true responses from humans, and negative ones are randomly sampled. The ratio of positives to negatives is 1:1 in training and 1:9 in validation and testing.
- Douban Conversation Corpus is an open-domain dataset constructed from Douban group (a popular social networking service in China). The dataset consists of 1 million context-response pairs for training, 50k pairs for validation, and 10k pairs for testing, corresponding to 2, 2, and 10 response candidates per context respectively. Response candidates on the test set, retrieved from Sina Weibo (the largest microblogging service in China), are labeled by human judges.
- MSDialog is a labeled dialog dataset of question answering (QA) interactions between information seekers and answer providers from an online forum on Microsoft products (Microsoft Community). The dataset contains more than 2,000 multi-turn information-seeking conversations with 10,000 utterances that are annotated with user intent on the utterance level.
- E-commerce Dialogue Corpus (EDC) contains over 5 types of conversations (e.g., commodity consultation, logistics express, recommendation, negotiation, and chitchat) based on over 20 commodities. The ratio of positives to negatives is 1:1 in training and validation, and 1:9 in testing.
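Several of the corpora above pair each true human response with randomly sampled negatives, for example 1:1 in training and 1:9 at evaluation time. The sketch below shows one way such a candidate set could be assembled. The function name `build_candidate_set`, the synthetic `responses` pool, and the shuffling step are assumptions made for illustration, not the datasets' actual construction scripts.

```python
import random


def build_candidate_set(responses, positive_index, num_candidates, rng):
    """Return (candidates, labels) with one true response and random negatives.

    responses: pool of all responses in the corpus (hypothetical here).
    positive_index: position of the true response for this context.
    num_candidates: total candidates, e.g. 2 for a 1:1 ratio, 10 for 1:9.
    """
    negative_pool = [i for i in range(len(responses)) if i != positive_index]
    negatives = rng.sample(negative_pool, num_candidates - 1)

    candidates = [responses[positive_index]] + [responses[i] for i in negatives]
    labels = [1] + [0] * (num_candidates - 1)

    # Shuffle so the positive does not always sit in the first slot.
    order = list(range(num_candidates))
    rng.shuffle(order)
    return [candidates[i] for i in order], [labels[i] for i in order]


# Synthetic response pool, used only to exercise the function.
responses = [f"response {i}" for i in range(50)]
rng = random.Random(0)

train_candidates, train_labels = build_candidate_set(responses, 3, 2, rng)
test_candidates, test_labels = build_candidate_set(responses, 3, 10, rng)
```

Sampling by index excludes only the true response itself; a real pipeline would probably also need to guard against negatives whose text happens to match the positive.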
$R_n@k$: recall at position $k$ in $n$ candidates.
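The leaderboards below report $R_n@k$ together with MAP, MRR, and P@1. The following sketch computes them from per-context model scores using the standard IR definitions. Two points are assumptions rather than something this page specifies: recall is normalised by the number of positives in the context (which matters for Douban, where a context can have more than one), and a context with no positive contributes 0 instead of being dropped, which some papers do differently.

```python
def rank_labels(scores, labels):
    """Reorder relevance labels (1 = positive, 0 = negative) by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def recall_at_k(ranked_labels, k):
    """R_n@k for one context: share of its positives ranked in the top k of n."""
    positives = sum(ranked_labels)
    return sum(ranked_labels[:k]) / positives if positives else 0.0


def precision_at_1(ranked_labels):
    """P@1: 1.0 if the top-ranked candidate is a positive, else 0.0."""
    return float(ranked_labels[0])


def reciprocal_rank(ranked_labels):
    """1 / rank of the first positive, or 0.0 if there is none."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_labels):
    """Mean of precision values taken at the rank of each positive."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def evaluate(contexts, k_values=(1, 2, 5)):
    """Average each metric over (scores, labels) pairs, one pair per context."""
    ranked = [rank_labels(scores, labels) for scores, labels in contexts]
    n = len(ranked)
    report = {
        "MAP": sum(average_precision(r) for r in ranked) / n,
        "MRR": sum(reciprocal_rank(r) for r in ranked) / n,
        "P@1": sum(precision_at_1(r) for r in ranked) / n,
    }
    for k in k_values:
        report[f"R@{k}"] = sum(recall_at_k(r, k) for r in ranked) / n
    return report


# Two toy contexts with 10 candidates each (made-up scores, not model output).
contexts = [
    ([0.9, 0.1, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35, 0.45],
     [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
    ([0.2, 0.8, 0.6, 0.1, 0.3, 0.05, 0.15, 0.25, 0.35, 0.45],
     [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]),
]
report = evaluate(contexts)
print(report)
```

In this toy input the positive is ranked first in one context and second in the other, so `R@1` and `P@1` average to 0.5 while `R@2` reaches 1.0.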
Ubuntu Corpus

| Model | Code | MAP | $R_2@1$ | $R_{10}@1$ | $R_{10}@2$ | $R_{10}@5$ | Paper | type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Multi-View (Zhou et al. 2016) | N/A | - | 0.908 | 0.662 | 0.801 | 0.951 | Multi-view Response Selection for Human-Computer Conversation, ACL 2016 | multi-turn |
| DL2R (Yan, Song and Wu 2016) | N/A | - | 0.899 | 0.626 | 0.783 | 0.944 | Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System, SIGIR 2016 | multi-turn |
| SMN (Wu et al. 2017) | | 0.7327 | 0.927 | 0.726 | 0.847 | 0.962 | Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots, ACL 2017 | multi-turn |
| DAM (Zhou et al. 2018) | | - | 0.938 | 0.767 | 0.874 | 0.969 | Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network, ACL 2018 | multi-turn |
| DUA (Zhang et al. 2018) | | - | - | 0.752 | 0.868 | 0.962 | Modeling Multi-turn Conversation with Deep Utterance Aggregation, arXiv 2018 | multi-turn |
| DMN (Yang et al. 2018) | | 0.7719 | - | - | - | - | Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems, arXiv 2018 | multi-turn |
| U2U-IMN (Gu et al. 2019a) | | 0.866 | 0.945 | 0.790 | 0.886 | 0.973 | Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019 | multi-turn |
| TripleNet (Ma et al. 2019) | | - | 0.943 | 0.79 | 0.885 | 0.97 | TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots, arXiv 2019 | multi-turn |
| IMN (Gu et al. 2019b) | | - | 0.946 | 0.794 | 0.889 | 0.974 | Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019 | multi-turn |
| IOI-local (Tao et al. 2019) | | - | 0.947 | 0.796 | 0.894 | 0.974 | One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues, ACL 2019 | multi-turn |
| MSN (Yuan et al. 2019) | | - | - | 0.8 | 0.899 | 0.978 | Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots, ACL 2019 | multi-turn |
| SA-BERT (Gu et al. 2020) | | - | 0.965 | 0.855 | 0.928 | 0.983 | Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2020 | multi-turn |
| RoBERTaBASE-SS-DA (Lu et al. 2020) | | - | 0.955 | 0.826 | 0.909 | 0.978 | Improving Contextual Language Models for Response Retrieval in Multi-Turn Conversation, SIGIR 2020 | multi-turn |
| SMN + ECMo (Tao et al. 2020) | N/A | - | 0.934 | 0.756 | 0.867 | 0.966 | Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection, SIGIR 2020 | multi-turn |

Douban Conversation Corpus

| Model | Code | MAP | MRR | P@1 | $R_{10}@1$ | $R_{10}@2$ | $R_{10}@5$ | Paper | type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Multi-View (Zhou et al. 2016) | N/A | 0.505 | 0.543 | 0.342 | 0.202 | 0.350 | 0.729 | Multi-view Response Selection for Human-Computer Conversation, ACL 2016 | multi-turn |
| DL2R (Yan, Song and Wu 2016) | N/A | 0.488 | 0.527 | 0.33 | 0.193 | 0.342 | 0.705 | Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System, SIGIR 2016 | multi-turn |
| SMN (Wu et al. 2017) | | 0.529 | 0.572 | 0.397 | 0.236 | 0.396 | 0.734 | Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots, ACL 2017 | multi-turn |
| DAM (Zhou et al. 2018) | | 0.55 | 0.601 | 0.427 | 0.254 | 0.410 | 0.757 | Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network, ACL 2018 | multi-turn |
| DUA (Zhang et al. 2018) | | 0.551 | 0.599 | 0.421 | 0.243 | 0.421 | 0.780 | Modeling Multi-turn Conversation with Deep Utterance Aggregation, arXiv 2018 | multi-turn |
| U2U-IMN (Gu et al. 2019a) | | 0.564 | 0.611 | 0.429 | 0.259 | 0.43 | 0.791 | Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019 | multi-turn |
| TripleNet (Ma et al. 2019) | | 0.564 | 0.618 | 0.447 | 0.268 | 0.426 | 0.778 | TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots, arXiv 2019 | multi-turn |
| IMN (Gu et al. 2019b) | | 0.570 | 0.615 | 0.433 | 0.262 | 0.452 | 0.789 | Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019 | multi-turn |
| IOI-local (Tao et al. 2019) | | 0.573 | 0.621 | 0.444 | 0.269 | 0.451 | 0.786 | One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues, ACL 2019 | multi-turn |
| MSN (Yuan et al. 2019) | | 0.587 | 0.632 | 0.470 | 0.295 | 0.452 | 0.788 | Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots, ACL 2019 | multi-turn |
| SA-BERT (Gu et al. 2020) | | 0.619 | 0.659 | 0.496 | 0.313 | 0.481 | 0.847 | Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2020 | multi-turn |
| RoBERTaBASE-SS-DA (Lu et al. 2020) | | 0.602 | 0.646 | 0.460 | 0.280 | 0.495 | 0.847 | Improving Contextual Language Models for Response Retrieval in Multi-Turn Conversation, SIGIR 2020 | multi-turn |
| SMN + ECMo (Tao et al. 2020) | N/A | 0.549 | 0.593 | 0.409 | 0.247 | 0.416 | 0.774 | Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection, SIGIR 2020 | multi-turn |

MSDialog
E-commerce Corpus

| Model | Code | MAP | $R_{10}@1$ | $R_{10}@2$ | $R_{10}@5$ | Paper | type |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Multi-View (Zhou et al. 2016) | N/A | - | 0.421 | 0.601 | 0.861 | Multi-view Response Selection for Human-Computer Conversation, ACL 2016 | multi-turn |
| DL2R (Yan, Song and Wu 2016) | N/A | - | 0.399 | 0.571 | 0.842 | Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System, SIGIR 2016 | multi-turn |
| SMN (Wu et al. 2017) | | - | 0.453 | 0.654 | 0.886 | Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots, ACL 2017 | multi-turn |
| DAM (Zhou et al. 2018) | | - | 0.526 | 0.727 | 0.933 | Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network, ACL 2018 | multi-turn |
| DUA (Zhang et al. 2018) | | - | 0.501 | 0.700 | 0.921 | Modeling Multi-turn Conversation with Deep Utterance Aggregation, arXiv 2018 | multi-turn |
| U2U-IMN (Gu et al. 2019a) | | 0.759 | 0.616 | 0.806 | 0.966 | Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019 | multi-turn |
| IMN (Gu et al. 2019b) | | - | 0.621 | 0.797 | 0.964 | Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2019 | multi-turn |
| IOI-local (Tao et al. 2019) | | - | 0.563 | 0.768 | 0.950 | One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues, ACL 2019 | multi-turn |
| MSN (Yuan et al. 2019) | | - | 0.606 | 0.770 | 0.937 | Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots, ACL 2019 | multi-turn |
| SA-BERT (Gu et al. 2020) | | - | 0.704 | 0.879 | 0.985 | Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots, arXiv 2020 | multi-turn |
| RoBERTaBASE-SS-DA (Lu et al. 2020) | | - | 0.800 | 0.910 | 0.972 | Improving Contextual Language Models for Response Retrieval in Multi-Turn Conversation, SIGIR 2020 | multi-turn |

Persona-Chat dataset
Original Persona

| Model | Code | $R_{20}@1$ | $R_{20}@2$ | $R_{20}@5$ | Paper | type |
| --- | --- | --- | --- | --- | --- | --- |
| RSM-DCK (Hua et al. 2020) | N/A | 0.7965 | 0.9021 | 0.9747 | Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems, CIKM 2020 | multi-turn |

Revised Persona

| Model | Code | $R_{20}@1$ | $R_{20}@2$ | $R_{20}@5$ | Paper | type |
| --- | --- | --- | --- | --- | --- | --- |
| RSM-DCK (Hua et al. 2020) | N/A | 0.7185 | 0.8494 | 0.9550 | Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems, CIKM 2020 | multi-turn |

CMUDoG dataset

| Model | Code | $R_{20}@1$ | $R_{20}@2$ | $R_{20}@5$ | Paper | type |
| --- | --- | --- | --- | --- | --- | --- |
| RSM-DCK (Hua et al. 2020) | N/A | 0.7925 | 0.8884 | 0.9666 | Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems, CIKM 2020 | multi-turn |