awesome-neural-models-for-semantic-match

A curated list of papers dedicated to neural text (semantic) matching.

Ad-hoc Information Retrieval


Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection. Searches can be based on full-text or other content-based indexing. Here, ad-hoc information retrieval refers specifically to text-based retrieval in which the documents in the collection remain relatively static while new queries are continually submitted to the system (cited from the survey).

The number of possible queries is huge. Some benchmark datasets are listed below.

Classic Datasets

| Dataset | Genre | #Queries | #Documents |
| --- | --- | --- | --- |
| Robust04 | news | 250 | 0.5M |
| ClueWeb09-Cat-B | web | 150 | 50M |
| Gov2 | .gov pages | 150 | 25M |
| MS MARCO (Document Ranking) | web pages | 367,013 | 3.2M |
| MQ2007 | .gov pages | 1692 | 25M |
| MQ2008 | .gov pages | 794 | 25M |

Neural Models

Robust04

| Model | Code | MAP | P@20 | nDCG@20 | Paper |
| --- | --- | --- | --- | --- | --- |
| DSSM | MatchZoo | 0.095 | 0.171 | 0.201 | Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM 2013 |
| CDSSM | MatchZoo | 0.067 | 0.125 | 0.146 | Learning Semantic Representations Using Convolutional Neural Networks for Web Search, WWW 2014 |
| ARC-I | MatchZoo | 0.041 | 0.065 | 0.066 | Convolutional Neural Network Architectures for Matching Natural Language Sentences, NIPS 2014 |
| ARC-II | MatchZoo | 0.067 | 0.128 | 0.147 | Convolutional Neural Network Architectures for Matching Natural Language Sentences, NIPS 2014 |
| DRMM | official, MatchZoo | 0.279 | 0.431 | 0.382 | A Deep Relevance Matching Model for Ad-hoc Retrieval, CIKM 2016 |
| KNRM | official, MatchZoo | | 0.352 | 0.409 | End-to-End Neural Ad-hoc Ranking with Kernel Pooling, SIGIR 2017 |
| CONV-KNRM | MatchZoo | | | 0.416 | Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search, WSDM 2018 |
| BERT-MaxP | official | | | 0.469 | Deeper Text Understanding for IR with Contextual Neural Language Modeling, SIGIR 2019 |
| CEDR-DRMM | official | | 0.459 | 0.526 | CEDR: Contextualized Embeddings for Document Ranking, SIGIR 2019 |
| QINM | official | 0.294 | 0.408 | 0.453 | A Quantum Interference Inspired Neural Matching Model for Ad-hoc Retrieval, SIGIR 2020 |
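
For readers less familiar with the metrics reported in these tables, here is a minimal sketch of P@k, average precision (MAP is its mean over queries), and nDCG@k. It assumes each ranking is given as a list of graded relevance labels in ranked order, and it uses the exponential gain `2^rel - 1` for nDCG, which is only one common convention. The papers listed here may rely on different evaluation tools and gain definitions, so this sketch will not necessarily reproduce their numbers. All function names are illustrative.

```python
import math


def precision_at_k(ranked_rels, k):
    """Fraction of the top-k results that are relevant (label > 0)."""
    return sum(1 for r in ranked_rels[:k] if r > 0) / k


def average_precision(ranked_rels, num_relevant=None):
    """Average of P@i over the ranks i that hold a relevant document.

    Assumption: if num_relevant is not given, only the relevant documents
    that appear in the ranking are counted in the denominator.
    """
    if num_relevant is None:
        num_relevant = sum(1 for r in ranked_rels if r > 0)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, r in enumerate(ranked_rels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / num_relevant


def mean_average_precision(rankings):
    """MAP: mean of average precision over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg_at_k(rels, k):
    """Discounted cumulative gain with exponential gain (an assumption)."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:k], start=1))


def ndcg_at_k(ranked_rels, k):
    """DCG@k divided by the DCG@k of the ideal (sorted) ordering."""
    idcg = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / idcg if idcg > 0 else 0.0


# Toy ranking: relevant documents at ranks 1 and 3.
toy = [1, 0, 1, 0, 0]
print(precision_at_k(toy, 2), average_precision(toy), ndcg_at_k(toy, 5))
```

In practice, reported numbers are usually produced with a standard evaluation tool rather than hand-written code, which keeps results comparable across papers.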

ClueWeb09-B

| Model | Code | MAP | P@20 | nDCG@20 | Paper |
| --- | --- | --- | --- | --- | --- |
| DSSM | MatchZoo | 0.054 | 0.185 | 0.132 | Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM 2013 |
| CDSSM | MatchZoo | 0.064 | 0.214 | 0.153 | Learning Semantic Representations Using Convolutional Neural Networks for Web Search, WWW 2014 |
| ARC-I | MatchZoo | 0.024 | 0.089 | 0.073 | Convolutional Neural Network Architectures for Matching Natural Language Sentences, NIPS 2014 |
| ARC-II | MatchZoo | 0.033 | 0.123 | 0.087 | Convolutional Neural Network Architectures for Matching Natural Language Sentences, NIPS 2014 |
| DRMM | official, MatchZoo | 0.133 | 0.365 | 0.258 | A Deep Relevance Matching Model for Ad-hoc Retrieval, CIKM 2016 |
| CONV-KNRM | MatchZoo | | | 0.270 | Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search, WSDM 2018 |
| BERT-MaxP | official | | | 0.289 | Deeper Text Understanding for IR with Contextual Neural Language Modeling, SIGIR 2019 |
| QINM | official | 0.134 | 0.375 | 0.338 | A Quantum Interference Inspired Neural Matching Model for Ad-hoc Retrieval, SIGIR 2020 |

MS MARCO (Document Ranking)

| Model | Code | MRR@10 | nDCG@10 | Recall@10 | Paper |
| --- | --- | --- | --- | --- | --- |
| MatchPyramid | official, MatchZoo | 0.286 | 0.344 | 0.531 | Text Matching as Image Recognition, AAAI 2016 |
| Duet | official, MatchZoo | 0.266 | 0.327 | 0.520 | Learning to Match using Local and Distributed Representations of Text for Web Search, WWW 2017 |
| Co-PACRR | official, MatchZoo | 0.284 | 0.345 | 0.543 | Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval, WSDM 2018 |
| KNRM | official, MatchZoo | 0.261 | 0.323 | 0.519 | End-to-End Neural Ad-hoc Ranking with Kernel Pooling, SIGIR 2017 |
| CONV-KNRM | MatchZoo | 0.283 | 0.345 | 0.542 | Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search, WSDM 2018 |
| BERT | | 0.352 | 0.417 | 0.623 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019 |
| Transformer-Kernel | | 0.316 | 0.380 | 0.586 | Interpretable & Time-Budget-Constrained Contextualization for Re-Ranking, arXiv 2020 |
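
The MS MARCO table reports MRR@10 and Recall@10 rather than MAP and P@20. As a minimal sketch, assuming each run is a ranked list of document ids per query and the judgments are a set of relevant ids per query, the two metrics can be computed as follows. The function names and the `runs`/`qrels` dictionaries are illustrative, not part of any official evaluation script.

```python
def reciprocal_rank_at_k(ranked_ids, relevant_ids, k=10):
    """1 / rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Share of the relevant documents that appear in the top k.

    Assumption: ranked_ids contains no duplicate ids.
    """
    if not relevant_ids:
        return 0.0
    found = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return found / len(relevant_ids)


def mean_over_queries(metric, runs, qrels, k=10):
    """Average a per-query metric over every query in the run."""
    scores = [metric(runs[q], qrels.get(q, set()), k) for q in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: q1 has its relevant document at rank 2,
# q2 retrieves nothing relevant.
runs = {"q1": ["d3", "d1", "d2"], "q2": ["d9", "d8", "d7"]}
qrels = {"q1": {"d1"}, "q2": {"d5"}}

print(mean_over_queries(reciprocal_rank_at_k, runs, qrels))  # MRR@10
print(mean_over_queries(recall_at_k, runs, qrels))           # Recall@10
```

MRR rewards placing the first relevant document early, which suits collections such as MS MARCO where most queries have very few judged relevant documents.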

MQ2007

| Model | Code | MAP | P@10 | nDCG@10 | Paper |
| --- | --- | --- | --- | --- | --- |
| DSSM | MatchZoo | 0.409 | 0.352 | 0.371 | Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM 2013 |
| CDSSM | MatchZoo | 0.364 | 0.291 | 0.325 | Learning Semantic Representations Using Convolutional Neural Networks for Web Search, WWW 2014 |
| ARC-I | MatchZoo | 0.417 | 0.364 | 0.386 | Convolutional Neural Network Architectures for Matching Natural Language Sentences, NIPS 2014 |
| ARC-II | MatchZoo | 0.421 | 0.366 | 0.390 | Convolutional Neural Network Architectures for Matching Natural Language Sentences, NIPS 2014 |
| DRMM | official, MatchZoo | 0.467 | 0.388 | 0.440 | A Deep Relevance Matching Model for Ad-hoc Retrieval, CIKM 2016 |
| MatchPyramid | official, MatchZoo | 0.434 | 0.371 | 0.409 | Text Matching as Image Recognition, AAAI 2016 |
| Duet | official, MatchZoo | 0.474 | 0.398 | 0.453 | Learning to Match using Local and Distributed Representations of Text for Web Search, WWW 2017 |
| DeepRank | official, MatchZoo | 0.497 | 0.412 | 0.482 | DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval, CIKM 2017 |
| HiNT | official, MatchZoo | 0.502 | 0.447 | 0.490 | Modeling Diverse Relevance Patterns in Ad-hoc Retrieval, SIGIR 2018 |

MQ2008

| Model | Code | MAP | P@10 | nDCG@10 | Paper |
| --- | --- | --- | --- | --- | --- |
| DSSM | MatchZoo | 0.391 | 0.221 | 0.178 | Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM 2013 |
| CDSSM | MatchZoo | 0.395 | 0.222 | 0.175 | Learning Semantic Representations Using Convolutional Neural Networks for Web Search, WWW 2014 |
| ARC-I | MatchZoo | 0.424 | 0.311 | 0.187 | Convolutional Neural Network Architectures for Matching Natural Language Sentences, NIPS 2014 |
| ARC-II | MatchZoo | 0.421 | 0.229 | 0.181 | Convolutional Neural Network Architectures for Matching Natural Language Sentences, NIPS 2014 |
| DRMM | official, MatchZoo | 0.473 | 0.245 | 0.220 | A Deep Relevance Matching Model for Ad-hoc Retrieval, CIKM 2016 |
| MatchPyramid | official, MatchZoo | 0.449 | 0.239 | 0.211 | Text Matching as Image Recognition, AAAI 2016 |
| Duet | official, MatchZoo | 0.476 | 0.240 | 0.216 | Learning to Match using Local and Distributed Representations of Text for Web Search, WWW 2017 |
| DeepRank | official, MatchZoo | 0.498 | 0.252 | 0.240 | DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval, CIKM 2017 |
| HiNT | official, MatchZoo | 0.505 | 0.255 | 0.244 | Modeling Diverse Relevance Patterns in Ad-hoc Retrieval, SIGIR 2018 |