Github: datasets/car.py

ir_datasets: TREC CAR

Index
  1. car
  2. car/v1.5
  3. car/v1.5/test200
  4. car/v1.5/train/fold0
  5. car/v1.5/train/fold1
  6. car/v1.5/train/fold2
  7. car/v1.5/train/fold3
  8. car/v1.5/train/fold4
  9. car/v1.5/trec-y1
  10. car/v1.5/trec-y1/auto
  11. car/v1.5/trec-y1/manual
  12. car/v2.0

"car"

An ad-hoc passage retrieval collection, constructed from Wikipedia and used as the basis of the TREC Complex Answer Retrieval (CAR) task.


"car/v1.5"

Version 1.5 of the TREC CAR dataset. This version is used for year 1 (2017) of the TREC CAR shared task.

docs
30M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.
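
The exported rows can be read back into Python. The following is a minimal sketch, assuming the export holds one record per line with tab-separated doc_id and text columns. GenericDoc is redefined locally, and parse_docs_tsv and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class GenericDoc(NamedTuple):
    """Local stand-in mirroring the two documented fields."""
    doc_id: str
    text: str


def parse_docs_tsv(lines: Iterable[str]) -> Iterator[GenericDoc]:
    """Yield one GenericDoc per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first tab only, so any later tabs stay in the text.
        doc_id, text = line.split("\t", 1)
        yield GenericDoc(doc_id, text)


# Made-up lines standing in for real export output.
sample = ["p1\tFirst paragraph text.\n", "p2\tSecond paragraph,\twith a tab.\n"]
docs = list(parse_docs_tsv(sample))
print(docs[0].doc_id, "->", docs[0].text)
```

Splitting on the first tab only keeps the parser tolerant of tabs that may survive in the text column.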

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Dietz2017Car}

Bibtex:

@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/test200"

Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.

queries
2.0K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]
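
Because title and headings are exposed separately from text, a query string can be assembled in more than one way. The sketch below uses a locally defined CarQuery with made-up values. The two helper functions are hypothetical and only illustrate the field layout.

```python
from typing import NamedTuple, Tuple


class CarQuery(NamedTuple):
    """Local stand-in mirroring the four documented fields."""
    query_id: str
    text: str
    title: str
    headings: Tuple[str, ...]


def full_path_query(query: CarQuery) -> str:
    """Join the page title with every heading on the path."""
    return " ".join((query.title,) + query.headings)


def leaf_query(query: CarQuery) -> str:
    """Join the page title with only the deepest heading, if there is one."""
    if not query.headings:
        return query.title
    return f"{query.title} {query.headings[-1]}"


# Made-up record; real records come from dataset.queries_iter().
example = CarQuery(
    query_id="example-id",
    text="Coffee Cultivation Ecological effects",
    title="Coffee",
    headings=("Cultivation", "Ecological effects"),
)
print(full_path_query(example))
print(leaf_query(example))
```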

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/test200")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/test200 queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/test200')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.test200.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/test200")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/test200 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/test200')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.test200')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
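
Because each subset inherits its documents from car/v1.5, one index can serve all of them, which is why the retrieval examples point at ./indices/car_v1.5. A small sketch of that mapping follows. corpus_id and index_dir are hypothetical helpers, and treating the first two path components as the corpus name is an assumption read off the dataset IDs listed in the index at the top of this page.

```python
def corpus_id(dataset_id: str) -> str:
    """Return the assumed corpus ID: the first two path components."""
    return "/".join(dataset_id.split("/")[:2])


def index_dir(dataset_id: str, root: str = "./indices") -> str:
    """Build an index directory name shared by all subsets of one corpus."""
    return f"{root}/{corpus_id(dataset_id).replace('/', '_')}"


for subset in ("car/v1.5/test200", "car/v1.5/train/fold0", "car/v1.5/trec-y1/auto"):
    print(subset, "->", index_dir(subset))
```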

qrels
4.7K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                       Count  %
1     Paragraph appears under heading  4.7K   100.0%
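
Since every judgment carries relevance 1, a common first step is to collect the relevant paragraph IDs for each query. This is a minimal sketch with a locally defined TrecQrel and made-up records; relevant_docs_by_query is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class TrecQrel(NamedTuple):
    """Local stand-in mirroring the four documented fields."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def relevant_docs_by_query(
    qrels: Iterable[TrecQrel], min_relevance: int = 1
) -> Dict[str, Set[str]]:
    """Map each query_id to the doc_ids judged at or above min_relevance."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            grouped[qrel.query_id].add(qrel.doc_id)
    return dict(grouped)


# Made-up judgments; real ones come from dataset.qrels_iter().
sample_qrels = [
    TrecQrel("q1", "p1", 1, "0"),
    TrecQrel("q1", "p2", 1, "0"),
    TrecQrel("q2", "p3", 1, "0"),
]
by_query = relevant_docs_by_query(sample_qrels)
print(sorted(by_query["q1"]))
```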

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/test200")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/test200 qrels --format tsv 
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/test200')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.
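
For intuition about the MAP measure requested in such an experiment, average precision over binary judgments can be computed by hand. This is a plain-Python sketch with made-up document IDs, not PyTerrier's implementation; average_precision is a hypothetical helper that assumes the ranking contains no duplicate IDs.

```python
from typing import Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


ranking = ["d1", "d2", "d3", "d4"]  # made-up system output for one query
relevant = {"d1", "d3"}             # made-up judgments for that query
ap = average_precision(ranking, relevant)
print(round(ap, 4))  # 0.8333
```

MAP is then the mean of these per-query values across all queries.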

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.car.v1.5.test200.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Nanni2017BenchmarkCar,Dietz2017Car}

Bibtex:

@inproceedings{Nanni2017BenchmarkCar,
  title={Benchmark for complex answer retrieval},
  author={Nanni, Federico and Mitra, Bhaskar and Magnusson, Matt and Dietz, Laura},
  booktitle={ICTIR},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/train/fold0"

Fold 0 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
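
To make the relevance assumption concrete, here is a sketch that derives judgments from a toy page outline. The page, its paragraph IDs, and the page/heading query ID format are all made up for illustration; this is not the tooling used to build the official files.

```python
from typing import Dict, List, NamedTuple, Tuple


class TrecQrel(NamedTuple):
    """Local stand-in mirroring the documented qrel fields."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def qrels_from_outline(
    page_id: str, sections: Dict[Tuple[str, ...], List[str]]
) -> List[TrecQrel]:
    """Mark every paragraph as relevant (1) to the heading path it sits under."""
    qrels = []
    for heading_path, paragraph_ids in sections.items():
        # Hypothetical query ID: page ID followed by the heading path.
        query_id = "/".join((page_id,) + heading_path)
        for paragraph_id in paragraph_ids:
            qrels.append(TrecQrel(query_id, paragraph_id, 1, "0"))
    return qrels


# Toy outline: heading path -> paragraphs directly under that heading.
toy_sections = {
    ("History",): ["p1", "p2"],
    ("History", "Early years"): ["p3"],
}
toy_qrels = qrels_from_outline("Example_page", toy_sections)
for qrel in toy_qrels:
    print(qrel)
```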

queries
468K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold0")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold0 queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold0')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.train.fold0.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold0")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold0 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold0')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.train.fold0')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
1.1M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                       Count  %
1     Paragraph appears under heading  1.1M   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold0")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold0 qrels --format tsv 
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold0')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.car.v1.5.train.fold0.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Dietz2017TrecCar,Dietz2017Car}

Bibtex:

@inproceedings{Dietz2017TrecCar,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/train/fold1"

Fold 1 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).

queries
467K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold1 queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold1')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.train.fold1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold1 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold1')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.train.fold1')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
1.1M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                       Count  %
1     Paragraph appears under heading  1.1M   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold1 qrels --format tsv 
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold1')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.car.v1.5.train.fold1.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Dietz2017TrecCar,Dietz2017Car}

Bibtex:

@inproceedings{Dietz2017TrecCar,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/train/fold2"

Fold 2 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).

queries
469K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold2 queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold2')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.train.fold2.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold2 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold2')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.train.fold2')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
1.1M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                       Count  %
1     Paragraph appears under heading  1.1M   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold2 qrels --format tsv 
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold2')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.car.v1.5.train.fold2.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Dietz2017TrecCar,Dietz2017Car}

Bibtex:

@inproceedings{Dietz2017TrecCar,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/train/fold3"

Fold 3 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).

queries
463K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold3 queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold3')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.train.fold3.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold3 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold3')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.train.fold3')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
1.0M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                       Count  %
1     Paragraph appears under heading  1.0M   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold3 qrels --format tsv 
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold3')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.car.v1.5.train.fold3.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Dietz2017TrecCar,Dietz2017Car}

Bibtex:

@inproceedings{Dietz2017TrecCar,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/train/fold4"

Fold 4 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
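
With all five folds described, a leave-one-out split over them can be sketched as below. Using the folds for cross-validation is an assumption about how they might be used, not something this page prescribes; leave_one_fold_out is a hypothetical helper that only manipulates dataset IDs.

```python
from typing import Iterator, List, Sequence, Tuple

# The five fold IDs documented on this page.
FOLD_IDS = [f"car/v1.5/train/fold{i}" for i in range(5)]


def leave_one_fold_out(
    fold_ids: Sequence[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training fold IDs, held-out fold ID) for each fold in turn."""
    for held_out in fold_ids:
        training = [fold for fold in fold_ids if fold != held_out]
        yield training, held_out


splits = list(leave_one_fold_out(FOLD_IDS))
for training_ids, held_out_id in splits:
    print(held_out_id, "<-", len(training_ids), "training folds")
```

Each ID in a split can then be passed to ir_datasets.load to obtain that fold's queries and qrels.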

queries
469K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold4 queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold4')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.train.fold4.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold4")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold4 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold4')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.train.fold4')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
1.1M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                       Count  %
1     Paragraph appears under heading  1.1M   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold4")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/train/fold4 qrels --format tsv 
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/train/fold4')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.car.v1.5.train.fold4.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Dietz2017TrecCar,Dietz2017Car}

Bibtex:

@inproceedings{Dietz2017TrecCar,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/trec-y1"

Official test set of TREC CAR 2017 (year 1).

queries
2.3K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/trec-y1 queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/trec-y1')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.trec-y1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/trec-y1 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/trec-y1')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.trec-y1')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Dietz2017TrecCar,Dietz2017Car}

Bibtex:

@inproceedings{Dietz2017TrecCar,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/trec-y1/auto"

Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments, which are inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).

queries
2.3K queries

Inherits queries from car/v1.5/trec-y1

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/auto")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/trec-y1/auto queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/trec-y1/auto')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.trec-y1.auto.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/auto")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/trec-y1/auto docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/trec-y1/auto')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.trec-y1.auto')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
5.8K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str
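
Evaluation code often wants judgments grouped per query rather than as a flat stream of tuples. The following is a minimal sketch of that regrouping, using a local stand-in type (`QrelLike`) with the same four fields and a few invented rows; it is not part of the library.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class QrelLike(NamedTuple):
    """Local stand-in with the same four fields as TrecQrel."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def group_by_query(qrels: Iterable[QrelLike]) -> Dict[str, Dict[str, int]]:
    """Build a {query_id: {doc_id: relevance}} lookup from flat qrels."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Invented rows for illustration.
sample = [
    QrelLike("q1", "d1", 1, "0"),
    QrelLike("q1", "d2", 1, "0"),
    QrelLike("q2", "d3", 1, "0"),
]

lookup = group_by_query(sample)
print(lookup["q1"])
```

With ir_datasets installed, the tuples yielded by `dataset.qrels_iter()` expose the same field names and could be passed in place of `sample`.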

Relevance levels

Rel.    Definition                         Count    %
1       Paragraph appears under heading    5.8K     100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/auto")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/trec-y1/auto qrels --format tsv 
[query_id]    [doc_id]    [relevance]    [iteration]
...
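
If the exported judgments are saved to a file, they can be read back with the standard library. This sketch assumes the export is tab-separated with the four columns shown above and no header row; the rows below are invented, and `QrelRow` is a local stand-in type rather than anything provided by the library.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class QrelRow(NamedTuple):
    """Local stand-in for one exported judgment line."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def read_qrels_tsv(handle: TextIO) -> List[QrelRow]:
    """Parse tab-separated qrels lines, converting relevance to int."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        QrelRow(query_id, doc_id, int(relevance), iteration)
        for query_id, doc_id, relevance, iteration in reader
    ]


# Invented rows standing in for the contents of an exported file.
sample = io.StringIO("q1\td1\t1\t0\nq1\td2\t1\t0\nq2\td3\t1\t0\n")
rows = read_qrels_tsv(sample)
print(rows[0])
```

To read a real export, an opened file could be passed instead of the `io.StringIO` object, for example `open("qrels.tsv", newline="")`, where the file name is hypothetical.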

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/trec-y1/auto')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.car.v1.5.trec-y1.auto.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Dietz2017TrecCar,Dietz2017Car}

Bibtex:

@inproceedings{Dietz2017TrecCar,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}
Metadata

"car/v1.5/trec-y1/manual"

Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.

queries
2.3K queries

Inherits queries from car/v1.5/trec-y1

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/manual")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/trec-y1/manual queries 
[query_id]    [text]    [title]    [headings]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/trec-y1/manual')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.car.v1.5.trec-y1.manual.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
30M docs

Inherits docs from car/v1.5

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/manual")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/trec-y1/manual docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/trec-y1/manual')
# Index car/v1.5
indexer = pt.IterDictIndexer('./indices/car_v1.5', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v1.5.trec-y1.manual')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
30K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition                            Count    %
-2      Trash                                 42       0.1%
-1      NO, non-relevant                      13K      43.2%
0       Non-relevant, but roughly on TOPIC    9.2K     31.2%
1       CAN be mentioned                      3.1K     10.5%
2       SHOULD be mentioned                   2.0K     6.7%
3       MUST be mentioned                     2.5K     8.3%
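
Measures that expect binary relevance need these graded levels collapsed to relevant or not relevant. The sketch below shows one way to do that with a configurable cutoff. The default `min_rel=1` (treating "CAN be mentioned" and above as relevant) is an assumption, not something the dataset prescribes, and the judged pairs are invented.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Invented (doc_id, relevance) pairs, one per level in the table above.
judged = [("d1", 3), ("d2", 2), ("d3", 1), ("d4", 0), ("d5", -1), ("d6", -2)]


def binarize(pairs: Iterable[Tuple[str, int]], min_rel: int = 1) -> Dict[str, int]:
    """Map graded labels to 1 (relevant) or 0, given a minimum relevant level."""
    return {doc_id: int(relevance >= min_rel) for doc_id, relevance in pairs}


lenient = binarize(judged)            # levels 1, 2, and 3 count as relevant
strict = binarize(judged, min_rel=3)  # only "MUST be mentioned" counts

print(lenient)
print(Counter(lenient.values()))
print(sum(strict.values()), "document(s) relevant under the strict cutoff")
```

Note that the negative levels (-1 and -2) fall below any non-negative cutoff, so they are treated the same as level 0 here.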

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/manual")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export car/v1.5/trec-y1/manual qrels --format tsv 
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:car/v1.5/trec-y1/manual')
index_ref = pt.IndexRef.of('./indices/car_v1.5') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.car.v1.5.trec-y1.manual.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Dietz2017TrecCar,Dietz2017Car}

Bibtex:

@inproceedings{Dietz2017TrecCar,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}
@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}
Metadata

"car/v2.0"

Version 2.0 of the TREC CAR dataset.

docs
30M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v2.0")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export car/v2.0 docs 
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:car/v2.0')
# Index car/v2.0
indexer = pt.IterDictIndexer('./indices/car_v2.0', meta={"docno": 40})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.car.v2.0')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Dietz2017Car}

Bibtex:

@article{Dietz2017Car,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}
Metadata