Releases: allenai/ir_datasets
 v0.5.1
What's Changed
- [MINOR FIX / TYPO] Update trec-robust04.yaml by @cakiki in #137
 - .z compression support for robust04 by @seanmacavaney in #139
 - moving msmarco-passage scoreddocs around by @seanmacavaney in #142
 - mmarco updates (files hosted elsewhere & new version of some sources) by @seanmacavaney in #145
 - new data available for mmarco (scoreddocs, docpairs, and dev/small) by @seanmacavaney in #146
 - added tripclick/train/hofstaetter-triples by @seanmacavaney in #147
 - additional versions of msmarco-passage triples by @seanmacavaney in #149
 - mMARCO v2 by @seanmacavaney in #150
 - Anchor Text for msmarco-document and msmarco-document-v2 by @seanmacavaney in #155
 - mmarco source files renamed by @seanmacavaney in #153
 - TREC CAsT 2019, 2020 by @seanmacavaney in #156
 - HC4 by @eugene-yang in #158
 - LoTTE dataset by @seanmacavaney in #159
 - kilt by @seanmacavaney in #161
 - some trec 2021 qrels released by @seanmacavaney in #162
 - some trec 2021 qrels released by @seanmacavaney in #171
 - CODEC by @seanmacavaney in #172
 - improved HTML/XML parser, TREC 7 and 8 by @seanmacavaney in #173
 - fixed and tested issue affecting some clueweb lookups by @seanmacavaney in #174
 - cache hc4 topics/qrels by @seanmacavaney in #176
 - wikiclir by @seanmacavaney in #178
 - NeuCLIR Collection 1 (documents and HC4-filtered subset) by @eugene-yang in #179
 - neuMARCO by @seanmacavaney in #181
 
New Contributors
- @cakiki made their first contribution in #137
 - @eugene-yang made their first contribution in #158
 
Full Changelog: v0.5.0...v0.5.1
v0.5.0
New Features:
- Metadata is included for all datasets, including record counts, without needing to download or process the data.
 - New entity type (qlogs) for query log records
 
New datasets:
- argsme & touche (thanks @heinrichreimer!)
 - aol-ia dataset
 - tripclick logs
 - trec-dl-2021 qrels (active participants only for now)
 
Miscellaneous:
- No longer updates the root logger instance, allowing other applications to easily customise logging output from this package
 - Updates to documentation
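
Because the root logger is left alone, an application can route this package's log records on its own terms. The following is a minimal sketch using only the standard `logging` module; it assumes the package logs under the name `ir_datasets`, and the helper `route_package_logs` is hypothetical, not part of the package.

```python
import io
import logging

# Assumption: the package emits its records through a logger with this name.
PACKAGE_LOGGER_NAME = "ir_datasets"


def route_package_logs(stream, level=logging.WARNING):
    """Attach a dedicated handler to the package logger (hypothetical helper).

    The root logger is never modified; only the named logger is configured.
    """
    logger = logging.getLogger(PACKAGE_LOGGER_NAME)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    # Keep these records out of whatever handlers the application put on root.
    logger.propagate = False
    return logger


# Usage: capture package output in a buffer instead of the console.
buffer = io.StringIO()
pkg_logger = route_package_logs(buffer)
pkg_logger.info("suppressed at WARNING level")
pkg_logger.warning("download is slow")
captured = buffer.getvalue()
```

Setting `propagate = False` is a design choice for this sketch: it keeps the package's messages separate from the application's own log stream.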
 
v0.4.3
Added:
- trec-fair-2021/eval topics
 - clinicaltrials/2021/trec-ct-2021
 - c4 and c4/en-noclean-tr/trec-misinfo-2021
 - wikir/en78k and wikir/ens78k
 - msmarco-passage-v2/trec-dl-2021 and msmarco-document-v2/trec-dl-2021
 - mr-tydi
 - mmarco
 
Misc:
- some minor changes to clean command
 - msmarco-passage-v2 lookups now performed by ID instead of lz4
 - file linking info not shown when downloading small files
 - fixed cord19/fulltext
 - other minor fixes
 
v0.4.2
Adds the following datasets:
- MS MARCO Passage version 2
 - TREC Fair Ranking 2021
 
A few other minor improvements:
- Progress bars: units + totals in a few more places
 - Checks for adequate disk space before big downloads (can be disabled with an environment variable)
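
A pre-download disk space check with an environment-variable opt-out could look roughly like the sketch below. It is an illustration, not the package's implementation: the variable name `EXAMPLE_SKIP_DISK_CHECK` and the function `has_enough_space` are hypothetical, and the release notes do not name the actual variable.

```python
import os
import shutil

# Hypothetical variable name, used for illustration only.
SKIP_ENV_VAR = "EXAMPLE_SKIP_DISK_CHECK"


def has_enough_space(path, required_bytes, environ=None):
    """Return True if `path` has room for `required_bytes`, or the check is disabled."""
    environ = os.environ if environ is None else environ
    # An opt-out lets users proceed when the free-space estimate is misleading.
    if environ.get(SKIP_ENV_VAR, "").lower() in ("1", "true", "yes"):
        return True
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


# Usage: an absurdly large request fails the check unless the opt-out is set.
ok_huge = has_enough_space(".", 10**30, environ={})
ok_skipped = has_enough_space(".", 10**30, environ={SKIP_ENV_VAR: "1"})
```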
 
v0.4.1
- Adds version 2 of the MS MARCO document collection.
 - Uses mirror.ir-datasets.com as a fallback for some small files
 - More examples in the documentation (the python API is now joined by the CLI and a PyTerrier example)
 - Improved bibtex, including a master bib file that can be imported into papers (e.g., in Overleaf).
 - Other minor improvements
 
v0.4.0
New datasets:
- BEIR suite
 - Cranfield
 - CLIRMatrix
 - DPR-W100
 - NQ
 - TREC DL Hard
 - TREC News
 - TripClick
 
Other:
- Download dashboard
 - Improved documentation for non-downloadable datasets
 - A beta "more pythonic API"
 - Speeding up library load time
 - Minor bug fixes, improvements, etc.
 
v0.3.3
dataset migration bugfix
v0.3.2
v0.3.2 version bump
v0.3.1
bump version for release
v0.3.0
slight updates to documentation code, bump version, rebuild docs