QuALITY

Question Answering with Long Input Texts, Yes!

QuALITY is a multiple-choice question answering dataset with context passages in English that have an average length of about 5,000 tokens. QuALITY is distributed under a CC BY 4.0 License. The dataset can be downloaded from the repo here. For more details about QuALITY, please refer to the paper: Pang et al. (2021).

For submission instructions, please refer to this page.

@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}
Leaderboard
  • Rankings are determined by the accuracy on the entire test set.
  • Accuracy = (number of correct answers) / (number of examples).
  • SAT-style score = (number of correct answers - (1/3) * number of incorrect answers + 0 * number of abstained answers) / (number of examples).
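The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation script: the function name `quality_scores`, the use of integer option indices, and the convention that `None` marks an abstained answer are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def quality_scores(
    predictions: Sequence[Optional[int]],
    gold: Sequence[int],
) -> Tuple[float, float]:
    """Return (accuracy, SAT-style score) as fractions, not percentages.

    Assumption for this sketch: a prediction of None means the model
    abstained on that question.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("at least one example is required")

    n = len(gold)
    correct = sum(
        1 for p, g in zip(predictions, gold) if p is not None and p == g
    )
    abstained = sum(1 for p in predictions if p is None)
    incorrect = n - correct - abstained

    accuracy = correct / n
    # Each incorrect answer costs 1/3 of a point; abstentions cost nothing.
    sat_style = (correct - incorrect / 3) / n
    return accuracy, sat_style


if __name__ == "__main__":
    # Two correct answers, one incorrect answer, one abstention.
    acc, sat = quality_scores([0, 1, None, 3], [0, 2, 1, 3])
    print(f"accuracy={acc:.3f}, SAT-style={sat:.3f}")
```

When a model never abstains, the SAT-style score reduces to (4 * accuracy - 1) / 3, so an accuracy of 55.4% corresponds to a SAT-style score of about 40.5%.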
| Rank | Date | Model name | Institution | Accuracy (test set) | Accuracy (hard subset) | SAT-style score (test set) | SAT-style score (hard subset) |
|---|---|---|---|---|---|---|---|
| 0 | 2021/12 | Human annotators | New York University | 93.5 | 89.1 | 91.4 | 85.4 |
| 1 | 2021/12 | Baseline model: DeBERTaV3-large with DPR-based extraction, with intermediate training on RACE | New York University | 55.4 | 46.1 | 40.5 | 28.1 |
| 2 | 2021/12 | Baseline model: RoBERTa-large with DPR-based extraction, with intermediate training on RACE | New York University | 51.4 | 44.7 | 35.2 | 26.3 |
| 3 | 2021/12 | Baseline model: DeBERTaV3-large with DPR-based extraction | New York University | 49.0 | 41.2 | 32.0 | 21.6 |
| 4 | 2021/12 | Question-only baseline: DeBERTaV3-large, with intermediate training on RACE | New York University | 43.3 | 38.2 | 24.4 | 17.6 |
| 5 | 2021/12 | Baseline model: RoBERTa-large with fastText-based extraction | New York University | 42.7 | 35.7 | 23.6 | 14.3 |
| 6 | 2021/12 | Question-only baseline: DeBERTaV3-large | New York University | 39.7 | 35.2 | 19.6 | 13.5 |
| 7 | 2021/12 | Baseline model: Longformer, with intermediate training on RACE | New York University | 39.5 | 35.3 | 19.4 | 13.8 |
| 8 | 2021/12 | Baseline model: Longformer | New York University | 30.7 | 29.3 | 7.6 | 5.7 |