Meet HybridQA!

A large-scale multi-hop question answering dataset over heterogeneous information in both structured tabular and unstructured textual forms.

Why Hybrid Question Answering?



HIGH-QUALITY

Mechanical Turk;
Strict Quality Control



LARGE-SCALE

13K Wikipedia tables;
293K hyperlinked passages;
70K natural questions.



HYBRID

Semantic Understanding;
Symbolic Reasoning.



OPEN-DOMAIN

Reasoning over open-domain Wikipedia tables.


Explore


We have designed an interface for you to view the data. Please click here to explore the dataset and have fun!

Example


In this task, you are given a Wikipedia table together with its hyperlinked passages. The goal is to answer a multi-hop question that involves information from both forms (structured and unstructured data):
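As a rough illustration of this setup, the sketch below models a table whose cells can link to passages and performs the two hops by hand: first a lookup in the table, then a jump to the linked text. The class names, field names, the `linked_passages` helper, and the toy data are all hypothetical and are not the released HybridQA file format or the authors' model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cell:
    """One table cell; `links` holds hypothetical keys into the passage store."""
    text: str
    links: List[str] = field(default_factory=list)


@dataclass
class HybridTable:
    """A table plus the passages its cells hyperlink to (illustrative layout only)."""
    header: List[str]
    rows: List[List[Cell]]
    passages: Dict[str, str]


def linked_passages(table: HybridTable, match_column: str,
                    match_value: str, target_column: str) -> List[str]:
    """Hop 1: select rows where `match_column` equals `match_value`.
    Hop 2: follow the hyperlinks in `target_column` to their passages."""
    m = table.header.index(match_column)
    t = table.header.index(target_column)
    found = []
    for row in table.rows:
        if row[m].text == match_value:
            for key in row[t].links:
                if key in table.passages:
                    found.append(table.passages[key])
    return found


# Invented toy data, used only to show the shape of the problem.
table = HybridTable(
    header=["Year", "Host city"],
    rows=[
        [Cell("2001"), Cell("City X", ["/wiki/City_X"])],
        [Cell("2005"), Cell("City Y", ["/wiki/City_Y"])],
    ],
    passages={
        "/wiki/City_X": "City X is a port town with a population of 12,000.",
        "/wiki/City_Y": "City Y is a mountain town with a population of 8,000.",
    },
)

# "What is the population of the city that hosted the event in 2001?"
evidence = linked_passages(table, "Year", "2001", "Host city")
print(evidence)
```

The table alone cannot answer the question, and neither can the passage alone: the year is only in the table and the population is only in the text. Extracting the final answer span from the retrieved passage is left to a reading-comprehension model and is not shown here.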

Download (Train/Test Data, Code)


All the code and data are provided on GitHub. The leaderboard is hosted on CodaLab.

Reasoning Types


The questions require multiple hops between the two information forms; the most typical reasoning patterns are demonstrated as follows:

Paper


Please cite our paper as below if you use the HybridQA dataset.

@article{chen2020hybridqa,
  title={HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data},
  author={Chen, Wenhu and Zha, Hanwen and Chen, Zhiyu and Xiong, Wenhan and Wang, Hong and Wang, William},
  journal={Findings of EMNLP 2020},
  year={2020}
}
      

Contact



Have any questions or suggestions? Feel free to contact us!