Abstract

Document authors commonly use tables to support arguments presented in the text. But because tables are usually separate from the main body text, readers must split their attention between different parts of the document. We present an interactive document reader that automatically links document text with corresponding table cells. Readers can select a sentence (or table cells) and our reader highlights the relevant table cells (or sentences). We provide an automatic pipeline for extracting such references between sentence text and table cells for existing PDF documents that combines structural analysis of tables with natural language processing and rule-based matching. On a test corpus of 330 (sentence, table) pairs, our pipeline correctly extracts 48.8% of the references. An additional 30.5% contain only false negative (FN) errors – the reference is missing table cells. The remaining 20.7% contain false positive (FP) errors – the reference includes extraneous table cells and could therefore mislead readers. A user study finds that despite such errors, our interactive document reader helps readers match sentences with corresponding table cells more accurately and quickly than a baseline document reader.
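The three outcome categories in the abstract can be made concrete with a small sketch. This is not the authors' evaluation code: the function name `classify_reference` and the representation of table cells as (row, column) tuples are assumptions made purely for illustration. It treats a reference as correct when the extracted cells match the ground truth exactly, as FN-only when cells are missing but none are extraneous, and as FP when any extraneous cell is included.

```python
def classify_reference(predicted, truth):
    """Classify one extracted (sentence, table) reference.

    `predicted` and `truth` are iterables of hypothetical (row, column)
    cell coordinates; this representation is an assumption, not the paper's.
    """
    predicted, truth = set(predicted), set(truth)
    if predicted - truth:
        # At least one extraneous cell: counts as an FP error,
        # whether or not some true cells are also missing.
        return "fp"
    if truth - predicted:
        # No extraneous cells, but some true cells are missing.
        return "fn_only"
    # Extracted cells match the ground truth exactly.
    return "correct"
```

Under this reading, FP errors take precedence because an extraneous highlight can mislead a reader, while a missing highlight only leaves the reference incomplete.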

@inproceedings{kim2018facilitating,
  author = {Kim, Dae Hyun and Hoque, Enamul and Kim, Juho and Agrawala, Maneesh},
  title = {Facilitating Document Reading by Linking Text and Tables},
  year = {2018},
  isbn = {9781450359481},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3242587.3242617},
  doi = {10.1145/3242587.3242617},
  abstract = {Document authors commonly use tables to support arguments presented in the text. But because tables are usually separate from the main body text, readers must split their attention between different parts of the document. We present an interactive document reader that automatically links document text with corresponding table cells. Readers can select a sentence (or table cells) and our reader highlights the relevant table cells (or sentences). We provide an automatic pipeline for extracting such references between sentence text and table cells for existing PDF documents that combines structural analysis of tables with natural language processing and rule-based matching. On a test corpus of 330 (sentence, table) pairs, our pipeline correctly extracts 48.8\% of the references. An additional 30.5\% contain only false negative (FN) errors -- the reference is missing table cells. The remaining 20.7\% contain false positive (FP) errors -- the reference includes extraneous table cells and could therefore mislead readers. A user study finds that despite such errors, our interactive document reader helps readers match sentences with corresponding table cells more accurately and quickly than a baseline document reader.},
  booktitle = {Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology},
  pages = {423--434},
  numpages = {12},
  keywords = {interactive documents, text analysis, visualization},
  location = {Berlin, Germany},
  series = {UIST '18}
}

Facilitating Document Reading by Linking Text and Tables
