Charts and graphs are ubiquitous forms of data representation, appearing in scientific papers, textbooks, reports, news articles and webpages. These visualizations leverage human visual processing to efficiently convey large amounts of quantitative information and to illustrate trends and differences in the data. But while people can easily interpret data from charts and graphs, machines cannot directly access this data. Today, a vast trove of information is locked inside data visualizations. In this project, we develop tools that allow machines to extract data and structure from such visualizations and thereby enable data analysis, reuse and new forms of indexing across the collection of existing charts and graphs. Together, the tools will provide a novel computational infrastructure for knowledge integration and sharing and will impact a broad range of users including scientists, journalists, economists, social scientists, and educators.
Specifically, the project addresses three main goals. First, it develops computational models for interpreting visualizations to extract the underlying data, graphical marks, and mappings that relate the data to mark attributes. The approach is informed by recent work on human perception and cognition of visualizations. The aim is to build generalized computational models that can accurately extract data from visualizations and also mimic the way people decode information from visualizations. Second, it supports development of a suite of applications that enable analysis and repurposing of visualizations and data. Third, it applies automated visualization interpretation techniques at Internet scale and develops a search engine that indexes visualizations based on their underlying data and graphical structure. The search engine will accelerate data-driven analysis and discovery by facilitating browsing and retrieval of data that is currently locked in computationally inaccessible visualizations.