Tables and structured lists on Web pages are a potential source of valuable information that is often missing from state-of-the-art knowledge graphs, for various reasons. Our work is concerned with annotating such relational data sources with semantic information so that they can be queried or otherwise integrated into knowledge graphs such as DBpedia.
Here are examples of highly regular relational data on the Web that are understandable to humans but cannot be immediately leveraged by computers:
This information could be leveraged for question answering and other applications if annotated with semantic predicates from a known knowledge graph, such as Freebase. Here's one example of an annotated table in Wikipedia.
Our WWW2018 paper is concerned with the specific problem of finding and ranking relations from a given Knowledge Graph (KG) that hold over pairs of entities juxtaposed in a table or structured list.
The state of the art for this task is to attempt to link the entities mentioned in the table cells to objects in the KG and then rank the relations that hold for those linked objects. As a result, these methods are hampered by the incompleteness and uneven coverage of even the best knowledge graphs available today.
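To make the limitation concrete, here is a minimal sketch of a linking-based ranker. It is not taken from any particular system: the toy KG, the entity identifiers, the surface-form dictionary, and the function name `rank_by_linking` are all made up for illustration.

```python
from collections import Counter

# Toy KG (hypothetical): a set of (subject, relation, object) triples.
# The entity identifiers are invented for this example.
KG = {
    ("m.inception", "film.film.directed_by", "m.nolan"),
    ("m.dunkirk", "film.film.directed_by", "m.nolan"),
}

# Hypothetical entity linker: an exact-match dictionary from cell text to KG ids.
LINKS = {
    "Inception": "m.inception",
    "Dunkirk": "m.dunkirk",
    "Christopher Nolan": "m.nolan",
}


def rank_by_linking(rows, kg=KG, links=LINKS):
    """Rank relations by how many table rows they hold for in the KG."""
    votes = Counter()
    for left, right in rows:
        subj_id, obj_id = links.get(left), links.get(right)
        if subj_id is None or obj_id is None:
            continue  # a cell that cannot be linked contributes no evidence
        for subj, rel, obj in kg:
            if subj == subj_id and obj == obj_id:
                votes[rel] += 1
    return votes.most_common()


rows = [
    ("Inception", "Christopher Nolan"),
    ("Dunkirk", "Christopher Nolan"),
    ("Tenet", "Christopher Nolan"),  # "Tenet" is absent from the toy KG
]
print(rank_by_linking(rows))
```

The third row is silently ignored because one of its entities cannot be linked, and a table made up entirely of such rows would yield no ranking at all.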
Our approach to Semantic Table Understanding does not require entity linking. Instead, we rank relations using generative language models derived from Web-scale corpora.
In a nutshell, we use the FACC1 dataset, which identifies mentions of Freebase entities in the ClueWeb corpus, to find all relational phrases that are used to express each of the Freebase relations. To predict the relation for any pair of entities, we issue a standard Web search query to gather sentences connecting those entities, and rank the relational language models by their likelihood of generating such sentences.
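The ranking step can be sketched in a few lines. This is only an illustration under strong assumptions, not the model from the paper: each relation gets an add-one-smoothed unigram language model, the relational phrases are a tiny hand-written stand-in for what would be mined from FACC1 and ClueWeb, and the sentences are hard-coded in place of Web search results. All function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram_lm(phrases):
    """Build token counts over all phrases observed for one relation."""
    counts = Counter(tok for phrase in phrases for tok in phrase.lower().split())
    return counts, sum(counts.values())


def log_likelihood(sentence, lm, vocab_size, alpha=1.0):
    """Log-probability of a sentence under an additively smoothed unigram model."""
    counts, total = lm
    return sum(
        math.log((counts[tok] + alpha) / (total + alpha * vocab_size))
        for tok in sentence.lower().split()
    )


def rank_relations(sentences, relation_phrases):
    """Order relations by the total log-likelihood their model assigns to the sentences."""
    lms = {rel: train_unigram_lm(phrases) for rel, phrases in relation_phrases.items()}
    # Shared vocabulary: every token seen in any model or in the input sentences.
    vocab = {tok for counts, _ in lms.values() for tok in counts}
    vocab |= {tok for s in sentences for tok in s.lower().split()}
    scores = {
        rel: sum(log_likelihood(s, lm, len(vocab)) for s in sentences)
        for rel, lm in lms.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy stand-in for phrases mined per relation (assumed data, for illustration).
relation_phrases = {
    "film.film.directed_by": ["was directed by", "is a film directed by", "director of the film"],
    "book.written_work.author": ["was written by", "is a novel written by", "author of the book"],
}

# Toy stand-in for sentences returned by a Web search for one entity pair.
sentences = [
    "Inception was directed by Christopher Nolan",
    "Christopher Nolan is the director of Inception",
]

print(rank_relations(sentences, relation_phrases))
```

Nothing in this sketch looks the entities up in a knowledge graph; only the wording of the sentences that mention them together is scored.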
Our method does not query the knowledge graph at all, and can produce quality results even when the entities in the table are missing from the KG. In the paper we also consider predicting a relation for multiple pairs of entities (i.e., multiple rows of the table).