Knowledge Bases have always been central to both the theory and practice of Artificial Intelligence. Today there are massive knowledge bases used by millions on a daily basis in applications such as search and question answering. Open knowledge bases like DBpedia and Wikidata are the fabric that keeps the Linked Open Data ecosystem alive, enabling scholarly research across all disciplines.

Precisely because of their importance, knowledge bases require constant updating to reflect changes to the world they represent. However, even commercial knowledge bases are hard for humans alone to maintain (except in narrow domains). For general-purpose knowledge bases, maintenance is often done through Relation Extraction (RE), the task of predicting which, if any, of the relations in the knowledge base is expressed in a sentence mentioning entities known to the knowledge base.

One way to improve RE is to leverage Knowledge Base Embeddings (KBE), which were originally developed for the task of predicting missing links in the knowledge base. However, despite clear connections between RE and KBE, little has been done toward unifying these models in a principled way. We help close the gap with a framework that unifies the learning of RE and KBE models, leading to significant improvements over the state of the art in RE.


SOTA RE methods are, effectively, neural classifiers that label each sentence in the corpus with the knowledge graph relation it expresses. KBE methods, on the other hand, use a variety of techniques to embed entities and relations in a high-dimensional space, not always using neural networks. Previous attempts to combine these disparate representations relied on simple additive schemes that merge independent predictions from independently trained models. We showed that such simple strategies can be outperformed by SOTA models alone (Xu and Barbosa 2019).
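To make "simple additive scheme" concrete, here is a minimal sketch of one such combination: a weighted sum of two relation distributions produced by separately trained models. The function name, the `alpha` weight, and the toy numbers are assumptions made for illustration; they are not taken from any specific prior system.

```python
def combine_additively(p_text, p_kb, alpha=0.5):
    """Weighted sum of two relation distributions.

    p_text: probabilities from a text-based RE model (hypothetical input).
    p_kb:   probabilities from a KBE model (hypothetical input).
    alpha:  illustrative mixing weight; the two models are trained
            independently and never see each other's predictions.
    """
    if len(p_text) != len(p_kb):
        raise ValueError("both models must score the same set of relations")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_text, p_kb)]


# Toy example with two candidate relations.
combined = combine_additively([0.6, 0.4], [0.2, 0.8], alpha=0.5)
predicted_relation = max(range(len(combined)), key=combined.__getitem__)
```

Because the mixing happens only at prediction time, neither model can correct the other during training, which is the limitation a jointly trained model aims to remove.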

We take a principled approach to unify these two worlds with the following simple architecture, which we call HRERE (Heterogeneous REpresentation for Neural Relation Extraction):

HRERE architecture

HRERE's backbone is a bi-directional long short-term memory (LSTM) network with multiple levels of attention to learn representations of text expressing relations. The knowledge representation machinery (ComplEx) nudges the language model to agree with facts already in the KB, indirectly acting to clean the distant supervision data. Joint learning is guided by three loss functions: one for the language representation, another for the knowledge representation, and a third one to ensure these representations do not diverge.
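The two ingredients named above can be sketched in a few lines. The scoring function is the standard ComplEx score, the real part of a trilinear product of complex-valued embeddings. The three-term loss is only an illustration: this page does not give the exact form of the losses, so using cross-entropy for all three terms (and summing them unweighted) is an assumption, and the function names are hypothetical. This is not the code released with the paper.

```python
import math


def complex_score(head, rel, tail):
    """ComplEx score: Re(sum_k head_k * rel_k * conj(tail_k))."""
    return sum((h * r * t.conjugate()).real for h, r, t in zip(head, rel, tail))


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of `predicted` measured against `target`."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def joint_loss(p_text, p_kb, gold_index):
    """Sum of three terms (assumed form, for illustration only)."""
    gold = [1.0 if i == gold_index else 0.0 for i in range(len(p_text))]
    loss_language = cross_entropy(gold, p_text)    # text-side prediction
    loss_knowledge = cross_entropy(gold, p_kb)     # KB-side prediction
    loss_agreement = cross_entropy(p_kb, p_text)   # keeps the two sides close
    return loss_language + loss_knowledge + loss_agreement


# Toy usage: score one entity pair against two hypothetical relations.
head = [1 + 1j, 0.5 - 0.5j]
tail = [0.5 + 1j, 1 - 1j]
relations = [[1 + 0j, 0 + 1j], [0 - 1j, 1 + 1j]]
p_kb = softmax([complex_score(head, r, tail) for r in relations])
```

Note that swapping head and tail changes the score whenever the relation embedding has a non-zero imaginary part, which is what lets ComplEx model asymmetric relations.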


We evaluate our model on the widely used NYT dataset (Riedel et al. 2010), built by aligning Freebase relations with their mentions in the New York Times Corpus. Articles from 2005-2006 are used for training, while articles from 2007 are used for testing. As our KB, we used a Freebase subset containing the 3M entities with the highest degree (i.e., participating in the most relations).
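Selecting a KB subset by entity degree can be sketched as follows. This is a minimal illustration on toy triples, assuming that degree counts every appearance of an entity as head or tail and that only triples with both endpoints in the kept set survive; the function names are hypothetical and the actual preprocessing may differ.

```python
from collections import Counter


def top_degree_entities(triples, k):
    """Return the k entities that appear in the most triples."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return {entity for entity, _count in degree.most_common(k)}


def restrict_to_entities(triples, keep):
    """Keep only triples whose head and tail are both in `keep`."""
    return [(h, r, t) for h, r, t in triples if h in keep and t in keep]


# Toy knowledge base: (head, relation, tail).
toy_triples = [
    ("a", "r1", "b"),
    ("a", "r2", "c"),
    ("b", "r1", "c"),
    ("a", "r3", "d"),
]
kept = top_degree_entities(toy_triples, k=3)
subset = restrict_to_entities(toy_triples, kept)
```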

The chart below shows Precision/Recall curves for our base model HRERE-base and various competitive baselines. As we show in the paper, our model can be tuned to perform even better than the base model, achieving the state of the art for the task.

Precision-Recall curve
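For readers unfamiliar with how such a curve is obtained, the sketch below computes precision/recall points from a list of scored predictions: predictions are ranked by confidence, and precision and recall are recorded at every cutoff. The input format and function name are assumptions for illustration, not the evaluation script used in the paper.

```python
def precision_recall_points(scored_predictions, num_gold):
    """Compute (precision, recall) at each rank cutoff.

    scored_predictions: list of (confidence, is_correct) pairs, one per
        predicted fact (hypothetical input format).
    num_gold: total number of correct facts in the test set.
    """
    ranked = sorted(scored_predictions, key=lambda pair: pair[0], reverse=True)
    points = []
    correct = 0
    for rank, (_confidence, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            correct += 1
        points.append((correct / rank, correct / num_gold))
    return points


# Toy example: three predictions, four correct facts in the test set.
points = precision_recall_points([(0.7, True), (0.9, True), (0.8, False)], num_gold=4)
```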

HRERE was the state of the art at the time of publication!


For more details, code and/or data, please check:

  • Code and data used in the paper
  • P. Xu and D. Barbosa. Connecting language and knowledge with heterogeneous representations for neural relation extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 3201–3206. Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
  • P. Xu and D. Barbosa. Investigations on knowledge base embedding for relation prediction and extraction. CoRR, 2018. arXiv:1802.02114.