PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).

Getting Started

This example shows how to train a model on the training split of a dataset and evaluate it on the test split.

The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption and evaluates with rank-based evaluation.

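from pykeen.pipeline import pipeline

# Train and evaluate TransE on the Nations dataset with default settings.
results = pipeline(
    model='TransE',
    dataset='nations',
)
# Look up the hits@10 score from the rank-based evaluation results.
hits_at_10 = results.metric_results.get_metric('hits@10')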
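
To make the idea of rank-based evaluation more concrete, here is a minimal, dependency-free sketch of how a hits@k value could be derived from a TransE-style score. It is not PyKEEN's implementation: the toy embeddings, the helper names (`transe_score`, `rank_of_true_tail`, `hits_at_k`), and the optimistic handling of ties are all assumptions made for illustration.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_of_true_tail(head, relation, true_tail, entity_embeddings):
    """1-based rank of the true tail among all entities (ties resolved optimistically)."""
    true_score = transe_score(head, relation, entity_embeddings[true_tail])
    better = sum(
        1
        for name, embedding in entity_embeddings.items()
        if name != true_tail and transe_score(head, relation, embedding) > true_score
    )
    return better + 1


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical 2-dimensional embeddings; a trained model would learn these.
entities = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 1.0],
    "d": [3.0, 3.0],
}
relation = [1.0, 0.0]

# Hypothetical test triples given as (head, true tail) for the single relation above.
test_pairs = [("a", "b"), ("a", "d"), ("b", "a")]

ranks = [
    rank_of_true_tail(entities[head], relation, tail, entities)
    for head, tail in test_pairs
]
print(ranks)
print(hits_at_k(ranks, k=3))
```

A higher hits@k means the model places the correct entity near the top of the candidate list more often. The value returned by `get_metric('hits@10')` above summarizes the same kind of ranking over the test triples of the dataset.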

Full documentation can be found on ReadTheDocs.

Installation

PyKEEN can be installed with pip on Python version 3.6+.

pip install pykeen

More information at https://pykeen.readthedocs.io/en/latest/installation.html.

Citation

PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings.
Ali, M., Berrendorf, M., Hoyt, C. T., Vermue, L., Sharifzadeh, S., Tresp, V., & Lehmann, J. (2020).
arXiv, 2007.14175.

Posts

  • Using Clinical Data to Embed Patients

    The expression of each gene is often measured in groups of patients with a given disease and compared to that of healthy patients. From this, it is calculated which genes have higher, lower, or similar expression relative to healthy patients. We’ve used these calculations to introduce patients into a biomedical knowledge graph containing genes, so we could generate an embedding for each patient using PyKEEN. Afterwards, we showed that these embeddings are useful for classifying new patients and for other downstream ML tasks.

  • Benchmarking Study

    We’ve run an unprecedentedly large benchmarking study. This image describes the results on the FB15k237 dataset across several knowledge graph embedding models, loss functions, training approaches, and the use of explicit modeling of inverse triples. This is just one of several datasets analyzed in the study. In our manuscript, we also assess the reproducibility of the best reported hyperparameters of older models.

  • Metaresearch Recommendations

    We used PyKEEN to train a scholarly recommendation system that suggests papers to read, grants to apply to, and collaborations to make.

  • Pathway Crosstalk Predictions

    We used PyKEEN to train a pathway crosstalk analysis platform that identifies which biological pathways are connected. This gives further insight into normal human pathophysiology and may yield novel hypotheses about the aetiology of complex diseases, which could in turn support drug discovery.
