PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).
Getting Started
Train the TransE model on the Nations dataset with:
from pykeen.pipeline import pipeline
results = pipeline(model='TransE', dataset='Nations')
hits_at_10 = results.get_metric('hits@10')
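For orientation, hits@10 is a rank-based metric: the fraction of evaluation triples whose correct entity is ranked within the top 10 candidates. The sketch below shows the idea in plain Python on a hand-made list of ranks. The function names and example ranks are illustrative assumptions, not PyKEEN's internal implementation, which additionally handles filtering and tie-breaking.

```python
def hits_at_k(ranks, k=10):
    """Fraction of ranks that are less than or equal to k (ranks start at 1)."""
    ranks = list(ranks)
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all evaluated triples."""
    ranks = list(ranks)
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Hypothetical ranks of the true entity for five test triples.
example_ranks = [1, 3, 12, 2, 50]
print(hits_at_k(example_ranks, k=10))  # 3 of 5 ranks are <= 10, so 0.6
print(mean_reciprocal_rank(example_ranks))
```

Higher values are better for both metrics, and both lie between 0 and 1.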
The default training loop uses the stochastic local closed world assumption (sLCWA) with uniform negative sampling, and the model is evaluated with rank-based evaluation. The documentation describes how to configure these options, and more.
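As a rough sketch of what such configuration might look like, the snippet below spells out the defaults explicitly as keyword arguments to `pipeline`. The helper names `build_pipeline_config` and `run`, the epoch count, and the seed are assumptions made for illustration. The string identifiers follow the naming used in the PyKEEN documentation, but check them against the installed version.

```python
def build_pipeline_config(num_epochs=5, seed=42):
    """Collect pipeline options in one dictionary (hypothetical helper)."""
    return dict(
        model="TransE",
        dataset="Nations",
        training_loop="sLCWA",            # stochastic local closed world assumption
        negative_sampler="basic",         # uniform corruption of head or tail
        evaluator="RankBasedEvaluator",   # rank-based evaluation
        training_kwargs=dict(num_epochs=num_epochs),
        random_seed=seed,
    )


def run(config):
    """Train and evaluate with the given options; requires PyKEEN to be installed."""
    # Imported lazily so the configuration can be inspected without PyKEEN.
    from pykeen.pipeline import pipeline

    results = pipeline(**config)
    return results.get_metric("hits@10")


config = build_pipeline_config()
print(config["training_loop"], config["negative_sampler"], config["evaluator"])
```

Calling `run(config)` would then train the model and return the hits@10 score. Keeping the options in a dictionary makes it easy to log or vary them between experiments.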
Installation
PyKEEN can be installed with pip on Python 3.7+. More details are available on Read the Docs.
$ pip install pykeen
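One way to confirm that the environment matches the requirement above is a small check using only the standard library. This is a sketch; the helper names are hypothetical and not part of PyKEEN.

```python
import importlib.util
import sys

MINIMUM_PYTHON = (3, 7)  # taken from the installation note above


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the interpreter's (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def pykeen_is_installed():
    """Return True if the pykeen package can be found on the import path."""
    return importlib.util.find_spec("pykeen") is not None


print("Python supported:", python_is_supported())
print("PyKEEN importable:", pykeen_is_installed())
```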
History
- October 21, 2021: The PyKEEN large-scale benchmarking paper is accepted for publication in TPAMI.
- March 5, 2021: The PyKEEN 1.0 software paper is accepted for publication in JMLR.
- July 28, 2020: With all the improvements made to support the PyKEEN benchmarking paper, the PyKEEN 1.0 software paper is posted to arXiv.
- June 25, 2020: PyKEEN version 1.0.0 is published to PyPI to complement the large-scale benchmarking paper.
- June 23, 2020: After nearly a year of work, the PyKEEN large-scale benchmarking paper is posted to arXiv.
- August 2019: PyKEEN joins Twitter @keenuniverse.
- April 2019: Max Berrendorf, Laurent Vermue, and Sahand Sharifzadeh join the team.
- February 13, 2019: The PyKEEN biological application manuscript is accepted by Oxford Bioinformatics.
- November 23, 2018: The PyKEEN biological application preprint is posted to bioRxiv.
- October 9, 2018: Our first release on PyPI and first build on ReadTheDocs.
- July 2018: Charles Tapley Hoyt and Daniel Domingo-Fernández join the team.
- June 6, 2018: Our first commit on GitHub. Initially, this repository was authored by Mehdi Ali under the Smart Data Analytics organization, but as the project and the team grew, we moved it to its own organization.
Citation
PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings.
Ali, M., Berrendorf, M., Hoyt, C. T., Vermue, L., Sharifzadeh, S., Tresp, V., & Lehmann, J. (2021).
Journal of Machine Learning Research, 22(82), 1–6.
BibTeX Entry:
@article{pykeen2021software,
author = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
journal = {Journal of Machine Learning Research},
number = {82},
pages = {1--6},
title = {{PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings}},
url = {https://jmlr.org/papers/v22/20-825.html},
volume = {22},
year = {2021}
}
Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework
Ali, M., Berrendorf, M., Hoyt, C. T., Vermue, L., Galkin, M., Sharifzadeh, S., Fischer, A., Tresp, V., & Lehmann, J. (2021).
IEEE Transactions on Pattern Analysis and Machine Intelligence.
BibTeX Entry:
@article{pykeen2021benchmarking,
author = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
title = {{Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework}},
year = {2021},
pages = {1-1},
doi = {10.1109/TPAMI.2021.3124805},
url = {https://doi.org/10.1109/TPAMI.2021.3124805}
}
Posts
Using Clinical Data to Embed Patients
Gene expression is often measured in groups of patients with a given disease and compared with that of healthy patients to determine which genes are expressed at higher, lower, or similar levels. We’ve used these calculations to introduce patients into a biomedical knowledge graph containing genes so we could generate an embedding for each patient using PyKEEN. Afterwards, we showed that these embeddings are useful for classifying new patients and for other downstream ML tasks.
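To make the idea of introducing patients into a gene graph more concrete, here is a minimal sketch that turns per-patient expression changes into (patient, relation, gene) triples. The relation names, the threshold, and the input format are all assumptions chosen for illustration. This is not the pipeline used in the study, and genes with similar expression are simply skipped here.

```python
def expression_to_triples(patient_id, log_fold_changes, threshold=1.0):
    """Convert a patient's gene-level log fold changes into labeled triples.

    Genes at or above +threshold become "overexpresses" edges, genes at or
    below -threshold become "underexpresses" edges, and the rest are omitted.
    """
    triples = []
    for gene, change in sorted(log_fold_changes.items()):
        if change >= threshold:
            relation = "overexpresses"    # hypothetical relation label
        elif change <= -threshold:
            relation = "underexpresses"   # hypothetical relation label
        else:
            continue  # expression similar to healthy controls
        triples.append((patient_id, relation, gene))
    return triples


# Made-up values for a single made-up patient.
example = {"TP53": 2.3, "BRCA1": -1.7, "GAPDH": 0.1}
for triple in expression_to_triples("patient_1", example):
    print(triple)
```

Triples like these could then be combined with the gene-gene edges of an existing knowledge graph before training, so that each patient receives an embedding alongside the genes.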
Benchmarking Study
We’ve run an unprecedentedly large benchmarking study. This image describes the results on the FB15k237 dataset across several knowledge graph embedding models, loss functions, training approaches, and the use of explicit modeling of inverse triples. This is just one of several datasets analyzed in this study. In our manuscript, we also assess the reproducibility of the best reported hyperparameters of older models.
Metaresearch Recommendations
We used PyKEEN to train a scholarly recommendation system that suggests papers to read, grants to apply for, and collaborations to pursue.
Pathway Crosstalk Predictions
We used PyKEEN to train a pathway crosstalk analysis platform that identifies which biological pathways are connected, giving further insight into normal human pathophysiology and potentially yielding novel hypotheses about the aetiology of complex diseases that could inform drug discovery.