Emgraph

Emgraph

Foreword

Emgraph (Embedding graphs) is a Python library for graph representation learning.

GitHub Repo stars PyPI - Downloads latest release Tensorflow 2

Abstract

Emgraph is an open source Python library that I designed and developed to simplify the process of designing, training, and evaluating graph embedding models for knowledge graph representation learning. Leveraging the power and flexibility of TensorFlow 2 as its backend, Emgraph provides users with a simple yet powerful API that enables them to easily develop, train, and evaluate their own graph embedding models. The library supports CPU/GPU (CUDA cores) and includes multiple well-known models, dataset preprocessors, data loaders, and standard APIs. Emgraph is well-documented and developed using test-driven development (TDD) principles, ensuring its reliability and scalability. Additionally, the library is scalable, making it ideal for a wide range of research and development needs. Whether you’re an academic researcher or a data scientist, Emgraph provides helpful tools. It is worth mentioning that the Emgraph was initially a part of the Bigraph, that later we decided to seperate it from the parent library to keep the API more structured.

Features

  • Support for CPU and GPU (including CUDA cores): The library can be used on both CPU and GPU, allowing for efficient processing of large datasets. It also supports CUDA cores, which can accelerate the computation process even further.
  • Standard API for intuitive interface and easy use: The library comes with a standard API that is both intuitive and easy to use. This allows users to easily access and utilize the library's functionalities easily.
  • Dataset preprocessor for simplifying data preparation: The dataset preprocessor functionality simplifies the process of data preparation by handling tasks such as data cleaning. This saves users time and effort and helps ensure that the data is of high quality.
  • Data loader for efficient data management: The data loader functionality enables users to efficiently manage large datasets by loading data in batches. This helps to reduce memory usage and improves processing speed.
  • Abstraction: The library provides a high level of abstraction, which means that users can work with complex concepts without needing to have a detailed understanding of the underlying implementation. This makes it easier for users to focus on the high-level goals of their project, rather than getting bogged down in the technical details.
  • Open source: The library is open source, which means that anyone can access and modify the code. This makes it easier for users to customize the library to their specific needs and to contribute to the development of the library.
  • Easy-to-use: The library is designed to be easy to use, with a user-friendly interface and clear documentation. This makes it easier for users to get started with the library and to quickly become productive. Additionally, the library's intuitive design means that users can spend less time on technical tasks and more time on the creative aspects of their project.
  • Built using TensorFlow 2 as a backend for compatibility with other machine learning and AI tools: The library is built using TensorFlow 2 as a backend, which ensures compatibility with other commonly used machine learning and AI tools. This makes it easier for users to integrate the library into their existing workflows and pipelines.
  • Well-documented codebase for easy navigation and understanding: The codebase is well-documented, with clear explanations of each function and method. This makes it easy for users to navigate and understand the code, even if they are not familiar with the specific implementation.
  • Test-driven development approach for high code quality and reliability: The library has been developed using a test-driven development approach, which ensures that the code is of high quality and is reliable. This approach involves writing tests for each function and method before writing the code itself, which helps catch bugs and errors early in the development process.
  • Multiple well-known models (e.g. TransE, DistMult, ComplEx): The library comes with a variety of pre-implemented models that have been well-researched and widely used in the field of knowledge graph representation learning. This allows users to easily compare and experiment with different models to find the best fit for their project.

Tech stack and contributions

  • TensorFlow 2
  • Numpy
  • Pandas
  • Scikit learn
  • TensorBoard
  • SQLite
  • driven development (TDD)
  • PyTest
  • Sphinx
  • Python

Installation

$ pip install emgraph


Keywords

Deep-learning, Machine-learning, Graph, Knowledge-graph-embedding, Algorithms, python library