Open Source codebase of SIGMOID, the Scalable Infrastructure for Generic Model Optimization on Inhomogeneous Datasets

mindsdb/open-sigmoid

open-sigmoid

Open Source codebase of SIGMOID, the Scalable Infrastructure for Generic Model Optimization on Inhomogeneous Datasets.

Description

SIGMOID stands for Scalable Infrastructure for Generic Model Optimization on Inhomogeneous Datasets. It is an infrastructure in the sense that it is not a single computer program but rather a collection of them. The main goal of sigmoid is to provide scalability to an already existing model. In short, this means:

  • Making it possible to train an arbitrary model using as much data as possible without changing the model at all.
  • Providing the output product in a form factor that suits large-scale HPC compute infrastructure.
  • Accomplishing the above with zero human intervention.

High-level overview

Data-driven model scaling

A key distinction between sigmoid and already existing solutions is that sigmoid relies on the training data itself to provide scalability. We call this method "data-driven model scaling" (D2MS).

sigmoid attempts to achieve D2MS by combining self-supervised Deep Learning methods and unsupervised clustering algorithms to detect underlying data partitions in the dataset; loosely speaking, a partition is a subset of the data in which all elements are similar to one another.
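To make the idea of a partition concrete, here is a minimal sketch of the clustering step only. It assumes the self-supervised stage has already turned each sample into a numeric embedding, and it uses a small hand-written k-means with farthest-point initialization as a stand-in; the function names and the choice of k-means are illustrative assumptions, not sigmoid's actual code or algorithm.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, iters=20):
    """Plain k-means on tuples of numbers. Returns (centroids, labels),
    where labels[i] is the partition index assigned to points[i]."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical 2-D embeddings forming two well-separated groups.
embeddings = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(embeddings, k=2)
print(labels)
```

In this toy input the two groups are far apart, so each group ends up as its own partition. Real embeddings would be higher-dimensional and the number of partitions would have to be chosen or estimated.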

sigmoid then trains an arbitrary number of models in a way that makes every model become specialized (fine-tuned) for data coming from one particular partition. This way, no instance of the model gets to "see" the entire dataset.
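The per-partition training described here can be sketched as follows, assuming partition labels are already available for every sample. `MeanModel` is a toy stand-in for the user's unchanged model, and `train_specialists` is a hypothetical helper name; neither comes from the sigmoid codebase.

```python
import copy
from collections import defaultdict


class MeanModel:
    """Toy stand-in for an arbitrary user model: it predicts the mean
    of the targets it was trained on."""

    def __init__(self):
        self.mean = None

    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)
        return self

    def predict(self, x):
        return self.mean


def train_specialists(base_model, xs, ys, labels):
    """Fit one independent copy of base_model per partition label.
    Each copy only ever sees the samples of its own partition."""
    groups = defaultdict(lambda: ([], []))
    for x, y, label in zip(xs, ys, labels):
        groups[label][0].append(x)
        groups[label][1].append(y)
    return {
        label: copy.deepcopy(base_model).fit(gx, gy)
        for label, (gx, gy) in groups.items()
    }


# Hypothetical data: four samples, already assigned to two partitions.
xs = ["a", "b", "c", "d"]
ys = [1, 3, 10, 30]
labels = [0, 0, 1, 1]
specialists = train_specialists(MeanModel(), xs, ys, labels)
print({label: model.mean for label, model in specialists.items()})
```

The base model is deep-copied rather than modified, which mirrors the stated goal of scaling a model without changing it. Because the copies are independent, they could in principle be trained in parallel.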

Finally, after the training process, sigmoid provides the user with a "pool" of models (the specialists) and a "routing" model (a switch). Inference then comes down to feeding new data to the switch, which redirects the data to the respective specialist to perform the actual inference.
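As an illustration of the switch-plus-specialists inference path, the sketch below routes each new sample to the specialist of its nearest partition centroid. Using a nearest-centroid rule as the switch is an assumption made for brevity (the source only says the switch is a routing model), and `route` and `infer` are hypothetical names.

```python
import math


def route(x, centroids):
    """Switch: return the index of the partition whose centroid is
    closest to x (a simple nearest-centroid rule, assumed here)."""
    return min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))


def infer(x, centroids, specialists):
    """Send x to the specialist chosen by the switch and return its output."""
    return specialists[route(x, centroids)](x)


# Hypothetical partition centroids and two trivial specialists.
centroids = [(0.5, 0.5), (10.5, 10.5)]
specialists = {
    0: lambda x: "handled by specialist 0",
    1: lambda x: "handled by specialist 1",
}

print(infer((1, 2), centroids, specialists))
print(infer((9, 12), centroids, specialists))
```

Only one specialist runs per input, so inference cost stays close to that of a single model plus the routing step.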

[Figure: high-level flow diagram of sigmoid]

Installation

sigmoid is written in Python, so installing it from source requires a Python environment (pyenv is recommended) and poetry.

pip install poetry
poetry install --only main
