Become a sponsor to Henry Ndubuaku
AI Research Engineer who builds and shares open-source models, particularly through the NanoDL project.
Developing and training transformer-based models is typically resource-intensive and time-consuming, and AI/ML experts frequently need to build smaller-scale versions of these models for specific problems. JAX, a low-resource yet powerful framework, accelerates the development of neural networks, but existing resources for transformer development in JAX are limited. NanoDL addresses this challenge with the following features:
- A wide array of blocks and layers, facilitating the creation of customised transformer models from scratch.
- An extensive selection of models such as LlaMa2, Mistral, Mixtral, GPT3, GPT4 (inferred), T5, Whisper, ViT, Mixers, GAT, CLIP, and more, catering to a variety of tasks and applications.
- Data-parallel distributed trainers, so developers can efficiently train large-scale models on multiple GPUs or TPUs without writing manual training loops (a training-step sketch follows this list).
- Dataloaders, making data handling for JAX/Flax more straightforward and effective.
- Custom layers not found in Flax/JAX, such as RoPE, GQA, MQA, and Swin attention, allowing for more flexible model development (see the RoPE sketch below).
- GPU/TPU-accelerated classical ML models such as PCA, KMeans, Regression, and Gaussian Processes, akin to scikit-learn on GPU (see the PCA sketch below).
- Modular design, so users can blend elements from various models, such as GPT, Mixtral, and LlaMa2, to craft unique hybrid transformer models.
- A range of advanced algorithms for NLP and computer vision tasks, such as Gaussian Blur, BLEU, etc.
- Each model is contained in a single file with no external dependencies, so the source code can also be used easily on its own.
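To give a feel for what the distributed trainers automate, here is a minimal data-parallel training step written in plain JAX. It is an illustrative sketch, not NanoDL's actual API: the linear model, loss, and parameter names are placeholders, and a real trainer would use a proper optimizer rather than hand-written SGD.

```python
from functools import partial
import jax
import jax.numpy as jnp

# Hypothetical linear model and MSE loss, purely for illustration.
def loss_fn(params, x, y):
    preds = x @ params["w"] + params["b"]
    return jnp.mean((preds - y) ** 2)

@partial(jax.pmap, axis_name="devices")  # one replica per GPU/TPU core
def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    grads = jax.lax.pmean(grads, axis_name="devices")  # average grads across devices
    # Plain SGD update; a real trainer would plug in an optimizer here.
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)

# Parameters are replicated; the batch is sharded along the leading device axis.
n_dev = jax.local_device_count()
params = {"w": jnp.zeros((16, 1)), "b": jnp.zeros((1,))}
params = jax.device_put_replicated(params, jax.local_devices())
x = jnp.ones((n_dev, 8, 16))  # [devices, per-device batch, features]
y = jnp.ones((n_dev, 8, 1))
params = train_step(params, x, y)
```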
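The custom layers bundle patterns like the one below. This is a minimal sketch of rotary position embeddings (RoPE) in plain JAX, using the "rotate-half" formulation; the function name and tensor shapes are illustrative assumptions, not NanoDL's interface.

```python
import jax.numpy as jnp

def apply_rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape [seq_len, num_heads, head_dim]."""
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # One rotation frequency per pair of channels.
    freqs = 1.0 / (base ** (jnp.arange(half) / half))
    angles = jnp.arange(seq_len)[:, None] * freqs[None, :]  # [seq, half]
    cos = jnp.cos(angles)[:, None, :]                       # [seq, 1, half]
    sin = jnp.sin(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```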
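Similarly, the classical ML models exploit the fact that standard linear-algebra routines run on whichever accelerator JAX targets. As a rough illustration (again, not NanoDL's API), PCA reduces to a centred SVD:

```python
import jax.numpy as jnp

def pca(x, n_components=2):
    """Project x [n_samples, n_features] onto its top principal components via SVD."""
    x_centered = x - x.mean(axis=0)
    # Economy SVD runs on CPU, GPU, or TPU depending on the active JAX backend.
    _, _, vt = jnp.linalg.svd(x_centered, full_matrices=False)
    components = vt[:n_components]    # principal axes
    return x_centered @ components.T  # projected data
```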
The name "NanoDL" stands for Nano Deep Learning. Following the success of Phi models, the long-term goal is to build and train nano versions of all available models, while ensuring they compete with the original models in performance, while limiting the number of parameters at 1B. Models are exploding in size, therefore gate-keeping experts and companies with limited resources, there is a need to remedy this. Trained weights will be made available via this library.