> [!IMPORTANT]
> 🎯 DataHack Summit 2024 | 📆 August 10, 2024 | 📍 Bengaluru, India
Explore this comprehensive repository on LLMs, covering everything from the basics of NLP to fine-tuning and even RLHF. If you find the resources helpful, consider giving it a star ⭐ to show your support and help others discover it.
- Overview of Generative AI and the basics of language modeling.
- ⭐ Hands-On:
  - Getting Started: Text Representation
  - Language Modeling Basics and Text Generation using a basic LM
- Transformer Architectures: Detailed look into the Transformer architecture that powers modern LLMs.
- GPT Series of Models: Overview of the evolution of GPT models.
- Evaluation Metrics and Benchmarks: Methods to evaluate and benchmark LLM performance.
- ⭐ Hands-On: Training a mini Transformer model and experimenting with GPT-2 for text generation.
- Training Process and Scaling Laws: Understand how LLMs are trained and the laws governing their scaling.
- PEFT: Learn Parameter-Efficient Fine-Tuning methods.
- LoRA: Introduction to Low-Rank Adaptation.
- Instruction Tuning: Techniques for fine-tuning models using instructions.
- RLHF: Reinforcement Learning from Human Feedback and its applications.
- ⭐ Hands-On:
  - Instruction Tuning: Text-to-SQL using LLaMA 3.1
  - RLHF: Sentiment alignment for generating movie reviews
- Prompt Engineering: Crafting effective prompts to get desired outputs.
- Prompt Hacking and Backdoors
- Vector Databases: Using vector databases for efficient data retrieval.
- RAGs: Techniques for retrieval-augmented generation.
- Beyond Prompting: Understanding frameworks such as DSPy
- ⭐ Hands-On:
  - Implementing basic prompt engineering techniques
  - Building a simple RAG system
  - Hands-on with DSPy
- Next Steps: Speculative topics on future advancements.
- Beyond: Future possibilities and directions for LLM research.
- Basic/hands-on experience working with Python
- Basic understanding of linear algebra and machine learning
- Basic understanding of Deep Neural Networks
- Basic/hands-on experience with PyTorch
- Access to Google Colab or a similar Python environment
- Access to ChatGPT or Google Gemini (formerly Bard; free tier)
> [!IMPORTANT]
> - Follow the steps below for a quick setup. This should work as-is for Mac/Linux based systems.
> - If you already have your own way of managing dependencies, check out `pyproject.toml` (for Poetry) or `requirements.txt` (for pip-based setups).
> - The `requirements.txt` file is generated using the command:
>   `poetry export --without-hashes --format=requirements.txt > requirements.txt`
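If you go the pip route instead of Poetry, a minimal flow would look like the sketch below (standard `venv`/`pip` tooling assumed here; this is not one of the repo's `make` targets):

```bash
# Create and activate an isolated environment, then install the pinned deps
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```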
We will make use of:

- `pyenv` for Python version management
- `virtualenv` for virtual environment management
- `poetry` for dependency management
- Pyenv:
  - install: `brew install pyenv` or `curl https://pyenv.run | bash` (see the shell-init note after this list)
- VirtualEnv:
  - install: `brew install pyenv-virtualenv` or `git clone https://github.com/pyenv/pyenv-virtualenv.git $(pyenv root)/plugins/pyenv-virtualenv`
  - add this to your `.rc` file: `eval "$(pyenv virtualenv-init -)"`
- Poetry:
  - install: `curl -sSL https://install.python-poetry.org | python3 -` or check the official installation docs at https://python-poetry.org/docs/#installation
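If `pyenv` or its virtualenvs aren't picked up by your shell after installation, the usual fix is to initialize them in your `.rc` file. A minimal sketch, assuming the default pyenv install location (this is standard pyenv setup, not something this repo's docs prescribe):

```bash
# Put pyenv's shims on PATH and initialize it for this shell
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"

# Enable pyenv-virtualenv integration (as noted in the VirtualEnv step above)
eval "$(pyenv virtualenv-init -)"
```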
- Setup:
  - Local Mac/Linux: if you have `make` available, simply execute `make setup`; otherwise follow the manual steps below.
  - RunPod or other similar providers: simply execute `make runpod_setup`; otherwise follow the manual steps below.
  - If you are using other ways of managing dependencies, set things up manually:
    - Python Environment:

      ```bash
      pyenv install 3.11.9
      pyenv virtualenv 3.11.9 datahack
      cd <path to this repo clone>
      pyenv activate datahack
      poetry install
      ```

      Make sure the `pyproject.toml` file is available in the directory where you execute `poetry install`, or use the `requirements.txt` file for reference.
    - Setup `nvm`/`node` and install `localtunnel` (see the sketch below).
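A minimal sketch of that last step, assuming the standard `nvm` install script and an npm-based `localtunnel` install (the version pin and port are illustrative, not taken from the repo):

```bash
# Install nvm (Node Version Manager) via its official install script
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

# Restart your shell (or source your .rc file), then install a Node LTS release
nvm install --lts

# Install localtunnel globally
npm install -g localtunnel

# Example: expose a local app running on port 8000 via a public URL
lt --port 8000
```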
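Once setup is done, you can sanity-check the environment with standard `pyenv`/`poetry` commands (generic checks, not repo-specific targets):

```bash
pyenv versions     # the `datahack` virtualenv should be listed
python --version   # should report Python 3.11.9 inside the activated env
poetry env info    # shows the interpreter Poetry resolved for this project
```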