From 5a92873f5f648cdbd6db3aadd90eeb61e3071c5c Mon Sep 17 00:00:00 2001
From: LPM
Date: Wed, 13 Nov 2024 10:43:18 +0100
Subject: [PATCH] Add execution section to README.md

Removed the TODO list, as it duplicates the paper.
---
 README.md | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index bb8238a..c203c7e 100644
--- a/README.md
+++ b/README.md
@@ -15,11 +15,16 @@
 Right now, there are two lists of models to choose from (feel free to customize).
 The first one contains models with less than ten billion parameters plus Microsoft's `Phi-3-Medium`.
 The second one contains models with up to 34 billion parameters, as this was the physical limit that our hardware could handle.
 
-## TODO
+## Execution
+
+Requirements:
+* Python
+* `transformers`
+* a capable GPU
+
+1. Edit `config.yaml` to your liking
+2. Run `python pipeline.py`
+
+The script generates a folder for each step containing that step's results.
 
-- Find GPU clusters that support models with more than 34B params
-- Include more models, create a curated list of well performing models
-- Use the output of the pipeline for fine tuning of a much smaller model and evaluate
-- Add a method to handle really large knowledge graphs (e.g. subsampling and splitting into multiple chunks that fit the context size)
-- Add a GUI for model selection and progress monitoring (low priority)
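
As a minimal sketch of the execution steps the patch adds, the snippet below wraps them in Python. It assumes only the file names the README already mentions (`config.yaml`, `pipeline.py`); using PyYAML for the config sanity check and PyTorch for the GPU check are added assumptions, not requirements stated by the project.

```python
# Rough sketch of a wrapper around the two execution steps, under the stated assumptions.
import subprocess

import torch  # assumption: PyTorch backs the transformers models here
import yaml   # assumption: PyYAML is available to parse config.yaml

# Step 1 is done by hand (edit config.yaml); here we only check that it parses.
# The actual keys are project-specific, so nothing beyond valid YAML is assumed.
with open("config.yaml") as f:
    config = yaml.safe_load(f)
print(f"config.yaml parsed, top-level keys: {sorted(config)}")

# The requirements list a capable GPU, so warn early if none is visible.
if not torch.cuda.is_available():
    print("Warning: no CUDA GPU detected.")

# Step 2: run the pipeline, which writes one results folder per step.
subprocess.run(["python", "pipeline.py"], check=True)
```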