Build, user, and technical documentation
Software architecture description
- Python 3
Install the requirements by running the following command from the root directory:
pip install -r requirements.txt
To create a YAML file with the CNCF Landscape, follow these instructions:
- Populate with repository data
  - You will need a GitHub token to access the API; refer to "Creating a personal access token". Copy and paste it into the appropriate location in the script landscape_explorer.py, replacing "test_token" (see the sketch after these steps for how such a token is typically used).
  - Go to the folder src/scripts:
    cd src/scripts
  - Execute landscape_explorer.py:
    python landscape_explorer.py
- Populate with scraped data from websites
  - Go to the folder src/landscape_scraper and execute:
    scrapy crawl docs -O output.json
  - Go to the folder src/scripts and execute:
    python augment_landscape.py
  - The resulting landscape_augmented_repos_websites.yml will be in the sources folder.
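The exact API calls are defined in landscape_explorer.py itself; purely as orientation, the sketch below shows how a GitHub personal access token is commonly passed to the GitHub REST API with the requests library. The example repository and the way the token is stored here are assumptions, not taken from the script.

```python
# Hedged sketch: common pattern for using a GitHub token against the REST API.
# landscape_explorer.py may structure this differently; "cncf/landscape" and the
# token variable below are illustrative placeholders only.
import requests

GITHUB_TOKEN = "test_token"  # replace with your personal access token

headers = {
    "Authorization": f"Bearer {GITHUB_TOKEN}",
    "Accept": "application/vnd.github+json",
}

# Fetch basic metadata for one repository as a quick check that the token works.
resp = requests.get("https://api.github.com/repos/cncf/landscape", headers=headers)
resp.raise_for_status()
repo = resp.json()
print(repo["full_name"], repo["stargazers_count"])
```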
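To sanity-check the generated file, you can load it with PyYAML. This is a minimal sketch, assuming the file ends up at sources/landscape_augmented_repos_websites.yml relative to the repository root; adjust the path if your layout differs.

```python
# Hedged sketch: quick sanity check of the generated landscape file.
# Run from the repository root; the path below is an assumption about the layout.
import yaml

with open("sources/landscape_augmented_repos_websites.yml", encoding="utf-8") as fh:
    landscape = yaml.safe_load(fh)

# The exact schema follows the CNCF landscape format; here we only report the
# top-level structure to confirm the file parsed correctly.
print(type(landscape).__name__)
if isinstance(landscape, dict):
    print("top-level keys:", list(landscape.keys()))
```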
The 'run_all.sh' script automates environment setup, ETL processes, and Q&A generation tasks.
- Environment Variables: Create a .env file in the root directory with the following content (a minimal sketch of reading these variables from Python follows this list):
  GITHUB_TOKEN=<YOUR_GITHUB_TOKEN>
  HF_TOKEN=<YOUR_HUGGING_FACE_TOKEN>
  Replace <YOUR_GITHUB_TOKEN> with your GitHub token obtained as described earlier, and <YOUR_HUGGING_FACE_TOKEN> with your Hugging Face token, which can be found at https://huggingface.co/settings/tokens.
- Execute from Root Directory: Run the script from the root directory of your project:
  ./run_all.sh [etl] [qa] <data_set_id>
  For example, this command executes the ETL process and uploads the output to the specified dataset:
  ./run_all.sh SuperOrganization/WorldDataset
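For reference, Python code can pick these tokens up from the .env file with python-dotenv. The snippet below is a minimal sketch of that pattern under the assumption that python-dotenv is available; the project's own scripts may load the variables differently (for example, by exporting them from the shell script).

```python
# Hedged sketch: reading the tokens defined in .env via python-dotenv.
# This is illustrative only; the repository's scripts may use another mechanism.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

github_token = os.environ["GITHUB_TOKEN"]
hf_token = os.environ["HF_TOKEN"]
print("tokens loaded:", bool(github_token), bool(hf_token))
```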
You can find a Jupyter notebook that you can use to train on Google Colab or, if you have the resources, locally, in src/scripts/training/initial_colab_training.ipynb.
Additionally, if you want to train on a server, you can find the necessary scripts in src/hpc_scripts. Copy this directory and then follow the instructions below.
To execute an example training script, run ./training_job.sbatch in src/hpc_scripts/training. This will start src/hpc_scripts/training/model_training.py. The hyperparameters were found using hyperparameter tuning; they may need to be adjusted for your specific use case.
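As a rough guide to where such hyperparameters usually live, the sketch below builds a Hugging Face TrainingArguments object with illustrative values. The parameter values (and whether model_training.py uses this exact API) are assumptions; treat it only as an example of which knobs to adjust.

```python
# Hedged sketch: typical Hugging Face TrainingArguments whose values you would
# tune per use case. These numbers are illustrative, not the ones used by
# src/hpc_scripts/training/model_training.py.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,               # more epochs can help on small datasets
    per_device_train_batch_size=4,    # limited by GPU memory
    gradient_accumulation_steps=4,    # effective batch size = 4 * 4
    learning_rate=2e-5,               # common starting point for fine-tuning
    warmup_ratio=0.05,
    weight_decay=0.01,
    logging_steps=10,
    save_strategy="epoch",
)
print(training_args.learning_rate, training_args.num_train_epochs)
```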
If you want to use the model with LocalAI, run LocalAI in a Docker container using a Docker image provided by LocalAI on Docker Hub. You also need to pass a model configuration file to the container so that LocalAI knows which model to serve. All necessary commands are provided in src/scripts/GUI/preparation_scripts.sh.
Note
If you want to use a GPU with LocalAI, you need to:
- Install the NVIDIA driver and CUDA toolkit.
- Install the NVIDIA Container Toolkit.
- Pull and run the LocalAI image from Docker Hub. You can find all necessary commands in src/scripts/GUI/preparation_scripts.sh as well.
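Once the container is running, LocalAI exposes an OpenAI-compatible HTTP API. The sketch below sends a chat request with the requests library; the port (8080) and the model name "my-model" are assumptions and must match your docker run command and the name in your model configuration file.

```python
# Hedged sketch: calling LocalAI's OpenAI-compatible chat endpoint.
# Port 8080 and the model name "my-model" are placeholders; use the values
# from your container setup and model configuration file.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "What is the CNCF landscape?"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```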