
Open-source LLM for simplifying and understanding the CNCF ecosystem.


## 📖 About

Deep Cloud Native Computing Foundation, or DeepCNCF for short, is an open-source LLM that simplifies the Cloud Native ecosystem by addressing the information overload and fragmentation within the CNCF landscape. It aims to provide users with detailed, contextual answers about any CNCF project.

It was developed as part of the AMOS project; our industry partner is Kubermatic. The project consists of a pipeline that gathers the necessary information about CNCF Landscape projects (including documentation, PDFs, YAML files, JSON files, READMEs, and matching Stack Overflow question/answer pairs), creates a question/answer-pair dataset from the collected data using Google Gemma, merges it with the gathered Stack Overflow question/answer pairs, and fine-tunes the Google Gemma 2B IT, Google Gemma 7B IT, and Google Gemma-2 9B IT models on the resulting data.
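The dataset-merging step described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the project's actual code: it assumes both the generated pairs and the Stack Overflow pairs share a simple `{"question": ..., "answer": ...}` schema, and that duplicates are dropped by exact question match.

```python
def merge_qa_datasets(generated, stackoverflow):
    """Merge generated Q/A pairs with Stack Overflow Q/A pairs into one
    dataset, dropping pairs whose question duplicates an earlier one.
    (Hypothetical sketch; the real pipeline may deduplicate differently.)"""
    seen = set()
    merged = []
    # Generated pairs come first, so on a duplicate question the
    # generated answer is the one that is kept.
    for pair in list(generated) + list(stackoverflow):
        key = pair["question"].strip().lower()
        if key not in seen:
            seen.add(key)
            merged.append(pair)
    return merged

# Tiny illustrative inputs (made up for this sketch):
generated = [
    {"question": "What is Kubernetes?", "answer": "A container orchestrator."},
]
so_pairs = [
    {"question": "What is Kubernetes?", "answer": "An orchestration platform."},
    {"question": "What does kubectl do?", "answer": "It talks to the API server."},
]

merged = merge_qa_datasets(generated, so_pairs)
```

The merged list is then what a fine-tuning run would consume as training examples.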

## 🚀 Features

## 📊 Datasets

## 🤖 Models

## 📁 Folder Structure

- Deliverables: Contains all AMOS-specific homework, referenced by the sprint number in which it was due.
- Documentation: Contains the documentation on how to run the project.
- src: Contains all the source code of the project.
  - src/hpc_scripts: Contains scripts specifically tailored to run on the HPC (High Performance Cluster) of the FAU, mostly for interacting with LLMs.
  - src/scripts: Contains all general-purpose scripts (e.g. scraping data from the CNCF Landscape and Stack Overflow, data formatting, deploying the model).
  - src/landscape_scraper: Contains scripts for scraping the webpages of the CNCF landscape.
- test: Contains all unit tests and integration tests.

## 🤔 Getting Started

If you want to run the data-gathering and training pipelines yourself, or use them to gather your own data, follow the steps provided in the Documentation.

Additional information can be found in the Wiki.