Design Documentation
The “Ailixir” project is a framework that utilises Large Language Models (LLMs) to develop customised AI agents tailored to various domains (e.g., medical, business, biology). These agents can communicate via text and voice, adapt to user input and refine interactions over time. They are based on handpicked content sources (e.g., YouTube videos, scientific papers), from which information is retrieved to customise the LLM.
The user provides a set of input sources and/or a set of user-specific parameters to configure the data pipeline. The content of these input sources (e.g., a YouTube video or scientific paper) is then retrieved by the data pipeline components and later fed to the customised AI agent. Once the agent has been generated, the user can converse with it through a chat or voice interface. The app frontend has not yet been implemented.
The Config component defines the sources from which information is scraped (also called scraping targets). It currently supports:
- YouTube channels (information is retrieved from all videos on the channel)
- Episodes of the Peter Attia Podcast
- Articles from arXiv and PubMed
- Recipes from Allrecipes
- Nutrition-related blog posts from NutritionFacts
Furthermore, the component is used to set up a local database to store the information scraped from the input sources.
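To make the idea concrete, the following is a minimal sketch of how scraping targets and the local database location might be declared. The class and field names here are assumptions for illustration only and do not reflect the actual Config API.

```python
# Hypothetical sketch of an input config; names are illustrative, not the real API.
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    YOUTUBE_CHANNEL = "youtube_channel"
    PODCAST = "podcast"
    ARXIV = "arxiv"
    PUBMED = "pubmed"
    ALLRECIPES = "allrecipes"
    NUTRITIONFACTS = "nutritionfacts"


@dataclass
class ScrapingTarget:
    source_type: SourceType
    identifier: str  # e.g. a channel handle, search query, or feed URL


@dataclass
class InputConfig:
    targets: list[ScrapingTarget] = field(default_factory=list)
    db_path: str = "data/scraped.sqlite"  # local database for the scraped content


config = InputConfig(
    targets=[
        ScrapingTarget(SourceType.YOUTUBE_CHANNEL, "@SomeChannel"),
        ScrapingTarget(SourceType.ARXIV, "retrieval augmented generation"),
    ]
)
```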
The Orchestrator component manages the information retrieval process by scheduling scraping jobs and ensuring that no source is scraped twice (it stores an ID for each scraped source). The component is run after the input config has been set up in order to perform the data acquisition.
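A minimal sketch of this deduplication logic is shown below, assuming an SQLite table of already-scraped source IDs; the table, function and parameter names are hypothetical.

```python
# Illustrative orchestrator loop: skip sources whose IDs are already recorded.
import sqlite3


def already_scraped(conn: sqlite3.Connection, source_id: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM scraped_sources WHERE source_id = ?", (source_id,)
    ).fetchone()
    return row is not None


def run_orchestrator(conn: sqlite3.Connection, targets, scrapers) -> None:
    """Run the matching scraper for every target that has not been scraped yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped_sources (source_id TEXT PRIMARY KEY)"
    )
    for target in targets:
        if already_scraped(conn, target.identifier):
            continue  # source was scraped before, skip it
        scraper = scrapers[target.source_type]
        scraper.scrape(target)  # stores transcript and metadata in the local DB
        conn.execute(
            "INSERT INTO scraped_sources (source_id) VALUES (?)",
            (target.identifier,),
        )
        conn.commit()
```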
For each type of input source (as detailed in the Input Config section), there exists a scraper. Scrapers are responsible for retrieving information - transcript and metadata - from a provided source. Metadata includes, for instance, publication date and authors. Once a source has been scraped, the data is saved in the local database and the source's ID is stored so that it is not scraped again.
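The common scraper interface could look roughly like the following sketch; the class names and the exact shape of the returned data are assumptions rather than the project's actual definitions.

```python
# Illustrative scraper interface; the real scrapers likely differ in detail.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ScrapedDocument:
    source_id: str
    transcript: str  # full text or video/podcast transcript
    metadata: dict   # e.g. {"title": ..., "authors": [...], "published": ...}


class Scraper(ABC):
    @abstractmethod
    def scrape(self, target) -> list[ScrapedDocument]:
        """Retrieve transcript and metadata for every item behind the target."""


class ArxivScraper(Scraper):
    def scrape(self, target) -> list[ScrapedDocument]:
        # A real implementation would query the arXiv API and parse the results;
        # the body is omitted here to keep the sketch short.
        raise NotImplementedError
```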
The RAG component takes the scraped information stored in the local database and creates embeddings from it. This context data is stored in a vector database that is later used by the LLM. More detailed information on RAG can, for example, be found here.
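As an illustration, the embedding step could be implemented with LangChain roughly as follows. The concrete text splitter, embedding model and vector store (here Chroma with OpenAI embeddings) are assumptions and may differ from the project's actual setup.

```python
# Sketch of the embedding step: split scraped text into chunks, embed them,
# and persist the vectors in a local vector database.
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


def index_documents(raw_texts: list[str], persist_dir: str = "data/vectordb") -> Chroma:
    # Split the scraped transcripts into overlapping chunks so each embedding
    # covers a manageable piece of context.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = [chunk for text in raw_texts for chunk in splitter.split_text(text)]

    # Embed the chunks and store them in a persistent Chroma collection.
    return Chroma.from_texts(chunks, OpenAIEmbeddings(), persist_directory=persist_dir)
```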
We use the LangChain framework to generate answers from a user prompt, the retrieved context data and the LLM. This component is still under development.
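A hedged sketch of how such an answer-generation step might look with LangChain is given below; the prompt wording, model name and retrieval parameters are placeholders rather than the project's actual choices.

```python
# Sketch of answer generation: retrieve relevant chunks from the vector database,
# build a prompt containing them, and ask the LLM.
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model name


def answer(question: str, vectordb) -> str:
    docs = vectordb.similarity_search(question, k=4)  # fetch context chunks
    context = "\n\n".join(doc.page_content for doc in docs)
    chain = prompt | llm  # LangChain expression language pipeline
    return chain.invoke({"context": context, "question": question}).content
```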
We use the computing resources of the FAU HPC cluster to scrape data and store it. This data will be fed to the LLMs later in development. We are adapting a Google Cloud solution to work with our embeddings database for the RAG Generator, which is currently being implemented.
- The Authentication Module interacts with the Authentication Server to verify user credentials and manage sessions.
- Users interact with the React Native app, triggering actions like querying the RAG system.
- The Frontend (Networking Layer) sends requests to the Backend via HTTP.
- Data Processing:
  - The LangChain Processor first interacts with the Retrieval Engine to fetch relevant documents from the Document Store.
  - A new prompt is created from the fetched documents and passed to the LLM.
  - The generated response and, where needed, the retrieved documents and other model outputs are stored in Storage.
- Generated responses are sent back to the frontend via HTTP (see the backend sketch below).
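The following is a minimal backend sketch of this request flow, assuming a FastAPI server; the framework choice, endpoint name and response shape are assumptions, since the actual backend is not specified in this document.

```python
# Hypothetical query endpoint: the frontend posts a question, the backend runs
# retrieval + LLM generation and returns the answer.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Query(BaseModel):
    question: str


@app.post("/query")
def query_rag(query: Query) -> dict:
    # In the real system the LangChain Processor would be called here:
    # retrieve documents, build the prompt, query the LLM, store the outputs.
    answer_text = "placeholder answer"  # stand-in for the generated response
    return {"answer": answer_text}
```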