A simple, easy-to-use library for doing LLM inference directly from Delphi. It can load GGUF-formatted LLMs into CPU or GPU memory and uses a Vulkan backend for acceleration.
- Download Dllama and extract to a desired location.
- Download a GGUF model from Hugging Face (supported by llama.cpp). I've been testing using Hermes-2-Pro-Mistral-7B-GGUF.
- If you have a Vulkan-supported GPU, inference will be accelerated; otherwise it will run on the CPU. You will not be able to use a model larger than your available resources, so take note of how much memory the model requires.
- See the examples in the `installdir\examples` folder for how to use Dllama in Delphi. Be sure to update the `CModelPath` and `CModelFilename` constants used by the examples to valid values on your system.
- This project was built using Delphi 12.1 (latest), Windows 11 (latest), an Intel Core i5-12400F at 2.5 GHz (6 cores, 12 logical processors), 36 GB RAM, and an NVIDIA RTX 3060 GPU with 12 GB VRAM.
- Please test it and feel free to submit pull requests. I want to make it into something very cool for us Delphi developers.
- If this project is useful to you, consider starring the repo, sponsoring it, spreading the word, etc. Any help is greatly welcomed and appreciated.
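Since a model larger than your free RAM/VRAM will fail to load (or thrash badly), it can help to compare the GGUF file size against available memory before calling `LoadModel`. The sketch below is a Windows-only illustration using only standard Delphi RTL and Win32 calls (`TFileStream`, `GlobalMemoryStatusEx`); the path is the same example model used elsewhere in this README, and the "file size ≈ load size" heuristic is an approximation, not an exact figure:

uses
  System.SysUtils, System.Classes, Winapi.Windows;

var
  LStream: TFileStream;
  LModelSize: Int64;
  LStatus: TMemoryStatusEx;
begin
  // size of the GGUF file on disk (a rough proxy for the memory needed to load it)
  LStream := TFileStream.Create('C:\LLM\gguf\Hermes-2-Pro-Mistral-7B.Q4_0.gguf',
    fmOpenRead or fmShareDenyWrite);
  try
    LModelSize := LStream.Size;
  finally
    LStream.Free();
  end;

  // query available physical memory via Win32
  LStatus.dwLength := SizeOf(LStatus);
  GlobalMemoryStatusEx(LStatus);

  if UInt64(LModelSize) > LStatus.ullAvailPhys then
    WriteLn('Warning: model is larger than available physical RAM')
  else
    WriteLn(Format('Model: %d MB, free RAM: %d MB',
      [LModelSize div (1024 * 1024),
       Int64(LStatus.ullAvailPhys div (1024 * 1024))]));
end.

For GPU inference the same idea applies to VRAM, but querying free VRAM is vendor-specific and not shown here.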
A query example:
uses
  System.SysUtils,
  Dllama.Utils,
  Dllama;

const
  // update to your model path
  CModelPath = 'C:\LLM\gguf';

  // update to your model filename
  CModelFilename = 'Hermes-2-Pro-Mistral-7B.Q4_0.gguf';

var
  LDllama: TDllama;
  LResponse: string;
  LUsage: TDllama.Usage;

begin
  // clear console
  Console.Clear();

  // display title
  Console.PrintLn('Dllama - Query Example'+Console.CRLF, Console.MAGENTA);

  LDllama := TDllama.Create();
  try
    // set model path
    LDllama.SetModelPath(CModelPath);

    // try to load model
    if LDllama.LoadModel(CModelFilename, 1024) then
    begin
      // show loaded model filename
      Console.ClearLine(Console.WHITE);
      Console.Print('Loaded model: "%s"'+Console.CRLF+Console.CRLF,
        [LDllama.GetModelFilename()], Console.CYAN);

      // add messages
      LDllama.AddSystemMessage('You are a helpful AI assistant.');
      LDllama.AddUserMessage('How to make KNO3?');

      // display user message
      Console.Print(LDllama.GetUserMessage(), Console.DARKGREEN);

      // do inference
      if LDllama.Inference(LResponse, @LUsage) then
      begin
        // display usage
        Console.PrintLn();
        Console.PrintLn();
        Console.PrintLn('Tokens :: Input: %d, Output: %d, Total: %d',
          [LUsage.InputTokens, LUsage.OutputTokens, LUsage.TotalTokens],
          Console.BRIGHTYELLOW);
        Console.PrintLn('Speed :: Input: %3.2f t/s, Output: %3.2f t/s',
          [LUsage.TokenInputSpeed, LUsage.TokenOutputSpeed],
          Console.BRIGHTYELLOW);
      end;

      // unload model
      LDllama.UnloadModel();
    end
    else
      // failed to load model
      Console.Print(Console.CRLF+'failed to load model!', Console.RED);
  finally
    LDllama.Free();
  end;
end.
Our development motto:
- We will not release products that are buggy or incomplete, or add new features before fixing underlying issues.
- We will strive to fix issues found with our products in a timely manner.
- We will maintain an attitude of quality over quantity for our products.
- We will establish a great rapport with users/customers through communication, transparency, and respect, always encouraging feedback to help shape the direction of our products.
- We will be decent, fair, remain humble and committed to the craft.