Replies: 2 comments 1 reply
-
I've thought of using a C++ implementation of those models.
-
Following from #44: soon there'll be a release of 4-bit quantization directly from Hugging Face itself, so maybe we don't really need to do anything? lol
-
I have a suggestion to make this project even better: just use koboldcpp, because it can run Pygmalion 6B with only 4 GB of RAM and can be GPU-accelerated. A reply takes about 12 seconds. With this, AIwaifu can be smarter because it uses the 6B model.
Link to koboldcpp: https://github.com/LostRuins/koboldcpp
I also made a koboldcpp version of AIwaifu: https://github.com/andri-jpg/AIwaifu-png
It's just that I interact with the localhost server using Selenium; I haven't been able to integrate with it directly from Python.
For a demo, you can see here: https://youtu.be/TzU27v9Hf6Q
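The Selenium workaround may not be necessary: koboldcpp exposes a KoboldAI-compatible HTTP API on localhost, so Python can call it directly. A minimal sketch, assuming koboldcpp is running on its default port 5001 and serving the `/api/v1/generate` endpoint (the port and payload fields here are assumptions; check your koboldcpp version's API docs):

```python
import json
import urllib.request

# Assumed default koboldcpp endpoint; adjust host/port to your setup.
KOBOLD_URL = "http://localhost:5001/api/v1/generate"

def build_payload(prompt: str, max_length: int = 80) -> dict:
    """Build the JSON body for the Kobold generate endpoint."""
    return {
        "prompt": prompt,
        "max_length": max_length,  # number of tokens to generate
        "temperature": 0.7,        # sampling temperature
    }

def generate(prompt: str) -> str:
    """POST the prompt to the local koboldcpp server and return the reply text."""
    req = urllib.request.Request(
        KOBOLD_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The Kobold API returns {"results": [{"text": "..."}]}
        return json.load(resp)["results"][0]["text"]
```

This removes the browser dependency entirely, so AIwaifu could talk to the model with a plain HTTP request instead of driving the web UI.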