Vision model for chat now working #573
w4ffl35 announced in Announcements
Replies: 1 comment
The performance fix I mentioned above has been committed and merged into master. I'll be looking into ways to improve this feature over the next few days.
The vision model, albeit often incorrect, is now working as expected in the master branch. That is, a worker runs in a separate thread, collecting images from your webcam. The images are run through a vision model, which labels them.
The labels are stored in an array, and the last 10 labels are passed to the LLM chat prompt. In this way, AI Runner can observe you as you use it. This isn't particularly useful beyond novelty at the moment, but the goal is to turn this feature into a meaningful input feed for the prompt so that the chatbot can respond more naturally to the user's facial expressions and environment. A rough sketch of this flow is below.
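For anyone curious how the pieces fit together, here is a minimal sketch of that flow. It is not AI Runner's actual implementation; the `VisionWorker` name, the Hugging Face image-classification pipeline, and the prompt wording are assumptions used purely for illustration.

```python
import threading
import time
from collections import deque

import cv2
from PIL import Image
from transformers import pipeline


class VisionWorker:
    """Captures webcam frames in a background thread and keeps the most recent labels."""

    def __init__(self, max_labels: int = 10, interval: float = 1.0):
        self.labels = deque(maxlen=max_labels)   # only the last N labels are kept
        self.interval = interval
        self.classifier = pipeline("image-classification")  # any labeling model works here
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        cap = cv2.VideoCapture(0)  # default webcam
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # OpenCV returns BGR; the model expects RGB
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            top = self.classifier(image)[0]["label"]  # keep the top-1 label
            self.labels.append(top)
            time.sleep(self.interval)
        cap.release()

    def prompt_context(self) -> str:
        """Render the recent labels as extra context for the chat prompt."""
        return "Recently observed through the webcam: " + ", ".join(self.labels)
```

A chat handler could then prepend `worker.prompt_context()` to the prompt before each LLM call, which is roughly how the observed labels reach the chatbot.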
Additionally, in its current form, the vision model slows the LLM because it is generating labels at the same time the LLM is generating text. I will be working on modifications to improve performance in this area.