Vision model for chat now working #573
w4ffl35 announced in Announcements
Replies: 1 comment
The performance fix I mentioned above has been committed and merged into master. I'll be looking into ways to improve this feature over the next few days.
The vision model, albeit often incorrect, is now working as expected in the master branch. That is, a worker runs in a separate thread, collecting images from your webcam. The images are run through a vision model, which labels them.
The labels are stored in an array, and the last 10 labels are passed to the LLM chat prompt. In this way, AI Runner can observe you as you use it. This isn't particularly useful beyond novelty at the moment, but the goal is to turn this feature into a meaningful input feed for the prompt so that the chatbot can respond more naturally to the user's facial expressions and environment. A rough sketch of this flow is below.
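For anyone curious how the pieces fit together, here is a minimal sketch of that flow. It is not AI Runner's actual implementation; the `VisionWorker` name, the Hugging Face image-classification pipeline, and the prompt wording are assumptions used purely for illustration.

```python
import threading
import time
from collections import deque

import cv2
from PIL import Image
from transformers import pipeline


class VisionWorker:
    """Captures webcam frames in a background thread and keeps the most recent labels."""

    def __init__(self, max_labels: int = 10, interval: float = 1.0):
        self.labels = deque(maxlen=max_labels)   # only the last N labels are kept
        self.interval = interval
        self.classifier = pipeline("image-classification")  # any labeling model works here
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        cap = cv2.VideoCapture(0)  # default webcam
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # OpenCV returns BGR; the model expects RGB
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            top = self.classifier(image)[0]["label"]  # keep the top-1 label
            self.labels.append(top)
            time.sleep(self.interval)
        cap.release()

    def prompt_context(self) -> str:
        """Render the recent labels as extra context for the chat prompt."""
        return "Recently observed through the webcam: " + ", ".join(self.labels)
```

A chat handler could then prepend `worker.prompt_context()` to the prompt before each LLM call, which is roughly how the observed labels reach the chatbot.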
Additionally, in its current form, the vision model slows the LLM because it is generating labels at the same time the LLM is generating text. I will be working on modifications to improve performance in this area.