Implement modern Image recognition model(s) #986
Comments
I see the models haven't been updated for some years now; it would be really great to be able to implement some new ones! As far as I understand, this app uses TensorFlow.js to run models. If anyone has suggestions, that would be great! I found this leaderboard for image classification on ImageNet: https://paperswithcode.com/sota/image-classification-on-imagenet
FYI, I did some tests with the latest https://github.com/vikhyat/moondream. It's a really impressive model for its size (1.8B parameters), but unfortunately it's a fairly general vision LLM and not made for image classification specifically. I tried passing a list of keywords in the input but didn't get great results; I guess long text inputs aren't handled well by such a small LLM. Anyway, I'm not sure it's even possible to convert it to a TensorFlow model, but given the progress these kinds of models have made recently, the current Recognize model really feels old now :/
I agree. Sadly, Nextcloud GmbH is unlikely to spend effort on revamping the models. I'm more than happy to guide any contributor willing to commit to implementing new stuff. (A good thing to work on would be #73.)
That could be interesting; I hadn't noticed this feature proposal before!
Somewhat related: in recent news, Mozilla is releasing an image recognition model for generating image alt text: https://hacks.mozilla.org/2024/05/experimenting-with-local-alt-text-generation-in-firefox-nightly/ The current model is apparently quite good and very small, and the training and model creation code is also open.
I can see a use case for organizations in including harmful-content keywords for automated flagging or removal.
Wow, Microsoft's newest Florence 2 model looks awesome (open source). I'm not sure how helpful it is here, but it could extract a description and do object detection.
Meta's brand-new open-source Segment Anything Model 2 may be very useful for object segmentation as a step toward recognition.
Describe the feature you'd like to request
The AI space is evolving quite fast, and fantastic models are coming out every few months. I would really like to see this extension start using modern AI models to improve its tagging capabilities. Ideally, the extension would be model-agnostic so that everybody can later use the model they prefer.
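To make "model-agnostic" a bit more concrete, here is a minimal TypeScript sketch of what a pluggable tagging interface could look like. Everything in it is an assumption for illustration: `ImageTagger`, `TaggerRegistry`, `selectTags`, and `FixedTagger` are hypothetical names and are not taken from the Recognize codebase, and the stand-in backend returns fixed values instead of running a real model.

```typescript
// Hypothetical sketch: none of these names come from the Recognize codebase.

/** A tag proposed by a model, with a confidence score in [0, 1]. */
interface TagPrediction {
  label: string;
  score: number;
}

/** Minimal contract any image-tagging backend would have to satisfy. */
interface ImageTagger {
  readonly name: string;
  classify(image: Uint8Array): Promise<TagPrediction[]>;
}

/** Registry that lets the app select a backend by name at runtime. */
class TaggerRegistry {
  private readonly taggers = new Map<string, ImageTagger>();

  register(tagger: ImageTagger): void {
    this.taggers.set(tagger.name, tagger);
  }

  get(name: string): ImageTagger {
    const tagger = this.taggers.get(name);
    if (tagger === undefined) {
      throw new Error(`Unknown tagger: ${name}`);
    }
    return tagger;
  }
}

/** Keep only predictions at or above the threshold, highest score first. */
function selectTags(predictions: TagPrediction[], threshold: number): string[] {
  return predictions
    .filter((p) => p.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .map((p) => p.label);
}

/** Stand-in backend with fixed predictions; a real one would wrap a model. */
class FixedTagger implements ImageTagger {
  readonly name = "fixed-demo";

  async classify(_image: Uint8Array): Promise<TagPrediction[]> {
    return [
      { label: "cat", score: 0.91 },
      { label: "sofa", score: 0.42 },
      { label: "indoor", score: 0.77 },
    ];
  }
}
```

With something along these lines, swapping models would mean registering another `ImageTagger` implementation, while the thresholding and tag-writing code stays the same.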
Describe the solution you'd like
A good candidate is the Recognize Anything Model (RAM), available on Hugging Face: https://huggingface.co/spaces/xinyu1205/recognize-anything
You can also test it directly on that page.
Describe alternatives you've considered
I don't see any alternatives to proper AI models :)