Recognize runs out of memory when clustering faces #1033
Comments
Hello 👋 Thank you for taking the time to open this issue with recognize. I know it's frustrating when software I look forward to working with you on this issue
Have you tried using the batch size parameter for the cluster-faces command?
No, I didn't see that documented. Clustering on 38,000 recognized faces is currently failing with 24 GB of RAM. I will try the batch parameter and report back. I'm currently running the cluster command in a while loop with batch-size 3000, and the number of outstanding clusters shown on the admin->recognize page is dropping. It looks like the tool is doing what it should. It would be nice if the guidance on the recognize config page suggested using batch-size with the occ cluster-faces command. That might minimize the number of annoyance bug reports raised about running out of memory. Edit again: I've created a pull request with a small change to the description of the occ cluster-faces command.
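For anyone who wants to script the workaround described above, here is a minimal sketch of running clustering in repeated, bounded passes. It is not taken from the Recognize documentation: the command name `recognize:cluster-faces` and the `--batch-size` flag spelling are assumptions based on this thread, and `build_command`, `run_passes`, and `subprocess_runner` are hypothetical helper names. Check `occ` help output on your own install before relying on it.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: command name and flag spelling are illustrative only.
# Verify them against the occ help output of your Recognize version.
OCC_COMMAND = ["php", "occ", "recognize:cluster-faces"]


def build_command(batch_size: int) -> list[str]:
    """Return the occ invocation for one bounded clustering pass."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [*OCC_COMMAND, "--batch-size", str(batch_size)]


def run_passes(
    batch_size: int,
    passes: int,
    runner: Callable[[Sequence[str]], int],
) -> int:
    """Run up to `passes` clustering passes, stopping on a non-zero exit code.

    Returns the number of passes that exited successfully.
    """
    completed = 0
    for _ in range(passes):
        if runner(build_command(batch_size)) != 0:
            break  # e.g. the process was killed; do not keep hammering
        completed += 1
    return completed


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Execute the command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode
```

Calling `run_passes(3000, 20, subprocess_runner)` from the Nextcloud directory, as the web server user, would mirror the while loop with batch-size 3000 mentioned above; the pass count of 20 is arbitrary. Passing the runner in as a parameter keeps the loop logic testable without a Nextcloud install.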
Thank you. Closing this for now :)
Which version of recognize are you using?
5.0.3
Enabled Modes
Face recognition
TensorFlow mode
Normal mode
Downstream App
Memories App
Which Nextcloud version do you have installed?
27.1.3
Which Operating system do you have installed?
Ubuntu 22.04
Which database are you running Nextcloud on?
MariaDB
Which Docker container are you using to run Nextcloud? (if applicable)
No response
How much RAM does your server have?
16GB
What processor Architecture does your CPU have?
x86_64
Describe the Bug
Recognize eventually consumes all memory when running occ cluster-faces from the CLI against large photo collections.
In my particular case, I have approximately 37,000 identified faces. The clustering stage always fails and PHP is killed due to running out of memory.
The issue is the TensorFlow library being used. I don't know how its API works, but Recognize won't work on large photo sets with very many faces. The solution is not to throw memory at the problem. Recognize should be architected to handle libraries like this rather than simply allowing memory usage to grow unbounded.
Expected Behavior
I expect a recommended tool like Recognize to be stable and well behaved against a wide variety of libraries. If the tool cannot handle a large photo set then it shouldn't be advertised as the facial recognition application for Nextcloud.
To Reproduce
Run Recognize against a large dataset of faces.
Debug log
The only debug output that shows what is happening is the "Killed" message printed when the php occ process is killed, plus messages in syslog showing that php was killed by the kernel OOM killer.
$ grep Killed /var/log/syslog
Nov 17 05:37:10 cloud kernel: [455477.743345] Out of memory: Killed process 83279 (php) total-vm:7361252kB, anon-rss:2855060kB, file-rss:3060kB, shmem-rss:0kB, UID:33 pgtables:14068kB oom_score_adj:0
Nov 17 06:06:27 cloud kernel: [457235.666042] Out of memory: Killed process 83583 (php) total-vm:7514852kB, anon-rss:3070040kB, file-rss:3604kB, shmem-rss:0kB, UID:33 pgtables:14384kB oom_score_adj:0
Nov 17 14:28:35 cloud kernel: [ 2075.671577] Out of memory: Killed process 1534 (php) total-vm:12587752kB, anon-rss:6328024kB, file-rss:2528kB, shmem-rss:0kB, UID:33 pgtables:24308kB oom_score_adj:0
Nov 17 18:17:15 cloud kernel: [15794.989675] Out of memory: Killed process 3104 (php) total-vm:12861156kB, anon-rss:6557152kB, file-rss:3256kB, shmem-rss:0kB, UID:33 pgtables:24800kB oom_score_adj:0
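To compare kills like the ones above, a small parser can pull the process ID and memory figures out of each kernel line. This is only a sketch: the regular expression is written against the exact line shape shown in this report and may need adjusting for other kernel versions, and `OomKill` and `parse_oom_line` are hypothetical names. In the four lines above, anon-rss at kill time ranges from roughly 2.7 GiB to 6.3 GiB.

```python
import re
from typing import NamedTuple, Optional

# Matches the "Out of memory: Killed process ..." shape seen in the log above.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


class OomKill(NamedTuple):
    pid: int
    name: str
    total_vm_kb: int
    anon_rss_kb: int


def parse_oom_line(line: str) -> Optional[OomKill]:
    """Return the parsed OOM kill record, or None if the line does not match."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return OomKill(
        pid=int(match["pid"]),
        name=match["name"],
        total_vm_kb=int(match["total_vm"]),
        anon_rss_kb=int(match["anon_rss"]),
    )


def kb_to_gib(kilobytes: int) -> float:
    """Convert kB as printed by the kernel (1024-byte units) to GiB."""
    return kilobytes / (1024 * 1024)
```

Feeding each line of `grep Killed /var/log/syslog` through `parse_oom_line` and printing `kb_to_gib(record.anon_rss_kb)` gives the resident memory of the php process at the moment it was killed.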