Recognize runs out of memory when clustering faces #1033

Closed
deddc23efb opened this issue Nov 17, 2023 · 4 comments

@deddc23efb

Which version of recognize are you using?

5.0.3

Enabled Modes

Face recognition

TensorFlow mode

Normal mode

Downstream App

Memories App

Which Nextcloud version do you have installed?

27.1.3

Which Operating system do you have installed?

Ubuntu 22.04

Which database are you running Nextcloud on?

MariaDB

Which Docker container are you using to run Nextcloud? (if applicable)

No response

How much RAM does your server have?

16GB

What processor architecture does your CPU have?

x86_64

Describe the Bug

Recognize eventually consumes all available memory when occ cluster-faces is run from the CLI against a large photo collection.
In my particular case, approx. 37,000 faces have been identified. The clustering stage always fails: PHP is killed because it runs out of memory.
The issue is the TensorFlow library being used. I don't know how its API works, but Recognize won't work on large photo sets with very many faces. The solution is not to throw memory at the problem; Recognize should be architected to handle libraries like this and not simply allow memory usage to grow unbounded.

Expected Behavior

I expect a recommended tool like Recognize to be stable and well behaved across a wide variety of libraries. If the tool cannot handle a large photo set, it shouldn't be advertised as the facial recognition application for Nextcloud.

To Reproduce

Run Recognize against a large dataset of faces.
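
For concreteness, a sketch of the invocation that triggers this, assuming a standard install where occ lives in the Nextcloud root, runs as the web server user, and the command is registered as recognize:cluster-faces (paths and user will differ per setup):

$ sudo -u www-data php /var/www/nextcloud/occ recognize:cluster-faces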

Debug log

The only debug output that shows what is happening is the "Killed" message when the php occ process is terminated, plus syslog entries showing that PHP was killed by the kernel OOM killer.
$ grep Killed /var/log/syslog
Nov 17 05:37:10 cloud kernel: [455477.743345] Out of memory: Killed process 83279 (php) total-vm:7361252kB, anon-rss:2855060kB, file-rss:3060kB, shmem-rss:0kB, UID:33 pgtables:14068kB oom_score_adj:0
Nov 17 06:06:27 cloud kernel: [457235.666042] Out of memory: Killed process 83583 (php) total-vm:7514852kB, anon-rss:3070040kB, file-rss:3604kB, shmem-rss:0kB, UID:33 pgtables:14384kB oom_score_adj:0
Nov 17 14:28:35 cloud kernel: [ 2075.671577] Out of memory: Killed process 1534 (php) total-vm:12587752kB, anon-rss:6328024kB, file-rss:2528kB, shmem-rss:0kB, UID:33 pgtables:24308kB oom_score_adj:0
Nov 17 18:17:15 cloud kernel: [15794.989675] Out of memory: Killed process 3104 (php) total-vm:12861156kB, anon-rss:6557152kB, file-rss:3256kB, shmem-rss:0kB, UID:33 pgtables:24800kB oom_score_adj:0
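
Not part of the original log, but one way to watch the memory of the clustering process while it runs, so the growth is visible before the OOM killer fires (plain procps ps under watch, sorted by resident set size; adjust the process name if your PHP binary differs):

$ watch -n 10 'ps -C php -o pid,rss,vsz,cmd --sort=-rss | head -n 5'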

deddc23efb added the bug label on Nov 17, 2023

Hello 👋

Thank you for taking the time to open this issue with recognize. I know it's frustrating when software causes problems. You have made the right choice to come here and open an issue to make sure your problem gets looked at and if possible solved. I try to answer all issues and if possible fix all bugs here, but it sometimes takes a while until I get to it. Until then, please be patient.

Note also that GitHub is a place where people meet to make software better together. Nobody here is under any obligation to help you, solve your problems or deliver on any expectations or demands you may have, but if enough people come together we can collaborate to make this software better. For everyone. Thus, if you can, you could also look at other issues to see whether you can help other people with your knowledge and experience. If you have coding experience it would also be awesome if you could step up to dive into the code and try to fix the odd bug yourself. Everyone will be thankful for extra helping hands!

One last word: If you feel, at any point, like you need to vent, this is not the place for it; you can go to the forum, to Twitter or somewhere else. But this is a technical issue tracker, so please make sure to focus on the tech and keep your opinions to yourself. (Also see our Code of Conduct. Really.)

I look forward to working with you on this issue
Cheers 💙

@marcelklehr
Member

Have you tried using the batch size parameter for the cluster-faces command?
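
(The exact way the batch size is passed, as an option or as an argument, is best taken from the command's own help output; the command name recognize:cluster-faces and the www-data user below are assumptions for a typical setup.)

$ sudo -u www-data php occ recognize:cluster-faces --help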

@deddc23efb (Author) commented Nov 17, 2023

No - I didn't see that documented. I'm currently seeing clustering of 38,000 recognized faces fail on 24GB of RAM. I will try the batch parameter and report back.
Update: Using batch-size gets things working. I'm able to run with a batch size of 3000, which is stable at about 1.1GB of RAM. A batch size of 5000 started a worrying memory climb again.

I'm currently running the cluster command in a while loop with batch-size 3000, and the number of outstanding clusters reported on the Admin -> Recognize page is dropping.
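
Roughly, the loop looks like the sketch below; the exact spelling of the batch-size parameter is whatever occ recognize:cluster-faces --help reports, and the occ path and user depend on the install:

# Re-run clustering in batches of 3000 until a run exits non-zero.
while true; do
    sudo -u www-data php /var/www/nextcloud/occ recognize:cluster-faces --batch-size 3000 || break
    sleep 10
done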

It looks like the tool is doing what it should. It would be nice if the guidance on the Recognize admin page suggested using batch-size with the occ cluster-faces command; that might cut down on the number of out-of-memory bug reports like this one.

Edit again: I've created a pull request with a small change to the description of the occ cluster-faces command.

@marcelklehr
Member

Thank you. Closing this for now :)
