Resource Exhaustion (too many open files) #106

Open
nutjob4life opened this issue Aug 23, 2020 · 1 comment
@nutjob4life
Member

Server logs occasionally show errno 24, "Too many open files", under moderate load with pas.plugins.ldap on:

  • Plone 5.1.5
  • pas.plugins.ldap 1.7.2
  • python_ldap 3.2.0
  • node.ext.ldap 1.0b10

End users say they "got kicked out of Plone" and "can't log back in for some time". The symptoms are that lsof -p PID (where PID is the Zope instance process ID) shows a steadily increasing number of TCP connections¹ to the LDAP server². The Zope instance log shows:

SERVER_DOWN: {u'info': 'Too many open files', 'errno': 24, 'desc': u"Can't contact LDAP server"}
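To track the growth over time without reading through lsof output by hand, the descriptors can be tallied by kind. The following is only a sketch, assuming Linux (it reads /proc/PID/fd) and Python 3; the names classify and count_fds are made up for illustration and are not part of Plone or pas.plugins.ldap. It counts all sockets, not just the ones connected to the LDAP server.

```python
import os
from collections import Counter


def classify(target):
    """Map a /proc/PID/fd link target to a coarse kind."""
    # Link targets look like "socket:[12345]", "pipe:[678]",
    # "anon_inode:[eventpoll]" or a plain file path.
    for prefix in ("socket", "pipe", "anon_inode"):
        if target.startswith(prefix + ":"):
            return prefix
    return "file"


def count_fds(pid="self"):
    """Count the open file descriptors of a process by kind (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        kinds[classify(target)] += 1
    return kinds


if __name__ == "__main__":
    import sys

    # Pass the Zope instance PID as the first argument; defaults to this process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    print(dict(count_fds(pid)))
```

Running it every few seconds against the Zope instance PID should show the "socket" count climbing if the leak is present.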

A memcached instance runs alongside Plone; connecting to its port with telnet and asking for stats shows that it is indeed populated with data. Given the rising number of LDAP client connections, though, that data does not seem to be used.

The problem occurs less frequently on:

  • Plone 5.2.1
  • pas.plugins.ldap 1.8.0
  • python_ldap 3.2.0
  • node.ext.ldap 1.0b12

The number of LDAP connections in this configuration continues to rise up to a point, then suddenly plummets as the connections seem to be reclaimed. Users don't report being "kicked out of Plone" as much.

The problem also appears on unmodified Plone sites with no custom add-ons. It was reproduced by running 3 or 4 concurrent curl --cookie __ac="…" http://localhost…/folder_contents commands in loops.
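The same kind of load can be generated from Python instead of shell loops. This is a rough sketch using only the standard library; hammer and its parameters are hypothetical names, and the URL and __ac cookie value have to be supplied by the caller since they are elided above.

```python
import threading
import urllib.error
import urllib.request
from collections import Counter


def hammer(url, ac_cookie, workers=4, requests_per_worker=25, timeout=10):
    """Fetch `url` repeatedly from several threads, sending an __ac cookie.

    Returns a Counter keyed by HTTP status code, or by exception class
    name for requests that failed without an HTTP response.
    """
    results = Counter()
    lock = threading.Lock()

    def worker():
        for _ in range(requests_per_worker):
            req = urllib.request.Request(
                url, headers={"Cookie": f"__ac={ac_cookie}"}
            )
            try:
                with urllib.request.urlopen(req, timeout=timeout) as resp:
                    resp.read()
                    outcome = resp.status
            except urllib.error.HTTPError as exc:
                # The server answered, but with an error status.
                outcome = exc.code
            except OSError as exc:
                # Connection refused, timeout, and similar failures.
                outcome = type(exc).__name__
            with lock:
                results[outcome] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling hammer(...) with the folder_contents URL while watching the descriptor count of the Zope process should be roughly equivalent to the curl loops.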

This report is summarized from a thread on the Plone community forum. See the thread for additional details.

¹The problem appears with Unix local socket connections too.
²Appears with OpenLDAP slapd 2.4.50 and Apache Directory Service 2.0.0.AM24; Micro$oft AD not available for testing.

@jensens added the bug label Aug 24, 2020
@fredvd
Member

fredvd commented Aug 24, 2020

Which OS does this issue occur on, and what are the current soft and hard limits for the maximum number of open files per process? Lots of FDs is not necessarily a problem, unless raising the hard limit to 8192 or higher still depletes them over time.

On Linux/Unix-based OSes, these limits are visible and configurable with the ulimit command.
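The same limits can be read from inside a Python process with the standard library resource module, which may be handy for logging them at Zope startup. A minimal sketch; nofile_limits, describe and raise_soft_limit are illustrative names, not existing Plone helpers.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value):
    """Render a limit value the way `ulimit` would."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def raise_soft_limit(wanted):
    """Raise the soft limit towards `wanted`, capped at the hard limit.

    An unprivileged process may raise its soft limit up to the hard
    limit, but not beyond it. Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough, nothing to do
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return wanted


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft:", describe(soft), "hard:", describe(hard))
```

This reports the limits of the process that runs it, which is the point: run from an interactive shell it shows the shell's limits, not necessarily those of the Zope instance.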

Open file limits can be tricky: the default settings are sometimes too low (1024-2048) for user processes, and subprocesses can also count towards them.

So if you start a process manager that starts Zope, memcached, haproxy and Varnish as subprocesses, sometimes all open files, sockets and other FDs from those processes together count against the single 2048 limit. (Older) haproxy, for example, needs thousands of them.

Another caveat is setting a higher limit permanently: the limit sometimes differs between a shell and a process manager started from systemd or from the @reboot stanza in the user's crontab.

So you restart the process manager from an ssh session and everything runs smoothly, until the weekly server restart, when Plone hangs itself on 1024 max open files.
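One way to catch this is to check the limits of the process that is actually running rather than those of the current shell. On Linux the kernel exposes them in /proc/PID/limits. A small sketch under that assumption (Linux only, Python 3); parse_max_open_files and limits_of are illustrative names.

```python
def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the text of a /proc/PID/limits file.

    The values are returned as strings because either may be "unlimited".
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units.
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' line found")


def limits_of(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_max_open_files(fh.read())


if __name__ == "__main__":
    import sys

    # Pass the PID of the running Zope instance as the first argument.
    print(limits_of(sys.argv[1]))
```

Comparing this output right after a manual restart and again after a reboot shows whether the two start paths end up with different limits.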

I have even noticed differences in the past when using sudo -u plone bash, which messed up the open file limits.
