Avoid reporting socket errors via Sentry observer #1026

chriskuehl · 2024-11-19T07:19:33Z

After merging #1009, we started getting a small number of socket errors reported via Sentry:

OSError: [Errno 9] Bad file descriptor
  File "src/gevent/greenlet.py", line 908, in gevent._gevent_cgreenlet.Greenlet.run
  File "gevent/pywsgi.py", line 719, in read_requestline
    line = self.rfile.readline(MAX_REQUEST_LINE)
  File "socket.py", line 720, in readinto
    return self._sock.recv_into(b)
  File "gevent/_socketcommon.py", line 691, in recv_into
    return self._sock.recv_into(*args)

I traced this down to two potential causes, neither of which are user-impacting:

Cause 1: The Sentry observer reports unhandled exceptions from greenlets, even though we handle the exceptions

The Sentry observer hooks into gevent's print_exception function in order to report all unhandled exceptions to Sentry:

baseplate.py/baseplate/observers/sentry.py

Lines 123 to 131 in efd4459

    
           class _SentryUnhandledErrorReporter: 
        
               """Hook into the Gevent hub and report errors outside request context.""" 
        
               @classmethod 
        
               def install(cls) -> None: 
        
                   from gevent import get_hub 
        
                   gevent_hub = get_hub() 
        
                   gevent_hub.print_exception = cls(gevent_hub)

This means that unhandled errors will always get reported via Sentry, even though the read_requestline function is intended to raise socket errors which are handled in the calling function:
https://github.com/gevent/gevent/blob/a50ca62df1bb84c2960ba34c655f18129d845b19/src/gevent/pywsgi.py#L769-L776

(socket.error is an alias for OSError, so this catches all OSError exceptions and uses them as a signal to close the connection.)

To fix this, I wrapped the function so that the greenlet returns a tuple (result, exception) rather than raising the exception. This no longer gets reported to Sentry.

Cause 2: There's a possible race condition where the socket can be closed before the `read_requestline()` greenlet runs

Imagine the shutdown event becomes set. The "main" greenlet for the connection does the below:

baseplate.py/baseplate/server/wsgi.py

Lines 80 to 83 in efd4459

    
           if self._shutdown_event in ready: 
        
               real_read_requestline.kill() 
        
               # None triggers the base class to close the connection. 
        
               return None

Returning None causes the caller to close the connection which is normally not a problem, but depending on the order the greenlet execution happens and when the kill signal is delivered, it could close the connection while the read_requestline greenlet is still running, which causes it to try to read from a closed socket.

I was able to reproduce this via adding gevent.sleep()s to force the race condition, but I suspect this is not the cause of the production errors. To fix this, I added a join() on the read_requestline greenlet to ensure it finishes executing before we return.

Validation

I was able to reproduce both of the above cases locally using the test I added, but it's difficult to consistently reproduce the issues in a test since it's hard to force race conditions without adding gevent.sleep() in random places.

Avoid reporting socket errors via Sentry observer

c056c1b

chriskuehl marked this pull request as ready for review November 19, 2024 16:45

chriskuehl requested a review from a team as a code owner November 19, 2024 16:45

chriskuehl requested review from scotthew1 and redloaf November 19, 2024 16:45

redloaf approved these changes Nov 20, 2024

View reviewed changes

chriskuehl merged commit ebbf2ee into develop Nov 20, 2024
4 checks passed

chriskuehl deleted the fix-socket-errors-reported-via-sentry branch November 20, 2024 16:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid reporting socket errors via Sentry observer #1026

Avoid reporting socket errors via Sentry observer #1026

chriskuehl commented Nov 19, 2024 •

edited

Loading

	class _SentryUnhandledErrorReporter:
	"""Hook into the Gevent hub and report errors outside request context."""

	@classmethod
	def install(cls) -> None:
	from gevent import get_hub

	gevent_hub = get_hub()
	gevent_hub.print_exception = cls(gevent_hub)

	if self._shutdown_event in ready:
	real_read_requestline.kill()
	# None triggers the base class to close the connection.
	return None

Avoid reporting socket errors via Sentry observer #1026

Avoid reporting socket errors via Sentry observer #1026

Conversation

chriskuehl commented Nov 19, 2024 • edited Loading

Cause 1: The Sentry observer reports unhandled exceptions from greenlets, even though we handle the exceptions

Cause 2: There's a possible race condition where the socket can be closed before the read_requestline() greenlet runs

Validation

chriskuehl commented Nov 19, 2024 •

edited

Loading

Cause 2: There's a possible race condition where the socket can be closed before the `read_requestline()` greenlet runs