Restructuring the SSL event handling #228

tilsche · 2018-05-29T16:20:30Z

As per the OpenSSL documentation:

Any TLS/SSL I/O function can lead to either of SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE. In particular, SSL_read() or SSL_peek() may want to write data and
SSL_write() may want to read data.

The previous implementation considered this only partially and inconsistently. Particularly a write could call a read when it was caused by a readable event, but then the read failed and changed the state to reading and no more writing was done.

This PR simplifies the whole read/write handling. Basically the first read/write is always attempted¹ to find out what SSL wants (SSL_ERROR_WANT_READ/WRITE). This is saved in the state and used for fd monitoring. For the next process(), these flags are used as filters. read, write, and proceed is always invoked directly from process.

¹ The constructor does not attempt read/write, it sets the want_flags to readable | writable instead to watch and react to any kind of fd events.

Fixes #207 . This is only weakly tested with ASIO and libev. It needs more test & review.

Thanks to @bmario @kinnarr

always try to write if there is something in the buffer, always try to read remember why SSL failed to use as monitoring flags - and filter in process()

…nt the return value

… called monitor->events

Also refactors redundant code into a helper function

tilsche · 2018-09-04T12:30:58Z

Could we maybe get some feedback to this PR? I'm also wondering how this relates to 80ce632

EmielBruijntjes · 2018-10-27T10:07:44Z

There were indeed problems in handling SSL connections. The changes in 80ce632 were supposed to fix this. Does the issue still exist?

tilsche · 2018-10-27T15:03:40Z

We have not tested 80ce632 because we have been running this PR's code-base for the last 5 months.

I don't think 80ce632 covers all problematic cases. The state handling is still messy. Consider the example that we observed failing:

Particularly a write could call a read when it was caused by a readable event, but then the read failed and changed the state to reading and no more writing was done.

I can see this still happening as there is no guarantee that the read will fully succeed just because the socket was actually readable (as opposed to wrongly cached readable).

Overall I think running your own checks on the socket is the wrong way to go if you are in an async loop. This introduces possibilities for and redundancy, as well as many redundant syscalls and thus performance issues.

That said the blocking select call in flush seems highly problematic to me as well, but that may be another topic...

tilsche · 2018-11-22T14:58:41Z

@EmielBruijntjes We took a look at the recent b81bc34. It seems that it still uses the limited states which can end up inconsistent.

While reproducing issues with this is extremely difficult, I believe there is a clear argument for the code. We are happy and would appreciate to discuss this further.

EmielBruijntjes · 2018-11-24T00:10:12Z

I do agree that the earlier implementation of the SslConnected class was not correct. But since the latest changes I have the impression that it does work. Can you explain which (edge) cases we are missing with the current implementation?

tilsche · 2018-11-26T12:13:30Z

The case I am concerned about is the following:

SSLConnected::write
_out.sendto() returns > 0
still data in _out
socket becomes readable because of handshake data coming in
falling through the loop because of isReadable()
SSLConnected::receive
receive will handle the handshake but fails with SSL_ERROR_WANT_READ because there is no actual payload data to receive
_state = state_receiving
onIdle(readable)
we could write now, but process is never called
even if process would be called we would be stuck in _state == state_receiving which fails with WANT_READ as long as the server doesn't send anything, maybe never

EmielBruijntjes · 2018-11-26T16:07:19Z

About point 7: does this indeed happen? I though that SSL_ERROR_WANT_READ is only returned when the read-operation should be repeated (because it was only partially ready). If the read-operation fails because no data is available, it returns 0 and goes back to the idle state.

tilsche · 2018-11-26T16:24:19Z

If SSL_read returns 0, it is the error case (<= 0).

Also:

  SSL_ERROR_NONE
      The TLS/SSL I/O operation completed.  This result code is returned if and only if ret > 0.

So I don't see a way to fall back to regular idle with 0 bytes received.

EmielBruijntjes · 2018-11-26T16:49:25Z

About step 10: is that permitted? If SSL_ERROR_WANT_READ was returned by an earlier call, is it then allowed to call ssl_write() instead of repeating the ssl_read() operation?

EmielBruijntjes · 2018-11-26T17:02:30Z

I don't think it is permitted to mix ssl_read() and ssl_write() calls if one of them returned WANT_READ or WANT_WRITE (check https://stackoverflow.com/questions/35138931/can-you-interleave-calls-to-ssl-read-ssl-write-when-one-of-them-returns-ss for example). As far as I see, the only option then is to repeat the call. The scenario that you are painting is, simplified, as follows:

socket becomes readable because of handshake data coming in.
we call ssl_read() because of this.
this returns SSL_WANT_READ because the data was only handshake data and there is no actual data.
from now on the only legal method to call is ssl_read() because openssl does not allow calls to ssl_write() or ssl_shutdown() until the previous ssl_read() is finished.
the program will never write again.

This is either an (unlikely) bug in openssl, or openssl already deals with this and this scenario does not occur.

tilsche · 2018-11-27T08:54:18Z

I think the StackOverflow answer is wrong, but there is also a link to the mailing list which discusses a slight variation scenario I am concerned about:

You call SSL_write. It returns -1, WANT_READ, because a renegotiation is
in progress.

1A) The renegotiation data arrives.

Independently, you SSL_read. It reads the renegotiation data but no
application data, so it returns WANT_READ.

You select for read, since OpenSSL told you to do so. The select will
take forever, because the renegotiation data has already arrived and been
read. Your SSL_write would succeed immediately now.

When SSL_read returns anything it invalidates any prior WANT_* return from
SSL_write because the SSL_read may have made forward progress. Any call to
an SSL_* function may consume data. You cannot assume that because the data
hadn't been read when you called SSL_read it still hasn't been read after
you called SSL_write.

This will screw you on renegotiations.

In AMQP-CPP, step 1) would cause a repeated SSL_write instead of SSL_read. But I don't see any reason why this 1A) could not happen without 1). The mailing list post strongly indicates that mixing SSL_read/SSL_write is ok.

One point regarding this PR is that the want-flags should be reset to their default whenever any of the functions has received new want flags. I somehow think the default want-flags for SSL_write should be only writable.

Unfortunately the official documentation is indeed not very helpful. There is one paragraph in man SSL_get_error

It is safe to call SSL_read() or SSL_read_ex() when more data is available even when the call that set this error was an SSL_write() or SSL_write_ex(). However if the call
was an SSL_write() or SSL_write_ex(), it should be called again to continue sending the application data.

The second sentence is rather confusing - I suspect it means "it should eventually be called again". It doesn't say whether it is safe to call SSL_write after a SSL_read failed.

tilsche · 2018-11-27T09:14:36Z

There's also this paragraph in man SSL_get_error (emphasis mine):

Caveat: Any TLS/SSL I/O function can lead to either of SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE. In particular, SSL_read_ex(), SSL_read(), SSL_peek_ex(), or SSL_peek()
may want to write data and SSL_write() or SSL_write_ex() may want to read data. This is mainly because TLS/SSL handshakes may occur at any time during the protocol
(initiated by either the client or the server); SSL_read_ex(), SSL_read(), SSL_peek_ex(), SSL_peek(), SSL_write_ex(), and SSL_write() will handle any pending handshakes.

One might read into this

Server starting a handshake can very well happen, and thus my "failed read - never write" scenario
The listed functions all handle pending handshakes, thus you may mix them - and should reset the want-flags from previous calls to their presumed defaults.

EmielBruijntjes · 2018-11-27T17:12:02Z

Thanks for all your input. I find it very interesting because it would mean that I made some mistakes in other software too. But there are still a lot of unanswered questions here:

Is it ok to mix ssl_read and ssl_write? I always read the documentation so that the same call had to be repeated. The documentation that you quote does indeed say that it is ok to start writing in the middle of a read-operation, but not the other way around (at least that is how I read it). And you are quoting openssl 1.1, while we want to support openssl 1.0 too, which does not contain this text. Maybe openssl 1.1 is more tolerant than 1.0?
But even if it is not forbidden to mix them, is it then also wrong to not do that, and repeat the same operation until it is finished, before switching to the other operation? My opinion is that this is not wrong.
Does it ever happen that a peer starts a renegotiation handshake without it being part of a normal data transfer or shutdown? In other words: there is no intention to send data, but still the peer sends a SSL handshake? This is the 1A scenario. This would indeed trigger a ssl_read, that could indeed lead to an infinite want-read situation. But I doubt this happens. I think that the handshakes only take place when some meaningful data is sent over the line too (afaik openssl does not have an internal timer that spontaneously start sending data over the line, the other party must thus have called ssl_write() or ssl_shutdown() to trigger the handshake). But it should not be so difficult to write a small server application that sends no data, but that does periodically send renegotiating handshakes? That would be the perfect demonstration that our SslConnected class is conceptually wrong.

tilsche · 2018-11-28T08:56:05Z

I don't have authoritative answers, just my interpretation of the documentation / mailing list.

I did not find anything that says that you can't. Nothing says that you must repeat an operation before calling the other. For what it's worthSSL_read says (emphasis mine) "The calling process then must repeat the call after taking appropriate action to satisfy the needs of the read function", but SSL_get_error says "can be retried later", "... can be called again". I don't know about 1.1, to some extent I am assuming that it is just refinement in the documentation.
Well - if you need to send data you must call SSL_write - in that sense it's wrong to not call it. I think the problem with AMQP is that it's asynchronous whereas with most other protocols its always clear who needs to send and who needs to receive next.
I read SSL_get_error that way: "TLS/SSL handshakes may occur at any time during the protocol (initiated by either the client or the server)".

EmielBruijntjes · 2018-12-03T07:41:18Z

My colleague has been further looking into this. We now think that it could indeed be possible to mix calls to ssl_read() and ssl_write(), but we do have to check it more closely to be sure. We also discovered that although the openssl 1.0 docs says that a repeated ssl_write() must be called with exactly the same parameters as the previous call (our interpretation: with the same memory addresses), that the call can in reality also be repeated with the same data, but stored on a different location. This would allow us to copy buffers only when a repeat is necessary, instead of prematurely before each call just in case the call fails.

Anyway, there is some room for improvement in the SSL handling, because we no longer have to delay writing if a read operation is also in progress, and the other way around. And we do not have to buffer all outgoing data, but only the data that could not be sent at the first attempt. Whether the current implementation is now also broken, is not yet sure.

tilsche and others added 6 commits May 29, 2018 17:43

Change examples to use SSL

b16ccfd

Fix includes

c2a9b49

Reorganize the SSL event loop

3994c90

always try to write if there is something in the buffer, always try to read remember why SSL failed to use as monitoring flags - and filter in process()

Can't process in certain situation because we need to take into accou…

3c777dd

…nt the return value

Avoid setting up redundant read/write handlers when process() already…

930aa1c

… called monitor->events

Fixes calling of proceed() without checking the result.

02e3961

Also refactors redundant code into a helper function

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructuring the SSL event handling #228

Restructuring the SSL event handling #228

tilsche commented May 29, 2018

tilsche commented Sep 4, 2018

EmielBruijntjes commented Oct 27, 2018

tilsche commented Oct 27, 2018 •

edited

Loading

tilsche commented Nov 22, 2018

EmielBruijntjes commented Nov 24, 2018

tilsche commented Nov 26, 2018

EmielBruijntjes commented Nov 26, 2018 •

edited

Loading

tilsche commented Nov 26, 2018

EmielBruijntjes commented Nov 26, 2018

EmielBruijntjes commented Nov 26, 2018 •

edited

Loading

tilsche commented Nov 27, 2018

tilsche commented Nov 27, 2018

EmielBruijntjes commented Nov 27, 2018

tilsche commented Nov 28, 2018

EmielBruijntjes commented Dec 3, 2018

Restructuring the SSL event handling #228

Are you sure you want to change the base?

Restructuring the SSL event handling #228

Conversation

tilsche commented May 29, 2018

tilsche commented Sep 4, 2018

EmielBruijntjes commented Oct 27, 2018

tilsche commented Oct 27, 2018 • edited Loading

tilsche commented Nov 22, 2018

EmielBruijntjes commented Nov 24, 2018

tilsche commented Nov 26, 2018

EmielBruijntjes commented Nov 26, 2018 • edited Loading

tilsche commented Nov 26, 2018

EmielBruijntjes commented Nov 26, 2018

EmielBruijntjes commented Nov 26, 2018 • edited Loading

tilsche commented Nov 27, 2018

tilsche commented Nov 27, 2018

EmielBruijntjes commented Nov 27, 2018

tilsche commented Nov 28, 2018

EmielBruijntjes commented Dec 3, 2018

tilsche commented Oct 27, 2018 •

edited

Loading

EmielBruijntjes commented Nov 26, 2018 •

edited

Loading

EmielBruijntjes commented Nov 26, 2018 •

edited

Loading