Various cleanup and fixes #302
Conversation
Force-pushed from 7e59d29 to d2282c1
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##           master     #302      +/-   ##
==========================================
+ Coverage   80.61%   81.17%   +0.55%
==========================================
  Files          49       49
  Lines        4121     4254     +133
==========================================
+ Hits         3322     3453     +131
- Misses        653      656       +3
+ Partials      146      145       -1
```

Flags with carried forward coverage won't be shown.
Force-pushed from dd83fe5 to 5384b60
What was the cause of the stalling?
I see you added a mutex to payloadQueue and reassemblyQueue. Were these really necessary when we already have a mutex at the Association (and also Stream) level? Mutexes are not computationally free. (Where were the data races?)
Still undetermined, and it needs a consistent repro, but where I left off, the reassembly queue was receiving SSNs less than the one it currently expects because so many messages had been sent that the SSN overflowed.
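For context on the overflow: SSNs are 16-bit, so a long-lived stream wraps past 65535 and a plain `<` comparison starts misordering chunks. Below is a minimal sketch of a wraparound-safe comparison using RFC 1982-style serial-number arithmetic; the helper name is hypothetical and this is not the library's code:

```go
package main

import "fmt"

// ssnLT reports whether 16-bit SSN a precedes b under serial-number
// arithmetic (RFC 1982 style), so ordering stays correct when the
// SSN counter wraps past 65535.
func ssnLT(a, b uint16) bool {
	// The difference interpreted as a signed 16-bit value is negative
	// exactly when a is "behind" b on the 16-bit circle.
	return int16(a-b) < 0
}

func main() {
	fmt.Println(ssnLT(65535, 0)) // true: 0 comes after 65535 post-wrap
	fmt.Println(ssnLT(0, 65535)) // false
}
```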
Running with -race would occasionally make these fail. Do you think I should go back and try to use the locks already in the association/streams? Although they're not computationally free, I think the cost is similar to our use of sync.Cond/channels for the wait/signal/broadcast system on reads and writes.
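As an illustration of the cost being compared here, this is a generic sketch of the mutex-plus-sync.Cond wait/signal pattern the comment refers to; all names are hypothetical, and this is not the library's implementation:

```go
package main

import "sync"

// readNotifier sketches the pattern: readers block on a sync.Cond
// until a writer signals that data is available.
type readNotifier struct {
	mu    sync.Mutex
	cond  *sync.Cond
	ready bool
}

func newReadNotifier() *readNotifier {
	n := &readNotifier{}
	n.cond = sync.NewCond(&n.mu)
	return n
}

// waitForData blocks until a writer has signalled readiness.
func (n *readNotifier) waitForData() {
	n.mu.Lock()
	for !n.ready { // loop guards against spurious wakeups
		n.cond.Wait()
	}
	n.ready = false
	n.mu.Unlock()
}

// signalData marks data as available and wakes one waiting reader.
func (n *readNotifier) signalData() {
	n.mu.Lock()
	n.ready = true
	n.mu.Unlock()
	n.cond.Signal()
}

func main() {
	n := newReadNotifier()
	go n.signalData()
	n.waitForData() // returns once the writer has signalled
}
```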
@enobufs, it seems like a good idea to just split this PR up into more agreeable chunks so that we can discuss the changes individually. I'll do that soon!
@enobufs good call on the mutexes. Whatever messing around I was doing was causing the races, but I cannot get them to reproduce, so I'm excluding those changes from the other PRs!
One minor comment that I forgot to hit 'submit' on ...
```diff
@@ -177,7 +187,7 @@ type Association struct {
 	cumulativeTSNAckPoint   uint32
 	advancedPeerTSNAckPoint uint32
 	useForwardTSN           bool
-	useZeroChecksum         bool
+	useZeroChecksum         uint32
```
As we are moving to Go 1.19, we should use atomic.Bool.
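For reference, a minimal sketch of what that migration could look like with Go 1.19's sync/atomic.Bool; the trimmed struct here is illustrative, not the actual Association:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Association is a stand-in for the real struct; only the field
// relevant to this comment is shown.
type Association struct {
	useZeroChecksum atomic.Bool // replaces the uint32 flag; safe for concurrent use
}

func main() {
	var a Association
	a.useZeroChecksum.Store(true) // set the flag atomically
	if a.useZeroChecksum.Load() { // read it without a mutex
		fmt.Println("zero checksum enabled")
	}
}
```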
Description
I've spent the past week debugging throughput and stalling when sending a lot of small and large data through SCTP. I have draft work in my fork that adds the modified Nagle's algorithm as well as a missing sender-side implementation of Silly Window Syndrome avoidance (a MUST in the RFC). Both help considerably with throughput and stability, but I need to verify they don't cause any unexpected throughput regressions before moving forward. They also appear to be hiding what I think may be a bug in this library that causes no progress to be made even when RTXs are happening, so I want to solve that first.
Changes