Meeting notes 2/6/2018 #27

divega · 2018-02-07T00:34:51Z

Today we went through several points on the results at https://github.com/aspnet/DataAccessPerformance/wiki/Latest-benchmark-figures-for-PostgreSQL and a conversation we had with @stephentoub on possible additional optimizations for Npgsql, Peregrine and SqlClient.

Here are some raw notes (@stephentoub, @anpete and @roji, feel free to chime in):

On Linux, make sure you are using sockets in blocking mode, otherwise you will be in non-blocking mode, which has overhead. A single async operation is enough to switch the socket to non-blocking mode for its lifetime (this could currently be a problem in both Npgsql and SqlClient)
Make sure there are no locks (Npgsql already removed locking from its connection pool, but SqlClient may still have locking there and in other places)
- If something is locking and causing contention in the APIs underneath we can detect that in a profile (@vancem showed us how to do it some time ago, he did find this in SqlClient)
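
To illustrate the blocking vs. non-blocking point above, here is a minimal sketch at the OS socket level. It is written in Python only because the behavior comes from the socket itself. It is not how Npgsql, SqlClient or .NET implement it, and the helper names (`recv_emulated_blocking`, `demo`) are made up for the example. It shows that once a socket is non-blocking, a "synchronous" read has to be emulated with a readiness wait plus a retry, which is extra work compared to a plain blocking read.

```python
import select
import socket


def recv_emulated_blocking(sock: socket.socket, nbytes: int, timeout: float = 1.0) -> bytes:
    """Emulate a blocking read on a non-blocking socket (hypothetical helper)."""
    while True:
        try:
            return sock.recv(nbytes)
        except BlockingIOError:
            # No data yet: an extra readiness wait is needed before retrying.
            # A socket in blocking mode would simply have waited inside recv().
            readable, _, _ = select.select([sock], [], [], timeout)
            if not readable:
                raise TimeoutError("no data arrived within the timeout")


def demo() -> dict:
    """Show what changes once a socket has been switched to non-blocking mode."""
    a, b = socket.socketpair()
    try:
        results = {}
        b.setblocking(False)  # stands in for "one async operation switched the mode"
        try:
            b.recv(16)
            results["empty_nonblocking_read"] = "returned"
        except BlockingIOError:
            # A non-blocking read with no data fails immediately instead of waiting.
            results["empty_nonblocking_read"] = "BlockingIOError"
        a.sendall(b"ping")
        results["emulated_read"] = recv_emulated_blocking(b, 16)
        # The mode sticks to the socket until someone explicitly changes it back.
        results["still_blocking"] = b.getblocking()
        return results
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only that the mode is a property of the socket, not of the individual call, so one async operation early in a connection's life can change the cost of every later sync operation on it.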

There are known implementation differences for sync (see blocking vs. non-blocking modes above)
Certain things just have different costs, e.g. double parsing and throwing exceptions are more expensive
@sebastienros is investigating a difference in behavior caused by the way we deploy the benchmarks. This could explain a 20% difference on Linux.
@roji has been unable to get results for the database only using async on Linux (there is a bug he is looking at)

Make sure the state machine is as small as it can be. @stephentoub usually inspects the code generated by the compiler, looking for low-hanging fruit.
In some cases it may make sense to use custom awaitables to be allocation free.
That said, generally the cost is in the I/O and in the synchronous code.
Don't apply refactorings that make the code harder to understand (e.g. continuation style) unless you have proof that it helps. The code generated by the compiler is getting better!
Npgsql uses a NoSynchronizationContextScope in each entry point. It is worth trying ConfigureAwait(false) instead. Resetting the synchronization context causes 2 thread-local storage accesses on each entry point, while ConfigureAwait(false) may make the state machine larger and the code less clean, but avoids the thread-local storage accesses.
For network streams, consider using the new methods in System.Memory, e.g. ReceiveAsync(?) that returns a ValueTask and CopyToAsync(?), to reduce allocations
Consider pre-pinning buffers to decrease fragmentation and GC cost. System.Memory has new APIs you can use to avoid pinning more than once
Peregrine and Npgsql: Look for places that are allocating an Action. In 2.1 for Task and ValueTask this shouldn't be generated. It may be a custom awaitable
Task shows up in the memory profile. It seems that Task.Run is returning a Task that is not already completed but it becomes completed just after the state machine checks, within the same method.

Possible impact of SSL on benchmark results
Possible unintended switching of the socket mode to non-blocking (which may affect Linux)
Possible impact of locking in the connection pool and other places. @vancem showed us that this shows up in profiles

divega added meeting-notes and removed meeting-notes labels Feb 7, 2018
