
Improve performance of proxy application and TAP handler #2

Open · NullHypothesis opened this issue Dec 12, 2022 · 6 comments
Labels: enhancement (New feature or request)

@NullHypothesis (Contributor)

I've been working on some tooling that can help us measure nitriding's networking performance. So far, I have a minimal Go Web server that implements a simple "hello world" handler. I tested the Web server in three scenarios:

  1. Docker: In a Docker container (with no nitriding or enclaves involved), which serves as our baseline.
  2. Nitriding-nrp: In an enclave, with the Web service receiving connections directly from clients.
  3. Nitriding: In an enclave, with nitriding acting as a reverse proxy in front of the Web service.

All three scenarios use HTTP only, to eliminate the computational overhead of TLS. I then used baton to measure the requests per second that the Web service can sustain. The results are:

[Figure: requests per second sustained by the Web service in the Docker, Nitriding-nrp, and Nitriding scenarios]

The numbers aren't great. Let's use this issue to do some debugging, identify bottlenecks, and improve the networking code.
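
For reference, the Web service boils down to this pattern (a minimal sketch, not the exact code; the handler path and listening address are assumptions):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Trivial "hello world" handler, so that measurements reflect the
	// networking path rather than application work.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello world")
	})
	// Plain HTTP, to eliminate the computational overhead of TLS.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```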

@NullHypothesis NullHypothesis added the enhancement New feature or request label Dec 12, 2022
@NullHypothesis NullHypothesis self-assigned this Dec 12, 2022
@NullHypothesis (Contributor, Author)

See also brave/star-randsrv#58.

@rillian (Contributor) commented Apr 25, 2023

What was the baton command line? Concurrency level made a significant difference to throughput in my tests. We should make sure we're measuring the tunnel and proxy's capacity and not just the latency.
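
For illustration only, and assuming the baton in question is americanexpress/baton, a concurrency-pinned run might look like this (flags per that project's README; the actual command line used for the numbers above isn't recorded in this issue):

```sh
# Hypothetical invocation: -u is the target URL, -c the number of
# concurrent requests, -r the total number of requests to send.
baton -u http://127.0.0.1:8080/ -c 100 -r 100000
```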

@NullHypothesis (Contributor, Author)

Yesterday, I measured requests per second (for a simple "hello world" Web server) for an increasing number of baton threads.

All setups can sustain more reqs/sec as the number of sender threads increases—except when we use nitriding's reverse proxy, which sees a reduction in reqs/sec. Some time this week, I'll take a closer look at Go's reverse proxy implementation to see what easy improvements we can make.

@NullHypothesis (Contributor, Author)

Elaborating on the above: The "Enclave" setup constitutes the approximate upper limit that we can achieve with nitriding. This setup has no nitriding: it consists of a Web server that binds directly to the VSOCK interface, and a custom baton that sends requests directly to the VSOCK interface.
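
A sketch of that "bind directly to VSOCK" setup, assuming the mdlayher/vsock package (the issue doesn't name the VSOCK library, and the port is made up):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/mdlayher/vsock"
)

func main() {
	// Listen on the VSOCK interface instead of TCP, bypassing
	// nitriding's tap device and user space TCP stack entirely.
	ln, err := vsock.Listen(8080, nil)
	if err != nil {
		log.Fatal(err)
	}
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello world")
	})
	log.Fatal(http.Serve(ln, nil))
}
```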

At this point, there are two significant bottlenecks:

  1. Nitriding's (or rather: Go's) HTTP reverse proxy. In this thread, someone argues that the reverse proxy does poorly when faced with synthetically-generated requests. (A minimal sketch of this proxy pattern follows after this list.)
  2. Nitriding's tap interface (and the user space TCP stack that comes with it) and the gvproxy that runs on the EC2 host. It's not clear which one is the worse offender. We should measure these two components in isolation, and then focus on the slower one.
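
For context, the reverse proxy in question is presumably net/http/httputil's ReverseProxy; the pattern looks roughly like this (addresses are hypothetical):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical backend: the enclave application that the proxy
	// forwards incoming requests to.
	backend, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)
	log.Fatal(http.ListenAndServe(":8081", proxy))
}
```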

@NullHypothesis (Contributor, Author)

I stumbled upon an issue that describes the problem we're seeing: golang/go#6785. Increasing MaxIdleConnsPerHost makes a significant difference. In a preliminary test, I set it to 1000, which makes the reverse proxy perform almost identically to the "no reverse proxy" setup.
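
Extending the proxy sketch from the previous comment, the tweak presumably looks something like this (the transport fields are real net/http fields; the value 1000 is the one from the preliminary test, and whether nitriding sets it exactly this way is an assumption):

```go
// net/http's DefaultMaxIdleConnsPerHost is 2, so under concurrent load
// the proxy keeps tearing down and re-establishing backend connections.
// Raising the limits lets those connections be reused.
transport := http.DefaultTransport.(*http.Transport).Clone()
transport.MaxIdleConns = 1000
transport.MaxIdleConnsPerHost = 1000

proxy := httputil.NewSingleHostReverseProxy(backend) // backend as above
proxy.Transport = transport
```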

For posterity, a few other things I've tried:

  • Run stress tests with bombardier instead of baton. Bombardier promises to be faster because it's built on top of fasthttp instead of Go's built-in net/http (which baton is presumably using). In my tests, bombardier actually performs slightly worse than baton, achieving an average of 1,027 reqs/sec in the "Nitriding" setup whereas baton achieves 1,195.
  • Use a BufferPool for the reverse proxy. Go's reverse proxy implementation allocates a 32 KB buffer for each incoming request. A buffer pool allows those buffers to be reused, which also reduces pressure on the garbage collector. Unfortunately, this made no difference in the numbers. Regardless, it's probably a good idea to add it (see the sketch after this list).
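
A sketch of such a pool, built on sync.Pool and satisfying httputil's BufferPool interface (the 32 KB size matches the default buffer mentioned above; names are made up):

```go
package main

import "sync"

// bufferPool satisfies httputil.ReverseProxy's BufferPool interface
// (Get() []byte / Put([]byte)), letting the proxy reuse its 32 KB
// copy buffers across requests instead of allocating each time.
type bufferPool struct {
	pool sync.Pool
}

func newBufferPool() *bufferPool {
	return &bufferPool{
		pool: sync.Pool{
			New: func() any { return make([]byte, 32*1024) },
		},
	}
}

func (b *bufferPool) Get() []byte    { return b.pool.Get().([]byte) }
func (b *bufferPool) Put(buf []byte) { b.pool.Put(buf) }

// Wiring it up: proxy.BufferPool = newBufferPool()
```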

@NullHypothesis (Contributor, Author)

For the record, we just merged PR brave/nitriding#61, which improves the status quo.

@NullHypothesis NullHypothesis transferred this issue from brave/nitriding May 1, 2023
@NullHypothesis NullHypothesis removed their assignment Jul 18, 2023