Zuul is an L7 application gateway that provides capabilities for dynamic routing, monitoring, resiliency, security, and more.
It is the front door for all requests from devices and web sites to the backend of the Netflix streaming application.
The volume and diversity of Netflix API traffic sometimes results in production issues arising quickly and without warning. We need a system that allows us to rapidly change behavior in order to react to these situations.
Zuul uses a range of different types of filters that enables us to quickly and nimbly apply functionality to our edge service. These filters help us perform the following functions:
- Authentication and Security - identifying authentication requirements for each resource and rejecting requests that do not satisfy them.
- Insights and Monitoring - tracking meaningful data and statistics at the edge in order to give us an accurate view of production.
- Dynamic Routing - dynamically routing requests to different backend clusters as needed.
- Stress Testing - gradually increasing the traffic to a cluster in order to gauge performance.
- Load Shedding - allocating capacity for each type of request and dropping requests that go over the limit.
- Static Response handling - building some responses directly at the edge instead of forwarding them to an internal cluster.
- Multiregion Resiliency - routing requests across AWS regions in order to diversify our ELB usage (by Isthmus).
The Netty handlers on the front and back of the filters are mainly responsible for handling the network protocol, web server, connection management and proxying work. With those inner workings abstracted away, the filters do all of the heavy lifting:
- The inbound filters run before proxying the request and can be used for authentication, routing, or decorating the request.
- The endpoint filters can either be used to return a static response or proxy the request to the backend service (or origin as we call it).
- The outbound filters run after a response has been returned and can be used for things like gzipping, metrics, or adding/removing custom headers.
Zuul’s functionality depends almost entirely on the logic that you add in each filter. That means you can deploy it in multiple contexts。
An overarching idea was that while the best source of data for a servers’ latency is the client-side view, the best source of data on a servers’ utilization is from the server itself.
And that combining these 2 sources of data should give us the most effective load-balancing.
Problems with JSQ
JSQ is generally implemented by counting the number of in-use connections to a server from only the local load balancer. It works very well for a single load-balancer, but has a significant problem if used in isolation across a cluster of load balancers.
That illustrates how a single load balancers’ viewpoint can be entirely different from the actual reality.
One solution to this problem could be to share the state of each load balancers’ inflight counts with all other load balancers … but then you have a distributed state problem to solve.
An alternative simpler solution — and one that we’ve chosen — is to instead rely on the servers reporting to each load balancer how utilized they are.
Server-Reported Utilization
Using each server’s viewpoint on their utilization has the advantage that it provides the aggregate of all load balancers that are using that server, and therefore avoids the JSQ problem of an incomplete picture.
There were 2 ways we could have implemented this — either:
- Actively poll for each servers’ current utilization using health-check endpoints.
- Passively track responses from the servers annotated with their current utilization data.
We chose the 2nd option, as it was simple, allowed for frequent updating of this data, and avoided the additional load placed on servers of having N load balancers poll M servers every few seconds.
An impact of this passive strategy is that the more frequently a load balancer sends a request to one server, the more up-to-date it’s view of that servers’ utilization.
Choice-of-2 Algorithm
So far, we’ve had one guide (that is, load balancer) with a complete view of the queues and response time in the arrivals hall. That guide tries to make the best choice for each traveler based on the information he knows.
Now consider what happens if we have several guides, each directing travelers independently. The guides have independent views of the queue lengths and queue wait times – they only consider the travelers that they send to each queue.
So approaches above don’t work well with multiple “guides”(load balancers).
Instead of making the absolute best choice using incomplete data, with “power of two choices” you pick two queues at random and chose the better option of the two, avoiding the worse choice.
“Power of two choices” is efficient to implement. You don’t have to compare all queues to choose the best option each time; instead, you only need to compare two. And, perhaps unintuitively, it works better at scale than the best choice algorithms. It avoids the undesired herd behavior by the simple approach of avoiding the worst queue and distributing traffic with a degree of randomness.
In Zuul, Random with Two Choices chooses between two randomly selected servers based on which currently has fewer active connections. This is the same selection criterion as used for the Least Connections algorithm.
https://netflixtechblog.com/open-sourcing-zuul-2-82ea476cb2b3
https://netflixtechblog.com/netflix-edge-load-balancing-695308b5548c
https://www.nginx.com/blog/nginx-power-of-two-choices-load-balancing-algorithm/