Kubernetes Scalability thresholds

Background

Since the 1.6 release, Kubernetes has officially supported 5000-node clusters. The question, however, is what that actually means. As of early Q3 2017, we are in the process of defining a set of performance-related SLIs (Service Level Indicators) and SLOs (Service Level Objectives).

However, no matter what SLIs and SLOs we define, some users will always report that their cluster does not meet the SLOs. In most cases the reason turns out to be that we (as developers) silently assumed something (e.g. that there will be no more than 10000 services in the cluster) and users were not aware of that assumption.

This document explicitly summarizes the limits we are aware of on the number of objects in the system, and states whether we will try to relax them in the future.

Kubernetes thresholds

We start with an explicit definition of the quantities and thresholds that we assume are satisfied in the cluster, followed by an explanation of some of them. Important notes about the numbers:

In most cases, exceeding these thresholds doesn’t mean that the cluster fails outright; it just means that its overall performance degrades.
Some thresholds below (e.g. total number of all objects, or total number of pods or namespaces) are given for the largest possible cluster. For smaller clusters, the limits are proportionally lower.
The thresholds differ between Kubernetes releases (hopefully each of them is non-decreasing from one release to the next). The numbers presented here are for the current release (Kubernetes 1.7).
Many factors influence the thresholds, e.g. the etcd version or the storage data format. For each of those we assume the release default, to avoid providing numbers for a huge number of combinations.
The “Head threshold” column represents the status of Kubernetes head. This column should be snapshotted at every release to produce per-release thresholds (and a dedicated column for each release should then be added).
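
The note about smaller clusters can be sketched in code. The document does not define the exact proportion, so this minimal sketch assumes linear scaling by node count relative to the 5000-node head threshold. The function name and the dictionary keys are hypothetical; the values are the head thresholds from the table in this section.

```python
# Head threshold for the number of nodes (largest supported cluster).
MAX_NODES = 5000

# Head thresholds that are stated for the largest possible cluster.
# The keys are illustrative names, not Kubernetes identifiers.
LARGEST_CLUSTER_THRESHOLDS = {
    "objects": 250000,
    "pods": 150000,
    "namespaces": 10000,
}


def scaled_threshold(quantity: str, nodes: int) -> int:
    """Scale a cluster-wide threshold down for a smaller cluster.

    Assumption: "proportionally lower" means linear in the node count.
    """
    if not 0 < nodes <= MAX_NODES:
        raise ValueError("nodes must be between 1 and %d" % MAX_NODES)
    # Integer arithmetic, rounded down.
    return LARGEST_CLUSTER_THRESHOLDS[quantity] * nodes // MAX_NODES


if __name__ == "__main__":
    print(scaled_threshold("pods", 1000))  # 30000
```

Under this assumption a 1000-node cluster would get one fifth of the pod threshold of a 5000-node cluster.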

Quantity	Head threshold	Long term goal
Total number of all objects	250000	1000000
Number of nodes	5000	5000
Number of pods	150000	500000
Number of pods per node¹	110	500
Number of pods per core¹	10	10
Number of namespaces (ns)	10000	100000
Number of pods per ns	15000	50000
Number of services	10000	100000
Number of all service backends	TBD	500000
Number of backends per service	5000	5000
Number of deployments per ns	2000	10000
Number of pods per deployment	TBD	10000
Number of jobs per ns	TBD	1000
Number of daemon sets per ns	TBD	100
Number of stateful sets per ns	TBD	100
Number of secrets per ns	TBD	TBD
Number of secrets per pod	TBD	TBD
Number of config maps per ns	TBD	TBD
Number of config maps per pod	TBD	TBD
Number of storageclasses	TBD	TBD
Number of roles and rolebindings	TBD	TBD
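
As an illustration of how the table might be used, the following minimal sketch compares observed object counts against a few of the head thresholds and reports which ones are exceeded. The dictionary keys and the function name are hypothetical, the observed counts are assumed to be collected by the caller, and entries marked TBD are represented as `None` and skipped.

```python
from typing import Dict, List, Optional

# A subset of the head thresholds from the table; None stands for "TBD".
# The keys are illustrative names, not Kubernetes identifiers.
HEAD_THRESHOLDS: Dict[str, Optional[int]] = {
    "objects": 250000,
    "nodes": 5000,
    "pods": 150000,
    "namespaces": 10000,
    "services": 10000,
    "service_backends": None,
}


def exceeded(observed: Dict[str, int]) -> List[str]:
    """Return the quantities whose observed count is above a known threshold."""
    over = []
    for name, count in observed.items():
        limit = HEAD_THRESHOLDS.get(name)
        # Unknown or TBD thresholds cannot be checked, so they are skipped.
        if limit is not None and count > limit:
            over.append(name)
    return sorted(over)


if __name__ == "__main__":
    print(exceeded({"pods": 160000, "nodes": 5000, "service_backends": 1000000}))
    # ['pods']
```

A count equal to the threshold is treated as within bounds; only strictly larger counts are reported.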

There are also thresholds for other object types, but for those the numbers also depend on the environment the cluster runs in (bare metal or a particular cloud provider). These include:

Quantity	Head threshold	Long term goal
Number of ingresses	TBD	TBD
Number of PersistentVolumes	TBD	TBD
Number of PersistentVolumeClaims per ns	TBD	TBD
Number of PersistentVolumeClaims per node	TBD	TBD

The rationale for some of those numbers:

Total number of objects
The total number of objects in the system is limited because it affects, among other things, etcd and its resource consumption.
Number of nodes
We believe that having clusters with more than 5000 nodes is not the best option and users should consider splitting into multiple clusters. However, we may consider bumping the long term goal at some time in the future.
Number of services and endpoints
Each service port and each service backend has a corresponding entry in iptables. The number of backends of a given service affects the size of its Endpoints object, which in turn affects the amount of data sent across the system.
Number of objects of a given type per namespace
This holds for different object types (pods, secrets, deployments, ...). A number of control loops in the system need to iterate over all objects in a given namespace in reaction to some state changes. Having a large number of objects of a given type in a single namespace can make those loops expensive and slow down the processing of those state changes.

¹ The limit on the number of pods on a given node is in fact the minimum of “pods per node” and “pods per core” times the number of cores of the node.
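
The footnote's rule can be written as a one-line computation. This is a minimal sketch: the function name is hypothetical, and the defaults are the head thresholds from the table (110 pods per node, 10 pods per core).

```python
def max_pods_on_node(cores: int, pods_per_node: int = 110, pods_per_core: int = 10) -> int:
    """Effective pod limit for a node: the smaller of the per-node cap
    and the per-core cap multiplied by the node's core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return min(pods_per_node, pods_per_core * cores)


if __name__ == "__main__":
    print(max_pods_on_node(4))   # 40  (limited by pods per core)
    print(max_pods_on_node(32))  # 110 (limited by pods per node)
```

With the head thresholds, the per-core cap is the binding one for nodes with 10 or fewer cores, and the per-node cap applies from 11 cores upward.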
