Cluster Drain Management
Safely migrate workloads in blue/green cluster upgrades
Overview
A common pain point in managing Kubernetes upgrades is handling the movement of applications onto newly upgraded nodes. Whenever a Kubernetes upgrade happens, especially in the cloud, the knock-on effect is that every node is replaced and every pod is restarted. By default this happens node-by-node: a `kubectl drain` command is effectively run against each node, evicting any pods running on it. Since this process has no knowledge of the deployments controlling those pods, it is easy to cause unavailability during the upgrade, especially if applications are not written to tolerate restarts gracefully. The `ClusterDrain` custom resource is meant to solve this.
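For context, a managed node upgrade behaves roughly as if the following were run against each node in turn (the exact flags vary by provider, and `<node-name>` is just a placeholder):

```sh
# Cordon the node and evict every pod running on it; daemonset-managed pods are skipped
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```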
Definition
Fundamentally the CRD is quite simple; it can look something like this:
```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: ClusterDrain
metadata:
  name: drain-{{ cluster.metadata.master_version }}
spec:
  flowControl:
    maxConcurrency: 10
  labelSelector:
    matchLabels:
      deployments.plural.sh/drainable: "true"
```
Whenever the cluster version changes, this will create a new `ClusterDrain` and then apply two constraints:

- `labelSelector` - the drain will only apply to deployments, statefulsets, and daemonsets matching that label selector.
- `flowControl` - a maximum concurrency level the drain process will utilize, which prevents too many pod restarts from occurring concurrently.
As long as a deployment, statefulset, or daemonset matches that selector, the drain will perform a graceful restart of it, accomplished by adding a temporary annotation to its pod template (`spec.template`). The restart inherits the existing rollout policies of that controller, which developers will already have tuned their applications to tolerate (otherwise they could never release their code to Kubernetes in the first place).
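The source does not name the exact annotation used, but this is the same mechanism `kubectl rollout restart` relies on: bumping any annotation on the pod template causes the controller to roll new pods under its configured update strategy. A minimal sketch of the patched pod template, with a hypothetical annotation key, could look like:

```yaml
# Sketch only: the annotation key and value are illustrative, not the controller's actual ones.
# Changing any pod-template annotation triggers a normal rolling update of the workload.
spec:
  template:
    metadata:
      annotations:
        deployments.plural.sh/restartedAt: "2024-07-01T00:00:00Z"  # hypothetical, temporarily added by the drain
```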
An example resource opted in to the drain would be:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployments.plural.sh/drain-wave: "0"
  labels:
    deployments.plural.sh/drainable: "true"
  name: argo-rollouts
  namespace: argo-rollouts
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/component: rollouts-controller
      app.kubernetes.io/instance: argo-rollouts
      app.kubernetes.io/name: argo-rollouts
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/component: rollouts-controller
        app.kubernetes.io/instance: argo-rollouts
        app.kubernetes.io/name: argo-rollouts
    spec:
      containers:
      - args:
        - '--healthzPort=8080'
        - '--metricsport=8090'
        - '--loglevel=info'
        - '--logformat=text'
        - '--kloglevel=0'
        - '--leader-elect'
        image: quay.io/argoproj/argo-rollouts:v1.7.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: healthz
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 10
        name: argo-rollouts
        ports:
        - containerPort: 8090
          name: metrics
          protocol: TCP
        - containerPort: 8080
          name: healthz
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: metrics
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
      restartPolicy: Always
      serviceAccountName: argo-rollouts
      terminationGracePeriodSeconds: 30
```