Cluster Drain Management
Safely migrate workloads in blue/green cluster upgrades
Overview
A common pain point in managing Kubernetes upgrades is handling the movement of applications onto newly upgraded nodes. Whenever a Kubernetes upgrade happens, especially in the cloud, the knock-on effect is that every node is replaced and every pod is restarted. By default this happens node-by-node: a `kubectl drain` command is effectively run against each node, evicting any pods running on it. Since this process has no knowledge of the deployments controlling those pods, it is easy to cause unavailability during the upgrade, especially if applications are not written to tolerate restarts gracefully. The `ClusterDrain` custom resource is meant to solve this.
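For context, a managed node upgrade behaves roughly as if the following were run against each node in turn (the exact flags vary by provider, and `<node-name>` is just a placeholder):

```sh
# Cordon the node and evict every pod running on it; daemonset-managed pods are skipped
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```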
Definition
Fundamentally the CRD is quite simple; it can look something like this:
```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: ClusterDrain
metadata:
  name: drain-{{ cluster.metadata.master_version }}
spec:
  flowControl:
    maxConcurrency: 10
  labelSelector:
    matchLabels:
      deployments.plural.sh/drainable: "true"
```
Whenever the cluster version changes, this will create a new `ClusterDrain` and then apply two constraints:

- `labelSelector` - the drain will only apply to deployments, statefulsets, and daemonsets matching that label selector.
- `flowControl` - a maximum concurrency level the drain process will utilize, which prevents too many pod restarts from occurring concurrently.
As long as a deployment, statefulset, or daemonset matches that selector, the drain will perform a graceful restart of it, accomplished by adding a temporary annotation to its pod template (`spec.template`). The restart inherits the existing rollout policies of that controller, which developers will already have tuned their applications to tolerate (otherwise they could never release their code to Kubernetes in the first place).
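The source does not name the exact annotation used, but this is the same mechanism `kubectl rollout restart` relies on: bumping any annotation on the pod template causes the controller to roll new pods under its configured update strategy. A minimal sketch of the patched pod template, with a hypothetical annotation key, could look like:

```yaml
# Sketch only: the annotation key and value are illustrative, not the controller's actual ones.
# Changing any pod-template annotation triggers a normal rolling update of the workload.
spec:
  template:
    metadata:
      annotations:
        deployments.plural.sh/restartedAt: "2024-07-01T00:00:00Z"  # hypothetical, temporarily added by the drain
```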
An example resource opted in to the drain would be:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployments.plural.sh/drain-wave: "0"
  labels:
    deployments.plural.sh/drainable: "true"
  name: argo-rollouts
  namespace: argo-rollouts
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/component: rollouts-controller
      app.kubernetes.io/instance: argo-rollouts
      app.kubernetes.io/name: argo-rollouts
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/component: rollouts-controller
        app.kubernetes.io/instance: argo-rollouts
        app.kubernetes.io/name: argo-rollouts
    spec:
      containers:
      - args:
        - '--healthzPort=8080'
        - '--metricsport=8090'
        - '--loglevel=info'
        - '--logformat=text'
        - '--kloglevel=0'
        - '--leader-elect'
        image: quay.io/argoproj/argo-rollouts:v1.7.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: healthz
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 10
        name: argo-rollouts
        ports:
        - containerPort: 8090
          name: metrics
          protocol: TCP
        - containerPort: 8080
          name: healthz
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: metrics
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
      restartPolicy: Always
      serviceAccountName: argo-rollouts
      terminationGracePeriodSeconds: 30
```