Useful Tricks & Lessons I've Learned Managing a Kubernetes Cluster

Debugging Pods

You can use ephemeral debug containers, but as of writing they're an alpha feature in Kubernetes v1.18.

What I use instead is kubectl edit to modify a Pod's (or Deployment's, StatefulSet's, etc.) YAML and update the command:

apiVersion: v1
kind: Pod
metadata:
  name: unstable-pod
spec:
  containers:
  - name: unstable-pod
    image: foobar
    command:
    - sh
    - -c
    - "tail -f /dev/null"

From there I can kubectl exec into the pod and manually start the entrypoint. This is useful when you want to keep the pod running while the main PID keeps crashing, or when you want to play around with configs quickly without editing the YAML and restarting the pod.
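
A minimal sketch of that workflow, assuming the unstable-pod example above (the entrypoint path is a placeholder, not the image's real one):

kubectl exec -it unstable-pod -- sh
# inside the container, start the real entrypoint by hand
/usr/local/bin/entrypoint.sh    # placeholder; run whatever the image's actual entrypoint is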

Using podManagementPolicy

Mostly used for StatefulSets. Setting podManagementPolicy: "Parallel" means pods are launched and terminated in parallel rather than one at a time, in order. Useful when you don't care about pods starting serially. Keep in mind that if you scale a broken workload to n pods, you'll get n CrashLoopBackOffs at once.
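
A minimal sketch of where the field sits (the names and image are placeholders):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  serviceName: example              # placeholder headless Service
  replicas: 3
  podManagementPolicy: "Parallel"   # launch and terminate all replicas in parallel
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: foobar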

Defaulting to Retain PVs

For all our PVs, we create a StorageClass with something like

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-pd-retain
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: pd-standard

Here reclaimPolicy: Retain ensures that if a PV is deleted, the underlying volume on the storage backend (EBS, GCP disks, etc.) is preserved. The general idea is not to let Kubernetes actually delete any data: the PV object will disappear, but the volume will still exist. This is useful when you accidentally delete a PV, since the volume can be reattached to a new PVC for recovery (I've had to do this).
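
As a sketch of that recovery path (the names, sizes, and disk name are placeholders), you recreate a PV that points at the surviving disk and bind a new PVC to it explicitly:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: recovered-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard-pd-retain
  gcePersistentDisk:
    pdName: surviving-disk      # the GCE disk left behind by the Retain policy (placeholder)
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: recovered-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-pd-retain
  volumeName: recovered-pv      # bind explicitly to the recreated PV
  resources:
    requests:
      storage: 100Gi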

Deep-dive Through Abstractions

When something goes awry, you probably don't have any sophisticated monitoring yet (you may not have had the time to build it), and if the issue is urgent you'll need to be familiar with checking the right places. Let's say something is broken and it's not immediately obvious whether it's an application issue or a cluster-wide one; I work through the layers of abstraction in a set order until the culprit shows up.
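
The exact checklist depends on the setup, but as a sketch of the kind of walk-through I mean (resource names are placeholders, and this particular ordering, from the application outward, is my own illustration):

kubectl get events --sort-by=.metadata.creationTimestamp   # anything noisy at the cluster level?
kubectl get pods -o wide                        # which pods are unhealthy, and on which nodes
kubectl describe pod <pod-name>                 # probes, restarts, scheduling, mounted volumes
kubectl logs <pod-name> --previous              # logs from the last crashed container
kubectl describe deployment <deployment-name>   # rollout state of the owning controller
kubectl describe node <node-name>               # node conditions, pressure, capacity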

Usually the problem will emerge somewhere along the way. I have yet to connect to the nodes themselves to debug an issue.

Ditching StatefulSets

StatefulSets have their place, but in my opinion they're not ideal when you really want to keep state. They work well for, say, general compute pods where it's handy to keep state between pod restarts and the like. But if you're running a properly stateful service like HDFS or Elasticsearch on Kubernetes, a Deployment + PVC is, in my experience, the better choice. Comparing the two for something like an HDFS namenode pod with n datanodes, the Deployment approach gives you far more control over how each pod and its volume are managed.

If you need that kind of control, I would recommend using a Deployment + PVC + PV per instance, as sketched below. There's some work involved in scaling, since you're basically building your own "StatefulSet", but you have more control over each deployment, and it will save a lot of headache if you run into issues. I may be wrong here; if anyone has had success solving these problems with StatefulSets, I'd like to hear about it.
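
A minimal sketch of that pattern for a single instance (names, sizes, and the image are placeholders): a dedicated PVC mounted by a one-replica Deployment.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: namenode-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-pd-retain
  resources:
    requests:
      storage: 500Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: namenode                # one Deployment per stateful instance
spec:
  replicas: 1
  strategy:
    type: Recreate              # avoid two pods fighting over the same RWO volume during updates
  selector:
    matchLabels:
      app: namenode
  template:
    metadata:
      labels:
        app: namenode
    spec:
      containers:
      - name: namenode
        image: foobar
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: namenode-data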

Using kubectl diff and validate --dry-run

Both are useful in CI of course, but it's generally a good idea to check the diff of any resource you're about to update; I've been using kubectl diff about as often as I use git diff. Just make sure you use a client newer than 1.18.1, due to an issue where a bug caused diff to actually apply the changes.
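
A sketch of the kind of checks I mean (the manifest path is a placeholder):

kubectl diff -f deployment.yaml                                       # show what would change on the live cluster
kubectl apply --validate=true --dry-run=client -f deployment.yaml     # validate locally, touching nothing
kubectl apply --dry-run=server -f deployment.yaml                     # let the API server validate, still without persisting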

Switching Between Clusters and Namespaces Quickly

I personally use kubectx + kubens to switch between different clusters and namespaces easily. I also rename both binaries to kcontext and knamespace since I don't want to ruin my kubectl autocomplete.
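
A sketch of that setup, assuming the binaries live in a directory on your PATH (the paths, context, and namespace below are placeholders):

mv ~/bin/kubectx ~/bin/kcontext     # renamed so tab-completing "ku" still lands on kubectl
mv ~/bin/kubens  ~/bin/knamespace
kcontext production-cluster         # switch the current kubectl context
knamespace monitoring               # switch the namespace for the current context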

Keeping it Simple

I use kustomize + envsubst + python3 to template via overlays. I'm not into templating YAML (kustomize is advertised as "template-free" templating; I guess it is, but that comes with limitations and its own YAML hell). I don't use helm (more moving parts, a black box, and the "package management" aspect isn't a feature for me), or I try not to, i.e., I either write my own YAML or generate it from the chart. I try not to use Kubernetes Operators or too many CRDs, generally keeping things stock. Of course, complexity is always difficult to manage against real deadlines.
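
As a rough sketch of that style of setup (the layout, variable, and pipeline here are my own illustration, not necessarily the exact workflow described above), an overlay might look like:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base                  # shared, environment-agnostic manifests
patchesStrategicMerge:
- replicas.yaml               # environment-specific overrides

and get rendered and applied with something like:

kustomize build overlays/prod | envsubst '$IMAGE_TAG' | kubectl apply -f -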

I would probably approach YAML creation programmatically (especially if the Kubernetes use-case is SaaS-like); however, being one person managing clusters, I couldn't imagine getting things shipped that way. There may be other opportunities to do some elegant work here in the future.

Probably the most important thing I've learned is that the hardest part of managing a Kubernetes cluster is pushing back on complexity. It's easy to run with it, to the point where the cluster's complexity-space is larger than people can manage. I'm also starting to think over-engineering is a real problem, and that with time these systems tend to converge into technical debt. Some of this is hard to predict or know in advance. After three years managing a single Kubernetes cluster, what we considered a great idea or "best practice" two years ago may now seem archaic. Sometimes you internalise the complexity and don't realise how much knowledge your team and others have built up about the system. Documentation solves many problems, but I'm still always thinking about reducing complexity.

Written May 2020.
