Useful Tricks & Lessons I've Learned Managing a Kubernetes Cluster
You can use ephemeral debug containers, but as of writing they're an alpha feature (Kubernetes v1.18).
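For reference, on clusters where the feature is enabled, attaching an ephemeral debug container looks roughly like this (the pod name and image are just examples; in v1.18 the subcommand was still `kubectl alpha debug`):

```shell
# Attach a throwaway busybox container to a running pod so you can
# debug it without restarting or modifying the pod spec.
kubectl debug -it unstable-pod --image=busybox --target=unstable-pod
```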
What I use instead is `kubectl edit` to modify the pod's (or StatefulSet's, etc.) YAML, updating the entrypoint to something that never exits:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: unstable-pod
spec:
  containers:
  - name: unstable-pod
    image: foobar
    command:
    - sh
    - -c
    - "tail -f /dev/null"
```
From there I can `kubectl exec` into the pod and manually start the entrypoint. Useful when you want to keep the pod running even though the main PID crashes, or when you want to play around with configs quickly without editing YAML and restarting the pod.
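With the entrypoint pinned like that, getting a shell and starting the real process by hand looks like this (the entrypoint path and flags are hypothetical):

```shell
# The pod now idles on `tail -f /dev/null`, so it survives crashes
# of whatever we start by hand inside it.
kubectl exec -it unstable-pod -- sh
# ...then inside the container, run the real entrypoint manually, e.g.:
# /usr/local/bin/foobar --config /etc/foobar.conf
```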
Mostly relevant for StatefulSets: setting `podManagementPolicy: "Parallel"` means the ordered pods will be started and terminated in parallel. Useful when you don't care about pods starting serially in order. Keep in mind that if you scale a broken deployment to `n` pods, you'll have all `n` of them failing at once rather than the rollout stopping at the first broken pod.
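As a minimal sketch, the policy sits directly on the StatefulSet spec (all names here are made up):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: workers               # hypothetical name
spec:
  serviceName: workers
  replicas: 5
  podManagementPolicy: "Parallel"   # default is "OrderedReady"
  selector:
    matchLabels:
      app: workers
  template:
    metadata:
      labels:
        app: workers
    spec:
      containers:
      - name: worker
        image: foobar
```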
Defaulting to Retain PVs
For all our PVs, we create a `StorageClass` with something like:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-pd-retain
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: pd-standard
```
`reclaimPolicy: Retain` ensures that if PVs are deleted, the underlying disks on the storage-backend side (EBS, GCP Disks, etc.) are preserved. The general idea is not to let Kubernetes actually delete any data: the PV will disappear, but the volume will still live. This is useful when you accidentally delete a PV, since the volume can be reattached to a new PVC for recovery (I've had to do this).
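The recovery itself is roughly: clear `spec.claimRef` on the released PV (e.g. with `kubectl edit`) so it becomes `Available` again, then create a PVC that binds to it explicitly by name. A sketch, with made-up names and sizes:

```yaml
# After the original PVC is gone, the PV shows as "Released".
# Once spec.claimRef is cleared, this new PVC can claim it directly:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: recovered-data          # hypothetical
spec:
  storageClassName: standard-pd-retain
  volumeName: pvc-1234-abcd     # the retained PV's name
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```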
Deep-dive Through Abstractions
When something goes awry, you probably don't have any sophisticated monitoring because you haven't had the time to build it yet, and if the issue is urgent you'll need to be familiar with checking the right places. Let's say something is broken and it's not immediately obvious whether it's an application or a cluster-wide issue; the order I'd look into things is:
- Go by my first hunch and follow that (as we’re naturally inclined to do). After a while you learn the general pitfalls, or have a feeling it may be due to a recent change.
- Start with `kubectl describe` of the pods that have issues and check the events, plus `kubectl get events` in the namespace. Then `kubectl get events --sort-by=.metadata.creationTimestamp` to see them in chronological order
- `kubectl get pods -o wide` to see the node the pod is running on. Then `kubectl describe` the node to see if there are issues with it. Might as well check the other nodes just in case
- If it's an issue with PVCs, check the persistent volumes; sometimes they get lost: `kubectl get pv` or `kubectl describe pv`. If PVs are stuck in `Terminating`, edit the PV and delete the `finalizers`
- If it's a performance issue, use `watch -n5 kubectl top nodes` (or `kubectl top pods`)
- If it's a scaling issue, check the autoscaler: `kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler`
- If it's a networking issue, check the Kubernetes DNS. We've run into an issue where all the `kube-dns` pods were scheduled on one preemptible node, guaranteeing DNS downtime
- Check the VMs on your cloud provider if it's a compute issue, or the disks if it's storage. Sometimes it's just preemptible/spot nodes being down (or the provider itself)
Usually the problem will emerge. So far I've never had to connect to the nodes themselves to debug issues.
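For the PVs-stuck-in-`Terminating` case above, the usual escape hatch is clearing the PV's finalizers; a sketch with a made-up PV name:

```shell
# A PV stuck in Terminating is usually held by its protection finalizer.
# Clearing the finalizers lets the deletion complete -- use with care,
# since this deliberately bypasses that protection.
kubectl patch pv pvc-1234-abcd -p '{"metadata":{"finalizers":null}}'
```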
StatefulSets have their place, but in my opinion they're not ideal when you really want to keep state. They're useful for, say, general compute pods where it's handy to keep state between pod restarts and the like. But say you're running a properly stateful service like HDFS or ES on Kubernetes: an explicitly managed PVC per pod is always better. Here are some comparisons, using HDFS (a namenode plus `n` datanodes) as the example:
- If you want to remove datanode number `n` in a StatefulSet, you can't; you have to scale them down, since they're ordered
- If you want to migrate the PVs to a different cluster, I honestly wouldn't know how to, since the `volumeClaimTemplates` will create new PVCs + PVs
- If you accidentally delete the StatefulSet you lose the `volumeClaimTemplates`, and I wouldn't know how to reattach the old PVs if they're set to `Retain`
- You can't have datanodes in a StatefulSet with different PVC sizes, `requests`, env vars, or anything useful; they stay identical. Say you need to resize a few datanode disks to save costs, or run the last two pods with fewer resources: I can't think of any easy way to do this
If you require any of these, I would recommend managing individual PVs yourself. There's some work involved in scaling, since you're basically creating your own "StatefulSet", but you have more control over each deployment, and it will save a lot of headache if you run into any issues. I may be wrong here; if anyone has had success solving the above problems with StatefulSets, I'd like to hear about it.
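A sketch of what that hand-rolled "StatefulSet" can look like, with hypothetical names and a made-up image: one PVC plus one single-replica Deployment per datanode, so each member can have its own disk size, resources, and env vars.

```yaml
# One PVC + Deployment pair per datanode; datanode-1 here could use a
# different storage size or resources without touching the others.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datanode-1-data
spec:
  storageClassName: standard-pd-retain
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 500Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datanode-1
spec:
  replicas: 1
  strategy:
    type: Recreate            # never run two writers against the same disk
  selector:
    matchLabels: {app: datanode-1}
  template:
    metadata:
      labels: {app: datanode-1}
    spec:
      containers:
      - name: datanode
        image: hdfs-datanode:example    # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: datanode-1-data
```

Scaling up means stamping out another PVC + Deployment pair, which is exactly the extra work mentioned above, but each pair stays independently tweakable.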
kubectl diff
Useful for your CI of course, but generally a good idea to check the diff of a resource you're about to update. I've been using it as often as I use
git diff. Just make sure you use a client newer than 1.18.1, since earlier versions have a known bug.
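Typical usage, e.g. as a CI gate (the manifest path is hypothetical):

```shell
# kubectl diff exits 0 when the live objects match the manifest,
# 1 when they differ, and >1 on errors -- handy for scripting.
kubectl diff -f deployment.yaml
```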
Switching Between Clusters and Namespaces Quickly
I personally use kubectx + kubens to easily switch between different clusters and namespaces. I also rename both binaries (e.g. to
knamespace) so they don't clash with my existing shell setup.
Keeping it Simple
I keep the YAML plain, with a bit of python3 to template via overlays. I'm not into templating YAML (kustomize is advertised as "template-free" templating; I guess it is, but that comes with limitations and its own YAML HELL). I don't use helm (more moving parts, a black box, the "package management" aspect isn't a feature for me, etc.), or at least I try not to, i.e., I either build my own YAML or generate one from the chart. I try not to use Kubernetes Operators or too many CRDs etc., generally keeping things stock. Of course, complexity is always difficult to manage with real deadlines.
I would probably approach YAML creation programmatically (especially if the Kubernetes use-case is SaaS-like); however, being one person managing clusters, I couldn't imagine getting things shipped that way. There may be other opportunities to do some elegant work here in the future.
Probably the most important thing I've learned is that the hardest part of managing a Kubernetes cluster is pushing back on the complexity. It's easy to run with it, to the point where the cluster's complexity-space is larger than people can manage. I'm also starting to think over-engineering is a real problem, and that with time systems tend to converge toward technical debt. Some of these things are hard to predict or know in advance. After 3 years managing a single Kubernetes cluster, what we considered a great idea or "best practice" two years ago may now seem archaic. Sometimes you internalise the complexity and don't realise how much knowledge your team or others have built up in the system. Of course, documentation solves many problems, but I'm still always thinking about reducing complexity.