Debugging Kubernetes postStart Hooks
I recently ran into an issue where a `postStart` hook in a Kubernetes Pod was failing. Kubernetes is normally pretty good about making errors obvious, but lifecycle hooks are a bit of a rough edge. My normal method for debugging them is to run the script locally in a Docker container, but the script was being injected into a Volume via a ConfigMap, so it seemed like the "run it on your laptop" approach wouldn't quite simulate reality well enough. Here's what I did instead.
Assumptions
The debugging steps here assume that you have a fairly high level of permissions in your Kubernetes cluster. You'll need to be able to run these commands against your Namespace: `kubectl get pods`, `kubectl exec`, and `kubectl apply`. You'll also need to be using a container image that has a shell in it.
Create a workspace
Debugging is largely a process of forming and testing hypotheses, and whenever I'm experimenting I like to start with a blank directory. I have an alias in my Bash profile for creating them: `cd "$(mktemp -d)"`. This step is 100% optional, because hey, it's your filesystem. The other thing I like to do when debugging is set some environment variables to save me some typing. I prefix them with "THE" because it's fewer characters to type than "DEBUG": `export THEPOD=$(kubectl get pod -oname <pod-that-is-failing>)`.
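The workspace half of that setup is easy to sketch locally (no cluster needed; `mktemp -d` does all the work):

```shell
# mktemp -d creates a unique, empty directory and prints its path;
# cd'ing into it gives each debugging session a clean slate.
workdir="$(mktemp -d)"
cd "$workdir"
ls -A   # prints nothing: the directory starts empty
```

Each run lands in a fresh directory, so leftover files from a previous hypothesis can't confuse the next one.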
Make a test copy of the failing Pod
This debugging method tests hypotheses about failing `postStart` hooks by execing into a Pod and running them; the Pod has to be running for that to happen. A failing `postStart` hook prevents the Pod from starting, so the first thing to do is create a modified version of the Pod that actually starts. There are a few ways of doing this. If you're working in a true sandbox you can run `kubectl edit` on the Pod or its controlling Deployment, remove the `postStart` hook, and move on to the next step. If you're working in an environment where you don't want to edit things directly, do this instead:
- Make a copy of the Pod definition: `kubectl get pod $THEPOD -o yaml > pod.yaml`
- Edit the Pod definition to remove the `postStart` hook and any Kubernetes-generated information such as `creationTimestamp`, `ownerReferences`, and `status`.
- Make your modified Pod easily deletable by giving it a memorable name and adding a `metadata.labels` entry to search on. I personally use `jturner-delete-me-<four-random-letters>` for a name and `cleanup: "true"` as a label.
- Apply the modified Pod definition: `kubectl apply -f pod.yaml`
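After trimming, the result might look something like this sketch (the name, image, and ConfigMap name are placeholders; your real `pod.yaml` keeps the containers and volumes carried over from the original Pod):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jturner-delete-me-xyza
  labels:
    cleanup: "true"
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest  # carried over from the original Pod
      # The lifecycle.postStart block from the original Pod is removed here
      # so the Pod can start and you can exec into it.
      volumeMounts:
        - name: hook-script
          mountPath: /opt/hooks
  volumes:
    - name: hook-script
      configMap:
        name: hook-script  # the ConfigMap that injects the script
```

The important parts are what's gone: the hook itself and the generated metadata that would make `kubectl apply` complain.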
Debug the postStart hook
Unfortunately, this is the "draw the rest of the owl" portion of the post. Every debugging session will start with `kubectl exec -ti jturner-delete-me-xyza -- sh` and running the `postStart` hook, but what's actually wrong will vary from hook to hook. In my case, I was trying to be too clever—any amount of clever in shell scripts is too clever—in the `postStart` script. Kubernetes doesn't seem to allow a command to exit non-zero in `postStart` hooks, even if that non-zero exit is expected.
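To make that concrete: a command like `grep` exits non-zero when it simply finds nothing, which a `postStart` hook treats as failure. One way to tell the hook "this non-zero exit is fine" is to guard the command with `|| true` (a sketch; the `grep` here is a stand-in for whatever expected-to-fail command your hook runs):

```shell
# grep exits 1 when it finds no match -- often expected and harmless,
# but a postStart hook treats any non-zero exit as a hook failure.
status_unguarded=0
sh -c 'grep -q needle /dev/null' || status_unguarded=$?

# Appending `|| true` turns the expected failure into a zero exit,
# so the hook is considered successful.
status_guarded=0
sh -c 'grep -q needle /dev/null || true' || status_guarded=$?

echo "unguarded exit: $status_unguarded, guarded exit: $status_guarded"
```

Running the hook body by hand inside the exec'd shell and checking `$?` after each command is the quickest way to find which line is the culprit.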
Clean up
Be kind to your fellow cluster users and clean up after yourself: `kubectl delete pod -l cleanup=true`.
Takeaways
- Resist the urge to be too clever in your `postStart` scripts. `postStart` scripts can fill in feature gaps around container topology, but they're one of the most blunt instruments in the Kubernetes toolchain. Keep them simple.
- Making test copies of cluster resources is cheap and easy for the most part. Reach for it as an alternative to editing things directly on the cluster.
Hopefully this is helpful the next time you find yourself staring at cryptic `kubectl describe` output about a failing `postStart` hook.