Simulating Spot Instance Termination to Test AWS Node Termination Handler
Spot instances are becoming a popular way to achieve large cost savings (70% or more) on the cloud, but it can be difficult to ensure your Kubernetes workloads continue working correctly when spot instances are reclaimed. For example, what happens when a spot instance is reclaimed but there is no more capacity available in the cluster?
Fortunately, AWS gives a spot instance a 2-minute warning before it is reclaimed and terminated. This warning lets a Kubernetes cluster combine the AWS Node Termination Handler with the Kubernetes Cluster Autoscaler to handle spot reclamations cleanly, without service interruptions or insufficient capacity in your cluster.
The AWS Node Termination Handler makes sure your workloads are cleanly drained: it cordons the affected node so that no new pods are scheduled onto it, and then drains it so that the existing pods are evicted and rescheduled on other nodes. But what happens if there isn’t enough capacity in the remaining nodes? This is where the Cluster Autoscaler comes in.
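For context, these are the same operations you could run by hand with kubectl; the Termination Handler simply automates them when the termination notice arrives (the node name below is just a placeholder):

# Stop new pods from being scheduled onto the node
kubectl cordon <node-name>

# Evict the pods currently on the node so they are rescheduled elsewhere
kubectl drain <node-name> --ignore-daemonsets --delete-local-data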
The Cluster Autoscaler can detect that there is not enough capacity left on the remaining instances and asks AWS to spin up new ones. For this to work correctly it is important that your ASGs, the Termination Handler, and the Cluster Autoscaler are all configured correctly; that configuration is outside the scope of this article. What I want to focus on is: how do we test this mechanism?
AWS delivers the 2-minute warning by changing the response of the instance metadata API. The endpoint http://169.254.169.254/latest/meta-data/spot/instance-action normally returns a 404, but when an instance is about to be reclaimed it returns a 200 along with a JSON body that looks like:
{"action":"terminate","time":"2020-05-07T04:38:00.078Z"}
The AWS Termination Handler polls the metadata API for this event as can be seen here: https://github.com/aws/aws-node-termination-handler/blob/v1.3.1/pkg/ec2metadata/ec2metadata.go#L135-L153
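If you want to poke at this yourself, a minimal sketch of that check from a shell on any instance looks something like this (assuming IMDSv1 is enabled; with IMDSv2 enforced you would first need to fetch a session token):

# 404 (curl -f exits non-zero) means no interruption; a 200 prints the JSON notice
while true; do
  curl -sf http://169.254.169.254/latest/meta-data/spot/instance-action && echo || echo "no termination notice"
  sleep 5
done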
With this information, we know that we can simulate this event by mocking the response of the metadata API and configuring the instance to hit our mock response.
Being the l̶a̶z̶y̶ efficient programmer that I am, I found a service that already does this. I can then deploy this service to my Kubernetes cluster by applying the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ec2-spot-termination-simulator
  name: spot-term-simulator
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ec2-spot-termination-simulator
  template:
    metadata:
      labels:
        app: ec2-spot-termination-simulator
    spec:
      containers:
      - image: shoganator/ec2-spot-termination-simulator:1.0.1
        imagePullPolicy: Always
        name: spot-term-simulator
        env:
        - name: PORT
          value: "80"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: spot-term-simulator
  name: ec2-spot-termination-simulator
spec:
  ports:
  - name: http
    port: 8082
    nodePort: 30000
    protocol: TCP
    targetPort: 80
  selector:
    app: ec2-spot-termination-simulator
  sessionAffinity: None
  type: NodePort
This deploys the mock service, but I still need to override the metadata API. Before doing that, we need to figure out which node the simulator is running on and get a shell on that node.
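Finding the node is straightforward; I apply the manifest (saved here as spot-term-simulator.yaml, the filename is arbitrary) and look at the NODE column:

kubectl apply -f spot-term-simulator.yaml

# The -o wide output includes the node the simulator pod landed on
kubectl get pods -l app=ec2-spot-termination-simulator -o wide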
In my cluster I don’t have SSH access to the Kubernetes instances by design, but I can still reach the node’s network through the aws-node pods, which run on the host network. I open up a shell in the aws-node pod on that node with:
kubectl exec -it -n kube-system aws-node-<id> -- bash
We then need to install two yum packages, net-tools and socat, so that we can reroute the metadata API traffic:
yum install -y net-tools
yum install -y socat
We can finally reroute the API traffic and test that our Node Termination Handler and Cluster Autoscaler work correctly by running the commands:
ifconfig lo:0 169.254.169.254 up
socat TCP4-LISTEN:80,fork TCP4:127.0.0.1:30000
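The first command adds 169.254.169.254 as a loopback alias so that traffic to the metadata IP stays on the node, and the second forwards port 80 on that address to the simulator’s NodePort. Since socat runs in the foreground, you can sanity-check the redirect from a second shell on the same node (or by backgrounding socat with &):

# Should now return the simulator's mock termination notice instead of a 404
curl -i http://169.254.169.254/latest/meta-data/spot/instance-action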
As soon as we run these commands, we can see that the cluster has started draining the pods. However, I don’t have enough capacity in my cluster to immediately reschedule them, which leaves some pods in the Pending state.
Our Cluster Autoscaler, however, notices this and immediately asks for new nodes; about 90 seconds later the new node is ready and the pending pods are scheduled onto it.
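To watch this play out, I keep a couple of watches running in another terminal:

# Watch pods get evicted, go Pending, and come back up on the new node
kubectl get pods -A -o wide -w

# Watch the Cluster Autoscaler register the new node
kubectl get nodes -w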
Now I’m happy with my configuration and can sleep in peace knowing that spot reclamations are not going to ruin my night.