May 1, 2023 10:24 pm

Povilas

What are Kubernetes Controllers?

Kubernetes Controllers are designed to manage the lifecycle of certain resources, like Deployments and Services. They are responsible for keeping the cluster’s desired state in sync with its actual state. For example, the following code snippet shows a simple Controller loop:

for {
  desired := getDesiredState()
  current := getCurrentState()
  makeChanges(desired, current)
}

Note: this example is taken from the Kubernetes controller documentation.

So the interesting problem with Controllers is that the request is dissociated from the action it triggers. For example, when you submit a request, it gets stored in the database as the desired state, and you immediately receive a response. Only after some time does the Controller pick the request up, see what has changed, and execute the operations.

This dissociation makes it hard to measure the latency of Controller applications. Since the usual monitoring methods focus on direct request-response patterns, they might not be suitable for capturing the latency of Controllers. Next, we look into how Kube Proxy solves this problem when measuring the latency of programming network endpoints.

What is Kube Proxy?

Kube Proxy is a controller-type application that runs on every Node and manages node-level networking rules and routes. These routes make the Kubernetes Service object live – routing and load-balancing traffic between the pods.

Kube Proxy latency of network programming

Before we look into how Kube Proxy measures latency, let’s first look at Kube Proxy’s work-in-progress Service Level Objective:

Latency of programming in-cluster load balancing mechanism (e.g. iptables), measured from when service spec or list of its Ready pods change to when it is reflected in load balancing mechanism, measured as 99th percentile over last 5 minutes aggregated across all programmers.

Note: This is taken from https://github.com/kubernetes/community/blob/master/sig-scalability/slos/network_latency.md.

To measure the above SLO, Kubernetes developers used an interesting technique that can be used to monitor Kubernetes Controller latency. The technique is this:

When somebody updates a Service or an Endpoints object, the Endpoint Controller running inside the Controller Manager adds a special endpoints.kubernetes.io/last-change-trigger-time annotation, which holds the time of the change. The following example shows an Endpoints object holding the annotation:

apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2023-04-25T09:08:53Z"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  name: kube-dns
  namespace: kube-system
subsets:
...

Kube Proxy then uses this annotation when applying the change to the Node: it reads endpoints.kubernetes.io/last-change-trigger-time, computes how long the change took to apply, and exposes the result via the network_programming_duration_seconds Prometheus histogram metric. This gives you an excellent way to measure latency from the moment a user requested a change to the moment it was actually applied.

The technique to monitor Kubernetes Controller Latency

So, you can generalize this monitoring technique to any Kubernetes Controller:

  1. When creating or updating a Custom Resource in Kubernetes, store the change time in an annotation.
  2. On the Controller side, when applying the change, use the annotation to measure how long the request actually took to apply, and expose that as a metric.

Conclusion

What’s nice about this technique is that even if your Controller is unavailable for a few hours, it will still accurately measure the time it took to service the request. You can also extend this technique to Tracing: alongside the last-change-trigger-time annotation, you can pass the trace header in another annotation, connecting the entire flow into a single trace.
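A minimal sketch of the tracing idea, assuming a hypothetical example.com/traceparent annotation key carrying a W3C traceparent header:

```go
package main

import "fmt"

// Hypothetical annotation key for carrying a W3C traceparent header; the
// name is an assumption, not a Kubernetes convention.
const traceAnnotation = "example.com/traceparent"

// traceContext extracts the stored trace header, if any, so the Controller's
// reconcile work can join the original request's trace.
func traceContext(annotations map[string]string) (string, bool) {
	tp, ok := annotations[traceAnnotation]
	return tp, ok
}

func main() {
	// The client stores the trace header alongside the change-time
	// annotation when submitting the request.
	annotations := map[string]string{
		traceAnnotation: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
	}
	// The Controller extracts it when reconciling, so the asynchronous work
	// is linked to the same trace as the original request.
	if tp, ok := traceContext(annotations); ok {
		fmt.Println("continuing trace:", tp)
	}
}
```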

What sucks about this technique is that there is no generic metric that works across controller-type applications, the way http_request_duration_seconds does for HTTP services. Thus, for every Controller, you need to understand the application and implement this technique yourself, which means these metrics are unavailable for many open-source operators.

Alternatively, you can monitor Kubernetes Controllers using work queue metrics. See the How To Monitor Kubernetes Controllers blog post to learn more.


About the Author

I'm Povilas Versockas, a software engineer, blogger, Certified Kubernetes Administrator, CNCF Ambassador, and a computer geek.
