April 9, 2023 10:48 am

Povilas

Introduction

The Four Golden Signals and the RED method work well for monitoring user-facing, request-driven services. But how do you apply the Four Golden Signals to services that are not request-driven, such as Kubernetes Controllers?

What are Kubernetes Controllers?

Kubernetes Controllers are designed to manage the lifecycle of certain resources, like Deployments and Services. They are responsible for keeping the cluster’s desired state in sync with its actual state. For example, the following code snippet shows a simple Controller loop:

for {
  desired := getDesiredState()
  current := getCurrentState()
  makeChanges(desired, current)
}

Note: this example is taken from the following documentation.
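
To make the loop above a little more concrete, here is a minimal, self-contained sketch in Go. The State and Change types and the diff and reconcile functions are hypothetical names chosen for illustration, and the state is modelled as a plain in-memory map of replica counts. A real Controller reads and writes state through the Kubernetes API instead.

```go
package main

import "sort"

// State maps a resource name to its replica count.
// This is a hypothetical, heavily simplified model of cluster state.
type State map[string]int

// Change describes one action needed to move current toward desired.
type Change struct {
	Name string
	From int // replicas currently present (0 if the resource is missing)
	To   int // replicas wanted (0 if the resource should be removed)
}

// diff returns the changes needed to turn current into desired,
// sorted by resource name so that the result is deterministic.
func diff(desired, current State) []Change {
	var changes []Change
	// Resources that are missing or have the wrong replica count.
	for name, want := range desired {
		if have := current[name]; have != want {
			changes = append(changes, Change{Name: name, From: have, To: want})
		}
	}
	// Resources that exist but are no longer desired.
	for name, have := range current {
		if _, ok := desired[name]; !ok {
			changes = append(changes, Change{Name: name, From: have, To: 0})
		}
	}
	sort.Slice(changes, func(i, j int) bool { return changes[i].Name < changes[j].Name })
	return changes
}

// reconcile plays the role of makeChanges: it applies every change to
// current in place and reports how many changes were applied.
func reconcile(desired, current State) int {
	changes := diff(desired, current)
	for _, c := range changes {
		if c.To == 0 {
			delete(current, c.Name)
		} else {
			current[c.Name] = c.To
		}
	}
	return len(changes)
}
```

Calling reconcile a second time with the same desired state applies no changes, which is the property that lets a Controller run the loop over and over safely.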

The interesting problem with Controllers is the dissociation of the request from the work it triggers. When you submit a request, it is stored in the database as the desired state, and you receive a response. Some time later, the Controller picks the change up, works out what is different, and executes the necessary operations. This dissociation makes it hard to monitor Controllers using the Four Golden Signals or the RED method, because the request flow only covers storing the data, not acting on it.

So instead, you should monitor the makeChanges part of the Controller loop. To do that, you can rely on the fact that most Kubernetes Controllers are written with the client-go Informer framework, which uses internal work queues and workers that pop items off those queues. The following chart shows what happens inside the Controller:

Description of Kubernetes client-go Informer framework.

The Four Golden Signals for Kubernetes Controllers

So to apply the Four Golden Signals method, I propose that you monitor the internal work queues. For example, you can watch the following metrics:

  • Latency – How long does it take to process an event from the work queue?
  • Errors – How often does the Controller retry an item due to an error?
  • Requests – How much work does the Controller get in its queue?
  • Saturation – How many items are in the queue? Or how long does an item sit in the queue?
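
To see where these four signals come from, here is a toy work queue in Go that records the same kinds of measurements. It is only a sketch: Queue, Stats, and ErrTransient are made-up names, and this is not the client-go workqueue implementation, just a simplified single-threaded model of one.

```go
package main

import (
	"errors"
	"time"
)

// ErrTransient is a hypothetical error a handler returns to ask for a retry.
var ErrTransient = errors.New("transient failure")

// item is a queued key plus the time it entered the queue.
type item struct {
	key      string
	enqueued time.Time
}

// Stats holds the raw measurements behind the four signals.
type Stats struct {
	Adds          int             // Requests: items added to the queue
	Retries       int             // Errors: items re-queued after a failure
	QueueWaits    []time.Duration // Saturation: how long each item sat in the queue
	WorkDurations []time.Duration // Latency: how long each item took to process
}

// Queue is a toy FIFO work queue (an illustration, not the client-go one).
type Queue struct {
	items []item
	Stats Stats
}

// NewQueue returns an empty queue.
func NewQueue() *Queue { return &Queue{} }

// Add enqueues a key and counts it as incoming work.
func (q *Queue) Add(key string) {
	q.Stats.Adds++
	q.items = append(q.items, item{key: key, enqueued: time.Now()})
}

// Depth reports how many items are waiting (Saturation: queue depth).
func (q *Queue) Depth() int { return len(q.items) }

// ProcessNext pops one item, runs handle on it, and re-queues the key if
// handle returns an error. It returns false when the queue is empty.
func (q *Queue) ProcessNext(handle func(key string) error) bool {
	if len(q.items) == 0 {
		return false
	}
	next := q.items[0]
	q.items = q.items[1:]

	start := time.Now()
	q.Stats.QueueWaits = append(q.Stats.QueueWaits, start.Sub(next.enqueued))

	err := handle(next.key)
	q.Stats.WorkDurations = append(q.Stats.WorkDurations, time.Since(start))

	if err != nil {
		q.Stats.Retries++
		q.Add(next.key) // in this sketch a retry also counts as a new add
	}
	return true
}
```

A worker would call ProcessNext in a loop, for example `for q.ProcessNext(handle) {}`, and the fields of q.Stats then hold the raw numbers for each signal. Note that a handler that always fails would be retried forever here, because the sketch has no rate limiting or retry cap.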


In practice, this means that you can use the client-go work queue metrics. If you are using Prometheus, you can typically track these signals with the following PromQL queries:

Latency:

histogram_quantile(0.99, sum(rate(workqueue_work_duration_seconds_bucket{job="controller"}[5m])) by (name, le))

Errors:

sum(rate(workqueue_retries_total{job="controller"}[5m])) by (name)

Requests:

sum(rate(workqueue_adds_total{job="controller"}[5m])) by (name)

Saturation (queue depth):

sum(workqueue_depth{job="controller"}) by (name)

Saturation (how long an item sits in a queue):

histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job="controller"}[5m])) by (name, le))

Conclusion

Kubernetes Controllers are responsible for keeping the cluster’s desired state in sync with its actual state. The dissociation of the request from the work it triggers makes it difficult to monitor them using the Four Golden Signals or the RED method. Instead, you can use the client-go work queue metrics to monitor the latency, errors, requests, and saturation of the Controller’s internal work queues.


About the Author

I'm Povilas Versockas, a software engineer, blogger, Certified Kubernetes Administrator, CNCF Ambassador, and a computer geek.
