Tracking request duration with Prometheus

I recently started using Prometheus for instrumenting and I really like it! It has a cool concept of labels,  a functional query language & a bunch of very useful functions like rate(), increase() & histogram_quantile().

Adding metrics to your app is easy,  just import prometheus client and register metrics HTTP handler http.Handle("/metrics", promhttp.Handler())

This one-liner adds HTTP /metrics endpoint to HTTP router. By default client exports memory usage, number of goroutines, gc information and other runtime information. Exporting metrics as HTTP endpoint makes the whole dev/test lifecycle easy, as it is really trivial to check whether your newly added metric is now exposed.

One thing I struggled on is how to track request duration.

Prometheus doesn’t have a built in Timer metric type, which is often available in other monitoring systems. It has only 4 metric types: Counter, Gauge, Histogram and Summary.

It turns out that client library allows you to create a timer using: prometheus.NewTimer(o Observer) and record duration using ObserveDuration() method. Provided Observer can be either Summary, Histogram or a Gauge. So, which one to use?


In Prometheus Histogram is really a cumulative histogram (cumulative frequency). Histogram is made of a counter, which counts number of events that happened, a counter for a sum of event values and another counter for each of a bucket. Buckets count how many times event value was less than or equal to the bucket’s value.

Let’s take a look at the example:

Imagine that you create a histogram with 5 buckets with values: 0.5, 1, 2, 3, 5. Let’s call this histogram http_request_duration_seconds and 3 requests come in with durations 1s, 2s, 3s. Then you would see that /metrics endpoint contains:

# HELP http_request_duration_seconds request duration histogram
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.5"} 0
http_request_duration_seconds_bucket{le="1"} 1
http_request_duration_seconds_bucket{le="2"} 2
http_request_duration_seconds_bucket{le="3"} 3
http_request_duration_seconds_bucket{le="5"} 3
http_request_duration_seconds_bucket{le="+Inf"} 3
http_request_duration_seconds_sum 6
http_request_duration_seconds_count 3

Here we can see that:

sum is 1s + 2s + 3s = 6,

count is 3, because of 3 requests

bucket {le=”0.5″}  is 0, because none of the requests where <= 0.5 seconds

bucket {le=”1″}  is 1, because one of the requests where <= 1 seconds

bucket {le=”2″}  is 2, because two of the requests where <= 2 seconds

bucket {le=”3″}  is 3, because all of the requests where <= 3 seconds

So what can we do with it?

Thirst thing to note is that when using Histogram we don’t need to have a separate counter to count total HTTP requests, as it creates one for us.

We could calculate average request time by dividing sum over count. In PromQL it would be:

http_request_duration_seconds_sum / http_request_duration_seconds_count

Also we could calculate percentiles from it. Prometheus comes with a handy histogram_quantile function for it. For example calculating 50% percentile (second quartile) for last 10 minutes in PromQL would be:

histogram_quantile(0.5, rate(http_request_duration_seconds_bucket[10m])

Which results in 1.5.

Wait, 1.5? Shouldn’t it be 2? (50th percentile is supposed to be the median, the number in the middle)

I even computed the 50th percentile using cumulative frequency table (what I thought prometheus is doing) and still ended up with 2.

As it turns out, this value is only an approximation of computed quantile. You can find more information on what type of approximations prometheus is doing in histogram_quantile doc.

By the way, be warned that percentiles can be easily misinterpreted.

It’s important to understand that creating a new histogram requires you to specify bucket boundaries up front. The default values, which are 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 are tailored to broadly measure the response time in seconds and probably won’t fit your app’s behavior.

This creates a bit of a chicken or the egg problem, because you cannot know bucket boundaries until you launched the app and collected latency data and you cannot make a new Histogram without specifying (implicitly or explicitly) the bucket values. So I guess the best way to move forward is launch your app with default bucket boundaries, let it spin for a while and later tune those values based on what you see.


A Summary is like a histogram_quantile() function, but percentiles are computed in the client. Summary is made of a count and sum counters (like in Histogram type) and resulting quantile values.

For example, we want to find 0.5, 0.9, 0.99 quantiles and the same 3 requests with 1s, 2s, 3s durations come in.

Then /metrics endpoint should contain:

http_request_duration_seconds{quantile="0.5"} 2
http_request_duration_seconds{quantile="0.9"} 3
http_request_duration_seconds{quantile="0.99"} 3
http_request_duration_seconds_sum 6
http_request_duration_seconds_count 3

So we can see that:

sum is 1s + 2s + 3s = 6,

count is 3, because of 3 requests.

{quantile=”0.5″} is 2, meaning 50th percentile is 2.

{quantile=”0.9″} is 3, meaning 90th percentile is 3.

{quantile=”0.99″} is 3, meaning 99th percentile is 3. 

Summaries are great if you already know what quantiles you want. You just specify them in SummaryOpts objectives map with it’s error window. For example:

map[float64]float64{0.5: 0.05}, which will compute 50th percentile with error window of 0.05.

Of course there are a couple of other parameters you could tune (like MaxAge, AgeBuckets or BufCap), but defaults should be good enough.

Although, there are a couple of problems with this approach.

First, you really need to know what percentiles you want. Because if you want to compute a different percentile, you will have to make changes in your code.

The other problem is that you cannot aggregate Summary types, i.e. if you have more than one replica of your app running you won’t be able to compute quantiles across all of the instances.

I usually don’t really know what I want, so I prefer to use Histograms.

By the way, the default go_gc_duration_seconds, which measures how long garbage collection took is implemented using Summary type.


Although Gauge doesn’t really implement Observer interface, you can make it using prometheus.ObserverFunc(gauge.Set).

If we had the same 3 requests with 1s, 2s, 3s durations. The /metrics would contain:

# HELP http_request_duration_seconds Duration of the last http request
# TYPE http_request_duration_seconds gauge
http_request_duration_seconds 3

So we can see that:

http_request_duration_seconds is 3, meaning that last observed duration was 3.

At first I thought, this is great, I’ll just record all my request durations this way and aggregate/average out them later.

Well, no.

The mistake here is that Prometheus scrapes /metrics data only once in a while (by default every 1 min), which is configured by scrap_interval for your target.

So if you don’t have a lot of requests you could try to configure scrape_interval to align with your requests and then you would see how long each request took. But I don’t think it’s a good idea, in this case I would rather push the Gauge metrics to Prometheus.

I think this could be useful for “job” type problems . For example, you could push how long backup, or data aggregating job has took.


Thanks for reading. Hopefully by now you and I know a bit more about Histograms, Summaries and tracking request duration. My plan for now is to track latency using Histograms, play around with histogram_quantile  and make some beautiful dashboards.

Oh and I forgot to mention, if you are instrumenting HTTP server or client, prometheus library has some helpers around it in promhttp package.