Advanced Standalone Kubelet tutorial for Raspberry Pi

Recently I adapted Kelsey Hightower’s Standalone Kubelet Tutorial for the Raspberry Pi.
That Standalone Kubelet Tutorial for Raspberry Pi is a prerequisite for this one, as I’m going to skip Linux installation and all the other preliminary parts.

In this tutorial I will show you how to cross-compile the Kubernetes Kubelet for the ARM architecture, and we will run the amazing Prometheus, Node Exporter and Grafana using static pods.

I am running Ubuntu, so this should generally work for people running Ubuntu or other Linux distributions. If you are running Windows I have no idea whether this will work, but you can try 🙂 I think there could be issues in building the Kubelet and/or cross-compiling it for ARM.

Throughout the tutorial, substitute IP with your Raspberry Pi’s IP address or hostname.

Compiling the Kubernetes Kubelet for ARM

Clone the Kubernetes project:

git clone git@github.com:kubernetes/kubernetes.git

You can check out a specific version branch if you like (e.g. git checkout release-1.10); I will be running master in this tutorial.

Change into the Kubernetes directory:

cd kubernetes

Build Kubernetes for your current platform:

make

To cross compile you will need to have gcc-arm-linux-gnueabihf package installed.
On Ubuntu/Debian you can install via:

sudo apt-get install gcc-arm-linux-gnueabihf

Cross compile Kubelet for ARM:

make all WHAT=cmd/kubelet KUBE_BUILD_PLATFORMS=linux/arm

Copy Kubelet binary to your Raspberry Pi:

scp ./_output/local/bin/linux/arm/kubelet IP:~/

Installing Kubelet

Connect to Raspberry Pi:

ssh alarm@IP

Move Kubelet to /usr/bin:

sudo mv kubelet /usr/bin/kubelet

Download the Kubelet config file and create /etc/kubernetes as described in the previous tutorial, if you haven’t already.

Download the Kubelet systemd file:

wget -q --show-progress --https-only --timestamping \
  https://raw.githubusercontent.com/povilasv/advanced-raspberrypi-standalone-kubelet/master/kubelet.service

Move the kubelet.service unit file to the systemd configuration directory:

sudo mv kubelet.service /etc/systemd/system/

Start the kubelet service:

sudo systemctl daemon-reload

sudo systemctl enable kubelet

sudo systemctl start kubelet

Verification

Verify the kubelet is running:

sudo systemctl status kubelet

View logs for Kubelet service:

journalctl -u kubelet

Check that no containers are running:

docker ps

Installing Prometheus and friends

Prometheus

Create Prometheus data dir:

sudo mkdir /root/prometheus-data/

Download Prometheus config:

wget -q --show-progress --https-only --timestamping \
  https://raw.githubusercontent.com/povilasv/advanced-raspberrypi-standalone-kubelet/master/config/prometheus.yml

Move prometheus.yml to /root/prometheus-data/:

sudo mv prometheus.yml /root/prometheus-data/prometheus.yml

Download Prometheus manifest:

wget -q --show-progress --https-only --timestamping \
  https://raw.githubusercontent.com/povilasv/advanced-raspberrypi-standalone-kubelet/master/pods/prometheus.yaml

Move the prometheus.yaml pod manifest to the Kubelet manifest directory:

sudo mv prometheus.yaml /etc/kubernetes/manifests/prometheus.yaml

Wait for image to download and list containers:

docker ps

You should see two containers running which represent the prometheus pod and a kubelet container. Docker does not understand pods so the containers are listed as individual containers following the Kubernetes naming convention.

Open up browser and go to:

http://IP:9090/status

You should see the Prometheus web UI.

Open up targets page:

http://IP:9090/targets

You should see 1 job as UP and 2 jobs as DOWN. This is expected, as we haven’t launched Node Exporter and Grafana yet.

Node Exporter

Download Prometheus Node Exporter manifest:

wget -q --show-progress --https-only --timestamping \
  https://raw.githubusercontent.com/povilasv/advanced-raspberrypi-standalone-kubelet/master/pods/nodeexporter.yaml

Move the nodeexporter.yaml pod manifest to the Kubelet manifest directory:

sudo mv nodeexporter.yaml /etc/kubernetes/manifests/nodeexporter.yaml

Wait for image to download and list containers:

docker ps

You should see two additional containers running, which represent the Node Exporter pod.

Open up Prometheus targets page:

http://IP:9090/targets

You should see 2 jobs as UP, and 1 job as DOWN.

Grafana

Download Grafana manifest:

wget -q --show-progress --https-only --timestamping \
  https://raw.githubusercontent.com/povilasv/advanced-raspberrypi-standalone-kubelet/master/pods/grafana.yaml

Move the grafana.yaml pod manifest to the Kubelet manifest directory:

sudo mv grafana.yaml /etc/kubernetes/manifests/grafana.yaml

Wait for image to download and verify it is running:

docker ps

You should see two additional containers running, which represent the Grafana pod.

Open up Prometheus targets page:

http://IP:9090/targets

You should see 3 jobs as UP.

Go to Grafana page:

http://IP:3000/login

Enter `admin`/`admin`.

Add data source:

Click on Add data source and enter:

Name: prom
URL: http://127.0.0.1:9090
Access: proxy
Scrape interval: 30s

Adding dashboards:

Click on the Dashboards tab in the `add data source` page:

Import `Prometheus 2.0 Stats` and `Grafana metrics` dashboards.

Hover over the left `+` button and click `Import`.

Enter `5573`.

Set the `prometheus` data source to `prom`.

Take a look at those 3 dashboards in Grafana UI.


Conclusion

Here we are at the end of the journey.

We went through a lot together: modifying Arch Linux kernel parameters, checking cgroup stats via lxc-checkconfig, cross-compiling the Kubelet for ARM and actually running Prometheus and friends on a Raspberry Pi.

Hope you enjoyed the journey and see you next time!

Running Prometheus Node exporter on a router

So at home I have an Asus RT-N14U router, which has a 600 MHz MIPS CPU, 16 MB of flash storage and 64 MB of RAM. On the software side, I’m running Andy Padavan’s RT-N56U firmware (thank you, Andy!). This firmware gives you all the router features plus, I think, raw Linux, ssh and some basic tools.

Node exporter is written in Go, so why not just go build it? What I did is simply cross-compile it for the little-endian MIPS architecture:

git clone git@github.com:prometheus/node_exporter.git

cd node_exporter
GOARCH='mipsle' GOOS=linux go build

Success! We have a node_exporter executable. By the way, you can check out a specific version (e.g. git checkout release-0.16) if you like; I usually run master for the latest and greatest 😉
So now, let’s just copy the built executable to the router:

scp node_exporter router:~/
node_exporter                                               89%   13MB   2.0MB/s   00:00 ETA
scp: /home/admin//node_exporter: No space left on device

Turns out that I only have 1 MB of space in my home dir:

Filesystem                Size      Used Available Use% Mounted on
rootfs                    8.5M      8.5M         0 100% /
/dev/root                 8.5M      8.5M         0 100% /
tmpfs                     8.0K         0      8.0K   0% /dev
tmpfs                     2.0M    180.0K      1.8M   9% /etc
tmpfs                     1.0M      8.0K   1016.0K   1% /home
tmpfs                     8.0K         0      8.0K   0% /media
tmpfs                     8.0K         0      8.0K   0% /mnt
tmpfs                    24.0M     76.0K     23.9M   0% /tmp
tmpfs                     4.0M    192.0K      3.8M   5% /var

and Node exporter’s executable is around ~15 MB. So, I attached a USB stick, copied the file onto it with scp node_exporter router:/media/UBUNTU_17_0/ and tried to launch it:

time="2018-04-14T15:15:00Z" level=info msg="Starting node_exporter (version=, branch=, revision=)" source="node_exporter.go:82"
time="2018-04-14T15:15:00Z" level=info msg="Build context (go=go1.10.1, user=, date=)" source="node_exporter.go:83"
time="2018-04-14T15:15:00Z" level=info msg="Enabled collectors:" source="node_exporter.go:90"
time="2018-04-14T15:15:00Z" level=info msg=" - arp" source="node_exporter.go:97"
time="2018-04-14T15:15:00Z" level=info msg=" - bcache" source="node_exporter.go:97"
time="2018-04-14T15:15:00Z" level=info msg=" - bonding" source="node_exporter.go:97"
time="2018-04-14T15:15:00Z" level=info msg=" - conntrack" source="node_exporter.go:97"
time="2018-04-14T15:15:00Z" level=info msg=" - cpu" source="node_exporter.go:97"


time="2018-04-14T15:15:00Z" level=info msg=" - vmstat" source="node_exporter.go:97"
time="2018-04-14T15:15:00Z" level=info msg=" - wifi" source="node_exporter.go:97"
time="2018-04-14T15:15:00Z" level=info msg=" - xfs" source="node_exporter.go:97"
time="2018-04-14T15:15:00Z" level=info msg=" - zfs" source="node_exporter.go:97"
time="2018-04-14T15:15:00Z" level=info msg="Listening on :9100" source="node_exporter.go:111"

Woohoo it’s working!

Let’s add a Grafana dashboard to look at those exported metrics:

I’ve used the Host Stats – Prometheus Node Exporter Grafana dashboard; I just had to change it, as all of the labels have changed in a recent release of the exporter. You can find the fixed dashboard at https://grafana.com/dashboards/5573.

Getting back to Node exporter: now that we have it running, how do we start it on router boot?
This Linux is very limited and doesn’t support systemd. So, looking at what other options are available, I found that there is crontab (you need to enable it via the UI). So the first thing I tried was:

@reboot /media/UBUNTU_17_0/node_exporter >> /media/UBUNTU_17_0/node_exporter.log 2>&1

I restarted the router and... nope, @reboot doesn’t seem to work. No surprises here, as this router is very limited.

So this leads us to doing things “the old way”: running a cron script every minute or so, checking whether the process exists and starting it if it’s not there.

Here is the script:

#!/bin/bash

# Start node_exporter if it is not already running.
pidof node_exporter > /dev/null

if [[ $? -ne 0 ]] ; then
        /media/UBUNTU_17_0/node_exporter >> /media/UBUNTU_17_0/node_exporter.log 2>&1 &
fi

And the crontab:

* * * * * /media/UBUNTU_17_0/ne.sh         

That’s all, thanks for reading!

Exploring Prometheus Go client metrics

In this post I want to explore the Go metrics which are exported by client_golang via the promhttp.Handler() call.
Here is a sample program, registering the prom handler and listening on port 8080:

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}

When you hit your metrics endpoint, you will get something like:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.5101e-05
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 6
...
process_open_fds 12
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.1272192e+07
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 4.74484736e+08

On initialisation client_golang registers 2 Prometheus collectors:

  • Process Collector – which collects basic Linux process information like CPU, memory, file descriptor usage and start time.
  • Go Collector – which collects information about Go’s runtime like details about GC, number of goroutines and OS threads.

Process Collector

This collector reads the proc file system. The proc file system exposes internal kernel data structures, which are used to obtain information about the system.1

So the Prometheus client reads the /proc/PID/stat file, which looks like this:

1 (sh) S 0 1 1 34816 8 4194560 674 43 9 1 5 0 0 0 20 0 1 0 89724 1581056 209 18446744073709551615 94672542621696 94672543427732 140730737801568 0 0 0 0 2637828 65538 1 0 0 17 3 0 0 0 0 0 94672545527192 94672545542787 94672557428736 140730737807231 140730737807234 140730737807234 140730737807344 0

You can get a human-readable variant of this information using cat /proc/PID/status.

process_cpu_seconds_total – it uses utime – the number of ticks the process spent executing code in user mode, measured in jiffies, together with stime – the jiffies spent in system mode, executing code on behalf of the process (like doing system calls). A jiffy is the time between two ticks of the system timer interrupt. 2

process_cpu_seconds_total equals the sum of utime and stime, divided by USER_HZ. This makes sense, as dividing the number of scheduler ticks by Hz (ticks per second) produces the total time in seconds the operating system has been running the process. 3
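To make the arithmetic concrete, here is a minimal sketch of that computation (not the actual client_golang code), assuming the usual Linux USER_HZ value of 100:

package main

import "fmt"

// userHZ is the kernel clock tick rate; 100 is the usual value on Linux,
// but treat it as an assumption of this sketch.
const userHZ = 100.0

// cpuSeconds mirrors the idea behind process_cpu_seconds_total:
// (utime + stime) clock ticks divided by USER_HZ gives seconds.
func cpuSeconds(utime, stime uint64) float64 {
	return float64(utime+stime) / userHZ
}

func main() {
	// e.g. 5 user ticks + 0 system ticks, as in the /proc/PID/stat line above.
	fmt.Println(cpuSeconds(5, 0)) // 0.05
}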

process_virtual_memory_bytes – uses vsize – the virtual memory size, which is the amount of address space the process is managing. This includes all types of memory, both in RAM and swapped out.

process_resident_memory_bytes – multiplies rss – the resident set size, i.e. the number of memory pages the process has in real memory – by the page size 4. This results in the amount of memory that belongs specifically to that process, in bytes. This excludes swapped out memory pages.

process_start_time_seconds – uses start_time – the time the process started after system boot, expressed in jiffies – and btime from /proc/stat, which shows the time at which the system booted, in seconds since the Unix epoch. start_time is divided by USER_HZ in order to get the value in seconds.

process_open_fds – counts the number of files in /proc/PID/fd directory. This shows currently open regular files, sockets, pseudo terminals, etc.
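As a minimal sketch of the same idea (not the collector’s actual code), you can count the entries of /proc/self/fd yourself on Linux:

package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	// Each entry in /proc/self/fd is one open file descriptor of this process:
	// regular files, sockets, pipes, pseudo terminals, etc.
	fds, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("open fds:", len(fds))
}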

process_max_fds – reads /proc/PID/limits and uses the soft limit from the "Max open files" row. The interesting bit here is that /limits lists both soft and hard limits.
As it turns out, the soft limit is the value that the kernel enforces for the corresponding resource, and the hard limit acts as a ceiling for the soft limit.
An unprivileged process may only set its soft limit to a value up to the hard limit and (irreversibly) lower its hard limit. 5

In Go you can use err = syscall.Setrlimit(syscall.RLIMIT_NOFILE, &syscall.Rlimit{Cur: 9, Max: 10}) to set your limit.
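Here is a small runnable sketch around that call (Linux only); it reads the current limits first and then raises the soft limit up to the hard limit, which an unprivileged process is always allowed to do:

package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	var limit syscall.Rlimit

	// Read the current soft (Cur) and hard (Max) limits for open files.
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &limit); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("soft=%d hard=%d\n", limit.Cur, limit.Max)

	// Raise the soft limit up to the hard-limit ceiling.
	limit.Cur = limit.Max
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &limit); err != nil {
		log.Fatal(err)
	}
}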

Go Collector

Most of the Go Collector’s metrics are taken from the runtime and runtime/debug packages.

go_goroutines – calls out to runtime.NumGoroutine(), which computes the value based off the scheduler struct and the global allglen variable. As all the values in the sched struct can be changed concurrently, there is this funny check where if the computed value is less than 1 it becomes 1.

go_threads – calls out to runtime.ThreadCreateProfile(), which reads off the global allm variable. If you don’t know what M’s or G’s are, you can read my blog post about them.

go_gc_duration_seconds – calls out to debug.ReadGCStats() with the PauseQuantiles slice length set to 5, which returns the minimum, 25%, 50%, 75%, and maximum pause times. Then it manually creates a Summary type from the pause quantiles, the NumGC var and PauseTotal seconds. It’s cool how well the GCStats struct fits prom’s Summary type. 6

go_info – this provides us with the Go version. It’s pretty clever: it calls out to runtime.Version(), sets that as a version label and then always returns a value of 1 for this gauge metric.

Memory

Go Collector provides us with a lot of metrics about memory and GC.
All those metrics come from runtime.ReadMemStats(), which gives us the metrics from the MemStats struct.
One thing that worries me is that runtime.ReadMemStats() makes an explicit stop-the-world pause 7.
So I wonder how much actual cost this pause introduces, since during a stop-the-world pause all goroutines are paused so that the GC can run.
I’ll probably do a comparison of an app with and without instrumentation in a later post.

We have already seen that Linux provides us with rss/vsize metrics for memory stats, so naturally the question arises: which metrics should you use, the ones provided in MemStats or rss/vsize?

The good part about resident set size and virtual memory size is that they are based on Linux primitives and are programming-language agnostic.
So in theory you could instrument any program and you would know how much memory it consumes (as long as you name your metrics consistently, i.e. process_virtual_memory_bytes and process_resident_memory_bytes).
In practice, however, when a Go process starts up it takes a lot of virtual memory beforehand; a simple program like the one above takes up to 544 MiB of vsize on my machine (x86_64 Ubuntu), which is a bit confusing. RSS shows around 7 MiB, which is closer to the actual usage.

On the other hand, using Go runtime based metrics gives more fine-grained information on what is happening in your running application.
You should be able to find out more easily whether your program has a memory leak, how long GC took and how much it reclaimed.
Also, it should point you in the right direction when you are optimizing your program’s memory allocations.

I haven’t looked in detail at how the Go GC and memory model work, apart from its concurrency model 8, so this bit is still new to me.

So let’s take a look at those metrics:

go_memstats_alloc_bytes – a metric which shows how many bytes of memory are allocated on the heap for objects. The value is the same as go_memstats_heap_alloc_bytes. This metric counts all reachable heap objects plus the unreachable objects the GC has not yet freed.

go_memstats_alloc_bytes_total – this metric increases as objects are allocated on the heap, but doesn’t decrease when they are freed. I think it is immensely useful, as it is an ever-increasing number and has the same nice properties a Prometheus Counter has. Doing rate() on it should show us how many bytes/s of memory the app consumes, and it is “durable” across restarts and scrape misses.

go_memstats_sys_bytes – a metric which measures how many bytes of memory in total Go has taken from the system. It sums all the sys metrics described below.

go_memstats_lookups_total – counts how many pointer dereferences happened. This is a counter value, so you can use rate() to get lookups/s.

go_memstats_mallocs_total – shows how many heap objects have been allocated. This is a counter value, so you can use rate() to get objects allocated/s.

go_memstats_frees_total – shows how many heap objects have been freed. This is a counter value, so you can use rate() to get objects freed/s. Note you can get the number of live objects with go_memstats_mallocs_total minus go_memstats_frees_total.

Turns out that Go organizes memory in spans, which are contiguous regions of memory of 8K or larger. There are 3 types of spans:
1) idle – a span that has no objects and can be released back to the OS, or reused for heap allocation, or reused for stack memory.
2) in use – a span that has at least one heap object and may have space for more.
3) stack – a span which is used for a goroutine stack. A span can live either in the stack or in the heap, but not in both.

Heap memory metrics

go_memstats_heap_alloc_bytes – same as go_memstats_alloc_bytes.

go_memstats_heap_sys_bytes – bytes of memory obtained for the heap from the OS. This includes virtual address space that has been reserved but not yet used, and virtual address space which was returned to the OS after it became unused. This metric estimates the largest size the heap has had.

go_memstats_heap_idle_bytes – shows how many bytes are in idle spans.

go_memstats_heap_idle_bytes minus go_memstats_heap_released_bytes estimates how many bytes of memory could be released, but are kept by the runtime so that it can allocate objects on the heap without asking the OS for more memory.

go_memstats_heap_inuse_bytes – shows how many bytes are in in-use spans.

go_memstats_heap_inuse_bytes minus go_memstats_heap_alloc_bytes shows how many bytes of memory have been allocated for the heap, but are not currently used.

go_memstats_heap_released_bytes – shows how many bytes of idle spans were returned to the OS.

go_memstats_heap_objects – shows how many objects are allocated on the heap. This changes as GC is performed and new objects are allocated.

Stack memory metrics

go_memstats_stack_inuse_bytes – shows how many bytes of memory are used by stack memory spans which have at least one object in them. The Go doc says that stack memory spans can only be used for other stack spans, i.e. there is no mixing of heap objects and stack objects in one memory span.

go_memstats_stack_sys_bytes – shows how many bytes of stack memory are obtained from the OS. It’s go_memstats_stack_inuse_bytes plus any memory obtained for OS thread stacks.

There is no go_memstats_stack_idle_bytes, as unused stack spans are counted towards go_memstats_heap_idle_bytes.

Off-heap memory metrics

These metrics count bytes allocated for runtime-internal structures, which are not allocated on the heap because they implement the heap.

go_memstats_mspan_inuse_bytes – shows how many bytes are in use by mspan structures.

go_memstats_mspan_sys_bytes – shows how many bytes are obtained from OS for mspan structures.

go_memstats_mcache_inuse_bytes – shows how many bytes are in use by mcache structures.

go_memstats_mcache_sys_bytes – shows how many bytes are obtained from OS for mcache structures.

go_memstats_buck_hash_sys_bytes – shows how many bytes of memory are in bucket hash tables, which are used for profiling.

go_memstats_gc_sys_bytes – shows how many bytes are used for garbage collection metadata.

go_memstats_other_sys_bytes – shows how many bytes of memory are used for other runtime allocations.

go_memstats_next_gc_bytes – shows the target heap size of the next GC cycle. GC’s goal is to keep go_memstats_heap_alloc_bytes less than this value.

go_memstats_last_gc_time_seconds – contains the Unix timestamp of when the last GC finished.

go_memstats_gc_cpu_fraction – shows the fraction of this program’s available CPU time used by GC since the program started.
This metric is also provided in GODEBUG=gctrace=1 output.

Playing around with numbers

So that’s a lot of metrics and a lot of information.
I think the best way to learn is to just play around with them, so in this part I’ll do just that.
I’ll be using the same program as above.
Here is the dump from /metrics (edited for space), which I’m going to use:

process_resident_memory_bytes 1.09568e+07

process_virtual_memory_bytes 6.46668288e+08

go_memstats_heap_alloc_bytes 2.24344e+06

go_memstats_heap_idle_bytes 6.3643648e+07

go_memstats_heap_inuse_bytes 3.039232e+06

go_memstats_heap_objects 6498

go_memstats_heap_released_bytes 0

go_memstats_heap_sys_bytes 6.668288e+07

go_memstats_lookups_total 0

go_memstats_frees_total 12209

go_memstats_mallocs_total 18707

go_memstats_buck_hash_sys_bytes 1.443899e+06

go_memstats_mcache_inuse_bytes 6912

go_memstats_mcache_sys_bytes 16384

go_memstats_mspan_inuse_bytes 25840

go_memstats_mspan_sys_bytes 32768

go_memstats_other_sys_bytes 1.310909e+06

go_memstats_stack_inuse_bytes 425984

go_memstats_stack_sys_bytes 425984

go_memstats_sys_bytes 7.2284408e+07

go_memstats_next_gc_bytes 4.194304e+06

go_memstats_gc_cpu_fraction 1.421928536233557e-06

go_memstats_gc_sys_bytes 2.371584e+06

go_memstats_last_gc_time_seconds 1.5235057190167596e+09

rss = 1.09568e+07 = 10956800 bytes = 10700 KiB = 10.4 MiB

vsize = 6.46668288e+08 = 646668288 bytes = 631512 KiB = 616.7 MiB

heap_alloc_bytes = 2.24344e+06 = 2243440 = 2190 KiB = 2.1 MiB

heap_inuse_bytes = 3.039232e+06 = 3039232 = 2968 KiB = 2.9 MiB

heap_idle_bytes = 6.3643648e+07 = 63643648 = 62152 KiB = 60.6 MiB

heap_released_bytes = 0

heap_sys_bytes = 6.668288e+07 = 66682880 = 65120 KiB = 63.6 MiB

frees_total = 12209

mallocs_total = 18707

mspan_inuse_bytes = 25840 = 25.2 KiB

mspan_sys_bytes = 32768 = 32 KiB

mcache_inuse_bytes = 6912 = 6.8 KiB

mcache_sys_bytes = 16384 = 16 KiB

buck_hash_sys_bytes = 1.443899e+06 = 1443899 = 1410 KiB = 1.4 MiB

gc_sys_bytes = 2.371584e+06 = 2371584 = 2316 KiB = 2.3 MiB

other_sys_bytes = 1.310909e+06 = 1310909 = 1280.2 KiB = 1.3 MiB

stack_inuse_bytes = 425984 = 416 KiB

stack_sys_bytes = 425984 = 416 KiB

sys_bytes = 7.2284408e+07 = 72284408 = 70590.2 KiB = 68.9 MiB

next_gc_bytes = 4.194304e+06 = 4194304 = 4096 KiB = 4 MiB

gc_cpu_fraction = 1.421928536233557e-06 = 0.000001

last_gc_time_seconds = 1.5235057190167596e+09 = Thu, 12 Apr 2018 04:01:59 GMT

An interesting bit is that heap_inuse_bytes is more than heap_alloc_bytes.
I think heap_alloc_bytes shows how many bytes are used in terms of objects, while heap_inuse_bytes shows bytes of memory in terms of spans.
Dividing heap_inuse_bytes by the size of a span gives: 3039232 / 8192 = 371 spans.

heap_inuse_bytes minus heap_alloc_bytes should show the amount of free space that we have in in-use spans, which is 2.9 MiB – 2.1 MiB = 0.8 MiB.
This roughly means that we can allocate 0.8 MiB of objects on the heap without using new memory spans.
But we should keep memory fragmentation in mind.
Imagine you allocate a new byte slice of 10K bytes: the memory could be in a position where there is no contiguous block of 10K bytes + slice header, so a new span would be needed instead of reusing an existing one.

heap_idle_bytes minus heap_released_bytes shows that we have around 60.6 MiB of unused spans, which are reserved from the OS and could be returned to it. That’s 63643648/8192 = 7769 spans.

heap_sys_bytes, which is 63.6 MiB, estimates the largest size the heap has had. That’s 66682880/8192 = 8140 spans.

mallocs_total shows that we allocated 18707 objects and frees_total shows that we freed 12209. So currently we have 18707-12209 = 6498 live objects. We can find the average size of an object by dividing heap_alloc_bytes by the number of live objects, 6498. The result is 2243440 / 6498 = 345.3 bytes.
(This is probably a silly metric, as objects vary a lot in size and we should use histograms instead.)

So sys_bytes should be a sum of all the *sys metrics. Let’s check that:
sys_bytes == mspan_sys_bytes + mcache_sys_bytes + buck_hash_sys_bytes + gc_sys_bytes + other_sys_bytes + stack_sys_bytes + heap_sys_bytes.
So, we have 72284408 == 32768 + 16384 + 1443899 + 2371584 + 1310909 + 425984 + 66682880, which is 72284408 == 72284408, so the sum checks out.

The interesting detail about sys_bytes is that it’s 68.9 MiB: that is how many bytes of memory in total Go has taken from the OS. Meanwhile, the OS’s vsize gives you 616.7 MiB and rss 10.4 MiB. So all these numbers don’t really match up.

As I understand it, part of our memory could be in OS memory pages which are in swap or in the filesystem (not in RAM), so this would explain why rss is smaller than sys_bytes.

And vsize contains a lot of things, like mapped libc, pthread libs, etc. You can explore the /proc/PID/maps and /proc/PID/smaps files to see what is currently being mapped.

gc_cpu_fraction is crazy low: 0.000001 of CPU time is used for GC. That’s really, really cool. (Although this program doesn’t produce much garbage.)

next_gc_bytes shows that the target for GC is to keep heap_alloc_bytes under 4 MiB; as heap_alloc_bytes is currently at 2.1 MiB, the target is achieved.

Conclusion

I love Go and the fact that it exposes so much useful information in its packages, so that simple users like you and me can just call a function and get it.

It was really cool playing around and reading about Linux & Go, so I’m thinking of doing a part 2 of this post. Maybe I’ll look into the metrics provided by cAdvisor, or show how to use some of the metrics described here in dashboards/alerts with Prometheus.

Also, once vgo gets integrated (and I really, really hope it does, because it’s the best package manager I have ever used), we should be able to inspect dependencies from some Go runtime package, which would be really cool! Imagine writing a custom prom collector which would go through all your dependencies, check for new versions and, if any are found, give you back the number of outdated packages, something like a go_num_outdated_pkgs metric.
This way you could write an alert if your service gets terribly outdated. Or check that your live dependency hashes don’t match the current hashes?

If you like the post, hit the up arrow button on Reddit, and see you soon.

  1. You can read more about what data kernel exposes http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html or more technical version here: https://www.kernel.org/doc/Documentation/filesystems/proc.txt
  2. Take a look at http://man7.org/linux/man-pages/man7/time.7.html, which has a great description for jiffies, user time and system time.
  3. The formula is in https://github.com/prometheus/procfs/blob/master/proc_stat.go#L187 . Also, there is an interesting work around in prometheus client definition of USER_HZ in https://github.com/prometheus/procfs/blob/master/proc_stat.go#L10-L25 .
  4. http://man7.org/linux/man-pages/man2/getpagesize.2.html
  5. More docs in https://linux.die.net/man/2/setrlimit
  6. https://golang.org/pkg/runtime/debug/#ReadGCStats
  7. https://github.com/golang/go/blob/master/src/runtime/mstats.go#L458
  8. Weirdly article “The Go Memory Model” talks about accessing memory from different goroutines https://golang.org/ref/mem rather than how memory is allocated etc

Tracking request duration with Prometheus

I recently started using Prometheus for instrumenting and I really like it! It has a cool concept of labels,  a functional query language & a bunch of very useful functions like rate(), increase() & histogram_quantile().

Adding metrics to your app is easy,  just import prometheus client and register metrics HTTP handler http.Handle("/metrics", promhttp.Handler())

This one-liner adds HTTP /metrics endpoint to HTTP router. By default client exports memory usage, number of goroutines, gc information and other runtime information. Exporting metrics as HTTP endpoint makes the whole dev/test lifecycle easy, as it is really trivial to check whether your newly added metric is now exposed.

One thing I struggled on is how to track request duration.

Prometheus doesn’t have a built in Timer metric type, which is often available in other monitoring systems. It has only 4 metric types: Counter, Gauge, Histogram and Summary.

It turns out that the client library allows you to create a timer using prometheus.NewTimer(o Observer) and record the duration using its ObserveDuration() method. The provided Observer can be a Summary, a Histogram or a Gauge. So, which one to use?
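As a minimal sketch of that pattern (the metric name and handler below are made up for illustration, and buckets are left at the library defaults), a timer wrapped around an HTTP handler could look like this:

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var requestDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "http_request_duration_seconds",
	Help: "request duration histogram",
})

func handler(w http.ResponseWriter, r *http.Request) {
	// Start the timer and observe the elapsed time when the handler returns.
	timer := prometheus.NewTimer(requestDuration)
	defer timer.ObserveDuration()

	w.Write([]byte("hello"))
}

func main() {
	prometheus.MustRegister(requestDuration)
	http.HandleFunc("/", handler)
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}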

Histograms

In Prometheus a Histogram is really a cumulative histogram (cumulative frequency). A Histogram is made of a counter which counts the number of events that happened, a counter for the sum of event values, and another counter for each bucket. Buckets count how many times the event value was less than or equal to the bucket’s value.

Let’s take a look at the example:

Imagine that you create a histogram with 5 buckets with values: 0.5, 1, 2, 3, 5. Let’s call this histogram http_request_duration_seconds, and 3 requests come in with durations 1s, 2s, 3s. Then you would see that the /metrics endpoint contains:

# HELP http_request_duration_seconds request duration histogram
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.5"} 0
http_request_duration_seconds_bucket{le="1"} 1
http_request_duration_seconds_bucket{le="2"} 2
http_request_duration_seconds_bucket{le="3"} 3
http_request_duration_seconds_bucket{le="5"} 3
http_request_duration_seconds_bucket{le="+Inf"} 3
http_request_duration_seconds_sum 6
http_request_duration_seconds_count 3

Here we can see that:

sum is 1s + 2s + 3s = 6,

count is 3, because of 3 requests

bucket {le="0.5"} is 0, because none of the requests were <= 0.5 seconds

bucket {le="1"} is 1, because one of the requests was <= 1 second

bucket {le="2"} is 2, because two of the requests were <= 2 seconds

bucket {le="3"} is 3, because all of the requests were <= 3 seconds

So what can we do with it?

The first thing to note is that when using a Histogram we don’t need a separate counter to count total HTTP requests, as the Histogram creates one for us.

We could calculate the average request time by dividing the sum by the count. In PromQL it would be:

http_request_duration_seconds_sum / http_request_duration_seconds_count

Also, we could calculate percentiles from it. Prometheus comes with a handy histogram_quantile function for that. For example, calculating the 50th percentile (second quartile) for the last 10 minutes in PromQL would be:

histogram_quantile(0.5, rate(http_request_duration_seconds_bucket[10m]))

Which results in 1.5.

Wait, 1.5? Shouldn’t it be 2? (The 50th percentile is supposed to be the median, the number in the middle.)

I even computed the 50th percentile using a cumulative frequency table (which is what I thought Prometheus was doing) and still ended up with 2.

As it turns out, this value is only an approximation of the computed quantile. You can find more information on what kind of approximations Prometheus does in the histogram_quantile docs.

By the way, be warned that percentiles can be easily misinterpreted.

It’s important to understand that creating a new histogram requires you to specify bucket boundaries up front. The default values, which are 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, are tailored to broadly measure response times in seconds and probably won’t fit your app’s behavior.

This creates a bit of a chicken-and-egg problem, because you cannot know the bucket boundaries until you have launched the app and collected latency data, and you cannot make a new Histogram without specifying (implicitly or explicitly) the bucket values. So I guess the best way forward is to launch your app with the default bucket boundaries, let it spin for a while and later tune those values based on what you see.
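For illustration, here is a sketch of creating a histogram with hand-picked bucket boundaries instead of the defaults (the boundary values below are made up); the prometheus.LinearBuckets and prometheus.ExponentialBuckets helpers can also generate them for you:

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Explicit, hand-tuned bucket boundaries in seconds.
	custom := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "request duration histogram",
		Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5, 5},
	})
	prometheus.MustRegister(custom)

	// Or generate boundaries: 8 buckets, starting at 10ms, doubling each time.
	fmt.Println(prometheus.ExponentialBuckets(0.01, 2, 8))
}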

Summary

A Summary is like the histogram_quantile() function, but the percentiles are computed in the client. A Summary is made of count and sum counters (like in the Histogram type) and the resulting quantile values.

For example, we want to find 0.5, 0.9, 0.99 quantiles and the same 3 requests with 1s, 2s, 3s durations come in.

Then /metrics endpoint should contain:

http_request_duration_seconds{quantile="0.5"} 2
http_request_duration_seconds{quantile="0.9"} 3
http_request_duration_seconds{quantile="0.99"} 3
http_request_duration_seconds_sum 6
http_request_duration_seconds_count 3

So we can see that:

sum is 1s + 2s + 3s = 6,

count is 3, because of 3 requests.

{quantile="0.5"} is 2, meaning 50th percentile is 2.

{quantile="0.9"} is 3, meaning 90th percentile is 3.

{quantile="0.99"} is 3, meaning 99th percentile is 3.

Summaries are great if you already know what quantiles you want. You just specify them in the SummaryOpts objectives map along with their error windows. For example:

map[float64]float64{0.5: 0.05}, which will compute 50th percentile with error window of 0.05.

Of course, there are a couple of other parameters you could tune (like MaxAge, AgeBuckets or BufCap), but the defaults should be good enough.
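A minimal sketch of a Summary with explicit objectives (the metric name is made up for illustration); each objective maps a quantile to its allowed error window:

package main

import "github.com/prometheus/client_golang/prometheus"

var requestDuration = prometheus.NewSummary(prometheus.SummaryOpts{
	Name: "http_request_duration_seconds",
	Help: "request duration summary",
	Objectives: map[float64]float64{
		0.5:  0.05,  // 50th percentile, 5% error window
		0.9:  0.01,  // 90th percentile, 1% error window
		0.99: 0.001, // 99th percentile, 0.1% error window
	},
})

func main() {
	prometheus.MustRegister(requestDuration)
	requestDuration.Observe(1.5) // record a 1.5s request
}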

Although, there are a couple of problems with this approach.

First, you really need to know what percentiles you want. Because if you want to compute a different percentile, you will have to make changes in your code.

The other problem is that you cannot aggregate Summary types, i.e. if you have more than one replica of your app running you won’t be able to compute quantiles across all of the instances.

I usually don’t really know what I want, so I prefer to use Histograms.

By the way, the default go_gc_duration_seconds, which measures how long garbage collection took is implemented using Summary type.

Gauge

Although Gauge doesn’t really implement the Observer interface, you can make it work using prometheus.ObserverFunc(gauge.Set).
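Here is a rough sketch of that (the handler is made up; the metric matches the example output below). Each observation simply overwrites the gauge with the latest duration:

package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var lastDuration = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "http_request_duration_seconds",
	Help: "Duration of the last http request",
})

func handler(w http.ResponseWriter, r *http.Request) {
	// ObserverFunc adapts gauge.Set to the Observer interface NewTimer expects.
	timer := prometheus.NewTimer(prometheus.ObserverFunc(lastDuration.Set))
	defer timer.ObserveDuration()

	time.Sleep(50 * time.Millisecond) // simulate work
}

func main() {
	prometheus.MustRegister(lastDuration)
	http.HandleFunc("/", handler)
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}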

If we had the same 3 requests with 1s, 2s, 3s durations, the /metrics endpoint would contain:

# HELP http_request_duration_seconds Duration of the last http request
# TYPE http_request_duration_seconds gauge
http_request_duration_seconds 3

So we can see that:

http_request_duration_seconds is 3, meaning that last observed duration was 3.

At first I thought: this is great, I’ll just record all my request durations this way and aggregate/average them out later.

Well, no.

The mistake here is that Prometheus scrapes /metrics data only once in a while (by default every 1 min), which is configured by the scrape_interval for your target.

So if you don’t have a lot of requests, you could try to configure the scrape_interval to align with your requests, and then you would see how long each request took. But I don’t think it’s a good idea; in this case I would rather push the Gauge metrics to Prometheus.

I think this could be useful for “job” type problems. For example, you could push how long a backup or a data aggregation job took.

Conclusion

Thanks for reading. Hopefully by now you and I know a bit more about Histograms, Summaries and tracking request duration. My plan for now is to track latency using Histograms, play around with histogram_quantile  and make some beautiful dashboards.

Oh, and I forgot to mention: if you are instrumenting an HTTP server or client, the prometheus library has some helpers for that in the promhttp package.

Go scheduler: Ms, Ps & Gs

I decided to learn a bit more about Go internals, and since it has been a long time since somebody wrote about the Go scheduler, I thought it would make an interesting post. So, let’s get to it!

Basics

The Go runtime manages scheduling, garbage collection, and the runtime environment for goroutines. Here, I will focus only on the scheduler.

The runtime scheduler runs goroutines by mapping them onto operating system threads. Goroutines are a lightweight version of threads, with a very low start-up cost. Each goroutine is described by a struct called G, which contains the fields necessary to keep track of its stack and current status. So, G = goroutine.

The runtime keeps track of each G and maps them onto Logical Processors, named P. A P can be seen as an abstract resource or a context, which needs to be acquired so that an OS thread (called M, or Machine) can execute a G.

You can control how many logical processors your runtime has by calling runtime.GOMAXPROCS(numLogicalProcessors). If you are planning to tweak this parameter (you probably shouldn’t), set it once and forget it, because changing it requires a “stop the world” GC pause.
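For example, a tiny program that just inspects the setting (calling runtime.GOMAXPROCS with 0 or a negative number only queries the current value without changing it):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Passing 0 reports the current setting without modifying it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("NumCPU:", runtime.NumCPU())
}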

Essentially, the operating system runs threads, which run your code. Go’s trick is that the compiler inserts calls into the Go runtime in various places (e.g. sending a value through a channel, or making a call into the runtime package), so that Go can notify the scheduler and take action.

This is how a Go program, the runtime & the operating system fit together:

Note: Idea for figure below taken from Analysis of the Go runtime scheduler.

The dance between Ms, Ps & Gs

The interaction between Ms, Ps & Gs is a bit complicated. Take a look at this amazing workflow graph, which is from  go runtime scheduler slides by Gao Chao.

This is what happens when you create a new goroutine:

Here we can see that there are two types of queues for G: a global queue in the schedt struct (which is rarely used), and each P maintains a queue of runnable G’s.

In order to execute a goroutine, M needs to be holding a context P. The machine then just pops G’s off its P’s queue and executes the code.

When you schedule a new goroutine (do a go func() call) it is placed into the P’s queue. There is an interesting work-stealing scheduling algorithm: when M finishes executing some G and its queue is empty, it randomly chooses another P and tries to steal half of its runnable G’s!

Interesting things happen when your goroutine makes a blocking syscall. The blocking syscall will be intercepted and, if there are G’s to run, the runtime will detach the thread from the P and create a new OS thread (if an idle thread doesn’t exist) to service that processor.

When the system call resumes, the goroutine is placed back into a local run queue, and the thread will park itself (meaning the thread won’t be running) and insert itself into the list of idle threads.

If a goroutine makes a network call, the runtime will do a similar thing. The call will be intercepted, but because Go has an integrated network poller, which has its own thread, the goroutine will be assigned to it.

Essentially, the Go runtime will run a different goroutine if the current goroutine is blocked on:

  • blocking syscall (for example opening a file),
  • network input,
  • channel operations,
  • primitives in the sync package.

Scheduler Tracing

Go allows you to trace the runtime scheduler. This is done via the GODEBUG environment variable:

$ GODEBUG=scheddetail=1,schedtrace=1000 ./program

Here is an example of output it gives:

SCHED 0ms: gomaxprocs=8 idleprocs=7 threads=2 spinningthreads=0 idlethreads=0 runqueue=0 gcwaiting=0 nmidlelocked=0 stopwait=0 sysmonwait=0
  P0: status=1 schedtick=0 syscalltick=0 m=0 runqsize=0 gfreecnt=0
  P1: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
  P2: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
  P3: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
  P4: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
  P5: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
  P6: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
  P7: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
  M1: p=-1 curg=-1 mallocing=0 throwing=0 preemptoff= locks=1 dying=0 helpgc=0 spinning=false blocked=false lockedg=-1
  M0: p=0 curg=1 mallocing=0 throwing=0 preemptoff= locks=1 dying=0 helpgc=0 spinning=false blocked=false lockedg=1
  G1: status=8() m=0 lockedm=0

Note that it uses the same concepts of G, M & P and their state, such as P’s queue size. Usually, you don’t need that much detail, so you can just use:

$ GODEBUG=schedtrace=1000 ./program

There is this great article by William Kennedy, which explains how to interpret this and the detailed version of the trace.

Also, there is an advanced tool called go tool trace, which has a UI and allows you to explore what your program and the runtime are doing. You can read how to use it in the Pusher article.


yaml url.URL parsing

I was using gopkg.in/yaml.v2 to unmarshal a .yaml file into a struct and couldn’t find a solution for unmarshaling a URL string into url.URL.

Trying to just use url.URL  fails with:

Error initializing configuration: yaml: 
unmarshal errors:\n  line 54:
cannot unmarshal !!str `http://...` into url.URL

This is because url.URL doesn’t implement the yaml.v2 Unmarshaler interface.

So here is my solution, feel free to copy paste it!

// YAMLURL wraps *url.URL so it can be unmarshaled from a YAML string.
type YAMLURL struct {
	*url.URL
}

// UnmarshalYAML implements the yaml.v2 Unmarshaler interface:
// it reads the YAML value as a string and parses it into the embedded url.URL.
func (j *YAMLURL) UnmarshalYAML(unmarshal func(interface{}) error) error {
	var s string
	err := unmarshal(&s)
	if err != nil {
		return err
	}
	url, err := url.Parse(s)
	j.URL = url
	return err
}

And here is how to use it:

type Config struct {
    URL YAMLURL `yaml:"url"`
}
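And a small usage sketch, assuming the YAMLURL and Config types above live in the same package (the config snippet is made up):

package main

import (
	"fmt"
	"log"

	"gopkg.in/yaml.v2"
)

func main() {
	var cfg Config
	// A hypothetical config snippet, just to show the round trip.
	data := []byte("url: http://localhost:9090\n")
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		log.Fatal(err)
	}
	// The embedded *url.URL is populated by UnmarshalYAML.
	fmt.Println(cfg.URL.Scheme, cfg.URL.Host) // http localhost:9090
}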

Thanks for reading!

Think about your dependencies

A little copying is better than a little dependency, says the Go proverb.

Go has a huge standard library, which has support for many, many things. I think many developers don’t utilize it fully. I mean, I have seen many new gophers try to use packages for everything. For example, do you really need a dependency on github.com/stretchr/testify? If you are just using assert, maybe you can just copy the assert function from benbjohnson/testing? Ben Johnson has published these convenience functions (MIT license):

func assert(tb testing.TB, condition bool, msg string) 
func ok(tb testing.TB, err error) 
func equals(tb testing.TB, exp, act interface{}) 
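For example, here is a simplified equals helper in that spirit (a sketch, not Ben Johnson’s exact code) and a test using it:

package mypkg

import (
	"reflect"
	"testing"
)

// equals fails the test if exp and act are not deeply equal.
func equals(tb testing.TB, exp, act interface{}) {
	tb.Helper()
	if !reflect.DeepEqual(exp, act) {
		tb.Fatalf("exp: %#v\n\ngot: %#v", exp, act)
	}
}

func TestSum(t *testing.T) {
	equals(t, 4, 2+2)
}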

If you are using the TestSuite stuff, maybe you can just avoid it entirely? The same goes for all the cool HTTP muxes. They are cool, but if you don’t have a lot of endpoints, maybe you can just avoid that dependency and use the standard library’s http package.

But why am I against dependencies? Well, for new developers, learning a new language is already a lot; if you complicate this by adding custom packages and frameworks, it will be harder. And it will be harder for you too: if you come back to a project after a break, you will notice that you have to relearn the dependencies you have used.

With fewer dependencies you will spend less time managing them. Once, I spent 3 hours fixing a dependency problem with Glide. The problem was that one of my dependencies (let’s call it A) depended on an older version of a package C, and another dependency (let’s call it B) depended on a newer version of package C. The biggest problem was that package C hadn’t used semantic versioning, so I had to try random library C commits and try to get this thing working… After doing that for an hour, I took a break and realized that I could easily rewrite the code and remove the dependency on B.

But aren’t packages from the standard library dependencies too? Well yes, but by utilizing the standard library you are learning the language. You will start to remember public functions, structs and interfaces after a while. Also, Go has a version 1 compatibility guarantee, which means that the standard library’s API won’t break on you for Go 1.* versions.

Of course, don’t go and reimplement the Go MySQL driver or anything crazy like that. Just think before adding another dependency!

That’s all from me. Do you enjoy these articles? Follow me on twitter @PofkeVe and join my mailing list 🙂

Go schema migration tools

In this post I did a comparison of following tools:

Summary

TL;DR If you’re looking for a schema migration tool, you can use:

mattes/migrate, SQL defined schema migrations, with a well defined and documented API, large database support and a useful CLI tool. This tool is actively maintained, has a lot of stars and an A+ from goreport.

rubenv/sql-migrate, go struct based or SQL defined schema migrations, with a config file, migration history, prod-dev-test environments. The only drawback is that it got B from goreport.

markbates/pop, use this if you are looking for an ORM like library. It has awesome model generation capabilities and a custom DSL language for writing schema changes.

go-gormigrate/gormigrate, use this if you are using GORM, this helper adds proper schema versioning and rollback capabilities.

Why these tools? Read below:

Comparing by Stars

mattes/migrate 961
rubenv/sql-migrate 605
markbates/pop 605
liamstask/goose – bitbucket project; stars not available
DavidHuie/gomigrate 110
pressly/goose 80
BurntSushi/migration 56
tanel/dbmigrate 38
GuiaBolso/darwin 29
go-gormigrate/gormigrate 22
pravasan/pravasan 14

Note: I put goose into 4th place because it has a lot of watchers.

I use GitHub stars as a metric to see how widely a project is used, because it is almost impossible to see how many users actually use it. I bolded out the tools that win in this category.

Comparing by last activity

markbates/pop Feb 14, 2017
mattes/migrate Feb 10, 2017
GuiaBolso/darwin Feb 10, 2017
rubenv/sql-migrate Feb 7, 2017
go-gormigrate/gormigrate Feb 4, 2017
pressly/goose Dec 9, 2016
DavidHuie/gomigrate Aug 9, 2016
tanel/dbmigrate Feb 23, 2016
pravasan/pravasan Mar 20, 2015
liamstask/goose Jan 16, 2015
BurntSushi/migration Jan 25, 2014

Note: I scraped this info on Wednesday, February 15, 2017, 3:00 pm EET.

More often than not you want to use projects that are maintained, so last activity can be seen as a measure of a project’s maintainability. It is important because if there is a bug, you want to be able to submit a PR or create an issue, which hopefully gets resolved. The bolded tools win in this category.

Comparing by goreportcard

markbates/pop A+
GuiaBolso/darwin A+
go-gormigrate/gormigrate A+
mattes/migrate A
liamstask/goose A
pressly/goose A
DavidHuie/gomigrate A
tanel/dbmigrate A
rubenv/sql-migrate B
BurntSushi/migration B
pravasan/pravasan D

Note: I scraped this info on Wednesday, February 15, 2017, 3:00 pm EET.

Goreportcard allows you to check the code quality of any open source project written in Go and gives it an overall score (A+, A, B, ...). I bolded out the tools that win in this category.

Comparing by usability

In this comparison I will try out some of the tools and give my opinion.

I will be playing around with a table MyGuests, which looks like this:

CREATE TABLE MyGuests (
   id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
   firstname VARCHAR(30) NOT NULL,
   lastname VARCHAR(30) NOT NULL,
   email VARCHAR(50),
   reg_date TIMESTAMP
)

mattes/migrate

This is a simple tool which does migrations based on files. It comes with a Go library and a CLI tool, which helps you create SQL migration files and manages the schema version. Let’s take a look at example usage of the CLI tool below:

$ migrate -url mysql://root@tcp(127.0.0.1:3306)/mattes -path ./migrations \
  create initial_users_table

creates 2 files and a table called schema_migrations:

Version 1487240220 migration files created in ./migrations:
1487240220_initial_users_table.up.sql
1487240220_initial_users_table.down.sql

I added the CREATE TABLE MyGuests … statement to the file called *up.sql and the DROP TABLE MyGuests.. statement to *down.sql.

Running migration using CLI is simple:

$ migrate -url mysql://root@tcp(127.0.0.1:3306)/mattes \
  -path ./migrations up

This creates a table and sets a row in schema_migrations table:

mysql> select * from schema_migrations;
+------------+
| version |
+------------+
| 1487240220 |
+------------+
1 row in set (0.00 sec)

Here is an example of running a “down” migration, which drops the table and removes a row from schema_migrations:

$ migrate -url mysql://root@tcp(127.0.0.1:3306)/mattes \
  -path ./migrations down

The CLI tool also allows going to a specific schema version, rolling back the previous n migrations, etc.

The provided Go library is also pretty simple: it allows you to run migrations from your code and provides you with synchronous and asynchronous implementations. You will probably only be using the UpSync function from your code. Take a look at the example below:

package main

import (
   "fmt"
   _ "github.com/mattes/migrate/driver/mysql"
   "github.com/mattes/migrate/migrate"
)

func main() {
   fmt.Println("Hello")
   allErrors, ok := migrate.UpSync("mysql://root@tcp(127.0.0.1:3306)/mattes", "./migrations")
   if !ok {
      fmt.Println(allErrors)
   }
}

I like this library for its simplicity. It supports PostgreSQL, Cassandra, SQLite, MySQL, Neo4j, Ql, MongoDB, CrateDb. But it has a caveat: MySQL support is only experimental.

liamstask/goose

Playing around with this library was a bit painful for me. For about 20 minutes I couldn’t figure out what was wrong with my connection info. I was continuously getting Invalid DBConf errors, with no explanation:

2017/02/16 13:14:54 Invalid DBConf: 
   {mysql  root@tcp(127.0.0.1:3306)/goose  }

It appears to me now that I had left a space after specifying the database type!

So in goose you have to create a dir called db and add a file called dbconf.yaml, which contains the connection information. This is how my file looked:

development: 
  driver: mysql
  open: root@tcp(127.0.0.1:3306)/goose

In this config you are also allowed to choose your SQL dialect and import a different db driver.

Creating a migration with goose is easy:

goose create initial_users_table

which creates a file called 20170216132820_initial_users_table.go containing 2 Go functions:

func Up_20170216132820(txn *sql.Tx) {

}

func Down_20170216132820(txn *sql.Tx) {

}

Here is how I filled these functions:

// Up is executed when this migration is applied
func Up_20170216132820(txn *sql.Tx) {
   res, err := txn.Exec(`CREATE TABLE MyGuests (
      id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      firstname VARCHAR(30) NOT NULL,
      lastname VARCHAR(30) NOT NULL,
      email VARCHAR(50),
      reg_date TIMESTAMP
   )`)
  fmt.Println(res)
  fmt.Println(err)
}

// Down is executed when this migration is rolled back
func Down_20170216132820(txn *sql.Tx) {
   res, err := txn.Exec("DROP TABLE MyGuests;")
   fmt.Println(res)
   fmt.Println(err)
}

Executing up/down migrations is also easy:

goose up

goose down

Internally goose maintains a table called goose_db_version:

mysql> select * from goose_db_version;
+----+----------------+------------+---------------------+
| id | version_id     | is_applied | tstamp              |
+----+----------------+------------+---------------------+
|  1 |              0 |          1 | 2017-02-16 13:37:20 |
|  2 | 20170216132820 |          1 | 2017-02-16 13:37:47 |
|  3 | 20170216132820 |          0 | 2017-02-16 13:39:32 |
|  4 | 20170216132820 |          1 | 2017-02-16 13:40:04 |
|  5 | 20170216134743 |          1 | 2017-02-16 13:51:30 |
|  6 | 20170216134743 |          0 | 2017-02-16 13:51:34 |
+----+----------------+------------+---------------------+
6 rows in set (0.00 sec)

This tool also allows you to specify migrations using SQL files; by default it supports postgres, mysql and sqlite3, and it has a Go library. Mostly I liked that you specify the connection info in a config file, which simplifies your work with the CLI. Also, writing db migrations as Go code looks interesting! Overall, not a bad tool.

markbates/pop

pop is more like an “ORM”, which helps you create models and the SQL schema for you. pop also comes with migration capabilities in a CLI tool called soda and a DSL for specifying migrations called fizz.

At the start you have to specify database connection config in a database.yaml file. Mine looked like this:

development:
  dialect: "mysql"
  database: "pop"
  host: "localhost"
  port: "3306"
  user: "root"
  password: ""

Then you can create/drop a database using CLI tool:

soda create -e development
soda drop -e development

Generating a model with its migration script based on the fizz DSL is simple:

soda generate model MyGuest firstname:text lastname:text email:text \
  reg_date:timestamp

Generated DSL looks like this:

create_table("my_guests", func(t) {
   t.Column("id", "uuid", {"primary": true})
   t.Column("firstname", "text", {})
   t.Column("lastname", "text", {})
   t.Column("email", "text", {})
   t.Column("reg_date", "timestamp", {})
})

and model:

type MyGuest struct {
	ID        uuid.UUID `json:"id" db:"id"`
	CreatedAt time.Time `json:"created_at" db:"created_at"`
	UpdatedAt time.Time `json:"updated_at" db:"updated_at"`
	Firstname string    `json:"firstname" db:"firstname"`
	Lastname  string    `json:"lastname" db:"lastname"`
	Email     string    `json:"email" db:"email"`
	RegDate   time.Time `json:"reg_date" db:"reg_date"`
}

Migrate up/down:

soda migrate up 
soda migrate down

On migration, the Fizz DSL produced the following schema:

mysql> desc my_guests;
+------------+----------+------+-----+---------+-------+
| Field      | Type     | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| created_at | datetime | NO   |     | NULL    |       |
| updated_at | datetime | NO   |     | NULL    |       |
| id         | char(36) | NO   | PRI | NULL    |       |
| firstname  | text     | NO   |     | NULL    |       |
| lastname   | text     | NO   |     | NULL    |       |
| email      | text     | NO   |     | NULL    |       |
| reg_date   | datetime | NO   |     | NULL    |       |
+------------+----------+------+-----+---------+-------+
7 rows in set (0.00 sec)

Internally this tool maintains a table called schema_migration, which holds schema version number.

I liked this library a lot, but it feels like you should only use it when you are looking for an “ORM”-like library. Generating models and migrations looks cool! Also, bonus points for the Fizz DSL, which looks a lot like Go 🙂

One drawback is that pop only supports PostgreSQL (>= 9.3), MySQL (>= 5.7) and SQLite (>= 3.x).

go-gormigrate/gormigrate

Gormigrate is a migration helper for the GORM library. This helper adds proper schema versioning and rollback capabilities. I like the schema versioning + schema migration definition in a list of structs. This is how it looks with the MyGuests example:

func main() {
        db, err := gorm.Open("mysql",
    "root@tcp(127.0.0.1:3306)/gorm?charset=utf8&parseTime=True&loc=Local")
        if err != nil {
                panic("failed to connect database")
        }

        if err = db.DB().Ping(); err != nil {
                log.Fatal(err)
        }

        db.LogMode(true)

        defer db.Close()

        m := gormigrate.New(db, gormigrate.DefaultOptions, 
             []*gormigrate.Migration{
                {
                        ID: "201702200906",
                        Migrate: func(tx *gorm.DB) error {
                                type MyGuest struct {
                                        gorm.Model
                                        Firstname string
                                        Lastname  string
                                        Email     string
                                        RegDate   time.Time
                                }
                                return tx.AutoMigrate(&MyGuest{}).Error
                        },
                        Rollback: func(tx *gorm.DB) error {
                                return tx.DropTable("MyGuest").Error
                        },
                },
        })

        if err = m.Migrate(); err != nil {
                log.Fatalf("Could not migrate: %v", err)
        }
}

This migration creates a table with the following schema:

mysql> desc my_guests;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| id         | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| created_at | timestamp        | YES  |     | NULL    |                |
| updated_at | timestamp        | YES  |     | NULL    |                |
| deleted_at | timestamp        | YES  | MUL | NULL    |                |
| firstname  | varchar(255)     | YES  |     | NULL    |                |
| lastname   | varchar(255)     | YES  |     | NULL    |                |
| email      | varchar(255)     | YES  |     | NULL    |                |
| reg_date   | timestamp        | YES  |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)

I would definitely use this with GORM.

rubenv/sql-migrate

sql-migrate looks a lot like goose: you specify connection info in a YAML file, and migrations are written in SQL files. It has a CLI, which generates a template for your migration:

$ sql-migrate new MyGuests

My schema change looked like this:

-- +migrate Up
CREATE TABLE MyGuests (
           id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
           firstname VARCHAR(30) NOT NULL,
           lastname VARCHAR(30) NOT NULL,
           email VARCHAR(50),
           reg_date TIMESTAMP
);

-- +migrate Down
DROP TABLE MyGuests;

Applying the migration with the CLI tool is pretty straightforward:

$ sql-migrate up
Applied 1 migration

It also has a nice-looking library with a well-defined API, which supports having migrations in a struct or in a directory. The only drawback I can think of is that Go Report Card gave this library a B.
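
For completeness, here is a minimal sketch of using sql-migrate as a library rather than via the CLI. This is my own example, assuming the MemoryMigrationSource and migrate.Exec API from the project README, with an example MySQL connection string; treat it as a sketch, not an authoritative usage guide:

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
	migrate "github.com/rubenv/sql-migrate"
)

func main() {
	// Example connection string; adjust for your own setup.
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/mydb?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Migrations held in memory; a FileMigrationSource pointing at a
	// directory of .sql files works the same way.
	migrations := &migrate.MemoryMigrationSource{
		Migrations: []*migrate.Migration{
			{
				Id: "1_create_my_guests",
				Up: []string{`CREATE TABLE MyGuests (
					id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
					firstname VARCHAR(30) NOT NULL,
					lastname VARCHAR(30) NOT NULL,
					email VARCHAR(50),
					reg_date TIMESTAMP
				);`},
				Down: []string{"DROP TABLE MyGuests;"},
			},
		},
	}

	// Apply everything that hasn't been applied yet.
	n, err := migrate.Exec(db, "mysql", migrations, migrate.Up)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Applied %d migrations", n)
}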

DavidHuie/gomigrate

gomigrate is a really simple toolkit, which only allows you to run migrations from Go code:

err := migrator.Migrate()
err := migrator.Rollback()

Migrations are defined in SQL files named {{ id }}_{{ name }}_{{ “up” or “down” }}.sql, which you have to manage yourself, because it’s only a library. mattes/migrate seems to cover the same functionality and adds much more, so I would prefer to use it over this library.
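
To show where the migrator in the snippet above comes from, here is a rough sketch of wiring it up. I’m going from memory of the gomigrate README, so the NewMigrator signature and the Mysql adapter name are assumptions, and the connection string and migrations path are just examples:

package main

import (
	"database/sql"
	"log"

	"github.com/DavidHuie/gomigrate"
	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Assumed API: an adapter struct per database plus the path to the
	// directory holding the {{ id }}_{{ name }}_{{ up|down }}.sql files.
	migrator, err := gomigrate.NewMigrator(db, gomigrate.Mysql{}, "./migrations")
	if err != nil {
		log.Fatal(err)
	}

	if err := migrator.Migrate(); err != nil {
		log.Fatal(err)
	}
}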

GuiaBolso/darwin

It’s a library which tracks schema changes in a struct and only allows up migrations. I love the idea of storing all migrations in a slice:

var (
   migrations = []darwin.Migration{
   {
   Version: 1,
   Description: "Creating table MyGuests",
   Script: `CREATE TABLE MyGuests (
              id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
              firstname VARCHAR(30) NOT NULL,
              lastname VARCHAR(30) NOT NULL,
              email VARCHAR(50),
              reg_date TIMESTAMP
            )`,
   },
   }
)

func main() {
   database, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/darwin")

   if err != nil {
     log.Fatal(err)
   }

   driver := darwin.NewGenericDriver(database, darwin.MySQLDialect{})

   d := darwin.New(driver, migrations, nil)
   err = d.Migrate()

   if err != nil {
     log.Println(err)
   }
}

I don’t think there is more to say about this library, but I guess that’s a good thing.

tanel/dbmigrate

A simple library for PostgreSQL or Cassandra. It runs migrations (.sql or .cql files) sorted by their file names. mattes/migrate seems to cover the same functionality and adds much more, so I would prefer to use it over this.

pressly/goose

It’s a fork of goose which drops support for config files and custom drivers. Migrations can be run with any driver that is compatible with database/sql. This tool looks like liamstask/goose and mattes/migrate had a baby 🙂 It takes the good things from both projects: a good CLI, no configuration, migrations written as .sql or .go files, and a history of migrations stored in a goose_db_version table. But there are some things this tool lacks: it doesn’t support Cassandra or any other non-SQL database, and the CLI can migrate SQL files only.

When trying to use it, I ran into problems with this tool:

I created an SQL-based migration:

$ goose mysql "root@tcp(127.0.0.1:3306)/pressly" create initial_users_table sql
Created sql migration at 20170301085637_initial_users_table.sql

I filled the 20170301085637_initial_users_table.sql file with:

-- +goose Up
-- SQL in section 'Up' is executed when this migration is applied

CREATE TABLE MyGuests (
	id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
	firstname VARCHAR(30) NOT NULL,
	lastname VARCHAR(30) NOT NULL,
	email VARCHAR(50),
	reg_date TIMESTAMP
)


-- +goose Down
-- SQL section 'Down' is executed when this migration is rolled back
DROP TABLE MyGuests;
$ goose mysql "root@tcp(127.0.0.1:3306)/pressly" up
2017/03/01 08:57:41 WARNING: Unexpected unfinished SQL query: -- +goose Up
-- SQL in section 'Up' is executed when this migration is applied

CREATE TABLE MyGuests (
	id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
	firstname VARCHAR(30) NOT NULL,
	lastname VARCHAR(30) NOT NULL,
	email VARCHAR(50),
	reg_date TIMESTAMP
). Missing a semicolon?
OK    20170301085637_initial_users_table.sql
goose: no migrations to run. current version: 20170301085637
$ goose mysql "root@tcp(127.0.0.1:3306)/pressly" down
2017/03/01 08:57:47 FAIL 20170301085637_initial_users_table.sql (Error 1051: Unknown table 'pressly.myguests'), quitting migration.
$ goose mysql "root@tcp(127.0.0.1:3306)/pressly" up
goose: no migrations to run. current version: 20170301085637

I did miss the semicolon, but the tool recorded in its table that the migration ran successfully, so I was left in a state where the table doesn’t exist, but the tool thinks it does.

mysql> select * from goose_db_version;
+----+----------------+------------+---------------------+
| id | version_id     | is_applied | tstamp              |
+----+----------------+------------+---------------------+
|  1 |              0 |          1 | 2017-03-01 08:44:45 |
|  2 | 20170301085637 |          1 | 2017-03-01 08:57:41 |
+----+----------------+------------+---------------------+

I had to manually delete the row in the goose_db_version table and restart the migration…
After that, everything worked normally:

$ pressly goose mysql "root@tcp(127.0.0.1:3306)/pressly" up
OK    20170301085637_initial_users_table.sql
goose: no migrations to run. current version: 20170301085637
$ pressly goose mysql "root@tcp(127.0.0.1:3306)/pressly" down
OK    20170301085637_initial_users_table.sql

I also tried to run a migration based on Go code, but wasn’t successful.
I created the migration and compiled the example cmd file provided in the repo for running migrations:

$ goose mysql "root@tcp(127.0.0.1:3306)/pressly" create initial_users_table
 Created go migration at 20170301083810_initial_users_table.go

and filled it with my migration code:

package migration

import (
	"database/sql"
	"fmt"

	"github.com/pressly/goose"
)

func init() {
	goose.AddMigration(Up_20170301083810, Down_20170301083810)
}

func Up_20170301083810(tx *sql.Tx) error {
	res, err := tx.Exec(`CREATE TABLE MyGuests (
	       id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
	       firstname VARCHAR(30) NOT NULL,
	       lastname VARCHAR(30) NOT NULL,
	       email VARCHAR(50),
	       reg_date TIMESTAMP
	   )`)
	fmt.Println(res)
	return err
}

func Down_20170301083810(tx *sql.Tx) error {
	res, err := tx.Exec("DROP TABLE MyGuests;")
	fmt.Println(res)
	return err
}

Then I tried to run:

./pressly --dir=migrations/ mysql "root@tcp(127.0.0.1:3306)/pressly" up
2017/03/01 09:02:24 FAIL 00002_rename_root.go 
(Error 1146: Table 'pressly.users' doesn't exist), quitting migration.

My migrations directory contains only one file, 20170301083810_initial_users_table.go; there is no file called 00002_rename_root.go. So I don’t know what the hell this tool is doing, but I really don’t like that it tries to run a file called rename_root.go, which I didn’t write and know nothing about.

So be careful with this tool!

Apache Mesos in the face of network partitions

In the MesosCon Europe 2015 welcome speech, Benjamin Hindman said that Mesos was designed to support building and running distributed systems. In order to survive production, systems have to learn to adapt to occasional network blips and outages. As research shows, networks do have problems, and thinking that the network is reliable is one of the 8 fallacies of distributed computing.

In this post, I will cover what happens when the Mesos Master has network problems with its Mesos Agent and how Mesos applications (frameworks) get notified. So, let’s consider the following architecture:

(Figure: simplified Mesos architecture)

This is a slightly simplified architecture, because the Scheduler can reside on a different host. White boxes are Mesos components; red-shaded boxes are components developed by the Mesos app (framework) developer. So, let’s consider what happens when the Mesos Master cannot connect to the Mesos Agent.

The Mesos Master detects the network blip using pings. Detection is controlled by --agent_ping_timeout (default 15s) and --max_agent_ping_timeouts (default 5), so an Agent that does not answer a ping within agent_ping_timeout times max_agent_ping_timeouts seconds (with the defaults, 15s × 5 = 75 seconds) will be considered lost. In this case, on reconnection the Mesos Agent will kill all tasks and executors and shut down. The Scheduler will be informed via task status updates: it will receive TASK_LOST for each task on that Agent, plus an agent failure event.
Here is the log of a reconnected Agent shutting everything down:

I0203 14:31:00.725201 128352256 slave.cpp:809] Agent asked to shut down by master@127.0.0.1:5050 because ‘Agent attempted to re-register after removal’
I0203 14:31:00.725296 128352256 slave.cpp:2218] Asked to shut down framework e1637465-8791-460c-8c66-fadaa19f8148-0000 by master@127.0.0.1:5050
I0203 14:31:00.725313 128352256 slave.cpp:2243] Shutting down framework e1637465-8791-460c-8c66-fadaa19f8148-0000
I0203 14:31:00.725334 128352256 slave.cpp:4407] Shutting down executor ‘59160602-24bc-4a44-9c53-26a43d32402e’ of framework e1637465-8791-460c-8c66-fadaa19f8148-0000 (via HTTP)
E0203 14:31:00.725385 131571712 process.cpp:2105] Failed to shutdown socket with fd 14: Socket is not connected
I0203 14:31:00.725507 128352256 slave.cpp:4407] Shutting down executor ‘7eebc364-a1ab-464d-8624-0f785afccc38’ of framework e1637465-8791-460c-8c66-fadaa19f8148-0000 (via HTTP)

If the network blip is shorter than agent_ping_timeout times max_agent_ping_timeouts seconds, everything should still work.

This is not a very good approach, as it doesn’t allow the app developer to change this behavior in any way. For example, if you run Spark on Mesos, maybe this behavior is fine (some tasks will be killed, but they will automatically be rescheduled on different Agents), but when using Cassandra on Mesos, you probably wouldn’t want your Cassandra nodes to be killed after roughly 75 seconds of a network blip.

That’s why Mesos 1.1 added experimental support for partition-aware Mesos frameworks. If the framework developer opts in to this feature, tasks and executors won’t be killed when the Agent reconnects. This allows frameworks to define their own policies for how to handle partitioned tasks. It also adds new task states, which allow handling network partitions differently.

Let’s take a look at the new task states (a small sketch of how a framework might react to them follows the list):

  • TASK_UNREACHABLE – is sent when the Mesos Master detects the network blip. When the Mesos Agent rejoins, the task will be set back to TASK_RUNNING (if that was its previous state).
  • TASK_DROPPED – is sent when a task fails to launch because of a transient error. The task is not running.
  • TASK_GONE – is sent during the task reconciliation process when the Master knows that the task was running on an Agent that has been terminated. So, the task is not running.
  • TASK_GONE_BY_OPERATOR – as I understand it, Mesos will provide a maintenance primitive for operators to mark tasks as gone, so this task state has the possibility of human error. So, the task is probably gone.
  • TASK_UNKNOWN – will be sent during the task reconciliation process when the Master does not know about the task’s Agent. So, the task may still be running.
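
To make the “own policies” idea concrete, here is a tiny sketch of how a partition-aware framework’s status-update handler might branch on these states. The TaskStatus type and the printed actions are hypothetical stand-ins of my own (not the Mesos API); the point is only that TASK_UNREACHABLE can be waited out, while TASK_GONE and TASK_DROPPED can be acted on immediately:

package main

import "fmt"

// TaskStatus is a hypothetical, simplified status update; a real framework
// would receive these through the Mesos scheduler API.
type TaskStatus struct {
	TaskID string
	State  string
}

// handleUpdate sketches one possible policy for the partition-aware states.
func handleUpdate(s TaskStatus) {
	switch s.State {
	case "TASK_UNREACHABLE":
		// Agent is partitioned; the task may come back as TASK_RUNNING,
		// so don't reschedule immediately (good for stateful apps like Cassandra).
		fmt.Printf("%s unreachable, waiting for the agent to rejoin\n", s.TaskID)
	case "TASK_GONE", "TASK_DROPPED":
		// The task is definitely not running; safe to reschedule it elsewhere.
		fmt.Printf("%s gone/dropped, rescheduling\n", s.TaskID)
	case "TASK_GONE_BY_OPERATOR":
		// Probably gone, but an operator marked it; maybe alert a human first.
		fmt.Printf("%s marked gone by operator\n", s.TaskID)
	case "TASK_UNKNOWN":
		// Might still be running somewhere; keep reconciling before acting.
		fmt.Printf("%s unknown, continuing reconciliation\n", s.TaskID)
	}
}

func main() {
	handleUpdate(TaskStatus{TaskID: "cassandra-node-1", State: "TASK_UNREACHABLE"})
}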

The remaining work on partition-aware frameworks can be tracked in this Jira ticket.

Thanks for reading! If you have any questions or want to provide feedback, you can contact me via Twitter @PofkeVe.

 

Cool feature of JUnit 5

One thing I love in the upcoming release of JUnit is the @DisplayName annotation. This annotation allows you to use natural language to name a unit test. Kevlin Henney, in his talk “What We Talk About When We Talk About Unit Testing”, has shown the power of using the English language to specify what a unit test is doing. For example:

@Test
@DisplayName("ISBNs with more than 13 Digits are Malformed")
public void isValid() {
    ISBNValidator isbnValidator = new ISBNValidator();

    Boolean isValid = isbnValidator.isValid("1111222233334444");

    assertEquals(false, isValid);
}

This results in unit tests being a great specification for a component. Furthermore, you can use this cool IntelliJ IDEA feature, which allows you to see all of a component’s tests by name. The final result looks like this:

(Screenshot: the component’s tests listed by their display names in IntelliJ IDEA)

By looking at the image, we can clearly see what the component does and what its corner cases are, which gives us a lot of insight. Actually, we even know that it does what it says it does, because the tests ran fine!

You can do this using regular camel-case method naming, but it is really hard to read:

@Test
public void isbnsWithMoreThan13DigitsAreMalformed() {
    ISBNValidator isbnValidator = new ISBNValidator();

    Boolean isValid = isbnValidator.isValid("1111222233334444");

    assertEquals(false, isValid);
}

and the resulting spec:

(Screenshot: the same tests listed by their camel-case method names)

 

This example is taken from the talk mentioned above; I really suggest you watch it.

Thanks for reading! Until next time!