Raspberry Pi Cluster Config Management and Monitoring

with Ansible, Prometheus and Grafana

Disclaimer: I’m a newbie to Ansible, Prometheus and Grafana.

Results

Grafana dashboard for the cluster:

Easy cluster management with Ansible:

Story

My Pi cluster config management started with:

It worked at the beginning, when I set up the cluster as a fairly homogeneous Hadoop cluster (most nodes were data nodes or node managers). However, as I got more serious about the cluster and started running genuinely useful services on it (rather than running a Hadoop/k8s cluster for its own sake), I kept running into cluster management challenges.

In my current cluster, I have 6 nodes with different roles:

A few obvious challenges:

  • Running pdsh as root is not easy.
  • No easy way to track the commands that were run in the cluster in the past.
  • No easy way to converge states for the whole cluster.
  • No easy way to split the cluster into different roles and manage separately.

In short, I needed a cluster config management tool. A few choices came up: Chef, Puppet, CFEngine, Salt.

Salt and CFEngine were a no for me, since even professional system admins struggle to understand and use them, and they are not the most fashionable options nowadays anyway.

I was thinking of using Puppet, since Facebook seems to use it, until I found out it is written in Ruby (as is Chef), a language I got frustrated with long ago when I built my first website with Ruby on Rails. Isn’t there a Python, Go or even Java version of a modern infra config management system? Then I found Ansible, which is written in Python and seems to be very popular as well!

Steps

  1. Installing Ansible is very easy:
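The exact install command depends on your setup. As a minimal sketch, assuming a Debian-based control machine (for example Raspberry Pi OS) where the package is named "ansible", a small helper could look like this. The function name install_ansible is made up for illustration.

```shell
# Install Ansible only when it is missing.
# Assumption: Debian-based OS with apt-get and a package called "ansible".
install_ansible() {
  if command -v ansible >/dev/null 2>&1; then
    echo "Ansible is already installed"
  else
    sudo apt-get update && sudo apt-get install -y ansible
  fi
}
```

Calling install_ansible and then ansible --version should confirm that the control machine is ready. Only the control machine needs Ansible; the managed nodes just need SSH and Python.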

2. Create an inventory file
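An inventory can be a plain INI file that groups hosts by role. The sketch below writes one from the shell; the file name inventory.ini, the host names, the group names, the IP addresses and the pi user are all placeholders, not the values from my cluster.

```shell
# Hypothetical host names, groups and addresses; replace them with your own nodes.
cat > inventory.ini <<'EOF'
[master]
pi-master ansible_host=192.168.1.10

[workers]
pi-worker1 ansible_host=192.168.1.11
pi-worker2 ansible_host=192.168.1.12

[all:vars]
ansible_user=pi
EOF
```

Grouping hosts like this is what later lets you target a single role (for example only the workers) instead of the whole cluster.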

3. Run your first command

If you get an error, make sure you have copied your public key to all the hosts with ssh-copy-id.
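A common first command is an ad-hoc run of Ansible's built-in ping module. The sketch below wraps it in two small shell functions; the function names and the default inventory path inventory.ini are assumptions for illustration.

```shell
# Ad-hoc run of the ping module against every host in the inventory.
# The inventory path defaults to a hypothetical inventory.ini.
ping_cluster() {
  ansible all -i "${1:-inventory.ini}" -m ping
}

# Run an arbitrary command (default command module) on one group of hosts.
run_on() {
  group="$1"; shift
  ansible "$group" -i inventory.ini -a "$*"
}
```

For example, ping_cluster checks connectivity to every node, and run_on workers uptime runs uptime on a group called workers (if your inventory defines one). Adding -b to the ansible call makes it run the command with privilege escalation.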

4. Explore the documentation. For one use case, I wanted to clean up my stale /etc/hosts file, which contained a lot of invalid hosts, and update it with the hosts described in the inventory file.

There are two tasks: 1) remove any entry containing 192.168.0.* from /etc/hosts; 2) add the new host mappings from the inventory to /etc/hosts. The second task is more complicated; you can check https://github.com/oliverhu/ansible_config/blob/master/hostname.yml.

The first one is pretty self-explanatory.
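As a rough sketch of the first task (not necessarily the exact playbook I used), Ansible's lineinfile module with state: absent removes every line matching a regular expression. The file names clean_hosts.yml, hosts.sample and hosts.cleaned are made up for this example, and the sample file only previews what the pattern matches.

```shell
# Write a small playbook (file name and layout are assumptions).
cat > clean_hosts.yml <<'EOF'
---
- hosts: all
  become: true
  tasks:
    - name: Remove stale 192.168.0.* entries from /etc/hosts
      lineinfile:
        path: /etc/hosts
        regexp: '192\.168\.0\.'
        state: absent
EOF

# Preview which lines the pattern would drop, using a throwaway sample file.
printf '%s\n' '127.0.0.1 localhost' '192.168.0.11 pi1' '192.168.0.12 pi2' > hosts.sample
grep -v -E '192\.168\.0\.' hosts.sample > hosts.cleaned
cat hosts.cleaned

# Apply for real (needs Ansible and your inventory file, assumed here to be inventory.ini):
#   ansible-playbook -i inventory.ini clean_hosts.yml
```

Escaping the dots matters: an unescaped . matches any character, so the pattern could catch addresses you did not intend to remove.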


5. A harder one: setting up Prometheus and Grafana for the cluster with Ansible: https://github.com/oliverhu/picluster-ansible. After you run that repo, you will get the monitoring UI shown at the beginning of the post.

Conclusion

It is far more efficient to use a proper infra management tool to orchestrate your cluster, and Ansible seems to be a good one. Folks might ask:

Why not just use Kubernetes & Docker?

  • Unfortunately, the Kubernetes and Docker daemons would overwhelm your Raspberry Pis with their ~600MB memory footprint; remember, you only have ~1GB in total.

Tips

  1. If your apt gets stuck resolving IPv6 addresses:

sudo nano /etc/sysctl.conf

append the following lines to turn off ipv6:

Then run sudo sysctl -p for the change to take effect, or just reboot.
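For reference, the kernel settings normally used to turn off IPv6 are the three disable_ipv6 keys below. This sketch writes them to a local file first so you can review them before touching the system config; the file name disable-ipv6.conf is an assumption.

```shell
# Standard kernel keys for disabling IPv6, staged in a local file for review.
cat > disable-ipv6.conf <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF

# Append to the system config and reload (run on the Pi, needs root):
#   sudo sh -c 'cat disable-ipv6.conf >> /etc/sysctl.conf'
#   sudo sysctl -p
```

Pasting the same three lines at the end of /etc/sysctl.conf in nano has the same effect.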

2. Force apt-get instead of aptitude for upgrading packages.

For some reason, aptitude takes 2GB of memory (eating all my physical memory and swap space).
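In a playbook, this can be done with the force_apt_get option of Ansible's apt module. The sketch below is one way to write it; the file name upgrade.yml and the choice of a safe upgrade are assumptions.

```shell
# Write a playbook that upgrades packages via apt-get rather than aptitude.
cat > upgrade.yml <<'EOF'
---
- hosts: all
  become: true
  tasks:
    - name: Upgrade packages with apt-get instead of aptitude
      apt:
        update_cache: true
        upgrade: safe
        force_apt_get: true
EOF

# Run it against your inventory (assumed here to be inventory.ini):
#   ansible-playbook -i inventory.ini upgrade.yml
```

Without force_apt_get, the apt module prefers aptitude for this kind of upgrade when it is installed, which is what triggered the memory problem on my Pis.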
