Raspberry Pi Cluster Config Management and Monitoring

Oliver Hu
4 min read · Jul 10, 2020

with Ansible, Prometheus and Grafana

Disclaimer: I’m a newbie to Ansible, Prometheus and Grafana.

Results

Grafana dashboard for the cluster:

Easy cluster management with Ansible:

Story

My Pi cluster config management started with:

alias runshell="pdsh -R ssh -w pi@192.168.0.[7,8,6,13,17]"

It worked at the beginning, when I set up the cluster as a fairly homogeneous Hadoop cluster (most nodes were data nodes or node managers). However, as I got more serious about the cluster and started running genuinely useful services on it (instead of running a Hadoop/k8s cluster for its own sake), I kept running into cluster management challenges.

In my current cluster, I have 6 nodes with different roles:

img0: self-hosted Docker image repository.
monitor0: Prometheus & Grafana dashboard.
vpn0: VPN service.
ws0: web service.
ansible0: Ansible controller.
proxy0: reverse web proxy.

A few obvious challenges:

  • Running pdsh commands as root is not easy.
  • No easy way to track past commands that were run across the cluster.
  • No easy way to converge the state of the whole cluster.
  • No easy way to split the cluster into different roles and manage them separately.

In a sentence, I need a cluster config management tool. A few choices came up: Chef, Puppet, CFEngine and Salt.

Salt & CFEngine are a no for me, since even professional system admins struggle to understand and use them, and they are not the most fashionable options nowadays anyway.

I was thinking of using Puppet since Facebook seems to use it, until I found out it is written in Ruby (same for Chef), a language that frustrated me a long time ago when I built my first web site with Ruby on Rails. Isn’t there a Python, Go or even Java version of a modern infra config management system? Then I found Ansible, which is written in Python and seems to be very popular as well!

Steps

1. Install Ansible, which is very easy

$ sudo apt install ansible

2. Create an inventory file

pi@ansible0:~/ansible $ cat /etc/ansible/hosts
[pis]
ansible0 ansible_ssh_host=192.168.0.10
img0 ansible_ssh_host=192.168.0.11
ws0 ansible_ssh_host=192.168.0.12
vpn0 ansible_ssh_host=192.168.0.13
monitor0 ansible_ssh_host=192.168.0.14
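
One of the challenges listed earlier was splitting the cluster into roles. As a rough sketch, the same hosts could also be written in Ansible's YAML inventory format with an extra group per role. The group names monitoring and web and the file name inventory.yml are made up for illustration and are not part of the original setup; ansible_host is the current name for the older ansible_ssh_host variable.

```yaml
# Hypothetical YAML inventory, e.g. saved as inventory.yml (file name is an assumption).
all:
  children:
    pis:                      # same group as [pis] in the INI inventory
      hosts:
        ansible0:
          ansible_host: 192.168.0.10
        img0:
          ansible_host: 192.168.0.11
        ws0:
          ansible_host: 192.168.0.12
        vpn0:
          ansible_host: 192.168.0.13
        monitor0:
          ansible_host: 192.168.0.14
    monitoring:               # hypothetical role group
      hosts:
        monitor0:
    web:                      # hypothetical role group
      hosts:
        ws0:
```

With groups like these, a command can target a single role, for example: ansible monitoring -i inventory.yml -a "/bin/echo hello"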

3. Run your first command

ansible all -a "/bin/echo hello"

If you get any errors, make sure you have copied your public key to all the hosts with ssh-copy-id.

4. Explore the documentation. For one use case, I wanted to clean up my stale /etc/hosts file, which contained a lot of invalid hosts, and update it with the hosts described in the inventory file.

There are two tasks: 1) remove any entry containing 192.168.0.* from /etc/hosts, and 2) add the host mappings from the inventory to /etc/hosts. The second task is more complicated; you can check https://github.com/oliverhu/ansible_config/blob/master/hostname.yml.

The first one is fairly self-explanatory:

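---
# Remove stale LAN entries (192.168.0.*) from /etc/hosts on every node.
- name: remove stale 192.168.0.* entries from /etc/hosts
  hosts: all
  gather_facts: yes
  tasks:
    - name: update /etc/hosts
      become: yes
      become_user: root
      tags: delete_file
      lineinfile:
        path: /etc/hosts
        # Dots are escaped so they match literally; every matching line is removed.
        regexp: '^192\.168\.0\.'
        state: absent
        # Keep a timestamped copy of the original file.
        backup: yes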

source
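
For the second task (adding the inventory hosts back to /etc/hosts), a minimal sketch could look like the playbook below. This is not the linked hostname.yml, just one possible approach: it assumes each host's address is defined in the inventory as ansible_host or ansible_ssh_host, and that your Ansible version supports loop (2.5 or later; older versions would use with_items).

```yaml
---
# Sketch only: write one "IP hostname" line per inventory host into /etc/hosts.
- name: add inventory hosts to /etc/hosts
  hosts: all
  gather_facts: no
  become: yes
  tasks:
    - name: ensure one /etc/hosts line per inventory host
      lineinfile:
        path: /etc/hosts
        # Only match LAN lines for this hostname, so reruns update them in place
        # and entries such as 127.0.1.1 are left alone.
        regexp: '^192\.168\.0\.\d+\s+{{ item }}$'
        # Assumes the address is set in the inventory under one of these two names.
        line: "{{ hostvars[item].ansible_host | default(hostvars[item].ansible_ssh_host) }} {{ item }}"
        state: present
        backup: yes
      loop: "{{ groups['all'] }}"
```

Because lineinfile only adds or rewrites a line when it is missing or different, running the playbook again should leave an already correct /etc/hosts unchanged.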

5. A harder one: set up Prometheus and Grafana for the cluster with Ansible: https://github.com/oliverhu/picluster-ansible. After running the playbooks in that repo, you will get the nice monitoring UI presented at the beginning of the post.

Conclusion

It is far more efficient to use a proper infra management tool to orchestrate your cluster, and Ansible seems to be a good one. Folks might ask:

Why not just use Kubernetes & Docker?

  • Unfortunately, the Kubernetes & Docker daemons would overwhelm your Raspberry Pis with their ~600MB memory footprint; remember, you only have ~1GB in total.

Tips

  1. If your apt gets stuck resolving IPv6 addresses:

sudo nano /etc/sysctl.conf

Append the following lines to turn off IPv6:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Run sudo sysctl -p for the change to take effect, or just reboot.
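
Since the rest of the cluster is managed with Ansible, the same change could probably be rolled out to every node with the sysctl module instead of editing the file by hand. This is a sketch under assumptions: on newer Ansible releases the module lives in the ansible.posix collection, which may need to be installed separately, and the play name is made up.

```yaml
---
# Sketch only: persist the three IPv6 settings in /etc/sysctl.conf and reload them.
- name: disable IPv6 on all Pis
  hosts: all
  become: yes
  tasks:
    - name: set IPv6 disable flags and reload sysctl
      sysctl:
        name: "{{ item }}"
        value: "1"
        state: present
        reload: yes          # applies the setting right away, like sysctl -p
      loop:
        - net.ipv6.conf.all.disable_ipv6
        - net.ipv6.conf.default.disable_ipv6
        - net.ipv6.conf.lo.disable_ipv6
```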

2. Force apt-get instead of aptitude for upgrading packages.

For some reason, aptitude takes 2GB of memory (eating all my physical memory & swap space). Setting force_apt_get makes the apt module use apt-get instead:

- name: Playbook for upgrading the RPis
  hosts: raspberry_pi
  user: pi
  gather_facts: no
  tasks:
    - name: Update and upgrade apt packages
      become: true
      apt:
        upgrade: yes
        update_cache: yes
        force_apt_get: yes # This line
