It's not the network (it's kind of the network).

We’re back in business on the Pi cluster, now with Cilium. The fun thing about all this cloud native stuff is the aggressive vendor agnosticity. You don’t go far in the installation documentation for this stuff before hitting a fork in the road where you need to choose a solution to an issue you never knew even existed. In this case, it’s your Container Network Interface, or CNI.
Sure, containers are just little bits of your operating system sliced off and cordoned off for a specific set of processes. But that can only go so far on the local system, and if it’s not local, it’s networked, and if it’s networked, it needs a network!
CNIs form the bridge* from the containers running on a machine, through the machine’s network stack, and then out the physical interface into your network in a way the rest of the network will actually make sense of.
*The word “bridge” has a certain meaning in the context of networking, but other words like route, path, connection also have their own pedantic meanings, so I had to pick one. Also, some CNIs do in fact form a virtual bridge.
Anyway, in the course of your Kubernetes installation, you will be prompted to choose a CNI without much help on which one to choose. I have previously used Calico and Flannel and, at my current level, couldn’t tell much difference between them. This time, for a change of pace, I decided to use Cilium. The process is slightly different from the others: you install the Cilium CLI first and then use it to deploy the Helm chart, whereas the others have you deploy the charts yourself.
In the course of this, I learned a few lessons:
- Ubuntu uses systemd-resolved to resolve DNS, which by default does not consider DHCP option 119. This is probably only relevant to my case in which I have a .lab domain in my house where all the weird experimental stuff sits. DHCP option 119 is the reason you don’t need to specify .local when you ping another device by name on your home network. Your local DNS resolver has search domains that it will automatically append to non-FQDN hostnames; it learns these when your device connects to the network and makes a DHCP request to get its IP address, and the DHCP server replies with the IP address along with the search domains.
To resolve this you need to add the line UseDomains=true to /run/systemd/resolve/stub-resolv.conf and do a systemctl restart systemd-resolved.  After that, systemd-resolved will start using the domains learned from option 119.
- When you initialize your cluster with kubeadm, you should specify the --pod-network-cidr parameter. It specifies the full range of IP addresses that Kubernetes can assign to its pods. This, weirdly, gets stored in Kubernetes under the value of cluster-cidr. More confusingly, there exists a pod-cidr value, which is a subset of the cluster-cidr specific to each node. When the pod-cidr is as large as or larger than the pod-network-cidr, Cilium will fail to start. In my case I had set my pod-network-cidr to 172.16.0.0/24, which coincidentally is the same size as the default pod-cidr. Setting that back to 172.16.0.0/16 got everything working again.
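The arithmetic behind that second lesson can be sketched with Python’s ipaddress module. This is only an illustration: the function name is made up, and the /24 per-node size is an assumption taken from the default pod-cidr mentioned above.

```python
import ipaddress

def node_ranges_available(cluster_cidr, node_prefix=24):
    """Count how many per-node pod ranges of size /node_prefix fit in cluster_cidr."""
    cluster = ipaddress.ip_network(cluster_cidr)
    if node_prefix < cluster.prefixlen:
        # The per-node range is bigger than the whole cluster range.
        return 0
    return 2 ** (node_prefix - cluster.prefixlen)

for cidr in ("172.16.0.0/24", "172.16.0.0/16"):
    print(cidr, "->", node_ranges_available(cidr), "node range(s)")
```

With the /24 there is room for a single node-sized range, while the /16 leaves 256 of them to hand out.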
This all goes to say that, as a network engineer, it seems like everyone is blaming the network, where from your perspective the packets are going to and from the correct black boxes as designed, and your responsibilities are fulfilled. The hard part is that nowadays, with virtualization and containerization, the network has extended into the boxes. It’s all too easy to get stuck in traditional ways of physical switches and routers, when actually there can be any number of virtual switches and routers inside each of your endpoints, and unless you find a way to meet somewhere in the middle with the endpoint folks, none of this goes anywhere and we’re all just stuck talking to ourselves.
Bandwidth Visualizer
A while back I came across this Stellar Unicorn in the clearance bin at Micro Center for $15 and picked it up. It’s basically a 16x16 RGB LED display with a Raspberry Pi Pico W stuck to the back of it. It comes with a few neat example graphics scripts, but the real purpose is for you to write your own. It was only now that I figured it was time to sit down and learn how to.

My plan was to create a network bandwidth visualizer fashioned after the title sequence to The Matrix. My network at home is Ubiquiti-based, with a UniFi Dream Machine at the core, which is nice in many ways and also provides API access.
This project has been a mix of a lot of things I’d been meaning to get into or put to use but just never had the right excuse:
- Network visibility and programmability
- Asynchronous programming in Python
- Embedded systems
The Python side of things is MicroPython, a stripped down and streamlined version of Python made for microcontroller boards such as this RPi Pico W. When you flash the MicroPython firmware to the board, it creates a little filesystem for you to upload your scripts, and at power-on, it finds and runs main.py. It’s also got its own version of pip, mip (which came in handy for installing uasyncio and ujson).
Inside, I have three loops running asynchronously (concurrently, taking turns rather than truly in parallel):
- Poll the API at regular intervals and update the downstream utilization on the uplink in bytes per second.
- Spawn the dots at the top of the screen according to an interval based on the uplink utilization.
- Update the positions of the dots as they scroll down the screen, clearing out the ones that reach the bottom, and then update the display according to the configured frames per second variable (currently set to 4 FPS, which is slow but I’ll explain).
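The three loops above could be laid out roughly like this. It is a minimal sketch written against standard asyncio rather than the actual script: the state dictionary, the spawn_interval mapping, and the get_rate and draw callbacks are all hypothetical stand-ins for the real API call and display code.

```python
import asyncio
import random

WIDTH, HEIGHT = 16, 16   # the display is a 16x16 grid
FPS = 4

def spawn_interval(bytes_per_sec):
    """Hypothetical mapping: more traffic means a shorter wait between dots."""
    return max(0.05, min(1.0, 50_000 / max(bytes_per_sec, 1)))

async def poll_bandwidth(state, get_rate, interval=1.0):
    # Loop 1: refresh the utilization figure at a fixed interval.
    while True:
        state["bytes_per_sec"] = get_rate()
        await asyncio.sleep(interval)

async def spawn_dots(state):
    # Loop 2: add a dot in a random column on the top row.
    while True:
        state["dots"].append([random.randrange(WIDTH), 0])
        await asyncio.sleep(spawn_interval(state["bytes_per_sec"]))

async def scroll_dots(state, draw):
    # Loop 3: draw, move every dot down a row, drop the ones past the bottom.
    while True:
        draw(state["dots"])
        state["dots"] = [[x, y + 1] for x, y in state["dots"] if y + 1 < HEIGHT]
        await asyncio.sleep(1 / FPS)

async def main(run_for, get_rate=lambda: 1_000_000, draw=lambda dots: None):
    state = {"bytes_per_sec": 0, "dots": []}
    tasks = [
        asyncio.create_task(poll_bandwidth(state, get_rate)),
        asyncio.create_task(spawn_dots(state)),
        asyncio.create_task(scroll_dots(state, draw)),
    ]
    await asyncio.sleep(run_for)   # on the device this would simply run forever
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return state

if __name__ == "__main__":
    final = asyncio.run(main(run_for=1.0))
    print(len(final["dots"]), "dots on screen")
```

Each loop owns its own timing with a single sleep, and the shared state is the only thing they have in common.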
I had originally had these running synchronously, which was a tremendous pain because I had to:
- Track when each variable was last updated
- Track how long it’s been since the last update
- Actually update the variable
- Update the last update time.
Even with just a handful of variables, it took a lot of time that I should have just spent learning asyncio.
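For comparison, the synchronous bookkeeping described above looks something like the following. This is a reconstruction of the general pattern, not the original code; make_job, run_due_jobs, and main_loop are hypothetical names.

```python
import time

def make_job(interval, action):
    """One periodic job: how often it runs, what it does, when it last ran."""
    return {"interval": interval, "action": action, "last_run": None}

def run_due_jobs(jobs, now):
    """Run every job whose interval has elapsed and stamp its new last-run time."""
    ran = []
    for name, job in jobs.items():
        never_ran = job["last_run"] is None
        if never_ran or now - job["last_run"] >= job["interval"]:
            job["action"]()          # actually update the variable
            job["last_run"] = now    # update the last update time
            ran.append(name)
    return ran

def main_loop(jobs, run_for, tick=0.01):
    """Single loop that has to check every job on every pass."""
    deadline = time.monotonic() + run_for
    while time.monotonic() < deadline:
        run_due_jobs(jobs, time.monotonic())
        time.sleep(tick)
```

Every new periodic task means another entry to track, which is the overhead that separate async loops make unnecessary.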

The remaining problem is, when the API call is made, the whole device freezes for a fraction of a second, in spite of being run asynchronously. It doesn’t sound like much, but it manifests as a very annoying stutter that is more noticeable at higher FPSes. Even at 4 FPS it is still there and that’s as good as I can get it.
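A likely explanation, assuming the HTTP request is made with a blocking client: asyncio tasks only take turns at an await, so a call that blocks without awaiting stalls every other loop until it returns. The sketch below reproduces that effect on a desktop Python, with time.sleep standing in for the blocking request; none of these function names come from the actual project.

```python
import asyncio
import time

async def ticker(stamps, period=0.01):
    # Stands in for the display loop: records when it got a turn to run.
    while True:
        stamps.append(time.monotonic())
        await asyncio.sleep(period)

async def blocking_fetch(delay):
    time.sleep(delay)             # blocks the whole event loop, like a blocking HTTP call

async def cooperative_fetch(delay):
    await asyncio.sleep(delay)    # yields to other tasks while waiting

async def worst_gap(fetch, delay=0.2):
    """Return the longest pause between two ticker runs while fetch is in progress."""
    stamps = []
    task = asyncio.create_task(ticker(stamps))
    await asyncio.sleep(0.05)     # let the ticker get going
    await fetch(delay)
    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max(b - a for a, b in zip(stamps, stamps[1:]))

if __name__ == "__main__":
    print("blocking gap:   ", asyncio.run(worst_gap(blocking_fetch)))
    print("cooperative gap:", asyncio.run(worst_gap(cooperative_fetch)))
```

If this is what is happening on the Pico, the stutter would last about as long as the request itself, regardless of the frame rate.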
There are also four buttons on the back of the display, and I’ve been wondering about what kinds of things I could do with them with the API. Turn the ad blocker on and off? Cut my kid’s internet access when necessary? Maybe initiate a graceful shutdown in case of a power outage.
In any case, I’m hoping to make a library of these visualizations because I spend my day at work looking at line graphs, but it’s nice to see network activity in a more tangible way, even if it’s completely qualitative and not exactly representative of reality (I fudged with multipliers a lot to get it to “look right”).
Oh, and another great thing about this – it’s got wifi, so I can just put this anywhere in the house, give it 5VDC and it automatically connects and starts doing its thing.
PicoCalc

We got a pair of PicoCalcs, and they’ve been a ton of fun.
I’m not entirely sure how to describe the PicoCalc. It’s got the form factor of the old-school graphing calculators (do people still use these?), but it’s more or less a self-contained single board computer with keyboard, display, and audio. Inside is your choice of several Raspberry Pi Pico microcontrollers.
It comes loaded with PicoMite, an MMBasic interpreter. My older brother taught me a bit of BASIC on his Commodore 64 when I was 7 or 8, so using the PicoCalc comes with a lot more nostalgia than just the form factor.

So far it’s just been silly little graphics and sound synth programs. It’s pretty slow and the 256k of RAM doesn’t leave enough room for multiple framebuffers, so decent animation is out of the question. You have access to the GPIO pins so you could work sensors and lights and motors like you would with Arduino, but you’d probably be better off just using an Arduino.
I’ve been using it as an alternative to bedtime doomscrolling. Coding on a phone sucks, but thumbtyping on this little clicky keyboard actually works really well and you can do it sitting up or lying down in bed.
I’ll probably never make anything remotely useful with this thing but that’s sort of the whole appeal. Without much support for this kind of thing at work, I’m left with these little toy programs that don’t do a whole lot other than look cool. One of these days I really should make something useful, though. The open source community is probably a good place to start and I’ve been looking for the right opportunity to jump in and help out a bit.
Hallway troubleshooting

At some point I bought this little 7” LCD screen for an FPGA project I never got started on, and it had been sitting around in a drawer till now. Turns out, combined with a wireless keyboard, it’s the perfect kit for some emergency hallway troubleshooting for when you mangle your network configuration and lose SSH access.
Link Aggregation

So after moving our server to the linen closet I realized it had a second NIC in back, which of course meant I had to use it. Now that I have control over the network infrastructure, I can set up link aggregation.

Link aggregation allows you to bundle multiple physical connections between two devices into a single one. On Cisco IOS, you create a numbered port channel, and configure the physical interfaces to use the corresponding channel group, also specifying the mode. This process has always been annoying to me because instead of explicitly declaring the protocol – either Link Aggregation Control Protocol (LACP) or Port Aggregation Protocol (PAgP) – you imply it with the mode.
LACP’s modes are active and passive while PAgP’s include a few more but you’ll usually use auto or desirable.  The only reason I can think of for these interchangeable and sometimes misleading keywords is for cert test purposes.
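The way the mode keyword implies the protocol can be written down as a small lookup table. This is a sketch of the standard Cisco channel-group behavior, including the on mode (a static bundle with no negotiation protocol); the function and variable names are just for illustration.

```python
# Which negotiation protocol each channel-group mode implies.
MODE_PROTOCOL = {
    "active": "LACP",
    "passive": "LACP",
    "desirable": "PAgP",
    "auto": "PAgP",
    "on": None,   # static bundle, no negotiation protocol
}

# Modes that actively start negotiation; the others only respond.
INITIATORS = {"active", "desirable"}

def channel_forms(side_a, side_b):
    """Return True if two ends configured with these modes will bundle."""
    proto_a, proto_b = MODE_PROTOCOL[side_a], MODE_PROTOCOL[side_b]
    if proto_a != proto_b:
        return False              # mismatched protocols never negotiate
    if proto_a is None:
        return True               # on + on: forced bundle
    return side_a in INITIATORS or side_b in INITIATORS

for pair in [("active", "passive"), ("passive", "passive"), ("auto", "desirable")]:
    print(pair, "->", channel_forms(*pair))
```

At least one side has to initiate, which is why passive with passive, or auto with auto, never comes up.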
Fortunately, in Ubiquiti you click into this dropdown, which activates LACP (PAgP is Cisco proprietary).

Between network devices it’s pretty easy and you just configure both sides equally, but on the server side I finally had to reckon with one of my big blind spots – Linux networking. The configuration is scattered across a number of commands and files in such a way that I don’t even really know how to write about them. But I guess I’ll try.
For many years, ifconfig was the go-to command to do things like set IP addresses and turn interfaces on and off (not to be confused with the ipconfig command in Windows). But you would have to set your default gateway with the route add command. Now those are deprecated and the best practice these days is to use ip addr and ip route. This makes googling answers a bit difficult because a good number of writeups are still using ifconfig.
And then for DNS you need to edit /etc/resolv.conf.
By default there’s the networking service that handles this all to a certain extent, but then you can also install NetworkManager that runs alongside networking so you can use the nmcli command to configure it.
There’s even more and I’m not even going to get into wireless, but nmcli did most of the heavy lifting here.  I was going by this guide from Red Hat.  Here, I think “bond” is the term for port channel or etherchannel or whatever you want to call it (I’m sure there are some very important distinctions depending on how pedantic you want to get about it).
nmcli conn add type bond con-name po1 ifname po1
The above makes a logical port arbitrarily named po1 (this is just what it would have been called on a Cisco device).  Then you need to add your physical interfaces to it:
nmcli conn add type ethernet ifname eno1 master po1
nmcli conn add type ethernet ifname enp110s0 master po1
The two interfaces on the server are named, inexplicably, eno1 and enp110s0.  At this point I lost connectivity to the server and had to set up the world’s most annoying crash cart (more on this in a later post).
The two NICs had link lights flashing, but I couldn’t get any layer 3 traffic through. It turned out I needed to enable LACP on the interfaces with:
nmcli conn modify po1 bond.options "mode=802.3ad,lacp_rate=slow"
And now we have a 2 gigabit uplink for this server. I have no idea what the point of this all was. Also it broke VMware for a minute because I needed to specify the new interface for it to use.
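One way to confirm that the bond actually came up in LACP mode is to read the status file the Linux bonding driver exposes under /proc/net/bonding/. The sketch below parses a few fields from it; the field names are the ones the driver is generally known to print, the path assumes the bond is named po1 as above, and parse_bond_status is a made-up helper.

```python
from pathlib import Path

def parse_bond_status(text):
    """Pull the bonding mode and each member port's link state out of the status text."""
    status = {"mode": None, "ports": {}}
    current_port = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Slave Interface":
            current_port = value
            status["ports"][current_port] = None
        elif key == "MII Status" and current_port is not None:
            # Only record link state once we are inside a member port's section.
            status["ports"][current_port] = value
    return status

if __name__ == "__main__":
    path = Path("/proc/net/bonding/po1")   # assumes the bond name used above
    if path.exists():
        print(parse_bond_status(path.read_text()))
    else:
        print("no bond status file at", path)
```

A mode that mentions 802.3ad, with both member ports listed, would suggest the bond.options change took effect.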