I’ve been struggling to get the lights to work reliably. One situation that causes a lot of problems is turning on a room: every light should send an “announce” ZDO when it powers on, but on average the coordinator sees only about 75% of them.
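
For reference, the “announce” is the ZDO Device_annce message (ZDO cluster 0x0013). The Python sketch below parses its payload, assuming you have already pulled the raw ZDO payload (starting with the ZDO transaction sequence number) out of whatever frame your coordinator hands you. The class and function names are made up for illustration, not taken from any library.

```python
import struct
from dataclasses import dataclass

DEVICE_ANNCE_CLUSTER = 0x0013  # ZDO cluster ID of Device_annce


@dataclass
class DeviceAnnounce:
    seq: int         # ZDO transaction sequence number
    nwk_addr: int    # 16-bit network (short) address
    ieee_addr: int   # 64-bit IEEE (MAC) address
    capability: int  # MAC capability flags

    @property
    def rx_on_when_idle(self) -> bool:
        # Bit 3 of the capability flags: receiver on when idle
        # (typically set by mains-powered routers such as bulbs)
        return bool(self.capability & 0x08)


def parse_device_annce(payload: bytes) -> DeviceAnnounce:
    """Parse seq (1 byte), NWK addr (2), IEEE addr (8), capability (1); little-endian."""
    if len(payload) < 12:
        raise ValueError(f"Device_annce payload too short: {len(payload)} bytes")
    seq, nwk, ieee, cap = struct.unpack_from("<BHQB", payload)
    return DeviceAnnounce(seq, nwk, ieee, cap)
```

Logging `ieee_addr` for every announce you receive, and comparing against the bulbs you expect, is one way to put a number on how many are going missing.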

This means the software has a really hard time figuring out whether a given light is on. I was also keen to make the controller automatically restore a bulb to its previous configuration on power-on. The easy fix was to ping all the devices periodically, but this led to further problems:

  • If a device is offline, you have to wait for the ping to time out. This slows down the scan of the whole house.
  • Sometimes you don’t get replies to the pings either.
  • Pings add more traffic to the network.
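
Part of the first problem is that sequential pings make timeouts add up: N offline bulbs cost N timeouts. Below is a minimal asyncio sketch of a concurrent scan. `ping` is a hypothetical coroutine standing in for however your controller queries a bulb, and the semaphore caps how many requests are in flight, so the scan doesn't become exactly the burst of extra traffic the third bullet worries about.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

# Hypothetical signature: ping(addr) resolves to True when the device answers.
PingFn = Callable[[int], Awaitable[bool]]


async def scan(
    addrs: Iterable[int],
    ping: PingFn,
    timeout: float = 2.0,
    max_in_flight: int = 4,
) -> Dict[int, bool]:
    """Ping devices concurrently; devices that stay silent are reported as False.

    Worst case is roughly ceil(N / max_in_flight) * timeout, instead of
    N * timeout for a one-at-a-time scan.
    """
    limit = asyncio.Semaphore(max_in_flight)

    async def one(addr: int) -> bool:
        async with limit:  # cap concurrent requests to limit network load
            try:
                return await asyncio.wait_for(ping(addr), timeout)
            except asyncio.TimeoutError:
                return False

    addr_list = list(addrs)
    results = await asyncio.gather(*(one(a) for a in addr_list))
    return dict(zip(addr_list, results))
```

This doesn't fix missing replies (the second bullet), but it stops one dead bulb from stalling the whole house scan.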

My theory was one of the following:

  • The network is handling contention badly — multiple devices send their “announce” ZDOs at the same time, they collide, and they don’t retry.
  • The devices (bulbs) are unreliable — they sometimes forget to send their announce ZDOs on power on.
  • The XBee coordinator is dropping messages on the floor.

At linux.conf.au 2017 I attended an excellent talk by Jon Oxer about Network Protocol Analysis for IoT Devices. A key point: everything is easier when you can debug the network layer. So I went looking for an 802.15.4 sniffer. I followed a few paths to various dead ends, but concluded that the best bet was a TI CC2531 USB module. This is an 8051-based microcontroller with an 802.15.4 radio transceiver. Importantly, it ships with firmware that works with a Windows-based TI packet capture tool, but it also lets you develop custom firmware. In particular, it’s supported by Contiki OS, which includes a sample application called “sensniff” that does packet capture, paired with a Python program that writes the captured frames in pcap format for Wireshark.

TI CC2531

Very exciting! The kit arrived, but despite its USB interface, you can’t actually program it without a dedicated programmer. Argh! So, another Digikey order. In hindsight this was super obvious, but I was frustrated that I’d missed it.

In the meantime, I experimented with the built-in firmware. The TI capture tool works fine on Windows, but its protocol support is fairly limited. It also took me a while to realize that when Digi’s X-CTU says my coordinator is on channel “14”, it really means 0x14 (channel 20).
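
Since X-CTU displays the channel in hex, it’s worth converting before typing the number anywhere that expects a decimal 802.15.4 channel. A small Python sketch using the standard 2.4 GHz channel plan (channels 11 to 26, spaced 5 MHz apart starting at 2405 MHz); the function names are illustrative:

```python
def xctu_channel(ch_hex: str) -> int:
    """Convert X-CTU's hex channel value (e.g. "14") to a decimal channel number."""
    ch = int(ch_hex, 16)
    if not 11 <= ch <= 26:
        raise ValueError(f"0x{ch:02X} is outside the 2.4 GHz channels 11-26")
    return ch


def center_frequency_mhz(ch: int) -> int:
    # IEEE 802.15.4, 2.4 GHz band: Fc = 2405 + 5 * (k - 11) MHz
    return 2405 + 5 * (ch - 11)


ch = xctu_channel("14")
print(ch, center_frequency_mhz(ch))  # 20 2450
```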

While I waited for the Digikey order, I figured out how to build the Contiki firmware. On Arch Linux this mostly involves setting up an 8051 toolchain (SDCC), which was fairly straightforward using the instructions on the Contiki Wiki. Note: pay careful attention to the line that says “Recent Tested SDCC revisions: 9092”; you definitely need that revision (see below).

I also figured it would be good to be able to program the device from Linux. After following this forum post to this GitHub repo, I got that up and running as well; see the commands below. Note that the notch on the cable goes on the button side of the header (i.e. red is pin 1, as you’d expect). Once the CC Debugger is connected to both the target and USB, press the “reset” button on the CC Debugger to enable programming.

TI CC-Debugger.

But it didn’t work, which was very frustrating: I couldn’t even get the Contiki blink demo working. It turns out something is broken in SDCC revisions later than SVN 9092. After reverting to 9092, I had working firmware! Let me know if you’d like a built image to flash.

> cd ~/src/github.com/dashesy/cc-tool
> ./bootstrap
...
> ./configure
...
> make
...
> # either copy udev/90-cc-debugger.rules or use sudo
> ./cc-tool -n CC2531 --test
  Programmer: CC Debugger
  Target: CC2531
  Device info:
   Name: CC Debugger
   Debugger ID: 0050
   Version: 0x05CC
   Revision: 0x0044

  Target info:
   Name: CC2531
   Revision: 0x24
   Internal ID: 0xB5
   ID: 0x2531
   Flash size: 256 KB
   Flash page size: 2
   RAM size: 8 KB
   Lock data size: 16 B
> ./cc-tool -n CC2531 --erase --write ~/src/github.com/contiki-os/contiki/examples/sensniff/sensniff.hex
  Programmer: CC Debugger
  Target: CC2531
  Erasing flash...
  Completed
  Writing flash (47 KB)...
  Completed (3.40 s.)

You can run the sensniff Python tool with

> python3 sensniff.py -b 460800 -d /dev/ttyACM0 -D DEBUG

You might need to change the channel number here; you can do this interactively. Once you do, you should start seeing messages like “Read a frame of size 52”, along with a warning that the remote end (i.e. Wireshark) isn’t reading.

Sensniff writes the captured frames in pcap format to a named pipe (/tmp/sensniff), which you can open in Wireshark with

> wireshark-gtk -ni /tmp/sensniff -k
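
If you’d rather poke at the capture from a script, the pipe should just be a classic pcap stream: a 24-byte global header followed by a 16-byte header per frame. The reader below is a minimal sketch that assumes sensniff emits classic microsecond-resolution pcap rather than pcapng; the function name is hypothetical. A FIFO hands each byte to only one reader, so run it instead of Wireshark, not alongside it.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16


def read_pcap_stream(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) pairs from a classic pcap byte stream."""
    header = f.read(PCAP_GLOBAL_HEADER_LEN)
    if len(header) < PCAP_GLOBAL_HEADER_LEN:
        return  # writer closed before sending a full header
    # The magic number's byte order tells us the endianness of every field.
    (magic,) = struct.unpack("<I", header[:4])
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError(f"unsupported pcap magic 0x{magic:08X}")
    while True:
        rec = f.read(PCAP_RECORD_HEADER_LEN)
        if len(rec) < PCAP_RECORD_HEADER_LEN:
            return  # end of stream
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        frame = f.read(incl_len)
        if len(frame) < incl_len:
            return  # truncated final frame
        yield ts_sec + ts_usec / 1_000_000, frame
```

To use it, iterate over `read_pcap_stream(open("/tmp/sensniff", "rb"))`; the `open` call blocks until sensniff has the other end of the pipe open.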

Next, tell Wireshark how to decrypt the HA-profile data. I found this video extremely helpful, but in summary:

  • In Preferences > Protocols > Zigbee, set “Security Level” to AES-128 / 32-bit.
  • Edit the keys and add “5A:69:67:42:65:65:41:6C:6C:69:61:6E:63:65:30:39” with byte order “normal” and label something like “Zigbee Trust Center Link Key”. (Fun fact: that key is the hex of “ZigBeeAlliance09”).
  • Now get a device to join the network and you’ll hopefully see a “Transport Key” message go past. Open up the “Zigbee Network Layer Data / Zigbee Security Header” and grab the transport key, and add it alongside the first key. This key is private to your network and is given to devices during joining.

Finding the transport key in the security header of a captured packet.

Configuring the Zigbee keys in Wireshark preferences.
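
As a sanity check on that fun fact, and for turning raw key bytes into the colon-separated form used above, here are a couple of hypothetical helper functions in Python:

```python
# Well-known ZigBee HA trust center link key, as entered in Wireshark.
TC_LINK_KEY = "5A:69:67:42:65:65:41:6C:6C:69:61:6E:63:65:30:39"


def parse_key(text: str) -> bytes:
    """Turn a colon-separated hex key into 16 raw bytes."""
    key = bytes.fromhex(text.replace(":", ""))
    if len(key) != 16:
        raise ValueError(f"expected a 128-bit key, got {len(key)} bytes")
    return key


def format_key(key: bytes) -> str:
    """Format raw key bytes as colon-separated uppercase hex."""
    return ":".join(f"{b:02X}" for b in key)


print(parse_key(TC_LINK_KEY).decode("ascii"))  # ZigBeeAlliance09
```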

Awesome! Now we get fully decrypted packets in Wireshark. So, what does it look like when I turn on a room?

Interesting!

So I’m fairly sure this debunks the “devices don’t do retries” theory, and it suggests that collisions probably aren’t a thing. Looking at you, XBee…

I’m looking forward to running this alongside my home automation controller and seeing if I can iron out the remaining issues. One theory about the XBee is that packets arrive so quickly that they fill its UART TX buffer and get dropped. This suggests three things to look into:

  • Run the UART at a higher speed
  • Switch to SPI (at an even higher speed) instead of serial.
  • Is the host not pulling data from the serial port fast enough (serial → FTDI → USB → TTY → pyserial)? Update, late 2018: yes!
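
A back-of-the-envelope calculation helps with the first bullet. With 8N1 framing each byte costs 10 bits on the wire, so a UART moves baud / 10 bytes per second. The inputs below are hypothetical: 30 bulbs announcing at once and roughly 34 bytes per API frame (my rough estimate for a Device_annce wrapped in an XBee receive frame), compared at 9600 and 115200 baud. A faster UART only helps if the host keeps draining it, which is what the third bullet is about.

```python
def uart_bytes_per_second(baud: int, bits_per_byte: int = 10) -> float:
    """Throughput of an async serial link; 8N1 = start + 8 data + stop bits."""
    return baud / bits_per_byte


def burst_drain_seconds(n_frames: int, frame_bytes: int, baud: int) -> float:
    """Time needed to push n_frames frames of frame_bytes each over the UART."""
    return n_frames * frame_bytes / uart_bytes_per_second(baud)


# Hypothetical burst: 30 bulbs, ~34 bytes per API frame (assumed, not measured).
for baud in (9600, 115200):
    secs = burst_drain_seconds(30, 34, baud)
    print(f"{baud:>6} baud: {secs * 1000:.0f} ms to drain the burst")
```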

I’ll investigate further. Perhaps it’s time to move the coordinator to the TI CC2531… more on that in the next post.