Tuesday, January 21, 2014

Sniffing and decoding NRF24L01+ and Bluetooth LE packets for under $30

In this long post I am going to describe my journey to sniff and decode popular digital wireless protocols off the air very cheaply. So cheaply that practically anyone can obtain the equipment quickly.

I was able to decode NRF24L01+ and Bluetooth Low Energy protocols using RTL-SDR. 
As far as I can see, this is the first time the NRF24L01+ is being decoded, especially considering the low entry price for the hardware. Given the extreme popularity of this transceiver, we are likely to see a wave of hackers attacking the security of many wireless gadgets, and they are likely to succeed as security is usually the last priority for hardware designers of such cheap gadgets.

A lot of work has been done to decode Bluetooth using dedicated hardware, and I am sure this software can be adapted to output the right format as input to existing Bluetooth decoders such as Wireshark.
As far as I can see, this is also the first time BTLE can be decoded using a very cheap generic device.

The main software repository for this project is at https://github.com/omriiluz/NRF24-BTLE-Decoder

The challenges of developing a wireless mesh network

Recently I've been working on a project to create a super cheap (<$5) sensor node that is flexible and power efficient, so I can just leave sensors everywhere and they need absolutely no maintenance.
I decided to use the extremely popular NRF24L01+ transceiver from Nordic Semiconductor due to its balance of performance, power and price - and you'll be surprised how many hardware designers have made the exact same decision once you start sniffing the air for packets. In my home alone I can see 15 addresses - wireless keyboards and mice, remote controls, toys and appliances all use this tiny transceiver for wireless communication.

The sensor node with NRF24L01+ 
While working on the mesh network code, my progress slowed to nearly a halt. The code is extremely complex and depends on external conditions like signal strength, noise, etc. But worst of all? I was completely blind to what really happens once packets leave the safety of my microcontroller over SPI to the transceiver. For normal (i.e. non-wireless) projects I'm used to being able to connect my scope or logic analyzer and "see" what happens on the wire. This makes debugging a breeze. Unfortunately this is not the case for wireless projects, and I had absolutely no idea what happens between the transceivers. To make things worse, these transceivers work in the 2.4GHz ISM band. This is fast, much faster than any equipment I have available.



Enter Software Defined Radio (SDR) 

I assume you all know about the magic of SDR and specifically the cheap RTL-SDR. If not, take a break and head to http://sdr.osmocom.org/trac/wiki/rtl-sdr to read about it. For $13-18 (Amazon 1, Amazon 2) you open a world of possibilities that stretches far beyond analog radio into the 2.4GHz digital space, as you'll read in this post.

Back to the problem of debugging - having experience with rtl-sdr, I immediately started thinking about how I could use it to sniff packets off the air. This is impossible with any version of the rtl-sdr on its own, as the highest-tuning dongles you can buy reach 2.2GHz, just shy of the 2.4GHz we need.

I started looking for a way to convert the signal down to a frequency usable by the rtl-sdr. Building a converter was a possibility, but I had no idea how, and all the commercial/DIY products cost hundreds of dollars.

China to the rescue

Another option was to try and find an existing, mass produced and cheap product with my required specification. A quick search on Aliexpress.com found exactly that - an MMDS LNB.
MMDS is a digital broadcast system used in some countries for digital TV, and the LNB is part of the antenna. MMDS LNBs are available for a variety of input bands and LO frequencies.

Complete setup
The base frequency defines the filters on the device, and the LO (local oscillator) frequency defines by how much it shifts the input frequency down.

Based on the specification, it would do EXACTLY what we need - take a 2.2-2.4GHz signal and down convert it to around 400MHz. Then we can use the rtl-sdr and some code to decode packets off the air.

As it was very cheap ($12+shipping at Aliexpress) I took a chance and ordered one. About 10 days later it arrived in my mail. I quickly hooked everything up and, to my extreme surprise, after a few minutes -


Success!

I used SDR# with the new radio setup to see if I could find signals where I expected them to be. The easiest one to find was my Logitech wireless mouse (which uses nrf of course). Tuning to 2,405MHz (or 407MHz after down conversion using an LO of 1998MHz) clearly shows a strong signal when I move my mouse.



Developing the software to decode the packets was a bit of a headache, but once it started shaping up it was very easy to use it to capture and decode the packets.

So what do you need to make it work?

  • RTL-SDR dongle - ~$15. Easiest to buy on Amazon (Amazon 1, Amazon 2), but you can find it everywhere. I have both E4000 and R820T dongles and they both work perfectly.
  • MMDS down converter - $12+shipping at Aliexpress. Buy one for 2.2-2.4GHz with an L.O. of 1998MHz. If you buy from the link here, the seller will ask you for these details after you purchase.
  • (optional) Cables - Different rtl-sdr sticks have different input plugs, so you need to find a way to connect yours to the down converter. This is really optional, as I simply hacked some wire and it worked fine.
  • (optional) Power Injector - the down converter is an active component and requires 14-24V. I started by simply connecting my power supply to the wire and it worked fine. Later I purchased a commercial power injector for less than $10. You can find one at Amazon or from the same Aliexpress seller.

Back to the comfort of software

From now on, all we need in order to get the packets is some clever software and the comfort of our computer (whether it's Windows, Linux, Mac or RPi).

The NRF24L01+ (nrf from now on) uses GFSK modulation for the data. FSK (and its derivatives like GFSK) is the digital cousin of FM. The modulator takes a bit stream and emits one frequency to represent "1" and another frequency to represent "0".
Luckily for us, there is already a great library that does the basic rtl-sdr work and includes a software FM demodulator, rtl_fm.
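To make the FM/FSK relationship concrete, here is a minimal Python sketch. It is not taken from rtl_fm or from the decoder in the repository. It assumes plain 2-FSK (no Gaussian shaping), a clean noise-free signal, and made-up helper names and parameters (`fsk_modulate`, `fm_demodulate`, `recover_bits`, 4 samples per bit). The modulator shifts the frequency up for a "1" and down for a "0", and the FM demodulator simply measures the phase change between consecutive samples, so the sign of its output gives the bit back.

```python
import cmath
import math


def fsk_modulate(bits, samples_per_bit=4, deviation=0.1):
    """Complex baseband 2-FSK: +deviation cycles/sample for 1, -deviation for 0."""
    phase = 0.0
    iq = []
    for bit in bits:
        step = 2 * math.pi * deviation * (1 if bit else -1)
        for _ in range(samples_per_bit):
            phase += step
            iq.append(cmath.exp(1j * phase))
    return iq


def fm_demodulate(iq):
    """Phase change between consecutive samples, proportional to frequency."""
    return [cmath.phase(cur * prev.conjugate()) for prev, cur in zip(iq, iq[1:])]


def recover_bits(demod, n_bits, samples_per_bit=4):
    """Sample each bit period once, away from its edge, and compare to zero."""
    return [1 if demod[i * samples_per_bit + 1] > 0 else 0 for i in range(n_bits)]


bits = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]
demod = fm_demodulate(fsk_modulate(bits))
print(recover_bits(demod, len(bits)) == bits)
```

On a real capture the zero level is not known in advance, which is why a threshold has to be estimated from the signal itself.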

Using the rtl_fm code on the incoming stream, I exported the raw demodulated feed to Excel and filtered it to find interesting results -



This is without a doubt an nrf packet. You can see:

  • Noise before (<80) and after (>395) the packet
  • Radio turn on time (85 to 125)
  • Preamble sequence of alternating bits (01010101 here) for the demodulator to detect a packet start and sync clocks (125 to 160)
  • Packet data (160 to 395)

In my code, I detect the preamble and calculate a threshold - anything above that number is considered a binary "1" and anything below is a "0". This provides a bit stream which represents the packet.
My code takes this bit stream and manipulates it to recover the packet.
The last step is to take the packet, calculate its CRC and compare it to the CRC in the last two bytes to verify that this is a valid packet. If the CRCs match, we print the decoded packet.
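The threshold and CRC steps can be sketched roughly as follows. This is an illustration in Python, not the code from the repository. The assumptions are mine: one sample per bit (a 2M samples/s stream carrying 2Mbps traffic), a threshold taken as the mean of the preamble samples, a 16-bit CRC with polynomial 0x1021 and initial value 0xFFFF computed bit by bit, and a packet whose preamble has already been stripped. All function names are hypothetical.

```python
def slice_bits(samples, preamble_start, preamble_len):
    """Estimate a threshold from the alternating preamble, then slice to bits."""
    preamble = samples[preamble_start:preamble_start + preamble_len]
    threshold = sum(preamble) / len(preamble)
    bits = [1 if s > threshold else 0 for s in samples[preamble_start:]]
    return threshold, bits


def crc16_ccitt(bits, crc=0xFFFF):
    """Bitwise CRC-16, polynomial 0x1021, MSB first (assumed parameters)."""
    for bit in bits:
        feedback = ((crc >> 15) & 1) ^ bit
        crc = (crc << 1) & 0xFFFF
        if feedback:
            crc ^= 0x1021
    return crc


def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def crc_matches(packet_bits):
    """Compare the CRC of the body with the 16 bits at the end of the packet."""
    if len(packet_bits) <= 16:
        return False
    body, received = packet_bits[:-16], packet_bits[-16:]
    received_crc = int("".join(str(b) for b in received), 2)
    return crc16_ccitt(body) == received_crc


# Noise-free toy samples: two idle samples, an 8-bit preamble, then 3 data bits.
samples = [0, 0, 10, -10, 10, -10, 10, -10, 10, -10, 10, 10, -10]
threshold, bits = slice_bits(samples, preamble_start=2, preamble_len=8)
print(threshold, bits)

# A made-up packet body followed by its own CRC passes the check.
body = bytes_to_bits(b"123456789")
crc = crc16_ccitt(body)
print(crc_matches(body + bytes_to_bits(crc.to_bytes(2, "big"))))
```

Working on bits rather than bytes matters here because the nrf packet is not guaranteed to be byte aligned once its header is included.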

For a detailed description of how I turn this bitstream into a decoded packet, I suggest you open my code over at https://github.com/omriiluz/NRF24-BTLE-Decoder. It is documented and should be relatively simple to understand.

Getting rtl_fm to output the right signal

Once you install librtlsdr and have it working with your dongle, the basic command line for rtl_fm to produce the bitstream we need as input is:
rtl_fm -f 402m -s 2000k -g 0 -p 239

The parameters are:

  • -f 402m - frequency. Remember to take the nrf channel frequency and subtract your down converter LO frequency. In the case here it's 2400-1998=402.
  • -s 2000k - mandatory. My code expects a stream of 2M samples per second.
  • -g 0 - to avoid noise, it's important to disable the auto gain control and reduce the gain as much as possible. I use 0 when everything is on my table and 10-15 when it's spread around my house.
  • -p 239 - the frequency error correction for the rtl-sdr, in ppm. As a cheap device, the rtl-sdr is not calibrated. It's easy to calibrate it against cellular signals using kalibrate-sdr; use the value measured for your own dongle.
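The frequency arithmetic is easy to get wrong when hopping between channels, so here is a small Python helper as a sketch. It assumes nrf channel N sits at 2400+N MHz (channel 0 is 2,400MHz as in the example above), that valid channels are 0-125, and an LO of 1998MHz. The function names and defaults are made up for illustration, and the command string uses only the rtl_fm flags already described.

```python
NRF_BASE_MHZ = 2400  # nrf channel 0


def nrf_channel_to_if_mhz(channel, lo_mhz=1998):
    """Frequency the rtl-sdr must tune to after the down converter."""
    if not 0 <= channel <= 125:
        raise ValueError("nrf channel out of range (assumed 0-125)")
    return NRF_BASE_MHZ + channel - lo_mhz


def rtl_fm_command(channel, gain=0, ppm=239, lo_mhz=1998):
    """Build the rtl_fm command line for a given nrf channel."""
    tuned = nrf_channel_to_if_mhz(channel, lo_mhz)
    return f"rtl_fm -f {tuned}m -s 2000k -g {gain} -p {ppm}"


print(rtl_fm_command(0))         # channel 0, 2,400MHz on the air
print(nrf_channel_to_if_mhz(5))  # the mouse at 2,405MHz
```

The ppm default of 239 is just the value from the command above; it is specific to one dongle.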

Sniffing NRF24L01+ packets

Once rtl_fm works, simply pipe the output into my software to see packets decoded -
rtl_fm -f 402m -s 2000k -g 0 -p 239 | nrf24-btle-decoder -d 1

Two simple parameters:
  • -t nrf | btle - should we decode nrf or Bluetooth LE packets (more on this later)
  • -d 1 | 2 | 8 - select packet speed - the nrf can do 2Mbps, 1Mbps or 250kbps. You need to pick the right one.
Having my sensor node send one byte of data (an increment counter) with an acknowledgment from another node, the output would look like:

Sniffing 2Mbps NRF24L01+ traffic on channel 0 (2,400Mhz)


And now I'm not blind anymore when debugging!

Taking it further

As one smart blogger explained, the physical radios of the NRF24L01+ and Bluetooth Low Energy (btle from now on) are quite similar. This allowed me to quickly adapt my code to sniff btle packets as well.

Sniffing Bluetooth Low Energy advertisement channel 38 (2,426Mhz)
The code for sniffing btle is complete for the advertisement channels, but not for the data channels. Adding them would be my next step. The main issue is the frequency hopping required by btle, which I'm not sure our lowly rtl-sdr can do fast enough.
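For anyone who wants to tune to the other btle channels, the channel-to-frequency mapping can be sketched in Python as below. The function names are made up for illustration and the 1998MHz LO is the one used throughout this post. The three advertisement channels (37, 38, 39) are spread across the band, while data channels 0-36 fill the gaps in 2MHz steps.

```python
def btle_channel_to_mhz(index):
    """Map a btle channel index (0-39) to its center frequency in MHz."""
    advertising = {37: 2402, 38: 2426, 39: 2480}
    if index in advertising:
        return advertising[index]
    if 0 <= index <= 10:
        return 2404 + 2 * index
    if 11 <= index <= 36:
        return 2428 + 2 * (index - 11)
    raise ValueError("btle channel index must be 0-39")


def btle_channel_to_if_mhz(index, lo_mhz=1998):
    """Frequency seen by the rtl-sdr after the down converter."""
    return btle_channel_to_mhz(index) - lo_mhz


for channel in (37, 38, 39):
    print(channel, btle_channel_to_mhz(channel), btle_channel_to_if_mhz(channel))
```

Note that with a 1998MHz LO, channel 39 lands at 482MHz, so check that your down converter's output filter still passes it.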





Thursday, March 28, 2013

Improving VM to VM network throughput on an ESXi platform


Recently I virtualized most of the servers I had at home into an ESXi 5.1 platform. This post follows my journey to achieve better network performance between the VMs.

I am quite happy with the setup as it allowed me to eliminate 5-6 physical boxes in favor of one (very strong) machine. I was also able to achieve some performance improvements, but not to the degree I hoped to see.

I have a variety of machines running in a virtualized form:
1. Windows 8 as my primary desktop, passing a dedicated GPU and USB card from the host to the VM using VMDirectPath
2. Multiple Linux servers
3. Solaris 11.1 as NAS, running the great napp-it software (http://www.napp-it.org/) 

All the machines have the latest VMware Tools installed and running paravirtualized drivers where possible.

VM to VM network performance has been great between the Windows/Linux boxes once I enabled Jumbo Frames. 
Throughout this post I'll use iperf to measure network performance. It's a great, easy-to-use tool, and you can find precompiled versions for almost any operating system: http://iperf.fr/

Let's start with an example of network throughput performance from the Windows 8 Machine to Linux:










11.3 Gbps, not bad. CPU utilization was around 25% on the Windows box throughout the test.

Network performance between the Solaris VM and any other machine on the host was relatively bad. 
I started by using the E1000G virtual adapter, as recommended by VMware for Solaris 11 (http://kb.vmware.com/kb/2032669). We'll use one of my Linux VMs (at 192.168.1.202) as the server for these tests. Using iperf to test:















1.36 Gbps. Not bad between physical servers, but unacceptable between VMs on the same host. Also notice the very high CPU utilization during the test - around 80% system time.

My immediate instinct was to enable jumbo frames. Although the adapter driver is supposed to support jumbo frames, I was unable to enable it no matter how hard I fought it. 


root@solaris-lab:/kernel/drv# dladm set-linkprop -p mtu=9000 net0
dladm: warning: cannot set link property 'mtu' on 'net0': link busy

I gave up on getting better performance from the E1000G adapter and switched to VMXNET3. I immediately saw improvement:















2.31 Gbps. But more importantly, the CPU utilization was much lower.

Now let's try to enable jumbo frames for the vmxnet3 adapter. I followed the steps in http://kb.vmware.com/kb/2012445 and http://kb.vmware.com/kb/2032669 without success. The commands work, but jumbo frames were not really enabled. We can test with a 9000 byte ping -

root@solaris-lab:~# ping -s 192.168.1.202 9000 4
PING 192.168.1.202: 9000 data bytes
----192.168.1.202 PING Statistics----
4 packets transmitted, 0 packets received, 100% packet loss


As my next step I was planning on running some dtrace commands, and accidentally noticed that the drivers I have installed are the Solaris 10 version and not the Solaris 11 version.


root@solaris-lab:~/vmware-tools-distrib# find /kernel/drv/ -ls |grep vmxnet3
78669    2 -rw-r--r--   1 root     root         1071 Mar 27 01:42 /kernel/drv/vmxnet3s.conf
78671   34 -rw-r--r--   1 root     root        34104 Mar 27 01:42 /kernel/drv/amd64/vmxnet3s
78670   25 -rw-r--r--   1 root     root        24440 Mar 27 01:42 /kernel/drv/vmxnet3s
root@solaris-lab:~/vmware-tools-distrib# find . -ls |grep vmxnet3
  231   25 -rw-r--r--   1 root     root        24528 Nov 17 07:55 ./lib/modules/binary/2009.06/vmxnet3s
  234    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/2009.06/vmxnet3s.conf
  250    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/10/vmxnet3s.conf
  244   25 -rw-r--r--   1 root     root        24440 Nov 17 07:55 ./lib/modules/binary/10/vmxnet3s
  262   34 -rw-r--r--   1 root     root        34104 Nov 17 07:55 ./lib/modules/binary/10_64/vmxnet3s
  237   35 -rw-r--r--   1 root     root        35240 Nov 17 07:55 ./lib/modules/binary/11_64/vmxnet3s
  227   34 -rw-r--r--   1 root     root        34256 Nov 17 07:55 ./lib/modules/binary/2009.06_64/vmxnet3s
  253   25 -rw-r--r--   1 root     root        24672 Nov 17 07:55 ./lib/modules/binary/11/vmxnet3s
  259    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/11/vmxnet3s.conf


This is very strange as installation of the Tools is a straightforward procedure with no room for user error.

So I decided to open the Tools installation script (perl) and found an interesting bug -


...
sub configure_module_solaris {
  my $module = shift;
  my %patch;
  my $dir = db_get_answer('LIBDIR') . '/modules/binary/';
  my ($major, $minor) = solaris_os_version();
  my $os_name = solaris_os_name();
  my $osDir;
  my $osFlavorDir;
  my $currentMinor = 10;   # The most recent version we build the drivers for

  if (solaris_10_or_greater() ne "yes") {
    print "VMware Tools for Solaris is only available for Solaris 10 and later.\n";
    return 'no';
  }

  if ($minor < $currentMinor) {
    $osDir = $minor;
  } else {
    $osDir = $currentMinor;
  }
For Solaris 11.1, $minor is 11, which forces $osDir to be the Solaris 10 directory. Bug?
Either way it's very easy to fix - just change "<" to ">":

if ($minor > $currentMinor) {
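To see why the one-character change does the job, here is the directory selection logic mirrored in a few lines of Python. This is only an illustration of the Perl branch quoted above, with hypothetical function names. It covers minor versions 10 and 11 only, since older releases are rejected earlier in the script by the solaris_10_or_greater() check.

```python
CURRENT_MINOR = 10  # mirrors $currentMinor in the installer script


def os_dir_original(minor):
    """Original branch: anything at or above CURRENT_MINOR is capped to it."""
    return minor if minor < CURRENT_MINOR else CURRENT_MINOR


def os_dir_patched(minor):
    """Patched branch: a newer minor version selects its own directory."""
    return minor if minor > CURRENT_MINOR else CURRENT_MINOR


for minor in (10, 11):
    print(minor, os_dir_original(minor), os_dir_patched(minor))
```

With the original comparison, minor version 11 selects directory 10; with the patched one it selects directory 11, while minor version 10 still selects directory 10. This is a quick workaround rather than a proper fix: a cleaner change would presumably be to raise $currentMinor to 11, but I have not tested that.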

Re-install Tools using the modified script and reboot. 
Let's check the installed driver now:



root@solaris-lab:~/vmware-tools-distrib# find /kernel/drv/ -ls |grep vmxnet3
79085    2 -rw-r--r--   1 root     root         1071 Mar 27 02:00 /kernel/drv/vmxnet3s.conf
79087   35 -rw-r--r--   1 root     root        35240 Mar 27 02:00 /kernel/drv/amd64/vmxnet3s
79086   25 -rw-r--r--   1 root     root        24672 Mar 27 02:00 /kernel/drv/vmxnet3s
root@solaris-lab:~/vmware-tools-distrib# find . -ls |grep vmxnet3
  231   25 -rw-r--r--   1 root     root        24528 Nov 17 07:55 ./lib/modules/binary/2009.06/vmxnet3s
  234    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/2009.06/vmxnet3s.conf
  250    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/10/vmxnet3s.conf
  244   25 -rw-r--r--   1 root     root        24440 Nov 17 07:55 ./lib/modules/binary/10/vmxnet3s
  262   34 -rw-r--r--   1 root     root        34104 Nov 17 07:55 ./lib/modules/binary/10_64/vmxnet3s
  237   35 -rw-r--r--   1 root     root        35240 Nov 17 07:55 ./lib/modules/binary/11_64/vmxnet3s
  227   34 -rw-r--r--   1 root     root        34256 Nov 17 07:55 ./lib/modules/binary/2009.06_64/vmxnet3s
  253   25 -rw-r--r--   1 root     root        24672 Nov 17 07:55 ./lib/modules/binary/11/vmxnet3s
  259    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/11/vmxnet3s.conf

Now we have the correct version installed. 

Let's enable jumbo-frames as before and check if it made any difference:

root@solaris-lab:~# ping -s 192.168.1.202 9000 4
PING 192.168.1.202: 9000 data bytes
9008 bytes from 192.168.1.202: icmp_seq=0. time=0.338 ms
9008 bytes from 192.168.1.202: icmp_seq=1. time=0.230 ms
9008 bytes from 192.168.1.202: icmp_seq=2. time=0.289 ms
9008 bytes from 192.168.1.202: icmp_seq=3. time=0.294 ms
----192.168.1.202 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.230/0.288/0.338/0.044

Success! Jumbo frames are working.


Let's test throughput with iperf:















Less than 1Mb/s, not what we expected at all!
We need to take a deeper look at the packets being sent. Let's use tcpdump to create a trace file:

root@solaris-lab:~# tcpdump -w pkts.pcap -s 100 -inet1 & PID=$! ; sleep 1s ; ./iperf -t1 -c192.168.1.202; kill $PID
[1] 1726
tcpdump: listening on net1, link-type EN10MB (Ethernet), capture size 100 bytes
------------------------------------------------------------
Client connecting to 192.168.1.202, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.206 port 35084 connected with 192.168.1.202 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.3 sec    168 KBytes  1.02 Mbits/sec
70 packets captured
70 packets received by filter
0 packets dropped by kernel

and open it in Wireshark for easier analysis:












The problem is clear with packet 7 - the driver is trying to send a 16KB packet, above our 9K jumbo frame MTU. This packet is not received outside of the VM, and after a timeout it is fragmented and retransmitted. This happens again for every packet, generating massive delays and causing throughput to be very low.

Reviewing the vmxnet3 driver source (open source at http://sourceforge.net/projects/open-vm-tools/), it seems the only way a packet larger than the MTU can be sent is if the LSO feature is enabled.
To learn more about LSO (Large Segment Offload) read http://en.wikipedia.org/wiki/Large_segment_offload.
Essentially, the kernel sends large packets (16K in the capture) and the NIC (or virtual NIC) is supposed to split each one and transmit valid-size packets. On a real hardware NIC, at high speeds, this saves a considerable amount of CPU. In a virtualized environment I don't see the benefit, and it seems to be badly broken.
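As a rough picture of what the offload is supposed to do, the Python sketch below splits one large buffer into pieces that fit the MTU. It assumes 20-byte IP and TCP headers with no options, so the usable payload per packet is MTU minus 40 bytes. The function name is made up and this is not how the vmxnet3 driver is written, just the idea.

```python
IP_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20  # bytes, assuming no TCP options


def segment(payload, mtu=9000):
    """Split a large send into chunks that each fit in one MTU-sized packet."""
    mss = mtu - IP_HEADER - TCP_HEADER
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


sizes = [len(chunk) for chunk in segment(bytes(16 * 1024))]
print(sizes)
```

A 16KB send should therefore leave the VM as two packets of 8960 and 7424 payload bytes, not as the single oversized packet seen in the capture.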

Let's disable LSO:

ndd -set /dev/ip ip_lso_outbound 0

And try to run iperf again:














12.1 Gbps, SUCCESS!

Now that we are able to transmit out of Solaris at decent rates, let's check the performance of connections into the Solaris VM:









3.74 Gbps, not bad, but we can do better - let's at least get to 10Gbps.

Next step is to tune the TCP parameters to accommodate the higher speed needed - the buffers are simply too small for the amount of data in flight -


root@solaris-lab:~# ipadm set-prop -p max_buf=4194304 tcp
root@solaris-lab:~# ipadm set-prop -p recv_buf=1048576 tcp
root@solaris-lab:~# ipadm set-prop -p send_buf=1048576 tcp
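A back-of-the-envelope check of why the buffers matter, sketched in Python. It assumes the round-trip time between the VMs is about the 0.288 ms average from the ping output earlier (it will differ under load) and uses the 48 KByte default window that iperf reported. The function names are made up for illustration.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes, rtt_s):
    """Best-case throughput when at most one window is in flight per round trip."""
    return window_bytes * 8 / rtt_s


rtt = 0.288e-3  # seconds, average from the earlier ping (assumed representative)
print(round(bdp_bytes(10e9, rtt)))                         # bytes needed for 10 Gbps
print(round(window_limited_bps(48 * 1024, rtt) / 1e9, 2))  # Gbps with a 48 KByte window
```

Under these assumptions, 10 Gbps needs roughly 360 KB in flight, while a 48 KByte window tops out at around 1.4 Gbps, so 1 MB send and receive buffers leave comfortable headroom.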

And run iperf again:













18.3 Gbps, Not bad!