Saturday, July 29, 2017

Network Traffic Compression Enhancement by Protocol Layering

I wanted to know how network traffic can be compressed by paying attention to the structure of its data, particularly to the protocol layers that can be easily recognized.

Looking at a typical packet capture, there are several layers whose size is fixed or easily predictable: the packet header (time, size), the Ethernet header (always 14 bytes), the IP header (almost always 20 bytes and easy to check), the TCP or UDP headers... Some higher level protocols are also interesting but let's focus on the concept.

Networking layers have repetitive structures that are easy to compress with many algorithms, and I had the intuition that they are even easier to compress when treated in isolation. For example, to compress N packets from a TCP conversation we could first parse the "Ethernet layer", that is, the N Ethernet headers, which only take one of two values, then the "Internet layer", which has a repetitive structure very dependent on the context. So we could "slice" the traffic by layers and compress each layer individually, potentially facilitating the compression algorithm's work by providing it with more homogeneous data.
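As a minimal Python sketch of the slicing idea (dummy data and a hypothetical helper, not the actual script used below), cutting a frame at fixed offsets into per-layer fragments looks like this:

```python
def slice_packet(buf, layer_sizes):
    """Split one frame into per-layer fragments at fixed offsets."""
    fragments, pos = [], 0
    for size in layer_sizes:
        piece = buf[pos:pos + size]
        fragments.append(piece)
        pos += len(piece)
        if pos >= len(buf):
            break
    return fragments

# A dummy 70-byte frame split as Ethernet + IP + TCP + rest:
parts = slice_packet(bytes(range(70)), (14, 20, 20, 9999))
print([len(p) for p in parts])  # → [14, 20, 20, 16]
```

Appending each fragment to its layer's stream, instead of keeping the fragments together per packet, is what makes each stream homogeneous.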

As a proof of concept, I took a handy-enough packet capture from the Mid-Atlantic Collegiate Cyber Defense Competition and started playing with it (or a subset of the first 100,000 packets to start with).

I started by analyzing the used protocols and packet sizes:
$ tshark -r maccdc2012_00000.pcap.gz -c 100000 -Tfields -e _ws.col.Protocol |sort|uniq -c|sort -rg|head
  88167 SMB 
   6445 TCP 
   2742 TLSv1 
    638 STP 
    408 HTTP 
    397 UDP 
    307 ICMP 
    277 SSLv2 
    264 EIGRP 
     74 SSHv2
Looks like SMB is the prevailing protocol. But let's not dive further than TCP in this case since I want to try a very generic solution.

Fixed Layer Sizes

Instead of full protocol analysis, we can try to choose good fixed layer sizes by just looking at packet sizes:
$ tshark -r maccdc2012_00000.pcap.gz -c 100000 -Tfields -e frame.len|sort|uniq -c|sort -rg|head 
  43783 109
  14562 212
  14560 208
   4896 207
   4872 209
   4853 217
   4281 70
   1445 78
    769 64
    609 68
 So for this proof of concept, I will be using the following fixed size layers:
layersizes = (
16, # Packet header (time & length)
14, # Ethernet
20, # IP (almost always)
20, # TCP (typical)
109 - (14+20+20), # Bytes til 109
212 - 109, # Bytes til 212
9999 # The rest
)
Note that I am adding two fixed size layers based on the most frequent packet sizes, plus another for the rest. These layers should really be alternatives to each other but for the sake of simplicity I am treating them as if they represented more protocol layers.
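Deriving these cut points from the most frequent frame lengths (109 and 212 here) can be automated; a hypothetical Python helper, not part of the original script, could look like this:

```python
def layers_from_sizes(top_sizes, base=(16, 14, 20, 20), tail=9999):
    """Turn frequent frame lengths into incremental layer sizes.

    top_sizes: most frequent frame lengths, e.g. (109, 212).
    base: pcap record header, then Ethernet, IP and TCP fixed sizes.
    The first base entry (16 bytes) is pcap metadata, not frame data.
    """
    layers = list(base)
    consumed = sum(base) - base[0]   # frame bytes covered so far (54)
    for size in sorted(top_sizes):
        if size > consumed:
            layers.append(size - consumed)  # bytes up to this frame length
            consumed = size
    layers.append(tail)              # catch-all for the remaining bytes
    return tuple(layers)

print(layers_from_sizes((109, 212)))  # → (16, 14, 20, 20, 55, 103, 9999)
```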

With a simple Python (dpkt) script I sliced the PCAP into seven streams, one per layer, which add up to the same decompressed size:
for ts, buf in pcap:
    packets += 1
    l = len(buf)
    bytecount += l
    size_hist[l] = size_hist.get(l, 0) + 1
    streams[0][1].write(struct.pack('dII', ts, l, l))
    streams[0][2] += 16  # Same size as in the PCAP

    pos, rest = 0, l
    for s in streams[1:]:  # s = [layer_size, file, bytes_written]
        fragments += 1
        w = min(s[0], rest)
        s[1].write(buf[pos:pos+w])
        s[2] += w
        if w >= rest:
            break
        pos += w
        rest -= w
Note that we generated one stream per layer; concatenating them yields a file of the same size as the original pcap:
$ cat 1e5packets-layer* > 1e5packets.layers
$ ll|cut -c30-
  1600024 Jul 30 09:31 1e5packets-layer0
  1400000 Jul 30 09:31 1e5packets-layer1
  2000000 Jul 30 09:31 1e5packets-layer2
  2000000 Jul 30 09:31 1e5packets-layer3
  5194853 Jul 30 09:31 1e5packets-layer4
  2297257 Jul 30 09:31 1e5packets-layer5
  2678388 Jul 30 09:31 1e5packets-layer6
 17170522 Jul 30 09:32 1e5packets.layers # Added layers
 17170522 Jul 27 22:16 1e5packets.pcap   # Original
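The transformation is also trivially reversible, which matters for a real compressor. The post does not show the inverse step, but a Python 3 round-trip sketch (hypothetical function names; the real script below is Python 2) could look like this:

```python
import struct

LAYER_SIZES = (14, 20, 20, 55, 103, 9999)  # frame layers, as in the post

def slice_pcap(packets):
    """packets: iterable of (timestamp, bytes). Returns stream0 + layer streams."""
    stream0 = b''
    layers = [b''] * len(LAYER_SIZES)
    for ts, buf in packets:
        stream0 += struct.pack('dII', ts, len(buf), len(buf))
        pos, rest = 0, len(buf)
        for i, size in enumerate(LAYER_SIZES):
            w = min(size, rest)
            layers[i] += buf[pos:pos + w]
            pos += w
            rest -= w
            if rest == 0:
                break
    return stream0, layers

def unslice_pcap(stream0, layers):
    """Inverse of slice_pcap: rebuild the (timestamp, bytes) list."""
    offsets = [0] * len(layers)
    packets = []
    for rec in range(0, len(stream0), 16):  # one 16-byte 'dII' record per packet
        ts, length, _ = struct.unpack('dII', stream0[rec:rec + 16])
        buf, rest = b'', length
        for i, size in enumerate(LAYER_SIZES):
            w = min(size, rest)
            buf += layers[i][offsets[i]:offsets[i] + w]
            offsets[i] += w
            rest -= w
            if rest == 0:
                break
        packets.append((ts, buf))
    return packets

# Round trip on two dummy packets:
pkts = [(1.5, bytes(109)), (2.5, bytes(70))]
s0, ls = slice_pcap(pkts)
assert unslice_pcap(s0, ls) == pkts
```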
Now we can easily check the effect of compressing the original and the "layered" files with different algorithms:

    Size  File                   CPU seconds  Compress ratio  Layering improvement  MBytes/CPU
17170522  1e5packets.pcap            -             1.00              -                  -
 3861578  1e5packets.pcap.gz1       0.5            4.45             n/a               26.62
 3615304  1e5packets.layers.gz1     0.4            4.75            6.81%              33.89
 3413950  1e5packets.pcap.gz        1.7            5.03             n/a                8.09
 3382712  1e5packets.pcap.gz9       5.3            5.08             n/a                2.60
 3272017  1e5packets.layers.gz9     8.8            5.25            3.38%               1.58
 3266231  1e5packets.pcap.bz1       3.4            5.26             n/a                4.09
 3238030  1e5packets.layers.gz      1.5            5.30            5.43%               9.29
 3149087  1e5packets.layers.bz2     5.3            5.45           -1.17%               2.65
 3112274  1e5packets.pcap.bz2       3.6            5.52             n/a                3.91
 2989412  1e5packets.layers.bz1     3.9            5.74            9.26%               3.64
 2438016  1e5packets.pcap.xz1       2.2            7.04             n/a                6.70
 2283236  1e5packets.pcap.xz       17.0            7.52             n/a                0.88
 2281332  1e5packets.pcap.xz9      17.0            7.53             n/a                0.88
 2148592  1e5packets.layers.xz1     2.0            7.99           13.47%               7.51
 1803756  1e5packets.layers.xz     15.0            9.52           26.58%               1.02
 1802924  1e5packets.layers.xz9    15.2            9.52           26.54%               1.01

In the best case, "xz" compression, we see a boost of 13~27% in compression efficiency, and in all cases except bzip2 and gzip -9 the CPU cost decreased as well.

Note: some results were actually better with different layer sizes that resulted from a typo in last night's first runs, so there is clearly room for improvement in how the layer sizes are chosen...

Test Code

Using Python 2.7, because the dpkt module was not yet available for Python 3 on my Ubuntu system:
pcapz.py
#!/usr/bin/env python2.7

# "pcapz" prototype by gatopeich 2017

import dpkt
import subprocess
import struct
import os
import time
import zlib

input = '1e5packets'
f = open(input+'.pcap', 'rb')
pcap = dpkt.pcap.Reader(f)

headers = (
 16, # Packet header (time & length)
 # 14+20+20, # Eth+IP+TCP
 14, # Ethernet
 20, # IP (almost always)
 20, # TCP (typical)
 109-14-20-20,
 # 212-14-20-20,
 99999 # Rest
 )

print "headers:",headers

compression_methods = (
 ('bz1', 'bzip2 -1c'),
 # ('bz9', 'bzip2 -9c'),
 # ('gz1', 'gzip -1c'),
 # ('gz9', 'gzip -9c'),
 ('xz1', 'xz -1c'),
 # ('xz9', 'xz -9c'),
 ('zlib1', zlib.compress, 1),
 ('zlib9', zlib.compress, 9)
 )

def compress(method, filename):
 cfile = filename+'.'+method[0]
 if len(method) == 2:
  out = subprocess.check_output("time -p %s %s > %s"%(method[1],filename,cfile)
   , shell=True, stderr=subprocess.STDOUT)
  size = os.stat(cfile).st_size
  cpu = float(out.split()[1])
 else:
  data = open(filename,'rb').read()
  t0 = time.time()
  size = len(method[1](data,method[2]))
  cpu = time.time() - t0
 return size, cpu 

class Stream:
 N = 0
 def __init__(self, basename, header_size):
  self.h_size = header_size
  self.filename = '%s-stream%d'%(basename, Stream.N)
  Stream.N += 1
  self.file = open(self.filename,'wb')
  self.size = 0

 def write(self, data):
  self.file.write(data)
  self.size += len(data)

 def compress(self, methods = compression_methods):
  self.file.close()
  self.best_size = self.size
  self.best_method = None
  print "%s: %d bytes."%(self.filename, self.size),
  fsize = float(self.size)
  for method in methods:
   size, cpu = compress(method, self.filename)
   print method[0], 'x%.1f(%.1fs),'%(fsize/size,cpu),
   if size < self.best_size:
    self.best_size = size
    self.cpu_cost = cpu
    self.best_method = method[0]
  print
  print " => -%.1f Kbytes with %s in %.1f seconds."%(
   (self.size-self.best_size)/1024.0, self.best_method, self.cpu_cost)

streams = tuple(Stream(input,hs) for hs in headers)

streams[0].write('--- PCAP Header STUB ---')

size_hist = {}
packets, fragments, bytecount = 0, 0, 0
for ts, buf in pcap:
 packets +=1
 l = len(buf)
 bytecount += l
 size_hist[l] = size_hist.get(l,0) + 1
 
 streams[0].write(struct.pack('dII',ts,l,l))

 pos, rest = 0, l
 for s in streams[1:]:
  fragments += 1
  if s.h_size > 0:
   w = min(s.h_size, rest)
  elif (-s.h_size) >= l:
   w = rest
  else:
   continue
  s.write(buf[pos:pos+w])
  rest -= w
  if rest <= 0:
   break
  pos += w
 assert rest == 0

print "Done. packets, fragments, bytecount =", packets, fragments, bytecount
total = [0,0,0]
for s in streams:
 s.compress()
 total[0] += s.size
 total[1] += s.best_size
 total[2] += s.cpu_cost
print "All streams:", total[0], "==>", total[1], "(-%.3fMB)"%((total[0]-total[1])/(1024*1024.0)),
print "ratio = %.1f"%(float(total[0])/total[1]),
print "cost =", total[2], " cpusecs."

# Print the stats we use above for selecting layer sizes...
#top_sizes = sorted(map(lambda ab: ab[::-1], size_hist.items()), reverse=True)
#print "top_sizes:", top_sizes[:7]
# ==> [(43783, 109), (14562, 212), (14560, 208), (4896, 207), (4872, 209), (4853, 217), (4281, 70)]

And some sample outputs:
headers: (16, 14, 20, 20, 55, 158, 99999)
Done. packets, fragments, bytecount = 100000 448868 15570498
1e5packets-stream0: 1600024 bytes. bz1 x14.6(0.2s), xz1 x23.2(0.1s), zlib1 x10.2(0.0s), zlib9 x10.7(1.1s),
 => -1495.3 Kbytes with xz1 in 0.1 seconds.
1e5packets-stream1: 1400000 bytes. bz1 x155.0(0.8s), xz1 x114.0(0.1s), zlib1 x43.4(0.0s), zlib9 x115.9(0.1s),
 => -1358.4 Kbytes with bz1 in 0.8 seconds.
1e5packets-stream2: 2000000 bytes. bz1 x6.7(0.4s), xz1 x7.5(0.3s), zlib1 x3.5(0.1s), zlib9 x3.5(1.9s),
 => -1694.0 Kbytes with xz1 in 0.3 seconds.
1e5packets-stream3: 2000000 bytes. bz1 x4.1(0.4s), xz1 x6.6(0.3s), zlib1 x2.9(0.1s), zlib9 x3.2(0.9s),
 => -1656.4 Kbytes with xz1 in 0.3 seconds.
1e5packets-stream4: 5194853 bytes. bz1 x4.1(1.0s), xz1 x5.5(0.8s), zlib1 x3.6(0.1s), zlib9 x4.2(4.1s),
 => -4144.7 Kbytes with xz1 in 0.8 seconds.
1e5packets-stream5: 4807822 bytes. bz1 x8.8(1.2s), xz1 x13.0(0.3s), zlib1 x9.1(0.1s), zlib9 x10.1(0.2s),
 => -4334.7 Kbytes with xz1 in 0.3 seconds.
1e5packets-stream6: 167823 bytes. bz1 x5.2(0.1s), xz1 x6.9(0.0s), zlib1 x6.1(0.0s), zlib9 x6.4(0.0s),
 => -140.2 Kbytes with xz1 in 0.0 seconds.
All streams: 17170522 ==> 1991123 (-14.476203MB) ratio = 8.6 cost = 2.65  cpusecs.


headers: (16, 8, 8, 8, 8, 8, 14, 55, 99999)
Done. packets, fragments, bytecount = 100000 747720 15570498
1e5packets-stream0: 1600024 bytes. bz1 x14.6(0.2s), xz1 x23.2(0.1s), xz9 x30.9(2.0s), zlib1 x10.2(0.0s), zlib9 x10.7(1.1s),
 => -1512.0 Kbytes with xz9 in 2.0 seconds.
1e5packets-stream1: 800000 bytes. bz1 x126.4(0.5s), xz1 x91.5(0.0s), xz9 x116.5(0.2s), zlib1 x37.5(0.0s), zlib9 x103.8(0.1s),
 => -775.1 Kbytes with bz1 in 0.5 seconds.
1e5packets-stream2: 800000 bytes. bz1 x101.9(0.5s), xz1 x87.3(0.0s), xz9 x107.2(0.2s), zlib1 x34.2(0.0s), zlib9 x80.8(0.1s),
 => -774.0 Kbytes with xz9 in 0.2 seconds.
1e5packets-stream3: 800000 bytes. bz1 x5.5(0.2s), xz1 x7.4(0.1s), xz9 x8.2(0.9s), zlib1 x3.1(0.0s), zlib9 x2.9(1.9s),
 => -685.6 Kbytes with xz9 in 0.9 seconds.
1e5packets-stream4: 800000 bytes. bz1 x6.2(0.2s), xz1 x7.9(0.1s), xz9 x10.4(0.8s), zlib1 x3.5(0.0s), zlib9 x3.5(1.6s),
 => -706.4 Kbytes with xz9 in 0.8 seconds.
1e5packets-stream5: 800000 bytes. bz1 x43.3(0.4s), xz1 x44.9(0.0s), xz9 x56.0(0.3s), zlib1 x22.4(0.0s), zlib9 x34.8(0.1s),
 => -767.3 Kbytes with xz9 in 0.3 seconds.
1e5packets-stream6: 1400000 bytes. bz1 x2.9(0.3s), xz1 x4.5(0.3s), xz9 x6.3(0.9s), zlib1 x2.2(0.1s), zlib9 x2.4(1.1s),
 => -1149.2 Kbytes with xz9 in 0.9 seconds.
1e5packets-stream7: 5194853 bytes. bz1 x4.1(1.0s), xz1 x5.5(0.8s), xz9 x5.9(5.4s), zlib1 x3.6(0.1s), zlib9 x4.2(4.1s),
 => -4209.3 Kbytes with xz9 in 5.4 seconds.
1e5packets-stream8: 4975645 bytes. bz1 x8.5(1.3s), xz1 x12.6(0.4s), xz9 x12.9(3.4s), zlib1 x9.0(0.1s), zlib9 x9.9(0.2s),
 => -4481.5 Kbytes with xz9 in 3.4 seconds.
All streams: 17170522 ==> 1748617 (-14.707MB) ratio = 9.8 cost = 14.28  cpusecs.


The best ratio I have found manually is 10.0:

headers: (16, 8, 8, 8, 8, 8, 16, 64, 64, 99999)
Done. packets, fragments, bytecount = 100000 793191 15570498
1e5packets-stream0: 1600024 bytes. bz1 x14.6(0.2s), xz1 x23.2(0.1s), xz9 x30.9(1.9s), zlib1 x10.2(0.0s), zlib9 x10.7(1.1s),
 => -1512.0 Kbytes with xz9 in 1.9 seconds.
1e5packets-stream1: 800000 bytes. bz1 x126.4(0.5s), xz1 x91.5(0.0s), xz9 x116.5(0.2s), zlib1 x37.5(0.0s), zlib9 x103.8(0.1s),
 => -775.1 Kbytes with bz1 in 0.5 seconds.
1e5packets-stream2: 800000 bytes. bz1 x101.9(0.5s), xz1 x87.3(0.0s), xz9 x107.2(0.2s), zlib1 x34.2(0.0s), zlib9 x80.8(0.1s),
 => -774.0 Kbytes with xz9 in 0.2 seconds.
1e5packets-stream3: 800000 bytes. bz1 x5.5(0.2s), xz1 x7.4(0.1s), xz9 x8.2(0.9s), zlib1 x3.1(0.0s), zlib9 x2.9(1.9s),
 => -685.6 Kbytes with xz9 in 0.9 seconds.
1e5packets-stream4: 800000 bytes. bz1 x6.2(0.2s), xz1 x7.9(0.1s), xz9 x10.4(0.7s), zlib1 x3.5(0.0s), zlib9 x3.5(1.6s),
 => -706.4 Kbytes with xz9 in 0.7 seconds.
1e5packets-stream5: 800000 bytes. bz1 x43.3(0.4s), xz1 x44.9(0.0s), xz9 x56.0(0.3s), zlib1 x22.4(0.0s), zlib9 x34.8(0.1s),
 => -767.3 Kbytes with xz9 in 0.3 seconds.
1e5packets-stream6: 1600000 bytes. bz1 x2.0(0.4s), xz1 x2.9(0.5s), xz9 x3.6(1.1s), zlib1 x1.7(0.1s), zlib9 x1.7(1.7s),
 => -1128.0 Kbytes with xz9 in 1.1 seconds.
1e5packets-stream7: 5518075 bytes. bz1 x5.5(1.1s), xz1 x7.5(0.6s), xz9 x8.3(5.6s), zlib1 x5.0(0.1s), zlib9 x5.9(2.7s),
 => -4735.7 Kbytes with xz9 in 5.6 seconds.
1e5packets-stream8: 2981162 bytes. bz1 x7.1(0.6s), xz1 x11.8(0.2s), xz9 x11.6(3.0s), zlib1 x7.0(0.0s), zlib9 x8.5(0.4s),
 => -2664.3 Kbytes with xz1 in 0.2 seconds.
1e5packets-stream9: 1471261 bytes. bz1 x12.5(0.8s), xz1 x14.5(0.1s), xz9 x14.8(0.5s), zlib1 x12.3(0.0s), zlib9 x13.4(0.0s),
 => -1339.6 Kbytes with xz9 in 0.5 seconds.
All streams: 17170522 ==> 1720437 (-14.734MB) ratio = 10.0 cost = 12.0  cpusecs.


Combined xz1/xz4 compression achieves a very nice 9.6 in just 4 seconds:

headers: (16, 8, 8, 8, 8, 8, 16, 64, 64, 99999)
Done. packets, fragments, bytecount = 100000 793191 15570498
1e5packets-stream0: 1600024 bytes. xz1 x23.2(0.1s), xz4 x21.9(0.6s),
 => -1495.3 Kbytes with xz1 in 0.1 seconds.
1e5packets-stream1: 800000 bytes. xz1 x91.5(0.0s), xz4 x91.2(0.1s),
 => -772.7 Kbytes with xz1 in 0.0 seconds.
1e5packets-stream2: 800000 bytes. xz1 x87.3(0.0s), xz4 x87.0(0.1s),
 => -772.3 Kbytes with xz1 in 0.0 seconds.
1e5packets-stream3: 800000 bytes. xz1 x7.4(0.1s), xz4 x8.2(0.7s),
 => -685.4 Kbytes with xz4 in 0.7 seconds.
1e5packets-stream4: 800000 bytes. xz1 x7.9(0.1s), xz4 x10.5(0.7s),
 => -706.9 Kbytes with xz4 in 0.7 seconds.
1e5packets-stream5: 800000 bytes. xz1 x44.9(0.0s), xz4 x49.0(0.1s),
 => -765.3 Kbytes with xz4 in 0.1 seconds.
1e5packets-stream6: 1600000 bytes. xz1 x2.9(0.5s), xz4 x3.7(1.1s),
 => -1137.8 Kbytes with xz4 in 1.1 seconds.
1e5packets-stream7: 5518075 bytes. xz1 x7.5(0.6s), xz4 x6.5(1.7s),
 => -4674.7 Kbytes with xz1 in 0.6 seconds.
1e5packets-stream8: 2981162 bytes. xz1 x11.8(0.2s), xz4 x12.2(0.6s),
 => -2672.9 Kbytes with xz4 in 0.6 seconds.
1e5packets-stream9: 1471261 bytes. xz1 x14.5(0.1s), xz4 x14.6(0.2s),
 => -1338.6 Kbytes with xz4 in 0.2 seconds.
All streams: 17170522 ==> 1788160 (-14.670MB) ratio = 9.6 cost = 4.15 cpusecs.

Friday, November 21, 2014

Auto-start XBMC on demand from your remote

A really simple script that starts XBMC automagically when a remote tries to access it:
/home/tv/xbmc-on-demand.sh:
#!/bin/sh
while true; do
  echo Starting XBMC... | nc -vl 8080
  xbmc
  sleep 7
done
(Don't forget to chmod +x xbmc-on-demand.sh)

The sleep is there to prevent XBMC from restarting right after quitting from the remote, giving you a few seconds to shut down the remote itself.

Run this automatically within your X session (XFCE being my favorite), and whenever a connection is attempted to TCP port 8080 (default for XBMC), xbmc will start.

This is my ~/.config/autostart/xbmc-on-demand.desktop to have it running in background on every Freedesktop (XFCE, GNOME, KDE, etc) login:
/home/tv/.config/autostart/xbmc-on-demand.desktop:
[Desktop Entry]
Encoding=UTF-8
Version=0.9.4
Type=Application
Name=xbmc
Comment=
Exec=/home/tv/xbmc-on-demand.sh
StartupNotify=false
Terminal=false
Hidden=false


I use it with "Music Pump XBMC Remote" for Android, the best and probably the lightest.

Sunday, November 24, 2013

Droid Face 1.0

Originally a port of "gatotray", it evolved and adapted into something more suitable for a smartphone.
Today it got mature enough to be shared online, so here it goes:

Sunday, October 7, 2012

Respawning...

Just moved "gatopeich's simple unix tools & hacks" to Blogger.
Many links are broken, if you cannot find something please ask for it.

A BASH prompt that displays running command on Xterm title


To get a self-explanatory bash prompt showing the current directory and time, plus an Xterm title showing the currently running command (or the current directory when idle), use this:


    PS1='\e]0;\W\007\n\[\e[0;35m\]\u@\h: \[\e[1;36m\]\w/\n\[\e[1;33m\]\t $ \[\e[0m\]'
    trap 'echo -ne "\e]0;$BASH_COMMAND\007"' DEBUG
    set +o functrace # Do not inherit traps, to prevent messing with completion


Compcache: Easy installation


(Very old information, unlikely to be useful anymore, kept here for the sake of history)


Being a happy user of compcache (on a Xubuntu system), I have prepared a very simple patch to have it easily installed on my system, and reinstalled whenever kernel changes or I just want to try a new version. The patch adds an "install" target to compcache's main Makefile, in order to install an LSB-compliant init script, so compcache is loaded and unloaded at the proper system levels.


This is, roughly, the make code:

PWD := $(shell pwd)
KD := /lib/modules/$(shell uname -r)/build
install:
        make -C $(KD) M=$(PWD) INSTALL_MOD_STRIP=1 modules_install
        depmod
        install init-compcache /etc/init.d/compcache
        update-rc.d compcache defaults

 
Here is the init script: init-compcache.

And here is the full patch that you can apply with "patch < compcache_installation.diff " on a fresh compcache-0.5.1 source tree: compcache_installation.diff. After that, build it with "make" and install it with "sudo make install".

USB D2XX driver for Wine (on Linux)


I have a very nice logic analyzer that didn't come with any software for Linux. However, it uses some FTDI chip through its standard driver library 'ftd2xx.dll', which happens to have a Linux version whose API is pretty much the same, called "libftd2xx.so".


So I took the effort to wrap-up a Wine DLL 'ftd2xx.dll.so' which offers the native Linux driver to any Windows application running under Wine. It was a tedious task, but quite straightforward thanks to 'winedump'.

If you want to use your favourite FTDI-based USB devices under Wine too, just install ftd2xx.dll.so on your Wine lib installation, usually /usr/lib/wine/. Source code and detailed instructions are available here. Just don't forget to install FTDI's D2XX drivers for Linux from their site.

My particular device (Ant18e) demonstrated some funny behaviour under Linux: somehow it didn't report a correct USB serial number. The strange thing was that it worked flawlessly under Windows... I had to reprogram its EEPROM to get a good serial number with MPROG, a standard utility provided by FTDI, and an 'EEPROM template' provided by my device's vendor.

I also built a 'libd2xx_table.so' (ftdi_table-0403.c) that you may want to install right beside your libftd2xx.so (typically at /usr/local/lib/) in order to get all your FTDI-based devices (VendorID=0403) recognised by the driver. Otherwise only a few ProductIDs will work. More details are in the D2XX Linux driver package, under "lib_table".

Use this stuff at your own risk! I already risked my brand new logic analyzer in this development :-).

Update for 64-bit systems: I have not been able to compile this on a 64-bit architecture, however the 32 bit binaries work fine when placing ftd2xx.dll.so at /usr/lib32/wine, and libd2xx_table.so (together with 32-bit D2XX drivers) at /usr/lib32/.