Simple Linux tools & hacks: bzip2

I wanted to know how network traffic be compressed by paying attention to the structure of its data, particularly to the protocol layers that can be easily recognized.

Looking at a typical packet capture, there are several layers whose size is fixed or easily predictable: the packet header (time, size), the Ethernet header (always 14 bytes), the IP header (almost always 20 bytes and easy to check), the TCP or UDP headers... Some higher level protocols are also interesting but let's focus on the concept.

Networking layers have repetitive structures, easy to compress with many algorithms. And I had the intuition that they are more easy to compress when treated in isolation. For example, to compress N packets from a TCP conversation we could first parse the "Etherner layer", that is the N Ethernet headers which only take one of 2 values, then the "Internet layer" which has a repetitive structure very dependent on the context. So we could "slice" the traffic by layers, and compress each layer individually, potentially facilitating the compression algorithm's work by providing it with more homogeneous data.

As a proof of concept, I took a handy-enough packet capture from the Mid-Atlantic Collegiate Cyber Defense Competition and started playing with it (or a subset of the first 100,000 packets to start with).

I started by analyzing the used protocols and packet sizes:

$ tshark -r maccdc2012_00000.pcap.gz -c 100000 -Tfields -e _ws.col.Protocol |sort|uniq -c|sort -rg|head

88167 SMB

6445 TCP

2742 TLSv1

638 STP

408 HTTP

397 UDP

307 ICMP

277 SSLv2

264 EIGRP

74 SSHv2

Looks like SMB is the prevailing protocol. But let's not dive further than TCP in this case since I want to try a very generic solution.

Fixed Layer Sizes

Instead of full protocol analysis, we can try to choose good fixed layer sizes by just looking at packet sizes:

$ tshark -r maccdc2012_00000.pcap.gz -c 100000 -Tfields -e frame.len|sort|uniq -c|sort -rg|head

43783 109
14562 212
14560 208
4896 207
4872 209
4853 217
4281 70
1445 78
769 64
609 68

So for this proof of concept, I will be using the following fixed size layers:

layersizes = (
16, # Packet header (time & length)
14, # Ethernet
20, # IP (almost always)
20, # TCP (typical)
109 - (14+20+20), # Bytes til 109
212 - (14+20+20+109), # Bytes til 212
9999 # The rest
)

Note that I am adding two fixed size layers based on the most frequent packet sizes, plus another for the rest. These layers should really be alternatives to each other but for the sake of simplicity I am treating them as if they represented more protocol layers.

With a simple Python (dpkt) script I sliced the PCAP into 6 layers, adding up to the same decompressed size:

for ts, buf in pcap:
packets +=1
l = len(buf)
bytecount += l
size_hist[l] = size_hist.get(l,0) + 1
streams[0][1].write(struct.pack('dII',ts,l,l))
streams[0][2] += 16 # Same size as in PCAP

pos, rest = 0, len(buf)
for s in streams[1:]:
fragments += 1
w = min(s[0], rest)
s[1].write(buf[pos:pos+w])
s[2] += w
if w >= rest:
break
pos += w
rest -= w

Note that we generated one stream per layer, and concatenating them we obtain a file of the same size as the original pcap:

$ cat 1e5packets-layer* > 1e5packets.layers
$ ll|cut -c30-
1600024 Jul 30 09:31 1e5packets-layer0
1400000 Jul 30 09:31 1e5packets-layer1
2000000 Jul 30 09:31 1e5packets-layer2
2000000 Jul 30 09:31 1e5packets-layer3
5194853 Jul 30 09:31 1e5packets-layer4
2297257 Jul 30 09:31 1e5packets-layer5
2678388 Jul 30 09:31 1e5packets-layer6
17170522 Jul 30 09:32 1e5packets.layers # Added layers
17170522 Jul 27 22:16 1e5packets.pcap # Original

Now we can easily check the effect of compressing the original and the "layered" files with different algorithms:

Size	File	CPU seconds	Compress ratio	Layering Improvement	MBytes / CPU
17170522	1e5packets.pcap	-	1.00	-	-
3861578	1e5packets.pcap.gz1	0.5	4.45	n/a	26.62
3615304	1e5packets.layers.gz1	0.4	4.75	6.81%	33.89
3413950	1e5packets.pcap.gz	1.7	5.03	n/a	8.09
3382712	1e5packets.pcap.gz9	5.3	5.08	n/a	2.60
3272017	1e5packets.layers.gz9	8.8	5.25	3.38%	1.58
3266231	1e5packets.pcap.bz1	3.4	5.26	n/a	4.09
3238030	1e5packets.layers.gz	1.5	5.30	5.43%	9.29
3149087	1e5packets.layers.bz2	5.3	5.45	-1.17%	2.65
3112274	1e5packets.pcap.bz2	3.6	5.52	n/a	3.91
2989412	1e5packets.layers.bz1	3.9	5.74	9.26%	3.64
2438016	1e5packets.pcap.xz1	2.2	7.04	n/a	6.70
2283236	1e5packets.pcap.xz	17.0	7.52	n/a	0.88
2281332	1e5packets.pcap.xz9	17.0	7.53	n/a	0.88
2148592	1e5packets.layers.xz1	2.0	7.99	13.47%	7.51
1803756	1e5packets.layers.xz	15.0	9.52	26.58%	1.02
1802924	1e5packets.layers.xz9	15.2	9.52	26.54%	1.01

In the best case of "xz" compression we appreciate a boost of 13~27% in compression efficiency, and in all cases but bzip2 there was a decrease in CPU cost.

Note: Some results were actually better with different layer sizes resulting from a typo in the first runs of last night, so there is room for improvement in the way we choose layer sizes...

Test Code

Using python 2.7 cause v3 did not have dpkt module ready on my Ubuntu system:

pcapz.py

#!/env/python2.7

# "pcapz" prototype by gatopeich 2017

import dpkt
import subprocess
import struct
import os
import time
import zlib

input = '1e5packets'
f = open(input+'.pcap')
pcap = dpkt.pcap.Reader(f)

headers = (
 16, # Packet header (time & length)
 # 14+20+20, # Eth+IP+TCP
 14, # Ethernet
 20, # IP (almost always)
 20, # TCP (typical)
 109-14-20-20,
 # 212-14-20-20,
 99999 # Rest
 )

print "headers:",headers

compression_methods = (
 ('bz1', 'bzip2 -1c'),
 # ('bz9', 'bzip2 -9c'),
 # ('gz1', 'gzip -1c'),
 # ('gz9', 'gzip -9c'),
 ('xz1', 'xz -1c'),
 # ('xz9', 'xz -9c'),
 ('zlib1', zlib.compress, 1),
 ('zlib9', zlib.compress, 9)
 )

def compress(method, filename):
 cfile = filename+'.'+method[0]
 if len(method) == 2:
  out = subprocess.check_output("time -p %s %s > %s"%(method[1],filename,cfile)
   , shell=True, stderr=subprocess.STDOUT)
  size = os.stat(cfile).st_size
  cpu = float(out.split()[1])
 else:
  data = open(filename,'rb').read()
  t0 = time.time()
  size = len(method[1](data,method[2]))
  cpu = time.time() - t0
 return size, cpu 

class Stream:
 N = 0
 def __init__(self, basename, header_size):
  self.h_size = header_size
  self.filename = '%s-stream%d'%(basename, Stream.N)
  Stream.N += 1
  self.file = open(self.filename,'wb')
  self.size = 0

 def write(self, data):
  self.file.write(data)
  self.size += len(data)

 def compress(self, methods = compression_methods):
  self.file.close()
  self.best_size = self.size
  self.best_method = None
  print "%s: %d bytes."%(self.filename, self.size),
  fsize = float(self.size)
  for method in methods:
   size, cpu = compress(method, self.filename)
   print method[0], 'x%.1f(%.1fs),'%(fsize/size,cpu),
   if size < self.best_size:
    self.best_size = size
    self.cpu_cost = cpu
    self.best_method = method[0]
  print
  print " => -%.1f Kbytes with %s in %.1f seconds."%(
   (self.size-self.best_size)/1024.0, self.best_method, self.cpu_cost)

streams = tuple(Stream(input,hs) for hs in headers)

streams[0].write('--- PCAP Header STUB ---')

size_hist = {}
packets, fragments, bytecount = 0, 0, 0
for ts, buf in pcap:
 packets +=1
 l = len(buf)
 bytecount += l
 size_hist[l] = size_hist.get(l,0) + 1
 
 streams[0].write(struct.pack('dII',ts,l,l))

 pos, rest = 0, l
 for s in streams[1:]:
  fragments += 1
  if s.h_size > 0:
   w = min(s.h_size, rest)
  elif (-s.h_size) >= l:
   w = rest
  else:
   continue
  s.write(buf[pos:pos+w])
  rest -= w
  if rest <= 0:
   break
  pos += w
 assert rest == 0

print "Done. packets, fragments, bytecount =", packets, fragments, bytecount
total = [0,0,0]
for s in streams:
 s.compress()
 total[0] += s.size
 total[1] += s.best_size
 total[2] += s.cpu_cost
print "All streams:", total[0], "==>", total[1], "(-%.3fMB)"%((total[0]-total[1])/(1024*1024.0)),
print "ratio = %.1f"%(float(total[0])/total[1]),
print "cost =", total[2], " cpusecs."

# Print the stats we use above for selecting layer sizes...
#top_sizes = sorted(map(lambda ab: ab[::-1], size_hist.items()), reverse=True)
#print "top_sizes:", top_sizes[:7]
# ==> [(43783, 109), (14562, 212), (14560, 208), (4896, 207), (4872, 209), (4853, 217), (4281, 70)]

And some sample outputs:

headers: (16, 14, 20, 20, 55, 158, 99999)

Done. packets, fragments, bytecount = 100000 448868 15570498
1e5packets-stream0: 1600024 bytes. bz1 x14.6(0.2s), xz1 x23.2(0.1s), zlib1 x10.2(0.0s), zlib9 x10.7(1.1s),
 => -1495.3 Kbytes with xz1 in 0.1 seconds.
1e5packets-stream1: 1400000 bytes. bz1 x155.0(0.8s), xz1 x114.0(0.1s), zlib1 x43.4(0.0s), zlib9 x115.9(0.1s),
 => -1358.4 Kbytes with bz1 in 0.8 seconds.
1e5packets-stream2: 2000000 bytes. bz1 x6.7(0.4s), xz1 x7.5(0.3s), zlib1 x3.5(0.1s), zlib9 x3.5(1.9s),
 => -1694.0 Kbytes with xz1 in 0.3 seconds.
1e5packets-stream3: 2000000 bytes. bz1 x4.1(0.4s), xz1 x6.6(0.3s), zlib1 x2.9(0.1s), zlib9 x3.2(0.9s),
 => -1656.4 Kbytes with xz1 in 0.3 seconds.
1e5packets-stream4: 5194853 bytes. bz1 x4.1(1.0s), xz1 x5.5(0.8s), zlib1 x3.6(0.1s), zlib9 x4.2(4.1s),
 => -4144.7 Kbytes with xz1 in 0.8 seconds.
1e5packets-stream5: 4807822 bytes. bz1 x8.8(1.2s), xz1 x13.0(0.3s), zlib1 x9.1(0.1s), zlib9 x10.1(0.2s),
 => -4334.7 Kbytes with xz1 in 0.3 seconds.
1e5packets-stream6: 167823 bytes. bz1 x5.2(0.1s), xz1 x6.9(0.0s), zlib1 x6.1(0.0s), zlib9 x6.4(0.0s),
 => -140.2 Kbytes with xz1 in 0.0 seconds.
All streams: 17170522 ==> 1991123 (-14.476203MB) ratio = 8.6 cost = 2.65  cpusecs.

headers: (16, 8, 8, 8, 8, 8, 14, 55, 99999)
Done. packets, fragments, bytecount = 100000 747720 15570498
1e5packets-stream0: 1600024 bytes. bz1 x14.6(0.2s), xz1 x23.2(0.1s), xz9 x30.9(2.0s), zlib1 x10.2(0.0s), zlib9 x10.7(1.1s),
 => -1512.0 Kbytes with xz9 in 2.0 seconds.
1e5packets-stream1: 800000 bytes. bz1 x126.4(0.5s), xz1 x91.5(0.0s), xz9 x116.5(0.2s), zlib1 x37.5(0.0s), zlib9 x103.8(0.1s),
 => -775.1 Kbytes with bz1 in 0.5 seconds.
1e5packets-stream2: 800000 bytes. bz1 x101.9(0.5s), xz1 x87.3(0.0s), xz9 x107.2(0.2s), zlib1 x34.2(0.0s), zlib9 x80.8(0.1s),
 => -774.0 Kbytes with xz9 in 0.2 seconds.
1e5packets-stream3: 800000 bytes. bz1 x5.5(0.2s), xz1 x7.4(0.1s), xz9 x8.2(0.9s), zlib1 x3.1(0.0s), zlib9 x2.9(1.9s),
 => -685.6 Kbytes with xz9 in 0.9 seconds.
1e5packets-stream4: 800000 bytes. bz1 x6.2(0.2s), xz1 x7.9(0.1s), xz9 x10.4(0.8s), zlib1 x3.5(0.0s), zlib9 x3.5(1.6s),
 => -706.4 Kbytes with xz9 in 0.8 seconds.
1e5packets-stream5: 800000 bytes. bz1 x43.3(0.4s), xz1 x44.9(0.0s), xz9 x56.0(0.3s), zlib1 x22.4(0.0s), zlib9 x34.8(0.1s),
 => -767.3 Kbytes with xz9 in 0.3 seconds.
1e5packets-stream6: 1400000 bytes. bz1 x2.9(0.3s), xz1 x4.5(0.3s), xz9 x6.3(0.9s), zlib1 x2.2(0.1s), zlib9 x2.4(1.1s),
 => -1149.2 Kbytes with xz9 in 0.9 seconds.
1e5packets-stream7: 5194853 bytes. bz1 x4.1(1.0s), xz1 x5.5(0.8s), xz9 x5.9(5.4s), zlib1 x3.6(0.1s), zlib9 x4.2(4.1s),
 => -4209.3 Kbytes with xz9 in 5.4 seconds.
1e5packets-stream8: 4975645 bytes. bz1 x8.5(1.3s), xz1 x12.6(0.4s), xz9 x12.9(3.4s), zlib1 x9.0(0.1s), zlib9 x9.9(0.2s),
 => -4481.5 Kbytes with xz9 in 3.4 seconds.
All streams: 17170522 ==> 1748617 (-14.707MB) ratio = 9.8 cost = 14.28  cpusecs.

BEST ration I have found manually is 10.0:

headers: (16, 8, 8, 8, 8, 8, 16, 64, 64, 99999)
Done. packets, fragments, bytecount = 100000 793191 15570498
1e5packets-stream0: 1600024 bytes. bz1 x14.6(0.2s), xz1 x23.2(0.1s), xz9 x30.9(1.9s), zlib1 x10.2(0.0s), zlib9 x10.7(1.1s),
 => -1512.0 Kbytes with xz9 in 1.9 seconds.
1e5packets-stream1: 800000 bytes. bz1 x126.4(0.5s), xz1 x91.5(0.0s), xz9 x116.5(0.2s), zlib1 x37.5(0.0s), zlib9 x103.8(0.1s),
 => -775.1 Kbytes with bz1 in 0.5 seconds.
1e5packets-stream2: 800000 bytes. bz1 x101.9(0.5s), xz1 x87.3(0.0s), xz9 x107.2(0.2s), zlib1 x34.2(0.0s), zlib9 x80.8(0.1s),
 => -774.0 Kbytes with xz9 in 0.2 seconds.
1e5packets-stream3: 800000 bytes. bz1 x5.5(0.2s), xz1 x7.4(0.1s), xz9 x8.2(0.9s), zlib1 x3.1(0.0s), zlib9 x2.9(1.9s),
 => -685.6 Kbytes with xz9 in 0.9 seconds.
1e5packets-stream4: 800000 bytes. bz1 x6.2(0.2s), xz1 x7.9(0.1s), xz9 x10.4(0.7s), zlib1 x3.5(0.0s), zlib9 x3.5(1.6s),
 => -706.4 Kbytes with xz9 in 0.7 seconds.
1e5packets-stream5: 800000 bytes. bz1 x43.3(0.4s), xz1 x44.9(0.0s), xz9 x56.0(0.3s), zlib1 x22.4(0.0s), zlib9 x34.8(0.1s),
 => -767.3 Kbytes with xz9 in 0.3 seconds.
1e5packets-stream6: 1600000 bytes. bz1 x2.0(0.4s), xz1 x2.9(0.5s), xz9 x3.6(1.1s), zlib1 x1.7(0.1s), zlib9 x1.7(1.7s),
 => -1128.0 Kbytes with xz9 in 1.1 seconds.
1e5packets-stream7: 5518075 bytes. bz1 x5.5(1.1s), xz1 x7.5(0.6s), xz9 x8.3(5.6s), zlib1 x5.0(0.1s), zlib9 x5.9(2.7s),
 => -4735.7 Kbytes with xz9 in 5.6 seconds.
1e5packets-stream8: 2981162 bytes. bz1 x7.1(0.6s), xz1 x11.8(0.2s), xz9 x11.6(3.0s), zlib1 x7.0(0.0s), zlib9 x8.5(0.4s),
 => -2664.3 Kbytes with xz1 in 0.2 seconds.
1e5packets-stream9: 1471261 bytes. bz1 x12.5(0.8s), xz1 x14.5(0.1s), xz9 x14.8(0.5s), zlib1 x12.3(0.0s), zlib9 x13.4(0.0s),
 => -1339.6 Kbytes with xz9 in 0.5 seconds.
All streams: 17170522 ==> 1720437 (-14.734MB) ratio = 10.0 cost = 12.0  cpusecs.

Combined xz1/xz4 compression achieves a very nice 9.6 in just 4 seconds:

headers: (16, 8, 8, 8, 8, 8, 16, 64, 64, 99999)
Done. packets, fragments, bytecount = 100000 793191 15570498
1e5packets-stream0: 1600024 bytes. xz1 x23.2(0.1s), xz4 x21.9(0.6s),
 => -1495.3 Kbytes with xz1 in 0.1 seconds.
1e5packets-stream1: 800000 bytes. xz1 x91.5(0.0s), xz4 x91.2(0.1s),
 => -772.7 Kbytes with xz1 in 0.0 seconds.
1e5packets-stream2: 800000 bytes. xz1 x87.3(0.0s), xz4 x87.0(0.1s),
 => -772.3 Kbytes with xz1 in 0.0 seconds.
1e5packets-stream3: 800000 bytes. xz1 x7.4(0.1s), xz4 x8.2(0.7s),
 => -685.4 Kbytes with xz4 in 0.7 seconds.
1e5packets-stream4: 800000 bytes. xz1 x7.9(0.1s), xz4 x10.5(0.7s),
 => -706.9 Kbytes with xz4 in 0.7 seconds.
1e5packets-stream5: 800000 bytes. xz1 x44.9(0.0s), xz4 x49.0(0.1s),
 => -765.3 Kbytes with xz4 in 0.1 seconds.
1e5packets-stream6: 1600000 bytes. xz1 x2.9(0.5s), xz4 x3.7(1.1s),
 => -1137.8 Kbytes with xz4 in 1.1 seconds.
1e5packets-stream7: 5518075 bytes. xz1 x7.5(0.6s), xz4 x6.5(1.7s),
 => -4674.7 Kbytes with xz1 in 0.6 seconds.
1e5packets-stream8: 2981162 bytes. xz1 x11.8(0.2s), xz4 x12.2(0.6s),
 => -2672.9 Kbytes with xz4 in 0.6 seconds.
1e5packets-stream9: 1471261 bytes. xz1 x14.5(0.1s), xz4 x14.6(0.2s),
 => -1338.6 Kbytes with xz4 in 0.2 seconds.
All streams: 17170522 ==> 1788160 (-14.670MB) ratio = 9.6 cost = 4.15 cpusecs.

Simple Linux tools & hacks

Saturday, July 29, 2017

Network Traffic Compression Enhancement by Protocol Layering

Fixed Layer Sizes

Test Code