Nonce Algorithm: 2010

Hi,

This month, I will speak about data compression.

So, as a starting point, we are going to look at our first "nonce algorithm": RLE.

The Run-Length Encoding is a very spread algorithm. Each engineer who wants to compress some data begins by a test of RLE.

Why is it so usefull? Well, let's say that it is very simple and that it needs only a few line of code.
Most important (and that's why it is a nonce-algorithm according to me) is the fact that RLE encodes the length between two similar values and that you are able to choose the number of bits used to stock a value.

So, if your data stream looks like this:
n,o,n,o,n,o,n,o,n,o,n,o,n,o,n,o.
with n quantized on a nibble

A RLE working at the nibble level should give you:
1-n,1-0,1-n,1-0,1-n,1-0,1-n,1-0,1-n,1-0,1-n,1-0,1-n,1-0,1-n,1-0.
with 1 quantized on log2(1) bits.
So, at the nibble level you get a compression rate of (16 * (1 nibble * log2(1) bits) / 16 nibbles
Let's say it is inappropriate for compression because it is inferior to 1.

A RLE working at the byte level should give you:
8-n0.
with 8 quantized on a nibble.
So, at the byte level you get a compression rate of 16/3.
Much better...

But a RLE working at the double level should give you:
1-nononononononono.
with 1 quantized on a double.
So, at the byte level you get a compression rate of 1/2.
And your compression rate is inferior to 1.

So, RLE is a nonce algorithm since you have to adapt it to the size of the most redundant pattern in your data if you want to reach the best compression rate. Otherwise, you will be depending upon an other algorithm like LZ to compensate.

In fact, I consider RLE as a mean to convert a stream full of redundancy into a sequences of tuples {value of interest, pointer on this value}. So, you can adapt RLE to your data and use it as a mean to localize some value of interest in your stream. It is exactly how more sophisticated algorithm designed to compress data (like LZ) is proceeding.

Next week, we will look for another nonce algorithm to find the size of the most redundant pattern in your data.

If you don't know where to begin to code your own RLE, I use the Matlab implementation of Stefan Eireiner.

Thanks for this piece of Fex.

Nonce Algorithm

vendredi 11 juin 2010

samedi 29 mai 2010