This page contains MatLab
functions, m-files, that do Huffman coding and
arithmetic coding of integer (symbol) sequences. Complete coding may be done by
calling an easy to use main program (or main function), where input argument is
the sequences you want to compress and the output is the compressed bit stream,
as a vector of bytes. Decoding is done just by switching the arguments. These
main programs are: the Huffman coder, Huff06, and two versions of the
arithmetic coder, Arith06.m and Arith07.m.
My interest for lossless
coding started when I took a course in Information Theory which included the
principles both for Huffman coding and arithmetic coding. My first projects as an Ph.D. student was in signal representation and signal
compression. Most of this work was done in MatLab, so I searched for an entropy
coder, i.e. lossless coding of a symbol (integer) sequence, that was easy to
use from MatLab. What I found was mostly programs written in C, so I tried to
make a Huffman Coder as a function (m-file) in MatLab. The function (m-file)
was especially made for compression of quantized coefficient values,
or for values after run-length coding. It is assumed that the histogram of the
values, i.e. probabilities, is symmetric about zero (or only non-negative
numbers), and that probability decrease as distance from zero increase. The
program works ok also if this it not the case, but the compression ratio is not
as good as it will be if the sequences are as expected. F.ex.
this Huffman coder is not the best to use if you want to compress a text file,
where the symbols are the ASCII-codes (bytes), or a general file where you just
have a sequence of bytes.
An entropy coder usually
only exploits the symbol probabilities independent of previous symbols, this is
optimal for uncorrelated sequences. For signal compression a decorrelation process usually precede the entropy coding,
but often the decorrelation is not perfect. The
Huffman coder was made such that it can exploit some of these remaining dependencies;
this was done by manipulating the input sequence. The Huffman coder worked
quite well, and we wrote an article on it. The article, "Improved
Huffman coding using recursive splitting", norsig99.pdf (172 kB), was
presented at the NORSIG-99 conference 9-11 sep. 1999. At the end of this page
the abstract of the article is included.
Since then I have updated
the program several times, the discovered errors have been corrected, the
interface (arguments passed and returned) were changed. Now almost any kind of
integer sequences can be compressed, but it will still work best if the initial
assumptions are met or nearly met. I have also made an arithmetic coder in MatLab, two versions of this are also available on this
page.
Lossless compression of a
sequence of symbols is an important part of data and signal compression. Huffman
coding is lossless, it is also often used in lossy compression as the final step after decomposition and
quantization of a signal. In signal compression, the decomposition/quantization
part seldom manages to produce a sequence of completely independent symbols. Here
we present a scheme giving better results than straightforward Huffman coding
by utilizing this fact. We split the original symbol sequence into two
sequences in such a way that the symbol statistics are, hopefully, different
for the two sequences. Individual Huffman coding for each of these sequences
will reduce the average bit rate. This split is done recursively for each sub-sequence
until the cost associated with the split is larger than the gain.
Experiments were done on
different signals. They were decomposed with the Discrete Cosine Transform,
quantized, and End Of Block coded, to give the input
symbol sequence. The results using the split scheme was a bit rate reduction of
usually more than 10% compared to straightforward Huffman coding, and 0-15%
better than JPEG-like Huffman coding, best at low bit rates.
This WWW page is maintained by
Karl Skretting at the Cybernetics
Group at University of Stavanger.
Last update: Oct. 11. 2005.