3 Probability Coding
As mentioned in the introduction, coding is the job of taking probabilities for messages and gen-
erating bit strings based on these probabilities. How the probabilities are generated is part of the
model component of the algorithm, which is discussed in Section 4.
In practice we typically use probabilities for parts of a larger message rather than for the com-
plete message, e.g., each character or word in a text. To be consistent with the terminology in the
previous section, we will consider each of these components a message on its own, and we will
use the term message sequence for the larger message made up of these components. In general
each little message can be of a different type and come from its own probability distribution. For
example, when sending an image we might send a message specifying a color followed by mes-
sages specifying a frequency component of that color. Even the messages specifying the color
might come from different probability distributions since the probability of particular colors might
depend on the context.
We distinguish between algorithms that assign a unique code (bit-string) for each message, and
ones that “blend” the codes together from more than one message in a row. In the first class we
will consider Huffman codes, which are a type of prefix code. In the latter category we consider
arithmetic codes. The arithmetic codes can achieve better compression, but can require the encoder
to delay sending messages since the messages need to be combined before they can be sent.
3.1 Prefix Codes
A code C for a message set S is a mapping from each message to a bit string. Each bit string is
called a codeword, and we will denote codes using the syntax
C = {(s_1, w_1), (s_2, w_2), ..., (s_m, w_m)}.
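As a concrete illustration (the representation and names here are ours, not part of the text), a code can be held as a mapping from messages to codewords, and a message sequence is then encoded by concatenating the codewords:

```python
# A code C maps each message to a codeword (a bit string).
C = {"a": "1", "b": "01", "c": "000", "d": "001"}

def encode(code, messages):
    """Encode a message sequence by concatenating the codeword of each message."""
    return "".join(code[m] for m in messages)

print(encode(C, "abad"))  # -> 1011001
```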
Typically in computer science we deal with fixed-length codes, such as the ASCII code which maps
every printable character and some control characters into 7 bits. For compression, however, we
would like codewords that can vary in length based on the probability of the message. Such vari-
able length codes have the potential problem that if we are sending one codeword after the other
it can be hard or impossible to tell where one codeword finishes and the next starts. For
example, given the code {(a, 1), (b, 01), (c, 101), (d, 011)}, the bit-sequence 1011 could be
decoded as aba, ca, or ad. To avoid this ambiguity we could add a special stop symbol to the
end of each codeword (e.g., a 2 in a 3-valued alphabet), or send a length before each symbol.
These solutions, however, require sending extra data. A more efficient solution is to design codes
in which we can always uniquely decipher a bit sequence into its code words. We will call such
codes uniquely decodable codes.
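To see the ambiguity concretely, a small recursive sketch (our illustration, not from the text) can enumerate every way a bit string parses under the code from the example above:

```python
# The variable-length code from the example. It is not uniquely decodable:
# e.g. "1" (a) is a prefix of "101" (c), so parses can diverge.
CODE = {"a": "1", "b": "01", "c": "101", "d": "011"}

def decodings(bits, code=CODE):
    """Return every message sequence whose encoding equals `bits`."""
    if bits == "":
        return [""]
    results = []
    for msg, word in code.items():
        if bits.startswith(word):
            # Consume this codeword and recurse on the remaining bits.
            results += [msg + rest for rest in decodings(bits[len(word):], code)]
    return results

print(sorted(decodings("1011")))  # -> ['aba', 'ad', 'ca']
```

The three parses returned for 1011 are exactly the decodings aba, ca, and ad mentioned in the text.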
A prefix code is a special kind of uniquely decodable code in which no bit-string is a prefix
of another one, for example {(a, 1), (b, 01), (c, 000), (d, 001)}. All prefix codes are uniquely
decodable since once we get a match, there is no longer codeword that can also match.
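A sketch of why this works in practice (the decoder below is our own, under the stated prefix-code assumption): a greedy left-to-right scan can emit each message the moment its codeword matches, with no lookahead, because no longer codeword could also match.

```python
# The prefix code from the example: no codeword is a prefix of another.
PREFIX_CODE = {"a": "1", "b": "01", "c": "000", "d": "001"}

def decode(bits, code=PREFIX_CODE):
    """Greedily decode a bit string: the first codeword matched is the only match."""
    inverse = {w: m for m, w in code.items()}  # codeword -> message
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # unique match: emit the message and reset
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string does not end on a codeword boundary")
    return "".join(out)

print(decode("1011001"))  # -> abad
```

Note that each message is recovered as soon as its last bit arrives, which is the property discussed next: the decoder never needs to see the start of the following message.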
Exercise 3.1.1. Come up with an example of a uniquely decodable code that is not a prefix code.
Prefix codes actually have an advantage over other uniquely decodable codes in that we can
decipher each message without having to see the start of the next message. This is important when
sending messages of different types (e.g., from different probability distributions). In fact in certain