Information Theory
Chapter Four - Information Theory
Let $I(p)$ be the amount of information given by an event of probability $p$.
- $I(p) \ge 0$ for $0 \le p \le 1$
- $I(1) = 0$
- $I$ is continuous
- $I(p_1p_2)=I(p_1) + I(p_2)$ for independent events
These axioms force $I(p) = k\ln(p)$ for some constant $k < 0$.
- Taking $k = -1/\ln r$ gives $I(p) = I_r(p) = -\log_r p$
- MATH3411 always uses radix $r = 2$
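A minimal Python sketch (the function name `information` is my own) that checks these axioms numerically:

```python
import math

def information(p: float, r: int = 2) -> float:
    """I_r(p) = -log_r(p): the information in an event of probability p."""
    return math.log(1 / p, r)

print(information(1.0))                     # 0.0: a certain event carries no information
print(information(0.5), information(0.25))  # 1.0 and 2.0 bits
# Additivity for independent events: I(p1 * p2) == I(p1) + I(p2)
print(information(0.5 * 0.25), information(0.5) + information(0.25))  # both 3.0 (up to rounding)
```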
Shannon entropy of S
$H(S) = H_2(S) = \sum_{i=1}^q p_i I(p_i) = -\sum_{i=1}^q p_i \log_2 p_i$
Example - Single coin toss
$H(S) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit
Example - Double coin toss, counting heads
$p_1 = P(TT) = \tfrac{1}{4}$
$p_2 = P(HT \text{ or } TH) = \tfrac{1}{2}$
$p_3 = P(HH) = \tfrac{1}{4}$
There are 0 heads $\tfrac{1}{4}$ of the time, 1 head $\tfrac{1}{2}$ of the time, and 2 heads $\tfrac{1}{4}$ of the time.
$H(S) = -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{4}\log_2\tfrac{1}{4} = \tfrac{3}{2}$
On average, each outcome carries $\tfrac{3}{2}$ bits of information.
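A small Python sketch (the function name `entropy` is my own) that reproduces both coin-toss values:

```python
import math

def entropy(probs, r: int = 2) -> float:
    """Shannon entropy H_r(S) = -sum p_i log_r p_i (terms with p_i = 0 contribute 0)."""
    return -sum(p * math.log(p, r) for p in probs if p > 0)

print(entropy([0.5, 0.5]))         # 1.0 bit  (single coin toss)
print(entropy([0.25, 0.5, 0.25]))  # 1.5 bits (number of heads in two tosses)
```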
Example - Huffman code
$P = 0.3, 0.2, 0.2, 0.1, 0.1, 0.1$
$H(S) = -\sum_{i=1}^q p_i \log_2 p_i \approx 2.446$ bits/symbol
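A quick numerical check of this entropy, plus a small binary Huffman construction for comparison (a sketch using a heap, not the course's algorithm statement; tie-breaking follows heap order rather than a decision rule, although the average codeword length comes out the same):

```python
import heapq, math

def huffman_lengths(probs):
    """Binary Huffman coding: return the codeword length of each symbol."""
    # Heap entries: (probability, unique id, indices of the symbols in that subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:                  # merging pushes every symbol one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next_id, s1 + s2))
        next_id += 1
    return lengths

P = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
H = -sum(p * math.log2(p) for p in P)
L = sum(p * l for p, l in zip(P, huffman_lengths(P)))
print(round(H, 3), round(L, 3))            # 2.446 2.5, so H_2(S) <= average length
```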
Example - Arithmetic coding
$P = 0.4, 0.2, 0.2, 0.1, 0.1$
$H_{10}(S) \approx 0.639$ decimal digits/symbol
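The unit "digits/symbol" indicates radix 10; a quick check of the value (assuming these five probabilities):

```python
import math

P = [0.4, 0.2, 0.2, 0.1, 0.1]
H10 = -sum(p * math.log10(p) for p in P)   # radix-10 entropy
print(round(H10, 3))                       # 0.639 decimal digits per symbol
```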
Gibbs’ Inequality
If $p_1, \dots, p_q$ and $p'_1, \dots, p'_q$ are probability distributions, then
$-\sum_{i=1}^q p_i \log_r p_i \le -\sum_{i=1}^q p_i \log_r p'_i$
or, equivalently,
$\sum_{i=1}^q p_i \log_r \frac{p'_i}{p_i} \le 0$
with equality if and only if $p_i = p'_i$ for all $i$.
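A numerical sanity check of the second form on randomly generated distributions (a sketch; `gibbs_gap` and `normalise` are my own names):

```python
import math, random

def gibbs_gap(p, q, r=2):
    """sum_i p_i * log_r(q_i / p_i); Gibbs' inequality says this is always <= 0."""
    return sum(pi * math.log(qi / pi, r) for pi, qi in zip(p, q) if pi > 0)

def normalise(xs):
    s = sum(xs)
    return [x / s for x in xs]

random.seed(0)
for _ in range(3):
    p = normalise([random.random() for _ in range(5)])
    q = normalise([random.random() for _ in range(5)])
    print(gibbs_gap(p, q) <= 0, gibbs_gap(p, p) == 0)   # True True: equality only when p'_i = p_i
```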
Maximum Entropy Theorem
For any source S with q symbols
$H_r(S) \le \log_r q$
with equality iff all symbols are equally likely ($H_r(S) = \log_r |S|$)
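This follows from Gibbs' inequality with $p'_i = \tfrac{1}{q}$ for every $i$:

$$H_r(S) = -\sum_{i=1}^q p_i \log_r p_i \le -\sum_{i=1}^q p_i \log_r \tfrac{1}{q} = \log_r q \sum_{i=1}^q p_i = \log_r q$$

with equality iff $p_i = \tfrac{1}{q}$ for all $i$.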
First Source-Coding Theorem
For each radix $r$ UD-code $C$ for source $S$, $H_r(S) \le L_r$ (the average codeword length), with equality iff $p_i = r^{-l_i}$ for all $i$ and $K_r = \sum_{i=1}^q r^{-l_i} = 1$
-> For $r = 2$: $K = \sum_{i=1}^q 2^{-l_i} = 1$
-> PINK ELEPHANT
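A small check of the equality case with a dyadic distribution (my own example probabilities), where each $p_i$ is an exact power of 2:

```python
import math

P = [0.5, 0.25, 0.125, 0.125]              # each p_i = 2^(-l_i)
lengths = [1, 2, 3, 3]                     # l_i = -log2(p_i)
K = sum(2 ** -l for l in lengths)          # Kraft-McMillan number
L = sum(p * l for p, l in zip(P, lengths)) # average codeword length
H = -sum(p * math.log2(p) for p in P)      # binary entropy
print(K, L, H)                             # 1.0 1.75 1.75: K = 1 and the average length meets the entropy
```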
Shannon-Fano Coding
Input: source $S = \{s_1, \dots, s_q\}$ with probabilities $p_1, \dots, p_q$
Output: radix $r$ I-code $C$ for $S$, given a decision key
Algorithm:
- Define the codeword lengths $l_i = \lceil -\log_r(p_i) \rceil$
- Construct the standard I-code for $l_1, \dots, l_q$ (a Python sketch follows the example below)
Example - $r = 2$
Find the probabilities, take the reciprocals $1/p_i$, and find the smallest power of $r$ that is at least each reciprocal.
The exponents of those powers give the codeword lengths $l_i$, i.e. the depths in the code tree.
- Unlike the Huffman encoding scheme, the codeword lengths are known immediately, before any code is constructed.
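A Python sketch of the whole procedure (my own implementation; the standard I-code is built by counting upwards in radix $r$, with ties among equal lengths broken by the original symbol order rather than a decision key):

```python
import math

def shannon_fano(probs, r=2):
    """Shannon-Fano: lengths l_i = ceil(-log_r p_i), then a standard I-code,
    assigning codewords in order of increasing length by counting up in radix r."""
    lengths = [math.ceil(-math.log(p, r)) for p in probs]
    order = sorted(range(len(probs)), key=lambda i: lengths[i])
    code, value, cur_len = [None] * len(probs), 0, 0
    for i in order:
        value *= r ** (lengths[i] - cur_len)   # pad the counter out to the new length
        cur_len = lengths[i]
        v, digits = value, []
        for _ in range(cur_len):               # write the counter in radix r with cur_len digits
            digits.append(str(v % r))
            v //= r
        code[i] = "".join(reversed(digits))
        value += 1                             # next codeword
    return lengths, code

P = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]             # probabilities from the Huffman example above
lengths, code = shannon_fano(P)
print(lengths)   # [2, 3, 3, 4, 4, 4]
print(code)      # ['00', '010', '011', '1000', '1001', '1010']
```

With these lengths the average is $L_{2,\mathrm{SF}} = 0.3\cdot 2 + 0.4\cdot 3 + 0.3\cdot 4 = 3$, comfortably below $H_2(S) + 1 \approx 3.446$.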
$H_r(S) \le L_{r,\mathrm{H}} \le L_{r,\mathrm{SF}} < H_r(S) + 1$
- Huffman: better compression.
- Shannon-Fano: can be used even when only some $p_i$ are known; the codeword lengths are known prior to encoding.
- At worst, Shannon-Fano's average length exceeds Huffman's by less than one.
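Putting the two sketches together for the example probabilities above (the Huffman average length 2.5 is taken from the Huffman sketch earlier):

```python
import math

P = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
H = -sum(p * math.log2(p) for p in P)                  # ~2.446 bits/symbol
L_SF = sum(p * math.ceil(-math.log2(p)) for p in P)    # 3.0, Shannon-Fano average length
L_H = 2.5                                              # binary Huffman average length, from the sketch above
print(H <= L_H <= L_SF < H + 1)                        # True: 2.446 <= 2.5 <= 3.0 < 3.446
```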