Shannon's noiseless coding theorem places upper and lower bounds on the minimal possible expected codeword length, as a function of the entropy of the input word (viewed as a random variable) and of the size of the target alphabet.
Shannon's statement
Let X be a random variable taking values in some finite alphabet Σ1, and let f be a decipherable code from Σ1* to Σ2*, where |Σ2| = a. Let S denote the wordlength of the codeword f(X).
If f is optimal in the sense that it has the minimal expected wordlength for X, then
\frac{H(X)}{\log_2 a} \leq \mathbb{E}S < \frac{H(X)}{\log_2 a} +1
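As an illustration of both bounds (not part of the original statement), the following Python sketch builds a binary Huffman code, which is optimal among prefix-free codes, for a small example distribution and compares its expected wordlength with H(X)/log_2 a for a = 2; the helper huffman_lengths and the chosen probabilities are ours.

import heapq
from math import log2

def huffman_lengths(probs):
    # Binary (a = 2) Huffman code: repeatedly merge the two least probable
    # subtrees; every merge adds one bit to each codeword in both subtrees.
    heap = [(p, i, {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)              # tie-breaker so tuples never compare dicts
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {k: v + 1 for k, v in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]                 # maps symbol index i to wordlength s_i

probs = [0.5, 0.25, 0.125, 0.125]                      # dyadic example distribution
lengths = huffman_lengths(probs)
H = -sum(p * log2(p) for p in probs)                   # entropy H(X) in bits
ES = sum(p * lengths[i] for i, p in enumerate(probs))  # expected wordlength
print(H, ES)                                           # both equal 1.75 here
assert H <= ES < H + 1                                 # the two bounds of the theorem

For this dyadic distribution the lower bound is attained with equality; in general the optimal expected wordlength lies strictly above H(X)/log_2 a unless every p_i is a negative power of a.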
Proof
Let x_1, …, x_n be the possible values of X, with p_i = P(X = x_i), and let s_i denote the wordlength of the codeword for each x_i (i = 1, …, n). Define q_i = a^{-s_i}/C, where C is chosen so that ∑ q_i = 1.
Then
\begin{aligned}
H(X) &= -\sum_{i=1}^{n} p_i \log_2 p_i \\
&\leq -\sum_{i=1}^{n} p_i \log_2 q_i \\
&= -\sum_{i=1}^{n} p_i \log_2 a^{-s_i} + \sum_{i=1}^{n} p_i \log_2 C \\
&\leq -\sum_{i=1}^{n} p_i \log_2 a^{-s_i} \\
&= \sum_{i=1}^{n} s_i p_i \log_2 a \\
&= \mathbb{E}S \log_2 a
\end{aligned}
where the second line follows from Gibbs' inequality and the fourth line follows from Kraft's inequality: C = \sum_{i=1}^{n} a^{-s_i} \leq 1, so \log_2 C \leq 0.
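For reference, Gibbs' inequality states that for any two probability distributions (p_1, …, p_n) and (q_1, …, q_n),

-\sum_{i=1}^{n} p_i \log_2 p_i \leq -\sum_{i=1}^{n} p_i \log_2 q_i

and the Kraft–McMillan inequality states that the wordlengths s_1, …, s_n of any decipherable code over an alphabet of size a satisfy

\sum_{i=1}^{n} a^{-s_i} \leq 1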
For the second inequality we may set
s_i = \left\lceil -\log_a p_i \right\rceil
so that
- \log_a p_i \leq s_i < -\log_a p_i + 1
and so
a^{-s_i} \leq p_i
and
\sum_{i=1}^{n} a^{-s_i} \leq \sum_{i=1}^{n} p_i = 1
and so, by the converse of Kraft's inequality, there exists a prefix-free code having these wordlengths. Since the optimal code has expected wordlength no greater than that of this prefix-free code, the minimal S satisfies
\begin{aligned}
\mathbb{E}S &= \sum_{i=1}^{n} p_i s_i \\
&< \sum_{i=1}^{n} p_i \left( -\log_a p_i + 1 \right) \\
&= -\sum_{i=1}^{n} p_i \frac{\log_2 p_i}{\log_2 a} + 1 \\
&= \frac{H(X)}{\log_2 a} + 1
\end{aligned}
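The construction used for the upper bound can be checked numerically. The Python sketch below (illustrative only; the distribution and the choice a = 3 are arbitrary, and the helper shannon_lengths is ours) computes s_i = ⌈−log_a p_i⌉, verifies Kraft's inequality, and confirms that the resulting expected wordlength respects both bounds of the theorem.

from math import ceil, log, log2

def shannon_lengths(probs, a):
    # Wordlengths s_i = ceil(-log_a p_i), as in the second half of the proof.
    return [ceil(-log(p, a)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]                   # arbitrary example distribution
a = 3                                          # ternary code alphabet
s = shannon_lengths(probs, a)                  # here s = [1, 2, 2, 3]
assert sum(a ** -si for si in s) <= 1          # Kraft's inequality, so a prefix-free code exists
H = -sum(p * log2(p) for p in probs)           # entropy H(X) in bits
ES = sum(p * si for p, si in zip(probs, s))    # expected wordlength of this code
assert H / log2(a) <= ES < H / log2(a) + 1     # both bounds hold for this (not necessarily optimal) code

The lower bound holds here because, as the first half of the proof shows, it holds for every decipherable code; the upper bound holds because the constructed wordlengths satisfy s_i < -log_a p_i + 1.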