'''Shannon's noiseless coding theorem''' places an upper and a lower bound on the minimal possible expected length of codewords as a function of the [[mathematical entropy|entropy]] of the input word (which is viewed as a [[random variable]]) and of the size of the target alphabet.
==Shannon's statement==
Let ''X'' be a random variable taking values in some finite alphabet <math>\Sigma_1</math> and let ''f'' be a decipherable code from <math>\Sigma_1^*</math> to <math>\Sigma_2^*</math>, where <math>|\Sigma_2| = a</math>. Let ''S'' denote the resulting wordlength of ''f''(''X'').
If ''f'' is optimal in the sense that it has the minimal expected wordlength for ''X'', then
:<math> \frac{H(X)}{\log_2 a} \leq \mathbb{E}S < \frac{H(X)}{\log_2 a} + 1. </math>
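These two bounds can be checked numerically. The sketch below (a hypothetical illustration, not part of the original article) computes the entropy of a small distribution and the expected wordlength of an optimal binary code built with Huffman's algorithm, taking the target alphabet to be binary so that <math>\log_2 a = 1</math>:

```python
import heapq
from math import log2

def entropy(p):
    """Shannon entropy H(X) in bits of a probability distribution p."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def huffman_lengths(p):
    """Codeword lengths of an optimal (Huffman) binary code for distribution p."""
    # Each heap entry: (subtree probability, tiebreak id, symbol indices in subtree)
    heap = [(pi, i, [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    uid = len(p)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, uid, s1 + s2))
        uid += 1
    return lengths

# Hypothetical example distribution (not from the article)
p = [0.5, 0.25, 0.125, 0.125]
H = entropy(p)
lengths = huffman_lengths(p)
ES = sum(pi * si for pi, si in zip(p, lengths))
# With a = 2 the theorem reads H(X) <= E[S] < H(X) + 1
assert H <= ES < H + 1
```

For this dyadic distribution the optimal code meets the lower bound with equality, since every <math>p_i</math> is a negative power of two.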
==Proof==
Let <math>s_i</math> denote the wordlength of each possible value <math>x_i</math> of ''X'', which occurs with probability <math>p_i</math> (<math>i = 1, \ldots, n</math>). Define <math>q_i = a^{-s_i}/C</math>, where ''C'' is chosen so that <math>\sum_{i=1}^n q_i = 1</math>.
Then
:<math>
\begin{align}
H(X) &= -\sum_{i=1}^n p_i \log_2 p_i \\
&\leq -\sum_{i=1}^n p_i \log_2 q_i \\
&= \sum_{i=1}^n s_i p_i \log_2 a + \log_2 C \leq \mathbb{E}S \log_2 a
\end{align}
</math>
where the second line follows from [[Gibbs' inequality]] and the third line follows from [[Kraft's inequality]]: <math>C = \sum_{i=1}^n a^{-s_i} \leq 1</math>, so <math>\log_2 C \leq 0</math>.
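The Gibbs step can be illustrated numerically (a hypothetical example, not from the original article): for any distribution <math>q</math>, the cross-entropy <math>-\sum_i p_i \log_2 q_i</math> is at least the entropy <math>-\sum_i p_i \log_2 p_i</math>:

```python
from math import log2

def cross_entropy(p, q):
    """-sum(p_i * log2(q_i)): by Gibbs' inequality, at least the entropy of p."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions chosen for illustration
p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
H = cross_entropy(p, p)          # cross-entropy of p with itself is the entropy
assert cross_entropy(p, q) >= H  # Gibbs' inequality
```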
For the second inequality we may set
:<math> s_i = \left\lceil \log_a \frac{1}{p_i} \right\rceil </math>
so that
:<math> \log_a \frac{1}{p_i} \leq s_i < \log_a \frac{1}{p_i} + 1 </math>
and so
:<math> a^{-s_i} \leq p_i </math>
and
:<math> \sum_{i=1}^n a^{-s_i} \leq \sum_{i=1}^n p_i = 1 </math>
and so by Kraft's inequality there exists a prefix-free code having those wordlengths. Thus the minimal ''S'' satisfies
:<math>
\begin{align}
\mathbb{E}S = \sum_{i=1}^n p_i s_i &< \sum_{i=1}^n p_i \left( \log_a \frac{1}{p_i} + 1 \right) \\
&= -\sum_{i=1}^n p_i \frac{\log_2 p_i}{\log_2 a} + 1 = \frac{H(X)}{\log_2 a} + 1.
\end{align}
</math>
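The construction in this half of the proof can be sketched directly (a hypothetical example, not from the original article): choose wordlengths <math>s_i = \lceil \log_a(1/p_i) \rceil</math>, confirm that Kraft's inequality holds (so a prefix-free code with these lengths exists), and confirm that the expected wordlength lies between the two bounds, here with a ternary target alphabet <math>a = 3</math>:

```python
from math import ceil, log, log2

def shannon_lengths(p, a=2):
    """Wordlengths s_i = ceil(log_a(1/p_i)) used in the proof's upper bound."""
    return [ceil(-log(pi, a)) for pi in p]

# Hypothetical example distribution (not from the article)
p = [0.4, 0.3, 0.2, 0.1]
a = 3
s = shannon_lengths(p, a)
# Kraft's inequality holds, so a prefix-free code with these lengths exists
assert sum(a ** -si for si in s) <= 1
ES = sum(pi * si for pi, si in zip(p, s))
H = -sum(pi * log2(pi) for pi in p)
# The expected wordlength obeys both bounds of the theorem
assert H / log2(a) <= ES < H / log2(a) + 1
```

Note that in floating point, <math>\lceil \log_a(1/p_i) \rceil</math> can round incorrectly when <math>p_i</math> is an exact negative power of ''a''; the example distribution avoids that edge case.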
{{FromWikipedia}}
[[Category: Quantum Information Theory]]