Shannon's noiseless coding theorem

'''Shannon's noiseless coding theorem''' places an upper and a lower bound on the minimal possible expected length of codewords, as a function of the [[mathematical entropy|entropy]] of the input word (which is viewed as a [[random variable]]) and of the size of the target alphabet.

==Shannon's statement==

Let ''X'' be a random variable taking values in some finite alphabet \Sigma_1 and let ''f'' be a decipherable code from \Sigma_1 to \Sigma_2^*, where |\Sigma_2| = a. Let ''S'' denote the resulting wordlength of ''f(X)''. If ''f'' is optimal in the sense that it has the minimal expected wordlength for ''X'', then

: \frac{H(X)}{\log_2 a} \leq \mathbb{E}S < \frac{H(X)}{\log_2 a} + 1

==Proof==

Let s_i denote the wordlength of each possible value x_i of ''X'' (i = 1, \ldots, n), and write p_i = \mathbb{P}(X = x_i). Define q_i = a^{-s_i}/C, where ''C'' is chosen so that \sum_{i=1}^n q_i = 1. Then

: \begin{matrix}
H(X) &=& - \sum_{i=1}^n p_i \log_2 p_i \\
&& \\
&\leq& - \sum_{i=1}^n p_i \log_2 q_i \\
&& \\
&=& \sum_{i=1}^n s_i p_i \log_2 a + \log_2 C \\
&& \\
&\leq& \sum_{i=1}^n s_i p_i \log_2 a \\
&& \\
&=& \mathbb{E}S \, \log_2 a \\
\end{matrix}

where the second line follows from [[Gibbs' inequality]], the third line expands \log_2 q_i = -s_i \log_2 a - \log_2 C and uses \sum_{i=1}^n p_i = 1, and the fourth line follows from [[Kraft's inequality]]:

: C = \sum_{i=1}^n a^{-s_i} \leq 1

so \log_2 C \leq 0. Dividing by \log_2 a gives the first inequality.

For the second inequality we may set

: s_i = \lceil - \log_a p_i \rceil

so that

: - \log_a p_i \leq s_i < -\log_a p_i + 1

and so

: a^{-s_i} \leq p_i

and

: \sum_{i=1}^n a^{-s_i} \leq \sum_{i=1}^n p_i = 1

and so by Kraft's inequality there exists a prefix-free code having those wordlengths. The minimal ''S'' therefore has expected wordlength no greater than that of this code, so

: \begin{matrix}
\mathbb{E}S & \leq & \sum_{i=1}^n p_i s_i \\
&& \\
& < & \sum_{i=1}^n p_i \left( -\log_a p_i + 1 \right) \\
&& \\
& = & \sum_{i=1}^n - p_i \frac{\log_2 p_i}{\log_2 a} + 1 \\
&& \\
& = & \frac{H(X)}{\log_2 a} + 1 \\
\end{matrix}
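==Example==

As a simple illustration of both bounds, consider ''X'' uniform on three symbols x_1, x_2, x_3 encoded into a binary target alphabet, so a = 2 and p_i = 1/3. Then

: H(X) = \log_2 3 \approx 1.585

and the theorem asserts

: \log_2 3 \leq \mathbb{E}S < \log_2 3 + 1

i.e. approximately 1.585 \leq \mathbb{E}S < 2.585. The construction in the proof assigns wordlengths s_i = \lceil \log_2 3 \rceil = 2, realised for instance by the prefix-free code \{00, 01, 10\}, whose expected wordlength \mathbb{E}S = 2 satisfies the upper bound. The optimal prefix-free code \{0, 10, 11\} achieves \mathbb{E}S = \tfrac{1}{3}(1 + 2 + 2) = \tfrac{5}{3} \approx 1.667, which also lies between the two bounds; by the lower bound, no decipherable binary code for this source can have expected wordlength below \log_2 3.

{{FromWikipedia}}

[[Category: Quantum Information Theory]]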