Classical relative entropy

In [[probability theory]] and [[information theory]], the '''Kullback–Leibler divergence''', or '''relative entropy''', is a quantity that measures the difference between two [[probability distributions]]. It is named after [[Solomon Kullback]] and [[Richard Leibler]]. The term "divergence" is a misnomer; it is not the same as [[divergence]] in [[vector calculus|calculus]]. One might be tempted to call it a "[[metric space|distance metric]]", but this would also be a misnomer, as the Kullback–Leibler divergence is not [[symmetric]] in its arguments and does not satisfy the [[triangle inequality]].

The Kullback–Leibler divergence between two probability distributions ''p'' and ''q'' is defined as

: <math>\mathrm{KL}(p,q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)} \!</math>

for distributions of a [[discrete]] variable, and as

: <math>\mathrm{KL}(p,q) = \int_{-\infty}^{\infty} p(x) \log_2 \frac{p(x)}{q(x)} \; dx \!</math>

for distributions of a [[continuous random variable]], where ''p'' and ''q'' now denote probability densities. The logarithms in these formulae are conventionally taken to base 2, so that the quantity can be interpreted in units of [[bit|bits]]; the other important properties of the KL divergence hold irrespective of the base of the logarithm.

It can be seen from the definition that

: <math>\mathrm{KL}(p,q) = -\sum_x p(x) \log_2 q(x) + \sum_x p(x) \log_2 p(x) = H(p,q) - H(p), \!</math>

denoting by ''H''(''p'',''q'') the [[cross entropy]] of ''p'' and ''q'', and by ''H''(''p'') the [[information entropy|entropy]] of ''p''. As the cross-entropy is always greater than or equal to the entropy, the Kullback–Leibler divergence is nonnegative; furthermore, ''KL''(''p'',''q'') is zero [[iff]] ''p'' = ''q'', a result known as [[Gibbs' inequality]].

In [[coding theory]], the KL divergence can be interpreted as the extra message length per datum needed to send messages distributed as ''p'', if the messages are encoded using a code that is optimal for the distribution ''q''. In [[Bayesian statistics]], the KL divergence can be used as a measure of the "distance" between the [[prior distribution]] and the [[posterior distribution]]; it is the gain in [[information entropy|Shannon information]] involved in going from the prior to the posterior. In [[Bayesian experimental design]], a design which is optimised to maximise the KL divergence between the prior and the posterior is said to be [[Bayes d-optimality|Bayes d-optimal]]. In [[quantum information science]], a quantum generalisation of the relative entropy is used as a measure of the entanglement of a state.

Kullback and Leibler themselves actually defined the divergence as

: <math>\mathrm{KL}(p,q) + \mathrm{KL}(q,p), \!</math>

which is symmetric and nonnegative.

==References==
* S. Kullback and R. A. Leibler. On information and sufficiency. ''Annals of Mathematical Statistics'' 22(1):79–86, March 1951.

{{FromWikipedia}}

[[Category:Classical Information Theory]]
[[Category:Handbook of Quantum Information]]
[[Category:Entropy]]
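
==Example==
The following is a minimal numerical sketch of the discrete formulas above, written in Python. The helper names (<code>kl_divergence</code>, <code>cross_entropy</code>, <code>entropy</code>) and the two example distributions are arbitrary choices for illustration, not part of any standard library; it simply checks nonnegativity, asymmetry, the cross-entropy decomposition, and the symmetric form used by Kullback and Leibler.

<syntaxhighlight lang="python">
import math

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence KL(p, q) in bits.

    p and q are sequences of probabilities over the same outcomes.
    Terms with p(x) = 0 contribute nothing; the divergence is infinite
    if some outcome has p(x) > 0 but q(x) = 0.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0.0:
            if qx == 0.0:
                return float("inf")
            total += px * math.log2(px / qx)
    return total

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) log2 q(x), in bits."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0.0)

def entropy(p):
    """Shannon entropy H(p) = H(p, p), in bits."""
    return cross_entropy(p, p)

# Two example distributions over the same two outcomes (chosen arbitrarily).
p = [0.9, 0.1]
q = [0.5, 0.5]

print(kl_divergence(p, q))                # about 0.531 bits, and always nonnegative
print(kl_divergence(q, p))                # about 0.737 bits: KL is not symmetric
print(cross_entropy(p, q) - entropy(p))   # equals KL(p, q), i.e. H(p, q) - H(p)
print(kl_divergence(p, p))                # 0.0, since KL(p, q) = 0 iff p = q
print(kl_divergence(p, q) + kl_divergence(q, p))  # the symmetric form defined by Kullback and Leibler
</syntaxhighlight>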