In [[classical information theory]], the '''mutual information''' of two [[random variable]]s is a quantity that measures the mutual dependence of the two variables. Intuitively, the mutual information ''I''(''X'';''Y'') measures the information about ''X'' that is shared by ''Y''. [[Image:classinfo.png]]
If ''X'' and ''Y'' are independent, then ''X'' contains no information about ''Y'' and vice versa, so their mutual information is zero. If ''X'' and ''Y'' are identical then all information conveyed by ''X'' is shared with ''Y'': knowing ''X'' determines the value of ''Y'' and vice versa, so the mutual information is the same as the information conveyed by ''X'' (or ''Y'') alone, namely the [[information entropy|entropy]] of ''X''. In a specific sense (see below), mutual information quantifies how far the [[joint distribution]] of ''X'' and ''Y'' is from the product of their [[marginal distribution]]s.
If we consider a pair of discrete random variables (''X'', ''Y''), then formally the mutual information
can be defined as
:<math>I(X;Y) = H(X) + H(Y) - H(X,Y),</math>
with ''H''(''X'') and ''H''(''Y'') the [[Shannon entropy|Shannon entropies]] of ''X'' and ''Y'', and ''H''(''X'',''Y'')
the Shannon entropy of the pair (''X'',''Y''). In terms of the probabilities, the mutual information can be written as
:<math>I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{f(x)\,g(y)},</math>
where ''p'' is the [[joint distribution|joint probability distribution function]] of ''X'' and ''Y'', and ''f'' and ''g'' are the marginal probability distribution functions of ''X'' and ''Y'' respectively.
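The discrete formula lends itself to a direct numerical check. Below is a minimal Python sketch (using NumPy; the 2×2 joint table and the helper names <code>mutual_information</code> and <code>entropy</code> are illustrative assumptions, not taken from the article) that evaluates the sum above and compares it with the entropy-based definition ''H''(''X'') + ''H''(''Y'') − ''H''(''X'',''Y'').
<pre>
# Illustrative sketch: mutual information of two discrete random variables
# from their joint probability table. The 2x2 joint distribution below is an
# arbitrary example, not taken from the article.
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = p[p > 0]                  # convention: 0 log 0 = 0
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (f(x) g(y)) ]."""
    f = p_xy.sum(axis=1)          # marginal distribution of X
    g = p_xy.sum(axis=0)          # marginal distribution of Y
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / np.outer(f, g)[mask]))

p_xy = np.array([[0.4, 0.1],      # rows index x, columns index y
                 [0.1, 0.4]])
f, g = p_xy.sum(axis=1), p_xy.sum(axis=0)

print(mutual_information(p_xy))                          # ~0.278 bits
print(entropy(f) + entropy(g) - entropy(p_xy.ravel()))   # same value
print(mutual_information(np.outer(f, g)))                # independent case: 0.0
</pre>
The last line illustrates the remark above: when the joint distribution factorises into the product of the marginals, the mutual information vanishes.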
In the [[continuum|continuous]] case, we replace summation by a definite [[double integral]]:
:<math>I(X;Y) = \int_Y \int_X p(x,y) \log \frac{p(x,y)}{f(x)\,g(y)} \; dx \, dy,</math>
where ''p'' is now the joint probability ''density'' function of ''X'' and ''Y'', and ''f'' and ''g'' are the marginal probability density functions of ''X'' and ''Y'' respectively.
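As an illustration of the continuous case, the following sketch (assuming SciPy is available; the correlation value ρ = 0.8 is an arbitrary choice) evaluates the double integral numerically for a bivariate Gaussian pair and compares the result with the standard closed form −½ ln(1 − ρ²) nats.
<pre>
# Illustrative sketch: continuous mutual information of a bivariate Gaussian
# pair with unit variances and correlation rho, checked against the known
# closed form I(X;Y) = -1/2 * ln(1 - rho^2) (in nats).
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal, norm

rho = 0.8
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def integrand(y, x):
    p = joint.pdf([x, y])                      # joint density p(x, y)
    return p * np.log(p / (norm.pdf(x) * norm.pdf(y)))

# Truncate the infinite range at +/- 8 standard deviations.
numeric, _ = integrate.dblquad(integrand, -8, 8, lambda x: -8, lambda x: 8)
closed_form = -0.5 * np.log(1 - rho**2)

print(numeric, closed_form)                    # both ~0.511 nats
</pre>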
Mutual information is nonnegative (i.e. ''I''(''X'';''Y'') ≥ 0; see below), as follows from [[subadditivity]] of the Shannon entropy, and [[symmetry|symmetric]] (i.e. ''I''(''X'';''Y'') = ''I''(''Y'';''X'')).
==Relation to other quantities==
Mutual information can be equivalently expressed as
:<math>I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X),</math>
where ''H''(''X''|''Y'') and ''H''(''Y''|''X'') are the [[conditional entropy|conditional entropies]].
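Continuing the discrete sketch from above (same illustrative 2×2 joint table, which is an arbitrary example), this identity can be checked numerically; the conditional entropies are computed directly from the joint and marginal distributions.
<pre>
# Illustrative check of I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X),
# using the same arbitrary joint table as in the discrete example above.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.4, 0.1],      # rows index x, columns index y
                 [0.1, 0.4]])
f, g = p_xy.sum(axis=1), p_xy.sum(axis=0)

# H(X|Y) = -sum_{x,y} p(x,y) log2[ p(x,y) / g(y) ], and analogously H(Y|X).
H_X_given_Y = -np.sum(p_xy * np.log2(p_xy / g))
H_Y_given_X = -np.sum(p_xy * np.log2(p_xy / f[:, None]))

print(entropy(f) - H_X_given_Y)   # ~0.278 bits
print(entropy(g) - H_Y_given_X)   # same value
</pre>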
Mutual information can also be expressed in terms of the [[Kullback-Leibler divergence]] between the [[joint distribution]] of two random variables ''X'' and ''Y'' and the product of their [[marginal distribution]]s. Let ''q''(''x'', ''y'') = ''f''(''x'') × ''g''(''y''); then
:<math>I(X;Y) = D_{\mathrm{KL}}(p \,\|\, q).</math>
Furthermore, let ''h''<sub>''y''</sub>(''x'') = ''p''(''x'', ''y'') / ''g''(''y''). Then
:<math>I(X;Y) = \sum_y g(y) \sum_x h_y(x) \log \frac{h_y(x)}{f(x)}</math>
::<math>= \sum_y g(y) \, D_{\mathrm{KL}}(h_y \,\|\, f)</math>
::<math>= \mathbb{E}_{Y}\left\{ D_{\mathrm{KL}}(h_Y \,\|\, f) \right\}.</math>
Thus mutual information can also be understood as the [[expected value|expectation]] of the Kullback-Leibler divergence between the [[conditional distribution]] ''h''<sub>''y''</sub> of ''X'' given ''Y'' = ''y'' and the univariate distribution ''f'' of ''X'': the more different the distributions ''h''<sub>''y''</sub> and ''f'', the greater the information gain.
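Both Kullback-Leibler forms can be checked on the same illustrative joint table used in the discrete sketches above; the helper <code>kl</code> is an assumed name, written out here for completeness.
<pre>
# Illustrative check of the two Kullback-Leibler forms of mutual information.
import numpy as np

def kl(p, q):
    """D_KL(p || q) in bits, for strictly positive distributions."""
    return np.sum(p * np.log2(p / q))

p_xy = np.array([[0.4, 0.1],      # rows index x, columns index y
                 [0.1, 0.4]])
f, g = p_xy.sum(axis=1), p_xy.sum(axis=0)

# I(X;Y) = D_KL(p || q) with q(x,y) = f(x) g(y)
q_xy = np.outer(f, g)
print(kl(p_xy.ravel(), q_xy.ravel()))                       # ~0.278 bits

# I(X;Y) = E_Y[ D_KL(h_y || f) ] with h_y(x) = p(x,y) / g(y)
h = p_xy / g                                                # column y holds h_y
print(sum(g[y] * kl(h[:, y], f) for y in range(len(g))))    # same value
</pre>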
[[Category:Handbook of Quantum Information]]
[[Category:Classical Information Theory]]