The **conditional entropy** measures how much entropy a random variable *X* has remaining if we have already learned the value of a second random variable *Y*. It is referred to as *the entropy of X conditional on Y*, and is written *H*(*X*∣*Y*). If the probability that *X* = *x* is denoted by *p*(*x*), then we denote by *p*(*x*∣*y*) the probability that *X* = *x*, given that we already know that *Y* = *y*. The quantity *p*(*x*∣*y*) is a conditional probability. In Bayesian language, *Y* represents our prior information about *X*.

The conditional entropy is just the Shannon entropy with *p*(*x*∣*y*) replacing *p*(*x*), averaged over all possible values *y* of *Y* with weights *p*(*y*):

$$H(X|Y) := -\sum_{x,y} p(y)\, p(x|y) \log p(x|y).$$
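As a minimal sketch of how this definition can be evaluated numerically, the following Python snippet computes *H*(*X*∣*Y*) from a joint distribution, using the usual sign convention that makes the entropy non-negative. The function name `conditional_entropy`, the dictionary representation of the joint distribution, and the example numbers are assumptions made for illustration only; logarithms are taken to base 2, so the result is in bits.

```python
import math


def conditional_entropy(joint, base=2.0):
    """Return H(X|Y) for a joint distribution given as {(x, y): p(x, y)}."""
    # Marginal p(y), obtained by summing the joint distribution over x.
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p

    h = 0.0
    for (x, y), p in joint.items():
        if p > 0.0:  # zero-probability terms contribute nothing (0 log 0 := 0)
            p_x_given_y = p / p_y[y]
            # p(y) p(x|y) log p(x|y) equals p(x, y) log p(x|y)
            h -= p * math.log(p_x_given_y, base)
    return h


# Hypothetical example: Y is a fair bit and X equals Y with probability 3/4.
joint = {(0, 0): 0.375, (1, 0): 0.125, (0, 1): 0.125, (1, 1): 0.375}
h_x_given_y = conditional_entropy(joint)  # about 0.811 bits
```

In this example, learning *Y* leaves roughly 0.811 bits of uncertainty about *X*, compared with 1 bit if *Y* were unknown.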

Using the product rule *p*(*x*, *y*) = *p*(*x*∣*y*) *p*(*y*), one finds that the conditional entropy is equal to *H*(*X*∣*Y*) = *H*(*X*, *Y*) − *H*(*Y*), where *H*(*X*, *Y*) is the joint entropy of *X* and *Y*.
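The intermediate steps can be sketched as follows, assuming the standard sign convention in which entropies carry a leading minus sign, and using the marginal *p*(*y*) = Σ<sub>*x*</sub> *p*(*x*, *y*):

$$
\begin{aligned}
H(X|Y) &= -\sum_{x,y} p(y)\, p(x|y) \log p(x|y) \\
       &= -\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(y)} \\
       &= -\sum_{x,y} p(x,y) \log p(x,y) + \sum_{y} p(y) \log p(y) \\
       &= H(X,Y) - H(Y).
\end{aligned}
$$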
