
# Conditional entropy

In information theory, the conditional entropy (or equivocation) quantifies the entropy (i.e. uncertainty) that remains in a random variable Y once the value of a second random variable X is known. It is referred to as the entropy of Y conditional on X, and is written H(Y | X). Like other entropies, the conditional entropy is measured in bits, nats, or hartleys, depending on whether the logarithm is taken to base 2, e, or 10.

Given discrete random variables X with support $\mathcal X$ and Y with support $\mathcal Y$, the conditional entropy of Y given X is defined as the average, weighted by $p(x)$, of the entropy of Y for each fixed value $X=x$:
\begin{align}
H(Y|X)\ &\stackrel{\mathrm{def}}{=}\sum_{x\in\mathcal X}\,p(x)\,H(Y|X=x)\\
&=-\sum_{x\in\mathcal X}p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log\,p(y|x)\\
&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(y,x)\,\log\,p(y|x)\\
&=-E_{p(x,y)}\log\,p(Y|X),
\end{align}
where terms with $p(y,x)=0$ are taken to contribute zero.
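The double-sum form of the definition translates directly into code. The following is a minimal sketch, assuming the joint distribution is supplied as a dictionary mapping `(x, y)` pairs to probabilities; the function name `conditional_entropy` and the example distribution are illustrative choices, not part of the source.

```python
import math

def conditional_entropy(joint, base=2.0):
    """Return H(Y|X) for a joint pmf given as {(x, y): p(x, y)}."""
    # Marginal p(x) = sum over y of p(x, y).
    p_x = {}
    for (x, _y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p

    # H(Y|X) = -sum_{x,y} p(x, y) * log p(y|x), with p(y|x) = p(x, y) / p(x).
    h = 0.0
    for (x, _y), p in joint.items():
        if p > 0.0:  # zero-probability terms contribute nothing
            h -= p * math.log(p / p_x[x], base)
    return h

# Hypothetical example: when X = "a", Y is a fair coin (1 bit of uncertainty);
# when X = "b", Y is always 0 (no uncertainty). Each X value has probability 0.5.
joint = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.5}

h = conditional_entropy(joint)
print(h)  # approximately 0.5 bits: the average of 1 bit and 0 bits
```

Passing `base=math.e` or `base=10` would give the same quantity in nats or hartleys.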

From this definition and the definition of conditional probability, $p(y|x)=p(y,x)/p(x)$, the chain rule for conditional entropy is $H(Y|X)\,=\,H(Y,X)-H(X)$.

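This is true because
\begin{align}
H(Y|X)&=-E_{p(x,y)}\log\,p(Y|X)\\
&=-E_{p(x,y)}\log\left(\frac{p(Y,X)}{p(X)}\right)\\
&=-E_{p(x,y)}(\log p(Y,X)-\log p(X))\\
&=-E_{p(x,y)}\log p(Y,X)+E_{p(x)}\log p(X)\\
&=H(Y,X)-H(X).
\end{align}
The fourth line uses the fact that $\log p(X)$ does not depend on Y, so its expectation under $p(x,y)$ equals its expectation under the marginal $p(x)$.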
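The chain rule can be checked numerically by computing both sides independently. This is a sketch under assumed inputs: the joint distribution below is made up for illustration, and `entropy` is a hypothetical helper that returns the Shannon entropy in bits of any pmf stored as a dictionary.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)

# Hypothetical joint pmf p(x, y) over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal p(x), obtained by summing the joint over y.
p_x = {}
for (x, _y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Left-hand side: H(Y|X) straight from the definition.
h_direct = -sum(
    p * math.log2(p / p_x[x]) for (x, _y), p in joint.items() if p > 0.0
)

# Right-hand side: H(Y,X) - H(X).
h_chain = entropy(joint) - entropy(p_x)

print(h_direct, h_chain)  # the two values agree up to floating-point rounding
```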

Intuitively, the combined system contains H(X,Y) bits of information: on average, we need H(X,Y) bits to describe its exact state. If we learn the value of X, we have gained H(X) bits of information, and H(Y | X) bits of uncertainty remain. H(Y | X) = 0 if and only if the value of Y is completely determined by the value of X. Conversely, H(Y | X) = H(Y) if and only if Y and X are independent random variables.
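The two extreme cases can be illustrated with a short sketch. The marginals `p_x` and `p_y`, the deterministic relation Y = X mod 2, and the helper names are all assumptions chosen for the example; the conditional entropy is computed through the chain rule H(Y | X) = H(X,Y) - H(X).

```python
import math
from itertools import product

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)

def conditional_entropy(joint):
    """H(Y|X) in bits via the chain rule, for joint = {(x, y): p(x, y)}."""
    p_x = {}
    for (x, _y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return entropy(joint) - entropy(p_x)

# Hypothetical marginals used to build both joint distributions.
p_x = {0: 0.5, 1: 0.3, 2: 0.2}
p_y = {0: 0.7, 1: 0.3}

# Case 1: Y is a function of X (here Y = X mod 2), so knowing X fixes Y.
deterministic = {(x, x % 2): p for x, p in p_x.items()}

# Case 2: X and Y are independent, so p(x, y) = p(x) * p(y).
independent = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

h_det = conditional_entropy(deterministic)
h_ind = conditional_entropy(independent)
h_y = entropy(p_y)

print(h_det)       # zero, up to rounding: no uncertainty about Y remains
print(h_ind, h_y)  # equal, up to rounding: learning X tells us nothing about Y
```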

In quantum information theory, the conditional entropy is generalized to the conditional quantum entropy.
