# Joint entropy

The joint entropy is an entropy measure used in information theory. It measures how much entropy is contained in a joint system of two random variables. If the random variables are X and Y, the joint entropy is written H(X,Y). Like other entropies, the joint entropy can be measured in bits, nats, or hartleys, depending on the base of the logarithm (2, e, or 10 respectively).

## Background

Given a random variable X, the entropy H(X) describes our uncertainty about the value of X. If X can take several values x, each occurring with probability $p_x$, then the entropy of X is

$H(X) = -\sum_x p_x \log_2(p_x) \!$
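The formula above can be evaluated directly. The following is a minimal sketch in Python; the function name `entropy` and the two sample distributions are illustrative choices, not part of the original article.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    # Outcomes with p == 0 are skipped: they contribute nothing to the sum,
    # since p * log2(p) tends to 0 as p tends to 0.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

h_coin = entropy([0.5, 0.5])       # a fair two-valued variable
h_uniform8 = entropy([1 / 8] * 8)  # a uniform choice among 8 outcomes

print(h_coin, h_uniform8)
```

A fair two-valued variable carries 1 bit, and a uniform choice among 8 outcomes carries 3 bits.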

Consider another random variable Y, taking values y with probabilities $p_y$. Y has entropy H(Y).

However, if X and Y describe related events, the total entropy of the system may not be H(X) + H(Y). For example, imagine we choose an integer from 1 to 8 inclusive, with equal probability for each integer. Let X represent whether the integer is even, and Y represent whether the integer is prime. One-half of the integers from 1 to 8 are even, and one-half are prime (2, 3, 5 and 7), so H(X) = H(Y) = 1 bit. However, if we know that the integer is even, there is only a 1 in 4 chance that it is also prime, rather than 1 in 2; the distributions are related. The total entropy of the system is therefore less than 2 bits. We need a way of measuring the total entropy of both systems together.

## Definition

We solve this by considering each pair of possible outcomes (x,y). If each pair of outcomes occurs with probability $p_{x,y}$, the joint entropy is defined as

$H(X,Y) = -\sum_{x,y} p_{x,y} \log_2(p_{x,y}) \!$

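In the example above, 1 is not counted as a prime. Only 2 is both even and prime, and only 1 is both odd and not prime, so the joint probability distribution is:

$P(\text{even}, \text{prime}) = P(\text{odd}, \text{not prime}) = 1/8$

$P(\text{even}, \text{not prime}) = P(\text{odd}, \text{prime}) = 3/8$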

Thus, the joint entropy is

$H(X,Y) = -2 \cdot \frac{1}{8}\log_2\frac{1}{8} - 2 \cdot \frac{3}{8}\log_2\frac{3}{8} \approx 1.8$ bits.
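This calculation can be reproduced by enumerating the eight integers and counting how often each (even, prime) pair occurs. The sketch below is illustrative only; the helper `is_prime` and the variable names are assumptions introduced for this example.

```python
import math
from collections import Counter

def is_prime(n):
    """Trial-division primality test; 1 is not counted as prime."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

# Each integer from 1 to 8 is equally likely; record the pair (is even, is prime).
outcomes = range(1, 9)
counts = Counter((n % 2 == 0, is_prime(n)) for n in outcomes)

# Convert counts to joint probabilities p_{x,y}.
joint = {pair: c / len(outcomes) for pair, c in counts.items()}

# H(X,Y) = -sum over all pairs of p_{x,y} * log2(p_{x,y})
joint_entropy = -sum(p * math.log2(p) for p in joint.values() if p > 0)

print(joint)
print(round(joint_entropy, 4))
```

The result is about 1.81 bits, below the 2 bits that H(X) + H(Y) would suggest.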

## Properties

### Greater than subsystem entropies

The joint entropy is always at least as large as the entropy of either original system; adding a new system can never reduce the available uncertainty.

$H(X,Y) \geq H(X)$

This inequality is an equality if and only if Y is a (deterministic) function of X.

If Y is a (deterministic) function of X, we also have

$H(X) \geq H(Y)$

Two systems, considered together, can never have more entropy than the sum of the entropy in each of them. This is an example of subadditivity.

$H(X,Y) \leq H(X) + H(Y)$

This inequality is an equality if and only if X and Y are statistically independent.
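Both inequalities, and their equality cases, can be checked numerically on small distributions. The sketch below uses three hypothetical joint distributions chosen for illustration: the parity/primality example from the Definition section, two independent fair coins, and a case where Y is a function of X. The helper names `marginals` and `entropies` are assumptions of this example.

```python
import math
from collections import defaultdict

def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py

def entropies(joint):
    """Return the triple (H(X), H(Y), H(X,Y)) in bits."""
    px, py = marginals(joint)
    return entropy(px.values()), entropy(py.values()), entropy(joint.values())

# Related variables: parity and primality of a uniform integer from 1 to 8.
related = {("even", "prime"): 1 / 8, ("odd", "not prime"): 1 / 8,
           ("even", "not prime"): 3 / 8, ("odd", "prime"): 3 / 8}
# Independent variables: two fair coins.
independent = {(x, y): 0.25 for x in "HT" for y in "HT"}
# Y is a deterministic function of X: X uniform on 0..3, Y = X mod 2.
functional = {(x, x % 2): 0.25 for x in range(4)}

for name, joint in [("related", related), ("independent", independent),
                    ("functional", functional)]:
    hx, hy, hxy = entropies(joint)
    # Small tolerance guards against floating-point rounding.
    assert max(hx, hy) <= hxy + 1e-12 <= hx + hy + 2e-12
    print(name, round(hx, 3), round(hy, 3), round(hxy, 3))
```

For the independent coins, H(X,Y) equals H(X) + H(Y) = 2 bits. For the functional case, H(X,Y) equals H(X) = 2 bits while H(Y) is only 1 bit.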

### Bounds

Like other entropies of discrete random variables, $H(X,Y) \geq 0$ always.

## Relations to other entropy measures

The joint entropy is used in the definitions of the conditional entropy:

$H(X|Y) = H(X,Y) - H(Y)\,$

and the mutual information:

$I(X;Y) = H(X) + H(Y) - H(X,Y)\,$
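As a worked illustration, these two relations can be applied to the parity/primality example from the Definition section. This is a sketch under the assumption that the joint distribution is the one given there; the variable names are illustrative. The mutual information is also computed from its direct definition, $\sum_{x,y} p_{x,y} \log_2 \frac{p_{x,y}}{p_x p_y}$, as a cross-check.

```python
import math
from collections import defaultdict

def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Joint distribution of X (parity) and Y (primality) for a uniform integer 1..8.
joint = {("even", "prime"): 1 / 8, ("odd", "not prime"): 1 / 8,
         ("even", "not prime"): 3 / 8, ("odd", "prime"): 3 / 8}

# Marginal distributions p_x and p_y.
px, py = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

h_x = entropy(px.values())
h_y = entropy(py.values())
h_xy = entropy(joint.values())

h_x_given_y = h_xy - h_y      # H(X|Y) = H(X,Y) - H(Y)
mutual_info = h_x + h_y - h_xy  # I(X;Y) = H(X) + H(Y) - H(X,Y)

# Cross-check using the direct definition of mutual information.
mutual_info_direct = sum(p * math.log2(p / (px[x] * py[y]))
                         for (x, y), p in joint.items() if p > 0)

print(round(h_x_given_y, 4), round(mutual_info, 4), round(mutual_info_direct, 4))
```

Knowing whether the integer is prime leaves about 0.81 bits of uncertainty about its parity, so the two variables share about 0.19 bits of information.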

In quantum information theory, the joint entropy is generalized into the joint quantum entropy.

## References

1. Korn, Granino Arthur; Korn, Theresa M. Mathematical Handbook for Scientists and Engineers: Definitions, Theorems, and Formulas for Reference and Review. New York: Dover Publications, pp. 613-614. ISBN 0-486-41147-8.