My watch list
my.chemeurope.com

# Differential entropy

Differential entropy (also referred to as continuous entropy) is a concept in information theory which tries to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continuous probability distributions.

## Definition

Let X be a random variable with a probability density function f whose support is a set $\mathbb X$. The differential entropy h(X) or h(f) is defined as $h(X) = -\int_\mathbb{X} f(x)\log f(x)\,dx.$

As with its discrete analog, the units of differential entropy depend on the base of the logarithm, which is usually 2 (i.e., the units are bits). See logarithmic units for logarithms taken in different bases. Related concepts such as joint, conditional differential entropy, and relative entropy are defined in a similar fashion. One must take care in trying to apply properties of discrete entropy to differential entropy, since probability density functions can be greater than 1. For example, Uniform(0,1/2) has differential entropy $\int_0^\frac{1}{2} -2\log2\,dx=-\log2\,$.

The definition of differential entropy above can be obtained by partitioning the range of X into bins of length Δ with associated sample points iΔ within the bins, for X Riemann integrable. This gives a quantized version of X, defined by XΔ = iΔ if $i\Delta \leq X \leq (i+1)\Delta$. Then the entropy of XΔ is $-\sum_i f(i\Delta)\Delta\log f(i\Delta) - \sum f(i\Delta)\Delta\log \Delta$.

The first term approximates the differential entropy, while the second term is approximately − log(Δ). Note that this procedure suggests that the differential entropy of a discrete random variable should be $-\infty$.

Note that the continuous mutual information I(X;Y) has the distinction of retaining its fundamental significance as a measure of discrete information since it is actually the limit of the discrete mutual information of partitions of X and Y as these partitions become finer and finer. Thus it is invariant under quite general transformations of X and Y, and still represents the amount of discrete information that can be transmitted over a channel that admits a continuous space of values.

## Properties of differential entropy

• For two densities f and g, $D(f||g) \geq 0$ with equality if f = g almost everywhere. Similarly, for two random variables X and Y, $I(X;Y) \geq 0$ and $h(X|Y) \leq h(X)$ with equality if and only if X and Y are independent.
• The chain rule for differential entropy holds as in the discrete case $h(X_1, \ldots, X_n) = \sum_{i=1}^{n} h(X_i|X_1, \ldots, X_{i-1}) \leq \sum h(X_i)$.
• Differential entropy is translation invariant, ie, h(X + c) = h(X) for a constant c.
• Differential entropy is in general not invariant under arbitrary invertible maps. In particular, for a constant a, $h(aX) = h(X) + \log \left| a \right|$. For a vector valued random variable X and a matrix A, $h(A\mathbf{X}) = h(\mathbf{X}) + \log(\det A)$.
• If a random vector $\mathbf{X} \in \mathbb{R}^{n}$ has mean zero and covariance matrix K, $h(\mathbf{X}) \leq \frac{1}{2} \log[(2\pi e)^n \det{K}]$ with equality if and only if X is jointly gaussian.

## Example: Exponential distribution

Let X be an exponentially distributed random variable with parameter λ, that is, with probability density function $f(x) = \lambda e^{-\lambda x} \mbox{ for } x \geq 0.$

Its differential entropy is then $h_e(X)\,$ $=-\int_0^\infty \lambda e^{-\lambda x} \log (\lambda e^{-\lambda x})\,dx$ $= -\left(\int_0^\infty (\log \lambda)\lambda e^{-\lambda x}\,dx + \int_0^\infty (-\lambda x) \lambda e^{-\lambda x}\,dx\right)$ $= -\log \lambda \int_0^\infty f(x)\,dx + \lambda E[X]$ $= -\log\lambda + 1\,.$

Here, he(X) was used rather than h(X) to make it explicit that the logarithm was taken to base e, to simplify the calculation.

## Differential entropies for various distributions

In the table below, $\Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1} dt$ (the gamma function), $\psi(x) = \frac{d}{dx} \Gamma(x)$, $B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$, and γ is Euler's constant.

Table of differential entropies.
Distribution Name Probability density function (pdf) Entropy in nats
Uniform $f(x) = \frac{1}{b-a}$ for $a \leq x \leq b$ $\ln(b - a) \,$
Normal $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ $\ln\left(\sigma\sqrt{2\,\pi\,e}\right)$
Exponential $f(x) = \lambda \exp\left(-\lambda x\right)$ $1 - \ln \lambda \,$
Rayleigh $f(x) = \frac{x}{b^2} \exp\left(-\frac{x^2}{2b^2}\right)$ $1 + \ln \frac{\beta}{\sqrt{2}} + \frac{\gamma}{2}$
Beta $f(x) = \frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}$ for $0 \leq x \leq 1$ $\ln B(p,q) - (p-1)[\psi(p) - \psi(p + q)] - (q-1)[\psi(q) - \psi(p + q)] \,$
Cauchy $f(x) = \frac{\lambda}{\pi} \frac{1}{\lambda^2 + x^2}$ $\ln(4\pi\lambda) \,$
Chi $f(x) = \frac{2}{2^{n/2} \sigma^n \Gamma(n/2)} x^{n-1} \exp\left(-\frac{x^2}{2\sigma^2}\right)$ $\ln{\frac{\sigma\Gamma(n/2)}{\sqrt{2}}} - \frac{n-1}{2} \psi\left(\frac{n}{2}\right) + \frac{n}{2}$
Chi-squared $f(x) = \frac{1}{2^{n/2} \sigma^n \Gamma(n/2)} x^{\frac{n}{2} - 1} \exp\left(-\frac{x}{2\sigma^2}\right)$ $\ln 2\sigma^{2}\Gamma\left(\frac{n}{2}\right) - \left(1 - \frac{n}{2}\right)\psi\left(\frac{n}{2}\right) + \frac{n}{2}$

Erlang $f(x) = \frac{\beta^n}{(n-1)!} x^{n-1} \exp(-\beta x)$ $(1-n)\psi(n) + \ln \frac{\Gamma(n)}{\beta} + n$
F $f(x) = \frac{n_1^{\frac{n_1}{2}} n_2^{\frac{n_2}{2}}}{B(\frac{n_1}{2},\frac{n_2}{2})} \frac{x^{\frac{n_1}{2} - 1}}{(n_2 + n_1 x)^{\frac{n_1 + n2}{2}}}$ $\ln \frac{n_1}{n_2} B\left(\frac{n_1}{2},\frac{n_2}{2}\right) + \left(1 - \frac{n_1}{2}\right) \psi\left(\frac{n_1}{2}\right) -$ $\left(1 + \frac{n_2}{2}\right)\psi\left(\frac{n_2}{2}\right) + \frac{n_1 + n_2}{2} \psi\left(\frac{n_1 + n_2}{2}\right)$

Gamma $f(x) = \frac{x^{\alpha - 1} \exp(-\frac{x}{\beta})}{\beta^\alpha \Gamma(\alpha)}$ $\ln(\beta \Gamma(a)) + (1 - \alpha)\psi(\alpha) + \alpha \,$
Laplace $f(x) = \frac{1}{2\lambda} \exp(-\frac{|x - \theta|}{\lambda})$ $1 + \ln(2\lambda) \,$
Logistic $f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$ $2 \,$
Lognormal $f(x) = \frac{1}{\sigma x \sqrt{2\pi}} \exp\left(-\frac{(\ln x - m)^2}{2\sigma^2}\right)$ $m + \frac{1}{2} \ln(2\pi e \sigma^2)$
Maxwell-Boltzmann $f(x) = 4 \pi^{-\frac{1}{2}} \beta^{\frac{3}{2}} x^{2} \exp(-\beta x^2)$ $\frac{1}{2} \ln \frac{\pi}{\beta} + \gamma - 1/2$
Generalized normal $f(x) = \frac{2 \beta^{\frac{\alpha}{2}}}{\Gamma(\frac{\alpha}{2})} x^{\alpha - 1} \exp(-\beta x^2)$ $\ln{\frac{\Gamma(\alpha/2)}{2\beta^{\frac{1}{2}}}} - \frac{\alpha - 1}{2} \psi\left(\frac{\alpha}{2}\right) + \frac{\alpha}{2}$
Pareto $f(x) = \frac{a k^a}{x^{a+1}}$ $\ln \frac{k}{a} + 1 + \frac{1}{a}$
Student's t $f(x) = \frac{(1 + x^2/n)^{-\frac{n+1}{2}}}{\sqrt{n}B(\frac{1}{2},\frac{n}{2})}$ $\frac{n+1}{2}\psi\left(\frac{n+1}{2}\right) - \psi\left(\frac{n}{2}\right) + \ln \sqrt{n} B\left(\frac{1}{2},\frac{n}{2}\right)$
Triangular $f(x) = \begin{cases} \frac{2x}{a} & 0 \leq x \leq a\\ \frac{2(1-x)}{1-a} & a \leq x \leq 1 \end{cases}$ $\frac{1}{2} - \ln 2$
Weibull $f(x) = \frac{c}{\alpha} x^{c-1} \exp\left(-\frac{x^c}{\alpha}\right)$ $\frac{(c-1)\gamma}{c} + \ln \frac{\alpha^{1/c}}{c} + 1$
Multivariate normal $f_X(x_1, \dots, x_N) =$ $\frac{1} {(2\pi)^{N/2} \left|\Sigma\right|^{1/2}} \exp \left( -\frac{1}{2} ( x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$ $\frac{1}{2}\ln\{(2\pi e)^{N} \det(\Sigma)\}$