====Introduction====
Classical entropy quantities are a way to quantify how much information is revealed in a random event. Shannon first introduced a way to quantify information by associating with an event occurring with probability p an amount of information -\log p (as is standard in information theory, logarithms are taken in base 2 and so information is measured in bits). If an unlikely event occurs, one gains a large amount of information. The Shannon entropy is the average amount of information gained by observing the outcome of an event. In some cases, in particular ones relevant to cryptography, the average amount of information is not the appropriate quantity, since its operational meaning relies on many independent repetitions of the same experiment.
====Random Events====
When we discuss random events, we assume that they occur according to
a pre-defined ensemble of possible outcomes and associated
probabilities. A random variable X takes values in the ensemble
\{1,\ldots,|X|\} with probabilities \{p_1,\ldots,p_{|X|}\}. We call
this probability distribution P_X. The notation X=x refers to a
single instance drawn from this distribution taking the value x. One
similarly defines distributions over more than one random variable.
For instance, P_{XY} is the joint distribution of X and Y, and
P_{X|Y=y} is the distribution of X conditioned on Y taking the
value y.
====Shannon Entropy====
It was Shannon who pioneered the mathematical formulation of
information. In essence his insight was that an event
that occurs with probability p could be associated with an amount of
information -\log p. Consider many independent
repetitions of random event X. The average information revealed by
each instance of X is given by the Shannon entropy of X defined as
follows.
The '''Shannon entropy''' of a random variable X distributed
according to P_X is
H(X)\equiv\sum_{x\in X}-P_X(x)\log P_X(x).
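For instance, a biased coin that lands heads with probability 0.9 has H(X)\approx 0.47 bits, compared with 1 bit for a fair coin. The short Python sketch below computes this directly from the definition (an illustrative example; the distributions are chosen arbitrarily):
<pre>
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A biased coin reveals less information per toss than a fair one.
print(shannon_entropy([0.9, 0.1]))   # ~0.469 bits
print(shannon_entropy([0.5, 0.5]))   # 1.0 bit
</pre>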
Likewise, one can define conditional Shannon entropies. H(X|Y=y)
denotes the Shannon entropy of X given that Y takes the value y. It
measures the average amount of information one learns from a single
instance of X if one already knows the value y of Y, where X,Y are
chosen according to the joint distribution P_{XY}. Averaging this
quantity over y gives H(X|Y), the conditional Shannon entropy.
The '''conditional Shannon entropy''' of an event X given Y is defined by
H(X|Y)\equiv\sum_{x\in X, y\in Y}-P_Y(y)P_{X|Y=y}(x)\log P_{X|Y=y}(x).
This leads one to define the '''mutual Shannon information''' between X
and Y by
I(X:Y)\equiv H(X)-H(X|Y)=H(Y)-H(Y|X).
In some sense, this is the amount of information common to X and Y.
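As an illustration, H(X|Y) and I(X:Y) can be computed directly from a joint distribution by following the definitions above; the Python sketch below uses an arbitrarily chosen P_{XY}:
<pre>
import math

def H(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An arbitrarily chosen joint distribution P_XY, as a dictionary {(x, y): probability}.
P_XY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals P_X and P_Y.
P_X, P_Y = {}, {}
for (x, y), p in P_XY.items():
    P_X[x] = P_X.get(x, 0.0) + p
    P_Y[y] = P_Y.get(y, 0.0) + p

# H(X|Y) = sum_y P_Y(y) H(X|Y=y), using the conditional distributions P_{X|Y=y}.
H_X_given_Y = sum(
    P_Y[y] * H([P_XY[x, y] / P_Y[y] for x in P_X if (x, y) in P_XY])
    for y in P_Y
)
I_XY = H(list(P_X.values())) - H_X_given_Y
print(H_X_given_Y, I_XY)   # ~0.72 bits and ~0.28 bits
</pre>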
Shannon information was first used to solve problems of compression,
and communication over a noisy channel, as given in the following
theorems.
'''Source coding theorem:'''
Consider a source emitting independent and identically distributed
(i.i.d.) random variables drawn from distribution P_X. For any
\epsilon>0 and R>H(X), there exists an encoder such that for
sufficiently large N, any sequence drawn from P_X^N can be
compressed to length NR, and a decoder such that, except with
probability <\epsilon, the original sequence can be restored from
the compressed string.
Furthermore, if one tries to compress the same source using only
R<H(X) bits per instance, it is virtually certain that information
will be lost.
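As a rough numerical illustration (using a hypothetical biased binary source chosen only as an example), a source with P_X(1)=0.1 has H(X)\approx 0.47, so by the theorem a block of a million symbols can be compressed to a little under half a million bits:
<pre>
import math

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

N = 10**6                           # block length
rate = shannon_entropy([0.9, 0.1])  # ~0.469 bits per symbol
# By the source coding theorem, roughly N * H(X) bits suffice for large N.
print(math.ceil(N * rate))          # ~469,000 bits
</pre>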
For a discrete, memoryless channel, in which Alice sends a random
variable X to Bob who receives Y, the ''channel capacity'' is defined by C\equiv \max_{P_X}I(X:Y).
'''Noisy channel coding theorem:'''
Consider Alice communicating with Bob via a discrete memoryless channel
which has the property that if Alice draws from an i.i.d. source X,
Bob receives Y. For any \epsilon>0 and R<C, for large enough
N, there exists an encoding of length N and a decoder such that
\geq RN bits of information are conveyed by the channel for each
encoder-channel-decoder cycle, except with probability <\epsilon.
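As a standard example (the binary symmetric channel is not treated above and is used here only for illustration), a channel that flips each transmitted bit with probability p has capacity C = 1 - H(p), where H(p) is the binary entropy:
<pre>
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

print(bsc_capacity(0.11))   # ~0.5 bits per channel use
</pre>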
Notice that in the noisy channel coding theorem, the channel is
memoryless, and Alice has an i.i.d. source. In other words, all
uses of the channel are independent of one another. This is the
situation in which Shannon information is useful. However, in
cryptographic scenarios where the channel may be controlled by an
eavesdropper, such an assumption is not usually valid. Instead, other
entropy measures have been developed that apply in these cases.
====Beyond Shannon entropy====
Rényi introduced the following generalization of the
Shannon entropy.
The '''Rényi entropy of order \alpha''' (for \alpha\geq 0, \alpha\neq 1) is defined by
H_{\alpha}(X)\equiv\frac{1}{1-\alpha}\log\sum_{x\in X}P_X(x)^{\alpha}.
The case \alpha=1 is defined by the limit, H_1(X)\equiv\lim_{\alpha\rightarrow 1}H_{\alpha}(X)=H(X).
Two other important cases are H_0(X)=\log|X| and
H_{\infty}(X)=-\log\max_{x\in X}P_X(x). A useful property is that, for
\alpha\leq\beta, H_{\alpha}(X)\geq H_{\beta}(X).
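The following Python sketch (with an arbitrarily chosen distribution) evaluates H_{\alpha} for several orders and illustrates the monotonicity in \alpha:
<pre>
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha, in bits (alpha != 1; H_1 is the Shannon entropy,
    obtained as a limit)."""
    probs = [p for p in probs if p > 0]
    if alpha == 0:
        return math.log2(len(probs))          # max entropy H_0
    if alpha == float('inf'):
        return -math.log2(max(probs))         # min entropy H_inf
    return math.log2(sum(p**alpha for p in probs)) / (1 - alpha)

P = [0.5, 0.25, 0.125, 0.125]
for a in (0, 0.5, 2, float('inf')):
    print(a, renyi_entropy(P, a))
# H_0 = 2.0 >= H_0.5 ~ 1.87 >= H_2 ~ 1.54 >= H_inf = 1.0
</pre>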
H_0(X) is sometimes called the '''max entropy''' of X. It is important for ''information reconciliation'', which in essence is error correction.
H_{\infty}(X) is sometimes called the '''min entropy''' of X.
It is important for ''privacy amplification''. There, the
presence of an eavesdropper means that it no longer suffices to
consider each use of the channel as independent. The min entropy
is the information revealed by the most likely outcome of X, i.e.,
the minimum amount of information that a single observation of X is
guaranteed to yield, and so describes the worst-case scenario. In a
cryptographic application, one wants to be assured of security even in
the worst case.
In general, Rényi entropies can behave very discontinuously: two
distributions that are almost indistinguishable can have very
different Rényi entropies. As an example, consider the two
distributions P_X and Q_X defined on x\in\{1,\ldots,2^n\}. Take
P_X(1)=2^{-\frac{n}{4}}, P_X(x\neq
1)=\frac{1-2^{-\frac{n}{4}}}{2^n-1}, and Q_X to be the uniform
distribution. The difference between their min entropies is
\frac{3n}{4}. In the large n
limit, the two distributions have statistical distance \approx 2^{-\frac{n}{4}},
which is exponentially small, while the difference in min entropies
becomes arbitrarily large.
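This separation can be checked numerically; the sketch below (illustrative only) evaluates both min entropies and the statistical distance for n = 40:
<pre>
import math

n = 40
N = 2**n

# P_X puts weight 2^(-n/4) on x = 1 and spreads the rest uniformly; Q_X is uniform.
p1 = 2**(-n / 4)
p_rest = (1 - p1) / (N - 1)
q = 1 / N

H_min_P = -math.log2(max(p1, p_rest))   # n/4 = 10
H_min_Q = -math.log2(q)                 # n = 40
stat_dist = 0.5 * (abs(p1 - q) + (N - 1) * abs(p_rest - q))

print(H_min_Q - H_min_P)   # 30, i.e. 3n/4
print(stat_dist)           # ~2^(-10) ~ 0.001
</pre>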
Smoothed versions of these
quantities have been introduced which remove such discontinuities. In
essence, smoothing amounts to optimizing the entropy over a small
region of probability space around the original distribution. The
smoothed entropies have operational significance in cryptography:
they are the relevant quantities for information reconciliation and
privacy amplification. Since it is the conditional versions that
appear in these tasks, we define them directly.
For a distribution P_{XY}, and smoothing parameter \epsilon>0, we
define the following '''smoothed Rényi entropies: '''
\begin{array}{l}
H_0^{\epsilon}(X|Y)\equiv\min_{\Omega}\max_y\log|\{x:P_{X\Omega|Y=y}(x)>0\}|,\\
H_{\infty}^{\epsilon}(X|Y)\equiv\max_{\Omega}\left(-\log\max_y\max_x P_{X\Omega|Y=y}(x)\right),
\end{array}
where \Omega is a set of events with total probability at least
1-\epsilon.
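To convey the idea behind the optimization, the sketch below computes unconditional versions of these quantities for a small example: for H_0^{\epsilon} the least likely outcomes are discarded up to total probability \epsilon, and for H_{\infty}^{\epsilon} the largest probabilities are shaved down to a common cap using at most \epsilon of probability mass (this assumes the smoothing event may remove part of an outcome's probability; the conditional definitions above are the ones used in the bounds below):
<pre>
import math

def smooth_max_entropy(probs, eps):
    """Unconditional H_0^eps: discard the least likely outcomes, up to a total
    probability of eps, and count the support that remains."""
    p = sorted(x for x in probs if x > 0)                  # ascending
    removed, support = 0.0, len(p)
    for x in p:
        if removed + x > eps:
            break
        removed += x
        support -= 1
    return math.log2(support)

def smooth_min_entropy(probs, eps):
    """Unconditional H_inf^eps: shave the largest probabilities down to a common
    cap t, spending at most eps of probability mass, and return -log2(t)."""
    p = sorted((x for x in probs if x > 0), reverse=True)  # descending
    cum = 0.0
    for k in range(1, len(p) + 1):
        cum += p[k - 1]
        t = (cum - eps) / k                                # cap if the k largest are shaved
        nxt = p[k] if k < len(p) else 0.0
        if t >= nxt:
            return -math.log2(t) if t > 0 else float('inf')
    return float('inf')

P = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
print(smooth_max_entropy(P, 0.07))   # 2.0, down from H_0 = log2(6) ~ 2.58
print(smooth_min_entropy(P, 0.07))   # ~1.22, up from H_inf = 1.0
</pre>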
More generally, the smooth Rényi entropy of order \alpha can be
defined. However, up to an additive constant these equal
either H_0^{\epsilon} (for \alpha<1) or H_{\infty}^{\epsilon}
(for \alpha>1). It is
also worth noting that for a large number of independent repetitions
of the same experiment, the smooth Rényi entropy rate tends to the
conditional Shannon entropy, that is,
\lim_{\epsilon\rightarrow 0}\lim_{n\rightarrow\infty}\frac{H_{\alpha}^{\epsilon}(X^n|Y^n)}{n}=H(X|Y).
====Significance to cryptography====
'''Information Reconciliation'''
The task of information reconciliation can be stated as follows.
Alice holds a string X and Bob a string Y, chosen according to the
joint distribution P_{XY}. Alice also possesses some additional
independent random string R. What is the minimum length of string
S=f(X,R) that Alice can compute such that X is uniquely obtainable
by Bob using Y, S and R, except with probability less than
\epsilon?
Renner and Wolf denote this quantity
H_{enc}^{\epsilon}(X|Y). It is tightly bounded by the relation
H_0^{\epsilon}(X|Y)\leq H_{enc}^{\epsilon}(X|Y)\leq
H_0^{\epsilon_1}(X|Y)+\log\frac{1}{\epsilon_2},
where \epsilon_1+\epsilon_2=\epsilon.
It is intuitively clear why H_0^{\epsilon}(X|Y) is the correct
quantity. Recall the definition
H_0^{\epsilon}(X|Y)\equiv\min_{\Omega}\max_y\log|\{x:P_{X\Omega|Y=y}(x)>0\}|,
where \Omega is a set of events with total probability at least
1-\epsilon. The number of strings x that are consistent with Y=y
within \Omega is |\{x:P_{X\Omega|Y=y}(x)>0\}|.
Alice's additional information needs to single out one of these, which
requires \log |\{x:P_{X\Omega|Y=y}(x)>0\}| bits.
Since Alice does not know y, she must assume the worst, hence the
maximization over y. Furthermore, since some error is tolerable, we
minimize over \Omega, cutting away unlikely events from the
probability distribution.
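This counting can be made concrete on a toy joint distribution (chosen arbitrarily for illustration); with \epsilon=0 there is nothing to cut away, and the number of bits Alice must send is set by the worst-case support size:
<pre>
import math

# Toy joint distribution P_XY: Bob's y narrows Alice's x down to a few candidates.
P_XY = {
    (0, 0): 0.3, (1, 0): 0.2,                  # y = 0 leaves 2 candidate values of x
    (2, 1): 0.2, (3, 1): 0.2, (4, 1): 0.1,     # y = 1 leaves 3 candidates
}

ys = {y for (_, y) in P_XY}
worst_support = max(sum(1 for (x, yy) in P_XY if yy == y) for y in ys)

# With eps = 0, Alice must be able to single out one candidate even for the
# worst y, so she needs ceil(log2(worst_support)) bits of reconciliation data.
print(worst_support, math.ceil(math.log2(worst_support)))   # 3 candidates -> 2 bits
</pre>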
'''Privacy Amplification'''
In essence, this task seeks to find the maximum length of string Alice
and Bob can form from their shared string such that Eve has no
information on this string.
This task can be stated more formally as follows. Alice possesses
a string X and Eve a string Z, distributed according to P_{XZ}. Alice also
has some uncorrelated random string R. What is the maximum length
of a binary string S=f(X,R) such that, for a uniform random
variable U that is independent of Z and R, we have S=U except
with probability less than \epsilon?
Again, this quantity, denoted H_{ext}^{\epsilon}(X|Z), has been defined
and bounded by Renner and Wolf :
H_{\infty}^{\epsilon_1}(X|Z)-2\log\frac{1}{\epsilon_2}\leq
H_{ext}^{\epsilon}(X|Z)\leq H_{\infty}^{\epsilon}(X|Z),
where \epsilon_1+\epsilon_2=\epsilon.
The reason that this quantity is relevant follows from the definition of extractors, which are functions commonly exploited for privacy amplification. In essence they take a non-uniform random string, together with some additional catalytic randomness, and produce a shorter string which is arbitrarily close to uniform. The length of the string that can be extracted in this way is governed by the min entropy of the input.
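A common concrete choice of extractor for privacy amplification, though not specified above, is a two-universal hash function such as multiplication by a random binary matrix. The sketch below is illustrative only; in practice the output length would be chosen a safety margin below H_{\infty}^{\epsilon}(X|Z), in line with the bound above:
<pre>
import secrets

def random_binary_matrix(rows, cols):
    """The catalytic randomness R: a uniformly random 0/1 matrix, which may be public."""
    return [[secrets.randbelow(2) for _ in range(cols)] for _ in range(rows)]

def extract(bits, matrix):
    """Two-universal hashing: multiply the bit string by the random matrix over GF(2)."""
    return [sum(m * b for m, b in zip(row, bits)) % 2 for row in matrix]

n = 16          # length of Alice's raw string X (toy value)
out_len = 6     # target key length; should lie below the (smooth) min entropy of X given Z

X = [secrets.randbelow(2) for _ in range(n)]   # stand-in for Alice's raw string
R = random_binary_matrix(out_len, n)
S = extract(X, R)
print(S)        # the shortened key, nearly uniform from Eve's point of view
</pre>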
==References==
# C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal '''27''', 379 (1948).
# A. Rényi, "On Measures of Entropy and Information", in ''Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability'' (1961), vol. 1.
# R. Renner and S. Wolf, "Simple and Tight Bounds for Information Reconciliation and Privacy Amplification", in ''Advances in Cryptology --- ASIACRYPT 2005'', edited by B. Roy (Springer-Verlag, 2005), vol. 3788, pp. 199--216.
[[Category:Entropy]]
[[Category:Classical Information Theory]]