Classical information was first defined rigorously by Claude Shannon. In his formulation, the information content of a message is measured by the amount of communication needed to convey it. Roughly speaking, if one has a list of possible messages, then the information of the messages is the amount of communication required to tell someone which message from the list one wishes to communicate.

We describe the messages by a random variable *X*, which takes values in a list of messages {*x*_{1}, *x*_{2}, *x*_{3}, ..., *x*_{m}}, occurring with probabilities {*p*(*x*_{1}), *p*(*x*_{2}), *p*(*x*_{3}), ..., *p*(*x*_{m})} respectively. We denote this probability distribution by *P*_{X}.

Shannon showed that the average number of bits needed to convey which message occurs is given by the Shannon entropy *H*(*X*) = − ∑_{x} *p*(*x*) log_{2} *p*(*x*), where the sum runs over all messages in the list. Essentially, he showed that if one has *n* messages drawn independently from *P*_{X}, then for large *n* one can compress this information onto a space of dimension just over 2^{nH(X)}, that is, onto roughly *nH*(*X*) bits, such that with high probability the information is sent faithfully.
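As a small numerical illustration of the entropy formula, the following Python sketch computes *H*(*X*) in bits. The function name `shannon_entropy` and the four-message distribution `p_x` are assumptions chosen for this example and do not come from Shannon's paper.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy H(X) in bits of a probability list."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p = 0 are skipped: p * log2(p) tends to 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distribution over four messages (illustrative values only).
p_x = [0.5, 0.25, 0.125, 0.125]

h = shannon_entropy(p_x)   # 1.75 bits per message
n = 100                    # number of independent messages
compressed_bits = n * h    # about 175 bits, versus 200 for a fixed 2-bit code
```

For this distribution a fixed-length code would spend 2 bits on each of the four messages, whereas the entropy suggests that long sequences can be sent with about 1.75 bits per message on average.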

### Reference

- C. E. Shannon, "A mathematical theory of communication", *Bell System Technical Journal*, vol. 27, pp. 379–423 and 623–656 (July and October 1948).