In 2016 we are marking 100 years since the birth of Claude E. Shannon, the man who in 1948 established the field of information theory, which serves both as a methodology and as an inspiration in various scientific and technological endeavors. One of his main contributions is the concept of the capacity of a communication channel. This is not the easiest concept around, and the scientific literature has witnessed many instances of misunderstanding of the operational notion of channel capacity. This year I am teaching part of an introductory course in information theory, and I tried to construct a very simple example, with minimal mathematical content and intentionally unrelated to electronic communication, in order to stress the (much) wider significance of Shannon’s theory.

Think of a pre-telecommunication era and of an emperor who wants to send a message to his general through messengers. Each messenger needs a horse to traverse the enemy territory in order to get to the general. However, there is a chance, denoted by p, that a messenger gets captured by the enemy. For example, let p=0.1, i.e. the chance of capturing a messenger is 10%. This statistical figure means that, if the emperor sends a large number of messengers, say N=1,000,000, then *most probably* around 90% of them will reach the general; in absolute terms, most probably around 900,000 will traverse the enemy territory successfully. Which strategy should the emperor use to increase the probability that his message gets through to the general, even though any individual messenger can be captured by the enemy?
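This claim is easy to check numerically. The following toy Monte Carlo sketch (the numbers are the ones from the example; the code itself is mine) draws each messenger’s fate independently with capture probability p=0.1:

```python
import random

random.seed(42)                      # fixed seed for reproducibility
N, p = 1_000_000, 0.1                # messengers sent, capture probability

# Each messenger crosses independently; random() > p means "not captured".
arrived = sum(1 for _ in range(N) if random.random() > p)

print(arrived)                       # close to 0.9 * N = 900,000
```

The count fluctuates from run to run, but with a million messengers the fluctuation is only a fraction of a percent of the total.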

Unlike many of today’s politicians, the emperor of our example is smart and does not want to write the message on paper, since any captured messenger would reveal the whole message to the enemy. So the emperor uses another strategy. Suppose that each messenger can remember a sequence of, say, M=7 letters. The emperor wants each messenger to remember 7 letters of the message, such that the general can reconstruct the message from all the messengers that manage to cross the enemy territory. We note that, in order to reconstruct the message, the general needs to put the letters from the messengers in the correct order. Therefore, each messenger also knows his ordinal number: the first group of 7 letters goes to messenger #1, the second group of 7 letters to messenger #2, etc. On the other hand, the enemy cannot reconstruct the message by capturing a single messenger, or even several of them; and since the message is not written on paper, a captured messenger can always lie about the 7 letters that s/he was supposed to remember. It is noted that here we are not dealing with the problem of *secure* communication, i.e. protecting the data from being acquired by the enemy (although that, too, was formalized mathematically by C. E. Shannon, in 1949). Instead, we are dealing with the problem of *reliable* communication, i.e. how to ensure that the general gets the emperor’s message with very high probability.
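As a toy sketch of this numbering scheme (the function names are mine; note there is no redundancy yet, so at this stage every messenger must survive), each messenger memorizes an ordinal number plus 7 letters, and the general simply sorts whatever arrives:

```python
import random

def split_message(message, m=7):
    """Messenger i memorizes his ordinal number i and letters i*m .. i*m+m-1."""
    return [(i, message[i * m:(i + 1) * m])
            for i in range((len(message) + m - 1) // m)]

def reconstruct(pieces):
    """The general sorts the received pieces by ordinal number and joins them."""
    return "".join(chunk for _, chunk in sorted(pieces))

msg = "ATTACKATDAWNFROMTHENORTH"
pieces = split_message(msg)
random.shuffle(pieces)                 # messengers arrive in arbitrary order
print(reconstruct(pieces) == msg)      # True
```

The ordinal numbers are what make the arrival order irrelevant; the coding trick described below is what will make individual losses tolerable.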

The trouble is that the emperor does not know in advance which of the messengers will be captured. However, he knows the *law of large numbers*: random fluctuations even out when one observes a large population of messengers. This is the same law that says that, if you roll a die 6,000 times, then most likely you will observe around 1,000 1s, around 1,000 2s, etc. The emperor knows that, if the chance that a messenger is captured is 10% and he sends N=10,000 messengers, then around 9,000 will reach the general. Since each messenger carries 7 letters, the general will receive 7*9,000=63,000 letters from which he can try to reconstruct the message. However, we repeat: the emperor does not know in advance which messengers will reach the general. Hence, the general should be able to reconstruct the message if the 9,000 messengers that actually arrive are #1, #3, #6, #7, #8, #10, …, but he should also be able to reconstruct the *same* message if the arriving messengers are #2, #3, #4, #7, #8, …, or, in fact, any combination of 9,000 out of the 10,000 messengers.
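The die version of the law of large numbers can be simulated in the same spirit (again a quick sketch with the example’s numbers):

```python
import random
from collections import Counter

random.seed(7)
# Roll a fair die 6,000 times and tally how often each face appears.
counts = Counter(random.randint(1, 6) for _ in range(6000))

# Each face shows up roughly 1,000 times out of 6,000 rolls.
print(sorted(counts.items()))
```

The tallies are not exactly 1,000 each, but their relative deviations shrink as the number of rolls grows, which is exactly the regularity the emperor exploits.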

We now come to the main point. Shannon proved that there exists a way for the emperor to encode a message of 63,000 letters into a longer sequence of 70,000 letters and tell 7 of those letters to each of the 10,000 messengers, such that, if each messenger has a chance of 90% (or more) to pass through the enemy territory without being captured, then with very high probability the general can reconstruct the original message of 63,000 letters. Shannon did not provide a way to actually encode the 63,000 letters into 70,000 letters; he made the mathematically powerful statement that a code with that capability must exist.

Finally, the capacity of the channel. If a message of 63,000 letters reaches the general after being encoded and sent through 10,000 messengers, then the information transfer is 6.3 letters per messenger. In our example this is the *capacity of the channel*: the maximal amount of information transfer per single carrier of information, i.e. per messenger. If 10,000 messengers are used and the original message has 63,000 letters *or fewer*, then the general will reconstruct the message almost surely. On the other hand, communication above the channel capacity is unreliable: if the number of messengers is kept at 10,000 but the original message has more than 63,000 letters, then the general will almost certainly not be able to reconstruct it.
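In symbols, the capacity here is (1 - p) * M letters per messenger, a tiny calculation we can spell out with the example’s numbers (the variable names are mine):

```python
p = 0.1          # capture probability
M = 7            # letters each messenger can remember
N = 10_000       # messengers sent

capacity = (1 - p) * M             # information per messenger, in letters
longest = round(capacity * N)      # longest reliably transmittable message

print(round(capacity, 2), longest)   # 6.3 63000
```

Anything longer than `longest` letters cannot be delivered reliably with these 10,000 messengers, no matter how clever the code.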

At first glance this is counterintuitive, as we tend to look at a single messenger and think of him or her as an unreliable carrier of information. But Shannon showed that it is possible to send a message reliably by using many unreliable carriers of information, as long as the amount of information stays below what is dictated by the channel capacity.

Information-theoretic purists may be horrified by the brutal lack of rigor in this example, but I hope it conveys the basic idea. And I think that the ingenious work of Shannon deserves to reach a broader audience.