Information Theory and the Midnight Ride of Paul Revere

June 10, 2013

What is Information?

Early Mechanical Telephone Switch Room

In the mid 20th century this was an important question because of the rapidly growing radio, television, and telephone systems all around the world. The problem facing the telephone company was a logistical one: figuring out how the fewest wires and other resources could be shared for maximum utilization without anyone having to wait for a shared line to free up. “Three barbers, no waiting,” if you will, but not a separate barber for every person in town.

To solve these problems, the question “What is information?” had to be answered before the engineers could answer the question, “How much information can we move with X amount of wires, relays, telephone poles, and transatlantic cable?” The timing was fortuitous, because who better to turn to than the information pioneers who had applied modern mathematical techniques to information and helped win a war? And they had done so at a scale that was moving from human labor toward automation.

Claude Shannon

One of those information workers was the brilliant mathematical prodigy and electrical engineer Claude Shannon, who was employed along with other information pioneers at Bell Labs in New Jersey. This was Bell Telephone’s advanced research facility, which brought you the transistor and some early relay computers, among other things. Shannon set out to quantify information from a communications point of view, in hopes of turning what was “rule of thumb” engineering into a rigorous mathematical discipline. But you can’t do math on something until you define it precisely. Rather than recounting Shannon’s exact steps in his process of discovery, let’s turn to a much earlier war, with an example of the need for information that everyone (at least every American) learns as a schoolchild. I will take some liberties with the story in order to illuminate how Shannon answered the question.


Paul Revere's Midnight Ride

On the night of April 18, 1775, around 9:30 PM, Paul Revere is standing by on horseback on the outskirts of colonial Boston. It is known that the British are about to invade the city, so citizens throughout the surrounding towns are standing by, ready to take up arms and defend it. The big question at the moment is whether the British are planning a naval invasion with ships or an invasion with troops on land. The answer will make all the difference in how effectively the citizens can respond to the attack.

As the story goes, Paul Revere is positioned with a view of the tall steeple of the Old North Church.  In the steeple is a scout who will be able to see the British advancing whether it be by land or sea.   The scout will be able to signal Paul Revere so he can ride through the surrounding towns giving the proper alarm.

(Now here is where I take some license.) Let us imagine the conversation a few days beforehand between Paul Revere and the scout as they figure out how to transmit this information to Paul from the church steeple. They are quietly working out the plan over a few pints of Sam Adams Ale.

Let us suppose that they decide to use a form of flag semaphore and a secret code, so Paul will get the information without the British being able to understand it if they see it. So they decide that the scout will spell out the entire text of Shakespeare’s Hamlet by flag semaphore. If the British are coming by land, the scout will send the last line of the play backwards, and if by sea he will send it as Shakespeare wrote it. Satisfied that they have a good plan, they turn to their pewter mugs of ale.

Let’s stop here and talk about how this relates to Shannon’s discoveries about information. Shannon would say that this is typical of all transfers of information. What is required is a sender, an encoder that puts the message into a code, a medium for transmitting the code, a decoder, and a receiver. The goal of all of this is to send a message containing information.

As we can see, the scout is the sender and the encoder, and Paul Revere is the decoder and receiver.    And the medium of transmission is visible light.   The important take-away at this point is that Shannon says that all information is conveyed in the form of some code or other.    It could be a secret code, or something in plain English, a sequence of numbers, trumpet blasts, a shot fired from a gun and so forth.

Rejoining our colonial heroes:

But then the scout says, “Wait a minute. Sending the entire text of Hamlet by flag semaphore will take an hour or more. By the time I get to the last line, the British will already be sacking the city. We need a better plan.” Paul Revere looks up and says, “Oh hey. Why not skip the entire play except for the last line? Send it backwards for a land invasion, and forwards for a sea invasion.” And so they go back to their ale.

Shannon would say that there are two important concepts at this point. One is that the message, the code, and the information are three different things. Information is carried by a code within a message, but the message might contain much more than is required to send the information. Trimming the message down from the complete text of Hamlet to just its last line has certainly reduced the size of the message. But it is important to realize that the new message format still conveys the same information. Same code, same information, smaller message.

The next thing Shannon would point out is that whatever is in the message, a given message will take a finite amount of time and resources to transmit. This is not simply a matter of a bad choice of transmission medium; it is a fundamental aspect of the universe. You can choose a faster medium, but you cannot send a message infinitely fast. Paul and the scout have figured this out and have minimized the size of the message by changing the format of the code. It still requires English letters sent by flag semaphore, but being a smaller message, it will take less time.

This is fundamental because it demonstrates the physical significance of what we are dealing with. You can begin to see that information theory is about more than an abstract notion: information is as fundamental to the universe as energy, entropy, force, motion, distance, and time. Your Internet provider charges you more for a high-speed connection because it consumes real, expensive physical resources. It’s not just an arbitrary toll for using the information superhighway.

Paul then looks up from his ale and says, “Wait a minute. We are going to be doing this at night, so I won’t be able to see the flag semaphores from the edge of town.” The scout thinks for a moment and says, “Oh, no problem. I will use lantern semaphores like we do from ship to ship at night.” But Paul says, “That won’t work. I don’t read lantern semaphores. I am a silversmith, not a seaman.” So they go back to their ale and ponder the problem. Suddenly the scout looks up and says, “Wait, I have it. The only thing I need to signal to you is whether the British are coming by land or sea. We don’t really need to send real sentences or English letters in semaphore. We can make up anything we want.”

Here, Shannon would say this is correct. The actual message and the code are arbitrary. What matters is that the information can be conveyed and that the receiver can decode it. This is why the English-letter flag semaphores can be replaced by something that is not English, not flags, and not any well-known language at all.

The Lanterns

So the scout says, “When I see the British I will simply light my lantern. When you see the light, you ride.” Paul says, “But how will I know whether it is by land or sea with just a single light?” The scout says, “Oh yeah. Suppose I light the lantern once if by land and twice if by sea.” Paul says, “I might miss the first flash, though, if I’m not staring at the right spot without interruption. Why not bring two lanterns? Light one if by land and two if by sea, and don’t blow them out for a while.” The scout says, “Perfect, that’s a plan.”

Here, Shannon would point out that although a message can be streamlined down to what is needed to convey the information, there is a lower limit on the size of a message that can convey a given amount of information. As you probably guessed as quickly as Paul Revere did, a single light is not a rich enough message format to convey everything that needs to be conveyed. If you reduce the code and message to something too small, you will suffer information loss. And now you can see Shannon getting closer and closer to something we can pin down as information. If there is a minimum message size for a given piece of information, there must be something fundamental about information that we can grab onto and make into a measurable mathematical quantity.

Another thing Shannon was motivated to include in his analysis is the notion that no communication channel is 100% reliable. Paul Revere noticed this when he realized that he might miss one or two quick flashes of a single lantern, so he asked for one or two lanterns to be lit and left burning to reduce the possibility of error. So sometimes the code and the message format need to be chosen to compensate for the possibility of transmission error. It is a very real practical problem that must be dealt with, but at this point it would distract us from getting to the essence of information.


Drawing on the work of the nineteenth-century mathematician George Boole, whose algebra Shannon had applied to relay switching circuits in his master’s thesis at MIT, he took the smallest unit of information to be the “bit” (a word suggested by the statistician John Tukey): the simple proposition of “true” or “false,” or “yes” or “no.” Boole had formulated such propositions into what we now call Boolean logic, the mathematics behind using ones and zeros to store and transmit information. Notice that one lantern can only convey either true or false, depending on whether it is lit or not. That is a single bit of information. If you want to convey something more than a single true-or-false proposition, you need more bits. So in our heroes’ case, the scout must either flash the lantern in some kind of code, use more than one lantern, or both.

So how do we model the number of bits needed to transmit a given amount of information? The way to do that is to lay out what the receiver is anticipating. In Paul Revere’s case, we might list what Paul wants to know at any given moment as follows:

  1. The British are not yet coming.
  2. The British are coming now.
  3. The approach is by land.
  4. The approach is by sea.

So the question is, what is the minimum number of bits needed to convey that information? One bit won’t do it, as that is like lighting a single lantern. We either need more bits or fewer possibilities. Notice that #2 is redundant, because it is already conveyed by #3 or #4. So the list reduces to:

  • The British are not yet coming. (no lantern)
  • The British are coming now by land. (one lantern)
  • The British are coming now by sea. (two lanterns)

There are three possibilities, so a single lantern for true or false still won’t do it. And since we can’t send a unit smaller than a bit, we have to concede that we need two bits, which can distinguish up to four possibilities, to convey these three. This is why Paul and the scout end up with either two flashes or two lanterns, with each lantern (or each flash) representing one bit of information.
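A minimal Python sketch of this counting argument, assuming a perfect channel where every signal is read correctly. The function name `min_bits` is a hypothetical helper, not anything from Shannon: it finds the smallest number of on/off signals whose combinations can cover every possible message, and compares that with the base-2 logarithm Shannon used for equally likely messages.

```python
import math


def min_bits(num_messages: int) -> int:
    """Smallest k such that k on/off signals (2**k patterns) cover num_messages."""
    if num_messages < 1:
        raise ValueError("need at least one possible message")
    # (n - 1).bit_length() is the smallest k with 2**k >= n, computed with
    # exact integer arithmetic instead of a floating-point log2.
    return (num_messages - 1).bit_length()


lantern_messages = [
    "The British are not yet coming.",
    "The British are coming now by land.",
    "The British are coming now by sea.",
]

n = len(lantern_messages)
# Whole on/off signals needed: 2 for three messages.
print(f"{n} messages need {min_bits(n)} whole bits")
# Information per message if all three are equally likely: about 1.585 bits.
print(f"log2({n}) = {math.log2(n):.3f} bits")
```

Three possibilities come out to about 1.585 bits, but a lantern is either lit or it isn't, so in practice the count rounds up to two whole signals.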

Personally, I have a problem with this scheme, but since it turned out OK, my concern is academic. Still, it speaks to something pretty important about reducing uncertainty (which is the goal of sending information). Notice that when the British are not yet coming, no lanterns are lit. That meaning is denoted by the absence of a message. However, when no lanterns are lit, we don’t know whether the British have not been sighted yet or the scout has fallen asleep in the belfry. We are using the absence of a message to indicate one of the possibilities. We don’t know whether seeing no lantern lit is evidence of the absence of the British, or merely an absence of evidence about whether the scout is awake. To solve that problem, we might ask the scout to flash one lantern on and off at two-minute intervals so we know he is awake. Then, when the British are sighted, he would light two lanterns for land and three for sea, and leave them lit for a while. Notice that this would require another lantern. Shannon would say that more bits than the theoretical minimum might be required to compensate for the low quality of the channel. Let’s set that aside, as it’s not consistent with the story.

So as Paul Revere and the scout are walking out of the tavern, Paul says, “OK, so let’s make sure we have this straight. I will watch the church tower for a signal. No lanterns mean no British. One lantern means British by land, and two lanterns mean British by sea.” The scout says, “Correct!” And they wander off home.

And to this Shannon would say that information is closely associated with uncertainty. So in a qualitative sense, information creates surprise, and the information content is related to “surprise” if we define surprise as the reduction of uncertainty. The more we are surprised by the information, the more uncertainty it has eliminated. If that sounds vague and unscientific, he would go on to define uncertainty precisely: when the possible messages are equally likely, it can be determined by counting the number of possible messages.

In our story, Paul Revere is expecting one of three possible messages. If each of the three messages is equally likely, each message reduces the uncertainty by the same amount, so each of the three messages carries the same quantity of information. A consequence of this definition is that you could receive a message that is completely intelligible yet has no information content at all.
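For readers who want the formula behind the counting: Shannon’s measure of information (which he called entropy) is a logarithm of probabilities. The sketch below uses notation introduced here, not in the story: N possible messages, with message i expected with probability p_i. With equal probabilities it reduces to counting messages, which is the case the Paul Revere example relies on.

```latex
% Information carried by message i, and the average over all N messages (entropy)
I_i = -\log_2 p_i, \qquad H = -\sum_{i=1}^{N} p_i \log_2 p_i \quad \text{bits}

% Equally likely messages: p_i = 1/N for every i
I_i = -\log_2 \frac{1}{N} = \log_2 N, \qquad
H = -\sum_{i=1}^{N} \frac{1}{N} \log_2 \frac{1}{N} = \log_2 N

% Paul Revere's three equally likely lantern messages
H = \log_2 3 \approx 1.585 \ \text{bits}
```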

For example, suppose your name is Bob, and you get a phone call from your brother who says, “Your name is Bob,” and then hangs up. Notice that this is a message: it is conveyed in a code (spoken English words), and it took a finite amount of time to transmit. If it was a cell phone call, it truly was encoded in a stream of bits sent over the cell phone system. But also notice that hearing it has reduced none of your uncertainty. Shannon would say that this is a coded message with an information content of zero. Another way of saying it is that the number of bits needed in a message to reduce your uncertainty about your name is zero, because you have zero uncertainty about your name.
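The same idea applied to a single message, as a minimal Python sketch: a message the receiver expected with probability p carries log2(1/p) bits, often called its surprisal. The function name `surprisal_bits` is hypothetical, and the probabilities are the story’s assumptions, not measurements. A message you were certain to get, like Bob’s phone call, has p = 1 and carries zero bits.

```python
import math


def surprisal_bits(p: float) -> float:
    """Information carried by one message the receiver expected with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    # Written as log2(1/p) rather than -log2(p), so a certain message
    # gives 0.0 instead of -0.0.
    return math.log2(1.0 / p)


# Bob already knows his name, so "Your name is Bob" was certain: p = 1.
print(surprisal_bits(1.0))
# One of Paul Revere's three equally likely lantern signals: p = 1/3.
print(round(surprisal_bits(1 / 3), 3))
```

The first call prints 0.0 and the second prints 1.585: the intelligible phone call carries nothing, while a single lit lantern carries more than a bit.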

And now for something really surprising. Notice that the message from your brother has meaning. Any English-speaking human hearing a recording of that phone call would know what your brother meant by those spoken words. The surprising thing is that we just demonstrated that, by Shannon’s definition, that phone call has zero information content, and yet we can surely say that the message has meaning. Shannon would say that meaning is not relevant to the quantification of information. So now we see that messages, codes, information, and meaning are all different things.

It might be surprising to hear that meaning is not part of information theory. The reason is that meaning is what humans bring to information once we receive it. The only important part of what gets transferred is that which reduces uncertainty on the receiving end. This is why, although the lantern message received by Paul Revere is burdened with tremendous meaning for him and his fellow countrymen, all it took was the lighting of one or two lanterns to reduce Paul Revere’s uncertainty about the message itself. Paul brings the rest of the meaning to the message himself. Since Paul is at the receiving end, the meaning of the message does not have to be sent by lanterns.

Lest you think that Paul Revere’s case is too simple to represent something like someone’s love letter to their sweetheart, don’t forget that the love letter is just a small number of symbols on a piece of paper.    The sweetheart brings a lot of meaning to the receiving end.  What counts in the symbols on the letter is only that which reduces her uncertainty.   A single rose might accomplish that.

Since meaning is not part of the quantification of information, human understanding of a message is also not relevant to the science of information theory. Shannon says that the producers and consumers of information can be humans or any kind of artificial or naturally occurring system. This surprising claim is made possible by the fact that Shannon defines information solely in terms of the reduction of uncertainty among the possible messages.

Consider a home heating and air conditioning system in the basement with a thermostat on the living room wall. The thermostat is set up to send one of three possible messages, depending on the temperature in the room:

  1. Off
  2. Heat
  3. Cool

Shannon would say that the information content in any one of these messages is exactly the same as in the Paul Revere example above. Both have three possible messages, so (if they are equally likely) each message reduces the uncertainty by the same amount. In each case, the same minimum number of bits is required over a perfect channel. Why is this important to Shannon? Because communication systems are used for human communication, for sending telemetry readings, and for systems controlling other systems. And the important thing about information is how many bits must be sent to reduce the uncertainty at the other end. So one has to realize that terms like “uncertainty” and “surprise” are meant to represent mathematical or statistical properties that can be quantified and measured. The use of those terms is metaphorical; human understanding, meaning, and the human experience of surprise are not what we are talking about.
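To make the comparison concrete, here is a small Python sketch, assuming a perfect channel and a simple fixed-length binary code (one scheme among many; it does not describe how real thermostats are wired). The helper name `fixed_length_codebook` is hypothetical. Because only the number of possible messages matters, the lanterns and the thermostat get codebooks of exactly the same shape.

```python
def fixed_length_codebook(messages: list[str]) -> dict[str, str]:
    """Map each message to a distinct binary codeword of the minimum fixed length."""
    if not messages:
        raise ValueError("need at least one possible message")
    # Smallest width such that 2**width >= len(messages).
    width = (len(messages) - 1).bit_length()
    # format(i, "02b") zero-pads i to two binary digits, and so on for other
    # widths; a single possible message needs no bits at all, hence "".
    return {msg: format(i, f"0{width}b") if width else "" for i, msg in enumerate(messages)}


lanterns = ["no lantern", "one lantern (land)", "two lanterns (sea)"]
thermostat = ["Off", "Heat", "Cool"]

for name, messages in [("lanterns", lanterns), ("thermostat", thermostat)]:
    print(name, fixed_length_codebook(messages))
# lanterns {'no lantern': '00', 'one lantern (land)': '01', 'two lanterns (sea)': '10'}
# thermostat {'Off': '00', 'Heat': '01', 'Cool': '10'}
```

The labels attached to the codewords are arbitrary, which is the point: the code does not care whether “01” means “British by land” or “turn on the heat.”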

Some other examples of information transfer:

  • Humans talking in a restaurant.
  • Humans reading a novel or a newspaper.
  • A human walking in the woods using a compass is receiving a coded message from the Earth’s magnetic field about the relative direction in which he is facing.
  • Birds navigating during migration using information they obtain from both the Earth’s magnetic field and the orientation of the Sun or the constellations.
  • Dogs obeying human commands.
  • Dogs following a scent.
  • Honeybee scouts returning to the hive to dance out information about the location of wild flowers.
  • Smoke from a wildfire causing animals to flee.
  • Evolution modifying the contents of DNA.
  • DNA directing the synthesis of proteins.

The list goes on forever. Information is streaming all around us, in nature and between humans, as if it were all part of the movie The Matrix.

So in summary, Shannon quantified information in a way that has profoundly affected our communications and computer technology. Information is communication, and only the part of a message that reduces uncertainty counts as information. Uncertainty is precisely defined in relation to the number of possible messages that can be sent. Where there are a million possibilities, the uncertainty at the receiver is high before the message is sent, and where there is only one possibility, the uncertainty is zero, and no message is needed at all. The smallest piece of information is one bit, which can signal one of two possibilities. So all information that is sent or stored requires some amount of physical resources, either material or energy, to represent the number of bits it requires.
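As a worked check of the summary’s two extremes, assuming equally likely possibilities and whole on/off signals (the same assumptions as the lantern example):

```latex
% One of a million equally likely possibilities
2^{19} = 524\,288 \;<\; 10^{6} \;\le\; 2^{20} = 1\,048\,576
\quad\Rightarrow\quad 20 \ \text{whole bits}
\qquad \left(\log_2 10^{6} \approx 19.93\right)

% Only one possibility: nothing is uncertain, so no message is needed
\log_2 1 = 0 \ \text{bits}
```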



Category: Science and Evidence