Map of life expectancy at birth from Global Education Project.

Monday, April 19, 2010

Probability and Statistics 101: Pre-lesson lesson

This isn't even the first installment in the curriculum. It's something I'm going to hand out to read before the first class. Many of you will find it completely unnecessary. Some may feel insulted by it, but please don't be: lots of people out there actually, truly need to read this. If you feel you don't, read it anyway as a lesson in what lots of people actually need to read. If you are matho-phobic or hate math or think you just can't possibly understand it, this will help. I hope.

In public health science, and for that matter in most fields of scientific inquiry, we frequently talk about proportions, rates or percentages.

These all mean exactly the same thing!

When people speak of proportions, there is a tendency to represent them as decimals, like this: .9 (pronounced "point nine.") When people speak of rates, they are probably a bit more likely to represent them as fractions, like this: 9/10 (pronounced "nine tenths," or "nine out of ten.") To get the percentage, you just multiply by 100: 90% (pronounced "ninety percent.")

They all mean exactly the same thing! There is no difference at all. I would not be doing any violence to the English language or to mathematics to speak of a proportion of .9, a proportion of 9/10, or a proportion of 90%. There is no difference at all, not even a little tiny difference. None.

For some reason, when I show people a percentage -- 90%, 87%, whatever -- they usually say that's fine, they get it. But when I show them .9, or .87, the very same people often say they don't get it, they can't possibly understand it, it's math and it's just too confusing. That is like saying you understand me perfectly well when I say a Ford is a car, but it's impossible to understand or follow what I'm saying if I say a Ford is an automobile.

There's nothing to understand. There's nothing to "get." There is no deep meaning or special attribute of the symbol .9 that makes it any different from the symbol 90%. If you are at all confused, just keep in mind that the space just to the right of the decimal point is tenths. .9 = 9 tenths = 9/10 = 90/100 = 90%. The next space is hundredths. .97 = 9 tenths + 7 hundredths = 90 hundredths + 7 hundredths = 97 hundredths = 97%. That's all there is to it. If you still think you don't get it, it's because you are convinced there must be more to it than that, that there must be something I'm not telling you, some deeper secret. There isn't.
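If you would rather watch a computer do the arithmetic, here is a minimal sketch in Python. It is only an illustration: the function name three_ways is made up for this example, and I write the decimal with a leading zero (0.9), which is the same number as .9.

```python
from fractions import Fraction

def three_ways(decimal_text):
    """Return the same number as an exact fraction and as a percentage.

    decimal_text is a decimal written as text, for example "0.9".
    (three_ways is a made-up name, used only for this illustration.)
    """
    proportion = Fraction(decimal_text)  # "0.9" becomes 9/10
    percent = proportion * 100           # multiply by 100 to get the percentage
    return proportion, percent

proportion, percent = three_ways("0.9")
print(proportion, "is the same as", percent, "percent")

# The hundredths place works the same way: 9 tenths + 7 hundredths = 97 hundredths.
print(Fraction(9, 10) + Fraction(7, 100) == Fraction("0.97"))
```

The fraction, the decimal and the percentage are one number written three ways, which is why the last comparison comes out true.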

Probabilities: Probabilities are a lot like proportions. If 90% or .9 or 9/10ths of the people in the room are right handed, and I pick somebody at random, the probability that person will be right handed is 90%, .9, or 9/10. We will often say "nine out of ten" for a probability but it means exactly the same thing to say .9 or 9/10 or 90%. It obviously saves space to write .9 and it makes calculations easier, so that's what I will do most of the time. If this bothers you, you can think "ninety percent" every time you see .9, and it will be just fine.
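Here is a rough sketch of the room example in Python, assuming a made-up room of exactly ten people, nine of them right-handed. The names room, proportion_right_handed and simulate_picks are invented for the illustration. The first function just counts; the second picks people at random many times and reports how often the pick was right-handed.

```python
import random
from fractions import Fraction

# Hypothetical room: ten people, nine of them right-handed.
room = ["right"] * 9 + ["left"] * 1

def proportion_right_handed(people):
    """Proportion of the people in the list who are right-handed."""
    return Fraction(people.count("right"), len(people))

def simulate_picks(people, n_picks, seed=0):
    """Pick someone at random n_picks times; return the share who were right-handed."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_picks) if rng.choice(people) == "right")
    return hits / n_picks

print(proportion_right_handed(room))  # the proportion in the room
print(simulate_picks(room, 10_000))   # should land close to 0.9
```

The simulated share will not be exactly .9 on any given run, but with many picks it settles very close to the proportion in the room, which is all the probability is saying.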

Formulas and equations: Mathematical formulas and equations give lots of people the heebie-jeebies. As soon as they see one, they stop reading and they stop thinking. One reason for this, I think, is the convention, when writing about mathematical subjects, of first writing down the equation, and then telling the reader what the symbols mean. The result is that you'll see a big fat equation sitting in the middle of the page, and you have no idea what it means, so you feel stupid. But remember, at this point, even Einstein has no idea what it means. You have to read the next sentence, where the person tells you what it means, and look back and forth between the equation and the sentence underneath it to get what all the symbols mean. For example, suppose I write this:

P(RH) = .9


You're baffled and you're mad at me because you don't know what that means. But all you had to do was wait for me to tell you. "P" means the probability of whatever term is enclosed in the parentheses right after it. "RH" means being right-handed. So P(RH)=.9 means "The probability of being right-handed is point nine" (or ninety percent if you prefer).

Why didn't I just say so in the first place? Because I want to use this idea more generally: I want to use this structure in arguments and equations where I'm not just talking about being right-handed and .9, but whatever phenomenon and whatever probability happens to apply. The equation is a handy structure that doesn't take a lot of space to write down and captures the ideas I want to talk about, not just the specific case.
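For readers who find code friendlier than symbols, the same point can be sketched in Python. This is only an illustration under assumptions of my own: the room of ten people is made up, and P here is a small function I define on the spot, not part of any library. The thing to notice is that P is written once and works for whatever event you hand it, just as the notation does.

```python
from fractions import Fraction

def P(event, people):
    """Probability that a person picked at random from people satisfies event."""
    matching = sum(1 for person in people if event(person))
    return Fraction(matching, len(people))

# Hypothetical room: ten people, nine of them right-handed.
room = [{"hand": "right"}] * 9 + [{"hand": "left"}]

def RH(person):
    """The event 'being right-handed'."""
    return person["hand"] == "right"

print(float(P(RH, room)))  # 0.9
```

Swap in a different event or a different room and P does not change, which is the whole reason for writing things in the general form.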

So here is what I am going to do. First, I will discuss concepts qualitatively, in prose, with as little use of mathematical symbols as I can get away with. I'll only use mathematical symbols when I'm pretty sure they will make things easier, but we're going to be talking about mathematics, so it will be impossible to avoid referring to mathematical concepts such as one thing being equal to another.

Then I'll use one or more specific examples and maybe do some arithmetic with them to show you how the idea works.

Then, and only then, I will present general formulas that embody the idea. If you don't like that part, you can skip it. It doesn't need to turn you off to the rest of it.

Any questions? Are we ready to begin?
