Map of life expectancy at birth from Global Education Project.

Friday, April 30, 2010

Now pay attention!

Because you're going to have to keep a couple of ideas straight -- concepts that are kind of similar but aren't the same. This part tends to confuse people so you've been warned.

So I'll start with what I hope is a fairly simple idea that will be a half step toward the slightly harder one. If we don't know anything about a population, and then we take a sample from it, the sample is our best estimate of what the population is like. We know the mean and standard deviation of the sample, but we don't know for sure how close they are to the population's. But suppose we do know the distribution of a population, and we want to know if a sample somebody gave us is likely to have come from that population or not.

Suppose we had the medical records of a whole lot of children, 1,795 of whom had asthma, and of those 1,795, .55 were male and .45 were female. We happen to know that .51 of all children are male, and .49 of all children are female. (It’s true! There are slightly more boys born than girls. However, the men start to die off so eventually there are more women than men.) So the question is, does this mean that boys are more likely to have asthma than girls? (These numbers are just made up by the way. I probably shouldn't have used a real disease.)

What we need to do in this case is compute a number called "Z". I don't know why they call it Z, they just do. This is the distance of an observation from the mean, measured in standard deviations.

In this case, the standard error for a sample of 1,795 children drawn from a population in which .51 are male is approximately .012. (Actually it's 0.011799154872714672808505450202002, but close enough.) How do I know that? Because the standard deviation of a proportion from a binomial distribution, as you may recall from last time, is the square root of P*Q/n. P is .51, and Q is .49. n is 1,795. Get out your calculator and try it! The difference between .55 and .51 is .04, which is more than 3 times .012. In other words, the observed outcome is more than 3 standard deviations from the expected outcome. If boys and girls really had the same risk of asthma, the probability of a sample landing this far from .51 would be very small – less than .001 – so it is extremely unlikely that this gap is just chance. We say in this case that Z = 3.3333 etc. (that's .04 divided by the rounded .012; with the unrounded standard error it comes out closer to 3.39). Z is also called the standard normal deviate.
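If you'd rather let a computer be your calculator, here is a minimal sketch of the same arithmetic in Python. The function name `z_for_proportion` is just something I made up for illustration, and the tail probability assumes the normal approximation to the binomial, counting both tails (a result this far above or this far below .51).

```python
import math

def z_for_proportion(observed, expected, n):
    """Return (standard error, Z, two-tailed probability) for a sample proportion.

    Hypothetical helper for illustration; relies on the normal approximation.
    """
    # Standard error of a proportion: square root of P*Q/n
    se = math.sqrt(expected * (1 - expected) / n)
    # Z: distance of the observed proportion from the expected one, in standard errors
    z = (observed - expected) / se
    # Probability of landing at least this far from the mean, in either direction
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_two_tailed

# The made-up asthma numbers from the example above
se, z, p = z_for_proportion(observed=0.55, expected=0.51, n=1795)
print(f"standard error = {se:.4f}")
print(f"Z = {z:.2f}")
print(f"two-tailed probability = {p:.5f}")
```

The Z printed here uses the unrounded standard error, so it will be a little bigger than the 3.3333 you get from dividing by .012.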

Yup, it's normal and it's standard and it's deviate, all at the same time. What's the world coming to?

But notice something else, which is also very important - the difference is not very large. It's highly unlikely to arise just by chance, because we have a large sample. With big samples, you can detect small differences. But that doesn't mean they necessarily matter very much.
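To see how much the sample size is doing here, a quick sketch: keep the same .55 versus .51 and only change n. The sample sizes other than 1,795 are ones I picked for illustration, and `z_score` is a made-up helper name.

```python
import math

def z_score(observed, expected, n):
    """Z for a sample proportion: (observed - expected) / sqrt(P*Q/n)."""
    se = math.sqrt(expected * (1 - expected) / n)
    return (observed - expected) / se

# Same .04 difference every time; only the (illustrative) sample size changes
for n in (50, 200, 1795, 10000):
    print(f"n = {n:>6}: Z = {z_score(0.55, 0.51, n):.2f}")
```

With only 50 children the same difference is less than one standard error from the mean, which is nothing to write home about. Z grows with the square root of n, so a big enough sample will flag even a tiny difference.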

Anyway, hang onto that idea -- how many standard deviations are you from the mean? -- because it is very useful.
