Module for March 31st
Welcome back!
Ok before we start, quick wellness check. Yes, we are really doing this. It’s important.
Many thanks!
The health of our class is the health of our classroom. I’ll do my best to start there every week. Ok onto the school stuff…
[Drumroll……….]
Introduction
My hope is that after last week, you’re thinking 2 things:
1) Wow, calculating t-tests is incredibly cool and I feel like an all-powerful statistics deity
2) Ok, but rewind… how did we do that…
It’s not magic! It’s math! We’ll unpack the logic with z-scores and z-tests, but know that t-tests operate under essentially the same logic (with a little fancy schmancy statistics thrown in)… Here we go!
Remember Z-Scores?
Yup! Those Z-Scores. The standardizations of raw scores that turn values into “locations” in a distribution in “standard deviation units.” Remember, the formula looks like this:
Z-score = (Raw Score - Mean) / Standard Deviation
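(If it helps to see that as a computation, here's a tiny sketch in Python with made-up numbers. Python is not something we use in this class; this is just to make the arithmetic concrete.)

```python
# Made-up numbers, just for illustration
raw_score = 72
mean = 65
standard_deviation = 3

# Z-score: how many standard deviations the raw score sits from the mean
z_score = (raw_score - mean) / standard_deviation
print(z_score)  # ~2.33, i.e. about 2.33 SDs above the mean
```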
What Do Dice Have to Do with Z-Tests?
Z-Scores tell us about the location of a score in a distribution of scores. Z-tests are sort of like meta-versions of Z-scores. They tell us about the location of a sample within a population distribution. Like the dice, there are different probabilities of sampling different means. There is a higher probability of sampling a mean near the population mean than there is of sampling a mean very far away from the population mean, but we can never eliminate the possibility that our sample mean is very misleading. Instead we account for the possibility of that error.
So how, mathematically, do we evaluate “where” a sample mean is in a population? Check this out.
The Z statistic looks like this:
Z = (Sample Mean - Population Mean) / Standard Error of the Mean
It’s been a long time since we talked about the standard error of the mean… so let’s revisit that as well:
Standard Error of the Mean = Population Standard Deviation / √(Sample Size)
We started by wanting to measure this:
“How much is my sample like my population?”
So at first we might think… let’s just subtract the sample’s mean from the population’s mean:
That gets us here…
Sample Mean - Population Mean
But then we might think: we should probably account for the fact that we have some error here, because some populations vary from their mean more than others, and some samples are larger than other samples. And you remember that the standard error of the mean is built from exactly those two things already! Remember?
Pull them together and…
Z = (Sample Mean - Population Mean) / (Population Standard Deviation / √(Sample Size))
Another way to think about this is:
Z = how far the sample mean is from the population mean, measured in standard-error units.
Which makes sense, if you think about it. Because it’s exactly what it claims to be:
A measurement of a sample mean’s distance from the population mean, accounting for how much values in the population tend to spread out from the mean, and accounting for how much of the population you’re sampling.
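(To see the whole formula in one place as a computation, here's a short Python sketch. Again, not something you need for class; the function name is just mine, for illustration.)

```python
from math import sqrt

def z_statistic(sample_mean, population_mean, population_sd, n):
    """Location of a sample mean in the sampling distribution,
    measured in standard-error units."""
    standard_error = population_sd / sqrt(n)  # SE of the mean = SD / sqrt(sample size)
    return (sample_mean - population_mean) / standard_error
```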
Take that, Jackie’s first stats textbook author! (Just kidding, he’s actually a really nice guy, I was just 18 and thought it was cool to smoke cigarettes and not do my homework. He was right, I was wrong. Do your homework. Quit smoking).
So how do we use Z-Statistics?
The Z test computation just gives us a value which allows us to establish whether mean A (our sample mean) is dramatically different from mean B (our population mean).
Let’s look at an example. Let’s imagine that we are sampling from the BC population. Let’s say the university-wide mean height is 5’5” (so 65 in), with a standard deviation of 3 inches. We sample our class, 60 people. And we find that the mean height of our sample is 5’6” (66 in).
Can you calculate the z-statistic?
(Not for credit, just for practice)
If you calculated correctly, you should get:
z = ~2.58
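(If you want to double-check your arithmetic, here's the same calculation spelled out in Python, plugging in the numbers from the example above.)

```python
from math import sqrt

sample_mean = 66       # our class's mean height, in inches
population_mean = 65   # university-wide mean height
population_sd = 3      # university-wide standard deviation
n = 60                 # people in our sample

standard_error = population_sd / sqrt(n)              # ~0.387
z = (sample_mean - population_mean) / standard_error
print(round(z, 2))                                    # ~2.58
```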
Time to interpret…
Let’s consider our curve of possibilities again.
Remember this? Our distribution of possibilities given that we select random samples from a population with a given mean? We expect 95% of our possible sample means to fall between 2 SDs above and 2 SDs below the mean.
And technically, this looks like one of two ways, depending on whether it’s a one-tailed or two-tailed test…
You can see that in both diagrams, we have a representation of the fact that there is a 95% likelihood that our sample mean lands in the green area, and a 5% likelihood that it lands in the red. (In the case on the left, the two-tailed case, you have 2.5% of the possibilities in the left tail and 2.5% in the right.) This is where the z statistic comes in. It tells us whether we are landing inside or outside the happy green area; it is a marker of the location of our sample mean in this distribution. And to make sense of it, we need to know where that threshold is…
We compare our Z-calculation to the Z-score that’s “probabilistically expected” to mark the line between the red and green zones. Sometimes we call this the “critical value.” For a two-tailed test, this is +/- 1.96 (1.96 above or below the mean). For a one-tailed test, the threshold is 1.645 (either above or below the mean, depending on which tail).
This is a two-tailed test, so…
If the absolute value of our obtained z-value is greater than the critical value, we reject our null hypothesis.
We can compare our z-calculation to this threshold:
2.58 is greater than 1.96, so the absolute value of our obtained value is greater than our critical value, and we can reject our null hypothesis.
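(If you're curious where 1.96 and 1.645 come from, they're just the z-scores that cut off the outer 5% of the standard normal curve. Here's a quick sketch using Python's scipy library, which we don't use in class, purely to show where the numbers live.)

```python
from scipy import stats

# Two-tailed critical value: cut off 2.5% in each tail
print(stats.norm.ppf(0.975))   # ~1.96

# One-tailed critical value: cut off 5% in one tail
print(stats.norm.ppf(0.95))    # ~1.645

# Two-tailed p-value for our obtained z of ~2.58
z = 2.58
print(2 * (1 - stats.norm.cdf(z)))   # ~0.0099, well under 0.05
```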
Let’s check our hypotheses:
We hypothesized that the sample mean was not equal to the population mean.
The null hypothesis was that the sample mean is equal to the population mean.
We can reject the null hypothesis. Our sample mean is far enough from the population mean that it is unlikely to be a chance result; the sample is not representative of the population (at least when it comes to height).
You can see how we might use the same logic to compare groups with a bivariate t-test (as we did in Stata last week). We can similarly calculate whether the group means are meaningfully different from one another! More on this later.
T-Tests
T-tests use essentially the same logic, but they let you work with an unknown population variance… which is why we learned t-tests first: we can calculate them using the GSS (where we do not have population variance information). (More on this next week.)
Ok so what we just completed is the essence of how we do “significance” testing. But that’s only about half of the information we’re generally retrieving when we make a statistical inference. Yes, it’s helpful to know whether our results are significant, but what are the results?! In a t-test specifically, we are not just interested in the statistical significance of an effect, but in the strength of that effect! This is where we’re headed next.
Generally we are measuring significance… and the intensity of the effect, or the effect size.
One way to calculate this is the effect size between a sample distribution and a population distribution (Cohen’s d):
Effect Size = (Sample Mean - Population Mean) / Standard Deviation
A small effect size is 0-.2
A moderate effect size is .2-.8
And a large effect size is .8 +
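(As a sketch of that calculation in Python, using the height example from above; the cutoffs in the comment are the ones listed just above.)

```python
def effect_size_one_sample(sample_mean, population_mean, population_sd):
    """Effect size (Cohen's d): distance between sample and population means,
    in standard-deviation (not standard-error) units."""
    return (sample_mean - population_mean) / population_sd

# Height example: (66 - 65) / 3
print(effect_size_one_sample(66, 65, 3))  # ~0.33, a moderate effect by the cutoffs above
```

Notice that the same height example that gave us a “significant” z of ~2.58 gives a fairly modest effect size; significance and strength are two different questions.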
When we were comparing 2 groups using a T-test last week, we obtained a significance level (which we discussed) but we also obtained an effect size (which we have not yet discussed). These are calculated like this:
Effect Size = (Mean of Group 1 - Mean of Group 2) / Pooled Standard Deviation
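(A sketch of that two-group version in Python, assuming the usual pooled-standard-deviation form of Cohen’s d; Stata does this calculation for you, so this is just to show what’s under the hood. The numbers are hypothetical.)

```python
from math import sqrt

def effect_size_two_groups(mean1, mean2, sd1, sd2, n1, n2):
    """Effect size (Cohen's d) for two groups: difference in means
    divided by the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Hypothetical numbers, just for illustration
print(effect_size_two_groups(mean1=66, mean2=64, sd1=3, sd2=3, n1=30, n2=30))  # ~0.67
```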
T-Tests in Stata offer us both effect sizes and significance tests. We’ll examine both in Thursday’s tutorial.
No activity this week! Tune in Thursday for a T-Test 2.0 tutorial!