Module for March 24th


Inferential Statistics & Probability

Ok. Before we were so rudely interrupted by COVID-19 we were talking about inferential statistics. I know. Big day. You’re all thrilled to be here. But really. Inferential statistics is the fun part of this class. You’ve done all the hard work to learn descriptive techniques, so now we can learn how to use a sample of data to infer or predict things about a population.

Most of the inferential techniques we will learn in this class involve “testing” the probability that something is or is not true of a population, based on what we’re observing in a sample. So, for example, say you notice that a ton of people on your Instagram this week are talking about Tom Brady leaving the New England Patriots (truly, friends, these are dark times). And a ton are talking about some weird show on Netflix called “Love is Blind” where people marry total strangers. Maybe you take a few descriptive statistics…

You find that…

  • 30% of the accounts you follow are talking incessantly about the Netflix show “Love is Blind” (Ugh. Jessica, Barnett will never make you happy, choose the 24-year-old fitness instructor, he will love you forever)

    • It appears that most of these Instagrammers are men

  • 70% of the accounts you follow are expressing feelings about Tom Brady leaving the Patriots for the Tampa Bay Buccaneers (cool I feel FINE everyone feels FINE about this can somebody call Gisele can somebody call Kraft)

    • It appears that most of these Instagrammers are women

And you wonder to yourself… what can we say about what the nation thinks, based on this data? It’s time for inferential statistics!

 
 

So let’s do some inferring of our own. Looking at your Instagram, it would be reasonable for you to develop the following hypotheses:

“Women are more concerned about Tom Brady than men”

“Men are more concerned about “Love is Blind” than women”

These are great starts at hypotheses. So the first thing we’re going to do is a quick set of tests. We’re going to check in on reliability and validity first…

So is the measure from your Instagram reliable and valid? Think about it… Formulate an answer.

I’d say that it’s probably got decent reliability: a person with that opinion today would likely have the same opinion tomorrow.

Validity is pretty terrible though. People sharing concerns on Instagram isn’t the same thing as people having concerns. So not great. In the real world, we would not use this measure. But let’s play with it for a bit and pretend it’s valid.

“Women are more concerned about Tom Brady than men”

Let’s focus on this hypothesis. What would be our null hypothesis? Try to formulate it yourself.

Let’s try…

“There is no difference in concerns between men and women”.

This we can work with. Checking back in with our original hypothesis:

“Women are more concerned about Tom Brady than men”

Is this a directional hypothesis or a non-directional hypothesis?

Correct! It is directional. Because it hypothesizes the direction of the relationship, not just the existence of a relationship. If it were non-directional it would sound something like this:

“There is a relationship between gender and Instagram behavior”

Ok so we’ve now waded into territory where we are trying to make inferences about the entire country based on a sample (your Instagram). For simplicity’s sake, we are going to pretend that your Instagram is a random sampling of Americans. Which it is most certainly not. But let’s pretend.

Hypotheses & Null Hypotheses

Ok so now the fun part… pay attention to this sentence, it’s the whole ballgame…

Our confidence in reporting the existence of a relationship between gender and Instagram behavior can only be based on probability — the likelihood that something is true based on the information you are certain about. In this case, you were certain about your own Instagram data, and you wanted to make inferences about the country at large.

 
 

Normality

Ok so let’s look at a curve that’s normally distributed. Height is a trait that’s normally distributed so we will look at women’s height specifically. The average height of women is about 5’4” and in a normal curve the mean, median and mode converge, so 5’4” represents all three of those measurements. You can see that the curve is symmetrical on the axis of the mean, and that it has asymptotic tails that trail off and approach zero on each side.

normality1.png

One of the really cool things about normally distributed traits is that we can use standard deviations to anticipate the percentage of the population that will fall in a particular range of that trait. When a trait is normally distributed, for example, we can expect that about 68% of the population falls between one standard deviation above and one standard deviation below the average “score” on that trait. So in this case, we would expect about 68% of women to have a height that is within 1 standard deviation of the mean (above or below).

normality2.png

Similarly, when a trait is distributed normally, we can expect that 95% of the population’s traits can be contained within 2 standard deviations above and 2 standard deviations below the mean.

normality3.png

And we can expect that almost the entire population (99.7%) can be contained within 3 standard deviations above and below the mean.

normality4.png
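If you’d like to verify the 68-95-99.7 rule yourself, here is a quick sketch in Python (not part of the course software; it assumes you have SciPy installed):

```python
# The 68-95-99.7 rule, computed from the normal curve itself.
from scipy.stats import norm

# Proportion of a normal distribution within k standard deviations of the mean:
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within ±{k} SD: {coverage:.1%}")
# prints roughly 68.3%, 95.4%, and 99.7%
```

Note that the familiar “68, 95, 99.7” figures are rounded; the exact values are about 68.27%, 95.45%, and 99.73%.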
 
 

Let’s try an activity…

Imagine that you work for a clothing company that designs jeans for women, and you want to make strategic decisions about how much of your product to produce for women of below average, average, and above average height. Your shop has a “regular” line that caters to women of average height, but it also has “tall” and “petite” lines. So your boss says to you, “Hey Bethany [oh by the way your name is Bethany now and you work for a small denim boutique in Boston], we have a budget to produce 6000 pairs of jeans… Can you figure out how many should be petite, regular, and tall jeans?”

You think to yourself, “What an excellent opportunity to use my brain!” But almost immediately…

Your coworker, Mark [who is the worst] steps in and says “Oh I’ve got this Bethany, 6000 divided by 3 is 2000 so we’ll make about 2000 of each line” [Mark isn’t a very smart dude but he talks a lot]

But you know that height is normally distributed. You also know that the average height of women is 5’4”. You also know that the standard deviation of women’s height is about 2 inches.

The regular line fits women between 5’0” and 5’8”; the shop recommends the petite line for women under 5’0” and the tall line for women over 5’8”.


Take a second to do a quick calculation. How would you distribute the 6000 jeans between the three lines to maximize sales?

“Actually Mark, statistically speaking, it’s strategic for us to make about 5,700 pairs of regular jeans, 150 pairs of petite, and 150 pairs of tall”. Mark walks away confused. Mark is still the worst.

Awesome job, “Bethany.” You identified that the regular line fits women between 5’0” and 5’8”, which is two standard deviations above and two standard deviations below the mean. You reasoned that about 95% of women can be expected to fit these sizes, and that you should therefore commit only 5% of your resources to the tall and petite lines. (0.05 x 6000 = 300; 300 / 2 = 150)
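If you want to double-check Bethany’s math with software rather than the rule of thumb, here is a sketch in Python (assuming SciPy; the mean, standard deviation, and size cutoffs are the ones from the activity):

```python
from scipy.stats import norm

mean, sd, total = 64, 2, 6000   # heights in inches: 5'4" mean, 2" SD, 6000 jeans

p_petite  = norm.cdf(60, mean, sd)              # proportion under 5'0"
p_regular = norm.cdf(68, mean, sd) - p_petite   # proportion between 5'0" and 5'8"
p_tall    = 1 - norm.cdf(68, mean, sd)          # proportion over 5'8"

print(round(p_petite * total), round(p_regular * total), round(p_tall * total))
```

The exact figures come out near 137 / 5,727 / 137, consistent with the back-of-the-envelope 150 / 5,700 / 150, since the “95%” in the rule of thumb is really about 95.45%.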


Normality Testing, Skew & Kurtosis

As you’re beginning to piece together, normality can be incredibly valuable. So it would be super helpful for us to be able to evaluate how normal a curve is. And there are a few ways we will do that this week.

normaality7.png

The simplest way that you can establish that a curve is not normal is if it is bi-modal, which essentially means it has two “modes” or two “peaks” in the curve.

The second is by examining skew, or the asymmetry of the curve. Skew is a way of establishing how much the curve is drawn towards the upper or lower end of the range of values. If the tail of the curve points towards the higher values, we call this a positive skew.

normality5.png
normality6.png

If the tail points towards the lower values, we call this a negative skew.

You can tell right away that a curve is skewed if the mean and median are different. When the mean is less than the median, the curve is negatively skewed. And when the mean is greater than the median, it is positively skewed. But we can calculate this more precisely by hand or using Stata. (More on the pencil and paper approach later). As a point of reference, a perfectly normal curve has a skewness of 0.

Kurtosis is another measure of the shape of the curve: how sharply peaked or plateaued it is. If the curve is very sharply peaked and narrow, we call this a leptokurtic curve. If the curve is stout and plateaued, we call this a platykurtic curve. As a point of reference, a perfectly normal distribution has a kurtosis of 3 (we call this mesokurtic).

leptokurtic.png
platykurtic.png
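You don’t have to eyeball skew and kurtosis; statistical software will compute both. Here is a sketch in Python (assuming NumPy and SciPy; the data are simulated, not from the course):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
heights = rng.normal(loc=64, scale=2, size=100_000)  # simulated, roughly normal

print(skew(heights))                     # near 0 for a normal curve
print(kurtosis(heights, fisher=False))   # near 3 ("mesokurtic")
```

One caution: by default SciPy reports excess kurtosis (where a normal curve scores 0); passing fisher=False gives the convention used above, where a normal curve has a kurtosis of 3.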

Standardized Values & Z-Scores

Photo on 3-20-20 at 1.30 PM.jpg

Ok team. I’m about to explain this thing called a “z-score” and I promise that your reaction is going to be something like, “why would someone ever want to do that?” which is fair.

Trust that z-scores are a really helpful metric in calculating other important values (like skew). So stick with me. No skips.


A Z-score is a standardized score that converts each datapoint in a set of values into a new value (or “score”) in units of standard deviations. Now that sounds confusing. But it’s really no more complicated than converting inches to centimeters. But it is way cooler… because by converting each value into a score in “standard deviation units,” you can turn a value into a location on the curve.

To calculate the Z-score of a value in a distribution, you simply take each value (we call these “raw scores”), subtract the mean, and divide this by the standard deviation of the set.

Z-score = (Raw Score − Mean) / Standard Deviation

As I said, this allows for other calculations moving forward, but it also gives us more freedom to make comparisons across variables. So for example, consider that men and women have different distributions of height (the mean is a bit higher for men). Comparing Z-scores of height (rather than raw values of height) would allow you to compare a woman’s height to a man’s height based on how they measure up to others of the same gender. So you could say something like “yes, woman A is shorter than man B, but if her Z-score is higher than his, she is taller compared to other women, than he is compared to other men.” This is how standardized scores work. You can see how this might be helpful down the road, and you can absolutely count on seeing Z-scores again very soon.

Let’s practice…

If I had 5 students score the following on their exams:

70 80 80 80 90

What would their z-scores be? (Try on your own and check below)

First we calculate the mean, then we calculate the standard deviation, then we can calculate the set of z-scores:

Mean: 80. Standard deviation: √(200/5) = √40 ≈ 6.3

Z-scores: ≈ -1.6, 0, 0, 0, 1.6
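Here is the same practice problem worked in Python, using only the standard library (pstdev is the population standard deviation, which is what we used above):

```python
import statistics

scores = [70, 80, 80, 80, 90]
mean = statistics.mean(scores)    # 80
sd = statistics.pstdev(scores)    # population SD: sqrt(40) ≈ 6.32

# Z-score = (raw score - mean) / standard deviation
z_scores = [round((x - mean) / sd, 1) for x in scores]
print(z_scores)  # [-1.6, 0.0, 0.0, 0.0, 1.6]
```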


Sampling Error, Confidence, and Mean Testing

What does normality have to do with sampling? This is the really cool part. Every population of values has a distribution. And samples of populations have their own little distributions that are shaped by the population distribution. The larger the sample, the more likely it is that the sample’s distribution is similar to the population’s distribution.

sample distribution.png

The first type of inferential technique you will learn is mean testing. Mean testing techniques are one way you can evaluate hypotheses about a population, based on sample data. What you’re really asking is: How representative can we expect our sample distribution to be of our population distribution?

Again, all inferential statistics allow us to do is report the likelihood that an outcome is or isn’t true, based on our sample data. This means that we have to decide for ourselves the threshold of certainty we are comfortable with, before doing any inferential calculations. To determine this threshold of certainty, you need to learn three new key terms: error, confidence intervals, and statistical significance.

icecream samples.png

All inferences involve error. Some error we are responsible for in the way we design our analyses. But some error is unavoidable, because we always face the possibility that our sample reflects an unlikely or misleading snapshot of the population. Even if the sample is a very good size and very well selected, there is still a small possibility that it’s not a representative sample. Anytime we make a statistical inference, therefore, we accept that the possibility of error is unavoidable, and we account for it by naming and calculating the likelihood that our sample is not an accurate reflection of the population.

The more error you are willing to accept, the more targeted you can be in reporting your inference. The less error you are willing to accept, the broader the range of possibilities you have to report. I’ll give you an example.

If I survey 3 women, and they all report that they had no ice cream this year, and I’m willing to accept a ton of error in my inference, I might say “100% of American women had no ice cream this year.” This allows for a ton of error, as it is very unlikely that my sample of 3 accurately reflects the entire population of American women (see “itty bitty sample” to the right). So we would only be able to report this with very low confidence.

A more responsible way of reporting this might be “I can say with almost no confidence that 100% of American women had no ice cream this year,” because we have almost no confidence that a sample of this size is representative.

What you will see much more often are reported statistics that sound like this: “I can report with 95% confidence that American women had, on average, between 0 and 5 ice creams this year.” What you’re observing in that report is both an acknowledgement of error and a confidence interval, which is the interval within which we are confident (to some degree) that something is true about our population. 0 to 5 is the confidence interval in that statistic. When we can report inferences with 95% (or sometimes 99%) confidence, we generally report this information as statistically significant (more on that later).
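To make the ice cream example concrete, here is a sketch of how a 95% confidence interval for a mean can be computed in Python (the survey numbers are invented for illustration; assumes SciPy):

```python
import statistics
from scipy import stats

ice_creams = [0, 2, 5, 1, 3, 0, 4, 2]   # hypothetical survey of 8 women
n = len(ice_creams)
mean = statistics.mean(ice_creams)
sem = statistics.stdev(ice_creams) / n ** 0.5   # standard error of the mean

# Small sample, so we use the t-distribution (a preview of t-testing),
# with n - 1 degrees of freedom:
low, high = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)
print(f"95% CI: {low:.1f} to {high:.1f} ice creams")
```

For this made-up data the interval comes out to roughly 0.6 to 3.6 ice creams: we are 95% confident the population mean falls somewhere in that range.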

This week, the mean-testing technique you’ll be learning is “T-testing.” T-testing allows you to use sample data to make inferences about a population when you are comparing groups to one another. To learn, move on to the Week 8 Assignment!


Stop Here!

Great work! You’re done with lectures for the week!

And you’re ready for the Week 8 Assignment >>>