April 14th Module

 
 
 

Wellness Rollcall!


Let’s remind ourselves how we recoded the religion variable:

relig_condensed:
1 - Non-Catholic Christianity
2 - Catholicism
3 - Non-Christian, but Religious
4 - No Religion

We can see four groups (Non-Catholic Christian, Catholic, Non-Christian but Religious, and No Religion).

What the ANOVA test allows us to do is find out if there is a statistically significant difference in means between groups.

Now we can orient to where we stopped last week…

We were using ANOVA to test the null hypothesis that all the group means are equal to one another. In this case: that Catholics, Non-Catholic Christians, Religious Non-Christians, and Non-religious people all have the same mean ideal number of children.

So we ran our One-Way ANOVA test in Stata.

[Image: Stata one-way ANOVA output]

Look at all the work it did for you! And some familiar friends are here! You have the F-statistic, your within- and between-group variances, some degrees of freedom! And our old pal the p-value, which is all the way to the right there under “Prob > F”. This is the most important element of the output. If this value is less than .05, we can reject our null hypothesis. And it is! So we can!

We can reject the null hypothesis that all the groups have the same mean.
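If you’re curious where that F-statistic and “Prob > F” come from, here is a minimal sketch in Python (we use Stata in this course; this is just to show the machinery). The four samples below are invented for illustration, with one group’s mean deliberately shifted so the null hypothesis is false:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Four made-up "ideal number of children" samples, one per religion group.
# One group is shifted upward so the null hypothesis is false on purpose.
groups = [
    rng.normal(2.5, 1.0, 30),  # Non-Catholic Christian
    rng.normal(2.5, 1.0, 30),  # Catholic
    rng.normal(2.5, 1.0, 30),  # Non-Christian, but Religious
    rng.normal(4.0, 1.0, 30),  # No Religion (deliberately different mean)
]

all_vals = np.concatenate(groups)
grand_mean = all_vals.mean()

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of values around their own group's mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1             # k - 1
df_within = len(all_vals) - len(groups)  # N - k

f_stat = (ss_between / df_between) / (ss_within / df_within)
p_value = stats.f.sf(f_stat, df_between, df_within)  # the "Prob > F"

# scipy's built-in one-way ANOVA gives the same answer
f_check, p_check = stats.f_oneway(*groups)
print(f_stat, p_value)
```

The F-statistic is just between-group variance divided by within-group variance; the p-value asks how often an F that large would appear if all four group means were truly equal.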


Let’s consider how this is different from what a t-test would allow. A t-test can only compare the means of two groups at a time; ANOVA lets us compare all four at once.

In this week’s activity, you’ll have the opportunity to use both t-tests and ANOVA testing to address questions like these. Start to develop an intuition for when and why one test might be more valuable or appropriate than the other.

Two-Way ANOVA Testing

What you’ve done so far is “one-way” ANOVA testing. But once you understand the basic logic, you can actually layer on additional grouping variables. Let’s consider some other variables from my list.

chldidel 
sex
race

Imagine I wanted to look at differences across sex categories and differences across race categories at the same time. I could run a two-way ANOVA test. And the syntax in Stata is incredibly simple. Just put your “dependent variable” (your interval-ratio variable) first.

[Image: Stata two-way ANOVA output for chldidel by sex and race]

Look at those p-values in the far right column. Remember, you’re looking for significance at the p < .05 level. You can see that sex reaches statistical significance, and race does not.

You might be wondering…

why you wouldn’t simply run two separate ANOVA tests, one for sex and one for race. But the two-way ANOVA test doesn’t simply run separate evaluations of each grouping variable; it produces a single model based on all the variables incorporated. So we can observe how the variables work together.
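To make the single-model idea concrete, here is a sketch of a balanced two-way ANOVA in Python (again, in this course you’ll do this in Stata; the data are invented, with a real sex effect and no race effect built in). Both factors are tested against the same shared error term, rather than each getting its own separate model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Balanced made-up design: 2 sex categories x 2 race categories, 25 people per cell
sex = np.repeat([0, 1], 50)
race = np.tile(np.repeat([0, 1], 25), 2)
# Outcome depends on sex but not on race (by construction)
y = 2.0 * sex + rng.normal(0, 1, 100)

grand = y.mean()

def main_effect_ss(labels):
    # Sum of squares for one grouping variable: its level means vs the grand mean
    return sum((labels == lev).sum() * (y[labels == lev].mean() - grand) ** 2
               for lev in np.unique(labels))

ss_sex = main_effect_ss(sex)
ss_race = main_effect_ss(race)

# Shared error SS: spread within each (sex, race) cell. This is the single
# model both factors are evaluated against, not two separate one-way tests.
ss_error = sum(((y[(sex == s) & (race == r)]
                 - y[(sex == s) & (race == r)].mean()) ** 2).sum()
               for s in (0, 1) for r in (0, 1))

df_sex, df_race = 1, 1
df_error = 100 - 4  # N minus the number of cells

f_sex = (ss_sex / df_sex) / (ss_error / df_error)
f_race = (ss_race / df_race) / (ss_error / df_error)
p_sex = stats.f.sf(f_sex, df_sex, df_error)
p_race = stats.f.sf(f_race, df_race, df_error)
print(p_sex, p_race)
```

Because we built the data with a sex effect and no race effect, sex comes out significant and race does not, mirroring the Stata output above.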

Check this out…

Below is a two-way ANOVA test of income and race. You can see in the far right column that we just barely have significance at the p < .05 level. So, based on this test, we would say there is a statistically significant difference in income between racial groups.

[Image: Stata ANOVA output for income by race]

But look at this…

When we add “education” to our model, so that the model accounts for the within- and between-group variance of both race and education, the significance of the race variable “drops out.” Its p-value jumps to .23, while education remains significant.
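Here’s a toy simulation of that “drops out” pattern in Python, using an ordinary least squares model (the regression technique previewed at the end of this module) rather than Stata’s anova command. Everything below is invented: in this made-up world, education drives income directly, and group membership is correlated with education but has no direct effect on income at all:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 500

# Made-up world: education drives income; "group" membership is correlated
# with education but has NO direct effect on income.
educ = rng.normal(13, 3, n)
group = (educ + rng.normal(0, 3, n) > 13).astype(float)
income = 2.0 * educ + rng.normal(0, 5, n)

def p_value_for(X, y, col):
    """Two-sided p-value for one coefficient in an OLS fit (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta[col] / np.sqrt(cov[col, col])
    return 2 * stats.t.sf(abs(t), df)

ones = np.ones(n)
# Model 1: group alone. It looks "significant" only because group proxies education.
p_alone = p_value_for(np.column_stack([ones, group]), income, 1)
# Model 2: add education. The model now accounts for it, and group drops out.
p_controlled = p_value_for(np.column_stack([ones, group, educ]), income, 1)
print(p_alone, p_controlled)
```

The group variable is wildly significant on its own and loses its significance once education enters the model: the same logic as the race and education example above.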

[Image: Stata ANOVA output for income by race and education]

We won’t do much with this in this course, but this logic will be incredibly important to any statistics work you do in the future. It is this logic that allows sociologists examining the relationship between two variables to “control for” or “account for” the “influence” of other variables on the relevant outcomes. This week (and next) you’ll try this out with your own variables.

Summary: What you need to know of ANOVA testing for the purposes of this course is:

  • One-way ANOVA testing allows us to consider differences across more than two groups within a variable

  • Two-way ANOVA testing allows us to consider more than one grouping variable

  • ANOVA testing requires that your “dependent variable” is interval-ratio level and continuous

  • ANOVA testing assumes approximate normality in your dependent variable for each of the groups, homogeneity of variance across groups (you don’t need to know what that means yet, but if you still have your book you’ve read a bit on this), and that there are no major outliers. (You aren’t required to do these checks for your portfolio, but it’s good to know for “the real world.”)

  • What ANOVA testing examines is how much variation in the dependent variable can be “explained” by within-group variance, and how much by between-group variance, for each independent variable.

You’ll play with this a bit in our last weekly activity.

One… More… Technique!

Team, this is so exciting. You only have one more technique to learn: Linear Regression (we will sometimes call this OLS regression, which stands for “ordinary least squares” regression). And the thing that’s cool about this is that you already understand the logic of linear regression… because you went to middle school… and you learned what a slope was… OK, let’s time travel for a second…

Remember this?

y = mx + b

Of course you do. And for some of you this brings up powerful feelings of accomplishable SAT questions. For others it yields only fear. All reactions are welcome.

But this should also look familiar:

[Image: the regression equation, written in statistical notation]

Now here’s a fun fact:

Remember when we did correlations to examine effect size? Let’s stick with “education” and “income” and look at one.

 
[Image: Stata correlation output for education and income]
 

And you would look at that and say, “Wow. What a lovely, direct, moderate effect size we have there between education and income!” Another way to think about this effect size is that it’s a kind of slope. That’s essentially what a correlation coefficient tells you: how “y” moves with “x.” In fact, the “m” in the “y = mx + b” formula is just the correlation coefficient rescaled by the spread of each variable.
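You can check that correlation-and-slope connection numerically. Here’s a short Python sketch with invented education and income numbers (not GSS data): the least-squares slope equals the correlation coefficient times the ratio of the two standard deviations.

```python
import numpy as np

rng = np.random.default_rng(3)
# Made-up "years of education" and "income" with a moderate positive association
educ = rng.normal(13, 3, 200)
income = 3000 * educ + rng.normal(0, 15000, 200)

r = np.corrcoef(educ, income)[0, 1]             # the correlation coefficient
slope, intercept = np.polyfit(educ, income, 1)  # the "m" and "b" of best fit

# The slope is the correlation rescaled by each variable's spread:
# m = r * (sd of y / sd of x)
rescaled = r * income.std() / educ.std()
print(r, slope, rescaled)
```

When both variables are standardized (sd of 1), the rescaling disappears and the slope literally is the correlation coefficient.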

Now imagine you’re interested in the relationship between education and work hours: specifically, the correlation between years of education and weekly work hours. These are both ratio-level variables in the GSS…


But it’s still middle school…

You’d be like “Hey Miss J, look at this X & Y axis situation representing years of education and weekly work hours!” and your teacher would be like “Wow! Great job!”

And then she’d give you all the data points, each one an (X, Y) coordinate locating one person by that person’s years of education and weekly work hours. And she’d ask things like “Is the slope positive or negative?” and you’d be like “positive!”

And then she’d be like “Can you chart the line of best fit?” and you’d be like “Yes, Miss J, I totally can,” and you’d come in all confident with a big yellow pen and you’d do this:

[Images: the axes, the data points, and the hand-drawn yellow line of best fit]

And then she’d ask you to calculate the slope of the yellow line. And you’d count your rise over run on the graph and guesstimate: well, it’s a bit less steep than a 45-degree angle, so it’s a bit less than 1. So you’d say something like “The slope is ~0.8!” And she’d ask about the y-intercept, and you’d see that the line of best fit crosses the y-axis at 0, so you’d say, “The y-intercept is 0!” And then on the test it would say “What is the equation of the line of best fit?” and you’d write:

y = (0.8)x + 0

Congratulations: you just did a super rudimentary version of linear regression.
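The yellow-pen guesstimate has an exact version: least squares finds the m and b that minimize the total squared vertical distance to the points. Here’s the textbook formula sketched in Python on made-up (x, y) points that sit near the line y = 0.8x + 0:

```python
import numpy as np

# Made-up (x, y) points lying roughly along y = 0.8x + 0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.9, 1.5, 2.5, 3.1, 4.1, 4.7])

x_bar, y_bar = x.mean(), y.mean()
# Least-squares slope: a precise version of "rise over run"
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# The line of best fit always passes through (x_bar, y_bar)
b = y_bar - m * x_bar

# numpy's built-in line fitter agrees
m_check, b_check = np.polyfit(x, y, 1)
print(m, b)
```

On these points the fitted slope comes out a bit under 0.8 and the intercept close to 0, just like the middle-school answer.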

BUT there are 2 hard truths we are going to reckon with now:

  • Hard Truth #1: Some lines of best fit are better than others.

    • For example, the line of best fit on the left is more representative of the reality of the data than the line of best fit on the right. (This is where “error” comes in: it’s calculated, essentially, from the data points’ collective distance from the best-fit line.) OLS regression accounts for this.

[Images: two scatterplots with candidate best-fit lines, the left fitting better than the right]
  • Hard Truth #2: Real. Data. Never. Looks. This. Good.

    • Unless you get really really really really lucky.

Never going to happen:

[Image: the idealized scatterplot from above]

If you’re really really lucky:

[Image: an unusually clean real-world scatterplot]

Actual Stata Scatterplot for Years of Education and Work Week Hours:

[Image: Stata scatterplot of years of education and work week hours]

As you can see, when you’re working with real data, mapping a line of best fit (and evaluating how representative that line is of the data) is going to be pretty rough. This is reason #698 why we love Stata: its regression command does the hard work for us.
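To see what that hard work amounts to, here is a last Python sketch (invented data again, loosely shaped like a real education and work-hours scatter). The fitted line is the one that minimizes the total squared error, so it beats any line you could draw with the yellow pen:

```python
import numpy as np

rng = np.random.default_rng(11)
# Messy made-up data, like a real education / work-hours scatterplot
educ = rng.integers(8, 21, 300).astype(float)
hours = 30 + 0.8 * educ + rng.normal(0, 12, 300)

# The fitted line of best fit
m, b = np.polyfit(educ, hours, 1)

def sse(slope, intercept):
    """Total squared vertical distance from the points to a candidate line."""
    resid = hours - (slope * educ + intercept)
    return np.sum(resid ** 2)

# Least squares guarantees the fitted line beats any hand-drawn guess,
# here an arbitrary candidate line with slope 1.5 and intercept 20
print(sse(m, b), sse(1.5, 20.0))
```

That minimized error is also what lets us say how representative the line is of the data, which is exactly what regression output will quantify for us.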

*Stata appreciation exhale*