Tabulations in Stata

 
 

Here you will find directions for your Week 2 assignment. You’ll be using Stata commands for the first time to produce descriptive statistics for specific GSS variables.

In chapter 2 of your textbook, you’ll read about five commonly used Stata commands: tabulate, summarize, generate, replace, and recode. In this activity, you’ll practice the first two. We’ll begin with the “tab” command (short for tabulate), which produces frequency distribution tables.

You already downloaded your GSS data last week, so now you need to tell Stata that you’d like to use that dataset by typing:

use "L:\stats 2020\GSS2012.dta"

This location (L:\stats 2020\GSS2012.dta) should work if you saved the data exactly as directed in the Week 1 activity. If you named the file differently or saved it in a different folder, open your app storage in Citrix and confirm that the data is saved as a .dta file inside a folder on your L drive. Your command will then look like this:

use "L:\[FOLDER NAME]\[FILE NAME].dta"
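If the dataset currently in memory has unsaved changes (for example, from an earlier attempt in the same session), use stops with an error rather than overwrite it. Adding Stata’s clear option discards the data in memory before loading the file. This is a minimal sketch assuming the Week 1 path; substitute your own folder and file name:

```stata
* Load the GSS extract, discarding any dataset already in memory
use "L:\stats 2020\GSS2012.dta", clear

* Optional: a compact summary (number of observations and variables)
* to confirm that the right file loaded
describe, short
```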

From here, we can begin. Suppose we are interested in confidence in organized religion. We might start by getting some information about that variable:

  1. Produce a distribution table of the variable “conclerg,” a measure of respondents’ confidence in organized religion.

    tab conclerg

  2. Identify what percentage of respondents reported having “hardly any” confidence in organized religion.
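By default, tab leaves missing values out of the table, so the percentages above are shares of respondents who gave a valid answer. Because not every GSS respondent is asked every question, it can help to see how many cases are missing. A sketch using tabulate’s missing option, assuming your file stores nonresponse as Stata missing values (if it uses ordinary numeric codes instead, those codes already appear as rows):

```stata
* Default: percentages are based only on nonmissing responses
tab conclerg

* Show missing values as their own row(s); percentages are then
* computed over all cases, including the missing ones
tab conclerg, missing
```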

Now suppose we want to know whether men and women in the sample appear to have different degrees of confidence in organized religion. For that, we also need information on respondents’ sex.

  1. Produce a distribution table of the variable “sex,” a measure of the sex of respondents.

    tab sex

  2. Identify what percentage of respondents are male.

Good! Now we will cross-tabulate the two variables. The first variable listed (conclerg) forms the rows, and the second (sex) forms the columns:

tab conclerg sex

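Excellent! But raw counts are hard to compare, because the numbers of men and women in the sample may differ. Adding the col option reports column percentages: within each sex, the percentage of respondents in each confidence category. Let’s try that next.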

tab conclerg sex, col

We might instead prefer row percentages, which show, for each confidence category, what percentage of respondents are men and what percentage are women.

tab conclerg sex, row
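The col and row options can also be combined, and tabulate’s nofreq option suppresses the raw counts so that only percentages remain. A short sketch of two variations on the commands above:

```stata
* Column percentages only (raw frequencies suppressed)
tab conclerg sex, col nofreq

* Row and column percentages together in one table
tab conclerg sex, row col
```

With both options, each cell shows its count, its row percentage, and its column percentage, so the table gets long; the nofreq version is often easier to read when comparing men and women.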

Well done! Now let’s interpret. Based on the output you’ve just produced, which of the following do you know to be true?

  1. American women generally have greater confidence in organized religion than men

  2. American men generally have greater confidence in organized religion than women

  3. There is not enough information to make either of the above claims.

Finally, select two variables of your own and repeat the above steps, including the crosstab. This time, save your outputs; these will complete the first assignment in your portfolio.
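One way to save output in Stata is a log file, which records every command and result from the moment you open it until you close it. A minimal sketch; the file name week2_tabs.log is a hypothetical placeholder, so choose your own name and folder, and replace the example tab command with your own two variables:

```stata
* Start recording. "replace" overwrites an older log with the same name;
* "text" writes plain text that any editor can open
log using "L:\stats 2020\week2_tabs.log", replace text

* Run your commands here, for example:
tab conclerg sex, col

* Stop recording and close the log file
log close
```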


Next, flip ahead to p. 120 in the Longest text. For some variables (like the ones we considered above), frequency distributions offer an intuitive, easy-to-digest snapshot. For others, particularly interval-ratio variables with many distinct values, frequency distributions can be long and hard to read. To see this, produce a frequency distribution of the variable “age”:

tab age

All the information is there, but it’s very difficult to read or interpret. In this case, other measures are more useful. The “sum” command (short for summarize) provides several of these. Try this:

sum age

Much easier to interpret! This command provides “obs,” the number of “observations” or respondents, “mean,” the arithmetic average of the values, “std. dev.,” the standard deviation of the values, “min,” the lowest value in the set, and “max,” the highest value in the set.
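For reference, with n observations x_1, …, x_n, the mean and standard deviation that summarize reports are computed as follows. The standard deviation is the sample version, which divides by n − 1 rather than n:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

For example, for the three values 2, 4, and 6, the mean is 12/3 = 4; the squared deviations are 4, 0, and 4, which sum to 8; dividing by n − 1 = 2 gives 4, so s = 2.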

Try a new variable on your own. I recommend the variable “partners,” which is a measure of how many sexual partners the respondent has had in the past year.

tab partners

sum partners

Interpret all five columns of the summarize output and include your description in your journal.
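Before interpreting the mean of a variable like “partners,” it is worth checking how it is coded: in many GSS variables, the numeric values are category codes with value labels attached rather than literal counts, and some codes may mark nonresponse. A sketch of three commands that help with this check:

```stata
* Storage type, range, value labels, and number of missing cases
codebook partners

* The numeric codes behind each category, instead of the value labels
tab partners, nolabel

* Adds percentiles (including the median), variance, skewness, and
* kurtosis to the five basic columns
summarize partners, detail
```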