Tabulations in Stata
This week, you will use Stata commands for the first time…
Here you will find directions for your assignment for week 2. You’ll be using Stata commands for the first time, and using them to produce some descriptive statistics about specific GSS variables.
You’ll be reading about five commonly used Stata commands in chapter 2 of your textbook: tabulate, summarize, generate, replace, and recode. Here, you’ll practice two of them, tabulate and summarize. We’ll begin with the “tab” command (short for tabulate), which produces frequency distribution tables.
You already downloaded your GSS data last week, so now you need to tell Stata that you’d like to use that dataset by typing:
use "L:\stats 2020\GSS2012.dta"
Note: this location (“L:\stats 2020\GSS2012.dta”) should work if you saved the data exactly as directed in the Week 1 activity. If you named the file differently or saved it in a different folder, double-check its location: open your app storage in Citrix and confirm that the data is saved as a .dta file, within a folder, within your L drive. Your command will then look like this:
use "L:\[FOLDER NAME]\[FILE NAME].dta"
From here, we can begin. Imagine we were interested in confidence in organized religion. We might want some information about that variable:
Produce a distribution table of the variable “conclerg,” a measure of respondents’ confidence in organized religion.
tab conclerg
Identify what percentage of respondents reported having “hardly any” confidence in organized religion.
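By default, tab leaves out respondents with missing values, so its Percent column is a share of those who actually answered. Many GSS items were asked of only part of the sample, so it helps to know which denominator the table uses. The sketch below uses made-up data (not GSS values) and assumes the coding 1 = a great deal, 2 = only some, 3 = hardly any; missing is a standard tabulate option that gives missing values their own row.

```stata
* Made-up data: five respondents, one of whom has no answer (.)
clear
input conf
1
2
2
3
.
end

* Assumed coding: 1 = a great deal, 2 = only some, 3 = hardly any
label define conf_lbl 1 "a great deal" 2 "only some" 3 "hardly any"
label values conf conf_lbl

* Default: the missing answer is dropped, so percentages are out of 4
tab conf

* With the missing option, the missing answer gets its own row
* and percentages are out of all 5 respondents
tab conf, missing
```

With these made-up values, “hardly any” is 25% of the four answers in the first table but 20% of all five respondents in the second.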
Now imagine that we are interested in whether men and women in the sample appear to have different degrees of confidence in organized religion. We might want some information on respondents’ sex as well.
Produce a distribution table of the variable “sex,” a measure of the sex of respondents.
tab sex
Identify what percentage of respondents are male.
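The Percent figure for males is the number of male respondents divided by the number of respondents with a non-missing value for sex, times 100. One way to double-check it by hand is with count, shown here on made-up data that assume the coding 1 = male, 2 = female.

```stata
* Made-up data: two men and three women
clear
input sex
1
1
2
2
2
end

* Assumed coding: 1 = male, 2 = female
label define sex_lbl 1 "male" 2 "female"
label values sex sex_lbl

tab sex

* Check the male percentage by hand: male count / non-missing count * 100
quietly count if sex == 1
local n_male = r(N)
quietly count if !missing(sex)
display "Percent male: " 100 * `n_male' / r(N)
```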
Good! Now, we will cross-tabulate the variables.
tab conclerg sex
Excellent! But raw counts are a bit hard to interpret, especially when the groups differ in size. Maybe we want percentage breakdowns within each column. Let’s try that next.
tab conclerg sex, col
Maybe we decide that we’d prefer the percentages calculated within each row instead.
tab conclerg sex, row
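The difference between col and row is which total each cell is divided by. With col, each cell is divided by its column total (here, all men or all women), so each column sums to 100%; with row, each cell is divided by its row total (everyone at that confidence level), so each row sums to 100%. When the question is whether men and women differ, the column percentages are usually the ones to compare. The made-up data below (assumed coding: conclerg 1 = a great deal, 2 = only some, 3 = hardly any; sex 1 = male, 2 = female) keep the arithmetic easy to check by hand.

```stata
* Made-up data: three men and three women
clear
input conclerg sex
1 1
3 1
3 1
1 2
1 2
2 2
end

* Assumed coding for both variables
label define conf_lbl 1 "a great deal" 2 "only some" 3 "hardly any"
label values conclerg conf_lbl
label define sex_lbl 1 "male" 2 "female"
label values sex sex_lbl

* Column percentages: among men, 1 of 3 (33%) say "a great deal"
* and 2 of 3 (67%) say "hardly any"; each column totals 100%
tab conclerg sex, col

* Row percentages: of the 3 people who say "a great deal",
* 1 (33%) is a man and 2 (67%) are women; each row totals 100%
tab conclerg sex, row
```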
Well done! Now let’s interpret. Based on the output you’ve just produced, which of the following do you know to be true?
American women generally have greater confidence in organized religion than men
American men generally have greater confidence in organized religion than women
There is not enough information to make either of the above claims.
Finally, select two variables of your own and repeat the above steps: produce a distribution table for each variable and a crosstab of the two. This time, save your outputs; together, these will complete the first assignment in your portfolio.
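One way to save your outputs, assuming a plain-text log file is acceptable for your portfolio, is Stata’s log command: everything between log using and log close is written to the file. The file name below is a placeholder, so put the log wherever you keep your course files (for example, your L drive folder). The made-up variables var1 and var2 are hypothetical stand-ins for the GSS variables you choose.

```stata
* Close any log left open earlier; capture suppresses the error if none is open
capture log close

* Start a plain-text log; the file name is a placeholder, and replace
* overwrites an older log with the same name
log using "week2_portfolio.log", text replace

* Made-up data stand in for your GSS dataset and chosen variables
clear
input var1 var2
1 1
2 1
2 2
end

* Your distribution tables and crosstab
tab var1
tab var2
tab var1 var2, col

* Stop logging and close the file
log close
```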
Next, flip ahead to pg. 120 in the Longest text. For some variables (like the ones we considered above), frequency distributions offer intuitive, easy-to-read snapshots of a variable. But for other variables (in particular, interval-ratio variables), frequency distributions can be long and hard to read, because nearly every distinct value gets its own row. Consider this: produce a frequency distribution of the variable “age.”
tab age
All the information is there, but it’s very difficult to read or interpret. In this case, it would be more valuable to consider summary measures. The “sum” command (short for summarize) provides several of these. Try this:
sum age
Much easier to interpret! This command reports “obs,” the number of observations (respondents with a non-missing value); “mean,” the arithmetic average of the values; “std. dev.,” the standard deviation of the values; “min,” the lowest value in the set; and “max,” the highest value in the set.
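To see how each summarize column is computed, the made-up ages below are small enough to check by hand. Obs counts only non-missing values, and Std. dev. is the sample standard deviation: the squared deviations from the mean are divided by n - 1 before taking the square root.

```stata
* Made-up ages for six respondents; one age is missing
clear
input age
20
30
40
50
60
.
end

summarize age
* Expected by hand:
*   Obs       = 5   (the missing age is not counted)
*   Mean      = (20 + 30 + 40 + 50 + 60) / 5 = 40
*   Std. dev. = sqrt((400 + 100 + 0 + 100 + 400) / (5 - 1)) = sqrt(250), about 15.81
*   Min       = 20
*   Max       = 60
```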
Try a new variable on your own. I recommend the variable “partners,” which is a measure of how many sexual partners the respondent has had in the past year.
tab partners
sum partners
Interpret all 5 columns of the output and include your description in your journal.
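Before interpreting the output, it may be worth checking how partners is coded. summarize averages the numeric codes stored in the variable, not the labels that tab displays, so if some codes stand for a range of partners rather than an exact count, the mean is an average of codes. tab partners, nolabel shows the codes, which you can compare with the GSS codebook. The sketch below uses made-up data and a hypothetical label set in which code 5 stands for a range.

```stata
* Made-up data for five respondents
clear
input partners
0
1
1
2
5
end

* Hypothetical labels: code 5 stands for a range, not exactly 5 partners
label define partners_lbl 0 "no partners" 1 "1 partner" 2 "2 partners" 5 "5-10 partners"
label values partners partners_lbl

* Labels, as a reader sees them
tab partners
* The numeric codes that summarize actually uses
tab partners, nolabel
* Mean of the codes: (0 + 1 + 1 + 2 + 5) / 5 = 1.8
summarize partners
```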