Week 3 Activity
Descriptive Statistics & Imaging Data
This week, you will use Stata commands to master descriptive statistics, and you will experiment with imaging data.
Here you will find directions for your Week 3 assignment. You’ll be using Stata commands to produce descriptive statistics about specific GSS variables, and you’ll be imaging data for the first time.
In the Longest text, ch. 4 (or “Descriptive Statistics” in other editions), you’ll be enriching your understanding of descriptive statistics, and you will be introduced to how to image data.
First, you need to open your dataset. Use the “cd” command to point Stata to a location on your L drive. Then use the “use” command to open your dataset. (Refer back to the Week 1 Assignment if you need to review.)
From here, we can begin. Last week, you learned how to produce distribution tables for variables of interest, using the “tab” command. As a refresher, let’s tabulate a new variable, with an added challenge: you have to locate the variable!
Identify what percentage of respondents reported a belief that “homosexual sexual relations” are “always” or “almost always” wrong.
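As a reminder, the two steps might look something like the sketch below. The folder path and the file name are placeholders, not the actual course locations, so substitute the ones you set up in Week 1.

```stata
* Hypothetical folder on the L drive; replace with your own path
cd "L:\stata"

* Hypothetical file name; replace with the name of your GSS dataset.
* The clear option drops any data already in memory before loading the file.
use "gss.dta", clear
```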
You may be able to find the answer through trial and error. But you may prefer to use the GSS dataset codebook to locate the variable you’re interested in:
http://gss.norc.org/documents/codebook/gss_codebook.pdf
Can you cross-tabulate that variable by race?
Can you include column percentages?
Now let’s try something new.
Using the guidelines that begin on pg. 108 of the Longest text (towards the very beginning of the “Descriptive Statistics” chapter in other editions), can you identify the number of cases for which there is “missing data” for the variable “homosex”? (Include the command and the output in your log.)
Using the guidelines that begin on pg. 109 of the Longest text (towards the very beginning of the “Descriptive Statistics” chapter in other editions), can you produce an output that sorts categories in order of frequency, rather than the default “logical” ordering?
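If you get stuck, here is a rough sketch of two options that the “tab” command accepts, shown on “race” rather than “homosex.” The Longest text may present these steps differently, so treat this as one possible approach and not the required one.

```stata
* missing: include missing values as their own rows in the table
tab race, missing

* sort: list categories from most frequent to least frequent
tab race, sort
```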
Great work!
But what do you do if you need descriptive information about a variable for only one subgroup of respondents?
Let’s stick with “homosex” and “race” for now. Imagine that you are trying to get descriptive information only from respondents who identified as “white.” First we have to learn a little bit about how that variable is coded, so we’ll pull a frequency distribution.
tab race
You should see a list of 3 race categories and their frequencies. But Stata understands those categories numerically. So we will use the “nol” option (short for “nolabel”) to produce a list of numerical categories.
tab race, nol
Cross-checking the frequencies listed in the two outputs above, you should be able to identify which race category is listed as “1,” which is listed as “2,” and which is listed as “3.”
Are respondents who identified as white listed as 1, 2, or 3?
Great! So let’s imagine that we would like Stata to output a frequency distribution for only respondents who identified as “black.” We need to add an “if” qualifier to our command. In this case, we know that “2” is the numerical category associated with respondents who identified as “black.” So we want to tell Stata to provide descriptive information about attitudes about homosexuality, only including respondents if they identified as “black.”
Try to write the command on your own using p. 111 of the Longest text (also in the “Descriptive Statistics” chapter in other editions). The syntax is included below for clarity.
Great work! It should look something like this:
tab homosex if race==2
Well done. Include the output in your journal.
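One way to check that an “if” qualifier is selecting the subgroup you intended is to count the matching cases first. This is only a sketch, and it assumes the same coding used above (“2” for respondents who identified as “black”).

```stata
* Number of respondents with race coded 2; this should match the
* frequency shown for category 2 in the output of: tab race, nol
count if race == 2

* The same qualifier applied to the frequency table
tab homosex if race == 2
```

The total at the bottom of the table can be smaller than the count, because “tab” leaves out cases with missing values on “homosex” unless you ask for them.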
How would you describe, in simple language, the top row of the output (i.e. the “always wrong” category)? Don’t speculate, just interpret the information.
Great work! Before we move on to imaging data, I want to encourage you to go through the motions of the first half of this week’s activity with several other variables in the GSS. Familiarize yourself with what works, and with where conflicts emerge. Now that you have access to the GSS codebook, you can read more about the other variables. In the coming weeks you’ll be invited to select a few variables of interest, which you’ll work with for the rest of the semester. Begin exploring and considering what might be interesting to you. When you’re ready, continue below.
Can you create a histogram for the variable “homosex”? Use the directions that begin on pg. 112 of the Longest text (“Descriptive Statistics” chapter; syntax included below for clarity) to guide you.
The syntax should look something like this:
histogram homosex
What’s produced by default should be a histogram that displays the variable categories against the density of that category. Using the directions that begin on pg. 112, can you change the graphics of the histogram output to display percentage, rather than density?
Great work! Now can you change the graphics so that the output displays “value labels”?
*For a challenge, use the directions on pg. 116 (pg. 99 in the 2nd edition, titled “A Closer Look: Using Commands to Create Graphs”) to attempt these outputs using commands, instead of the “point and click” directions.*
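If you attempt the challenge, the commands might look something like the sketch below. It is one possible approach, not necessarily the one in the Longest text. The label range 1(1)4 assumes that the “homosex” categories are coded 1 through 4, so check the coding with the “nol” option and adjust the range if your output differs.

```stata
* discrete: give each value its own bar instead of grouping values into bins
* percent: scale the y-axis as percentages instead of density
histogram homosex, discrete percent

* Same graph, with value labels in place of numbers on the x-axis.
* The range 1(1)4 is an assumption about how homosex is coded.
histogram homosex, discrete percent xlabel(1(1)4, valuelabel)
```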
Include your output in your log.
Now make an alteration to the Y-axis properties of your histogram. Be strategic about why you’re making the alteration you chose. Include the output and a brief justification for why you made this alteration in your journal. (Use the directions from pgs. 117-120, or the rest of the “Descriptive Statistics” chapter if you have another edition.)