QUANTITATIVE EXERCISE

EACH of the following questions must be completed in full. You will need to turn in both your dataset and the codebook for your dataset in Excel or Google Sheets format, and submit a Word or Google Doc write-up that includes an appendix with variable sources, definitions (including year), coding, and summary statistics, as well as a table presenting your results. You will also submit your STATA log as a PDF. Points will be deducted for not following the submission instructions accurately, because following instructions is an essential skill to practice. Your quantitative exercise must not be the same as any of the models used in the STATA exercise in class.

From the resources available through the external links on the Course Site, compile a cross-section of data that includes the following variables:

NOTE: Submit your dataset as an Excel spreadsheet with two sheets. Sheet 1 is your dataset, and Sheet 2 is your codebook, which should include data definitions and sources.

***It is important that you collect data on each of your variables for the same year.

U.S. States

State ID

Population Density

Murder rate

Education

Unemployment and/or poverty

Violent Crime Rate (define as murder and rape)

Ideology or partisanship

Code a state-level policy variable (e.g., abortion regulation, gun policy, fracking regulation, state NEPA policies, tax regressivity, etc.)

NOTE: You will have to research state policies and devise a coding scheme for the policy variable that you choose, unless you find one in the literature (see, e.g., other data posted on the SPPQ website, or refer to ICPSR data).
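
As an illustration of what a coding scheme can look like, here is a minimal Python sketch of an ordinal scheme. The category labels, the numeric codes, and the state names are all hypothetical placeholders, not real data or a recommended scheme; your own categories should come from your research or the literature.

```python
# Hypothetical ordinal coding scheme for a state policy variable.
# The labels, codes, and example states below are placeholders, not real data.
POLICY_CODES = {
    "none": 0,      # no statewide regulation
    "moderate": 1,  # some statewide regulation
    "strict": 2,    # comprehensive statewide regulation
}


def code_policy(label: str) -> int:
    """Return the numeric code for a policy label, ignoring case and padding."""
    key = label.strip().lower()
    if key not in POLICY_CODES:
        raise ValueError(f"Unrecognized policy label: {label!r}")
    return POLICY_CODES[key]


# Invented example rows; replace with the labels you assign from your research.
raw_labels = {"State A": "Strict", "State B": "none", "State C": " moderate "}
coded = {state: code_policy(label) for state, label in raw_labels.items()}
print(coded)
```

Whatever scheme you use, record each code and its meaning in your codebook so the coding can be reproduced.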

*Refer to the data resources posted on the Course Site*
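
For the violent crime rate variable listed above, the sketch below shows one way to build a rate from counts. It assumes the rate is expressed per 100,000 residents, which is a common reporting convention but is not specified in this assignment; the function name and the input numbers are hypothetical.

```python
def violent_crime_rate(murders: int, rapes: int, population: int,
                       per: int = 100_000) -> float:
    """Combined murder and rape count per `per` residents (assumed: 100,000)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return (murders + rapes) * per / population


# Invented numbers for illustration only.
print(violent_crime_rate(murders=50, rapes=150, population=1_000_000))
```

Use counts and population figures from the same year, and note the scaling you chose in your codebook.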

Develop a hypothesis that you can test with the dataset that you have compiled, and run a simple, linear (OLS) regression.
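
To make concrete what the regression estimates, here is a minimal Python sketch of the closed-form OLS estimates for a model with one independent variable. It is only an illustration of the arithmetic, not a substitute for running the model in STATA; the function name and the data are hypothetical.

```python
def simple_ols(x, y):
    """Return (intercept, slope) for y = intercept + slope * x by least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented data that lie exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
intercept, slope = simple_ols(x, y)
print(intercept, slope)
```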

NOTE: You do not have to attach or upload your STATA or SPSS output. You should report your results as a table in your final exam document consistent with the format common in quantitative articles (see examples posted on Course Site). Then, use that output to answer the following questions:

State your hypothesis

State the null hypothesis

Is the model significant? How do we know?
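
As a reminder of where the overall test comes from, the F-statistic can be written in terms of R-squared. Here k is the number of independent variables (excluding the constant) and n is the number of observations; the null hypothesis is that all slope coefficients are zero.

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
\qquad \text{with } (k,\; n - k - 1) \text{ degrees of freedom}
```

In STATA's regression output, this statistic and its p-value appear in the header as "F" and "Prob > F".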

Is the independent variable of interest significant? How do we know?
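
The test for a single coefficient divides the estimate by its standard error. The symbols below follow the usual textbook notation and are not defined elsewhere in this handout: the null hypothesis is that the coefficient on variable j is zero.

```latex
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}
\qquad \text{with } n - k - 1 \text{ degrees of freedom}
```

In STATA's coefficient table, these values appear in the "t" and "P>|t|" columns.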

What do the R² and Adjusted R² tell us about this model?
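
The relationship between the two measures can be sketched in a few lines of Python. The function name and the input values are hypothetical; n is the number of observations and k is the number of independent variables, excluding the constant.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Invented example: R-squared of 0.50 with 50 observations and 3 predictors.
print(round(adjusted_r2(0.50, n=50, k=3), 4))
```

Because the penalty grows with k, the adjusted value never exceeds the unadjusted one when at least one predictor is included.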

Calculate the summary/descriptive statistics for each of the variables. What do the mean, median, and mode tell us about the distribution? What does the range tell us about each variable?
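
A minimal Python sketch of these statistics, using only the standard library, is shown below. The values are invented for illustration and the function name is hypothetical; in practice you would compute these in STATA or your spreadsheet.

```python
import statistics


def describe(values):
    """Return the mean, median, mode(s), and range of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every most-frequent value, so ties are not hidden.
        "modes": statistics.multimode(values),
        "range": max(values) - min(values),
    }


# Invented values; the single large value pulls the mean above the median.
summary = describe([2, 4, 4, 5, 10])
print(summary)
```

A mean noticeably above the median, as in this invented example, suggests a right-skewed distribution.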

Don’t forget to check for and report on multicollinearity.
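
One common diagnostic is the variance inflation factor (VIF). The sketch below covers only the special case of two independent variables, where the VIF for each equals 1 / (1 - r²) and r is their Pearson correlation; with more predictors, each VIF instead uses the R-squared from regressing that predictor on all the others. The function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Invented predictor values for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
r = pearson_r(x1, x2)
vif = vif_two_predictors(x1, x2)
print(r, vif)
```

In STATA, running `estat vif` after `regress` reports a VIF for each independent variable.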

Is OLS regression BLUE (the Best Linear Unbiased Estimator) in this case?
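
As a reference point for this question, the Gauss-Markov conditions under which OLS is BLUE can be summarized as follows. The notation is the standard textbook form and is not defined elsewhere in this handout: y is the dependent variable, X is the matrix of independent variables, and ε is the error term.

```latex
\begin{aligned}
&\text{1. Linearity in parameters:} && y = X\beta + \varepsilon \\
&\text{2. No perfect multicollinearity:} && \operatorname{rank}(X) = k + 1 \\
&\text{3. Zero conditional mean:} && \mathbb{E}[\varepsilon \mid X] = 0 \\
&\text{4. Homoskedastic, uncorrelated errors:} && \operatorname{Var}(\varepsilon \mid X) = \sigma^2 I
\end{aligned}
```

Your answer should consider which of these conditions are plausible for your data and which may be violated.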

Offer some analysis of the ways in which your model may be improved.