4 Comments
Dylan's avatar

I redid the analysis using the CDC's six-level urban-rural classification instead of population per unit area, and it removed race as a significant factor. The R-squared of the model also goes up when using the CDC's classification (0.31 -> 0.41).

Zachary Donnini's avatar

Thank you for your insightful comment, although I am unable to replicate your analysis. I am wondering whether you did a linear regression instead of a beta regression, which I am more comfortable with given the output is a proportion that ranges between 0 and 1.
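To illustrate the concern about a proportion outcome, here is a minimal sketch with made-up coefficients (none of these numbers come from either analysis). It assumes the beta regression uses a logit link for the mean, and it only compares the shape of the two mean functions; it does not fit a model.

```python
import math

def linear_prediction(intercept, slope, x):
    # A straight-line mean, as in ordinary linear regression.
    return intercept + slope * x

def logit_link_prediction(intercept, slope, x):
    # Mean under a logit link: the inverse logit of the linear predictor.
    eta = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical predictor values and coefficients, chosen only for illustration.
xs = [-5.0, 0.0, 5.0]
linear = [linear_prediction(0.5, 0.2, x) for x in xs]
logit = [logit_link_prediction(0.0, 0.8, x) for x in xs]

for x, lin, lg in zip(xs, linear, logit):
    print(f"x={x:+.1f}  linear={lin:.3f}  logit-link={lg:.3f}")
```

With these illustrative values the straight line predicts proportions below 0 and above 1 at the extremes, while the logit-link mean stays strictly inside (0, 1).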

I downloaded the CDC six-level classification, used it in place of ln(population/unit area), and redid the analysis as you suggested, but my "pseudo R-squared" declined from 0.2 to 0.14 and race stayed extremely significant (Black% Pr(>|z|)<2e-16).

Dylan's avatar

Sorry for not getting back to this sooner: I re-did the analysis with beta regression (nice catch) and the result didn't change with respect to variable significance. I think this is because I am not using the census dataset (how did you have the patience to clean this??). Instead, I used demographics, income, and population data from [CORGIS](https://corgis-edu.github.io/corgis/csv/county_demographics/). Another discrepancy is that my R-squared values are far higher than the 0.2 or 0.14 you reference in your comment (this doesn't mean that my data is "better"; it just indicates that our datasets clearly differ substantially, since in theory we're doing the same analysis).

An interesting additional result I found is that controlling for whether the state is in the South causes percent-Black to have a non-significant positive correlation, in both the log-density and the CDC models. My model with the highest pseudo R-squared (0.49) is the one with log-density that controls for Southernness, but none of the models, including the ones in your initial analysis, fall outside the range (0.30, 0.50) with my data, and I don't think that splitting hairs on model accuracy will lead to much here.

I can send you my scripts if you'd like, or if you send me census data I can re-run the analysis.

Zachary Donnini's avatar

Sorry for the late reply. Thanks again for the insight in this comment, but I can't replicate this. Even after adding a fixed or random effect for "state", the coefficient for "Black%" is still negative and extremely significant (Black% Pr(>|z|)<2e-16). In the fixed effects model, the pseudo R-squared increases to 0.3412. The negative coefficient and its significance persist whether or not I weight by county population. I'm not sure what's happening in your analysis, and it is hard to diagnose.

Potentially you have some sort of problem in your betareg related to extreme values? Try adding .0001 to every proportion then multiplying by .999 to fix that...
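As a rough sketch of that adjustment (the function name and sample values are hypothetical, and this is not either commenter's actual script): beta regression needs every response strictly between 0 and 1, so exact 0s and 1s have to be nudged inward before fitting.

```python
def squeeze_proportion(p, shift=0.0001, scale=0.999):
    # Apply the adjustment described above: add `shift`, then multiply by `scale`.
    return (p + shift) * scale

# Hypothetical proportions, including the two boundary values that cause trouble.
raw = [0.0, 0.25, 1.0]
adjusted = [squeeze_proportion(p) for p in raw]

for before, after in zip(raw, adjusted):
    print(f"{before:.4f} -> {after:.7f}")
```

Both boundary values end up strictly inside (0, 1), and interior values move only slightly, so the adjustment should not change the substantive results much.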

I sadly can't send the census data since it is a proprietary dataset, but it really shouldn't matter that much. I'm sure the CORGIS data is slightly different than the official gov't numbers but it shouldn't be that far off.
