How US States Vary

Ken Lee just received his Ph.D. in economics from GMU; I was his thesis advisor; his thesis is here. I am impressed enough with Ken’s thesis that I’ll take the next few posts to describe some of his main findings.  The first finding I’ll describe: The main way that US states vary is in their health.

Ken collected 81 features of states, 56 cultural rankings and 25 demographic variables (listed below), and did a factor analysis on them.  A factor analysis finds a few linear combinations of features that can explain the most variance in the whole set of features; the idea is that the variation of all the features could result from variation in just a few behind-the-scenes factors, plus error.
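To make the idea concrete, here is a minimal sketch in Python. It is not Ken's actual procedure or data: it uses a principal-components style extraction on the correlation matrix (one common way to start a factor analysis), and the 50 "states", the six features, the mixing weights, and the function name `principal_factors` are all made up for illustration.

```python
import numpy as np

def principal_factors(data, n_factors):
    """Return (loadings, variance_share) for the top n_factors.

    data has shape (n_observations, n_features). This is a
    principal-components style extraction, used here only as a sketch.
    """
    # Standardize so every feature has mean 0 and variance 1.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)

    # eigh returns eigenvalues in ascending order; take the largest ones.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    top_vals = eigvals[order]
    top_vecs = eigvecs[:, order]

    # Loadings: correlation of each feature with each factor.
    loadings = top_vecs * np.sqrt(top_vals)
    # Share of total variance explained by each factor.
    variance_share = top_vals / eigvals.sum()
    return loadings, variance_share

# Synthetic example (assumed, not real state data): two hidden factors
# drive six observed features, plus noise.
rng = np.random.default_rng(0)
n_states = 50
hidden = rng.normal(size=(n_states, 2))
weights = np.array([[0.9, 0.8, 0.7, 0.0, 0.0, 0.1],
                    [0.0, 0.1, 0.0, 0.9, 0.8, 0.7]])
features = hidden @ weights + 0.3 * rng.normal(size=(n_states, 6))

loadings, variance_share = principal_factors(features, n_factors=2)
print(variance_share)  # fraction of variance explained by each factor
```

A statement like "27% of the variance" corresponds to one entry of `variance_share` in this sketch.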

The biggest factor, explaining 27% of the variance between US states, was health – some states are just healthier than others, and this fact can explain many other things about those states.  Here are the three biggest factors:

  1. (27% of variance): Top five features: “low cancer deaths, low cardiovascular deaths, low smoking rates, low levels of unnecessary medical care, low obesity rates,” Also: “high well-being index, high exercise rates, healthiest, low mortality rates for blacks and whites, higher in education (IQ Rank, Percentage of Graduates, and Smartest), higher in health (Healthiest, Exercise Frequency, and Percentage with No Insurance), and lower in crime rates (Crime Rate and Violent Crime Rate) rankings.” Map: Factor 1
  2. (15% of variance): Top five features: “low occupational death rates, high in women’s rights, high in primary care physicians per capita, high in amount of fruit eaten per capita, low in percentage on poverty.” Also: “low in teen births, high on $ spent on K-12 education, high $ for teacher salaries, smartest … a higher percentage of people in the 25-44 age group, higher income, high college graduation rate, and higher urbanization.” Map: Factor 2
  3. (14% of variance): Top five features: “low rates of infections (HIV, STD), high in IQ, low overall crime rates, high in graduates, low in those having no health insurance.” Also: “low in violent crime, healthiest, low in percentage urban … regular church attendance, a high regard for religion, worse overall state economic health, high manufacturing employment, and high farming output.” Map: Factor 3
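The "top five features" for a factor are presumably the features that load most strongly on it. A minimal sketch of how such a list can be read off a loadings matrix follows; the loading values below are invented for illustration (they are not from Ken's thesis), and `top_features` is a hypothetical helper name.

```python
import numpy as np

# Feature names are taken from the list of state features; the loadings
# are made-up numbers, NOT results from the thesis.
feature_names = ["Cancer deaths", "Smoking percentage", "Obesity Rate",
                 "Teen Birth rate", "Church Attendance"]
loadings = np.array([
    [-0.9,  0.1],
    [-0.8, -0.2],
    [-0.7, -0.3],
    [-0.3, -0.8],
    [ 0.1,  0.2],
])

def top_features(loadings, names, factor, k):
    """Return the k features with the largest absolute loading on a factor."""
    column = loadings[:, factor]
    order = np.argsort(-np.abs(column))[:k]
    return [(names[i], float(column[i])) for i in order]

top_factor_1 = top_features(loadings, feature_names, factor=0, k=3)
print(top_factor_1)
```

A negative loading on "Cancer deaths" would be read as "low cancer deaths" for states that score high on the factor.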

To me, factor 1 seems mainly about health, factor 2 seems about left (~forager) idealism  — fruit, women’s rights, safety rules, helping the poor, and spending lots on docs and teachers — and factor 3 seems about right (~farmer) idealism — rural, religious, low crime, sexual restraint, make real stuff, finish what you start.

The fact that health is the biggest factor says that health is very important, even beyond its direct benefits. And the fact that health and a tendency to spend on docs are largely independent says that medicine isn’t very important for health, and there should be enough variation among states to study just how important it is.

Here are those 81 state features:

IQ Rank, Smartest, Obesity Rate, Exercise Rate, Church Attendance, Importance of Religion in Daily Life, Percentage Going Hungry, Freedom Index, Tax Burden, Moocher Index, Coincident Index, Pro-Business Index, Gini Index, Farming as a percentage of State GDP, Farming Productivity, Happiness Index, Well-Being Index, Generosity Index, Manufacturing Employment, Manufacturing Output as a percent of State GDP, Teacher Pay Levels, Education $ Spent per Pupil, Percentage 9th Graders Graduating High School, Womens’ Status ranking, Crime Rate – overall, Violent Crime Rate, Speeding – traffic deaths due to speeding, Traffic Deaths – overall, Gasoline Usage per capita, UFO Sightings, Starbucks per capita, Wal-Mart stores per capita, Pollution levels, Cancer deaths per capita, Coronary heart disease per capita, Cardiovascular deaths per capita, Percentage of children under 18 in poverty, Fruit portions eaten per day, Outcome Disparity within state, Percentage reporting Poor Health, Infectious disease rate, Percentage with No Health Insurance, Unnecessary hospital visits per capita, Primary Care Physicians per capita, Public Health $ per capita, Mortality rate, Autism per capita, Teen Birth rate, White Mortality rate, Black Mortality rate, Occupational Death rate, Years of Potential Life Lost (YPLL), Healthiest, Binge Drinking rate, Smoking percentage, Under-employed percentage, Latitude, Longitude, Urban percentage, Census Region, Census Division, Population Density, Square Miles, Unemployment rate, Poverty Percentage, Income per capita, Female percentage, White percentage, Black percentage, Percentage 0-17 years, Percentage 18-24 years, Percentage 25-44 years, Percentage 45-65 years, Percentage 65+ years, High School Graduation rate, College Graduation rate, Alcohol Use per capita, Smoking Rate per capita, Births per capita, Men Registered to Vote, Women Registered to Vote.

  • Aron

    “The fact that health is the biggest factor says that health is very important, even beyond its direct benefit”

    Huh? No. It means that you [he] picked a lot of features that are easily clustered as health-related and thus correlate to each other. Rather than, say, baseball stats and mineral deposits. Not surprising in a paper titled: “Essays in Health Economics: Empirical Studies on Determinants of Health”

  • I am half with Aron. I don’t think the list of factors is too biased towards health. But I also don’t think it’s surprising that health issues cluster; it’s nice to confirm the correlation, but it’s not surprising. More interesting is the claimed non-correlation between health and health care.

  • Matthew Fuller

    One would probably have to read the paper…

  • John

    So if I’m understanding correctly, the features you list for each factor are the features that are most strongly correlated with that factor. Assuming the factor is real, they would be the features that depend the most on it. You / Ken then assigned a name to that factor based on what those features had in common.

    I’m asking because your explanation of factor analysis isn’t exactly the same as the example at the Wikipedia page on it that you link to. Since in the analysis stage it’s all just math, it’s the same to say that some features drive variance or that an invisible factor drives variance and certain features are most strongly correlated with that factor. It’s a direction to attack the problem for the two, though.

  • There’s also no correlation between spending and educational outcomes: Washington, D.C., which spends the third most per student, has the worst outcome, while the state that spends the least, Utah, has among the best educational outcomes.

  • jsalvatier

    “(~forager) idealism – fruit, women’s rights, safety rules, helping the poor, and spending lots on docs and teachers — and factor 3 seems about right (~farmer) idealism — rural, religious, low crime, sexual restraint, make real stuff, finish what you start.”

    Keep in mind that this doesn’t work well with the story you’ve been telling since you’ve been framing it as Farmer vs. Forager. For these two things to be separate factors in a factor analysis, they have to be (relatively) independently varying. If you want to interpret the results this way, you should change your story to allow for societies to be high on both kinds of idealism at the same time.

  • “factor 2 seems about left (~forager) idealism – fruit, women’s rights, safety rules, helping the poor, and spending lots on docs and teachers — and factor 3 seems about right (~farmer) idealism — rural, religious, low crime, sexual restraint, make real stuff, finish what you start.”

    Factor 2 seems to me to be better summed up as “affluence.”

    Factor 3 is really particular to “Yankee” farmers. The South is very rural as well but has a very different kind of rural culture even though it has some of the same values.

    The set of variables also seems heavy on ranked outcomes relative to unranked cultural differences that could explain the differences (e.g. predominant and second most common religious affiliation, religious diversity v. homogeneity, dialect area, hierarchy indicators, political affiliation). Similarly, it would also be interesting to see how these factors compare to some of Putnam’s social capital indexes and some of Florida’s creative class indexes.

  • I see “White percentage, Black percentage,” but I don’t see any reference to Hispanic, Latino, Ethnicity or the like. You do realize there are more Latinos in the U.S. than African-Americans?

  • As Daniel Patrick Moynihan used to say, the easiest way to improve your state’s ranking is to tow it up close to the Canadian border.

  • Tom

    I always like where Mississippi ends up on these studies: the worst, last, or lowest of whatever positive attribute is being measured.

  • billswift

    When you put up something like those maps, how about doing it so that the key isn’t unreadably small?

  • JenniferRM

    If I was going to name the given factors they would be (1) HA = health awareness, (2) EFF = expensive family formation, and (3) C = conscientiousness.

    HA is the least interesting to me from a modeling perspective because it appears to be significantly “cultural” in a kind of geographically arbitrary way. However, from the perspective of “changing your mind and behaviors to get a better outcome” it seems like health awareness is the place to focus.

    EFF makes sense in terms of being liberal/educated/urbanized and geographically it appears to be happening in areas where the cities are crammed together or pushed against a border, a great lake, or an ocean. If “housing costs” were taken into account I’d expect it to show up as a factor because as housing costs go up, family formation is more expensive, new humans are harder to make, more attention is paid to investing in the relatively less numerous kids, and you need a paying job in order to afford to stay there during retirement.

    C looks like a difference between “ice people” and “sun people” to me. The issues that contribute to the “conscientiousness” label are church, crime, and school completion. I wouldn’t be surprised if the geographic distributions have a lot to do with seasonal affective disorder and snow (neither of which is available to contribute to the factor, but I’d predict that they would be part of it if they were available; another good factor would be per capita hours of air conditioning).

    Louisiana and Mississippi are cheap places with sun people. California is expensive with sun people. Montana is cheap with ice people. The only combination that doesn’t exist is top quartile in both expense and coldness, but Wisconsin and Michigan are examples that are close to that combination.

    Interpreting EFF and C as opposite ends of the same “farmer-forager” axis seems sloppy to me. States like Montana and California fit the single axis model with their opposite extremes, but the states that are high or low in both EFF and C (like Michigan or Mississippi) give the lie to the single axis model.