Which states should have the first four primaries? The case for picking the weirdest ones

This week, Democratic National Committee chair Tom Perez declared his support for changing the order of presidential primaries and caucuses so that Iowa and New Hampshire are toppled from their respective ‘first in the nation’ positions.

Perez argues that more diverse states should be prioritised in selecting the Democrats’ future presidential candidates. This is presumably driven partly by Joe Biden’s dismal performance in Iowa and New Hampshire, both roughly 90% white non-Hispanic, before he went on to win elsewhere with a diverse electoral coalition.

The debate over which states should go first has led a lot of people to argue that the most ‘representative’ states should be prioritised – that is, those states whose demographics match the nation as a whole. NPR created its own ‘perfect state index’, to measure this similarity, with Illinois coming out as the most representative of the country. But is the most representative state the best to go first?

In reality, US social geography means there are very few ‘normal’ states. For example, while Illinois’s racial makeup differs from that of the US as a whole by only 3.9%, the next closest state is Connecticut at 12.8%. The majority of states are over 37% different. To win a primary contest or the eventual general election, a candidate does not have to win a ‘representative’ state, but a mixture of very unrepresentative states. If the first four primary states were representative of the country, they might each be 63% white non-Hispanic, which is actually a very poor test of whether a candidate has strength with both white voters and voters of colour.

The same is true for many demographic features: wealth, education, religiosity and so on. America is not a country of diverse states, but a diverse country of many fairly homogeneous states and some very diverse states.

Leading the primary process with ‘representative’ states also waters down the voting strength of groups which reform is supposed to amplify. While candidates are currently expected to reach out to African American voters in South Carolina and Hispanic voters in Nevada, if the first states were representative of the country, these groups would not form a critical majority in any of the first primaries.

The solution is to choose four unusual (‘weird’?) states to kick off the primary process, ones which will each have a unique test for presidential candidates.

To find which states fit these criteria, I used Principal Components Analysis (PCA). PCA works by converting multiple correlated variables into a number of ‘components’, each of which is orthogonal (at a right angle) to the others. The first component is the single direction that explains the most variation in the data, summarising the correlations between the variables. Each further component explains the largest proportion of the variation that remains, until all variation has been explained.
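As a rough illustration of the procedure just described, here is a minimal PCA sketch in Python using NumPy. It is not the code or data behind this analysis: the array `X` holds made-up numbers for five hypothetical states and three hypothetical variables, and the function name `pca` is my own. It assumes each variable is standardised first, which is a common choice when variables are on different scales, such as percentages and incomes.

```python
import numpy as np

# Made-up illustrative numbers, NOT the article's data: each row is a
# hypothetical state, each column a hypothetical demographic variable.
X = np.array([
    [37.0,  3.0, 45.0],
    [ 2.0, 48.0, 50.0],
    [ 1.0,  2.0, 68.0],
    [30.0, 10.0, 85.0],
    [ 5.0, 12.0, 60.0],
])

def pca(X):
    """Return (scores, components, explained) for the rows of X."""
    # Standardise each column to mean 0 and standard deviation 1 so that
    # no variable dominates purely because of its units (an assumption).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Singular value decomposition: the rows of Vt are the orthogonal
    # component directions, ordered from most to least variation explained.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                      # each state's position on each component
    explained = S**2 / np.sum(S**2)     # share of total variation per component
    return scores, Vt, explained

scores, components, explained = pca(X)
print("share of variation per component:", np.round(explained, 3))
print("cumulative share:", np.round(np.cumsum(explained), 3))
```

The rows of `components` show how much each original variable contributes to each component, and `scores` gives each state’s coordinates in the new space. Keeping every component only rotates the standardised data, so distances between states are unchanged; dropping the later components gives a lower-dimensional approximation.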

PCA is useful for this analysis because it lets us place the states in a multi-dimensional ‘space’, based on the components which best explain the variation in the data. This way, we can see the distance between states and pluck out some of the extremes. My PCA is based on each state’s percentage African American, percentage Hispanic/Latino, percentage ‘Other’ race (which implicitly means the percentage white non-Hispanic is also being considered), percentage with a bachelor’s degree or higher, median income, urbanisation, and percentage who attend church.

For illustration, the first two principal components (explaining approximately 63% of variation in the data) are plotted below, along with the effect each variable has on a state’s position on the two axes.

This diagram looks like a bit of a mess at first glance, but to try and simplify: as a state’s value on the x axis (principal component 1) increases, median income, urbanisation, and the percentages with a degree, Hispanic or of other race decrease, while the percentages who are African American or who attend church increase. On the y axis (PC2), a higher value means lower urbanisation, a lower percentage African American and lower church attendance, and a higher percentage of other race. This way, demographically similar states are grouped together, e.g. the Deep South in the bottom right, or Maryland, New York and New Jersey in the bottom left.

These are just the first two dimensions, but each state has a position vector across 9 principal components, which together explain 100% of the variation in the data.

There is a strong case that the first four states should be small, so there is not an inbuilt advantage for candidates with more money or access to expensive media markets. This also helps prevent an insurmountable number of delegates being chosen before most states have voted. For these reasons, I excluded states with a population greater than 6.6 million. I also (possibly unfairly) excluded Alaska and Hawaii, so that the first four primaries would all happen in the contiguous US.

To find the best first four states, I looped through all combinations of four states and, for each combination, summed the Euclidean distance (in principal-component space) between every pair of states. For example, the first combination is Alabama, Arkansas, Colorado and Connecticut. Adding the distances between its six pairs (AL-AR, AL-CO, AL-CT, AR-CO, AR-CT and CO-CT) gives a summed distance of 26.1. The four states with the greatest distance are…

*drumroll*

Maryland, Mississippi, New Mexico and Vermont

These states represent the extremes of states’ demographics. Mississippi, for example, is the state with the highest African American share of the population. New Mexico has the highest Hispanic share, Vermont is the least religious state, and Maryland has the highest median income.

Beneath these headline statistics, too, they represent many of the different types of American life. For example, both Maryland and Mississippi have significant black populations, but in Maryland black people are highly urbanised, while in Mississippi they are predominantly rural. Both Maryland and Vermont are highly educated and high income, but while Maryland is diverse and urbanised, Vermont is very white and rural. Both Mississippi and New Mexico have low median incomes, but New Mexico is less religious and more educated.

Each state poses a completely different electoral challenge to primary candidates and as a group they include the largest possible proportion of the US’s diversity. If a candidate can survive primaries in New Mexico, Vermont, Mississippi and Maryland, they can win anywhere.

For reference, here are some of the most divergent and most homogeneous combinations:

| Rank | States | Distance |
|---|---|---|
| 1 | MD, MS, NM, VT | 38.47181 |
| 2 | MD, MS, NH, NM | 37.94593 |
| 3 | MD, ME, MS, NM | 37.59456 |
| 4 | MD, MS, NM, WV | 37.15109 |
| 5 | MD, MS, MT, NM | 36.80862 |
| 7,843 | IA, NH, NV, SC (current) | 24.32512 |
| 31,461 | IA, KS, ND, NE | 7.472435 |
| 31,462 | IA, KS, ND, NE | 7.414509 |
| 31,463 | IA, ID, ND, NE | 7.375798 |
| 31,464 | IA, ID, KS, NE | 6.943583 |
| 31,465 | IA, KS, MO, NE | 6.919990 |
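For anyone who wants to try something similar, the exhaustive search behind this ranking can be sketched in a few lines of Python. This is a minimal sketch rather than the code I ran: the function names `spread` and `rank_groups` and the two-dimensional coordinates for the hypothetical states A to E are made up for illustration, and in the real analysis each state would have its full vector of principal-component scores. The last line checks that the bottom rank in the ranking, 31,465, is the number of ways to choose four states from 31, which suggests 31 states were left after the exclusions.

```python
from itertools import combinations
from math import comb, dist

def spread(coords, group):
    """Sum of Euclidean distances between every pair of states in the group."""
    return sum(dist(coords[a], coords[b]) for a, b in combinations(group, 2))

def rank_groups(coords, k=4):
    """Score every k-state combination and sort from most to least spread out."""
    groups = combinations(sorted(coords), k)
    return sorted(((spread(coords, g), g) for g in groups), reverse=True)

# Made-up coordinates for five hypothetical states (NOT real component scores).
coords = {
    "A": (0.0, 0.0),
    "B": (3.0, 0.0),
    "C": (0.0, 4.0),
    "D": (3.0, 4.0),
    "E": (1.5, 2.0),
}

ranking = rank_groups(coords, k=4)
best_score, best_group = ranking[0]
print(best_group, best_score)   # the four corner states A, B, C, D score 24.0

# Each group of four contains comb(4, 2) = 6 pairs, and choosing four states
# from 31 gives 31,465 combinations.
print(comb(4, 2), comb(31, 4))
```

A brute-force loop is perfectly adequate at this scale. With many more candidate states, or larger groups, the number of combinations grows quickly and a smarter search would be needed.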

