Political operatives have long defined their target audience by one word: the universe. In a campaign, the “universe” is defined as the list of persuadables you need to convince, or the partisans you need to turn out. You then cut your list so you can mail, call, or (increasingly) serve cookie-targeted online ads to the universe.
The decision about whether a voter belongs in the universe is typically binary. You’re either in or you’re out. Little distinction is made between members of the universe.
A typical example of a universe is the “4 of 4” voter. This is the person who has voted in each of the last four major elections, and is virtually guaranteed to vote the next time.
While “4 of 4” is typically a reliable rubric for predicting voting behavior, the ability to use predictive analytics allows us to go deeper. To use a somewhat morbid example, a very elderly “4 of 4” voter is not, in fact, as likely to vote in the next election as someone with a perfect voting history just entering retirement. Certain groups of voters may statistically be more likely to move to the next state before the next election, or be disillusioned and drop out. Basic political data, enhanced with these variables, allows us to assign to each voter their own personal probability of voting, between 0 and 1.
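As a sketch of that idea, here is a toy logistic model that turns a voter’s history and demographics into a personal turnout probability. The features, weights, and cutoffs below are entirely invented for illustration; a real model would be fit to historical voter-file data.

```python
import math

# Hypothetical per-voter features; names and weights are illustrative,
# not taken from any real campaign model.
def turnout_probability(vote_history_rate, age, years_at_address):
    # Simple logistic model: a perfect voting history pushes the score up,
    # very advanced age and short residential tenure pull it down.
    z = (3.0 * vote_history_rate              # "4 of 4" => 1.0
         - 0.04 * max(age - 75, 0)            # the morbid attrition factor
         + 0.15 * min(years_at_address, 10) / 10  # stability proxy
         - 1.0)                               # intercept
    return 1 / (1 + math.exp(-z))

# A "4 of 4" voter just entering retirement vs. a "4 of 4" 95-year-old:
p_young_retiree = turnout_probability(1.0, 66, 20)
p_very_elderly = turnout_probability(1.0, 95, 20)
```

Both voters have identical “4 of 4” histories, but the model separates them: the younger retiree scores higher, which is exactly the distinction the binary universe throws away.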
We typically associate the use of probability and statistics with the Ph.D.s residing inside the Obama “Cave,” but analytics is being democratized more than you might think. VAN, the Democratic Party’s leading database software, expresses turnout and support scores as statistical probabilities. Volunteers can run a query of the voters in their precincts who have a 73% chance or more of supporting the President. Republicans have been more likely to hide these scores from users of their tools, using a handful of descriptive categories, like “Hard R,” “Soft D,” or the ubiquitous “4 of 4.”
This subtle difference in how the two parties talk about data is perhaps the most telling contrast, operationally speaking, between Democrats and Republicans.
Fitting voters into neat categories and packaging them up into a “universe” is done in the name of simplicity, but it actually ends up confusing things. A system whereby each voter is assigned a score on a number of variables—everything from turnout, to support, to the likelihood of changing their mind if contacted, to the walkability of their house by canvassers—allows a campaign to create a simple, mathematically-driven decision-making framework to guide all voter contact. You simply sort voters (or precincts) by the relevant attribute, and work your way down. A system whereby the electorate is chunked into discrete universes, with each consultant offering their own pseudoscience as to how “persuadables” or “high turnout independents” need to be treated, is infinitely more complex and error-prone.
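The “sort by the relevant attribute and work your way down” framework can be sketched in a few lines. The voter records and scores here are hypothetical model outputs, not real data.

```python
# Toy voter records; each score is assumed to be a model output in [0, 1].
voters = [
    {"name": "A", "turnout": 0.95, "persuasion": 0.10},
    {"name": "B", "turnout": 0.40, "persuasion": 0.80},
    {"name": "C", "turnout": 0.70, "persuasion": 0.55},
]

# No discrete "universe": for a persuasion program, just sort by the
# persuasion score and work down the list until the budget runs out.
persuasion_targets = sorted(voters, key=lambda v: v["persuasion"], reverse=True)
contact_order = [v["name"] for v in persuasion_targets]
```

The same three records serve every program: re-sort by `turnout` for a GOTV list, by walkability for a canvass. One dataset, one rule, no competing pseudoscientific universes.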
At RootsCamp, the team from BlueLabs openly talked about putting this system into action for the Terry McAuliffe campaign in Virginia. Precincts with a 0.2 “walkability” score or above, as defined by the concentration of persuadable voters per square mile, were placed on the priority list for canvassing. Modeling along more than a dozen different attributes, with continual fine-tuning throughout the election cycle, was done for less than 1% of the cost of the campaign.
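A rough sketch of the walkability idea as described: persuadable voters per square mile, rescaled so the densest precinct scores 1.0, with the 0.2 cutoff applied. The precinct numbers and the rescaling choice are my assumptions for illustration; BlueLabs’ exact formula isn’t given here.

```python
# Invented precinct data: persuadable-voter counts and land area.
precincts = {
    "P-101": {"persuadables": 800, "sq_miles": 2.0},
    "P-102": {"persuadables": 150, "sq_miles": 5.0},
    "P-103": {"persuadables": 600, "sq_miles": 1.5},
}

# Concentration of persuadable voters per square mile, rescaled to [0, 1].
density = {p: d["persuadables"] / d["sq_miles"] for p, d in precincts.items()}
max_density = max(density.values())
walkability = {p: d / max_density for p, d in density.items()}

# Priority list: everything at or above the 0.2 threshold from the text.
canvass_priority = sorted(
    (p for p, w in walkability.items() if w >= 0.2),
    key=walkability.get, reverse=True,
)
```

The sprawling rural precinct drops off the canvass list, while the two dense ones rise to the top, without anyone hand-picking “good doors.”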
The ability to assign a probability to everything within a campaign also allows you to detect errors and improve accuracy over time. A huge problem with data in politics is that it’s too often taken at face value. People receive a voter file, or a turnout model, or a consumer file from a vendor, or a cookie match, and assume it is of high enough quality to guide action. Being data-driven actually means asking questions of the data, and exhaustively validating it—which can easily be done as a byproduct of existing activities. You may find that killer consumer variable you just got has no impact at all on those voters’ engagement with your candidate. Or you may find your “persuasion universe” is actually not that persuadable at all compared to other groups, exposing what could be a fundamental flaw in the underlying data.
Often, people get excited about being “data-driven” but only go part way. If you’re asking for a “data-driven” ad buy targeting women 35 to 49, how do you know women 35 to 49 are the right target? Did you test it? The reason you collect data is to optimize based on probability. Instead, try placing an ad designed to reach individuals with a score of 70 or more on your persuadability model. The targeting itself also needs to be done probabilistically.
The culture shift needed in politics is not one of technology. Everyone loves technology and wants more of it, because it lets you do whatever you’re doing more efficiently. The problem is that what you’re doing could be the wrong thing. Applied the wrong way, technology helps you run very fast in the wrong direction.
What we need before better technology is a new way of thinking. We need to think probabilistically. This sounds like common sense, but it’s shocking how little this is understood. There has been much talk lately of making decisions based on data and not on “gut,” but little understanding of what this actually means.
First, it requires us to understand that while “universes” will continue to exist in politics (whether they’re defined by voter registration, political boundaries, or even modeled data), the boundaries of a universe can be fuzzy, and voters who belong in it one day can leave it the next. The electorate is not a binary set of 1s and 0s, but can be represented on a spectrum of ever-changing support, turnout, and persuasion scores between 0 and 1.
Second, probability is multidimensional and can be applied to almost anything a campaign does. We can apply probability to the voter to guess how he or she will vote. We can also apply probability to the intervention we propose for that voter (phone call? mailer? digital ad?) to gauge how likely that individual is to change their mind. Combine the two and you get simulation, the ability to predict which sets of strategies and tactics will move the most votes in your direction.
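Combining the two probabilities reduces to a simple expected-value calculation: for each voter and tactic, expected votes moved equals P(voter turns out) times P(tactic changes their mind). The turnout and uplift numbers below are invented for illustration, not real persuasion estimates.

```python
# Each voter: a turnout probability plus an assumed per-tactic chance
# of changing their mind if contacted. All numbers are hypothetical.
voters = [
    {"turnout": 0.9, "uplift": {"call": 0.02, "mail": 0.01, "digital": 0.015}},
    {"turnout": 0.5, "uplift": {"call": 0.06, "mail": 0.02, "digital": 0.05}},
    {"turnout": 0.3, "uplift": {"call": 0.01, "mail": 0.005, "digital": 0.04}},
]

def expected_votes_moved(tactic):
    # Expected payoff of running one tactic across the whole list.
    return sum(v["turnout"] * v["uplift"][tactic] for v in voters)

# "Simulation" in miniature: rank tactics by expected votes moved.
best = max(["call", "mail", "digital"], key=expected_votes_moved)
```

In a real campaign you would run this across millions of voters and mixed tactic budgets, but the decision rule stays this simple: pick the plan with the highest expected votes moved.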
Third: ABV. Always Be Validating. Simulation offers a great opportunity to judge your work based on real world outcomes (increasing support scores) prior to Election Day. But invariably, the outcome you expect will differ from what happens in the real world. This is when you tweak your model, getting it closer and closer to actual performance. And just because something worked before, that doesn’t mean it will work again. Novelty effects mean the effect of a given intervention can wear off the more times it’s tried, meaning you must constantly re-test.
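One standard yardstick for this kind of validation (my choice, not something the text names) is the Brier score: the mean squared error between predicted probabilities and what actually happened, where lower is better. A minimal sketch with invented numbers:

```python
# Model's predicted turnout probabilities vs. observed outcomes
# (1 = voted, 0 = didn't). Both lists are invented for illustration.
predicted = [0.9, 0.8, 0.3, 0.6, 0.1]
actual = [1, 1, 0, 0, 0]

# Brier score: mean squared error of the probabilities; 0 is perfect.
brier = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
```

Recompute this after every wave of contact or retraining: if the new model’s score on held-out voters drops, the tweak helped; if not, keep validating.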
Probability is how the house wins in Vegas and a big part of how our own finely tuned mental computers judge risk and reward. But it is all too foreign in the bombastic, hype-driven world of politics. Targeting based on arbitrary categories, even if derived from fancy consumer data (e.g. “Walmart moms!”) can win you points for being “data driven”—while completely missing the point.
Right now, many are focused on building better toolsets and increasing the quantity of data available to candidates. This is good, but it isn’t enough. First, let’s try changing the way we make decisions, to a simpler and more elegant system guided by probability.
If you think that the probability that I’m on to something is .7986 (or thereabouts), please tweet this post, and share it on Facebook and Google+.