Tag Archives: Big Data

Applying "Inside the Cave" Beyond Campaigns
Products

After reading our report on the Obama campaign’s digital and analytics operations, Inside the Cave, you may have thought to yourself, “That’s fascinating, but how can all this help my organization?”

To answer that question for non-profits, I teamed up with Obama for America’s Michael Slaby to produce “From Politics to Public Policy: How Campaign Lessons Can Amplify Your Work.” The paper was sponsored by the Joyce Foundation, a leading foundation based in Chicago.

We’re quickly seeing the differences in digital methods employed by campaigns, corporations, and non-profits disappear as everyone zeroes in on the same question: “What can I do to make sure the right message hits the right people, and only the right people, at the right time?” With more tools than ever at our disposal, the answer to this question might seem simple. But, as we saw in 2012, tools cannot reach their potential without insight and rigorous testing, even when that means putting sacrosanct gut feelings to the test. You can read the full paper here or read summaries in the Stanford Social Innovation Review.

As always, feel free to get in touch if you have questions. You can reach me at patrick (at) engagedc (dot) com.

Going Inside the Cave
Analytics, Fundraising, Social

In the immediate aftermath of the 2012 presidential campaign – like any campaign – two things happened: the winners went to bragging and the losers started pointing fingers. One thing became clear. Obama for America’s digital, technology, and analytics teams were indispensable in securing the president’s reelection.

OFA was, far and away, the most sophisticated political organization on the planet. And Republicans needed to learn from them. So we set about gathering insights, data, and anecdotes from hundreds of news articles, blog posts, interviews, podcasts, and presentations. Our findings have been collected and organized into a single slide deck called Inside the Cave, which you can download here.

The Cave is what OFA called the windowless room that housed their analytics team. Like digital in 2008, analytics came of age in the 2012 campaign. OFA’s analytics team had 50 staffers. By comparison, the Romney-Ryan campaign had a data team of 4 people.

Veterans of OFA have been surprisingly forthcoming in providing details on how they leveraged the latest in technology and digital strategy to make their campaign as effective and efficient as possible.

In 2016, Republicans can’t afford to fight the battles of 2012. We have to look forward to the future and start preparing now.

Inside the Cave is required studying for anyone involved in electoral politics. Download it now.

In Defense of Skewed Data
Analytics, Social

With #skewed polling in the news this campaign season, I stand as a lonely voice for noisy, biased, self-reported, and yes, skewed data in the Presidential race.

This piece is not about the purported bias in public polling, though I could go on and on about the shoddy reporting and analysis of polls. It’s about all the people who are getting into the polling game (Engage included) by using social media and Internet data to try to get a fix on what’s going on in real time. This post is a field guide to these types of efforts, explaining where they’re useful, and how they do (and don’t) beat the polls that captivate the political class.

Why Scientific Polls Aren’t Enough

Let’s ask the first-order question here: Why?

Public opinion polls seem to be pretty good at forecasting the winners of elections, so why reinvent the wheel with newfangled metrics like tweets-per-minute or Facebook’s “people talking about this” number that aren’t scientific and whose subjects tend to be overly partisan and biased? Why study the Internet to figure out how public opinion is changing minute-by-minute?

On this, I still think George Mallory’s answer when asked why he wanted to climb Mount Everest serves as a good guide: Because it’s there.

After more than a decade doing online activation, I can testify to the fact that there’s just about nothing users like better than answering polls. Millions of people answer online surveys every day, but the public polls released at the height of the election season reflect interviews with only a few thousand respondents per night.

In this age of abundant data, why is it getting harder and harder for pollsters to collect useful data on the electorate? Response rates to telephone surveys continue to plummet, and pollsters must recalibrate their methodologies to include cell phone-only households. You would think the Internet could pick up the slack here, but curators like Talking Points Memo won’t include online-only polls like YouGov in their averages. Technology hasn’t translated into a quantum leap forward in the volume of responses and quality of polling data.

As puzzling as I find anti-Internet bias, I have to concede there are some valid concerns here. To get a perfectly unbiased sample, you have to harass people over the phone because virtually all methodologies on the Internet are opt-in, and people who opt in to things are different from those who don’t. By definition, polls are about finding people who don’t already want to take them. Finding the rare person willing to sit through an interview and then balancing their responses is an expensive proposition. According to Chuck Todd, to do it right, NBC and the Wall Street Journal shell out between $40,000 and $60,000 per poll.


The end product of these polls (which come in more and less-expensive varieties) is between 500 and 3,000 interviews that, on average, reflect public opinion as of a few days ago. After Mitt Romney’s crushing win in the first debate, we did not know that he had moved slightly ahead in the race until more than a week later. This is partly due to how polling shifts play out over several news cycles, but also because of the delay in reporting poll samples. Gallup, for instance, uses a 7-day rolling sample, which means that the median interview took place four days before the poll was released. So, polls can be accurate as of 2 to 4 days ago, but not accurate as of now.
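The lag is simple arithmetic, and a rough sketch makes it concrete. The function below is an illustration, not any pollster's published method: it assumes the same number of interviews each day and a release one day after the final interviewing day.

```python
from statistics import median


def median_interview_age(window_days, release_lag_days=1):
    """Median age in days, at release time, of interviews in a rolling sample.

    Assumptions (illustrative only): interviews are spread evenly across the
    window, and the poll is released release_lag_days after the last
    interviewing day.
    """
    # Age of each day's batch of interviews on release day, newest first.
    ages = [release_lag_days + day for day in range(window_days)]
    return median(ages)


if __name__ == "__main__":
    for window in (3, 7):
        print(window, "day window -> median interview age:",
              median_interview_age(window), "days")
```

Under those assumptions a 7-day window gives a median interview age of four days, and a 3-day window gives two, which is where the "2 to 4 days ago" range comes from.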

It’s also difficult (nigh impossible if the group is small enough) to reliably measure polling movement among different subgroups in the electorate systematically over time. The smaller the group gets, the more it’s anyone’s guess as to what the real numbers are.

It was easy for folks to get excited about the fact that Romney recently moved ahead of Obama among Jewish voters in the IBD/TIPP tracking poll by 44-40 — but the subsample of Jewish voters surveyed was no more than 25. In the most recent version of this poll, Obama leads among the same group by 78-22. Maybe there was actual movement, but more likely it was just the tiny sample size.
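To see why a 25-person subsample can swing so wildly, here is a back-of-the-envelope margin-of-error calculation. It uses the standard textbook formula for a simple random sample at 95 percent confidence. Real tracking polls weight their samples, which typically makes the true error larger, so treat the result as a rough lower bound rather than the poll's actual error.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p in a sample of n.

    Assumes a simple random sample; p=0.5 is the worst case, and z=1.96 is
    the usual 95% normal critical value.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (25, 500, 3000):
        print(f"n = {n:>4}: +/- {100 * margin_of_error(n):.1f} points")
```

For n = 25 this works out to roughly plus or minus 20 points on each candidate's share, so a swing from 44-40 to 22-78 is within reach of sampling noise alone.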

Yes, groups often pay more to poll specific demographics, but only a few times per election cycle. We are nowhere close, for instance, to having a RealClearPolitics average of, say, married women in Ohio, a measure that would be relevant to how campaigns actually spend money. And unless something drastically changes with how traditional polling is done, we will never have this. Ever.

The reason is that providing a balanced, unbiased sample is expensive. But what if you didn’t have to balance the sample?

This is where mining relevant streams of Internet data can help.

Social Media is the World’s Biggest Data Platform

We think of social media as the world’s biggest conversational platform. But it’s no slouch in the data department either.

Facebook users generate around 684,478 pieces of content per minute. Twitter users tweet 200 million times daily. This doesn’t even count the countless petabytes (exabytes? yottabytes?) of user account data on millions of websites, tied to demographics.

The sum total of these interactions speaks volumes about each of us as a person. Not every variable will be public about every person, but our tendency to interact socially online, the language we use, and what we post can reveal a great deal about our personalities, our values, and our political beliefs. And this is just from what we post publicly.

Pollsters and most journalists have shied away from analyzing this data for a few reasons. First, obviously, is privacy. Second, we still lack the processing power and analytical capability to usefully make sense of these large data sets. And third, the easy, topline queries are often misleading, reflecting certain skews in online phenomena, in the sites themselves, and in things that are fundamentally hard to control for, like media attention or virality. Noting that Obama leads by 3-to-1 on Facebook is not terribly interesting, because it could reflect his incumbent status, his global popularity, his four-year head start, or his cult-like status in 2008.

Nonetheless, if you ask the right questions, you can get at certain answers faster and with more granular data than a traditional poll.

To Get the Data, Embrace the Skew

During the Vice Presidential debate, Xbox Live polled viewers in real time on who they thought won. The answers may have been disheartening for the Romney-Ryan ticket: undecided voters on the platform thought Joe Biden won the debate by a 44 to 23 percent margin.

But the sample was skewed: Xbox viewers as a whole were voting for Obama over Romney by a 52-36 percent margin, while public polls show a tie. Gamers are typically younger, so even the undecided Xbox voters might be left-leaning. Data from our Trendsetter app, which measures the political affinities of page likers on Facebook, is consistent with these results, showing a roughly 60-40 pro-Obama Xbox skew.

Before we use this skew to summarily discard the results, consider this: each question got 30,000 responses, presumably tied to rich demographic information. This means that, within the Xbox community, you have large samples of hundreds of voters for each of dozens of different slices of the electorate.

These large sample sizes mean you can get an extremely granular view of opinion changing over time, especially when data is tied to real user accounts with demographic info. Even if we don’t re-weight the demographics from the Xbox poll back to the overall population, because of the sheer volume of data, there is intrinsic value in studying the data shifts and the patterns evidenced in the poll’s internals.

The overall skew doesn’t matter, because we aren’t interested in the toplines (the Presidential horserace number). Traditional polls do a good enough job of measuring those. What we’re interested in is measuring change among niche demographics and doing it in real time, without the 2-to-4 day delay. When it comes to measuring what happened in the last 24 hours, campaign polls give us no data or extremely rough data. Sheer volume means Internet data can do a better job of this, particularly if it can be confirmed across multiple data sets.
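A minimal sketch of what "measuring change, not toplines" could look like in practice. The function and the numbers are hypothetical, not drawn from the Xbox data: it compares one slice's share across two waves and reports a standard error for the difference, assuming each wave behaves like a simple random sample of that slice and that the panel's makeup is roughly stable between waves.

```python
import math


def shift_with_error(p_before, n_before, p_after, n_after):
    """Change in a candidate's share within one slice, plus its standard error.

    The panel's overall skew only drops out of the difference if the panel's
    makeup is roughly the same in both waves (an assumption, not a given).
    """
    shift = p_after - p_before
    se = math.sqrt(p_before * (1 - p_before) / n_before
                   + p_after * (1 - p_after) / n_after)
    return shift, se


if __name__ == "__main__":
    # Hypothetical numbers for one demographic slice, not real poll data.
    shift, se = shift_with_error(0.40, 400, 0.46, 400)
    print(f"shift = {shift:+.3f}, standard error = {se:.3f}, "
          f"z = {shift / se:.2f}")
```

The design choice here is to ignore the level entirely and look only at the difference, which is the part a stable skew does not contaminate.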

In the recent debates, I polled my Twitter audience as to who they thought won. Most polls received between 200 and 1,000 responses, measured as retweets. Some tried to poke fun at this, given that my Twitter followers appear to skew 10-to-1 towards Romney based on the results. But my goal wasn’t to suggest that Romney was winning public opinion by 10-to-1. It was to collect as much data as fast as possible, extracting insight where appropriate. Last night, I asked people to indicate whether they thought each candidate was winning by a little or a lot. The data could suggest that Obama voters were a bit more enthusiastic about their guy’s performance, even though there were fewer of them (irrelevant for the purposes of this analysis).



Asking the broader question of how well Twitter performed as a barometer during the debates, Twitter searches for “Romney winning” or “Obama winning” all accurately predicted the results of snap polling done after each debate. They showed Romney dominating the first debate from 20 minutes in, while a more muddled back-and-forth picture emerged from the remaining two debates — also consistent with the polls. After the conventions, we outlined the case for how Twitter reactions to major speakers forecasted the nightly movement in the polls, and found (with one or two exceptions) a clear correlation.

Twitter is full of biased and self-interested political actors, but it mirrors and reinforces the media narrative and thus public opinion. You can’t really measure undecided voters on Twitter, but you can tell which side’s partisans felt great, and which felt “Meh.” And you can quantify this in real time, getting ahead of the polls. Even with an unrepresentative sample, we’ve found it to be a good guide to broader opinion, but you have to drill down on specific search queries, eschew broad metrics like tweets-per-minute, and treat sentiment analysis with caution. For instance, we found that use of a candidate’s name in conjunction with “awesome” could be a better indicator of positive reaction to a candidate than positive sentiment scores.
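The name-plus-keyword idea can be sketched as a simple co-occurrence count. This is an illustration only: the function name is hypothetical, the sample tweets are invented, and it says nothing about how the original searches were actually run.

```python
import re
from collections import Counter


def keyword_mentions(tweets, candidates, keyword):
    """Count tweets that mention each candidate together with the keyword."""
    counts = Counter({name: 0 for name in candidates})
    keyword_re = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    name_res = {
        name: re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        for name in candidates
    }
    for text in tweets:
        if not keyword_re.search(text):
            continue  # keyword absent, so this tweet counts for nobody
        for name, name_re in name_res.items():
            if name_re.search(text):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Invented example tweets, for illustration only.
    sample = [
        "Romney was awesome on the economy tonight",
        "Obama awesome closing statement",
        "Romney is winning this",
        "awesome debate, Romney and Obama both sharp",
        "That Romney answer was AWESOME",
    ]
    print(keyword_mentions(sample, ["Romney", "Obama"], "awesome"))
```

A tweet that names both candidates counts for both, which is a deliberate simplification; a real analysis would need to decide how to handle those.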

Towards the Hourly Tracking Poll

The 2012 elections won’t resolve the question of whether Big Data can predict election outcomes, but Big Data holds great promise if we can embrace the heretical idea that balance isn’t the be-all and end-all as we mine insights from the depths of Internet data.

Your next project can be fast, cheap, and good: pick two. Opinion data can be fast, balanced, and big: pick two. By looking at absurdly large data sets, embracing the inherent skew of Xbox or Facebook users, and asking the right questions, you can get at things no poll can: subtle changes in the samples and among specific demographics, measured day by day, or even hour by hour.

Why should this be important, beyond feeding the media-political beast with near real-time analytics?

The political world has embraced real-time data everywhere else — in everything from voter ID calls, to fundraising emails, to online advertising. Why wouldn’t public opinion research work the same way? People like giving their opinion. Is there a way to better harness these willing participants into actionable data?

After all three debates, the political discussion quickly descended into meme graphics about Big Bird, binders, and bayonets. This was fed in part by a data-driven feedback loop of hardcore partisans on social media — combined with a complete absence of data about how these attacks worked in real time with undecideds. Interviews conducted after the fact showed these attacks fell flat with those voters, yet the memes went on for days. Real-time polling might mean less Big Bird — and more messaging that’s actually relevant in Ohio.

Introducing Trendsetter: Discover Who’s Influential and What They Care About
Data Visualization, Products

Understanding influence is a huge topic in social media. A number of players, like Klout and PeerIndex, have built hugely successful platforms around rewarding highly influential social media users.

These platforms are great at measuring celebrity. If you’re Lady Gaga, you have a Klout score of 92. If you’re Barack Obama, your score is 91. Beyond that, microcelebrities with large Twitter followings and a healthy degree of interaction on the platform will earn high Klout scores, but what we’re talking about is a relatively small sliver of the social media universe.

This left us wondering: what would a good influence score look like for the rest of us who aren’t Twitter celebrities? And specifically, what does it look like on Facebook, the world’s biggest social stage?

Today, we’re launching Trendsetter, a platform which lets you discover who’s influential and what they care about.

Connect with the app and you’ll get your Trendsetter score — and see where you stack up compared to your friends. Trendsetter measures interactions with pages on Facebook and generates an individualized Trendsetter score for you and your friends. A high Trendsetter score means you’re very likely to tell your friends about things on Facebook, have niche tastes, and tend to be early to the party when it comes to liking brands and content. A lower Trendsetter score means you’re quieter in interacting on Facebook and tend to have more mainstream tastes — but when you do share, it’s because it really matters.

For years, through measures like the Net Promoter Score, marketers have been trying to understand the voters and consumers most likely to share things. We have an inkling that even a cursory glance at someone’s social media profile can tell you more about their propensity to share, and Trendsetter aims to show you what moves them.

A Trendsetter report gives you a wealth of data about your network — who the biggest early adopters are among your friends, what Facebook pages these early adopters like, what types of things they’re interested in, and how they’re distributed throughout the country. Here’s what my Trendsetter report looks like:

I knew we were onto something when our algorithm ranked Jesse Thomas of the DC-based digital agency JESS3 as the #1 Trendsetter in my network. Jesse is the consummate early adopter, and this makes him the biggest Trendsetter amongst my friends.

Trendsetter is a joint project of Engage and the Winston Group, a strategic communications and polling firm. With the Winston Group, we’ll be developing quick, one-question surveys for Trendsetter users, and breaking down the answers in interesting ways based on user interests and social influence — a level of detail it would be very hard to get at in a traditional opinion survey.

Who are the biggest Trendsetters in your network? Let’s connect and find out!


Election 2012: A Choice Between Eminem and Johnny Cash
Social

At Engage, we have long preached the gospel of social data as the new polling. A Facebook app called Wisdom is demonstrating what that actually means, building detailed demographic profiles on 3.8 million Facebook users and breaking down their page likes. This data is particularly interesting in light of the current Presidential election, where the rivaling tastes of supporters of President Barack Obama and the Republican candidates can tell you a lot about the nation’s cultural divide.

Earlier this evening on Twitter, I broke down the top 10 non-political page likes for supporters of each of the Presidential candidates as reported by Wisdom. To say the least, the results are revealing.

Barack Obama

1. Michael Jackson
2. Facebook
3. YouTube
4. Family Guy
5. Bob Marley
6. Lady Gaga
7. Starbucks
8. Eminem
9. Will Smith
10. Rihanna

Mitt Romney

1. Facebook
2. The Bible
3. Starbucks
4. The Beatles
5. Small Business Saturday
6. Walmart
7. Johnny Cash
8. House
9. Adam Sandler
10. YouTube

Ron Paul

1. The Beatles
2. Family Guy
3. Johnny Cash
4. Pink Floyd
5. South Park
6. Bob Marley
7. The Office
8. Facebook
9. The Daily Show
10. History

Jon Huntsman

1. The Daily Show
2. NPR
3. The Office
4. The Beatles
5. Facebook
6. Starbucks
7. The Colbert Report
8. Johnny Cash
9. Family Guy
10. The Onion

Rick Santorum

1. The Bible
2. Facebook
3. History
4. Jesus Daily
5. Starbucks
6. Dave Ramsey
7. Small Business Saturday
8. Chick-fil-A
9. Jon Voight
10. Capitalism

Newt Gingrich

1. The Bible
2. Facebook
3. History
4. Starbucks
5. The Beatles
6. Dave Ramsey
7. Small Business Saturday
8. Jesus Daily
9. Chick-fil-A
10. Jesus Christ

Rick Perry

1. The Bible
2. Starbucks
3. Facebook
4. Johnny Cash
5. Chick-fil-A
6. History
7. Jesus Daily
8. Dallas Cowboys
9. George Strait
10. Dave Ramsey

The Analysis

  • If the lists for Barack Obama and the GOP field don’t scream “cultural divide!” I don’t know what does. Regardless of the outcome of the Republican primary, 2012 is shaping up as a choice between Eminem and Johnny Cash, or Lady Gaga and The Beatles.
  • Ron Paul supporters show nearly as much overlap with the cultural preferences of Obama supporters as they do with supporters of Mitt Romney and the other Republican candidates. Ron Paul is the candidate of Pink Floyd, which makes its only appearance on Ron Paul supporters’ top 10. Along with Jon Huntsman fans, Ron Paul supporters also like watching The Office.
  • Fans of Rick Santorum, Newt Gingrich, and Rick Perry all have very similar tastes, with the Bible, Facebook, History, Starbucks, Jesus Daily, Chick-fil-A, and Dave Ramsey appearing on all three lists. These non-political pages also appeared well down the candidates’ overall lists, with conservative politicians, organizations, and media figures dominating likes among fans of these candidates. This probably speaks to the fact that, at least among Facebook users, these candidates remain popular primarily among political early adopters.
  • Because Romney is the Republican candidate with the most Facebook likes, his supporters’ tastes are more likely to veer into the realm of popular culture, albeit an older version of it where Johnny Cash and The Beatles rule.
  • Huntsman supporter tastes spell eclectic. From fans of NPR to the Daily Show to The Economist, Huntsman appeals to a distinctive brand of Republican relative to the field.
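The overlap claims above can be checked directly against the top-10 lists with a few lines of set arithmetic. The sketch below copies three of the lists from this post; the variable and function names are my own.

```python
# Top-10 non-political page likes, copied from the lists in this post.
TOP_10 = {
    "Barack Obama": {
        "Michael Jackson", "Facebook", "YouTube", "Family Guy", "Bob Marley",
        "Lady Gaga", "Starbucks", "Eminem", "Will Smith", "Rihanna",
    },
    "Mitt Romney": {
        "Facebook", "The Bible", "Starbucks", "The Beatles",
        "Small Business Saturday", "Walmart", "Johnny Cash", "House",
        "Adam Sandler", "YouTube",
    },
    "Ron Paul": {
        "The Beatles", "Family Guy", "Johnny Cash", "Pink Floyd",
        "South Park", "Bob Marley", "The Office", "Facebook",
        "The Daily Show", "History",
    },
}


def shared_likes(candidate_a, candidate_b):
    """Pages that appear on both candidates' top-10 lists."""
    return TOP_10[candidate_a] & TOP_10[candidate_b]


if __name__ == "__main__":
    for other in ("Barack Obama", "Mitt Romney"):
        shared = shared_likes("Ron Paul", other)
        print(f"Ron Paul and {other}: {len(shared)} shared -> {sorted(shared)}")
```

On these three lists, Ron Paul's top 10 shares three pages with Obama's and three with Romney's. A top-10 overlap is a coarse measure, so this supports the point only loosely.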
Tracking the Most Talked About Candidates on Facebook
Analytics, Social

At Engage, we (heart) data. We think that Big Data harnessed from the social web can oftentimes better tell us what’s happening in real time than a traditional opinion poll. There’s no better test of this than the Republican presidential primary, which seems to yield a new frontrunner every week. Wouldn’t it be great to tell in real time, that day, who’s up and who’s down, based on the world’s biggest platforms for real-time conversation?

Last week, Facebook announced the release of its “People Talking About” metric. For any Facebook page or topic, Facebook will tell you how many people are talking about that topic across the entire site — and that’s regardless of whether these people like the given page. In your news feed, you may have seen stories aggregating your friends’ conversation around hot topics like Steve Jobs or the Occupy Wall Street protests.

The results for the Republican candidates for President are revealing, and we plan to track them in this public Google Spreadsheet every day through the primaries. We’ll periodically post our analysis of what the numbers mean to Twitter and Facebook.

Right now, Herman Cain far and away leads the field in Facebook buzz, with nearly 80,000 people talking about him daily. He’s followed by Mitt Romney, who’s also been on the rise — especially with his endorsement by Chris Christie.

A quick note: Any measure of Internet buzz — be it tweets, Facebook posts, or searches — will reward the most controversial and talked about public figures, and these aren’t always the highest vote getters. That’s probably why Cain, with his 9-9-9 plan and his recent surge in the polls, leads, and why Michele Bachmann and Ron Paul place strongly. We think that the best way to draw useful conclusions from this is to look at trends as well as the absolute numbers — and these trends will become more evident over time. If Cain were to fall below his previous performance while other candidates gained, that would be a sign of trouble. Former frontrunner Rick Perry languishing at around one seventh of Herman Cain’s current buzz is already not a great sign for his campaign.
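Looking at trends rather than absolute numbers can be as simple as computing day-over-day change. The sketch below is illustrative: the function name is hypothetical and the daily counts are made up, not values from the tracking spreadsheet.

```python
def daily_change(counts):
    """Fractional day-over-day change for a list of daily counts.

    Assumes every count is nonzero, since each day is divided by the
    previous one.
    """
    return [(today - yesterday) / yesterday
            for yesterday, today in zip(counts, counts[1:])]


if __name__ == "__main__":
    # Made-up "people talking about" counts for one candidate over three days.
    made_up_counts = [100, 110, 99]
    for change in daily_change(made_up_counts):
        print(f"{change:+.0%}")
```

A candidate with a large absolute number but a run of negative changes would be the warning sign described above.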

Facebook buzz surrounding the entire field of candidates seems to be slowly but surely picking up, but it has a ways to go before it rivals the guy they’re going after: Barack Obama, with a total of 443,882 Facebook users talking about him currently, compared with 199,034 for the Republican field in total.
