Monday, October 4, 2010

The Pollster's Dilemma

--or--

What's up with SurveyUSA's 18-34 polling and other observations

This post is the result of a back and forth I had with Aaron Klemz at The Cucking Stool in the comments of his post about the age gap in the most recent SurveyUSA poll of the Minnesota Governor's race. This was my initial comment:

This has been a trend for SurveyUSA all year, and not just in Minnesota. I find these numbers hard to believe and suspect they have more to do with a flaw in SurveyUSA's methodology when it comes to young voters. Perhaps it's an issue with cell phones, perhaps it's the fact that they are an automated pollster. Who knows.

The more I've thought about this the more I think it's an issue deserving of discussion outside of a comment thread.


To give you an idea of what I'm talking about, let's quickly run through some recent results. These are the preferences of the 18-34 age group in three races:

[Chart: 18-34 preferences in three races, two polls per race]

These polls were done at different points during the year, but in all three races the two polls listed were conducted within a few weeks of each other at most. While this is far from an exhaustive study, it does highlight what many poll watchers have noticed this cycle: SurveyUSA has shown some weird 18-34 numbers. It's not the case in all of their polls, but it happens in enough of them to raise a red flag.

From David Jarman (Crisitunity) writing in Salon:

Strangely, though, these polls show an oddity that has been consistent in many SurveyUSA polls: They find that the Republican candidates perform the strongest among the 18-34 set, despite exit polls consistently showing that that's the Democrats' strongest age bracket.

Jarman provides a few possible explanations for these strange results:

This might mean that pollsters are better able to reach young voters but that their lack of enthusiasm means they don't fit the pollsters' screen for likely voters. Or it could mean there's been a sea change in the last year in the way young voters perceive the political parties. Neither of these possibilities is good for Democrats.

The other alternative, though, is that SurveyUSA -- which doesn't provide cross tabs that would tell a reader whether the age composition of the cellphone-only sample is different from the land line voters -- is finding a cultural difference between generations: Younger voters may still be likely to vote Democratic, but younger voters may also be more likely to not pick up, or to hang up, when they find out it’s a pollster.

As we saw in the polls above, other pollsters aren't showing the same 18-34 swing to the GOP that SUSA is, which leads us to Jarman's third possibility: young people don't want to talk to pollsters, and they have the tools to screen them. If this is the case, though, why aren't other automated pollsters, like PPP for instance, also showing this anomaly?

There are two possible reasons for this: bad luck or bad methodology. It could certainly just be random sampling variation that popped up around the same time and will just as quickly go away. To make a baseball analogy, a hitter can be in what looks like a slump but is really just bad luck: they're hitting the ball hard, but they're hitting it to the wrong places. This could be what's happening to SUSA. They are doing everything right but keep getting screwy 18-34 samples anyway. If it is bad luck, it will eventually go away.
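To put a rough number on the "bad luck" scenario, here is a minimal sketch of the textbook margin-of-error formula for a simple random sample. The sample sizes are made up for illustration (a 600-person poll in which 18-34 year olds are 20% of respondents); they are not SurveyUSA's actual figures, and real polls with weighting will be somewhat noisier than this.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sizes, chosen only for illustration.
full_sample = 600
young_subsample = 120  # 20% of 600

moe_full = margin_of_error(0.5, full_sample)
moe_young = margin_of_error(0.5, young_subsample)

print(f"Full sample:  +/- {moe_full * 100:.1f} points")
print(f"18-34 subset: +/- {moe_young * 100:.1f} points")
```

Under these assumptions the subgroup's margin of error is more than twice that of the full sample, so a handful of odd-looking 18-34 crosstabs is not, on its own, proof of a broken methodology.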

The other possibility is that something is wrong with their methodology, though what that flaw would be I couldn't say. It's possible that other pollsters like PPP are finding similar raw results but are weighting their samples more heavily than SUSA. As Jarman suggested, it could be related to how they weight for cell-phone-only people in their sample. It could be some combination of all of the above, or it could be something else entirely.

The Brodnitz memo went even further, noting that the newest SurveyUSA figures suggest that seven percent of likely voters have already cast ballots, while the Virginia Board of Elections reports that only 329 absentee ballots have been returned and only 277 early vote ballots have been cast.

“If SurveyUSA’s numbers were actually correct, it would show less than 1% of the electorate has already voted,” Brodnitz writes in the memo.

That's from a Politico story about the complaints Tom Perriello's campaign was making about SUSA's polling of the race in Virginia's 5th congressional district. While everyone else has the race within a couple of points, both of SUSA's polls showed mid-twenty-point margins. Is this an issue related to their weird 18-34 numbers, or just some unrelated anomaly, possibly something about Virginia's 5th CD that they aren't considering?

The rise of the cell-phone-only voter is a big problem for automated pollsters, and one that will only become more difficult as time goes on. If pollsters are having difficulty reaching younger voters, then the number of those individuals in the sample will be lower than is representative of the population. The way a pollster corrects for an issue with sample composition is by weighting, but the more weighting you do, the more uncertainty creeps into the results. As the uncertainty increases you start to get weird results, and SUSA's 18-34 numbers this cycle have certainly been what one would consider weird results.
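One common way to quantify how weighting adds uncertainty is Kish's effective sample size. The sketch below is only an illustration under assumed numbers: a 600-person poll that reached just 60 people aged 18-34 when the target share is 20%. None of these figures come from SurveyUSA, and I don't know what weighting scheme they actually use.

```python
def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical poll: 600 respondents, only 60 of them aged 18-34,
# weighted up so that 18-34 year olds make up 20% of the sample.
n_young, n_older = 60, 540
n_total = n_young + n_older
target_young = 0.20

w_young = target_young * n_total / n_young        # each young respondent counts double
w_older = (1 - target_young) * n_total / n_older  # each older respondent counts a bit less

weights = [w_young] * n_young + [w_older] * n_older
n_eff = effective_sample_size(weights)

print(f"Nominal n = {len(weights)}, effective n = {n_eff:.0f}")
```

In this made-up case the poll behaves as if it had interviewed 540 people rather than 600, and the 18-34 numbers themselves still rest on only 60 actual interviews, each now counted twice.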

And speaking of weird results, if you can sometimes question the 18-34 demographic numbers in SUSA polls, you can almost always question the partisan breakdown of MPR/Humphrey polls. The following chart shows the partisan breakdowns of all the polls of the Minnesota governor's race done this year except Rasmussen's (I'm not a Ras subscriber, so I can't get at their crosstabs; otherwise I'd have included them):

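[Chart: partisan id breakdowns of this year's Minnesota governor's race polls, grouped by pollster, with averages and standard deviations]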

Can you tell which of these is not like the others? After each set of polls is the average for each partisan id type and the standard deviation, which is basically the amount of spread between the numbers. At the end of each standard deviation row is the average standard deviation. So not only do Humphrey's numbers look completely different than the other two pollsters', they also bounce around a lot more from poll to poll.
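For anyone who wants to run the same comparison, here is a rough sketch of the calculation using Python's standard `statistics` module. The pollster names and party-ID shares below are invented for illustration and are not the numbers from the chart. I've used the population standard deviation (`pstdev`); the sample version (`stdev`) would give slightly larger values.

```python
from statistics import mean, pstdev

# Hypothetical Democratic party-ID shares (percent) across four polls
# from two made-up pollsters. These are NOT the figures from the chart.
dem_share = {
    "Pollster A": [36, 35, 37, 36],
    "Pollster B": [45, 33, 41, 29],
}

spread = {}
for pollster, shares in dem_share.items():
    spread[pollster] = pstdev(shares)
    print(f"{pollster}: mean {mean(shares):.1f}, std dev {spread[pollster]:.1f}")
```

A pollster whose party-ID shares barely move from poll to poll ends up with a small standard deviation, while one whose sample composition swings around gets a large one.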

This is most likely because Humphrey doesn't weight their polls for partisan id, according to their stated methodology:

As is common with public opinion surveys, the data were weighted. In the first stage, the data were weighted based on the number of potential survey respondents and the number of landline telephone numbers in the household. In the second stage, data were weighted according to cell phone usage, as well as gender, age, race, and Hispanic ethnicity to approximate the demographic characteristics of the population according to the Census.

A reasonable argument can be made either way, to weight or not to weight by partisan id, but the track record of the MPR/Humphrey poll, one of the worst in Nate Silver's ratings, certainly causes one to question their side of that argument.

All of this uncertainty is the main reason to use an average of polls rather than looking at any one poll in particular. Indeed, polling averages are uncannily accurate.
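A quick sketch of why averaging helps, again with made-up numbers: five hypothetical polls of one candidate's support, each treated as an independent simple random sample. That independence assumption is doing a lot of work here. Averaging shrinks random noise, but it does nothing about a methodological flaw that several pollsters share.

```python
import math
from statistics import mean

def standard_error(p, n):
    """Standard error of a proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical (candidate share, sample size) pairs for five polls.
polls = [(0.44, 600), (0.41, 500), (0.46, 800), (0.42, 650), (0.43, 550)]

average = mean(p for p, _ in polls)

# Standard error of an unweighted mean of independent estimates.
se_average = math.sqrt(sum(standard_error(p, n) ** 2 for p, n in polls)) / len(polls)
se_single = standard_error(*polls[0])

print(f"Poll average: {average * 100:.1f}%")
print(f"Std error of one poll:    {se_single * 100:.1f} points")
print(f"Std error of the average: {se_average * 100:.1f} points")
```

With these inputs the average carries less than half the random error of any single poll in the set.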
