Why Can’t We Get Voting Systems Right?

Warning to the reader: This is a long newsletter, but I think the topic is timely and deserves the attention.

It seems that every presidential election since 2000 has brought about discussions of issues with voting systems. There is talk that we, as a country, are losing faith in our election process. Some efforts have been made to address problems, but these efforts have often been misdirected. For example, it's generally undisputed that the issue in the 2000 election in Florida was a ballot design flaw, but the problems that were subsequently addressed focused in large part on the mechanisms of voting systems, specifically the elimination of punch card hardware. Yes, punch cards were a solvable issue that came out of the recount effort, and the ballot design issue was not forgotten. But ballot design was not the primary focus of the subsequent efforts to fix voting systems in the United States. This is unfortunate, since ballot design is also a solvable issue.

After the most recent election, controversy has arisen once again about the nature, quality, and validity of our national voting system. Whether there is a legitimate concern of software tampering or software error may be in dispute, but I fear once again that issues of a political or personal nature may prevent people from addressing interaction problems that we do know exist in our national voting systems.

My firm spent several years working with the National Institute of Standards and Technology (NIST) under the Help America Vote Act to develop a usability test standard for voting systems. The goal of this effort was to develop a means by which a specific voting system implementation could be tested to see whether it performed at an acceptable level in terms of its usability. The idea was that all voting systems should meet a minimum acceptable standard for the ability of the voter to cast the votes they intend.
Once this minimum level was established, it would be possible to raise the bar every year or every election cycle to improve this aspect of voting systems. I have always equated this project to the development and improvement of emission standards for cars.

The design of this evaluation was straightforward, though its implementation required a great deal of care and precision. A standard ballot was developed that was medium in complexity based on the number of races and referenda included. Single-member ("winner-take-all") races and multi-member races (vote for as many people as there are offices to fill) were included. Vendors were provided with the ballot specification and asked to implement it themselves on their system. Unlike a real election, where randomizing name order across ballots is a requirement, vendors were instructed to ensure that all ballots were identical in candidate name order and in the presentation order of the races and referenda. Beyond that, they implemented the ballot as they normally would, using their own system instructions, feedback, and user interface.

A population of 100 participants was used in each test of a voting system. This may seem a small number, but it was chosen both for statistical reasons and because the cost of performing a test at this level was assumed to be acceptable. Participants were defined in terms of percentages across multiple characteristics, including age, gender, educational background, socioeconomic status, and voting experience.

Participants were provided with a list of names to vote for in each race, as well as how to vote on each of the referenda. Standard variations in normal voting behavior were included, such as skipping some races or referenda and undervoting in multi-member races (voting for fewer than the total possible). Participants were not instructed to purposely make mistakes while voting, and no instructions were provided on the use of the machines.
In other words, this was a true performance-based test: participants were provided with the intended outcome and were free to proceed to reach that outcome naturally, even if that meant making and recovering from mistakes.

The primary outcome measure of this study was creating a perfect ballot, meaning a ballot matching exactly the intended outcome in terms of votes for all names and referenda specified. Secondary outcome measures included time on task and number of errors. Subjective assessments of the user experience (the participants' own perception of their success on task) were also collected. The secondary measures of time on task and subjective assessment of performance were not very important in my mind (though required by others on the project), since they showed little or no correlation with performance, as is often the case.

Our first goal was to determine whether these procedures and this population size generated data that was valid and reliable and could tell systems apart. Since all participants performed identical tasks, with the only variation being the specific system they used, the independent variable in this study was the voting system itself. The dependent variables were the outcome measures.

Two voting systems were tested in eight rounds of testing: one electronic voting system and one paper-based voting system. The results of our preliminary research showed a consistent level of voting error (within a single standard error) across seven of eight rounds of testing on each of the two machine types tested. Three types of errors were identified: errors of omission (where a participant failed to cast an expected vote); errors of commission (where a participant cast a vote other than the one expected); and systematic errors (consistent procedural errors across voting tasks resulting in errors of either type).
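To make the study design concrete, here is a minimal sketch of the kind of statistical comparison it supports: a standard two-proportion z-test on the perfect-ballot rate across two systems. The counts below are invented for illustration only, since the actual error rates from the project are not disclosed.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test for a difference in success rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)   # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: perfect ballots out of 100 participants per system.
z, p = two_proportion_z_test(88, 100, 74, 100)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < .05 here: the systems differ
```

With 100 participants per system, differences of this rough size are detectable at the conventional .05 level, which is part of why that population size was workable.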
Though I'm not at liberty to disclose the exact value of the error rate, suffice it to say that it was not zero and might surprise some people if they knew its actual value. More importantly for this discussion, testing showed a statistically significant difference (p < .05) between participant performance on the two systems. This means that the voting system itself affects the probability that a voter's intent will be accurately captured during an election: different machines have a different effect on how likely voters are to cast correct votes.

Let's look at a few examples of what we found. When some people are presented with an electronic voting system, they consider how the software might be processing their input and use that reasoning before determining how to vote. Even people with strong technical backgrounds are not immune from having their behavior affected in this way (indeed, those backgrounds could potentially cause such behavior). For example, in the multi-member race on the electronic voting system, the system instruction to "vote for four" was perceived by some voters as an imperative. Despite the stated goal provided to participants to enter only three of the four possible votes, the system instruction caused them to enter an extra vote, fearing that the system might not process any of their votes if they did not follow the instruction. This systematic error caused errors of commission when using the electronic voting system.

By contrast, people using a paper ballot appear to assume that a human will likely be the one reviewing their ballot. The electronic system allows the user to change a single vote if an error is detected, but paper ballots do not support this. If a voter makes an error on a paper ballot, they are supposed to obtain a new ballot to make the correction. Instead, a different systematic error showed up in our research.
Several participants crossed out an erroneous vote and entered the correct vote next to it. A human could interpret this as a correction, but the software does not. The software sees this as an improperly voted race or referendum and discards the votes. This systematic error caused errors of omission when using the paper-based voting system.

For polling locations that include a scanner, errors like the ones described above can be detected. However, it is possible to force the acceptance of a ballot with an error on it if the voter does not want to make the correction. This can happen even when the voter realizes that the ballot has an error, since some people don't want to go through the voting process again. We saw this behavior in our research. In a real election, it is not clear that poll workers follow a consistent procedure when handling a rejected ballot. It is possible that some poll workers may not themselves realize the cause of the rejection and will force the acceptance of the ballot without giving the voter the option of correcting it.

Though it was not part of our research scenario, differences between polling locations can have an even more significant effect on voting results when paper is used. If the polling location does not have a scanner on site, voter errors are not detected in time to afford any opportunity for correction. Those ballots are all forced through the scanner with no corrections made, making prosperous polling locations more likely to correct errors than poorer locations that can't afford a local scanner.

Errors are even worse in regions that offer straight party ticket voting. Straight party ticket voting allows a person to indicate at the beginning of the ballot that they would like to vote for all candidates of a particular party. Voters are then expected to complete only the sections of the ballot that do not have party affiliations, such as referenda.
Participants were not specifically told to use or not use this option. In our research, consistent errors were found in the use of straight party ticket voting on both systems, and again, the effects of the errors differed depending on the type of voting system used.

On the electronic voting system, when the voter selects the straight party ticket option, the rest of the ballot is pre-filled for them. Many of the participants reported that the system had someone else's votes in it when they started their voting process, demonstrating that they did not realize the effect of selecting the straight party ticket option on that machine. In post-test interviews, these participants stated that they assumed they were indicating their party affiliation, not casting votes. Some of these participants made changes across their ballot except in the races where they were told not to vote. In those races, the votes initiated by the straight party ticket option were carried through to the results, leading to more errors of commission on the electronic system.

On paper ballots, the selection of the straight party ticket option cannot be reflected across the rest of the ballot. The scanning software we used in our research treated a ballot containing both a straight party ticket choice and alternate choices as a spoiled ballot for all races where the straight party ticket option applied. In this case, instead of a few errors of commission, the scanning software saw conflicting votes across many races, so all votes in those races were discarded, resulting in multiple errors of omission. Participants were far less likely to understand the error associated with the straight party ticket option when using the paper-based system. Again, differences in the polling location and the presence of a scanner would determine whether this more extensive error is even detected.

People are human and make mistakes. This is unavoidable.
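The paper-scanner behavior described above can be sketched as a simple rule. This is a hypothetical reconstruction, not any vendor's actual code: consistent marks count, an unmarked partisan race is filled from the straight-ticket party's slate, and any conflict spoils the race, discarding all of its votes.

```python
def resolve_race(candidates_by_party, marks, straight_party):
    """Resolve one single-member partisan race under a straight-party rule.

    Hypothetical reconstruction of the scanner behavior described above:
    an empty race is filled from the party slate, consistent marks count,
    and any conflict spoils the race (all of its votes are discarded).
    """
    if straight_party is None:
        return marks              # no straight ticket: count the marks as-is
    slate = [c for c, party in candidates_by_party.items()
             if party == straight_party]
    if not marks:
        return slate              # straight ticket fills the unmarked race
    if set(marks) == set(slate):
        return marks              # redundant but consistent marking
    return []                     # conflict: race discarded (error of omission)

race = {"Alice": "Party A", "Bob": "Party B"}
print(resolve_race(race, [], "Party A"))       # ['Alice'] via straight ticket
print(resolve_race(race, ["Bob"], "Party A"))  # [] -- conflict, votes lost
```

The last line is exactly the divergence between human and machine interpretation: a human reviewer would likely read the "Bob" mark as an intentional override of the straight ticket, while software following this rule throws the race away.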
However, the goal is obviously that the interface design of the system itself should introduce no errors in capturing a voter's intent, and it's clear that is not the case today. Steps can be taken to address this, but it's unclear that such steps are even being attempted. In the meantime, we continue to hold elections across the country with these systems. There is no strong research evidence yet, but it's probably safe to assume that there are differences in voter behavior and error rates across each vendor's electronic voting systems. This threat to the integrity of our election process should not be ignored.

There are certainly advantages to electronic voting systems, such as error detection and correction, language selection, and accessibility, but they are also susceptible to coding errors and potential hacking. The concept of the voter verifiable paper audit trail (VVPAT) was discussed years ago as a means of assuring voters that their votes are cast correctly, and of allowing for recounts (since electronic records alone don't support recounts). Multiple implementations of a VVPAT were proposed. Some solutions allowed for an audit trail but were not voter verifiable. Others were voter verifiable but were never integrated into the voting process and therefore never used by voters for verification. Some provided verification before the ballot was submitted; some provided it after.

It seems we may need to address concerns over possible security issues before we can get to anything else. With that in mind, here's a potential new voting system design, for anyone who wants to listen:
  • Voters interact with an electronic system to capture their votes. This would retain all the benefits of instantaneous feedback, error correction, and other factors associated with electronic systems.
  • Once their votes have been collected, voters are required to print out a completed ballot, which they can review to ensure that the votes they intended have been accurately captured. Reviewing a clean, machine-printed ballot is much easier than re-checking a hand-marked one, and it provides a level of comfort to the voter that their votes are being correctly recorded at least somewhere. The printout includes a unique ballot identifier.
  • If they see an error on the ballot, voters destroy this version, make their correction, print a new ballot, and re-verify. (Yes, we can ensure the prior attempt is destroyed before allowing another printout.) Once satisfied, the voter ends the ballot creation process. Their final ballot is stored on the machine they used to create it, along with the unique identifier.
  • Voters carry this ballot to a scanner and scan it in as usual for paper ballots. The votes are recorded by the scanning software, including the unique identifier.
  • We now have two software databases that can be compared to each other to ensure that there’s been no monkey business with one of these records of votes.
  • We also have a paper ballot that can be used for recounts. But more importantly, a random selection of these paper ballots can be part of a mandatory validation process against either of these electronic records to ensure no one has figured out how to modify both of the electronic records.
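The cross-check and audit steps above are mechanically simple, and that simplicity is part of the appeal. Here is a sketch under invented assumptions (the record format, the ballot identifiers, and the sample rate are all hypothetical): match the two electronic databases on the unique ballot identifier, flag any disagreement, and draw a random sample of ballot ids whose paper ballots get hand-verified.

```python
import random

# Hypothetical vote records keyed by the unique ballot identifier: one
# database from the ballot-creation machines, one from the scanners.
machine_db = {"B001": ("Alice", "Yes"), "B002": ("Bob", "No"),
              "B003": ("Alice", "No")}
scanner_db = {"B001": ("Alice", "Yes"), "B002": ("Bob", "No"),
              "B003": ("Alice", "No")}

def cross_check(db_a, db_b):
    """Return ballot ids missing from one database or disagreeing between them."""
    return sorted(bid for bid in db_a.keys() | db_b.keys()
                  if db_a.get(bid) != db_b.get(bid))

def audit_sample(db, fraction=0.1, seed=None):
    """Randomly choose ballot ids whose paper ballots get hand-verified."""
    ids = sorted(db)
    k = max(1, round(len(ids) * fraction))
    return random.Random(seed).sample(ids, k)

print(cross_check(machine_db, scanner_db))    # [] -- the two records agree
tampered = dict(scanner_db, B002=("Alice", "No"))
print(cross_check(machine_db, tampered))      # ['B002'] -- flagged for review
print(audit_sample(machine_db, fraction=0.34))
```

An attacker would have to alter both databases identically, and also escape detection in the random paper sample, which is the property the design is after. (A production audit would use a statistically chosen sample size rather than a fixed fraction.)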
This is not a perfect system. For example, some states don’t have polling locations because they allow mail-in or drop-off voting of paper ballots for a few weeks before election day. Visually impaired users need to be considered in the verification process. Physically impaired users would have to be considered in the scanning process. But this system should significantly reduce design-induced errors and security issues, and thus increase voter confidence in the process. And maybe, if we have a system like that in place, we can start to address ballot design and interaction design issues that we do know exist.