Annu. Rev. Psychol. 1999. 50:537–67

Copyright © 1999 by Annual Reviews. All rights reserved


Jon A. Krosnick
Department of Psychology, Ohio State University, Columbus, Ohio 43210;
e-mail: [email protected]

KEY WORDS: surveys, interviewing, polls, questionnaires, pretesting


For the first time in decades, conventional wisdom about survey methodol-
ogy is being challenged on many fronts. The insights gained can not only
help psychologists do their research better but also provide useful insights
into the basics of social interaction and cognition. This chapter reviews some
of the many recent advances in the literature, including the following: New
findings challenge a long-standing prejudice against studies with low re-
sponse rates; innovative techniques for pretesting questionnaires offer op-
portunities for improving measurement validity; surprising effects of the
verbal labels put on rating scale points have been identified, suggesting opti-
mal approaches to scale labeling; respondents interpret questions on the ba-
sis of the norms of everyday conversation, so violations of those conventions
introduce error; some measurement error thought to have been attributable to
social desirability response bias now appears to be due to other factors in-
stead, thus encouraging different approaches to fixing such problems; and a
new theory of satisficing in questionnaire responding offers parsimonious
explanations for a range of response patterns long recognized by psycholo-
gists and survey researchers but previously not well understood.

INTRODUCTION
SAMPLING AND RESPONSE RATES
PRETESTING
INTERVIEWING
QUESTIONNAIRE DESIGN
  Open versus Closed Questions
  Labeling of Rating-Scale Points
  Role of Conversational Conventions
  Social Desirability Bias
  Optimizing versus Satisficing
SUMMARY
CONCLUSION


INTRODUCTION

These are exciting times for survey research. The literature is bursting with new insights that demand dramatic revisions in the conventional wisdom that has guided this research method for decades. Such dramatic revisions are nothing new for survey researchers, who are quite experienced with being startled by an unexpected turn of events that required changing their standard practice. Perhaps the best known such instance involved surveys predicting US election outcomes, which had done reasonably well at the start of the twentieth century (Robinson 1932). But in 1948 the polls predicted a Dewey victory in the race for the American presidency, whereas Truman actually won easily (Mosteller et al 1949). At fault were the nonsystematic methods used to generate samples of respondents, so we learned that representative sampling methods are essential to permit confident generalization of results.
Such sampling methods soon came into widespread use, and survey researchers settled into a “standard practice” that has stood relatively unchallenged until recently (for lengthy discussions of the method, see Babbie 1990; Lavrakas 1993; Weisberg et al 1996). This standard practice included not only the notion that systematic, representative sampling methods must be used, but also that high response rates must be obtained and statistical weighting procedures must be imposed to maximize representativeness. Furthermore, although face-to-face interviewing was thought to be the optimal method, the practicalities of telephone interviewing made it the dominant mode since the mid-1980s. Self-administered mail surveys were clearly undesirable, because they typically obtained low response rates. And although a few general rules guided questionnaire design (e.g. Parten 1950), most researchers viewed it as more of an art than a science. There is no best way to design a question, said proponents of this view; although different phrasings or formats might yield different results, all are equally informative in providing insights into the minds of respondents.

Today, this conventional wisdom is facing challenges from many direc-
tions. We have a refreshing opportunity to rethink how best to implement sur-
veys and enhance the value of research findings generated using this method.
This movement has three valuable implications for psychology. First, re-
searchers who use the survey method to study psychological phenomena stand
to benefit, because they can enhance the validity of their substantive results by
using new methodologies, informed by recent lessons learned. Second, these


insights provide opportunities to reconsider past studies, possibly leading to the recognition that some apparent findings were illusions. Third, many recent lessons provide insights into the workings of the human mind and the unfolding
of social interaction. Thus, these insights contribute directly to the building of
basic psychological theory.

Because recent insights are so voluminous, this chapter can describe only a
few, leaving many important ones to be described in future Annual Review of
Psychology chapters. One significant innovation has been the incorporation of
experiments within surveys, thus permitting strong causal inference with data
from representative samples. Readers may learn about this development from
a chapter in the Annual Review of Sociology (Sniderman & Grob 1996). The
other revelations, insights, and innovations discussed here are interesting
because they involve the overturning of long-standing ideas or the resolution
of mysteries that have stumped researchers for decades. They involve sam-
pling and response rates, questionnaire pretesting, interviewing, and question-
naire design.


SAMPLING AND RESPONSE RATES

One hallmark of survey research is a concern with representative sampling.
Scholars have, for many years, explored various methods for generating sam-
ples representative of populations, and the family of techniques referred to as
probability sampling methods do so quite well (e.g. Henry 1990, Kish 1965).
Many notable inaccuracies of survey findings were attributable to the failure
to employ such techniques (e.g. Laumann et al 1994, Mosteller et al 1949).
Consequently, the survey research community believes that representative
sampling is essential to permit generalization from a sample to a population.

Survey researchers have also believed that, for a sample to be representa-
tive, the survey’s response rate must be high. However, most telephone sur-
veys have difficulty achieving response rates higher than 60%, and most face-
to-face surveys have difficulty achieving response rates higher than 70%
(Brehm 1993). Response rates for most major American national surveys have
been falling during the last four decades (Brehm 1993, Steeh 1981), so surveys
often stop short of the goal of a perfect response rate.
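
In the simplest terms, the response rates discussed here are completed interviews divided by eligible sampled cases. The sketch below illustrates that arithmetic; it is a deliberate simplification (the full survey-industry definitions also apportion cases of unknown eligibility, which are omitted here):

```python
def response_rate(completes, refusals, noncontacts):
    """Simplified response rate: completed interviews divided by
    all eligible sampled cases (completes + refusals + noncontacts)."""
    eligible = completes + refusals + noncontacts
    return completes / eligible

# A telephone survey completing 600 of 1000 eligible numbers:
rate = response_rate(completes=600, refusals=250, noncontacts=150)
print(rate)  # 0.6
```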

In even the best academic surveys, there are significant biases in the demographic and attitudinal composition of samples obtained. Brehm (1993) showed that, in the two leading, academic national public-opinion surveys (the National Election Studies and the General Social Surveys), certain demographic groups have been routinely represented in misleading numbers. Young and old adults, males, and people with the highest income levels are underrepresented, whereas people with the lowest education levels are overrepresented. Likewise, Smith (1983) found that people who do not participate in surveys are likely to live in big cities and work long hours. And Cialdini et al (unpublished manuscript) found that people who agreed to be interviewed were likely to believe it is their social responsibility to participate in surveys, to believe that they could influence government and the world around them, and to be happy with their lives. They were also unlikely to have been contacted frequently to participate in surveys, to feel resentful about being asked a personal question by a stranger, and to feel that the next survey in which they will be asked to participate will be a disguised sales pitch. According to conventional wisdom, the higher the response rate, the less these and other sorts of biases should be manifest in the obtained data.
In the extreme, a sample will be nearly perfectly representative of a population if a probability sampling method is used and if the response rate is 100%. But it is not necessarily true that representativeness increases monotonically with increasing response rate. Remarkably, recent research has shown that surveys with very low response rates can be more accurate than surveys with much higher response rates. For example, Visser et al (1996) compared the accuracy of self-administered mail surveys and telephone surveys forecasting the outcomes of Ohio statewide elections over a 15-year period. Although the mail surveys had response rates of about 20% and the telephone surveys had response rates of about 60%, the mail surveys predicted election outcomes much more accurately (average error = 1.6%) than did the telephone surveys (average error = 5.2%). The mail surveys also documented voter demographic characteristics more accurately. Therefore, having a low response rate does not necessarily mean that a survey suffers from a large amount of nonresponse error.
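
The accuracy metric in comparisons like Visser et al’s is average forecast error: the mean absolute difference between predicted and actual vote shares across a set of elections. A sketch with made-up numbers (the Ohio figures themselves are not reproduced here):

```python
def average_error(predicted, actual):
    """Mean absolute difference between predicted and actual
    percentages across a set of election forecasts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical forecasts for three races (percent for the winner):
mail_pred = [52.0, 48.5, 55.0]
phone_pred = [49.0, 53.0, 58.0]
actual = [53.0, 49.0, 54.0]

print(average_error(mail_pred, actual))   # ≈ 0.83 (more accurate)
print(average_error(phone_pred, actual))  # 4.0
```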

Greenwald et al (AG Greenwald, unpublished manuscript) suggested one possible explanation for this finding. They conducted telephone surveys of general public samples just before elections and later checked official records to determine whether each respondent voted. The more difficult it was to contact a person to be interviewed, the less likely he or she was to have voted. Therefore, the more researchers work at boosting the response rate, the less representative the sample becomes. Thus, telephone surveys would forecast election outcomes more accurately by accepting lower response rates, rather than aggressively pursuing high response rates.
Studies of phenomena other than voting have shown that achieving higher response rates or correcting for sample composition bias do not necessarily translate into more accurate results. In an extensive set of analyses, Brehm (1993) found that statistically correcting for demographic biases in sample composition had little impact on the substantive implications of correlational analyses. Furthermore, the substantive conclusions of a study have often remained unaltered by an improved response rate (e.g. Pew Research Center 1998, Traugott et al 1987). When substantive findings did change, no evidence allowed researchers to assess whether findings were more accurate with the higher response rate or the lower one (e.g. Traugott et al 1987). In light of Visser et al’s (1996) evidence, we should not presume the latter findings were less valid than the former.
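
The statistical corrections Brehm examined are, in their simplest cell-weighting form, ratios of known population shares to observed sample shares. A minimal sketch under hypothetical population figures (the cell names and numbers below are invented for illustration):

```python
def poststratification_weights(sample_counts, population_shares):
    """Cell weighting: each respondent in a demographic cell receives
    weight = population share of the cell / sample share of the cell."""
    n = sum(sample_counts.values())
    return {cell: population_shares[cell] / (count / n)
            for cell, count in sample_counts.items()}

# Hypothetical sample over-representing low-education respondents:
sample = {"low_edu": 400, "high_edu": 600}
population = {"low_edu": 0.5, "high_edu": 0.5}

weights = poststratification_weights(sample, population)
print(weights["low_edu"])   # 1.25 (up-weighted)
print(weights["high_edu"])  # ≈ 0.83 (down-weighted)
```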

Clearly, the prevailing wisdom that high response rates are necessary for
sample representativeness is being challenged. It is important to recognize the
inherent limitations of nonprobability sampling methods and to draw conclu-
sions about populations or differences between populations tentatively when
nonprobability sampling methods are used. But when probability sampling
methods are used, it is no longer sensible to presume that lower response rates
necessarily signal lower representativeness.


PRETESTING

Questionnaire pretesting identifies questions that respondents have difficulty
understanding or interpret differently than the researcher intended. Until re-
cently, conventional pretesting procedures were relatively simplistic. Inter-
viewers conducted a small number of interviews (usually 15–25), then dis-
cussed their experiences in a debriefing session (e.g. Bischoping 1989, Nelson
1985). They described problems they encountered (e.g. identifying questions
requiring further explanation or wording that was confusing or difficult to
read) and their impressions of the respondents’ experiences in answering the
questions. Researchers also looked for questions that many people declined to
answer, which might suggest the questions were badly written. Researchers
then modified the survey instrument to increase the likelihood that the mean-
ing of each item was clear and that the interviews proceeded smoothly.

Conventional pretesting clearly has limitations. What constitutes a “prob-
lem” in the survey interview is often defined rather loosely, so there is poten-
tial for considerable variance across interviewers in terms of what is reported
during debriefing sessions. Debriefings are relatively unstructured, which
might further contribute to variance in interviewers’ reports. And most im-
portant, researchers want to know about what went on in respondents’
minds when answering questions, and interviewers are not well positioned to
characterize such processes.

Recent years have seen a surge of interest in alternative pretesting methods,
one of which is behavior coding (Cannell et al 1981, Fowler & Cannell 1996),
in which an observer monitors pretest interviews (either live or taped) and
notes events that occur during interactions between interviewers and respon-
dents that constitute deviations from the script (e.g. the interviewer misreads
the questionnaire, or the respondent asks for more information or provides an
unclear or incomplete initial response). Questions that elicit frequent devia-
tions are presumed to require modification.
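
In practice, behavior coding reduces to tallying coded deviations per question and flagging questions whose tallies are high. A minimal sketch with hypothetical event codes and an arbitrary flagging threshold:

```python
from collections import Counter

# Hypothetical behavior codes logged during pretest interviews:
# (question_id, code), where codes might be "misread",
# "clarification requested", or "inadequate answer".
events = [
    ("Q1", "misread"), ("Q3", "clarification"), ("Q3", "inadequate"),
    ("Q3", "clarification"), ("Q7", "misread"),
]

deviations_per_question = Counter(q for q, _ in events)

# Questions whose deviation count crosses a threshold are flagged
# for revision; here Q3 stands out with three logged problems.
flagged = [q for q, n in deviations_per_question.items() if n >= 3]
print(flagged)  # ['Q3']
```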


Another new method is cognitive pretesting, which involves asking respon-
dents to “think aloud” while answering questions, verbalizing whatever comes
to mind as they formulate responses (e.g. Bickart & Felcher 1996, DeMaio &
Rothgeb 1996, Forsyth & Lessler 1991). This procedure is designed to assess
the cognitive processes by which respondents answer questions, thus provid-
ing insight into the way each item is comprehended and the strategies used to
devise answers. Respondent confusion and misunderstandings can readily be
identified in this way.

These three pretesting methods focus on different aspects of the survey data
collection process and differ in terms of the kinds of problems they detect, as
well as in the reliability with which they detect these problems. Presser & Blair
(1994) demonstrated that behavior coding is quite consistent in detecting ap-
parent respondent difficulties and interviewer problems. Conventional pre-
testing also detects both sorts of problems, but less reliably. In fact, the correla-
tion between the apparent problems diagnosed in independent conventional
pretesting trials of the same questionnaire can be remarkably low. Cognitive
interviews also tend to exhibit low reliability across trials and to detect respon-
dent difficulties almost exclusively. But low reliability might reflect the capac-
ity of a particular method to continue to reveal additional, equally valid prob-
lems across pretesting iterations, a point that future research must address.
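
The cross-trial reliability at issue can be quantified as the Pearson correlation between per-question problem counts from two independent pretests of the same questionnaire. A sketch with hypothetical counts:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length count vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical problem counts per question from two independent
# pretesting trials of the same questionnaire:
trial_1 = [0, 3, 1, 5, 2, 0, 4]
trial_2 = [1, 0, 2, 4, 0, 3, 1]

r = pearson(trial_1, trial_2)
# A low r means the two trials flagged different questions, i.e. the
# method detects problems unreliably across repetitions.
```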


INTERVIEWING

One prevailing principle of the survey method is that the same questionnaire
should be administered identically to all respondents (e.g. Fowler & Mangione
1990). If questions are worded or delivered differently to different people, then
researchers cannot be certain about whether differences between the answers
are due to real differences between the respondents or are due to the differen-
tial measurement techniques employed. Since the beginning of survey re-
search this century, interviewers have been expected to read questions exactly
as researchers wrote them, identically for all respondents. If respondents ex-
pressed uncertainty and asked for help, interviewers avoided interference by
saying something like “it means whatever it means to you.”

Some critics have charged that this approach compromises data quality instead of enhancing it (Briggs 1986, Mishler 1986, Suchman & Jordan 1990, 1992). In particular, they have argued that the meanings of many questions are inherently ambiguous and are negotiated in everyday conversation through back-and-forth exchanges between questioners and answerers. To prohibit such exchanges is to straitjacket them, preventing precisely what is needed to maximize response validity. Schober & Conrad (1997) recently reported the first convincing data on this point, demonstrating that when interviewers were free to clarify the meanings of questions and response choices, the validity of reports increased substantially.
This finding has important implications for technological innovations in questionnaire administration. Whereas survey questionnaires were traditionally printed on paper, most large-scale survey organizations have been using computer-assisted telephone interviewing (CATI) for the last decade. Interviewers read questions displayed on a computer screen; responses are entered immediately into the computer; and the computer determines the sequence of questions to be asked. This system can reduce some types of interviewer error and permits researchers to vary the specific questions each participant is asked on the basis of previous responses.
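
The computer-determined question sequencing just described is skip logic: each answer selects the next question. A toy sketch (the questionnaire content and routing rules are invented for illustration):

```python
# Each entry maps a question id to its text and a routing function
# that chooses the next question id from the respondent's answer.
QUESTIONS = {
    "Q1": ("Did you vote in the last election?",
           lambda ans: "Q2" if ans == "yes" else "Q3"),
    "Q2": ("Whom did you vote for?", lambda ans: "END"),
    "Q3": ("What kept you from voting?", lambda ans: "END"),
}

def interview(answers):
    """Walk the questionnaire, recording which questions are asked."""
    asked, qid = [], "Q1"
    while qid != "END":
        text, branch = QUESTIONS[qid]
        asked.append(qid)
        qid = branch(answers[qid])
    return asked

# A respondent who did not vote is routed past Q2 entirely:
print(interview({"Q1": "no", "Q3": "work"}))  # ['Q1', 'Q3']
```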
All this has taken another step forward recently: Interviewers conducting surveys in people’s homes are equipped with laptop computers (for computer-assisted personal interviewing, or CAPI), and the entire data collection process is regulated by computer programs. In audio computer-assisted self-administered interviewing (audio CASAI), a computer reads questions aloud to respondents who listen on headphones and type their answers on computer keyboards. Thus, computers have replaced interviewers. Although these innovations have clear advantages for improving the quality and efficiency of questionnaire administration, this last shift may be problematic in light of Schober & Conrad’s (1997) evidence that conversational interviewing can significantly improve data quality. Perhaps technological innovation has gone one step too far, because without a live interviewer, conversational questioning is impossible.


QUESTIONNAIRE DESIGN

Open versus Closed Questions

During the 1940s, a major dispute erupted between two survey research divi-
sions of the US Bureau of Intelligence, the Division of Polls and the Division
of Program Surveys. The former was firmly committed to asking closed-ended
questions, which required people to choose among a set of provided response
alternatives. The latter believed in the use of open-ended questions, which re-
spondents answered in their own words (see Converse 1987). Paul Lazarsfeld
mediated the dispute and concluded that the quality of data collected by each
method seemed equivalent, so the greater cost of administering open-ended
questions did not seem worthwhile (see Converse 1987). Over time, closed-
ended questions have become increasingly popular, whereas open-ended ques-
tions have been asked less frequently (Smith 1987).

Recent research has shown that there are distinct disadvantages to closed-ended questions, and open-ended questions are not as problematic as they seemed. For example, respondents tend to confine their answers to the choices offered, even if the researcher does not wish them to do so (Bishop et al 1988, Presser 1990). That is, people generally ignore the opportunity to volunteer a response and simply select among those listed, even if the best answer is not included. Therefore, a closed-ended question can only be used effectively if its answer choices are comprehensive, and this is difficult to assure.

Some people feared that open-ended questions would not work well for respondents who are not especially articulate, because they might have difficulty explaining their feelings. However, this seems not to be a problem (Geer 1988). Some people feared that respondents would be likely to answer open-ended questions by mentioning the most salient possible responses, not those that are truly most appropriate. But this, too, turns out not to be the case (Schuman et al 1986). Finally, a number of recently rediscovered studies found that the reliability and validity of open-ended questions exceeded that of closed-ended questions (e.g. Hurd 1932, Remmers et al 1923). Thus, open-ended questions seem to be more viable research tools than had seemed to be the case.

Labeling of Rating-Scale Points

Questionnaires have routinely offered rating scales with only the endpoints
labeled with words and the points in between either represented graphically or
labeled with numbers and not words. However, reliability and validity can be
significantly improved if all points on the scale are labeled with words, be-
cause they clarify the meanings of the scale points (Krosnick & Berent 1993,
Peters & McCormick 1966). Respondents report being more satisfied when
more rating-scale points are verbally labeled (e.g. Dickinson & Zellinger
1980), and validity is maximized when the verbal labels have meanings that
divide the continuum into approximately equal-sized perceived units (e.g.
Klockars & Yamagishi 1988). On some rating dimensions, respondents pre-
sume that a “normal” or “typical” person falls in the middle of the scale, and
some people are biased toward placing themselves near that point, regardless
of the labels used to define it (Schwarz et al 1985).

Another recent surprise is that the numbers used by researchers to label

rating-scale points can have unanticipated effects. Although such numbers are

usually selected arbitrarily (e.g. an 11-point scale is labeled from 0 to 10,

rather than from -5 to +5), respondents sometimes presume that these numbers

were selected to communicate intended meanings of the scale points (e.g. a

unipolar rating for the 0 to 10 scale and a bipolar rating for the -5 to +5 scale;

Schwarz et al 1991). Consequently, a change in the numbering scheme can

produce a systematic shift in responses. This suggests either that rating-scale

points should be labeled only with words or that numbers should reinforce the

meanings of the words, rather than communicate conflicting meanings.


Role of Conversational Conventions

Survey researchers have come to recognize that respondents infer the meanings of questions and response choices partly from norms and expectations concerning how everyday conversations are normally conducted (Schwarz 1996). Speakers conform to a set of conventions regarding what to say, how to say it, and what not to say; these conventions make conversation efficient by allowing speakers to convey unspoken ideas underlying their utterances (e.g. Clark 1996, Grice 1975). Furthermore, listeners presume that speakers are conforming to these norms when interpreting utterances. Respondents bring these same conventions to bear when they interpret survey questions, as well as when they formulate answers (see Schwarz 1996).
Krosnick et al (1990) showed that the order in which information is provided in the stem of a question is sometimes viewed as providing information about the importance or value the researcher attaches to each piece of information. Specifically, respondents presume that researchers provide less important “background” information first and then present more significant “foreground” information later. Consequently, respondents place more weight on more recently presented information because they wish to conform to the researcher’s beliefs. From these studies and various others (see Schwarz 1996), we now know that we must guard against the possibility of unwittingly communicating information to respondents by violating conversational conventions, thus biasing answers.

Social Desirability Bias

One well-known phenomenon in survey research is overreporting of admirable attitudes and behaviors and underreporting of those that are not socially respected. For example, the percentage of survey respondents who say they voted in the last election is usually greater than the percentage of the population that actually voted (Clausen 1968, Granberg & Holmberg 1991, Traugott & Katosh 1979). Furthermore, claims by significant numbers of people that they voted are not corroborated by official records. These patterns have been interpreted as evidence that respondents intentionally reported voting when they did not, because voting is more admirable than not doing so.
In fact, these two empirical patterns are not fully attributable to intentional misrepresentation. The first of the discrepancies is partly due to inappropriate calculations of population turnout rates, and the second discrepancy is partly caused by errors in assessments of the official records (Clausen 1968, Presser et al 1990). The first discrepancy also occurs partly because people who refuse to be interviewed for surveys are disproportionately unlikely to vote (Greenwald et al, unpublished manuscript) and pre-election interviews increase interest in politics and elicit commitments to vote, which become self-fulfilling prophecies (Greenwald et al 1987, Yalch 1976). But even after controlling for all these factors, some people still claim to have voted when they did not.
Surprisingly, recent research suggests that the widely believed explanation for this fact may be wrong. Attempts to make people comfortable admitting that they did not vote have been unsuccessful in reducing overreporting (e.g. Abelson et al 1992, Presser 1990). People who typically overreport also have the characteristics of habitual voters and indeed have histories of voting in the past, even though not in the most recent election (Abelson et al 1992, Sigelman 1982, Silver et al 1986). And the accuracy of turnout reports decreases as time passes between an election and a postelection interview, suggesting that the inaccuracy occurs because memory traces of the behavior or lack thereof fade (Abelson et al 1992).
Most recently, Belli et al (unpublished manuscript) significantly reduced overreporting by explicitly alerting respondents to potential memory confusion and encouraging them to think carefully to avoid such confusion. These instructions had increasingly beneficial effects on report accuracy as more time passed between election day and an interview. This suggests that what researchers have assumed is intentional misrepresentation by respondents may be at least partly attributable instead to accidental mistakes in recall. This encourages us to pause before presuming that measurement error is due to intentional misrepresentation, even when it is easy to imagine why respondents might intentionally lie. More generally, social desirability bias in questionnaire measurement may be less prevalent than has been assumed.

Optimizing versus Satisficing

Another area of innovation involves new insights into the cognitive processes
by which respondents generate answers. These insights have been publicized
in a series of recent publications (e.g. Krosnick & Fabrigar 1998, Sudman et al
1996, Tourangeau et al 1998), and some of them have provided parsimonious
explanations for long-standing puzzles in the questionnaire design literature.
The next section reviews developments in one segment of this literature, fo-
cusing on the distinction between optimizing and satisficing.

OPTIMIZING There is wide agreement about the cognitive processes in-
volved when respondents answer questions optimally (e.g. Cannell et al 1981,
Schwarz & Strack 1985, Tourangeau & Rasinski 1988). First, respondents
must interpret the question and deduce its intent. Next, they must search their
memories for relevant information and then integrate that information into a
single judgment (if more than one consideration is recalled). Finally, they must
translate the judgment into a response by selecting one of the alternatives offered.


Each of these four steps can be quite complex, involving a great deal of cognitive work (e.g. Krosnick & Fabrigar 1998). For example, question interpretation can be decomposed into four cognitive steps, guided by a complex and extensive set of rules (e.g. Clark & Clark 1977). First, respondents bring the sounds of the words into their “working memories.” Second, they break the words down into groups, each one representing a single concept, …

Order your essay today and save 30% with the discount code GURUH