Authentic Assessment Toolbox
created by Jon Mueller



Constructing Good Items

Why Focus on Multiple-choice Items?

The focus of this assessment guide is on the construction of tests using selected-response items. (See Tasks to read about the differences between selected-response tests and other types of assessments.) One type of selected-response item, the True-False question, carries a greater chance of guessing correctly (50%) and, thus, typically does not discriminate between those who know the material and those who do not as effectively as multiple-choice items do. For that reason, the construction of T/F items will not be addressed in this chapter. Similarly, I will not address fill-in-the-blank items because they are less common and because it is extremely difficult to construct them so that only one possible answer could complete the blank. Instead, the following section will primarily address the construction of the most common selected-response item, the multiple-choice question.

Terminology for Multiple-choice Items

Before discussing the construction of such items, let's review the terminology commonly used to describe the parts of multiple-choice questions. The diagram below labels the specific components of a multiple-choice item.

 

Stem: A question or statement followed by a number of choices or alternatives that answer or complete the question or statement

Alternatives: All the possible choices or responses to the stem

Distractors (foils): Incorrect alternatives

Correct answer: The correct alternative!

 

Guidelines for Constructing Good Items: Eliminate Rival Explanations

In the previous section on what the test should assess, I identified the first step in test construction: reviewing the standards to be addressed. The items on the test must effectively capture a representative sample of the concepts and skills laid out in the standards to generate valid inferences from student performance. So, make sure the items that you construct align with your standards.

Validity will also be affected by how closely the selection of a correct answer on a test reflects mastery of the material contained in the standards. If a student selects the correct answer to a multiple-choice question, you want to be able to conclude with some confidence that the student understood the concept. However, there are a myriad of other reasons (rival explanations) the student might choose the correct alternative. For example, she might have closed her eyes and picked an answer at random. She might have been able to rule out the distractors because they were implausible or because other clues pointed her to the right answer without requiring her to understand the concept. In these cases the student selected the correct answer without understanding the concept. You want to be able to eliminate these rival explanations so that you can discriminate students who understand the concept from those who do not understand it.

Obviously, you cannot eliminate the first rival explanation mentioned - guessing. However, most other rival explanations can be eliminated or reduced with careful construction of the test items. What follows are some strategies to eliminate as many rival explanations as possible. The guidelines can be understood as either reducing cognitive load or reducing the chance of guessing correctly.

 

Reducing Cognitive Load

Cognitive load theory (and other related theories) recommends avoiding elements of instruction or assessment that will overload students' capacity to consciously process the immediate task on which they are working. A test is a task that requires considerable conscious attention. So, it is important to remove any elements of a test item that might distract or unnecessarily increase the cognitive load a student encounters. Cognitive load theory (e.g., Sweller, 1988; 1994) emphasizes the importance of the processes and limitations of working memory, the level of memory that is consciously processing information involved in immediate tasks. A considerable amount of research has found that much of our information processing occurs outside of our conscious awareness. That seems necessary because the conscious resources we are able to employ to attend to or make sense of information are quite limited. Thus, it does not take much to distract or interfere with our ability to consciously process information and, thus, overload our working memory.


Below are some strategies to reduce the cognitive load of your test items.

1. Keep the stem simple, only including relevant information.

Example:

Change:

[Stem]: The purchase of the Louisiana Territory, completed in 1803 and considered one of Thomas Jefferson's greatest accomplishments as president, primarily grew out of our need for

a. the port of New Orleans*
b. helping Haitians against Napoleon
c. the friendship of Great Britain
d. control over the Indians

To:

[Stem]: The purchase of the Louisiana Territory primarily grew out of our need for

a. the port of New Orleans*
b. helping Haitians against Napoleon
c. the friendship of Great Britain
d. control over the Indians


*an asterisk indicates the correct answer.


Any additional information that is irrelevant to the question, such as the phrase "completed in 1803…," can distract or confuse the student, thus providing an alternative explanation for why the item was missed. Keep it simple.

 


2. Keep the alternatives simple by adding any common words to the stem rather than including them in each alternative.

Example:

Change:

When your body adapts to your exercise load,

a. you should decrease the load slightly.
b. you should increase the load slightly.*
c. you should change the kind of exercise you are doing.
d. you should stop exercising.

To:

When your body adapts to your exercise load, you should

a. decrease the load slightly.
b. increase the load slightly.*
c. change the kind of exercise you are doing.
d. stop exercising.

 


Instead of repeating the phrase "you should" at the beginning of each alternative, add that phrase to the end of the stem. The less reading the student has to do, the less chance there is for confusion.

 


3. Put alternatives in a logical order.

Example:

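Change:

According to the 1991 census, approximately what percent of the United States population is of Spanish or Hispanic descent?

a. 25%
b. 39%
c. 2%
d. 9%*

To:

According to the 1991 census, approximately what percent of the United States population is of Spanish or Hispanic descent?

a. 2%
b. 9%*
c. 25%
d. 39%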


The more mental effort (or cognitive load) that students have to use to make sense of an item the more likely a comprehension error can occur that would provide another rival explanation. By placing the alternatives in a logical order the reader can focus on the content of the question rather than having to reorder the items mentally. Although such reordering might require a limited amount of cognitive load, such load is finite, and it does not take much additional processing to reach the point where concentration is negatively impacted. Thus, this guideline is consistently recommended (Haladyna, Downing, & Rodriguez, 2002).

 


4. Limit the use of negatives (e.g., NOT, EXCEPT).

Example:

Change:

Which of the following is NOT true of the Constitution?

a. The Constitution sets limits on how a government can operate
b. The Constitution is open to different interpretations
c. The Constitution has not been amended in 50 years*

To:

Which of the following is true of the Constitution?

a. The Constitution has not been amended in 50 years
b. The Constitution sets limits on how a government can operate*
c. The Constitution permits only one possible interpretation

 


Once again, trying to determine which answer is NOT consistent with the stem requires more cognitive load from the students and promotes the likelihood of more confusion. If that additional load or confusion is unnecessary it should be avoided (Haladyna, Downing, & Rodriguez, 2002).

If you are going to use NOT or EXCEPT, the word should be highlighted in some manner so that students recognize a negative is being used.

 

5. Include the same number of alternatives for each item.

The more consistent and predictable a test is, the less cognitive load the student needs to process it. Consequently, the student can focus on the questions themselves without distractions. Additionally, if students must transpose their answers onto a score sheet of some kind, there is less likelihood of error in the transposition if the number of alternatives for each item is always the same.

 


Reducing the Chance of Guessing Correctly

It is easy to inadvertently include clues in your test items that point to the correct answer, help rule out incorrect alternatives or narrow the choices. Any such clue would decrease your ability to distinguish students who know the material from those who do not, thus, providing rival explanations.

Below are some common clues students use to increase their chance of guessing and some advice on how to avoid such clues. (I bet you remember using some of these yourself!)


6. Keep the grammar consistent between stem and alternatives.

Example:

Change:

What is the dietary substance that is often associated with heart disease when found in high levels in the blood?

a. glucose
b. cholesterol*
c. beta carotene
d. proteins

To:

What is the dietary substance that is often associated with heart disease when found in high levels in the blood?

a. glucose
b. cholesterol*
c. beta carotene
d. protein


The distractor "proteins" is inconsistent with the stem; the stem is asking for a singular substance while "proteins" is plural. It can be easy for the test writer to miss such inconsistencies. As a result, students may more easily guess the correct answer without understanding the concept - a rival explanation.

 

7. Avoid including an alternative that is significantly longer than the rest.

Example:

Change:

What is the best reason for listing information sources in your research assignment?

a. It is required
b. It is unfair and illegal to use someone's ideas without giving proper credit*
c. To get a better grade
d. To make it longer

To:

What is the best reason for listing information sources in your research assignment?

a. It is required by most teachers
b. It is unfair and illegal to use someone's ideas without giving proper credit*
c. To get a better grade on the project
d. So the reader knows from where you got your information

Students often recognize that a significantly longer, more complex alternative is commonly the correct answer. Even if the longer alternative is not the correct answer, some students who might otherwise answer the question correctly could be misled by this common clue and select the wrong answer. So, to be safe and avoid a rival explanation, keep the alternatives similar in length.
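
A draft test can be screened for this clue mechanically. The sketch below is only an illustration: the function name `flags_length_clue`, the list-of-strings layout, and the 2.0 ratio are assumptions made for this example, not research-based values. It flags an item when the correct alternative is more than twice as long as the average distractor.

```python
def flags_length_clue(alternatives, correct_index, ratio=2.0):
    """Return True if the correct alternative is much longer than the distractors.

    `ratio` is an arbitrary illustrative threshold, not a research-based cutoff.
    """
    correct_len = len(alternatives[correct_index])
    distractor_lens = [
        len(text) for i, text in enumerate(alternatives) if i != correct_index
    ]
    mean_distractor_len = sum(distractor_lens) / len(distractor_lens)
    return correct_len > ratio * mean_distractor_len


# The two versions of the example item above; index 1 is alternative "b".
before = [
    "It is required",
    "It is unfair and illegal to use someone's ideas without giving proper credit",
    "To get a better grade",
    "To make it longer",
]
after = [
    "It is required by most teachers",
    "It is unfair and illegal to use someone's ideas without giving proper credit",
    "To get a better grade on the project",
    "So the reader knows from where you got your information",
]

print(flags_length_clue(before, correct_index=1))  # True: "b" stands out
print(flags_length_clue(after, correct_index=1))   # False: lengths are closer
```

A check like this only catches the length clue; it says nothing about whether the distractors are plausible.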


8. Make all distractors plausible.

Example:

Change:

Lincoln was assassinated by

a. Lee Harvey Oswald
b. John Wilkes Booth*
c. Oswald Garrison Villard
d. Ozzy Osbourne

To:

Lincoln was assassinated by

a. Lee Harvey Oswald
b. John Wilkes Booth*
c. Oswald Garrison Villard
d. Charles Guiteau


If students can easily discount one or more distractors (obviously Ozzy Osbourne does not belong), then the chance of guessing correctly is increased, reducing the discriminability of that item. There is some limited evidence that including humor on a test can have certain benefits, such as reducing the anxiety of the test-takers (Berk, 2000; McMorris, Boothroyd, & Pietrangelo, 1997). But humor can be included in a manner that does not reduce the discriminability of the item. For example, the nature of the question in the stem may be humorous but still address the material in a meaningful way.

Another example of implausible distractors:

Change:

In a study of the effect of diet on risk of diabetes, the researcher can manipulate a number of variables including the amount of food, carbohydrates, proteins or fats consumed. During the experiment the amount of food, protein and fat subjects consumed remained the same. Only the amount of carbohydrates consumed changed. What was the independent variable in this study?

a. amount of food consumed
b. amount of carbohydrates consumed*
c. amount of protein consumed
d. amount of fat consumed

To:

In a study of the effect of diet on risk of diabetes, the researcher measured how likely the subjects were to get diabetes and how severe their symptoms were if they developed the disease. To prevent amount of exercise from influencing the results, the researcher held it constant in the two groups he was studying. What was the independent variable in this study?

a. likelihood of developing diabetes
b. severity of symptoms of diabetes
c. diet*
d. amount of exercise

 


In the first example, amount of food, protein and fat are treated identically in this study, so it is not plausible that one of them is correct while the others are incorrect. The only plausible answer is the correct one -- amount of carbohydrates consumed -- because it is the only alternative that differs in any significant way.


Some other suggestions (from Worthen, White, Fan & Sudweeks, 1999, p. 221) for creating good distractors include:

  • Base distractors on the most frequent errors made by students in homework assignments or class discussions related to that concept.
  • Use words in the distractors that are associated with words in the stem (for example, explorer-exploration).
  • Use concepts from the instructional material that have similar vocabulary or were used in the same context as the correct answer.
  • Use distractors that are similar in content or form to the correct answer (for example, if the correct answer is the name of a place, have all distractors be places instead of using names of people and other facts).


9. Avoid giving too many clues in your alternatives.

Example:

Change:

"Yellow Journalism" is associated with what two publishers?

a. Adolph Ochs and Martha Graham
b. William Randolph Hearst and Joseph Pulitzer*
c. Col. Robert McCormick and Marshall Field III
d. Michael Royko and Walter Cronkite

To:

"Yellow Journalism" is associated with what two publishers?

a. Adolph Ochs and Martha Graham
b. William Randolph Hearst and Joseph Pulitzer*
c. Joseph Pulitzer and Adolph Ochs
d. Martha Graham and William Randolph Hearst


Since both of the publishers in choice "b" are associated with yellow journalism and none of the other people mentioned is, the student only has to know of one such publisher to identify that "b" is the correct answer. That makes the item easier than if just one name is listed for each alternative. To make the question more challenging, at least some of the distractors could mention one of the correct publishers but not the other as in the second example (e.g., in distractor "c" Pulitzer is correct but Ochs is not). As a result, the student must recognize both publishers associated with yellow journalism to be certain of the correct answer.

 


10. Do not test students on material that is already well-learned prior to your instruction.

Example:

Excessive salt intake is linked to

a. cancer
b. diabetes
c. food allergies
d. high blood pressure*


The relationship between excessive salt intake and high blood pressure has likely received enough attention in the media and in earlier curricula that most high school students are already familiar with it. Thus, your students could answer this question without learning anything in your class.

Of course, it is not usually obvious what knowledge students possess prior to your instruction. So, it may be helpful in certain courses to give a brief pre-test at the beginning of the course to determine the level of the students' background knowledge. That information will assist you in designing your instruction and your assessments.


11. Limit the use of "all of the above" or "none of the above."

It is sometimes easier for students to narrow the number of possible alternatives on such questions without fully understanding the concepts tested. For example, when all of the above is an alternative, all a student needs to do is recognize that one of the other alternatives is not true to also be able to rule out "all of the above." Thus, an item with four possible alternatives has now been reduced to just two, increasing the chances of guessing correctly.

Similarly, if a student recognizes that two of the four alternatives are true, the student knows that the answer is all of the above without having to know whether the remaining alternative is true or not. Such guessing requires some knowledge of the material, but not as extensive understanding as if they had to consider all four of the alternatives.

Additionally, all of the above and none of the above have been misused as alternatives on some tests because students have learned that all of the above or none of the above is almost always the right answer when it is used on those tests. So, if you use all of the above or none of the above, do not always make it the right or wrong answer. Generally, research has found more problems with the use of "all of the above" than with "none of the above," but the common recommendation for both is to limit their use (Haladyna, Downing, & Rodriguez, 2002).
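
The arithmetic behind the first paragraph of this guideline can be sketched as follows. The sketch assumes a four-alternative item with exactly one correct answer and a student who guesses uniformly among whatever alternatives remain; `chance_of_guessing` is a hypothetical helper name used only for illustration.

```python
from fractions import Fraction


def chance_of_guessing(total_alternatives, eliminated):
    """Chance of picking the one correct answer at random from what remains."""
    remaining = total_alternatives - eliminated
    if remaining < 1:
        raise ValueError("at least one alternative must remain")
    return Fraction(1, remaining)


# Four alternatives: a, b, c, and "all of the above".
blind_guess = chance_of_guessing(4, eliminated=0)

# The student knows only that "a" is false, which also rules out
# "all of the above", so two alternatives are eliminated at once.
informed_guess = chance_of_guessing(4, eliminated=2)

print(blind_guess, informed_guess)  # 1/4 1/2
```

Knowing a single fact doubles the chance of a correct guess here, which is why such items discriminate less well than items where each alternative must be judged on its own.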


12. Limit the use of always, never or similar terms.

Even if students have not yet learned that the world is black and white, they have learned that alternatives on tests that include terms such as always or never are almost always a wrong answer. Thus, students are able to eliminate an alternative without understanding the material.


13. If item alternatives include multiple terms or series of concepts, avoid over-representing or under-representing certain terms or concepts.

Example:

Change:

Which of the following groupings contains only days of the week?

a. mercredi, jeudi, chapeau, juillet
b. manger, mardi, mercredi, homme
c. dimanche, mercredi, jeudi, lundi*
d. lundi, samedi, maison, janvier

To:

Which of the following groupings contains only days of the week?

a. mercredi, jeudi, chapeau, juillet
b. manger, mardi, juillet, homme
c. dimanche, mercredi, jeudi, lundi*
d. lundi, manger, dimanche, chapeau

Because mercredi appears in three of the four alternatives in the first example and terms such as maison only appear in one of the alternatives, students will often correctly conclude that mercredi should be included in the correct answer. Thus, students might eliminate d. as an alternative and increase the likelihood of guessing correctly.

The solution is to evenly distribute the different terms as much as possible, as in the second example above.
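
Counting how often each term appears across the alternatives makes any imbalance easy to see. The following is a minimal sketch using the two versions of the item above; the function name `term_counts` and the list-of-lists layout are assumptions made for illustration.

```python
from collections import Counter


def term_counts(alternatives):
    """Count in how many alternatives each term appears."""
    counts = Counter()
    for terms in alternatives:
        counts.update(set(terms))  # count a term once per alternative
    return counts


first_version = [
    ["mercredi", "jeudi", "chapeau", "juillet"],
    ["manger", "mardi", "mercredi", "homme"],
    ["dimanche", "mercredi", "jeudi", "lundi"],
    ["lundi", "samedi", "maison", "janvier"],
]
second_version = [
    ["mercredi", "jeudi", "chapeau", "juillet"],
    ["manger", "mardi", "juillet", "homme"],
    ["dimanche", "mercredi", "jeudi", "lundi"],
    ["lundi", "manger", "dimanche", "chapeau"],
]

# Show the most frequent terms in each version.
print(term_counts(first_version).most_common(3))
print(term_counts(second_version).most_common(3))
```

In the first version mercredi appears in three alternatives while most terms appear in only one; in the second version no term appears in more than two.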


14. Avoid direct quotations from a text in an item.

Students can certainly memorize phrases or sentences without comprehending them. So, if you use wording in an item that too closely resembles the wording in the text, it is possible that students can answer a question correctly without understanding it. More commonly, students may recognize certain language or terms that they saw in a text and select the alternative that includes that language without comprehending the concepts. The obvious solution is to paraphrase the main ideas you are testing.


15. Avoid alternatives that are opposites if one of the two must be true.

Example:

Change:

When your body adapts to your exercise load, you should

a. decrease the load slightly
b. increase the load slightly*
c. change the kind of exercise you are doing
d. stop exercising

To:

When your body adapts to your exercise load, you should

a. decrease the load slightly
b. increase the load slightly*
c. decrease the load significantly
d. increase the load significantly


When students see alternatives that are opposites of each other ("a" and "b" above), they often correctly assume that one of the two is true. So, students often eliminate the other choices ("c" and "d"), increasing their chances of guessing correctly. That does not mean you have to avoid opposites as possible alternatives. Rather, avoid opposites for which one of the two must be true. To avoid the appearance that one of the two must be true, you can use two sets of opposites as in the second example above.


16. Include three or four alternatives for multiple-choice items.

Obviously, if you only have two alternatives then the chance for guessing increases significantly as there will be a 50% chance of getting the item correct just by guessing. If you include five or more alternatives the item becomes increasingly confusing or requires too much processing or cognitive load. Additionally, as the number of distractors increases, the likelihood of including a bad distractor significantly increases. Thus, research finds that providing three or four alternatives leads to the greatest ability to distinguish between those test-takers who understand the material and those who do not (Haladyna, Downing, & Rodriguez, 2002; Taylor, 2005).
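
The guessing side of this guideline can be made concrete with a small calculation. The sketch below assumes a hypothetical 10-item quiz with a passing mark of 7 correct, and a student who guesses every item independently and at random; `prob_at_least` is an illustrative helper name. It does not model the cognitive load side of the guideline.

```python
from math import comb


def prob_at_least(n_items, min_correct, n_alternatives):
    """Probability of getting at least `min_correct` of `n_items` right by pure guessing."""
    p = 1 / n_alternatives  # chance of guessing a single item correctly
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Compare the chance of passing by guessing alone as alternatives are added.
for n_alternatives in (2, 3, 4, 5):
    chance = prob_at_least(n_items=10, min_correct=7, n_alternatives=n_alternatives)
    print(f"{n_alternatives} alternatives: {chance:.4f}")
```

Under these assumptions the chance of passing by guessing drops sharply from two alternatives to three or four, and much less from four to five, which is consistent with the recommendation to stop at three or four.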


17. Distribute correct answers fairly evenly among the "letters."

In other words, if students find a pattern in which answers are the correct ones (e.g., "c" is usually the right answer or "d" is never the right answer) then they can increase their chances of correctly guessing, providing another rival explanation.
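
A quick tally of the answer key will reveal such a pattern before students find it. This is a minimal sketch; the 12-item answer key is invented for illustration, and `answer_key_distribution` is a hypothetical helper name.

```python
from collections import Counter


def answer_key_distribution(answer_key, letters="abcd"):
    """Count how often each letter is the correct answer."""
    counts = Counter(answer_key)
    return {letter: counts[letter] for letter in letters}


# Hypothetical answer key for a 12-item test.
answer_key = list("cbcacdcbcacc")
distribution = answer_key_distribution(answer_key)
print(distribution)

# With 12 items and 4 letters, an even spread would be about 3 per letter.
expected = len(answer_key) / len(distribution)
overused = [letter for letter, n in distribution.items() if n > 1.5 * expected]
print(overused)
```

The 1.5 multiplier used to flag an overused letter is an arbitrary choice for this example; the point is simply to look at the tally before finalizing the test.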


18. Avoid "giveaway" items.

If you include items on the test that are intentionally so easy that virtually everyone will answer them correctly, then you have reduced the discriminability of the test. Was the purpose to be amusing? Find another way to do so. Yes, one giveaway question on a 50-item test will not make that much difference, but when you consider all the different little things mentioned above that could affect the test's discriminability it is best to avoid all of them. Moreover, you have missed one more opportunity to assess learning.

 

19. Avoid providing clues for one item in the wording of another item on the test.

Example:

One item on a test might be

The electronic online catalog includes

a. books, videos, reference materials*
b. magazine articles and compact discs
c. newspaper clippings
d. only books

A later question on the same test asks

Using the online catalog, which search term would you use to find a book by a specific writer?

a. title keyword
b. subject
c. author*
d. call number


After students see that online catalogs include books in the latter question, they can return to the first question and rule out any alternatives that do not include books. It is relatively easy to miss such clues when constructing a test since we construct many tests item by item. Thus, it is imperative to review the entire test to check for clues.


20. WORTH REPEATING: Make sure your items actually measure what they are intended to measure.

 

Summary list of guidelines

To summarize:

Reducing cognitive load

1. Keep the stem simple, only including relevant information.

2. Keep the alternatives simple by adding any common words to the stem rather than including them in each alternative.

3. Put alternatives in a logical order.

4. Limit the use of negatives (e.g., NOT, EXCEPT).

5. Include the same number of alternatives for each item.

 

Reducing the chance of guessing correctly

6. Keep the grammar consistent between stem and alternatives.

7. Avoid including an alternative that is significantly longer than the rest.

8. Make all distractors plausible.

9. Avoid giving too many clues in your alternatives.

10. Do not test students on material that is already well-learned prior to your instruction.

11. Limit the use of "all of the above" or "none of the above."

12. Limit the use of always, never or similar terms.

13. If item alternatives include multiple terms or series of concepts, avoid over-representing or under-representing certain terms or concepts.

14. Avoid direct quotations from a text in an item.

15. Avoid alternatives that are opposites if one of the two must be true.

16. Include three or four alternatives for multiple-choice items.

17. Distribute correct answers fairly evenly among the "letters."

18. Avoid "giveaway" items.

19. Avoid providing clues for one item in the wording of another item on the test.

20. WORTH REPEATING: Make sure your items actually measure what they are intended to measure.

 

Note: Some of the above examples are courtesy of Lockport Township High School, Lockport, Illinois.

 

 

 


 

Copyright 2014, Jon Mueller. Professor of Psychology, North Central College, Naperville, IL. Comments, questions or suggestions about this website should be sent to the author, Jon Mueller, at jfmueller@noctrl.edu.