Confirmation Bias

Evidence 001 — The Wason Selection Task (1966)

Peter Wason Showed Us What We Actually Do

In 1966, Peter Wason introduced an experiment that remains one of the most replicated and discussed in cognitive psychology. He presented participants with four cards. Each card had a number on one side and a letter on the other. In the version used here, the visible faces show E, K, 4, and 7. The rule: "If a card has a vowel on one side, it has an even number on the other side." Which cards must you turn over to test whether this rule is true?

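The answer almost everyone gives is wrong. People choose the vowel and the even number, the pair that matches the rule as stated. The vowel is a sound choice, but the even number cannot falsify anything, because the rule says nothing about what sits behind an even number. They skip the odd number, which is the card that could falsify the rule if a vowel is on its back. No number of confirming cases can prove a general rule; the most a test can show is that the rule has survived an attempt to falsify it. Every time someone tests a rule by looking only for confirming evidence, they are repeating the mistake Wason's participants made.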

This is not a failure of intelligence. The error appears reliably among university students and other highly educated participants. It is a feature of human cognition: the default mode is confirmation, not falsification. Knowing this doesn't automatically correct it. But it changes what kind of evidence you go looking for.

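Wason, Peter C. "Reasoning." In New Horizons in Psychology, edited by Brian M. Foss. Penguin, 1966. The selection task first appeared here. Wason's earlier paper, "On the Failure to Eliminate Hypotheses in a Conceptual Task," Quarterly Journal of Experimental Psychology 12, no. 3 (1960): 129–140, reported a different experiment, the 2-4-6 rule-discovery task, which showed the same preference for confirming tests. Wason later called the selection task "the most revealing experiment I ever conducted." Fewer than 10% of participants gave the correct answer (E and 7) in the original study.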

Evidence 002 — System 1 and the Confirmation Default

Kahneman: We Don't Search for Disconfirming Evidence

Daniel Kahneman's work on System 1 and System 2 thinking explains the mechanism underneath Wason's finding. System 1 — the fast, automatic, associative cognition that runs most of our mental activity — generates answers to questions we haven't actually asked. Before System 2 (slow, deliberate, analytical) can examine a belief, System 1 has already assessed it as true or false based on how easily it fits with prior beliefs.

Nickerson's 1998 review of the confirmation bias literature synthesized hundreds of studies and concluded that it is "perhaps the most widely accepted notion in the psychology of reasoning." The bias shows up not just in how we seek evidence but in how we evaluate it: studies with mixed evidence are rated as supporting our prior position, not as challenging it.

The Lord, Ross and Lepper (1979) capital punishment study is particularly telling. Participants who held strong prior views on capital punishment, some in favor and some opposed, were shown the same two studies, one supporting their view and one opposing it. Each group rated the study supporting its own view as methodologically superior and the opposing study as flawed. The same studies. Opposite evaluations. Prior belief was the only variable.

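Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011. Nickerson, Raymond S. "Confirmation Bias: A Ubiquitous Phenomenon in Many Guises." Review of General Psychology 2, no. 2 (1998): 175–220. Lord, Charles G., Lee Ross, and Mark R. Lepper. "Biased Assimilation and Attitude Polarization: The Effects of Prior Theories on Subsequently Considered Evidence." Journal of Personality and Social Psychology 37, no. 11 (1979): 2098–2109.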

Evidence 003 — Evaluation Surveys as Circular Proof

The Kirkpatrick Level 1 Problem

Most L&D programs are evaluated primarily through post-session satisfaction surveys — Kirkpatrick Level 1 data. Participants are asked whether they found the session valuable, whether the facilitator was knowledgeable, whether they would recommend it. These surveys confirm what the program designer already believes about good training: that engaging, well-delivered content is valuable.

What they don't measure is whether anything changed. They don't test behavior, performance, business outcomes, or the specific claim the program was designed to make. And they don't gather disconfirming evidence: they don't ask "What about this program was useless?" or "What did you not apply?" in a way that would actually surface honest answers.

This is Wason's experiment running continuously across the L&D industry. We design surveys to find the vowel and the even number — the confirming evidence. We don't design them to find the odd number, the thing that would tell us the rule doesn't hold. And then we cite the survey scores as evidence that the program works.

Kirkpatrick, Donald, and James Kirkpatrick. Evaluating Training Programs: The Four Levels. 3rd ed. Berrett-Koehler, 2006. The Kirkpatrick authors themselves acknowledge Level 1 data is the least informative. The industry's over-reliance on it is a choice, not a constraint. See also Brinkerhoff, Robert O. The Success Case Method. Berrett-Koehler, 2003.

Evidence 004 — Gender & Generational Difference

Whose Experience Counts as "The Data"

Confirmation bias in L&D has a dimension that rarely gets named: whose experience is treated as the default data. When a program is designed, tested, and validated with a homogeneous group — and it almost always is — the confirming evidence comes from participants whose experience matches the designer's assumptions. The disconfirming evidence, the experience of participants whose learning style, communication style, or cultural context differs, tends to be filed as "outlier" or "special case."

Gender and generational difference operate as confirmation bias vectors. A facilitator with a particular theory of how people learn — that direct challenge is motivating, that emotional expression is secondary to cognitive clarity, that experience is the primary currency of credibility — will design programs that confirm that theory. Participants whose experience contradicts it will be under-served and may not even name what's happening. The survey scores will still be fine.

The brave spaces dimension here: it requires courage to actively seek the disconfirming evidence — to ask the participant who seemed disengaged what they actually experienced, to track outcomes for different demographic groups rather than only in aggregate, to ask whose experience the evaluation instrument was designed to capture.

Heath, Chip, and Dan Heath. Made to Stick. Random House, 2007, pp. 19–23. See also Argyris, Chris. "Double Loop Learning in Organizations." Harvard Business Review 55, no. 5 (1977): 115–125. The "undiscussables" in organizational culture are frequently the disconfirming evidence that everyone has noticed but nobody asks about.

Evidence 005 — The Practitioner's Blind Spots

How Training Design Confirms Our Theory of the Learner

Every training design begins with a theory of who the learner is. Usually this theory is implicit and unexamined: the learner is motivated to change, the learner's manager will support application, the learner's environment will not actively counteract the training, the learner processes information in roughly the way the designer processes information.

These assumptions are tested by the outcomes. But if the evaluation instruments are designed to confirm the design (see Evidence 003), the feedback loop is closed before it can generate disconfirming data. The designer continues to hold their theory of the learner. Future designs encode the same assumptions. The outcomes continue to disappoint. The explanation continues to locate the problem in the learner's motivation rather than the designer's assumptions.

The Feynman test applies here too: can you state your theory of the learner clearly enough that it could be falsified? If your theory is "adults learn best when they're challenged," what evidence would tell you this was wrong? If no evidence would tell you it was wrong, it's not a theory — it's a confirmation trap.

Argyris, Chris, and Donald Schön. Organizational Learning II. Addison-Wesley, 1996. The "ladder of inference" — Argyris's model for how we move from raw data to action through increasingly selective steps — is a map of confirmation bias in professional reasoning.

Evidence 006 — Falsification as Practice

Popper: A Theory That Can't Be Falsified Isn't a Theory

Karl Popper's criterion of falsifiability, first set out in Logik der Forschung (1934) and restated in Conjectures and Refutations (1963), is not just a philosophy of science. It is a practical tool for any practitioner who wants to know whether what they believe is actually true. A claim is falsifiable if you can specify in advance what evidence would show it to be false. If no such evidence is possible, the claim is not scientific; it is an article of faith.

The application to L&D is direct: before designing a program, state the theory explicitly. "This program will change behavior X in context Y within Z months." Then design evaluation instruments that could catch failure, not just success. Require disconfirming evidence to be part of the evaluation design from the beginning, before the data is collected.
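One way to make "state the theory explicitly" concrete is to write the claim down as data before any results exist. The sketch below is illustrative only: the `ProgramClaim` fields, the `min_rate` threshold, and the `evaluate` function are hypothetical names, and a single pass rate is a deliberately crude stand-in for whatever measure a real evaluation would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the claim cannot be edited after the data arrives
class ProgramClaim:
    behavior: str    # behavior X the program is supposed to change
    context: str     # context Y in which the change should be visible
    months: int      # Z: how long after the program we check
    min_rate: float  # assumed threshold: share of participants who must show the behavior


def evaluate(claim: ProgramClaim, observed: list[bool]) -> str:
    """Compare observations (one bool per participant) with the pre-stated claim."""
    if not observed:
        return "no data"
    rate = sum(observed) / len(observed)
    # The claim survives only if it clears the bar that was set in advance.
    return "survived" if rate >= claim.min_rate else "falsified"


# Hypothetical claim, written before the program runs.
claim = ProgramClaim(
    behavior="gives specific feedback in one-to-ones",
    context="weekly team meetings",
    months=3,
    min_rate=0.6,
)

# Hypothetical follow-up observations for five participants.
print(evaluate(claim, [True, True, False, False, False]))  # falsified
```

The point of the frozen dataclass is the sequencing: the threshold is fixed before the observations exist, so a disappointing result cannot be quietly redefined as success.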

This is harder than it sounds. It requires accepting, in advance, that the program might not work. Most organizational cultures — and most practitioners' self-images — resist this. The confirmation bias is not just cognitive; it is structural. The organizations that fund training programs do not, in general, want to pay for evaluations that might show the program failed. The practitioner who insists on falsifiable evaluation designs is doing something professionally courageous.

Popper, Karl R. Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge, 1963. The opening chapter, "Science: Conjectures and Refutations," contains Popper's clearest statement of falsifiability as a demarcation criterion. It is accessible to any practitioner; it does not require a background in philosophy of science.

Wall Moment 011 — The Wason Test

Run Wason's original experiment on yourself. Then see how it maps to practitioner performance data.

The Rule: "If a card has a vowel on one side, it has an even number on the other side."

The four cards show E, K, 4, and 7. Which cards must you turn over to test whether the rule is true? Decide on your answer before reading on.

Correct answer: E and 7.

E must be turned — if it has an odd number on the back, the rule is false.
7 must be turned — if it has a vowel on the back, the rule is false.
K doesn't matter — the rule says nothing about consonants.
4 doesn't matter — the rule says if vowel → even, not if even → vowel.
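
The card-by-card reasoning above can be checked by brute force. This is a minimal sketch, not Wason's procedure: for each visible face it tries a few possible hidden faces and asks whether any of them would break the rule. The names `violates` and `must_turn` and the sample hidden faces are assumptions made for illustration.

```python
VOWELS = set("AEIOU")

# One vowel and one consonant, one even and one odd number, are enough
# to cover every kind of hidden face a card could have.
HIDDEN_LETTERS = ["A", "K"]
HIDDEN_NUMBERS = [4, 7]


def violates(letter: str, number: int) -> bool:
    """The rule 'vowel implies even' is broken only by a vowel paired with an odd number."""
    return letter in VOWELS and number % 2 == 1


def must_turn(face: str) -> bool:
    """A card is worth turning only if some hidden face could break the rule."""
    if face.isalpha():
        return any(violates(face, n) for n in HIDDEN_NUMBERS)
    return any(violates(letter, int(face)) for letter in HIDDEN_LETTERS)


cards = ["E", "K", "4", "7"]
to_turn = [card for card in cards if must_turn(card)]
print(to_turn)  # ['E', '7']
```

Note that the 4 drops out for the same reason given in the list: whatever letter is behind it, vowel or consonant, the rule still holds.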

What most people do: Choose E and 4 — the confirming pair. They test whether the rule is true rather than testing whether it's false. Fewer than 10% of participants choose correctly in most replications.

This is not stupidity. This is the default mode of human cognition. You just ran it on yourself.

Now map it to your practice: When you review a training program's outcomes, do you look for evidence that it worked (the vowel and the 4) or evidence that it didn't work in specific cases (the 7)? Which participant groups, which transfer contexts, which post-training conditions are your "7"? Have you turned those cards over?

Evidence 008 — Field Work

Turning the Cards You Haven't Turned

The Odd-Number Audit: Take one program you're running. List the five pieces of evidence that confirm it works. Now list the five pieces of evidence that would show it doesn't work for some participants. Have you looked for those? If not — that's your Wason test for this week.

The Theory Test: Write down your theory of how the learners in your current program learn best. Make it specific enough to falsify. What evidence would show you're wrong? Go looking for that evidence.

The Demographic Cut: Disaggregate your outcome data by gender, generation, or any other demographic variable you have access to. If the aggregate looks fine and the cut reveals a pattern — that's disconfirming evidence. What does it tell you?
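
The Demographic Cut can be sketched in a few lines. Everything here is made up for illustration: the `cohort` and `applied` field names, the two cohorts, and the ten records are hypothetical, chosen only to show how an acceptable aggregate can hide a group for whom the program is not working.

```python
from collections import defaultdict


def rate_by_group(records: list[dict], key: str) -> dict[str, float]:
    """Share of participants in each group who applied the training."""
    totals = defaultdict(lambda: [0, 0])  # group -> [applied count, participant count]
    for record in records:
        counts = totals[record[key]]
        counts[0] += record["applied"]  # True counts as 1, False as 0
        counts[1] += 1
    return {group: applied / n for group, (applied, n) in totals.items()}


# Hypothetical outcome data: did each participant apply the training on the job?
records = (
    [{"cohort": "A", "applied": True}] * 5
    + [{"cohort": "A", "applied": False}] * 1
    + [{"cohort": "B", "applied": True}] * 1
    + [{"cohort": "B", "applied": False}] * 3
)

overall = sum(r["applied"] for r in records) / len(records)
by_cohort = rate_by_group(records, "cohort")

print(overall)    # 0.6
print(by_cohort)  # cohort A is about 0.83, cohort B is 0.25
```

In this invented example the aggregate of 60% looks respectable, while cohort B sits at 25%. That gap is the "7" card: it says nothing about why the difference exists, only that the question needs asking.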

Sources: Wason (1960) · Kahneman (2011) · Nickerson (1998) · Lord, Ross & Lepper (1979) · Popper (1963) · Argyris & Schön (1996) · Heath & Heath (2007) · Kirkpatrick & Kirkpatrick (2006) · Brinkerhoff (2003)


Meliorism2.com · Daily briefings for practitioners