How Base-Rate Neglect Misleads

A doctor tells you that a screening test for a rare disease is 95% accurate. You test positive. How worried should you be? Most people answer somewhere around 95% — and most people are wrong, often by an order of magnitude. The reason is base-rate neglect: the tendency to focus on the diagnostic information in front of us while ignoring the underlying frequency of the thing being diagnosed.

Suppose the disease affects one person in a thousand. Imagine testing ten thousand people. Ten of them actually have the disease, and the test will correctly flag, say, all ten. But the test also has a 5% false-positive rate, which means it will wrongly flag about 500 of the 9,990 healthy people. So out of roughly 510 positive results, only 10 reflect real cases. Your probability of actually being sick, given a positive test, is closer to 2% than to 95%. The test's accuracy did not change. What changed is that we let the base rate — the prior probability of having the disease — do the work it was always supposed to do.
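The counting argument above can be written out directly. This is a minimal sketch of the natural-frequencies calculation, assuming (as the passage does) that "95% accurate" means both a 95% true-positive rate and a 5% false-positive rate:

```python
# Worked example: 1-in-1,000 disease, "95% accurate" test, interpreted as
# 95% sensitivity and a 5% false-positive rate (the passage's assumption).
population = 10_000
prevalence = 1 / 1_000         # base rate: 1 person in 1,000 is sick
sensitivity = 0.95             # true-positive rate
false_positive_rate = 0.05     # fraction of healthy people wrongly flagged

sick = population * prevalence                   # 10 people
healthy = population - sick                      # 9,990 people
true_positives = sick * sensitivity              # ~9.5 (the passage rounds to 10)
false_positives = healthy * false_positive_rate  # ~499.5

p_sick_given_positive = true_positives / (true_positives + false_positives)
print(f"P(sick | positive) = {p_sick_given_positive:.3f}")  # ~0.019, about 2%
```

The exact value depends on rounding, but the conclusion does not: false positives outnumber true positives roughly fifty to one.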

This pattern, first sharpened by Daniel Kahneman and Amos Tversky in the 1970s, shows up far beyond medicine. In their famous "Tom W." experiment, subjects were given a personality sketch of a bright but unsociable student with a strong need for order and detail, and asked which graduate program he was most likely enrolled in. People overwhelmingly guessed fields like computer science, which the sketch was written to evoke, ignoring the fact that those programs enrolled a small fraction of graduate students. The vivid description crowded out the prior. A representative-sounding story felt more informative than a dull statistical fact, even when the statistical fact was doing most of the predictive work.

The deeper trouble is that base rates are often invisible by design. A description is concrete; a prior probability is an abstraction you have to go look up or estimate. When a security analyst profiles a "suspicious traveler," when a hiring manager reads a résumé that "fits the type," when a journalist describes a defendant who "matches the pattern" of past offenders — in each case, the description is doing visible work and the base rate is doing none, because no one has bothered to ask how common the underlying category actually is. The result is systematic over-confidence in judgments that feel sharp and turn out to be noise.

It is worth being precise about what base-rate neglect is not. It is not the claim that descriptions are useless; sometimes they carry strong diagnostic weight. The Bayesian formula multiplies the prior by the likelihood ratio, and a sufficiently lopsided likelihood ratio can swamp even a tiny prior. The error is not weighting evidence at all — it is weighting evidence as if there were no prior, as if the question "how common is this in the first place?" had no answer worth incorporating. A 95% accurate test for a one-in-a-million disease still produces mostly false positives. A 51% accurate test for something that happens half the time barely tells you anything. The accuracy number alone is not a probability about you.
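The "prior times likelihood ratio" structure mentioned above is easiest to see in the odds form of Bayes' rule. A short sketch, again assuming a single "accuracy" number serves as both sensitivity and specificity:

```python
# Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio.
# Assumes "accuracy" = sensitivity = specificity, so the positive-test
# likelihood ratio is accuracy / (1 - accuracy).

def posterior_given_positive(prior, accuracy):
    """P(condition | positive test) for a given base rate and test accuracy."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = accuracy / (1 - accuracy)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# A 95% test for a one-in-a-million condition: still almost all false positives.
print(posterior_given_positive(1e-6, 0.95))

# A 51% test for something that happens half the time: barely informative.
print(posterior_given_positive(0.5, 0.51))
```

Both of the passage's limiting cases fall out of the same formula: the first posterior stays near zero because the prior odds are tiny, and the second stays near 51% because a likelihood ratio close to 1 barely moves even odds.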

The corrective is a habit, not a formula. Before accepting a confident judgment built on a vivid case, ask two questions. First: what is the rate of this thing in the relevant population? Second: how much would my estimate change if I had only that base rate and no description at all? If the description is moving your estimate dramatically away from the base rate, the description had better be carrying real diagnostic weight — not just narrative coherence, not just the feeling that it fits. Coherence is not evidence. Vividness is not evidence. The world contains many shy, orderly people, most of whom are not librarians, and many positive test results, most of which, for rare conditions, are wrong.
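The second question, how far the evidence should move you off the base rate, can be made concrete with a quick sweep. This sketch holds the test fixed at 95% accuracy and varies only the (hypothetical, illustrative) base rate:

```python
# Same 95%-accurate test throughout; only the base rate changes.
# The prevalence values below are hypothetical, chosen for illustration.

def posterior(prior, accuracy=0.95):
    true_pos = prior * accuracy
    false_pos = (1 - prior) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)

for prior in (0.5, 0.05, 0.001, 0.000001):
    print(f"base rate {prior:>8.6f} -> P(sick | positive) = {posterior(prior):.4f}")
```

At a 50% base rate the positive test is worth its headline accuracy; at 5% a positive result is a coin flip; at 1 in 1,000 it is about 2%; at 1 in a million it is noise. The accuracy number never changed.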

Vocabulary

base-rate neglect
The cognitive tendency to ignore the underlying frequency of a category when judging the probability that a specific case belongs to it, focusing instead on descriptive or diagnostic details.
false-positive rate
The proportion of cases without a condition that a test nonetheless flags as having it; a key driver of how many positive results turn out to be wrong.
prior probability
The probability assigned to a hypothesis before new evidence is considered; in screening contexts, it corresponds to how common the condition is in the population being tested.
likelihood ratio
A measure of how much more probable a piece of evidence is under one hypothesis than under another; combined with the prior, it determines the updated probability.
representative-sounding
Describing a case that matches the stereotype or typical features of a category, which can feel diagnostic even when the category itself is rare.

Check your understanding

Question 1 of 5 (recall)

In the passage's worked example, roughly what is the probability that a person who tests positive actually has the disease?

Closing question

Think of a recent judgment you made about a person or situation based on how well they 'fit a pattern.' What was the base rate you were implicitly ignoring, and how would knowing it have changed your confidence?