Science · March 25, 2026

The Tyranny of the Asterisk: Why Effect Sizes Tell You What P-Values Cannot

A statistically significant result suggests that something happened. An effect size answers whether it matters. Researchers have known this distinction for decades. Coverage still ignores it.

By Dr. Maya Iyer, Staff Reporter · Science Desk

Nearly every introductory statistics course teaches the same lesson: a p-value below 0.05 means the result is statistically significant. What that lesson often fails to add is the part that matters for anyone trying to understand whether a finding changes anything in the real world. A p-value tells you roughly how surprised you should be by your data if the null hypothesis were true. It does not tell you how large the effect is. It does not tell you whether the effect is clinically or practically meaningful. It tells you, with a single number shaped by sample size as much as by biology, only that data like these would be unlikely if chance alone were at work.

Effect size is the measurement that fills the gap. Cohen's d, odds ratios, relative risk, number needed to treat, partial eta-squared: these are the metrics that describe the magnitude of a relationship or a difference, independent of how many people were in the room when you measured it. A randomized trial enrolling twenty thousand participants can return a p-value of 0.0001 for an intervention that lowers systolic blood pressure by one millimeter of mercury. That result is statistically significant. It is also, by most clinical benchmarks, not large enough to reorganize a treatment protocol around.
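The arithmetic behind that blood pressure example can be sketched in a few lines of Python. This is a rough illustration, not a reanalysis of any real trial: it assumes the twenty thousand participants are split evenly into two arms, and it assumes a standard deviation of 18 mmHg for systolic blood pressure, a figure chosen for illustration that does not come from the text above. The function names are hypothetical.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def large_trial_summary(diff_mmhg: float, sd_mmhg: float, n_per_arm: int):
    """Return (z, p, Cohen's d) for a difference in means between two equal arms."""
    # Standard error of a difference between two independent means
    se = sd_mmhg * math.sqrt(2 / n_per_arm)
    z = diff_mmhg / se
    # Cohen's d: the raw difference expressed in standard deviation units
    d = diff_mmhg / sd_mmhg
    return z, two_sided_p_from_z(z), d

# Assumed inputs: 1 mmHg difference, SD of 18 mmHg (illustrative), 10,000 per arm
z, p, d = large_trial_summary(diff_mmhg=1.0, sd_mmhg=18.0, n_per_arm=10_000)
print(f"z = {z:.2f}, p = {p:.1e}, Cohen's d = {d:.3f}")
```

Under those assumptions the p-value lands a little below 0.0001 while Cohen's d is only about 0.06. The same one-millimeter difference in a trial one-tenth the size would produce the same d and a far less impressive p-value, which is the point: the sample size moves the p-value, not the magnitude.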

The problem is not new, and it is not subtle. Jacob Cohen published his landmark survey of statistical power in psychology research in 1962, and the effect size conventions that bear his name reached their most widely cited form in the 1988 second edition of his book on power analysis. The American Statistical Association issued a statement in 2016 explicitly cautioning against binary significance thresholds as the primary decision criterion in research. A broad methodological reform movement has since pushed for confidence intervals, effect size reporting, and pre-registration of hypotheses. Progress in journals has been real but uneven. Progress in science journalism has been slower.

Why does this matter for readers of science coverage? Because the structure of a press release is almost always built around the p-value. A study "found a significant association." A drug "showed significant improvement." Significant, in these sentences, is doing borrowed work, carrying the connotation of important while technically meaning only that a threshold was crossed. The effect size is either absent from the coverage or buried in the fourth paragraph as a hedging clause. The sample size often appears only if it is impressively large.

Consider what a full accounting looks like instead. A cognitive intervention study that tests about 400 participants before and after training might report a statistically significant improvement in a memory task with a Cohen's d of 0.15, which falls below even the 0.2 that Cohen's conventions label a small effect. Reported as a significant finding, it sounds like a breakthrough. Reported with the effect size alongside the confidence interval, it sounds like a preliminary signal that needs replication at scale and a harder look at whether the memory task translates to anything patients or students actually experience.
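How much weight a d of 0.15 from roughly 400 participants can bear depends on the study design. The sketch below compares two assumed designs using common large-sample approximations for the standard error of d: one in which the same 400 people are measured before and after (with d standardized by the spread of the change scores), and one in which 400 people are split into two arms of 200. Both designs, the function names, and the 1.96 multiplier for a 95 percent interval are assumptions made for illustration.

```python
import math

def se_d_paired(d: float, n: int) -> float:
    """Approximate SE of d for n paired (before/after) measurements."""
    return math.sqrt(1 / n + d**2 / (2 * n))

def se_d_two_groups(d: float, n1: int, n2: int) -> float:
    """Approximate SE of d for two independent groups."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

def confidence_interval(d: float, se: float, z_crit: float = 1.96):
    """Normal-approximation interval around d (95% by default)."""
    return d - z_crit * se, d + z_crit * se

d = 0.15
paired_ci = confidence_interval(d, se_d_paired(d, n=400))
two_arm_ci = confidence_interval(d, se_d_two_groups(d, n1=200, n2=200))

print(f"paired design, n=400:   {paired_ci[0]:.2f} to {paired_ci[1]:.2f}")
print(f"two arms of 200 each:   {two_arm_ci[0]:.2f} to {two_arm_ci[1]:.2f}")
```

Under these approximations the paired interval runs from about 0.05 to 0.25, which excludes zero but reaches down to effects few people would notice. The two-arm interval runs from about -0.05 to 0.35 and includes zero, so the same d would not clear the significance threshold at all. Either way, the interval says more than the asterisk does.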

Neither framing is dishonest. One is incomplete.

The downstream cost of this incompleteness is not trivial. Treatment decisions, policy allocations, and research funding priorities all flow partly from how findings are characterized in public-facing coverage. When small effects are routinely reported as significant discoveries without the accompanying magnitude information, the error compounds across cycles of coverage, replication failures, and retraction.

A practical habit for anyone reading a study or a story about one: find the effect size. If it is not reported in the coverage, find the paper and look for it in the results section. Ask whether the confidence interval around that estimate is wide enough to include effects too small to matter. Ask what the number needed to treat is, if treatment is what is being discussed. The p-value is not worthless. It is just not enough.
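For readers who want to check a number needed to treat themselves, the calculation is short: it is the reciprocal of the absolute risk reduction, rounded up. The sketch below uses invented event rates, 10 percent in the control group and 7 percent in the treated group, purely for illustration; they are not drawn from any study, and the function name is hypothetical.

```python
import math

def number_needed_to_treat(control_risk: float, treated_risk: float) -> int:
    """NNT = 1 / absolute risk reduction, rounded up to a whole person."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("treatment does not reduce risk, so NNT is undefined")
    return math.ceil(1 / arr)

# Hypothetical event rates, chosen only to illustrate the arithmetic
control_risk = 0.10
treated_risk = 0.07

relative_risk_reduction = (control_risk - treated_risk) / control_risk
nnt = number_needed_to_treat(control_risk, treated_risk)

print(f"relative risk reduction: {relative_risk_reduction:.0%}")
print(f"number needed to treat:  {nnt}")
```

With these made-up rates, a headline could truthfully report a 30 percent reduction in risk, while the number needed to treat shows that 34 people would have to take the treatment for one of them to benefit. Both figures describe the same data.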

Reporting by Dr. Maya Iyer, Staff Reporter, for the Science desk · ETL Newswire staff