
Why Most A/B Tests Are Worthless (And How to Fix Yours)

Statistical significance isn't the problem. The problem is testing things that don't matter, then declaring victory when noise looks like signal.

Lightdrop Team

February 9, 2026

4 min read

Here's a dirty secret: most A/B tests that "win" are actually noise.

That green "95% confident" badge in your testing tool? It's often lying to you. Not because the math is wrong—the math is fine. Because the setup is broken.

Let's fix it.

The Statistical Significance Trap

A test is "statistically significant at 95%" when, if the variants truly performed the same, a difference at least as large as the one you observed would show up less than 5% of the time.

Sounds solid. Here's the problem:

If you run 20 tests of changes that have no real effect, on average one will still show significance purely by chance. That's not a flaw; that's what 5% means.
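The arithmetic behind that is short enough to check directly. This sketch assumes all 20 tests compare variants with no true difference, that the tests are independent, and that each uses the 5% threshold described above:

```python
alpha = 0.05   # false-positive rate per test at "95% significance"
n_tests = 20   # assumed: 20 independent tests of changes with no real effect

expected_false_wins = n_tests * alpha          # average number of spurious "winners"
p_at_least_one = 1 - (1 - alpha) ** n_tests    # chance of one or more spurious "winners"

print(f"Expected false winners: {expected_false_wins:.1f}")
print(f"Chance of at least one: {p_at_least_one:.0%}")
```

Under those assumptions you expect one false winner on average, and there is roughly a 64% chance of getting at least one.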

Now add:

Peeking at results early (every extra look is another chance for noise to cross the threshold)

Stopping when you see what you want (selection bias)

Testing tiny effect sizes (noise overwhelms signal)

Running tests on low traffic (takes forever to converge)

Most A/B testing programs are noise generators dressed up as science.

What Actually Invalidates Tests

Problem 1: Sample Size Too Small

The math: To detect a 10% relative conversion lift with 95% confidence and 80% power, you need roughly 1,600 conversions per variant at a typical 2% baseline conversion rate (about 80,000 visitors per variant).

The reality: Most tests run with a few hundred conversions, then declare winners based on whatever direction the noise happened to point.
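To make "whatever direction the noise happened to point" concrete, here is a standard two-sided, pooled two-proportion z-test applied to made-up numbers: 10,000 visitors per arm, with 300 conversions in control and 330 in the variant, a 10% relative lift on a few hundred conversions. The function name and the figures are illustrative assumptions, not taken from any particular testing tool.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates (hypothetical helper)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # shared rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value

# Hypothetical test: 300 vs 330 conversions out of 10,000 visitors each
z, p = two_proportion_z_test(300, 10_000, 330, 10_000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

The p-value comes out around 0.22, far from the 0.05 bar, even though a 10% lift looks like a clear win on a dashboard.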

The fix: Calculate required sample size before you start. If your traffic won't support it, don't run the test—or accept that you're making a directional bet, not a scientific conclusion.
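A minimal sketch of that pre-test calculation, using the usual normal-approximation formula for comparing two conversion rates with a two-sided test. The 2% baseline rate and 10% relative lift are assumed inputs you would replace with your own, and `sample_size_per_variant` is a hypothetical name.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors per variant for a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Assumed inputs: 2% baseline conversion rate, 10% relative lift
n = sample_size_per_variant(baseline_rate=0.02, relative_lift=0.10)
print(f"{n:,} visitors per variant, about {round(n * 0.02):,} control conversions")
```

For these inputs it returns a bit over 80,000 visitors per variant, roughly 1,600 conversions in the control arm. Halving the lift you want to detect roughly quadruples the requirement, because the sample size scales with 1 / (p2 - p1)^2.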

Problem 2: Peeking

The problem: Every time you check results and are ready to stop on a significant reading, you're effectively running another statistical test. Check daily for a month and you've run 30 tests; your real chance of a false "winner" is far higher than the 5% the badge implies.

The fix: Set your test duration in advance based on required sample size. Check once at the end. If you must peek, use sequential testing methods designed for it.
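The cost of peeking shows up clearly in a quick A/A simulation, where both arms are identical and every "winner" is a false positive. To keep it fast, this sketch assumes each daily look adds an independent, standardized N(0, 1) increment to the difference between arms, so the running z-statistic after t looks is the cumulative sum divided by sqrt(t). That is a large-sample approximation rather than a visitor-level simulation, and `simulate_peeking` is a hypothetical helper.

```python
import math
import random

def simulate_peeking(n_sims=10_000, n_looks=30, z_crit=1.96, seed=42):
    """Estimate false-positive rates in simulated A/A tests (no true difference).

    Assumption: each look adds an independent N(0, 1) increment to the
    standardized difference between arms, so the running z-statistic after
    t looks is (sum of increments) / sqrt(t).
    """
    rng = random.Random(seed)
    peek_hits = 0   # runs where |z| crossed z_crit at ANY look
    final_hits = 0  # runs where |z| crossed z_crit at the final look
    for _ in range(n_sims):
        running_sum = 0.0
        crossed = False
        z = 0.0
        for t in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(t)
            if abs(z) > z_crit:
                crossed = True  # a daily peeker would have stopped here and declared a winner
        peek_hits += int(crossed)
        final_hits += int(abs(z) > z_crit)  # z now holds the value at the last look
    return peek_hits / n_sims, final_hits / n_sims

peek_rate, final_rate = simulate_peeking()
print(f"Check daily, stop at first significant look: {peek_rate:.1%} false positives")
print(f"Check once at the end:                      {final_rate:.1%} false positives")
```

With 30 daily looks, the "stop at the first significant result" rate typically comes out well above 20%, several times the nominal 5%, while the single final check stays near 5%. Passing a larger `z_crit` shows how much stricter each look must be to pull the overall rate back down; formal sequential designs such as Pocock or O'Brien-Fleming boundaries derive those thresholds properly.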

Problem 3: Testing Trivia

The problem: "We tested blue button vs green button and saw a 3% lift!"

Great. You moved a metric that doesn't matter by an amount that's probably noise, and now your team will spend six months testing button colors instead of value propositions.

The fix: Test things that could plausibly move the needle by 20%+. Headlines, offers, pricing, page structure. Not button colors.
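One way to sanity-check whether a test is worth running is to invert the sample-size question: given the traffic you actually have, what is the smallest lift you could reliably detect? This sketch uses a common simplification (baseline variance in both arms, a two-sided test at 95% confidence with 80% power). The 3% baseline rate and the traffic levels are assumptions, and `min_detectable_lift` is a hypothetical name.

```python
import math
from statistics import NormalDist

def min_detectable_lift(baseline_rate, visitors_per_variant, alpha=0.05, power=0.80):
    """Approximate smallest relative lift detectable with the given traffic.

    Simplification: uses the baseline variance for both arms.
    """
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / visitors_per_variant)
    return z_total * se / baseline_rate  # absolute detectable difference, as a relative lift

# Assumed 3% baseline conversion rate at three traffic levels
for visitors in (5_000, 20_000, 100_000):
    mde = min_detectable_lift(baseline_rate=0.03, visitors_per_variant=visitors)
    print(f"{visitors:>7,} visitors/variant -> smallest detectable lift ~{mde:.0%}")
```

Under those assumptions, 5,000 visitors per variant can only reliably detect a lift of about 32%, and even 100,000 per variant bottoms out around 7%. A true 3% lift from a button color sits below that floor for most sites, which is why bigger swings are the practical choice.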

Problem 4: Ignoring Context

The problem: You ran a test in December and applied the results year-round. December traffic doesn't behave like June traffic.

The fix: Re-test winners in different contexts. What works during a sale might not work at full price.

What's Worth Testing

A useful heuristic: if the test result won't change your strategy, don't run the test.

Worth testing:

Radically different value propositions

Different offers (free trial vs demo vs quote)

Pricing and packaging

Fundamentally different page structures

Adding vs removing major elements

Not worth testing:

Minor copy tweaks ("Submit" vs "Submit Now")

Color variations

Stock photo swaps

Element position by 10 pixels

Anything you'll implement regardless of results

The Alternative: High-Velocity Directional Testing

Here's what actually works for most teams:

Accept lower confidence thresholds (85-90%) in exchange for faster learning

Test big swings where even noisy signals point you in useful directions

Run multiple variants to cover more ground faster

Use winning tests to generate hypotheses for more tests, not to declare permanent truths

Re-test winners before scaling major decisions

This is less "pure" than textbook A/B testing. It's also far more useful in practice.
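The trade-off behind lower confidence thresholds can be roughly quantified. For a fixed effect size and baseline rate, required sample size scales approximately with the square of (z for the confidence level plus z for the power), so the ratio between two confidence levels doesn't depend on your traffic at all. This sketch assumes a two-sided test and 80% power; `relative_sample_size` is a hypothetical helper.

```python
from statistics import NormalDist

def relative_sample_size(confidence, power=0.80, reference_confidence=0.95):
    """Approximate sample size at `confidence` as a fraction of that at the reference.

    For a fixed effect and variance, n scales roughly with
    (z_{1 - alpha/2} + z_{power}) ** 2, so the other terms cancel in the ratio.
    """
    def z_sum(conf):
        alpha = 1 - conf
        return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return (z_sum(confidence) / z_sum(reference_confidence)) ** 2

for conf in (0.95, 0.90, 0.85):
    print(f"{conf:.0%} confidence -> {relative_sample_size(conf):.0%} of the 95% sample size")
```

Dropping from 95% to 90% confidence cuts the required sample by roughly a fifth, and 85% cuts it by about a third. The price is that a change with no real effect clears the bar 10% or 15% of the time instead of 5%, which is why re-testing winners matters.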

When Rigorous Testing Matters

There are contexts where statistical rigor is non-negotiable:

Pricing changes: Getting this wrong is expensive

Core funnel changes: Breaking something critical is catastrophic

Results you'll publicize: Don't embarrass yourself with noise

For these, do the math. Calculate sample size. Run properly. Wait.

The Meta-Lesson

A/B testing isn't a religion. It's a tool. Tools serve purposes.

The purpose of testing is better decisions. If your testing process is generating false confidence in trivial changes, it's not serving that purpose.

Test less. Test bigger. Be honest about confidence levels. Ship faster.

The goal isn't statistical significance. The goal is better products and better marketing. Keep your eye on the actual goal.

