Why A/B Testing Is a Scalpel — And Most of Marketing Isn’t Surgery

Stuart Brameld, Founder at Growth Method

I was recently speaking with a prospective customer who was exploring Growth Method for their team. During the conversation, a question came up that I've heard a few times before: Why use Growth Method instead of an A/B testing tool?

At the time, I didn't give the clearest answer — so I'm using this article to lay out a better, more thoughtful explanation for anyone else wondering the same thing.

The short version? It's not an either/or decision. Growth Method and A/B testing tools are complementary, not substitutes. But most teams get the sequence wrong.

You should always start by building a strong culture of experimentation: clear hypotheses, rapid learning loops, data-driven analysis and team-wide visibility. For most B2B companies, A/B testing should be one of the last tools in your toolbox — not the first.

Where A/B Testing Actually Works

With A/B testing you have a control (A) and a test (B). You split the population in two: one portion sees the control (A) and the remainder sees a version with just one variable changed, the test (B). With some basic statistics, you can then assess whether any movement in the numbers is statistically significant and actually caused by the change.
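As a rough illustration, here's what that calculation can look like with a two-proportion z-test - one common choice, though testing tools vary in their methods. This sketch assumes Python's statsmodels library, and the numbers are made up:

```python
# A minimal sketch of evaluating an A/B result with a two-proportion z-test.
from statsmodels.stats.proportion import proportions_ztest

conversions = [310, 356]      # conversions in control (A) and test (B)
visitors = [10_000, 10_000]   # users randomly assigned to each variant

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Only a small p-value (e.g. < 0.05) suggests the lift is more than chance.
# Here, 3.10% vs 3.56% across 20,000 users gives p ≈ 0.07 - not significant.
```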

If you're new to A/B testing, there's a fantastic interview on the topic on Lenny's Podcast worth watching.

A/B testing is the tech world equivalent of a Randomised Controlled Trial (or RCT). It's the gold standard in science and medicine for establishing causality. In the US, the FDA requires a randomised controlled trial before approving drugs for sale.

But you're a business, not a laboratory. You can't put your business in a test tube and the outcomes of your work probably aren't life and death.

Many businesses do still use A/B testing, particularly in the B2C/DTC/Ecommerce world. Some of the most well-known companies that have talked publicly about using A/B testing and experimentation include Booking.com, Amazon and Netflix. A/B testing is well-suited to these companies because:

  • These companies receive millions of transactions (conversions) per day

  • Purchases are low cost and often quick, impulsive buying decisions

  • Small UX changes can equate to millions of dollars of sales

  • Users are often inside a product and logged-in

But B2B changes everything.

Unfortunately, whilst running an A/B test in B2B sounds relatively simple, running an RCT properly outside of a lab environment is much harder than most marketers, growth teams and product teams realise. As Jonny Longden, Chief Growth Officer at Speero, says:

"A/B testing has the unfortunate characteristic of seeming incredibly easy while actually being incredibly difficult."

Jonny Longden, Chief Growth Officer at Speero

Casey Hill, CMO of DoWhatWorks, echoes a similar sentiment:

"The more time I spend around testing, the more I'm convinced that A/B testing is broken. First off, only 10% of A/B tests tied to revenue beat their controls. This means we're not only spending an untold amount of time, energy and money on tests that don't work; we're actually losing millions of dollars putting these tests in front of our audience only to see conversion suffer."

Casey Hill, CMO of DoWhatWorks

Most B2B Companies Don't Get Enough Conversions for A/B Testing

The most common challenge for B2B companies wanting to run A/B tests is traffic and conversions. You can only talk about percent lift meaningfully once the results are statistically significant, and statistical significance requires a sample size that most B2B businesses aren't able to achieve in a reasonable amount of time.

Statistical significance is a function of conversions per variation, not a function of traffic or business size.

According to Ronny Kohavi:

"You need over 100K users in the experiment to detect a 10% change if your conversion is about 3%"

Ronny Kohavi

This assumes the absolute best case scenario:

  • Your conversion action is sitewide e.g. Book a demo

  • Your conversion action (e.g. book a demo) exists in all languages

  • Your conversion action (e.g. book a demo) shows for all devices

Detecting smaller changes becomes even more difficult. Typical website conversion rates average around 2.5%, and top-performing sites usually hover around 5%. It's therefore incredibly unlikely you're going to 2X or 3X your conversion rate; realistic changes are much smaller, and smaller changes require much more traffic to detect. For example, you need approximately 3,825 visitors per variation (7,650 in total) to reliably detect a change from 2% to 3% with 95% confidence and 80% power.

There are various A/B testing calculators available to check this for your own site and requirements, or you can run the numbers yourself, as in the sketch below.
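Here is a rough version of that pre-test calculation in Python, assuming the statsmodels library (any standard power-analysis calculator works the same way):

```python
# A sketch of the pre-test sample size calculation for detecting a move
# from a 2% to a 3% conversion rate at 95% confidence and 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, target = 0.02, 0.03

effect_size = proportion_effectsize(target, baseline)  # Cohen's h
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,   # 95% confidence (two-sided)
    power=0.80,   # 80% power
    ratio=1.0,    # 50/50 split between control and test
)
print(round(n_per_variation))
# ≈ 3,800 visitors per variation - close to the ~3,825 figure above
# (different calculators use slightly different approximations).
```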

Even huge sites like Stack Overflow don't test everything. As Des Navadeh, Product Manager at Stack Overflow, says:

"Sometimes a change is obviously better UX but the test would take months to be statistically significant. If we are confident that the change aligns with our product strategy and creates a better experience for users, we may forgo an A/B test. In these cases, we may take qualitative approaches to validate ideas such as running usability tests or user interviews to get feedback from users"

Des Navadeh, Product Manager, Stack Overflow

But Can't We Just Test Upstream Metrics?

Sure, you can test upstream (or surrogate) metrics, but these are only useful if you believe they causally impact the key outcomes your business cares about. In e-commerce, that would be revenue or transactions; in B2B, something more like form completions or qualified leads. In Ronny's A/B Testing Class he shares a real example of a test that showed a material increase in clicks but did little for the real metric. Always aim for surrogates as close to the key metrics as possible.
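A first, very rough sanity check on a surrogate is whether it even moves with the key metric. The sketch below uses hypothetical weekly data and NumPy; note that correlation is necessary-but-not-sufficient evidence, never proof of causation:

```python
# A minimal sanity check on a surrogate metric: does the upstream metric
# (clicks) move together with the key metric (qualified leads)?
import numpy as np

weekly_clicks = np.array([420, 515, 480, 610, 390, 560, 530, 470])
weekly_qualified_leads = np.array([12, 15, 14, 19, 10, 17, 16, 13])

r = np.corrcoef(weekly_clicks, weekly_qualified_leads)[0, 1]
print(f"clicks-to-qualified-leads correlation: r = {r:.2f}")
# Even a high r doesn't prove clicks cause leads - a surrogate is only
# trustworthy if you believe the causal link to the key metric holds.
```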

Most B2B Companies Don't Have the Skills for A/B Testing

Even if you have enough traffic, conversions and the A/B testing infrastructure, you still require someone with a data science background to run effective A/B tests. Experiment design fills entire textbooks and multi-semester university classes.

You need someone that understands:

  • Statistical significance and sample sizes

  • Statistical power and power analysis

  • Confidence intervals

  • Bayesian vs Frequentist methods

  • P-values and p-hacking

  • Interaction effects

  • The Null hypothesis

  • Randomisation, cookies and device signals

  • Flicker effects

  • Sample ratio mismatch (sketched after this list)

  • Pre-test calculations

  • Simpson's Paradox

  • Guardrail or counter metrics

  • Inference and data modelling in analytics tools

  • Stratified sampling

  • Data Winsorisation

  • Pre-experiment bias (CUPED and CUPAC)
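To make one of these concrete, here's a sketch of the sample ratio mismatch check mentioned above - a chi-square goodness-of-fit test on the traffic split (SciPy assumed, counts hypothetical):

```python
# Sample ratio mismatch (SRM): if the observed traffic split deviates from
# the intended 50/50 by more than chance allows, the experiment is suspect.
from scipy.stats import chisquare

control_users, test_users = 50_421, 49_103   # hypothetical observed counts
total = control_users + test_users
expected = [total / 2, total / 2]            # a 50/50 split was intended

stat, p_value = chisquare([control_users, test_users], f_exp=expected)
if p_value < 0.001:   # a commonly used SRM alarm threshold
    print(f"Possible SRM (p = {p_value:.1e}) - don't trust the results yet")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```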

Anyone can buy an A/B testing tool and launch some tests. There are literally hundreds of tools ready to take your money, but A/B testing tools often lie about whether something is "statistically significant". As a well-known 2022 whitepaper noted:

"Some well-known commercial vendors of A/B testing software have focused on 'intuitive' presentations of results, resulting in incorrect claims to their users"

2022 whitepaper

Experimentation ≠ CRO ≠ A/B Testing

Experimentation is not Conversion Rate Optimisation or A/B Testing. Experimentation is a methodology. CRO and A/B testing are tactics.

Experimentation is a mindset, a process, and a way of working that aims to use evidence and data to validate ideas, rather than guessing.

Nobody said experimentation can only be done on high traffic, front-end website user experiences with an A/B testing platform. The Wright Brothers conducted over 1000 experiments in wind tunnels and with fixed-wing gliders prior to their first successful powered flight in 1903 - they didn't use an A/B testing tool.

As a B2B marketing or growth team, there are lots of things you should be testing that you can't A/B test, for example:

  • Branding

  • Internal processes

  • User onboarding

  • Content creation

  • Social media automation

  • Market segmentation and targeting

  • Translation and language sites

  • Brand ambassadors and referral incentives

  • Physical mail and gifting

  • LLM optimisation (web traffic is diminishing and you can't run an A/B test in ChatGPT or Claude)

The purpose and method of scientific experimentation is more important than any one specific individual technique or method. Experimentation can be applied to anything, so focus on building an experimentation culture first. The goal isn't perfection - it's to measure what you're doing and to learn.

Expanding Your Experimentation Toolkit

"A/B testing is a gold standard for data-backed decision-making. But it should only be ONE of the tools in your optimisation toolkit."

Jon MacDonald

A/B testing is simply one way of gathering some data that helps give you confidence that a change is more likely to help than hurt. It's great at understanding what people/users are doing, but not good at understanding why. It gives you information, but that information has many limitations and should be considered along with all other evidence.

Your goal should be to expand your experimentation toolkit to include many other forms of research & insight.

Ton Wesseling's "Hierarchy of Evidence" pyramid is a decision-making framework adapted from evidence-based medicine, tailored for experimentation, CRO (conversion rate optimisation), and data-driven product or marketing teams. It visually represents the relative strength of different sources of evidence, especially when deciding what changes to make or what strategies to pursue.

CRO Hierarchy of Evidence

Here are some popular CRO methods used for experiment validation and learning:

| Method | What is it | Most useful for | Downsides |
| --- | --- | --- | --- |
| Industry best practices | Resources like GoodUI or Baymard Institute | Quick wins and baseline optimisations | May not apply to your specific context |
| Analytics event data | Tracking user behaviour through events | Understanding user flow and drop-offs | Limited context on why behaviours occur |
| Heatmaps & click maps | Visual representation of user interactions | Identifying what users focus on | Can be misleading without proper context |
| Session recordings | Videos of actual user sessions | Understanding user struggles and confusion | Time-intensive to analyse |
| Sales data | Revenue and conversion metrics | Understanding business impact | Lag time and attribution challenges |
| Pop-up/post-conversion surveys | Quick feedback collection | Getting immediate user sentiment | Low response rates, potential bias |
| Email surveys | Detailed feedback via email | In-depth customer insights | Low response rates, sample bias |
| Prototype testing | Testing concepts before full build | Validating ideas early | May not reflect real-world usage |
| Usability testing | Structured testing with real users | Finding specific usability issues | Small sample sizes, artificial environment |
| Interviews | One-on-one conversations with users | Deep qualitative insights | Time-intensive, small sample sizes |
| Customer support logs | Analysing support tickets and feedback | Understanding pain points | Reactive rather than proactive |
| Pre- & post-analysis | Time series analysis of changes | Understanding impact over time | Correlation vs causation challenges |
| Observational data analysis | Analysing existing data patterns | Finding trends and opportunities | Limited ability to establish causation |

Start Building an Experimentation Culture

The underlying principle of testing things is simply to try and use evidence to validate ideas, as opposed to guessing or using gut instinct. Experimentation is about a mindset and a framework for testing ideas systematically by applying the scientific method: Research → Ideation → Solution Design → Validation → Learning & Iteration. The primary goal of experimentation is validated learning - to increase your conviction in the problem or the solution and provide actionable insights with which to make decisions.
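In practice, that pipeline produces a trail of experiment records. A hypothetical sketch of what such a record might capture is below (illustrative only, not Growth Method's actual data model):

```python
# A hypothetical experiment record capturing hypothesis, evidence, result
# and learning - the raw material of "validated learning".
from dataclasses import dataclass, field

@dataclass
class Experiment:
    hypothesis: str        # "We believe X will improve Y because Z"
    metric: str            # what the experiment is judged against
    evidence: list = field(default_factory=list)  # research behind the idea
    result: str = ""       # what actually happened
    learning: str = ""     # the validated learning to carry forward

exp = Experiment(
    hypothesis="Shortening the demo form will increase demo bookings",
    metric="demo bookings per week",
    evidence=["session recordings show drop-off at field 6",
              "two customer interviews mention form length"],
)
exp.result = "bookings up, but no-show rate rose too"
exp.learning = "shorter forms trade lead quality for volume"
print(exp.learning)
```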

"The underlying principle of testing things is simply to try and use evidence to validate ideas, as opposed to guessing. Nobody said this had to be a statistically significant controlled experiment performed only on high-traffic website user experience elements within an A/B testing platform"

Jonny Longden, Chief Growth Officer at Speero

The goal is to avoid gut instinct, HiPPOs (the Highest Paid Person's Opinion) or hunches at all costs. Any data-backed insight is better than no insight at all. The job of science is simply to pin down or reduce uncertainty, and there are many ways to do this.

Over time, you'll be able to accumulate enough learnings to have an intuition of what works and what doesn't, which makes developing new features, or shipping new campaigns, or optimising your sales funnels more effective.

Where Does Growth Method Fit In?

Growth Method is the GrowthOS built for teams focused on pipeline, not projects.

Growth Method helps teams to:

  • Identify and track your core metric

  • Refine questions and problems into hypotheses

  • Aggregate experiment data into a central platform

  • Engage your marketing and growth teams

  • Run experiments

  • Evaluate the results (quantitative and qualitative)

  • Engage company-wide stakeholders

We help companies build a scientific approach to growth through experimentation and a more data-driven culture. The end goal is to increase revenue and sales, not just your conversion rate.

"We are on-track to deliver a 43% increase in inbound leads this year. There is no doubt the adoption of Growth Method is the primary driver behind these results."

Laura Perrott, Colt Technology Services

Think of it this way: A/B testing is a scalpel. It's incredibly precise when you need surgery. But most of marketing isn't surgery - it's diagnosis, treatment, and ongoing care. You need a full medical toolkit, not just the sharpest blade.

Start with building an experimentation culture. The statistical significance will follow when you're ready for it.

Growth Method is the GrowthOS built for marketing teams focused on pipeline — not projects. Book a call at https://cal.com/stuartb/30min.

