Why A/B Testing Is a Scalpel — And Most of Marketing Isn’t Surgery

Stuart Brameld, Founder at Growth Method

I was recently speaking with a prospective customer who was exploring Growth Method for their team. During the conversation, a question came up that I've heard a few times before: Why use Growth Method instead of an A/B testing tool?

At the time, I didn't give the clearest answer — so I'm using this article to lay out a better, more thoughtful explanation for anyone else wondering the same thing.

The short version? It's not an either/or decision. Growth Method and A/B testing tools are complementary, not substitutes. But most teams get the sequence wrong.

You should always start by building a strong culture of experimentation: clear hypotheses, rapid learning loops, data-driven analysis and team-wide visibility. For most B2B companies, A/B testing should be one of the last tools in your toolbox — not the first.

Where A/B Testing Actually Works

With A/B testing you have a control (A) and a test (B). You split the population in two: one portion sees the control (A) and the remainder sees a version with just one variable changed, the test (B). With some basic statistics, you can then assess whether any movement in the numbers is statistically significant and actually caused by the change.
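As a rough illustration, here's what that calculation can look like with a two-proportion z-test - one common choice, though testing tools vary in their methods. This sketch assumes Python's statsmodels library, and the numbers are made up:

```python
# A minimal sketch of evaluating an A/B result with a two-proportion z-test.
from statsmodels.stats.proportion import proportions_ztest

conversions = [310, 356]      # conversions in control (A) and test (B)
visitors = [10_000, 10_000]   # users randomly assigned to each variant

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Only a small p-value (e.g. < 0.05) suggests the lift is more than chance.
# Here, 3.10% vs 3.56% across 20,000 users gives p ≈ 0.07 - not significant.
```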

If you're new to A/B testing, there's a fantastic interview on the topic on Lenny's Podcast worth watching.

A/B testing is the tech world equivalent of a Randomised Controlled Trial (or RCT). It's the gold standard in science and medicine for establishing causality. In the US, the FDA requires a randomised controlled trial before approving drugs for sale.

But you're a business, not a laboratory. You can't put your business in a test tube and the outcomes of your work probably aren't life and death.

Many businesses do still use A/B testing, particularly in the B2C/DTC/Ecommerce world. Some of the most well-known companies that have talked publicly about using A/B testing and experimentation include Booking.com, Amazon and Netflix. A/B testing is well-suited to these companies because:

  • These companies receive millions of transactions (conversions) per day

  • Purchases are low cost and often quick, impulsive buying decisions

  • Small UX changes can equate to millions of dollars of sales

  • Users are often inside a product and logged-in

But B2B changes everything.

Unfortunately, whilst running an A/B test in B2B sounds relatively simple, running an RCT properly outside of a lab environment is much harder than most marketers, growth teams and product teams realise. As Jonny Longden, Chief Growth Officer at Speero, says:

"A/B testing has the unfortunate characteristic of seeming incredibly easy while actually being incredibly difficult."

Jonny Longden, Chief Growth Officer at Speero

Casey Hill, CMO of DoWhatWorks, echoes a similar sentiment:

"The more time I spend around testing, the more I'm convinced that A/B testing is broken. First off, only 10% of A/B tests tied to revenue beat their controls. This means we're not only spending an untold amount of time, energy and money on tests that don't work; we're actually losing millions of dollars putting these tests in front of our audience only to see conversion suffer."

Casey Hill, CMO of DoWhatWorks

Most B2B Companies Don't Get Enough Conversions for A/B Testing

The most common challenge for B2B companies wanting to run A/B tests is traffic and conversions. You can only talk about percent lift meaningfully once the results are statistically significant, and statistical significance requires a sample size that most B2B businesses aren't able to achieve in a reasonable amount of time.

Statistical significance is a function of conversions per variation, not a function of traffic or business size.

According to Ronny Kohavi:

"You need over 100K users in the experiment to detect a 10% change if your conversion is about 3%"

Ronny Kohavi

This assumes the absolute best case scenario:

  • Your conversion action is sitewide e.g. Book a demo

  • Your conversion action (e.g. book a demo) exists in all languages

  • Your conversion action (e.g. book a demo) shows for all devices

Detecting smaller changes becomes even more difficult. Typical website conversion rates average around 2.5%, and top-performing sites usually hover around 5%. It's therefore incredibly unlikely you're going to 2X or 3X your conversion rate; realistic changes are much smaller, and smaller changes require much more traffic to detect. For example, you need approximately 3,825 visitors per variation (7,650 in total) to reliably detect a change from 2% to 3% with 95% confidence and 80% power.

There are various A/B testing calculators available to check this for your own site and requirements, or you can run the numbers yourself, as in the sketch below.
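Here is a rough version of that pre-test calculation in Python, assuming the statsmodels library (any standard power-analysis calculator works the same way):

```python
# A sketch of the pre-test sample size calculation for detecting a move
# from a 2% to a 3% conversion rate at 95% confidence and 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, target = 0.02, 0.03

effect_size = proportion_effectsize(target, baseline)  # Cohen's h
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,   # 95% confidence (two-sided)
    power=0.80,   # 80% power
    ratio=1.0,    # 50/50 split between control and test
)
print(round(n_per_variation))
# ≈ 3,800 visitors per variation - close to the ~3,825 figure above
# (different calculators use slightly different approximations).
```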

Even huge sites like Stack Overflow don't test everything. As Des Navadeh, Product Manager at Stack Overflow, says:

"Sometimes a change is obviously better UX but the test would take months to be statistically significant. If we are confident that the change aligns with our product strategy and creates a better experience for users, we may forgo an A/B test. In these cases, we may take qualitative approaches to validate ideas such as running usability tests or user interviews to get feedback from users"

Des Navadeh, Product Manager, Stack Overflow

But Can't We Just Test Upstream Metrics?

Sure, you can test upstream (or surrogate) metrics, but these are only useful if you believe they causally impact the key outcomes your business cares about. In e-commerce, that would be revenue or transactions; in B2B, something more like form completions or qualified leads. In Ronny's A/B Testing Class he shares a real example of a test that showed a material increase in clicks but did little for the real metric. Always aim for surrogates as close to the key metrics as possible.
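A first, very rough sanity check on a surrogate is whether it even moves with the key metric. The sketch below uses hypothetical weekly data and NumPy; note that correlation is necessary-but-not-sufficient evidence, never proof of causation:

```python
# A minimal sanity check on a surrogate metric: does the upstream metric
# (clicks) move together with the key metric (qualified leads)?
import numpy as np

weekly_clicks = np.array([420, 515, 480, 610, 390, 560, 530, 470])
weekly_qualified_leads = np.array([12, 15, 14, 19, 10, 17, 16, 13])

r = np.corrcoef(weekly_clicks, weekly_qualified_leads)[0, 1]
print(f"clicks-to-qualified-leads correlation: r = {r:.2f}")
# Even a high r doesn't prove clicks cause leads - a surrogate is only
# trustworthy if you believe the causal link to the key metric holds.
```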

Most B2B Companies Don't Have the Skills for A/B Testing

Even if you have enough traffic, conversions and the A/B testing infrastructure, you still require someone with a data science background to run effective A/B tests. Experiment design fills entire textbooks and multi-semester university classes.

You need someone that understands:

  • Statistical significance and sample sizes

  • Statistical power and power analysis

  • Confidence intervals

  • Bayesian vs Frequentist methods

  • P-values and p-hacking

  • Interaction effects

  • The Null hypothesis

  • Randomisation, cookies and device signals

  • Flicker effects

  • Sample ratio mismatch (sketched after this list)

  • Pre-test calculations

  • Simpson's Paradox

  • Guardrail or counter metrics

  • Inference and data modelling in analytics tools

  • Stratified sampling

  • Data Winsorisation

  • Pre-experiment bias (CUPED and CUPAC)
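To make one of these concrete, here's a sketch of the sample ratio mismatch check mentioned above - a chi-square goodness-of-fit test on the traffic split (SciPy assumed, counts hypothetical):

```python
# Sample ratio mismatch (SRM): if the observed traffic split deviates from
# the intended 50/50 by more than chance allows, the experiment is suspect.
from scipy.stats import chisquare

control_users, test_users = 50_421, 49_103   # hypothetical observed counts
total = control_users + test_users
expected = [total / 2, total / 2]            # a 50/50 split was intended

stat, p_value = chisquare([control_users, test_users], f_exp=expected)
if p_value < 0.001:   # a commonly used SRM alarm threshold
    print(f"Possible SRM (p = {p_value:.1e}) - don't trust the results yet")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```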

Anyone can buy an A/B testing tool and launch some tests. There are literally hundreds of tools ready to take your money, but A/B testing tools often lie about whether something is "statistically significant". As a well-known 2022 whitepaper noted:

"Some well-known commercial vendors of A/B testing software have focused on 'intuitive' presentations of results, resulting in incorrect claims to their users"

2022 whitepaper

Experimentation ≠ CRO ≠ A/B Testing

Experimentation is not Conversion Rate Optimisation or A/B Testing. Experimentation is a methodology. CRO and A/B testing are tactics.

Experimentation is a mindset, a process, and a way of working that aims to use evidence and data to validate ideas, rather than guessing.

Nobody said experimentation can only be done on high traffic, front-end website user experiences with an A/B testing platform. The Wright Brothers conducted over 1000 experiments in wind tunnels and with fixed-wing gliders prior to their first successful powered flight in 1903 - they didn't use an A/B testing tool.

As a B2B marketing or growth team, there are lots of things you should be testing that you can't A/B test, for example:

  • Branding

  • Internal processes

  • User onboarding

  • Content creation

  • Social media automation

  • Market segmentation and targeting

  • Translation and language sites

  • Brand ambassadors and referral incentives

  • Physical mail and gifting

  • LLM optimisation (web traffic is diminishing and you can't run an A/B test in ChatGPT or Claude)

The purpose and method of scientific experimentation is more important than any one specific individual technique or method. Experimentation can be applied to anything, so focus on building an experimentation culture first. The goal isn't perfection - it's to measure what you're doing and to learn.

Expanding Your Experimentation Toolkit

"A/B testing is a gold standard for data-backed decision-making. But it should only be ONE of the tools in your optimisation toolkit."

Jon MacDonald

A/B testing is simply one way of gathering some data that helps give you confidence that a change is more likely to help than hurt. It's great at understanding what people/users are doing, but not good at understanding why. It gives you information, but that information has many limitations and should be considered along with all other evidence.

Your goal should be to expand your experimentation toolkit to include many other forms of research & insight.

Ton Wesseling's "Hierarchy of Evidence" pyramid is a decision-making framework adapted from evidence-based medicine, tailored for experimentation, CRO (conversion rate optimisation), and data-driven product or marketing teams. It visually represents the relative strength of different sources of evidence, especially when deciding what changes to make or what strategies to pursue.

CRO Hierarchy of Evidence

Here are some popular CRO methods used for experiment validation and learning:

| Method | What is it | Most useful for | Downsides |
| --- | --- | --- | --- |
| Industry best practices | Resources like GoodUI or Baymard Institute | Quick wins and baseline optimisations | May not apply to your specific context |
| Analytics event data | Tracking user behaviour through events | Understanding user flow and drop-offs | Limited context on why behaviours occur |
| Heatmaps & click maps | Visual representation of user interactions | Identifying what users focus on | Can be misleading without proper context |
| Session recordings | Videos of actual user sessions | Understanding user struggles and confusion | Time-intensive to analyse |
| Sales data | Revenue and conversion metrics | Understanding business impact | Lag time and attribution challenges |
| Pop-up/post-conversion surveys | Quick feedback collection | Getting immediate user sentiment | Low response rates, potential bias |
| Email surveys | Detailed feedback via email | In-depth customer insights | Low response rates, sample bias |
| Prototype testing | Testing concepts before full build | Validating ideas early | May not reflect real-world usage |
| Usability testing | Structured testing with real users | Finding specific usability issues | Small sample sizes, artificial environment |
| Interviews | One-on-one conversations with users | Deep qualitative insights | Time-intensive, small sample sizes |
| Customer support logs | Analysing support tickets and feedback | Understanding pain points | Reactive rather than proactive |
| Pre- & post-analysis | Time series analysis of changes | Understanding impact over time | Correlation vs causation challenges |
| Observational data analysis | Analysing existing data patterns | Finding trends and opportunities | Limited ability to establish causation |

Start Building an Experimentation Culture

The underlying principle of testing things is simply to try and use evidence to validate ideas, as opposed to guessing or using gut instinct. Experimentation is about a mindset and a framework for testing ideas systematically by applying the scientific method: Research → Ideation → Solution Design → Validation → Learning & Iteration. The primary goal of experimentation is validated learning - to increase your conviction in the problem or the solution and provide actionable insights with which to make decisions.
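In practice, that pipeline produces a trail of experiment records. A hypothetical sketch of what such a record might capture is below (illustrative only, not Growth Method's actual data model):

```python
# A hypothetical experiment record capturing hypothesis, evidence, result
# and learning - the raw material of "validated learning".
from dataclasses import dataclass, field

@dataclass
class Experiment:
    hypothesis: str        # "We believe X will improve Y because Z"
    metric: str            # what the experiment is judged against
    evidence: list = field(default_factory=list)  # research behind the idea
    result: str = ""       # what actually happened
    learning: str = ""     # the validated learning to carry forward

exp = Experiment(
    hypothesis="Shortening the demo form will increase demo bookings",
    metric="demo bookings per week",
    evidence=["session recordings show drop-off at field 6",
              "two customer interviews mention form length"],
)
exp.result = "bookings up, but no-show rate rose too"
exp.learning = "shorter forms trade lead quality for volume"
print(exp.learning)
```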

"The underlying principle of testing things is simply to try and use evidence to validate ideas, as opposed to guessing. Nobody said this had to be a statistically significant controlled experiment performed only on high-traffic website user experience elements within an A/B testing platform"

Jonny Longden, Chief Growth Officer at Speero

The goal is to avoid gut instinct, HiPPOs (the Highest Paid Person's Opinion) or hunches at all costs. Any data-backed insight is better than no insight at all. The job of science is simply to pin down or reduce uncertainty, and there are many ways to do this.

Over time, you'll be able to accumulate enough learnings to have an intuition of what works and what doesn't, which makes developing new features, or shipping new campaigns, or optimising your sales funnels more effective.

Where Does Growth Method Fit In?

Growth Method is the GrowthOS built for teams focused on pipeline, not projects.

Growth Method helps teams to:

  • Identify and track your core metric

  • Refine questions and problems into hypotheses

  • Aggregate experiment data into a central platform

  • Engage your marketing and growth teams

  • Run experiments

  • Evaluate the results (quantitative and qualitative)

  • Engage company-wide stakeholders

We help companies build a scientific approach to growth through experimentation and a more data-driven culture. The end goal is to increase revenue and sales, not just your conversion rate.

"We are on-track to deliver a 43% increase in inbound leads this year. There is no doubt the adoption of Growth Method is the primary driver behind these results."

Laura Perrott, Colt Technology Services

Think of it this way: A/B testing is a scalpel. It's incredibly precise when you need surgery. But most of marketing isn't surgery - it's diagnosis, treatment, and ongoing care. You need a full medical toolkit, not just the sharpest blade.

Start with building an experimentation culture. The statistical significance will follow when you're ready for it.

Growth Method is the GrowthOS built for marketing teams focused on pipeline — not projects. Book a call at https://cal.com/stuartb/30min.

