Someone once asked me, “How many visitors do I need to determine the validity of my split-testing?” For example, if you’re split-testing two different headlines, how much traffic do you need to make a good judgment call on which headline is the real winner?
That’s an interesting question, and one with no real answer, other than “a lot.” And there’s a specific reason why…
Gary Halbert once said you need 40 actions. John Reese says 200.
Personally, I like 100. You see, both Halbert and Reese are right, since Halbert is referring mostly to offline direct mail and Reese to online marketing.
Offline, people receive mail and open it. They read it, and then they decide to take action.
There’s a delay, and many of the circumstances are ones you can nail down, depending on how targeted your mailing list is and how you decide to mail out your piece.
But online, so many variables come into play.
Online, people react more quickly than they do to offline direct mail, because there’s no delay. And traffic can come from so many sources, with so many outside influences that are far harder to control.
John Reese made a point on a teleseminar recently that I think is well said. On a coaching call with Johnathan Mizel, we were talking about Taguchi and multi-variable testing, and John mentioned that you need 1,000 actions with Taguchi to make a test really statistically significant.
What he said, and what I agree with, is this…
Taguchi was created for the manufacturing industry. The difference between split-testing manufactured parts and split-testing marketing ads (especially online) is that, in manufacturing, you’re dealing with physical components (like car parts, since Taguchi was originally created for the car industry).
Cars and car parts are fixed, inanimate and predictable.
But in marketing, you’re dealing with people… you’re dealing with psychology… which is individual, unpredictable and susceptible to change depending on so many outside influences.
So if you get 40 actions this week, you may make a judgment call. But it may have been influenced by:
- when people saw your ad,
- where they saw it,
- the mood they were in,
- the search they conducted,
- the mindset they were in,
- the specific websites they saw prior to viewing yours,
- what brought them to your website (was it an endorsement email? an affiliate link? a simple search? an AdWords ad? did they know about you before seeing your ad? did they see it in the morning or at night? Monday or Friday?)…
… And so on.
All of these factors can change dramatically online, with each and every result, visitor and test, and thus skew your results from one test to the next.
For example, 40 results this week may produce different statistics than 40 results next week.
(I’ve seen this. I made copywriting decisions based on small split-test results. But when tested and tracked again, the statistics were completely different. Even with Taguchi!)
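To see why a small batch of results can flip, here’s a rough Python sketch under deliberately idealized assumptions: traffic is split evenly between headlines A and B, and every visitor converts independently at a fixed rate (a hypothetical 2.0% for A and 2.4% for B). Real online traffic is never this stable, so real-world swings will be larger, not smaller. The function names are made up for illustration. The sketch computes the exact chance that the weaker headline has collected at least as many actions as the stronger one once a given number of actions has come in.

```python
import math


def binomial_pmf(k: int, n: int, q: float) -> float:
    """P(X == k) for X ~ Binomial(n, q), computed in log space to avoid overflow."""
    log_p = (
        math.lgamma(n + 1)
        - math.lgamma(k + 1)
        - math.lgamma(n - k + 1)
        + k * math.log(q)
        + (n - k) * math.log(1 - q)
    )
    return math.exp(log_p)


def wrong_winner_probability(total_actions: int, rate_a: float, rate_b: float) -> float:
    """Chance that the weaker headline A leads (or ties) after `total_actions` actions.

    Hypothetical model: 50/50 traffic split, fixed independent conversion rates,
    rate_b > rate_a. Ties are scored as a coin flip.
    """
    # With an even split, each action comes from B with probability rate_b / (rate_a + rate_b).
    q = rate_b / (rate_a + rate_b)
    n = total_actions
    prob = 0.0
    for k in range(n + 1):  # k = number of actions credited to headline B
        p_k = binomial_pmf(k, n, q)
        if 2 * k < n:
            prob += p_k  # A is ahead: you'd pick the wrong headline
        elif 2 * k == n:
            prob += 0.5 * p_k  # dead heat: a coin flip picks A half the time
    return prob


if __name__ == "__main__":
    for actions in (40, 100, 1000):
        p = wrong_winner_probability(actions, rate_a=0.020, rate_b=0.024)
        print(f"{actions:>5} actions: {p:.1%} chance of crowning the weaker headline")
```

Running it for 40, 100 and 1,000 actions shows the chance of crowning the wrong headline shrinking as actions pile up, even though nothing about the two headlines changed between runs.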
So, in my estimation, 100 is safe. It’s what I’ve personally been using in my testing. Is it perfect? Not at all. The more, the merrier.
The more traffic you generate, and the more sales or actions you produce, the better the result: the more statistically significant the test and the smaller the margin of error.
But 100 results give you more significance and a thinner margin of error than, say, 40. The bottom line is that you’re best off going with “actions” or “results,” whether those are sales, opt-ins or clickthroughs. Not visitors. (NEVER go with just “traffic.”)
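Here’s one way to see why actions matter more than raw visitors. This Python sketch uses the standard normal-approximation 95% interval for a conversion rate and expresses its half-width as a fraction of the rate itself. Algebraically, that works out to 1.96 × √((1 − p) / actions), so when conversion rates are small, the number of actions drives the margin of error almost entirely. The function name and the traffic figures are hypothetical, and the normal approximation is only a rough guide at small counts.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def relative_margin_of_error(actions: int, visitors: int) -> float:
    """Half-width of a normal-approximation 95% interval for the conversion
    rate, as a fraction of the rate itself (0.2 means +/- 20% of the rate)."""
    p = actions / visitors
    half_width = Z_95 * math.sqrt(p * (1 - p) / visitors)
    return half_width / p


if __name__ == "__main__":
    # Hypothetical (actions, visitors) pairs for illustration only.
    scenarios = [(40, 2_000), (40, 20_000), (100, 5_000), (1_000, 50_000)]
    for actions, visitors in scenarios:
        moe = relative_margin_of_error(actions, visitors)
        print(f"{actions:>5} actions / {visitors:>6} visitors: +/- {moe:.0%} of the measured rate")
```

The two 40-action scenarios land at nearly the same margin (about ±31% of the measured rate) despite a tenfold difference in traffic, while 100 actions tightens it to about ±19% and 1,000 actions to about ±6%.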