Editor’s Note: We recently made this totally swell thing called the M+R Toolshed.  It’s a place on the interweb where you can go and get some pretty neat – and free – tools you’ve always wanted to build a better, stronger, faster fundraising, PR, social media and advocacy machine. We’re kind of excited! This here Lab post is dedicated to the first shelf in the ‘Shed: Testing Tools!

It’s a wrenching fact that’s been drilled into our heads time and again: No matter how much data you shovel and how many spreadsheets you hammer out, if you don’t nail the details of testing protocol, your results are going to be screwy.

(You guys, #SorryNotSorry for all the tool metaphors. We can’t help ourselves.)

Testing is the only way to know if the general best practices, past experience, and gut instincts you’re using add up to solid strategy. But there is one thing worse than not testing your email and web tactics: running bad tests. If your tests aren’t giving you statistically significant, repeatable, reliable results, they aren’t just wasting your time. They could be flat-out lying to you.

Enter: M+R’s Toolshed! Here to make sure your tinkering produces actual knowledge you can actually rely on. Basically we’ve taken our amazing data team’s handy spreadsheets with even handier formulas and turned them into real-live-internet calculators just for you…to bookmark. To use. And then to use again. And again. So! To start us off…here they are: a trio of terrific tests* to help you answer three critical questions about every email test you run.

(*We also reallllllly like alliteration ‘round here.)

1) Are the differences in your fundraising or advocacy response rates statistically significant? Use the chi-square test (which looks like χ² in your college stats textbook and measures whether there is a significant difference between expected and observed frequencies in a sample). Oh, and use the phrase “chi-square test” at cocktail parties. People will be blown away by your smartness.

IRL Example: Let’s say you ran a test to see if adding a call-out-box to an email helps get more actions. After you split your group randomly and send out your test, you see that the call-out-box version had a response rate of 3% and the other had 2.9%. You’ll want to know if the difference in response rates between the two versions is large enough to be real, or if it could just be due to chance. You should run a chi-square test to see if you can detect a statistically significant difference in those response rates. Once you know that, you’ll have a better idea of whether to keep the call-out-box.
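For the curious, here is a minimal sketch of the math behind a chi-square test on the call-out-box example. It is not the Toolshed calculator itself, just a plain 2x2 chi-square test written with the Python standard library. The function name `chi_square_2x2` and the list size of 50,000 recipients per version are assumptions made up for illustration; the example above only gives the response rates.

```python
import math

def chi_square_2x2(responders_a, recipients_a, responders_b, recipients_b):
    """Chi-square test of independence for two response rates (1 degree of freedom)."""
    # Observed counts: [responded, did not respond] for each version.
    observed = [
        [responders_a, recipients_a - responders_a],
        [responders_b, recipients_b - responders_b],
    ]
    row_totals = [recipients_a, recipients_b]
    col_totals = [
        observed[0][0] + observed[1][0],
        observed[0][1] + observed[1][1],
    ]
    total = recipients_a + recipients_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if both versions really had the same response rate.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the p-value has this closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical list sizes: 50,000 recipients per version (an assumption).
# 3% of 50,000 = 1,500 actions; 2.9% of 50,000 = 1,450 actions.
chi2, p_value = chi_square_2x2(1500, 50000, 1450, 50000)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

With those made-up list sizes, a 3% vs. 2.9% split does not clear the usual 0.05 bar. The same rates on a much bigger list could, which is why list size matters as much as the rates themselves.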

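2) Is the difference in your average gift statistically significant? Use the t-test! (Going back to stats class, a t-test is different from a chi-square test because a chi-square test compares counts, like how many people responded versus how many didn’t, while a t-test compares the averages of two sets of numbers, like gift amounts, to determine if they are significantly different from each other.)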

IRL Example: What if you ran a test on the ask string on your donation page? The original page asks for “$20, $40, or $80.” You want to see if bumping up those amounts to “$25, $50, and $100” will make people give more. The results show that the same percent of people gave to each version, but the new ask string had an average gift of $47, compared to the original average gift of $45. The test version seeeeems higher, but is it high enough to know with confidence that it is the better option? You should run a t-test to see if there is a statistically significant difference in the average gifts from each version. If there is, use the $ amounts that lead to more $$$ on your donation page!
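Here is a rough sketch of what a t-test on two sets of gifts could look like under the hood. It uses Welch’s version of the t-test (which does not assume both groups have the same spread) and, to stay inside the Python standard library, approximates the p-value with a normal curve. That shortcut is reasonable when each version has lots of gifts and shaky when it has only a handful. The function name `welch_t_test` and the gift lists are hypothetical, and this is not necessarily how the Toolshed calculator does it.

```python
import math
import statistics

def welch_t_test(gifts_a, gifts_b):
    """Welch's t statistic for two lists of gift amounts, with an approximate p-value."""
    mean_a = statistics.fmean(gifts_a)
    mean_b = statistics.fmean(gifts_b)
    # Sample variances (n - 1 in the denominator).
    var_a = statistics.variance(gifts_a)
    var_b = statistics.variance(gifts_b)

    # Standard error of the difference between the two averages.
    std_err = math.sqrt(var_a / len(gifts_a) + var_b / len(gifts_b))
    t_stat = (mean_a - mean_b) / std_err

    # ASSUMPTION: normal approximation to the t distribution (fine for large samples).
    p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
    return t_stat, p_approx

# Hypothetical gift amounts; in real life, export one row per gift from your CRM.
original_gifts = [20, 40, 80, 40, 20, 40, 80, 20, 40, 65] * 30
test_gifts = [25, 50, 100, 50, 25, 50, 100, 25, 50, 20] * 30

t_stat, p_approx = welch_t_test(test_gifts, original_gifts)
print(f"test average = ${statistics.fmean(test_gifts):.2f}")
print(f"original average = ${statistics.fmean(original_gifts):.2f}")
print(f"t = {t_stat:.2f}, approximate p = {p_approx:.3f}")
```

Note that the test needs the individual gift amounts, not just the two averages: a $2 gap means a lot when gifts cluster tightly and very little when they are all over the map.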

3) Wait. Say you wanna get FANCY. Like, you want to evaluate both response rate and average gift AT THE VERY SAME TIME. How oh how can you figure out what’s most important? Use the Revenue per Recipient Test!

IRL Example: Say you’re curious about low dollar asks (you know, like, “Donate just $5!”). This is the kind of ask that has an impact on response rate AND average gift. (Check out this Lab post for more about low dollar asks, btw.) You send an email to half of your audience and, instead of the usual ask for any donation amount, you ask for $5. The $5 ask will almost certainly have a lower average gift, but it might also have a higher response rate. That means that the $5 ask could be the “winner” in a chi-square test, and the “loser” in a t-test. You have a conundrum. We have a solution. You need the “Revenue per Recipient” test. This test rolls the average gift and the response rate into a single number (total revenue divided by the number of people you emailed), and it will let you decide if there was a statistically significant difference in that number between the two versions.
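One common way to test revenue per recipient (and this is an assumption about the general technique, not a description of how the Toolshed calculator works) is to treat every recipient as a data point: donors count as their gift amount, everyone else counts as $0. Then you compare the two averages, much like a t-test. The sketch below does that with the Python standard library and a large-sample normal approximation for the p-value. The function names and all the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def revenue_stats(gifts, recipients):
    """Mean and sample variance of revenue per recipient, counting non-donors as $0."""
    mean = sum(gifts) / recipients
    sum_of_squares = sum(g * g for g in gifts)  # the $0 recipients add nothing here
    variance = (sum_of_squares - recipients * mean ** 2) / (recipients - 1)
    return mean, variance

def revenue_per_recipient_test(gifts_a, recipients_a, gifts_b, recipients_b):
    """Compare revenue per recipient between two versions of an email."""
    mean_a, var_a = revenue_stats(gifts_a, recipients_a)
    mean_b, var_b = revenue_stats(gifts_b, recipients_b)
    std_err = math.sqrt(var_a / recipients_a + var_b / recipients_b)
    z = (mean_a - mean_b) / std_err
    # ASSUMPTION: normal approximation, reasonable for email-sized lists.
    p_approx = 2 * (1 - NormalDist().cdf(abs(z)))
    return mean_a, mean_b, z, p_approx

# Hypothetical results: 1,000 recipients per version.
usual_ask_gifts = [50] * 10   # fewer, bigger gifts
five_dollar_gifts = [5] * 30  # more, smaller gifts

rpr_usual, rpr_five, z, p_approx = revenue_per_recipient_test(
    usual_ask_gifts, 1000, five_dollar_gifts, 1000
)
print(f"usual ask: ${rpr_usual:.2f} per recipient")
print(f"$5 ask:    ${rpr_five:.2f} per recipient")
print(f"z = {z:.2f}, approximate p = {p_approx:.3f}")
```

In this made-up example the $5 ask triples the response rate but still brings in less money per person emailed, which is exactly the kind of trade-off the separate chi-square and t-tests can’t settle on their own.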

Ok, so what now?

Go to the Toolshed and test out the testing tools with your own data. The tools are free for you to use! Go change the world, friends!

Tweet @mrcampaigns if you have questions as you test. And if you want to learn more about the nuts-and-bolts of setting up an email test, we’ve got a Lab post about that too!