Tuesday, March 30, 2010

5 Big Mistakes People Make When Analyzing User Data

I was trying to write a blog post the other day about getting various types of user feedback when I realized that something important was missing. It doesn’t do any good for me to go on and on about all the ways you can gather critical data if you don’t know how to analyze that data once you have it.

I would have thought that a lot of this stuff was obvious, but, judging from my experience working with many different companies, it’s not. All of the examples here are real mistakes I’ve seen made by smart, reasonable, employed people. A few identifying characteristics have been changed to protect the innocent, but in general they were product owners, managers, or director level folks.

This post only covers mistakes made in analyzing quantitative data. At some point in the future, I’ll put together a similar list of mistakes people make when analyzing their qualitative data.

For the purposes of this post, the quantitative data to which I’m referring is typically generated by the following types of activities:
  • Multivariate or A/B testing
  • Site analytics
  • Business metrics reports (sales, revenue, registration, etc.)
  • Large scale surveys

Statistical Significance

I see this one all the time. It generally involves somebody saying something like, “We tested two different landing pages against each other. Out of six hundred views, one of them had three conversions and one had six. That means the second one is TWICE AS GOOD! We should switch to it immediately!”

Ok, I may be exaggerating the actual numbers a bit, but too many people I’ve worked with simply ignored the statistical significance of their data. They didn’t realize that even a seemingly large difference can be statistically insignificant if the sample size is too small.

The problem here is that statistically insignificant metrics can completely reverse themselves, so it’s important not to make changes based on results until you are reasonably certain that those results are predictable and repeatable.

The Fix: I was going to go into a long description of statistical significance and how to calculate it, but then I realized that, if you don’t know what it is, you shouldn’t be trying to make decisions based on quantitative data. There are online calculators that will help you figure out whether any particular test result is statistically significant, but make sure that whoever is looking at your data understands basic statistical concepts before accepting their interpretation of the data.

Also, a word of warning: testing several branches of changes can require a MUCH larger sample size than a simple A/B test. If you're running an A/B/C/D/E test, make sure you understand the mathematical implications.
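To make this concrete, here's a minimal sketch in Python (standard library only) of the kind of check an online significance calculator does for you: a pooled two-proportion z-test run on the exaggerated landing-page numbers above. I'm assuming the six hundred views were split evenly between the two pages; that split, and the function name, are purely for illustration.

```python
from math import sqrt, erfc

def two_proportion_p_value(conv_a, views_a, conv_b, views_b):
    """Two-sided p-value for the difference between two conversion rates,
    using a pooled two-proportion z-test (normal approximation)."""
    pooled = (conv_a + conv_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = abs(conv_a / views_a - conv_b / views_b) / std_err
    return erfc(z / sqrt(2))

# The exaggerated example above: 3 vs. 6 conversions, assuming the 600 views
# were split 300/300 between the two landing pages.
p = two_proportion_p_value(3, 300, 6, 300)
print(f"p-value: {p:.2f}")  # roughly 0.31 -- nowhere near significant,
                            # even though one page converted "twice as well"
```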

Short Term vs. Long Term Effects

Again, this seems so obvious that I feel weird stating it, but I’ve seen people get so excited over short term changes that they totally ignore the effects of their changes in a week or a month or a year. The best, but not only, example of this is when people try to judge the effect of certain types of sales promotions on revenue.

For example, I've often heard something along these lines, “When we ran the 50% off sale, our revenue SKYROCKETED!” Sure it did. What happened to your revenue after the sale ended? My guess is that it plummeted, since people had already stocked up on your product at 50% off.

The Fix: Does this mean you should never run a short term promotion of any sort? Of course not. What it does mean is that, when you are looking at the results of any sort of experiment or change, you should look at how it affects your metrics over time.
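As a rough illustration of looking at a metric over time rather than just during the promotion, here's a small Python sketch that compares average daily revenue before, during, and after a sale. Every number and date in it is invented.

```python
from datetime import date

# Hypothetical daily revenue around an invented 50%-off sale (June 10-12).
daily_revenue = {
    date(2010, 6, 7): 1000, date(2010, 6, 8): 1050, date(2010, 6, 9): 980,
    date(2010, 6, 10): 2400, date(2010, 6, 11): 2600, date(2010, 6, 12): 2500,
    date(2010, 6, 13): 400, date(2010, 6, 14): 450, date(2010, 6, 15): 500,
}
sale_start, sale_end = date(2010, 6, 10), date(2010, 6, 12)

def report(label, days):
    revenues = [daily_revenue[d] for d in days]
    print(f"{label}: {sum(revenues) / len(revenues):.0f} average revenue per day")

report("before", [d for d in daily_revenue if d < sale_start])
report("during", [d for d in daily_revenue if sale_start <= d <= sale_end])
report("after",  [d for d in daily_revenue if d > sale_end])
# The spike during the sale looks great until you see the slump afterward --
# judge the promotion on the whole window, not just the sale days.
```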

Forgetting the Goal of the Metrics

Sometimes people get so focused on the metrics that they forget the metrics are just shorthand for real world business goals. They can end up trying so hard to move a particular metric that they sacrifice the actual goal.

Here’s another real life example: one client decided that, since revenue was directly tied to people returning to their site after an initial visit, they were going to “encourage” people to come back for a second look. This was fine as far as it went, but after various tests they found that the most successful way to get people to return was to give them a gift every time they did.

The unsurprising result was that the people who came back just for the gift didn’t end up converting to paying customers. The company moved the “returning” metric without actually affecting the “revenue” metric, which had been the real goal in the first place. On top of that, they now had the cost of supporting more non-paying users on the site, so the change ended up costing them money.

The Fix: Don’t forget the actual business goals behind your metrics, and don’t get stuck on what Eric Ries calls Vanity Metrics. Remember to consider the secondary effects of your metrics. Increasing your traffic comes with certain costs, so make sure that you are getting something other than more traffic out of your traffic increase!
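To put some numbers on the gift example, here's a quick back-of-the-envelope sketch of checking the real goal: does the extra revenue from the additional returning visitors actually cover the cost of the gifts and of serving those visitors? Every figure below is hypothetical.

```python
# Every figure here is invented, purely to illustrate the arithmetic.
extra_returning_visitors = 10_000   # extra second visits driven by the gift
conversion_rate = 0.005             # fraction of those visitors who end up paying
avg_order_value = 20.00             # revenue per new paying customer
gift_cost = 0.25                    # cost of the gift per returning visitor
serving_cost = 0.02                 # hosting/support cost per returning visitor

extra_revenue = extra_returning_visitors * conversion_rate * avg_order_value
extra_cost = extra_returning_visitors * (gift_cost + serving_cost)

print(f"extra revenue: ${extra_revenue:,.2f}")               # $1,000.00
print(f"extra cost:    ${extra_cost:,.2f}")                  # $2,700.00
print(f"net impact:    ${extra_revenue - extra_cost:,.2f}")  # -$1,700.00
# The "returning visitors" metric went up, but the actual goal (revenue) lost money.
```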

Combining Data from Multiple Tests

Sometimes you want to test different changes independently of one another, and that's often a good thing, since it can help you determine which change actually had an effect on a particular metric. However, this can be dangerous if applied carelessly.

Consider this somewhat ridiculous thought experiment. Imagine you have a landing page that is gray with a light gray call to action button. Let's say you run two separate experiments. In one, you change the background color of the page to red so that you have a light gray button on a red background. In another test, you change the call to action to red so that you have a red button on a gray background. Let's say that both of these convert better than the original page. Since you've tested both of your elements separately, and they're both better, you decide to implement both changes, leaving you with...a red call to action button on a red page. This will almost certainly not go well.

The Fix: Make sure that, when you're combining the results from multiple tests, you still go back and test the final outcome against some control. In many cases, the whole is not the sum of its parts, and you can end up with an unholy mess if you don't use some common sense in interpreting data from various tests.
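Here's a minimal sketch of that fix, using the same two-proportion test as in the statistical significance section and completely made-up counts: even if each individual change beats the control, the combined page still has to win its own comparison against the control.

```python
from math import sqrt, erfc

def two_proportion_p_value(conv_a, views_a, conv_b, views_b):
    # Same pooled two-proportion z-test as in the earlier sketch.
    pooled = (conv_a + conv_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return erfc(abs(conv_a / views_a - conv_b / views_b) / std_err / sqrt(2))

# Made-up conversion counts for the thought experiment above.
control        = (50, 5000)   # gray page, gray button
red_background = (80, 5000)   # red background, gray button
red_button     = (78, 5000)   # gray background, red button
combined       = (40, 5000)   # red button on a red background

for name, (conv, views) in [("red background", red_background),
                            ("red button", red_button),
                            ("combined page", combined)]:
    lift = conv / views - control[0] / control[1]
    p = two_proportion_p_value(control[0], control[1], conv, views)
    print(f"{name}: lift {lift:+.2%}, p-value {p:.3f}")
# The individual changes win, but the combined page has to earn its own
# significant result against the control -- the lifts don't simply add up.
```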

Understanding the Significance of Changes

This one just makes me sad. I’ve been in lots of meetings with product owners who described changes in the data for which they were responsible. Notice I said “described” and not “explained.” Product owners would tell me, “revenue increased” or “retention went from 2 months to 1.5 months” or something along those lines. Obviously, my response was, “That’s interesting. Why did it happen?”

You’d be shocked at how many product owners not only didn’t know why their data was changing but also had no plan for figuring it out. The problem was that they were generating tons of charts showing increases and decreases without ever really understanding why the changes were happening, so they couldn’t extrapolate from the experience to affect their metrics in a predictable way.

Even worse, sometimes they would make up hypotheses about why the metrics changed but not actually test them. For example, one product owner did a “Spend more than $10 and get a free gift” promo over a weekend. The weekend’s sales were slightly higher than the previous weekend’s sales, so she attributed that increase to the promotion. Unfortunately, a cursory look at the data showed that the percentage of people spending over $10 was no larger than it had been in previous weeks.

On the other hand, there had been far more people on the site than in previous weeks due to seasonality and an unrelated increase in traffic. Based on the numbers, it was extremely unlikely that it was the promotion that increased revenue, but she didn’t understand how to measure whether her changes actually made any difference.

The Fix: Say it with me: "Correlation does not equal causation!" Whenever possible, test changes against a control so that you can accurately judge what effect they’re having on specific metrics. If that’s not possible, make sure you understand ahead of time which effects you are LIKELY to see from a particular change, and then judge whether they actually happened. For example, a successful “spend more than $10” promo should most likely increase the percentage of orders over $10.
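For the “$10 promo” example, the check might look something like the sketch below (the order counts are invented): compare the share of orders over $10 on the promo weekend against a baseline weekend instead of just eyeballing total revenue.

```python
# Invented order counts for the two weekends being compared.
baseline_over_10, baseline_total = 210, 1000   # baseline weekend
promo_over_10, promo_total       = 262, 1250   # promo weekend (more traffic overall)

baseline_share = baseline_over_10 / baseline_total
promo_share = promo_over_10 / promo_total

print(f"baseline weekend: {baseline_share:.1%} of orders over $10")
print(f"promo weekend:    {promo_share:.1%} of orders over $10")
# Revenue was higher on the promo weekend, but the share of $10+ orders didn't
# budge, so the extra traffic, not the promo, explains the increase. To be
# rigorous, run the two-proportion test from the first sketch on these counts.
```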

Also, be aware of other changes within the company so that you can determine whether it was YOUR change that affected your metrics. Anything from a school holiday to an increased ad spend might affect your numbers, so you need to know what to expect.

I want your feedback!

Have you had problems interpreting your quantitative data, or do you have stories about people in your company who have? Please, share them in the comments section!

Also, if your company is currently working on getting feedback from users, I’d love to hear more about what you are doing and what you’d like to be doing better. Please take this short survey!