The Tester’s Dilemma: When to Test, When to Trust
Full Transcript
Another episode, same setting, same kitchen, here in Berlin with Alex, in person. Same temperature. Same temperature, yeah. We are riding the wave of good weather. Yeah, I'm glad I was gone last week. Where have you been? Lisbon. Oh. When the weather here was terrible, it was really nice over there. And when the weather got really bad there, it turned really nice here. I'm jealous of your tan. So Alex, last time we talked about building products on steroids, and I wanted to follow up on that. My topic for today is A/B testing. I think it's something you might like. Last time, just to quickly catch up, I had started a little side project and learned a lot about product management and how to build products quickly. But then I faced a challenge, because building "cool" products is one thing; the other thing is how you sell and market them. And obviously that's the toughest challenge, right? There's a reason why 90% of startups still fail these days. That's the reason there are big budgets, and that's the reason there are usually big teams. Yeah. I know that scrappy, low-budget projects are even tougher; that's something I experienced. Nevertheless, I noticed that once you have built a little product based on rapid prototyping, like we talked about last time, you reach a point where you have to bring it out into the real world. Of course, you can play the long game of building up a community with an SEO-optimized website, whatever. One approach I started with was Facebook ads. Just to give you some background, I have a friend in Berlin who lives in a 220 square meter penthouse apartment. And do you know what he does for a living? He runs Facebook ads for cat insurance. He generates leads and sends them over to Allianz. Is that a specific niche, does he have multiple clients, or does he only acquire cat leads? He acquires the leads and sends them to the insurers, and he makes money on a commission. I mean, there you go, another nice business model. Absolutely. He taught me a lot about Facebook ads, and he told me, hey, Facebook is not dead when it comes to ads. So I said to myself, let's give it a try. And this is actually where the A/B testing comes in, because I needed to figure out, okay, who is my target audience, which I kind of knew. The next question is, how do you talk to them and how do you get them to click the ad, land on the landing page and then convert? So we're talking about the customer journey. And I built a sales funnel, so to say: you click on my ad, you fill out a survey, you give me your email address, you get test results in return, and then you land in my email funnel. Of course, you don't know which ad will work and which won't. So you create multiple ads and see which ones work. And once you have a winner, you take that winner and optimize it. For example, if you have an image with a title on it, you figure out which of those images with text work, and then you start optimizing the text. So you see, there's already a process behind that. And once people convert and end up in my sales funnel, I'm going to send them emails.
And what I just started realizing, or rather a question for you: what's the most important thing when it comes to email marketing? Obviously, people need to read your email and convert. And what is the thing that makes a person open the email or not? First impression, the subject line. The subject line, exactly. So the first thing you do, at least something I learned for myself, is that I need to optimize the headlines, the subject lines, so that people actually click and open the email. So I started A/B testing just the subject line. And once I identified a winner, I started optimizing the text inside that email. And so you go to the first email, then to the second email, and yeah, I learned that A/B testing is actually a big thing, because you A/B test your ads, you A/B test your subject lines, you A/B test the content of the email, and maybe there's a CTA inside that you want to A/B test as well. Then I realized, okay, this is actually a lot of fun. Obviously I'm collecting a lot of data, but I'm also running so many A/B tests for so many different things that I might end up in hell. Why would you? No, I'm actually fine, I just want to talk to you about it today. Obviously, the most important thing with A/B testing is traffic, right? Especially if you want to go very granular. Let's look at some of the companies that successfully did a lot of A/B testing, the booking.coms, the Amazons, Google, which famously tested dozens of shades of blue on its calls to action to see what works. If you go into that level of granularity, you need a ton of traffic to be able to say whether something actually has an influence or not, because the effect of small changes is probably very small overall. So you need higher numbers to be sure: if you're testing and a metric changes by one percentage point, the number of users you need behind it is crazy. And then you even need to ask yourself, is it really worth optimizing for one percentage point? It is if you make millions in revenue a week. It's not if you're a small company, right? And I think it also comes down to how significant the results are. I can send 20 people into an A/B test, 10 and 10, and there are so many factors and outliers in play that you might interpret the A/B test wrongly. So you would really have to see a massive difference. And you also need to look at the numbers overall: what is your normal conversion rate, what is your normal open rate? So that you make sure you don't interpret it wrong. If you have a 10% open rate, it's kind of difficult with 10 people in each group, right? Because 10% of 10 people is one person. So it could be that you have zero opens in one group and five in the other, which gives you the impression of a 0% versus a 50% open rate. But in the end it's probably meaningless, because the numbers were too low. So obviously there's statistical significance, and everyone can Google the thresholds, or ask ChatGPT to help them figure out when something is statistically significant, or A/B testing tools can tell you that in many cases. But traffic, how different the numbers are, and what the normal conversion rate in a specific flow is, those are the important factors.
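To put a rough number on that "crazy" amount of users, here is a minimal sketch of the standard two-proportion sample-size calculation, assuming 95% confidence and 80% power; the baseline rates and lifts are illustrative assumptions, not numbers from the episode:

```python
from statistics import NormalDist
from math import sqrt, ceil

def users_per_variant(p_base, p_variant, alpha=0.05, power=0.80):
    """Rough sample size per arm to detect a shift from p_base to p_variant
    with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    p_bar = (p_base + p_variant) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_base * (1 - p_base) + p_variant * (1 - p_variant))) ** 2
    return ceil(numerator / (p_base - p_variant) ** 2)

# A one-percentage-point lift on a 10% conversion rate needs on the order of
# fifteen thousand users per variant...
print(users_per_variant(0.10, 0.11))   # ~14,750 per arm
# ...while a big jump (10% -> 20%) needs only a few hundred.
print(users_per_variant(0.10, 0.20))   # ~200 per arm
```

That gap between detecting tiny lifts and detecting big ones is exactly why the granular tests of the booking.coms of this world need traffic that small projects simply don't have.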
I think it's tricky, and I see this in a lot of companies, where people say, oh yeah, let's A/B test it, just because they cannot agree on something or nobody wants to take a decision. A/B tests are often used just to resolve these conflicts. You say A, I say B, okay, let's A/B test it. And then you don't have a lot of traffic behind it and you jump on the first spike because you want to interpret something. But in the end, it could be that you had five outliers in there that completely messed up your data, and now you're optimizing for the wrong thing. And it's good that you mention that, because I approach it the way you said. Where do you start? Obviously at the point where the funnel is widest, which is the ad level, and then the first email. At least that's how I approach it, so please challenge me. But now, and this is actually a good topic for discussion: I am on a low budget, which means I have an ad budget on Facebook of 25 euros a day. I have very effective ads, I get around 20 leads per day, which means my cost per lead is roughly 1.20 euros, something like that, which in my opinion, and also according to my cat insurance friend, is very, very good. I mean, it is, right? A lot of companies with 20 euros wouldn't even get half a client. Absolutely. However, the point is still: 20 leads per day. And now I want to do A/B testing for, let's say, an email campaign, or for my first email. Of course I don't have a lot of traffic, but this is also something I wanted to talk to you about. For the first email, I have two headlines, and around 50 people in each group, which is not that much, but you see different open rates. The first email has an open rate of 60%, which is already a lot. The other one has 30%. And my point of view is: even though it's not statistically significant, I see a trend. And my question is, is recognizing trends okay for making decisions, or would you rather say, hey, take a longer time to test? I could ask ChatGPT, but AlexGPT is much cooler, so that's why I'm asking you. I can only give you my own opinion, it's not scientifically based. In general, yes, trends are relevant. I think if you have 50 people in each group and the difference is massive, then the question is: is it likely that a lot of people who would have converted anyway ended up in that one group by coincidence, or is there actually something about the title? I think working with assumptions, or having a different hypothesis for each of these headlines, helps. Is one of them bolder, more extreme, and so on, which could explain why it grabs more attention? Can you justify the result somehow? Again, if a company is then going to throw another million of budget behind that headline, I would probably test more to get a bit more statistical significance on the result. But if you're operating with a very low budget anyway, you can already make your assumptions and say, okay, because the other one didn't perform well, let me try and throw in another headline, and like this, your winner stays in the test. Absolutely.
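For a gap as wide as 60% versus 30% on roughly 50 people per variant, a plain two-proportion z-test is a quick sanity check. A minimal sketch, using the rough counts from the conversation as illustrative inputs:

```python
from statistics import NormalDist
from math import sqrt

def two_proportion_z_test(opens_a, n_a, opens_b, n_b):
    """Two-sided z-test for the difference between two open rates."""
    p_a, p_b = opens_a / n_a, opens_b / n_b
    p_pool = (opens_a + opens_b) / (n_a + n_b)            # pooled rate under the null hypothesis
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# ~60% vs ~30% open rate with ~50 recipients per headline
z, p = two_proportion_z_test(30, 50, 15, 50)
print(f"z = {z:.2f}, p = {p:.4f}")   # z ≈ 3.02, p ≈ 0.003 -- a gap this large is unlikely to be pure noise
```

With a smaller gap, say 60% versus 50% on the same group sizes, the same test comes nowhere near significance, which is exactly the point about only trusting a trend when the difference is massive.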
And you can see: if that 60% you mentioned suddenly drops massively, then maybe I was just lucky with those initial 50 people, right? If it stays high, it means, okay, the headline is actually good. So while you're testing a new headline that maybe outperforms your current winner, you can further health-check the winner from the previous test. And then you can always combine a lot of these tests, especially if you look at the funnel. It doesn't mean you first need a result on the subject line before you can test the actual content of the email. Because if you're sending to group A and group B, the people who opened the email become a new 100%, and within that 100% you can again test two different versions of the content. Exactly. That's what I'm doing. So you can have A/B tests running across all the different stages. And then, ideally, when you're looking at these things, and now we're talking more about the marketing aspect itself, you never stop optimizing. You always keep a winner, and you always throw other options in. Try to add a qualitative layer to it. It can be your own hypothesis, it can be talking to customers: hey, why did you open the email? So that you have a layer of understanding of what made people click on it, and you can stay in that space and keep optimizing. That solves some of the problem of making sure the data makes sense; with 50 or 100 people, it's always difficult. But if you see that the conversion rate stays high on the same headline, you know it's a good headline. If it then drops, well, you were lucky and had a couple of people who just open every email. And you know what, while you were telling me all this, I realized I maybe made a little mistake. Let's talk about the very first email that I'm A/B testing. I clearly saw a winner, in my opinion. And then I started changing the loser, I changed its headline. But I just realized I could have created a new version and tested that instead of modifying the loser, because now the data is not 100% clean. So to everyone who's listening: I made a beginner's mistake, a whoopsie. After I changed the loser's headline, the open rate started increasing, which looks like an improvement. Nevertheless, it would have been cleaner to create a new version, track it from scratch, and maybe even send more traffic to it. Also because the cool thing with my A/B testing tool, the email tool, is that you can say what percentage goes to A and how much to B or C. So I'm just realizing, hey, next time... I mean, you could also just throw in a C and deliver 0% to B. Yeah, exactly. Because then you keep A as the baseline, it keeps running, it keeps collecting data. You want that number to be consistent across the whole test, because at the beginning it will jump a lot. If you have 10 users, it easily goes from 10% to 80% and back down to 20%.
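The "throw in a C and deliver 0% to B" idea boils down to weighted traffic allocation while the baseline keeps running. Here is a minimal sketch of how such a split could be done deterministically per recipient; the hashing approach and the weights are illustrative assumptions, not how any particular email tool actually works:

```python
import hashlib

# Illustrative weights: keep the winner A running as the baseline,
# retire B, and send the remaining traffic to the new challenger C.
WEIGHTS = {"A": 50, "B": 0, "C": 50}

def assign_variant(email: str, weights: dict[str, int]) -> str:
    """Deterministically map a recipient to a variant according to the weights."""
    total = sum(weights.values())
    # Hash the address so the same person always lands in the same bucket.
    bucket = int(hashlib.sha256(email.encode()).hexdigest(), 16) % total
    threshold = 0
    for variant, weight in weights.items():
        threshold += weight
        if bucket < threshold:
            return variant
    raise ValueError("weights must sum to a positive total")

print(assign_variant("reader@example.com", WEIGHTS))  # "A" or "C", never "B"
```

Because A keeps receiving a constant share, its open rate keeps accumulating data and stays comparable across the old and the new test.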
So you keep your baseline, it runs and runs, you keep collecting data, and then you try to do the same with the new one. So that was, honestly, all I needed from you. I mean, the interesting thing is, I am not an A/B testing genius, so do look some of this up. That's just the general way: if I take everything I learned about quant and qual research, statistics and so on, I feel that's the approach, and it's also the approach I've been working with in different setups when I was responsible for setting certain tests up. Believe it or not, everything I'm telling you sounds very logical, and this is how I did it, but I put a lot of thought into it, going from ad A/B testing to headline A/B testing to content A/B testing to CTA A/B testing and so on and so on. And to me, it's a lot of fun. It's also a tough game, especially when you do not have that much traffic. And believe it or not, two days ago I went back to my university statistics material and started teaching myself the math again, because I realized that if I do A/B testing, I want to do it correctly and as cleanly as possible. And for sure, with a low budget you always test at a lower quality than with the high-volume traffic that I don't have. But maybe for someone who, let's say, works for a company and wants to try something out, to prototype something, to do some low-budget testing on their own, I think this might be helpful. That's why I wanted to pick this topic up today. One more thing, sorry, I need to add this. I believe for product managers these days it's so important that you do these things, that you know these things, so that you can grab a little bit of company budget, go to marketing, and figure it out together. And my experience is that many people are not following the process I just described, because with a high budget, things work on a different level, right? But it feels like I have to reinvent the wheel a little bit and make the learning for myself, and it's super valuable. I believe I became a much, much better product manager than I used to be, simply because I'm doing it myself. Oh, yeah. I think we learn the most by doing things. And I think the important thing, and I don't know how many times I've said this in this podcast or in previous episodes, is that people question what they do, why they do it, and question each of these steps. Because again, I've seen A/B tests used as a massive excuse in so many organizations: for people who don't want to take decisions, for people who can't fight out an argument, for people who don't want to take responsibility. So they just put it off with, oh yeah, let's run an A/B test. And then later it's, oh, but we A/B tested it, that's why we went for this version. I believe people need to have strong opinions. I want people with strong opinions. People should have strong opinions. Fair point. Question them, question yourself, and also question A/B tests. The thing is, there are so many things you can do wrong if you just use it as an excuse or as a way of pushing the responsibility off onto, I won't even say someone, onto something, right?
Like testing too many things at once: it doesn't make sense to test two different landing pages where the whole content changes, the whole structure changes, your call to action changes, where literally everything changes. How would that be a good test? Yeah, but that's a test inside a test inside a test, right? Yes. And it comes down to this: if you have a huge amount of traffic, you can do really good multivariate testing. Yeah. Again, take the example of Booking. You can read a lot about their A/B tests and what they did. booking.com, you mean? Yeah. They have, I don't want to give a wrong number, but hundreds of tests running simultaneously. And that works if the traffic is high enough. I can change the content, the structure and the call to action on my page if I have enough traffic to see the impact of the same call to action coming back on different pages and having an influence on the conversion rate. But we're talking about thousands, millions of views and users that allow you to run tests like that. If you're not in that space, and I'm sure a lot of people listening to this are not, you need to be really smart about how you test what you want to optimize. And that's something I at least tried: you need to structure your A/B testing from top to bottom. Where do you start? On the ad level? Okay, on the ad level I test two ads against each other. On the email level? Okay, I start with the headline. So you move yourself down the funnel instead of going broad all the way, right? At least that's my philosophy; if I'm wrong, please challenge me. But I also realized it's okay to do multiple tests, as long as they are clearly separated per step. I can't run three A/B tests on one level, on the ad level. I cannot do 10 different images with 10 different labels on them. And the same with emails: I cannot have six versions, A, B, C, D, E, F, with different headlines, different content, different CTAs. That doesn't work. You either have the same headline with different images or the same image with different headlines. Yeah. That's one test. That's good. The second you start mixing both, again, you need way more traffic to really say anything, as the sketch below shows. And you still need to track it very cleanly. Even if you do multivariate testing, you have to have good documentation. You have to have a clear hypothesis for each test. You have to document what you want to change and why, where you're coming from, where you want to go, whom you're addressing. So now we're making a science out of it. And then you also need to document the learnings. I think that's the sad thing: some A/B tests could have brought really valuable insights, and then you run the same A/B test three months in a row because someone has the same idea again. And talking about this, it's good that you're saying that, because documentation is important, and I think that's a good CTA for this podcast. Go to the Product Bakery website and sign up for the Product Bakery newsletter, because I'm going to start documenting my whole process, my learnings and examples in that newsletter, starting next week or in two weeks. I'm building a little AI helper for myself to take as much of that work off my hands as possible. But I want to document the whole process.
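To see why changing everything at once blows up the traffic requirement, it helps to count the test cells: every independently varied element multiplies the number of combinations, and each combination still needs its own usable sample. A minimal sketch with purely illustrative element names and numbers:

```python
from itertools import product

# Illustrative landing-page elements; the variant names and counts are made up.
elements = {
    "headline": ["H1", "H2", "H3"],
    "hero_image": ["img_a", "img_b"],
    "cta_label": ["Start now", "Get my results"],
}

USERS_PER_CELL = 1_000  # assumed sample you want in every combination

cells = list(product(*elements.values()))
print(f"{len(cells)} combinations")                   # 3 * 2 * 2 = 12
print(f"{len(cells) * USERS_PER_CELL} users needed")  # 12,000 for the full multivariate test

# Testing one element at a time instead needs only 3 + 2 + 2 = 7 groups in total,
# which is why low-traffic setups separate the tests stage by stage.
sequential_groups = sum(len(v) for v in elements.values())
print(f"{sequential_groups} groups if tested sequentially")
```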
So if you are interested in following my current product learnings, our product learnings, feel free to sign up. Beautiful. With that said, thanks to everyone who made it to the end, and talk to you next week. Bye bye. This was the Product Bakery. All links can be found in the podcast description, and make sure to follow and subscribe for weekly episodes on all podcast platforms as well as YouTube.