Today on Growth, Facebook Messenger’s Head of Growth, Darius Contractor, joins Matt to talk through experiment development, design, and tracking, all using a system Darius developed called EVELYN (Experiment Velocity Engine Lifting Your Numbers). With EVELYN, growth teams can input and prioritize their experiments as well as track them all the way from initial speccing and design to build and completion.
Matt and Darius also talk through Darius’s background in growth, time spent investing, his work at Dropbox and now his tenure as Head of Growth at Facebook Messenger.
For more on EVELYN you can visit bit.ly/evelyn-airtable and be sure to tune in for the full episode.
Matt Bilotti: Hello and welcome to another episode of Growth. I’m your host, Matt Bilotti. I am really excited today to have Darius Contractor joining us. We are going to talk about a system that he built that we wound up adopting at Drift and it did wonders for us. Darius, thank you so much for joining us. Would you like to give a quick introduction on yourself?
Darius Contractor: Sure. Thanks for having me, Matt. My background’s in growth. I got a CS undergrad and was a programmer to start, and I started Tickle.com, which was a big growth shop back in the early days of the internet with viral tests and quizzes. I then went on to social networking as VP of Engineering at Bebo.com and did some investing for a while. After that, I was at Dropbox as a growth engineering leader as well as a growth product leader. Most recently, I’ve joined Facebook as Head of Growth for Facebook Messenger.
Matt: Very cool. It is an amazing background. It was at Dropbox that you developed this specific system that you call EVELYN. What is EVELYN?
Darius: Yeah, that’s right, Matt. EVELYN is a tool that stands for Experiment Velocity Engine (Lifting Your Numbers). The idea is it’s pretty much an experiment tracker, an organizing tool that allows you to put in all your experiments, prioritize them well, track them from speccing, designing, building and completion, and also see how you did: opportunity sizing on the way in, results on the way out. Overall, it gives you a set of processes and a source of truth for your experiments that I think can really help a team accelerate how they build, and build the right experiments, getting you to more growth wins faster.
Matt: Got it. For us, I would agree with basically all of that. It really helped level us up. One of the things that it did too was it moved us away from debating emotions, like, “We think that this thing is more important to do next,” and talking at that level, and instead moved us to a level of discussing the underlying assumptions and the actual data around those things. You’re debating data instead of what things should we do next.
Darius: That’s part of the idea. One thing I saw at Dropbox is a lot of really smart people build a lot of things, but not always utilizing all their genius and not always being able to share their insights with one another as they prioritize experiments and build experiments. I think you’re right. One of the things that EVELYN does is it allows people to put in more specifically how much they think the impact is. First, with T-shirt sizing, where you say small, medium, large, and then T-shirt sizing the engineering costs. It’s kind of small, medium, and large effort. Even with that first round of it, you can start having conversations like, “Oh, I thought this would be easy.” “No, it’s not that easy because of this thing.” “Oh, I get it now.” That can help make some of the discussions about what is worthwhile to do next more specific, versus just saying, “I think this is a good idea.” “No, I think this is a good idea.”
Matt: Got it. This was used, and presumably still is being used, across … How many different growth squads were there at Dropbox?
Darius: When I left, I think there was over 20 different teams working on growth with different roadmaps, different surface areas and in some cases, different metrics. That was a lot of complexity. Part of the reason that we built this is just to keep track of all the growth happening. I think you don’t need all the power of EVELYN if you just have a small growth team. But I think some of the prioritization and some of the source of truth-tracking can be valuable for any size team.
Matt: Got it. Can you walk through the rough process of the day to day type use of EVELYN? And then I want to talk about the outputs that you get with it.
Darius: For sure. There’s a few macro steps. First, putting in your experiment, then prioritizing it, building the best experiments, and then looking at how you did. One thing that I like about having a central place that’s as powerful as EVELYN: EVELYN is built on top of Airtable, and so it’s kind of a general case database system. One nice thing about having that powerful of a system supporting EVELYN is that anyone can fill out the experiment form and submit an experiment. So you have this nice long list of experiments, hopefully, that allows you to pick from a good selection of options. Once you have that list, you go in and T-shirt size them, so impact and effort, small, medium, large. Then the system automatically gives you a T-shirt score, which prioritizes the ones that are high impact, low effort. Then, with the highest T-shirt scores, what I encourage people to do next is put in an opportunity size.
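A minimal Python sketch of how a T-shirt score could be computed. The episode does not give EVELYN's actual formula, so the point values, the impact-over-effort ratio, and the experiment names here are all assumptions for illustration.

```python
# Hypothetical point values; EVELYN's real scoring is not described in the episode.
SIZE_POINTS = {"S": 1, "M": 2, "L": 3}


def tshirt_score(impact: str, effort: str) -> float:
    """Return a score that is highest for large impact and small effort."""
    return SIZE_POINTS[impact] / SIZE_POINTS[effort]


# Hypothetical experiment ideas with T-shirt sizes for impact and effort.
experiments = [
    {"name": "Update CTA copy", "impact": "M", "effort": "S"},
    {"name": "Redesign payment page", "impact": "L", "effort": "L"},
    {"name": "New onboarding email", "impact": "L", "effort": "S"},
]

# Sort so that high-impact, low-effort ideas come first.
ranked = sorted(
    experiments,
    key=lambda e: tshirt_score(e["impact"], e["effort"]),
    reverse=True,
)

for e in ranked:
    print(e["name"], tshirt_score(e["impact"], e["effort"]))
```

Any monotonic scheme would do; the only property that matters is that high impact and low effort sort to the top.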
So that’s actually more of a deep dive for my metric, whether that’s ARR (the amount of money you’re going to make), number of users, conversion rate, or whatever metric you’re trying to optimize for: going in piece by piece with the UI and saying, “Okay, we’re going to add this button here. How many people are going to see the button? What’s a reasonable conversion rate for them to click on the button, and then how many people are realistically going to convert after they click on the button on the next page?” If you put all those numbers together, what opportunity size do you come up with? For the number of people who will see and use this experience, how much can we really believe it’s going to move this metric?
So you might do all of that for one experiment and say, “Hey, I really believe this can move revenue by $1 million a year. That’s great. This other one that I thought was a really good experiment can probably only move revenue by 300k, and that’s based on how it integrates with the site and what reasonable conversion rates we can guess.” Now, some of that’s finger in the air. You’re not going to have exact numbers for all these conversion rates and everything. But it does inspire the team to go deep on the data, find as many numbers as they can on the existing site. You’re going to at least find the number of people who are viewing your start page, so to speak.
Then, also, over time, build better and better gut feels as to how people go through your site. Your guess might be that 20% of people are going to click the button the first time. Well, you build and launch it and you see that 5% of people click the button. You’re like, “Okay, next time I do one of the experiments, I’m going to assume that 5% of people are going to click this button.”
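The funnel arithmetic Darius walks through can be sketched in a few lines of Python. Only the 20% guess and the 5% observed click rate come from the conversation; the viewer count, downstream conversion rate, and value per conversion are made-up placeholders.

```python
def opportunity_size(viewers, click_rate, convert_rate, value_per_conversion):
    """Multiply through the funnel: see the button, click it, then convert."""
    return viewers * click_rate * convert_rate * value_per_conversion


# Hypothetical inputs, not figures from the episode.
VIEWERS_PER_YEAR = 1_000_000
CONVERT_RATE_AFTER_CLICK = 0.10
VALUE_PER_CONVERSION_USD = 50.0

# Initial gut feel: 20% of viewers click the button.
guessed = opportunity_size(
    VIEWERS_PER_YEAR, 0.20, CONVERT_RATE_AFTER_CLICK, VALUE_PER_CONVERSION_USD
)

# After launch, the observed click rate is 5%, so reuse that for the next estimate.
recalibrated = opportunity_size(
    VIEWERS_PER_YEAR, 0.05, CONVERT_RATE_AFTER_CLICK, VALUE_PER_CONVERSION_USD
)

print(f"guessed: ${guessed:,.0f}  recalibrated: ${recalibrated:,.0f}")
```

Because the estimate is a plain product, a click rate that turns out four times lower shrinks the opportunity size by the same factor.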
Opportunity sizing can be very hard to do accurately. That’s part of the reason a lot of teams, even great growth teams, don’t go deep on opportunities, I think. Because they feel like it’s too much of a tax to getting something out the door. But what I’ve learned over my years doing it is that while it is a bit of a tax up front, it gets much easier over time. If you put in 10 units of effort to opportunity-size the first project, it ends up being three units for the second project and then two units for all following projects. Because it turns out that eight of the things you did for the first project are just the same for all the projects. It’s the same surface, it’s the same conversion rate, it’s the same data pool. It ends up that you just get really good at it really fast, and you surprise yourself by how easy it is. Then, honestly, it becomes like flossing. The first few times you floss you’re like, “This is annoying.” Then, when you don’t floss, you’re like, “I feel dirty. I need to go floss.” You know?
Darius: Like, “Why are we running experiments that we don’t have opportunity sizes for?” Eventually, your team will be saying this.
Matt: Yeah. It’s funny because we … We picked it up and we hadn’t actually seen your EVELYN system, the template of it, or the opportunity sizing stuff. So we picked that up as well. We backed into it a little bit differently, which I don’t know if it’s better or not. I think it’s just a different way to think about it: we figured out the unit economics for each of our levers. So we figured out that an active user is worth a dollar. A signup on this part of the website is worth 25 cents. Then, our opportunity sizes were how much we think we can move that number, and then it would … Our Airtable setup basically multiplies out what the revenue output of that is. It’s kind of funny how we did it in a vacuum and … I don’t know. We definitely have to play around with the way that you have set it up.
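Matt's unit-economics approach amounts to one multiplication per lever. A rough Python sketch follows: the two unit values are the ones he quotes, while the lever names and the expected lifts are hypothetical.

```python
# Unit values quoted in the conversation: an active user is worth $1.00,
# a signup on one part of the website is worth $0.25.
UNIT_VALUE_USD = {
    "active_user": 1.00,
    "signup": 0.25,
}


def revenue_impact(lever: str, expected_lift: float) -> float:
    """Convert an expected change in a lever's count into dollars."""
    return UNIT_VALUE_USD[lever] * expected_lift


# Hypothetical experiments: how far each is expected to move its lever.
signup_experiment = revenue_impact("signup", 4_000)
active_user_experiment = revenue_impact("active_user", 1_500)

print(signup_experiment, active_user_experiment)
```

Expressing every lever in dollars is what lets experiments on different surfaces be ranked against one another.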
Darius: I think that makes total sense. I mean, really, what it comes down to is making your best possible guess for every experiment of how much it’s going to realistically affect your target metrics. It sounds like you did that exact thing. It sounds like you went pretty deep on the data and got a good understanding of, for each different entry point, what’s the final value in revenue of that entry point? I think having those numbers that you can multiply through, while never perfect, is enough that you can get some really good balancing between the different parts of the site. What are some of the results of actually doing that for your team once you got the whole system set up?
Matt: Once we had the whole system going, it was really cool because it allowed us to, one, really clearly align ourselves with the rest of the business, which was around revenues. The rest of the business is focusing on revenue. Instead of us having our own language of, “Oh, yeah, we can get this many more PQLs and whatnot,” we could speak in the same language as the rest of the company. So that was really helpful. It also really easily automatically prioritized the stuff for us. It helped us move faster and get more experiments out. Then, the other part, which I would say we certainly haven’t perfected yet, and I’d love if you expand on this for how it worked for you at Dropbox, is it became this tool that I can go show to the executives or other leaders across the company. They can look at it and say, “Oh, the growth team has $500,000 in potential ARR in experiments being built right now.”
Darius: That’s one of my favorite things about the tool is that you can get a growth pipeline for each of your metrics. Effectively, what this does is you make a view in Airtable that says show me all the experiments that are ideas to wrapping up or all the experiments, actually, from idea to complete for this given metric, say ARR. Then, what you can do is you can sum up the opportunity size in each bucket. So for your ideas, you might not have opportunity size yet because you haven’t sized them yet, but at least you have a count of ideas. You can say, “Okay, we have 16 ideas, and for my sized ideas we haven’t started any work on it. We haven’t specced it or designed it, but we have opportunity-sized it.” If you sum those up, we have, like you said, 500k in opportunity size in that bucket.
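The per-metric pipeline view described here is essentially a group-by on status with a count and a sum. Below is a small Python sketch of that idea; the field names, status labels, and records are assumptions rather than EVELYN's actual Airtable schema.

```python
from collections import defaultdict

# Hypothetical records; "opportunity" is None until an idea has been sized.
experiments = [
    {"name": "Idea A", "metric": "ARR", "status": "idea", "opportunity": None},
    {"name": "Idea B", "metric": "ARR", "status": "idea", "opportunity": None},
    {"name": "Sized C", "metric": "ARR", "status": "sized", "opportunity": 300_000},
    {"name": "Sized D", "metric": "ARR", "status": "sized", "opportunity": 200_000},
    {"name": "Running E", "metric": "ARR", "status": "running", "opportunity": 150_000},
    {"name": "Other F", "metric": "signups", "status": "running", "opportunity": 900},
]


def pipeline(records, metric):
    """Count experiments and sum opportunity size per status for one metric."""
    buckets = defaultdict(lambda: {"count": 0, "opportunity": 0})
    for record in records:
        if record["metric"] != metric:
            continue
        bucket = buckets[record["status"]]
        bucket["count"] += 1
        # Unsized ideas still count, but add nothing to the total.
        bucket["opportunity"] += record["opportunity"] or 0
    return dict(buckets)


arr_pipeline = pipeline(experiments, "ARR")
for status, totals in arr_pipeline.items():
    print(status, totals)
```

In this made-up data the "sized" bucket sums to 500k, mirroring the kind of figure Matt mentions showing to executives.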
Then, you can look at running. How much do we have running? How much do we have concluding? How much are we in decision for launch? And then, how much has been completed? For the completed ones, you can look at your opportunity size versus the actual results. One of the fields we haven’t talked about yet is confidence. You’ll opportunity-size something and then it’ll also have a confidence. For instance, if you’re taking language that worked in an email and moving it to a payment page that’s very well understood, you might have high confidence for your opportunity size for that experiment, say 70%. You’re like, “I’m pretty sure I know what’s going to happen. This is probably going to work.” Whereas, if it’s a new surface and a new kind of change, you might have very low confidence, like 10 or 20%. The idea is that if you multiply your opportunity size by your confidence, it actually equals your result.
That’s not going to happen in your first quarter or second quarter. But as you get more into it, I’d say within a few quarters you accelerate into an intuitive understanding of what’s possible and the metrics of your site, so that you can more and more accurately plan. Then, the other parts of your business are really going to be excited. They’re like, “Hey, we’re looking at optimizing revenue and you have put together this pipeline that you understand that when you say you’re going to deliver X, you deliver it within 20%. That’s fantastic and great for our business.”
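One way to check the calibration Darius describes is to compare confidence-weighted opportunity sizes against actual results for completed experiments. This is a sketch under assumptions: the 70% and 20% confidence levels and the 20% tolerance echo the conversation, but the experiment names, dollar figures, and function names are hypothetical.

```python
# Hypothetical completed experiments with sized opportunity, confidence, and result.
completed = [
    {"name": "Email copy on payment page", "opportunity": 300_000,
     "confidence": 0.70, "actual": 180_000},
    {"name": "New surface, new change", "opportunity": 500_000,
     "confidence": 0.20, "actual": 120_000},
]


def expected_value(records):
    """Sum of opportunity size multiplied by confidence."""
    return sum(r["opportunity"] * r["confidence"] for r in records)


def relative_error(records):
    """How far total actual results landed from the confidence-weighted plan."""
    expected = expected_value(records)
    actual = sum(r["actual"] for r in records)
    return abs(actual - expected) / expected


def within_tolerance(records, tolerance=0.20):
    """True if delivered results are within the given fraction of the plan."""
    return relative_error(records) <= tolerance


print(expected_value(completed), relative_error(completed), within_tolerance(completed))
```

Tracking this error quarter over quarter would show whether the team's sizes and confidence levels are converging on what actually ships.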
Matt: Yeah. It gives you such a good foundation for adding additional teams. You have all this stuff documented. That’s been one of the other things that’s worked really well for us is … It took a lot of effort to get the team in the swing of this, but we really got in the habit of properly documenting back. Like, “We ran this experiment and the control did this, the variant did this, the uplift was 12%.” Now, we have this list of things that we could just search to say, “Yeah, for all the things that we experimented on powered by … What were those?” And anyone at the company can go look at that. There was a team working on a redesign of the chat widget, and we had run a bunch of experiments there. They reached out to us and I just gave them a link to the experiment results. It was just such an easier process.
Darius: Yeah. Having that shared view is really valuable, as is being able to cut it by a search or cut it by a team or cut it by completed. One of the reasons I put EVELYN together is I saw some of the people on our team spend all this time each week preparing for the weekly update meeting. They’d often copy all the experiments to a new document, which is the weekly update document, and then put in the current status of them. They’d have to copy it over from different subsystems. Then, they’d have to put together a different view for the quarterly update. It was like, what’s everything you did this quarter? They have to go back to the weeklies and assemble it.
I was just like, “This is silly. There’s only one of these experiments.” One thing I love about the system is that once you put in the experiment name, you say, “Hey, update CTA, copy v3,” you just type that in once and then you just have different views where you look at the data different ways. You can just pull it up in a meeting, scroll through it. Clicking a link and scrolling, I think, is the threshold of effort for sharing what you’re working on with your manager.
Matt: Yeah, if I were to guess, there’s probably some people listening that are saying, “Oh, well, Dropbox had 20 squads and that’s a lot. Of course they need a big system. My growth team is only three people. Is it really a good investment for me to make in getting set up with this kind of system?” Or, “Our random to-do list process works fine.” Is it something that any group of any team could pick up?
Darius: I think with any growth team, maybe more than one person, you should have more than 20 experiments you’re looking to potentially run. You should have more than 20 experiment ideas. Because you shouldn’t be running all your ideas. Because that means you’re not coming up with enough. So if you have more than one person doing growth, I think you can benefit from some systematization of your process. I think EVELYN, it’s got a lot of fields and a lot of options but you don’t have to use all of them. The only ones you really need are putting in the name, T-shirt sizing, opportunity size, cost and then the status. Are we speccing this? Are we designing this? Is this running on prod? If you simply use those fields, that’ll give you a basic experiment pipeline and a look-forward list of experiments and sizes that you can use to help plan out your growth team.
The other advantage of EVELYN is that this experiment pipeline allows you to dive in and see some issues and address them. For instance, if you have a lot of value opportunity-sized but you’re not executing it, that’s a great argument for getting more engineers on the project and kind of go to your engineering lead and say, “Hey, I’ve got all these projects with all this value we’re likely to hit if I can just get it out the door. Give me some more engineers.” Or you can see that “Hey, we’re running all these experiments and they’re all falling flat. A lot of these experiments aren’t working.” So for the ones that are working, why did they work, and can you use that knowledge to influence how we prioritize experiments and maybe update how we do opportunity sizing because our current way isn’t working very well?
It can also help you see what surfaces are working and which ones aren’t. It’s also frankly good for management. One really interesting thing I heard from one partner was that they incentivize the engineers to get as much opportunity sizing as possible and then the engineers were pushing the product people for the maximum opportunity size. So you have this kind of back pressure through the system like sucking out the value with every person on the team, not just leadership.
Matt: That is awesome. Are there any other parts of this that you feel like we haven’t covered?
Darius: I think that’s that in a nutshell. I think you can probably put in the show links the URL where you can download the default template. It’s just bit.ly/evelyn-airtable. That’s Evelyn, the woman’s name, dash, hyphen, Airtable.
Matt: Yeah, and we’ll add it in there. It’s really easy to just clone and go from there. Darius, thank you so much for joining us. We really do appreciate it. For all of you listening, if you love the episode, feel free to give five stars. If you’ve got any feedback, any questions, suggestions, topics, whatever it might be, my email is Matt@Drift.com. Thank you again, Darius, and we’ll catch you all on the next episode.