At Uber, we are launching a new driver app with a better UI. The goal is to increase driver earnings by increasing their number of trips. Outline a testing strategy to see whether the new app is better than the old one.

The naive answer here would be: pick a few markets that are representative of the entire population and, within each market, randomly split drivers into test and control. Then run a statistical test on the target metric and check whether the new app wins. This would fail because test and control are not independent.

Let's say I take all drivers in San Francisco, give 50% of them the new app, and keep the other 50% on the old app. If the new app actually makes drivers take more trips, that increases competition for the drivers on the old app and therefore affects their earnings. The opposite is also true: if the new app sucks and its drivers drive less, there is less competition, which again affects the old-app drivers' earnings. This is why designing A/B tests in marketplaces or social networks is so hard: users are all connected.
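To make the interference concrete, here is a minimal toy simulation (all numbers are made-up assumptions, not Uber data). Demand in a city is a fixed pool of trip requests that all drivers compete for, and the new app makes a driver 20% more competitive. A naive test/control comparison reports a 20% "lift" even though the market as a whole gained nothing, because treated drivers took those trips from the control group:

```python
import random

random.seed(0)

# Toy marketplace: a fixed number of trip requests per day is shared
# among all drivers in one city, so drivers compete for the same demand.
N_DRIVERS = 1000
TOTAL_TRIPS = 10_000   # fixed daily demand in the market (assumed)
TRUE_LIFT = 0.20       # new app makes a driver 20% more competitive (assumed)

drivers = list(range(N_DRIVERS))
treated = set(random.sample(drivers, N_DRIVERS // 2))

# Each driver's "weight" in the competition for trips.
weights = [1.0 + (TRUE_LIFT if d in treated else 0.0) for d in drivers]
total_w = sum(weights)

# Trips are allocated proportionally to weight: treated drivers gain
# trips partly BY TAKING THEM from control drivers (interference).
trips = [TOTAL_TRIPS * w / total_w for w in weights]

avg_treat = sum(trips[d] for d in treated) / len(treated)
avg_ctrl = sum(trips[d] for d in drivers if d not in treated) / (N_DRIVERS - len(treated))

naive_lift = avg_treat / avg_ctrl - 1
print(f"naive estimated lift: {naive_lift:.1%}")     # → 20.0%, looks like a clear win...
print(f"total market trips:   {sum(trips):.0f}")     # → 10000, the pie did not grow
```

The naive estimator measures relative competitiveness between the two groups, not the causal effect of rolling the app out to everyone, which is the quantity the business actually cares about.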

If you get a question like this, a quick way to check whether you can simply randomly split users is to mentally run the extreme cases. Suppose the new product has a bug and is unusable, or suppose it is amazing and test users use it 24/7. Would either scenario have any effect on the control group? If the answer is yes (as it obviously is in the Uber case), you can't just randomly split users.

Best way to answer: randomize at a level where the interference is contained, not at the individual driver level. For example, roll the new app out to entire markets and compare them against similar control markets (analyzed with difference-in-differences or synthetic control to adjust for market differences), or run a switchback test that alternates a whole market between the new and old app in time blocks. Then compare the market-level target metric (trips per driver, driver earnings) between treatment and control.
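One standard design for marketplaces is the switchback: within each market, the entire driver population is flipped between the new and old app in randomized time blocks, so test and control drivers never compete in the same market at the same moment. A minimal sketch of generating such a schedule (market names, block length, and block count are illustrative assumptions):

```python
import random
from datetime import datetime, timedelta

random.seed(42)

# Illustrative parameters, not a real experiment config.
MARKETS = ["san_francisco", "chicago", "austin", "miami"]
BLOCK_HOURS = 4            # length of each switchback window
N_BLOCKS = 6               # time blocks per market (balanced: 3 treatment, 3 control)
START = datetime(2024, 1, 1)

schedule = {}
for market in MARKETS:
    # Balanced arms, randomized independently per market so market-level
    # quirks (weather, events) don't systematically line up with one arm.
    arms = ["treatment", "control"] * (N_BLOCKS // 2)
    random.shuffle(arms)
    schedule[market] = [
        (START + timedelta(hours=i * BLOCK_HOURS), arm)
        for i, arm in enumerate(arms)
    ]

for market, blocks in schedule.items():
    print(market)
    for block_start, arm in blocks:
        print(f"  {block_start:%Y-%m-%d %H:%M} -> {arm}")
```

The analysis then compares the target metric across treatment and control blocks within each market, which keeps the comparison free of driver-vs-driver interference (at the cost of having to handle carryover effects between adjacent blocks).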
