A Collection of Data Science Take-Home Challenges
How would you improve engagement on FB?
- Define the metric. Pretty much all case studies start with a vague goal, and the first step is translating that goal into a metric that can be optimized. For example: "I define engagement on FB as the proportion of users who take at least one action per day, where an action means interacting with the site: posting, liking, uploading a picture, etc." As long as it makes sense, it doesn't matter much exactly how the metric is defined: which threshold you choose (once a day in this example), which actions you include, or whether you use average actions per user instead of the percentage of users above a certain threshold. But it is crucial to define a concrete metric. For instance, "response rate on Quora" would not be a concrete metric: it is too vague to be evaluated. But "percentage of questions that get at least one response within a day" is a metric.
As general advice, try to pick a metric that is related to the company mission. FB cares about interactions between users, which is why the actions picked above are ones that encourage interaction. Quora cares about high-quality content, so you might always want to include some quality measure (e.g., responses with at least 3 up-votes within the first day). Airbnb is about "belonging anywhere", so focus on whether someone who wants to go to a given place can actually do so, etc. Silicon Valley companies in particular love these things.
- After the metric, pick the variables you think would matter to move that metric. These are almost always a combination of user characteristics (sex, age, country, # of friends, etc.) and variables related to their browsing/online behavior (device, whether they came from ads/SEO/direct link, session time, etc.).
- Pick a model to predict the metric from point 1 using the variables from point 2. Tree-based classifiers are usually your best option. Emphasize why you picked that model. For example: I pick a Random Forest (RF) because I want high accuracy. An RF works well in high dimensions, with categorical variables, and with outliers, all of which I expect to have here. I will then get model insights via partial dependence and variable importance plots.
- Come up with a couple of possible model output scenarios. For example: after inspecting my model, I see that users from Argentina are not very engaged. On the other hand, Indians under 30 years old are very engaged, but proportionally we don't have many of those users. Try to always have one "good" segment and one "bad" one. Use your knowledge of the product to come up with realistic segments. Otherwise, just make them up so that they are actionable, and emphasize that these are only examples: with real data you would find the real segments (after all, the job of a DS is to suggest actions based on the data, not to guess in advance what the data will show).
- Define next steps. For example: I would check the Spanish translation that Argentinians see. Maybe it doesn't feel very Argentinian, and we could try a more localized version. On the other hand, since the site is doing well with young Indians, I would tell the marketing team to reach more of them via ads or specific marketing campaigns. Your case study answer should always end with possible next steps to improve the original metric.
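To make the first step concrete, here is a minimal sketch of how the example engagement metric could be computed. Everything in it is an assumption made up for illustration: the event log `EVENTS`, the user set `ALL_USERS`, the list of counted actions, and the function name `daily_engagement`. It only shows that "proportion of users who take at least one action per day" is something you can actually evaluate, and that the threshold is just a parameter.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, day, action). Not real data.
EVENTS = [
    ("u1", date(2024, 1, 1), "post"),
    ("u1", date(2024, 1, 1), "like"),
    ("u2", date(2024, 1, 1), "upload_photo"),
    ("u1", date(2024, 1, 2), "like"),
    ("u3", date(2024, 1, 2), "page_view"),  # not an "action" under our definition
]
ALL_USERS = {"u1", "u2", "u3", "u4"}  # hypothetical user base (the denominator)
COUNTED_ACTIONS = {"post", "like", "upload_photo"}  # which actions count is a choice


def daily_engagement(events, users, counted_actions, min_actions=1):
    """Return {day: share of users with at least `min_actions` counted actions}.

    Days on which nobody performed a counted action do not appear in the result.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, day, action in events:
        if action in counted_actions and user_id in users:
            counts[day][user_id] += 1
    return {
        day: sum(1 for n in per_user.values() if n >= min_actions) / len(users)
        for day, per_user in counts.items()
    }


ENGAGEMENT = daily_engagement(EVENTS, ALL_USERS, COUNTED_ACTIONS)
for day in sorted(ENGAGEMENT):
    print(day, ENGAGEMENT[day])
```

Changing `min_actions` or `COUNTED_ACTIONS` gives a different but equally concrete metric, which is the point: the exact definition matters less than having one.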
With minor changes, this template can be used to answer pretty much all typical case study questions (e.g., "How do you increase X on site Y?").
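The "good segment / bad segment" output can also be sketched in a few lines. Note that this is not the Random Forest with partial dependence plots described above: it swaps in a plain group-by over two variables, only to show the shape of the result you would talk about. The rows in `USERS`, the segment labels, and the function name `segment_report` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-user rows: (country, age_band, engaged_today). Made-up data.
USERS = [
    ("AR", "30+", False), ("AR", "30+", False),
    ("AR", "<30", True), ("AR", "<30", False),
    ("IN", "<30", True), ("IN", "<30", True),
    ("US", "30+", True), ("US", "30+", False),
    ("US", "<30", True), ("US", "<30", False),
]


def segment_report(rows):
    """Return (segment, n_users, share_of_users, engagement_rate) tuples,
    sorted from least to most engaged segment."""
    totals = defaultdict(int)
    engaged = defaultdict(int)
    for country, age_band, is_engaged in rows:
        key = (country, age_band)
        totals[key] += 1
        engaged[key] += int(is_engaged)
    report = [
        (key, n, n / len(rows), engaged[key] / n)
        for key, n in totals.items()
    ]
    return sorted(report, key=lambda row: row[3])


REPORT = segment_report(USERS)
worst, best = REPORT[0], REPORT[-1]
print("least engaged segment:", worst)
print("most engaged segment:", best)
```

Reporting the share of users next to the engagement rate is what lets you say something like "this segment is very engaged, but we don't have many of them", which then leads to the next step.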