Goal


A/B tests play a huge role in website optimization, and analyzing A/B test data is a key data scientist responsibility. In particular, data scientists have to make sure that the results are reliable and trustworthy, so that valid conclusions can be drawn from them.

Furthermore, companies often run tens, if not hundreds, of A/B tests at the same time. Manually analyzing all of them would require a lot of time and people. Therefore, it is common practice to look at the typical A/B test analysis steps and automate as many of them as possible. This frees up the data scientist's time for higher-level topics.

This challenge focuses on a crucial step of A/B testing: making sure that randomization worked properly. A key assumption behind an A/B test is that the only difference between test and control is the feature being tested. This implies that the user distributions in test and control are comparable. If this holds, any difference in the metric we are measuring can be attributed to the feature change rather than to differences between the two groups.
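One way to picture the randomization check is to compare, segment by segment, the share of users who ended up in the test group. The sketch below is only an illustration under stated assumptions: the field names `country` and `test`, the toy rows, and the 5-percentage-point tolerance are all hypothetical and are not taken from the challenge data. A real analysis would likely use a proper statistical test instead of a fixed tolerance.

```python
from collections import Counter


def segment_test_share(rows, segment_key, group_key="test"):
    """Return {segment: share of that segment's users assigned to test}."""
    totals = Counter()
    in_test = Counter()
    for row in rows:
        segment = row[segment_key]
        totals[segment] += 1
        if row[group_key] == 1:
            in_test[segment] += 1
    return {segment: in_test[segment] / totals[segment] for segment in totals}


def flag_imbalanced_segments(rows, segment_key, group_key="test", tolerance=0.05):
    """Return segments whose test share differs from the overall test share
    by more than `tolerance` (an arbitrary, hypothetical threshold)."""
    overall = sum(1 for row in rows if row[group_key] == 1) / len(rows)
    shares = segment_test_share(rows, segment_key, group_key)
    return {
        segment: share
        for segment, share in shares.items()
        if abs(share - overall) > tolerance
    }


# Toy, made-up rows: segments A and B are split 50/50, segment C is split 80/20.
rows = (
    [{"country": "A", "test": 1}] * 500
    + [{"country": "A", "test": 0}] * 500
    + [{"country": "B", "test": 1}] * 50
    + [{"country": "B", "test": 0}] * 50
    + [{"country": "C", "test": 1}] * 80
    + [{"country": "C", "test": 0}] * 20
)

flagged = flag_imbalanced_segments(rows, "country")
print(flagged)
```

A segment that shows up in `flagged` is over- or under-represented in test relative to the overall split, which is a hint that the two groups may not be comparable on that dimension.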




Challenge Description


Company XYZ is a worldwide e-commerce site with localized versions of the site.

A data scientist at XYZ noticed that Spain-based users have a much higher conversion rate than users from any other Spanish-speaking country.

The Spain and LatAm country manager suggested that one reason could be translation. All Spanish-speaking countries had the same translation of the site, which was written by a Spaniard. Therefore, they agreed to run a test in which each country would have its own translation written by a local. That is, Argentinian users would see a translation written by an Argentinian, Mexican users one written by a Mexican, and so on. Nothing would change for users from Spain.

After running the test, however, they are really surprised because the result is negative. That is, it appears that the original, non-localized translation was doing better!
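To make "the test is negative" concrete, a common way to compare two conversion rates is a two-proportion z-test. The following is a minimal sketch, not the method used at XYZ: the function name `two_proportion_ztest` is hypothetical and the counts are made up purely for illustration. It also assumes that users were properly randomized, which is exactly what this challenge asks you to question.

```python
import math


def two_proportion_ztest(conv_control, n_control, conv_test, n_test):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value); a negative z means the test group converted
    at a lower rate than the control group.
    """
    p_control = conv_control / n_control
    p_test = conv_test / n_test
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_control + conv_test) / (n_control + n_test)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (p_test - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 550 conversions out of 10,000 control users,
# 450 conversions out of 10,000 test users.
z, p_value = two_proportion_ztest(550, 10_000, 450, 10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A significantly negative `z` would match the surprising outcome described above, but it is only trustworthy if test and control are comparable in the first place.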

You are asked to:



Data


We have 2 tables, downloadable by clicking here.


The 2 tables are:


test_table - general information about the test results

Columns:



user_table - some information about the user

Columns:


