
Big Data hiring: different strokes 4 different folks

· Career

Now you have three or four candidates, who typically fall into two separate groups.


The first group comes from the school of computing. Like any software engineer, this group picks up scripting languages quickly. The more senior they are, the faster they learn the appropriate programming language and apply it to the project as a whole. But ask this group what the insights are, and they are clueless about the "so what". This is like asking nurses to administer drugs. Nurses know how to administer a drug, but they lack a rigorous understanding of the reasons for administering it and of the drug itself.


The other group comes from the school of statistics or econometrics. This group can quickly run all the statistical tests, interpret the data and make predictions from it. The more senior they are, the more accurately they choose the right techniques to test, analyse and interpret the data. But ask this group how to extract the data for analysis and feed the results back into applications, and they will scratch their heads. This is like asking pharmacists to prescribe drugs. They know the drugs, but they lack the nurses' experience of injecting them with a syringe.


A data scientist is a hybrid of the two kinds, much like a doctor. Yet it is rare to find both skill sets in one person. Otherwise, why would data scientists command a premium? Their value, of course, lies in optimizing profits for companies that want to scale. It is like the early 2000s, when investment banks and hedge funds hired quants to build complex assets and trade them, until the global financial crisis revealed those assets' true value.


So what do you do if you cannot find the perfect hybrid? You evaluate candidates against the resources your company already has.


If your firm has a strong technology team with a strong culture of learning, then hiring econometricians who are willing to learn from and work with that team can solve your problem.


If your firm has a strong, statistically trained data team with a culture of teaching statistical concepts, then hiring a database-oriented programmer who is willing to learn those concepts can solve your problem.


The litmus test for the first kind of hire (the econometrician) is:

1. How long would it take you to learn a scripting language with a mentor?

2. Solve a problem in pseudocode.


The litmus test for the second kind of hire (the programmer) is:

1. How long would it take you to learn the required statistical concepts with a mentor?

2. Interpret the data and choose the appropriate statistical tests or concepts to confirm the results.


Yet the key litmus test for both kinds of hire is:

1. Here is the problem: what are its properties?

2. What is your proposed solution?

3. What tools would you use to solve it?

4. Where can you get these tools?

5. How long would it take you to solve this specific problem on your own?


While every problem has its own context, a standard PhD assignment, in which a student works through a problem with a mentor, is a useful benchmark for how long solving a problem should take.


It takes about 3 to 6 weeks to fully understand the problem, 3 months to learn the specific programming language and statistical techniques needed to approach it, and another 3 to 6 months to sample, clean, interpret and test the data. After that, implementing the solution takes a further 3 to 6 months.
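Adding up these phase estimates gives a rough total for the whole project. The minimal Python sketch below does the sum; the phase labels, the `total_weeks` helper and the 52/12 weeks-per-month conversion are illustrative assumptions, not part of the original rule of thumb.

```python
# Hypothetical sketch: total the phase estimates from the PhD-assignment rule of thumb.
# Assumption: one month = 52 / 12 weeks.
WEEKS_PER_MONTH = 52 / 12

# (minimum, maximum) duration of each phase, in weeks
PHASES = {
    "understand the problem": (3, 6),
    "learn the language and statistical techniques": (3 * WEEKS_PER_MONTH, 3 * WEEKS_PER_MONTH),
    "sample, clean, interpret and test the data": (3 * WEEKS_PER_MONTH, 6 * WEEKS_PER_MONTH),
    "implement the solution": (3 * WEEKS_PER_MONTH, 6 * WEEKS_PER_MONTH),
}


def total_weeks(phases):
    """Return the (minimum, maximum) total duration in weeks across all phases."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


low, high = total_weeks(PHASES)
# Report the range in weeks and in months.
print(f"{low:.0f} to {high:.0f} weeks "
      f"(about {low / WEEKS_PER_MONTH:.1f} to {high / WEEKS_PER_MONTH:.1f} months)")
```

Under these assumptions the total comes to roughly 42 to 71 weeks, or about ten to sixteen months, which is the yardstick the next paragraph compares candidates against.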


Anything shorter means you have a very skilled data scientist, or an econometrics or computer science professor. Of course, a data scientist who stays with the same company will solve each problem faster over time. This is called economies of scope over time. Companies that see data science as a core asset should therefore offer long-term incentives, on top of market-based short-term incentives, to retain their data scientists.


In addition, data scientists need time to learn about the data, explore interesting questions about it, solve problems with it and, most importantly, reveal insights to the stakeholders who use those insights to drive the company's long-term returns. Think of them as the portfolio managers of your digital assets.


This article offers only a rule of thumb; you still have to use your own judgement on whom to hire. The choice is yours.