Master data science case studies: A hiring manager’s perspective
Let me first say: I’ve been there!
I’ve been in the situation where I received a bunch of data science case studies from different companies and had to figure out what the problem was, how to solve it and what to focus on. Conversely, I’ve also designed case studies for data science and analytics positions, sent them out to candidates and evaluated the submissions I received.
Based on this experience and conversations with others (both candidates and hiring managers) I want to address some typical questions around case studies and explain what they’re good for. I will do this by outlining a set of common expectations from hiring managers. I believe that if you understand the hiring manager’s needs for the case study, you as a candidate will know what to focus on in order to leave a great impression during the process.
Making a case for case studies
Let’s take a look at the purpose of the application process from the hiring company’s perspective: they typically have a problem or a set of problems in their business that they would like someone to solve for them, and they are trying to find out whether you could do it. There is no certain way to find this out before actually hiring you to do the job. So what’s the next best thing they can do? Yes, you’ve guessed it: they ask you to solve their problem in a case study (also often referred to as a technical assessment, take-home assessment, technical homework, etc.).
This seems rather sneaky, you might think. Aren’t I just giving away my time for free by working on a problem that their employees get paid for? Even if that’s true, it is still the best thing that can happen to you in an application.
If the case study indeed reflects the actual work done in the company, it is the single best insight into the kind of problems and the kind of data that you will be working with once hired. Usually there is also a follow-up interview where you discuss your solution with the company’s data scientists and hear how they reason about the problem and the solution space. This is great because you also learn how far they have come on the given problem.
Don’t forget: An interview can be a pleasant conversation between data scientists about an interesting problem.
Why is this so valuable for you? The application process is also supposed to help you to figure out if you want to work for the company and if their challenges are interesting for you. Especially when you’ve already had some work experience and/or you have already figured out exactly what you want for your next role, the maturity level of the company might be a crucial piece of information for your decision making. Yes, I am saying that you should also evaluate your interviewers and pay attention to what they tell you about how they would approach the problem.
Minimise the pains of hiring
Hiring someone who is good at their job is expensive. But hiring someone who really sucks at their job is even more expensive. Ask anyone who has made a bad hire and had to take drastic action how difficult and nerve-racking the process was. Not to mention the impact that a bad hire can have on the team. Just think of a bad collaboration you’ve had in the past, and now imagine you have to work with this person day in, day out, for the next two years. And now imagine several people in the team feel the same as you do. What would that do to team morale?
Therefore, as a hiring manager my goal is to minimise the chance of hiring a bad match for my team.[1]
If I take this as an objective, we can start working our way backwards: What would I need to know (and what can I reasonably find out during an interview process) in order to minimise that risk? Out of the things that I need to find out during the interview process, which are the ones that I can more naturally check with a case study? Are there perhaps some things that a work sample from a candidate can tell me more about than just asking some direct questions in an interview?
Following this reasoning, I formulated three questions that a hiring manager is typically trying to answer through a case study. Answering these questions allows them to build up a mental model of the candidate and their fit for the role. Let these questions guide your attention when you are tackling a case study as a candidate. If you make a good impression there, you know it will be noticed.
Q1: How well can you apply your thinking and your tools to our business problem?
While in theory it might make sense to hire someone who has already done the exact job that I’m hiring for, in practice this is difficult: the candidate pool is small, and retaining such an employee can be hard because they easily get bored. Therefore, hiring managers usually need to expand their candidate pool and assess candidates whose experience might be only remotely related to what the job demands. So a thorough hiring manager will try to create a process that gives them several ways to probe the candidate’s fit for the job’s (skill) requirements. This is what the case study does (among other things).
If you interview for a role in a different industry or domain, don’t underestimate the value that you can add as a ‘newbie’. I dare to infer from the diversity of typical data science teams that most data science skills are transferable across industries. Sure, whenever you make a switch, there is a lot of domain knowledge that you need to pick up from scratch. But the analysis and modelling tools that you have mastered in one job or industry can still be useful in another. In fact, they might be an even bigger asset because nobody in the existing team has ever tried your toolset on the problem. You just need to find out how your tools apply and what their limitations are in the new setting.
One of the best and most thorough submissions I ever received as a hiring manager came from a social psychology graduate. Let’s call her Jane.
In the case study’s dataset we included a feature that one could consider using as a label in order to turn the problem statement into a supervised learning task. But if you looked more closely, you would realise that this feature had too many missing values and, even worse, that the presence of the feature was biased. So if you just trained a model to predict this feature, you would train a biased model.
Jane had gained a good enough understanding of our business domain through her own research to realise that bias in the data could be a problem. So she tested for interaction effects between the presence of this feature and some other features in the dataset. By doing this she could confidently conclude that there was in fact a bias, and she made the call not to use the feature as a label.
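To make this kind of check concrete, here is a minimal sketch in Python. It is not Jane’s actual analysis (she looked at interaction effects, and I’m not reproducing the details here) but a simpler stand-in: a chi-square test of independence between ‘the label is missing’ and one other binary feature. The toy data, the field names `segment` and `label`, and the function `missingness_chi2` are all made up for illustration.

```python
import math
from collections import Counter


def missingness_chi2(rows, label_key, group_key):
    """Chi-square test of independence between 'label is missing'
    and a binary group feature (so the test has 1 degree of freedom).

    Returns (chi2, p_value, missing_rate_by_group).
    """
    # Count rows per (group, is_missing) cell of the 2x2 table.
    counts = Counter((row[group_key], row[label_key] is None) for row in rows)
    groups = sorted({group for group, _ in counts})
    if len(groups) != 2:
        raise ValueError("this sketch only handles a binary group feature")

    n = sum(counts.values())
    row_total = {g: counts[(g, True)] + counts[(g, False)] for g in groups}
    col_total = {m: sum(counts[(g, m)] for g in groups) for m in (True, False)}

    chi2 = 0.0
    for g in groups:
        for m in (True, False):
            expected = row_total[g] * col_total[m] / n
            if expected == 0:
                raise ValueError("an expected cell count is zero")
            chi2 += (counts[(g, m)] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    rates = {g: counts[(g, True)] / row_total[g] for g in groups}
    return chi2, p_value, rates


# Hypothetical toy data: the candidate label is missing far more often
# for one user segment than for the other.
rows = (
    [{"segment": "new", "label": None}] * 30
    + [{"segment": "new", "label": 1}] * 10
    + [{"segment": "returning", "label": None}] * 5
    + [{"segment": "returning", "label": 0}] * 35
)

chi2, p_value, rates = missingness_chi2(rows, "label", "segment")
print(f"missing rate per segment: {rates}")
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

A very small p-value together with clearly different missing rates per group suggests that the label is not missing at random. A model trained only on the rows where the label is present would then mostly learn from one kind of row, which is one way a biased model can arise.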
While Jane didn’t have typical data science experience, she had analysed tons of experimental data in her academic career. She knew which tools to use to tease information out of an unfamiliar dataset and how to draw conclusions from it. By familiarising herself with our product she could also come up with reasonable hypotheses to test. In the end, Jane implemented a solution using an unsupervised technique that came very close to our own solution at the time. Needless to say, we extended an offer to her.
Do it like Jane.
Do your homework before you come up with a suitable method for solving the problem.[2] If you have done your homework well, it will also be easy for you to be on top of your results and to explain why you chose your method (see Q3 below).
Q2: How will you act under imperfect information?
It is very rarely (dare I say: never) the case that you will receive a case study that has a full specification of the problem and of what needs to be done. And there is of course a very good reason for that: in the real world, there is also very rarely (again, probably never) a full specification of the problem at hand. Often it is your job to identify what information is missing and to devise a plan to actively fill those knowledge gaps, or at least to manage the uncertainty that comes with them. This usually also requires you to prioritise where to dig deeper and which gaps to skim over (for the time being).[3]
I once got a case study as a candidate where I was told to design ‘a data science solution’ to ‘optimise […] operations’. This was literally the instruction I got. No hint about what ‘data science solution’ means. No hint about what ‘optimise’ entails. Not even any details about how the current process works. Full freedom.
I suggest working your way backwards in these cases: Who are the users of the system and what would success mean for them? How can we formalise this success criterion in one or a few KPIs? Once we know which KPIs we are trying to optimise, how can we get there from where we are now? Oh, we don’t know where we are right now? Let’s do some research and make some educated guesses then. Once we have an idea about success, let’s also think about the risks, or the worst-case scenario. Is it captured by some KPI? What can we do to mitigate it?
Each of these questions might require some research, some thinking and some hypothesising. And that is okay, because on the job this process will look quite similar. You might have to scan a lot of internal documentation, interview product owners / stakeholders, or consult external resources to find the answers.
You will not get every single assumption or conclusion right for the case study, but that is also never the expectation. Rather, your interviewers want to see that you can come up with a plan to tackle the unknowns while making a reasonable prioritisation. Maybe in the follow-up interview they will correct some of your assumptions. Be prepared to be challenged and to adapt your solution to new information. This is also why doing your homework (see previous section, Q1) can help you to think on your feet in this discussion.
Q3: How well will you collaborate with the team?
A question I sometimes hear from data scientists is how much attention they should pay to code quality in their case study submissions. Well, what does good code quality help with?
It’s about whether other people in the team can understand your code, collaborate with you on it, and maintain it when you’re on holiday. So ask yourself: can someone who is seeing your code for the first time understand it easily? Would they be able to work on it without going crazy? If you can’t answer these questions, ask yourself as a proxy: can I understand this code three months or a year from now and work on it without hating my past self?
More fundamentally, it is a question of collaboration: Will people in the team enjoy working with you? Will they understand and trust your work? Will they be able to build upon your work (and you on theirs)?
The understanding and the trust will depend on another important collaboration-related skill (if not the most important one) that a hiring manager will check with the case study (and throughout the process): communication.
For most case studies there is no single correct solution, so it’s important that you can explain the approach you took and the reasoning behind it. You can easily imagine situations on the job when a data scientist has to communicate their work to teammates or justify it to stakeholders. Especially with the latter you will probably be more effective if you show some storytelling skills, i.e. your work communicates a clear message in a compelling and easy-to-follow way (e.g. through visualisations or illustrative examples).
So read carefully who the case study presentation is for and tailor your presentation to this audience. What is the right level of abstraction? Are you prepared to give more details of your solution and your data when needed?[4] Can you explain the limitations and the trade-offs of your solution? Does your solution present clear action points or key takeaways? Can you maybe even inspire your audience with a future vision of the solution?
Don’t make the mistake of thinking that it’s enough to just build an unexplainable model that has high accuracy. Don’t underestimate the importance of collaboration skills.
Conclusion
In this article I have tried to convince you that data science case studies provide an opportunity for candidates to learn a lot about the company and the role. I also shared a common set of expectations that hiring managers have of case studies, and derived some tips for candidates. I hope these tips help you to decide what to focus on in your next case study.
However, the job you’re applying for may well have some more specific requirements of its own. Try out the exercise of putting yourself in the hiring manager’s shoes! If you feel you are not able to because you lack information about the hiring manager’s expectations, maybe it’s a good time to ask for a chat with the hiring manager in order to understand them better.
Best of luck!
Many thanks to Wolfgang Gassler and Joost Ronkes for providing me with a lot of valuable feedback!
[1] I know, I’m a bit reductionist here and this is probably not the only objective I can have as a hiring manager. But bear with me. I think even from this one single goal we can derive some useful tips for candidates that would allow both sides to find out if they’re the right match for each other.
[2] Here are some suggestions for what you can do as homework: Try out the product if it’s a customer-facing product company, understand the business model, find out who the competitors are and what they are doing better/worse, find out common techniques / state-of-the-art solutions for their problem etc.
[3] Some hiring managers (especially in smaller companies or start-ups) might even expect you to proactively reach out to ask for more information or data before you dive deep into solving the case. So don’t forget that you always have the option to ask if you really can’t find any starting point.
[4] General tip on this topic: My impression is that these questions are usually quite similar from project to project. So with experience you will learn to anticipate them and will have prepared answers for them. In the meantime, you can get a sense of your progress by keeping track of how often a question comes up in a presentation where you have to say ‘I don’t know, I have to check the data, let me get back to you after the presentation’ vs ‘I have the answer for you, it’s XYZ / it’s on the next slide’.