How to Build a Data Science Portfolio
How do you get a job in data science? Learning enough statistics, machine learning, programming, etc. to get a job is difficult. One thing I have found lately is that quite a few people have the required skills to get a job but no portfolio. While a resume matters, having a portfolio of public evidence of your data science skills can do wonders for your job prospects. Even if you have a referral, the ability to show potential employers what you can do, instead of just telling them, is important. This post includes links to where various data science professionals (data science managers, data scientists, social media icons, or some combination thereof) and others talk about what to have in a portfolio and how to get noticed. With that, let’s get started!
The Importance of a Portfolio
Besides what you learn by building one, a portfolio is important because it can help you get hired. For the purpose of this article, let’s define a portfolio as public evidence of your data science skills. I got this definition from David Robinson, Chief Data Scientist at DataCamp, when he was interviewed by Marissa Gemma on the Mode Analytics blog. He was asked about landing his first job in industry and said,
The most effective strategy for me was doing public work. I blogged and did a lot of open source development late in my PhD, and these helped give public evidence of my data science skills. But the way I landed my first industry job was a particularly noteworthy example of the public work. During my PhD I was an active answerer on the programming site Stack Overflow, and an engineer at the company came across one of my answers (one explaining the intuition behind the beta distribution). He was so impressed with the answer that he got in touch with me [through Twitter], and a few interviews later I was hired.
You may think of this as a freak occurrence, but you will often find that the more active you are, the greater the chance of something like this occurring. From David’s blog post,
The more public work you do, the higher the chance of a freak accident like that: of someone noticing your work and pointing you towards a job opportunity, or of someone who’s interviewing you having heard of work you’ve done.
People often forget that software engineers and data scientists also Google their issues. If these same people have their problems solved by reading your public work, they might think better of you and reach out to you.
Portfolio to get around an Experience Requirement
Even for an entry-level role, most companies want people with at least a little real-life experience. You may have seen memes like the one below.
The question is: how do you get experience if you need experience to get your first job? If there is an answer, it is projects. Projects are perhaps the best substitute for work experience, or as Will Stanton said,
If you don’t have any experience as a data scientist, then you absolutely have to do independent projects.
I want to hear about a project they’ve worked on recently. I ask them about how the project started, how they determined it was worth time and effort, their process, and their results. I also ask them about what they learned from the project. I gain a lot from answers to this question: if they can tell a narrative, how the problem related to the bigger picture, and how they tackled the hard work of doing something.
If you don’t have some data science related work experience, the best option here is to talk about a data science project that you have worked on.
Types of Projects to Include in a Portfolio
Data science is such a broad field that it is hard to know what kind of projects hiring managers want to see. William Chen, a Data Science Manager at Quora, shared his thoughts on the subject at Kaggle’s CareerCon 2018 (video).
I love projects where people show that they are interested in data in a way that goes beyond homework assignments. Any sort of class final project where you explore an interesting dataset and find interesting results… Put effort into the writeup… I really like seeing really good writeups where people find interesting and novel things…have some visualizations and share their work.
A lot of people recognize the value of creating projects, but many wonder where to get an interesting dataset and what to do with it. Jason Goodman, Data Scientist at Airbnb, has a post, "Advice on Building Data Portfolio Projects," where he talks about many different project ideas and gives good advice on what kind of datasets you should use. He also echoes one of William’s points about working with interesting data.