How to become a data scientist before you graduate

So you’re considering careers outside of academia, and you’ve heard all the data science hype. Sounds like a pretty good gig, doesn’t it? But because data science means different things to different people, it can be hard to figure out just what you should do now to prepare yourself for a job as a data scientist after you graduate.

Conveniently, data science isn’t very different from graduate research—in fact, there are some small but important ways you can change your time in grad school that will make you feel like you’re already a data scientist by the time you graduate. I’ve had a lot of fun taking this approach for the last year and a half, and I’m feeling pretty good about my job prospects once I graduate in December. Plus, many of the steps I explain here are also applicable to other non-academic careers, especially in the Bay Area tech scene. (Take all this with a grain of salt, though: I don’t have a job lined up yet!)

With no further ado, here are 10 things I’ve done during grad school to become a data scientist.

1. Start early. This is a long list and most of the steps take time. Plus, the sooner you start thinking about this stuff, the sooner you can decide if data science is a good fit for you. Better to find that out before you get the job than after!

2. Know your strengths and weaknesses. If you’re reading this, your strengths probably include simply being a science grad student at Berkeley: the field of data science is so wide open right now that it’s pretty easy to parlay a PhD from a well-known school in any kind of scientific field into job interviews. If you’re like me, your weaknesses include not speaking the same language as the CS and stats majors who will be your future coworkers, and minimal first-hand knowledge (and the accompanying impostor syndrome) about industry jobs in general and data science jobs in particular. Leveraging your strengths I’ll leave to you; the rest of this list will help address those weaknesses.

3. Write good code. By good, I mean documented, object-oriented, version-controlled, in a mainstream language, reasonably efficient, using appropriate libraries, and with automated testing. Code that checks all of these boxes is standard in industry but quite uncommon in science, partly because this style of coding doesn’t make sense for all research tasks, and partly because there’s often little incentive or support for scientists to resist cutting corners. I promise it’s worth it, though! Find a coding task where your effort will pay off (big or small, research-related or not) and learn how to do every one of those things in the same codebase. In my experience, once you get over the activation barrier, writing code this way is much more pleasant and personally rewarding.

4. Learn CS fundamentals. No, this is not the same as #3: this means sophomore-level data structures and algorithms. Somehow the tech world has decided that there are “technical” jobs (aka jobs where you “write code”) and there are “nontechnical” jobs, and the way to judge if someone is qualified for a “technical” job is to make them implement a binary tree on a whiteboard. Most companies treat data science roles as “technical,” so this step is mandatory for passing interviews, and of course it doesn’t hurt with #3.

5. Take stats classes. Statistics 215A (Applied Statistics) will give you hands-on experience with most of the statistics and machine learning techniques that data scientists use on the job (with the notable exception of random forests). It’s a huge amount of work, especially if you don’t have much formal statistics experience, but I thought it was totally worth the effort. Statistics 241A (Statistical Learning Theory) is a more theoretical approach to similar topics and is supposed to be very good too. Out in MOOC-land, I particularly liked Bill Howe’s Coursera course for some solid database material and a nice quick overview of stats stuff (https://www.coursera.org/course/datasci).

6. Analyze data. Of course, you’re a grad student, so you analyze data all the time. But you probably use the data analysis methods that are standard in your little corner of science, which may not be the ones that are standard in other fields. Practice applying what you learned in your stats classes (see #5) outside of the classroom. There are different schools of thought about whether it’s more useful to build a data science portfolio using research-related or “real-world” data; I think the former is fine, but the Insight Data Science Fellows program (http://insightdatascience.com/) is built around the latter. If your good code (see #3) doesn’t use a database, you should try playing around with databases at this step.

Design: Natalia Bilenko, modified from Drew Conway; Book: MTchemik; network: Qwertyus


7. Think outside the Facebook box. What companies hire data scientists? You probably know about Facebook, Amazon, LinkedIn, and a few of the other big-name players. These companies are hiring a lot of data scientists and they should be on your radar. But they aren’t the only ones, and you might be a better fit for a different kind of company. There are good data science jobs in tiny startups, huge multinationals, and nonprofits, and in other sectors like quantified self, smart grid, and data journalism. Examine your priorities, skills, interests, and values, then actively seek out employers where you would be a uniquely good fit.

8. Go to meetups. There are a few seminar series on campus about data science (e.g., https://sites.google.com/site/berkeleyphysicscdi/speaker_series) and you should go to them. But you should also go to events off-campus where you can meet real people who have the job you want to have, and learn what they do and how they do it. It’s easy to find a bunch of groups on meetup.com related to the languages, tools, and communities that you’re most interested in (see #3, #6, and #7); I’m in groups ranging from PyLadies to Data Science for Sustainability. Most groups meet once a month on a weekday evening and feature a cool speaker, free food, and time for networking—what’s not to like? And don’t just go, participate! Work up the nerve to talk to the person next to you, talk to the speaker, or even be the speaker yourself (5-minute “lightning talks” are a great place to start). It’s daunting at first, but it’s really gratifying when you begin to feel part of a community that shares your career goals (something that can be hard to find on campus). Plus, lots of people you meet will try to get you to apply for an internship with them (see #10).

9. Build an online presence. Make a personal website or blog and keep the content up-to-date. Make a LinkedIn profile and get some connections. Make a GitHub account and put your good code (see #3) in public repos. For extra credit, maintain a non-embarrassing Twitter account. Put links from each of these to the others. These are the places where employers and people you meet at networking events (see #8) will try to look you up, and it doesn’t take too much effort on your part to make sure they can find you. I’ve been contacted by recruiters from Google, DE Shaw, and others simply because they found my GitHub and LinkedIn profiles.

10. Apply for internships. Nervous about applying for jobs, or excited to start getting your feet wet? Internships are a lower-stakes trial run for everything that comes after you graduate. The application process will give you practice researching companies and job openings (see #7), writing resumes and cover letters, dealing with recruiters, answering questions in technical interviews (see #4 and #5), and talking with your advisor about your career plans. All of these are potentially stressful, and they will only be more stressful if you wait until your last semester to start thinking about them. And once you get an internship, the job experience and professional network are invaluable, of course. (Full disclosure: I applied for internships last summer and didn’t get one, and I still found the process extremely useful.)
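
To make #3 and #4 a little more concrete, here is a minimal sketch of the kind of whiteboard exercise mentioned in #4: a binary search tree in Python, written with the docstrings and automated test that #3 asks for. This is only an illustration under my own assumptions. The names (Node, BinarySearchTree, insert, in_order, test_in_order_is_sorted) are made up for this example, the tree is unbalanced and stores only integers, and a real interviewer may well ask for something different, such as deletion or balancing.

```python
"""A tiny, unbalanced binary search tree (illustrative sketch only)."""
from typing import Iterator, List, Optional


class Node:
    """One node of the tree: a key plus optional left and right children."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


class BinarySearchTree:
    """Binary search tree of integers; duplicate keys are ignored."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: int) -> None:
        """Add key to the tree by walking down from the root."""
        if self.root is None:
            self.root = Node(key)
            return
        node = self.root
        while True:
            if key == node.key:
                return  # already present, nothing to do
            if key < node.key:
                if node.left is None:
                    node.left = Node(key)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key)
                    return
                node = node.right

    def __contains__(self, key: int) -> bool:
        """Support `key in tree` by following one root-to-leaf path."""
        node = self.root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False

    def in_order(self) -> Iterator[int]:
        """Yield keys in ascending order using an explicit stack."""
        stack: List[Node] = []
        node = self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right


def test_in_order_is_sorted() -> None:
    """Automated test: traversal is sorted and membership checks work."""
    tree = BinarySearchTree()
    for key in [5, 2, 8, 1, 9, 2]:
        tree.insert(key)
    assert list(tree.in_order()) == [1, 2, 5, 8, 9]
    assert 8 in tree
    assert 7 not in tree


if __name__ == "__main__":
    test_in_order_is_sorted()
    print("all tests passed")
```

Put a file like this under version control, run the test every time you change it, and you have touched most of the boxes from #3 in one small codebase.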

Good luck!

Anna Schneider is a biophysics graduate student in the Geissler group, where she uses computer simulation and machine learning to study pattern formation in photosynthetic membranes. She has also held various editorial positions at the Berkeley Science Review. Follow Anna on Twitter at @windupanna.
Website: http://annaschneider.org/

3 comments

  1. Awesome post, I think data scientist is one of the cool choice for those who are pursuing graduation course, it helps boost their career. thanks Anna for sharing this informative information.

  2. This is truly fantastic advice Anna, thanks for sharing. I am going to take that MOOC on data science and statistics! Love #8-10 especially, since they can be applied to any career path. They highlight the importance of building a community, and how useful the internet can be when used professionally. Three cheers for putting yourself out there and creating a path for yourself.

    @ActiveScientist

  3. Sreenatha Reddy K R

    Thanks a lot Anna for your advice. I have taken MOOC from Coursera and rest are valuable points.

    @SreenathReddyK
