Data science is a broad field that includes several subdivisions, such as data engineering, data preparation and transformation, data visualization, machine learning, and deep learning.
While many skills are required for doing data science (see Data Science Minimum: 10 Essential Skills You Need to Know to Start Doing Data Science), the two most basic requirements are:
a) A solid mathematical background
b) Programming skills
This article discusses the basic programming languages you need for doing data science. For the essential math skills, please see Essential Math Skills for Machine Learning.
Programming Languages Used in Data Science
If you sample data scientist job ads from platforms such as indeed.com or LinkedIn, you will get an idea of the technical skills employers ask for. Some of the technical skills (programming languages, platforms, and tools) mentioned in data scientist job ads include the following:
- Python
- R
- Matlab
- Hadoop
- SAS
- SQL
- Tableau
- Excel
- Power BI
- AWS
- Azure
- Java
- Julia
- Scala
With such a wide variety of skills mentioned in data scientist job ads, every beginner interested in learning the fundamentals of data science would naturally ask themselves:
What programming language should I focus on?
If you are interested in learning the fundamentals of data science, you need to start somewhere. Do not be overwhelmed by the long list of programming languages mentioned in data scientist job ads. While it is important to learn as many data science tools as possible, it's recommended to start with just one or two programming languages. Once you've built a solid background in data science, you can challenge yourself to learn other programming languages, platforms, and productivity tools that can enhance your skill set.
Programming languages you should focus on as a data science beginner
According to this article (The Most In Demand Tech Skills for Data Scientists), Python and R are still the unequivocal champions of data science when it comes to programming languages.
As a beginner, it is OK to start with one programming language, say Python, then maybe learn R later on, or you can learn the two languages concurrently.
The good thing is that you can pick up Python and R skills as you go through data science training courses. Most data science training programs start with the fundamentals of programming: a specialization taught in R often begins with a course such as R Basics, while one taught in Python begins with something like Python Basics.
So if you have some basic programming background, you can teach yourself data science through self-study with online courses. You don't need advanced knowledge of Python or R to begin your data science journey. You will learn and master these languages throughout your training as you complete homework assignments, read books, and use the many online resources available for help with R and Python programming.
If you are interested in data science specializations that start with basic programming courses in R or Python, here are two of my favorites, both of which teach you the language as you go (for those with some prior exposure to basic programming):
(i) Professional Certificate in Data Science (HarvardX, through edX): https://www.edx.org/professional…
Includes the following courses, all taught using R (you can audit courses for free or purchase a verified certificate):
- Data Science: R Basics;
- Data Science: Visualization;
- Data Science: Probability;
- Data Science: Inference and Modeling;
- Data Science: Productivity Tools;
- Data Science: Wrangling;
- Data Science: Linear Regression;
- Data Science: Machine Learning;
- Data Science: Capstone
(ii) Applied Data Science with Python Specialization (the University of Michigan, through Coursera): https://www.coursera.org/special…
Includes the following courses, all taught using Python (you can audit most courses for free, some require the purchase of a verified certificate):
- Introduction to Data Science in Python;
- Applied Plotting, Charting & Data Representation in Python;
- Applied Machine Learning in Python;
- Applied Text Mining in Python;
- Applied Social Network Analysis in Python.
In summary, Python and R remain the two top programming languages in data science. In my personal experience, I use Python for machine learning applications, while I find R very useful for statistical analysis. Basically, everything that can be done in Python can also be implemented in R. It is worthwhile to learn how to do data science in both Python and R, as that will increase your chances of getting a job as a data scientist, since these are the top two languages mentioned in most data science job ads.