In July last year I started the Microsoft Professional Program Certificate in Data Science. In February ’17 I passed the final exam. In this post I share my findings. You can post your questions and remarks in this LinkedIn group.
How does it work?
The program is hosted on the edX platform; once you log in you are redirected to their site. You can work through the program one course at a time. Each course has a start and end date.
You can take the courses for free if you want. But if you pay a small fee (between $45 and $90) per module you get a verified certificate. Each module runs for a limited period, so you have to reach the required score before the course ends.
Most courses follow the same scheme:
- Some theory, presented by a trainer.
- Demos presented by the same or another trainer, typically illustrating the theory hands-on with the products.
- Online questions, usually multiple choice. Sometimes you get only one chance to answer, other times you can have 2 or 3 tries.
- Exercises or labs where you work hands-on with the products; most of them are done (of course) in Azure. In these cases you have to provide results from calculations you have done in the lab.
The final project is a model in Azure ML Studio, more on that later.
You can download all lab materials and the videos. The videos usually run between 5 and 15 minutes, and the sound and picture quality are good.
It is very easy to browse through the modules and skip certain parts if you want to. For most of the modules you only need a passing score of 70%, so you do not need to finish all quizzes, exercises or final tests to pass and move on to the next module.
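To make the 70% rule concrete, here is a minimal sketch of how a skipped item can still leave you above the threshold. The point values below are made up for illustration; the actual weighting differs per course and is not something I can reproduce here.

```python
PASSING_SCORE = 70.0  # percent; the threshold most modules use


def overall_percentage(items):
    """Return the overall score in percent.

    items is a list of (points_earned, points_possible) pairs;
    a skipped item simply earns 0 points.
    """
    earned = sum(e for e, _ in items)
    possible = sum(p for _, p in items)
    if possible == 0:
        raise ValueError("no graded items")
    return 100.0 * earned / possible


def has_passed(items, threshold=PASSING_SCORE):
    """True if the overall score reaches the passing threshold."""
    return overall_percentage(items) >= threshold


# Hypothetical course: all point values are assumptions for illustration.
example_items = [
    (18, 20),  # quiz 1
    (20, 20),  # quiz 2
    (0, 10),   # lab 1, skipped
    (27, 30),  # lab 2
    (16, 20),  # final test
]

print(overall_percentage(example_items), has_passed(example_items))
```

In this made-up case one lab is skipped entirely and the course is still passed.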
What is in it?
There is a lot of documentation on the content of each module, but in short: the program is divided into units, and each unit covers a certain topic. Each topic is made up of several videos and labs (from a few to 10, depending on the unit).
Unit 1: Fundamentals
There are four courses in this unit:
- Data Science Orientation: Here you learn the basics of data science. It runs through the typical statistical subjects like exploring your data, variance, correlation, comparison tests and so on. It is an introduction for people who have never heard of data science or even statistics. The later courses go into more detail.
- Querying Data with T-SQL: The second course in this unit focuses solely on SQL, the language used to query relational databases. It is a very good introduction, well presented. For ICT people there is nothing new here, although it can be a good refresher.
- Analyzing and Visualizing Data with Excel or Power BI: For the third part you can choose between Excel and Power BI. For those who have never worked with Power BI, this is a very good introduction. You won’t become a specialist but you get to know what the tool is all about.
- Statistical Thinking for Data Science and Analytics: This course is not given by Microsoft but by Columbia University. The content is primarily on statistical concepts. There are modules on visualization and Bayesian modeling. I personally found this one the least interesting; it was a bit too theoretical and the exercises were not very challenging.
Unit 2: Core Data Science
- Introduction to R or Python for Data Science: In these modules you get a quick overview of Python or R, depending on the path you took. The course is done by the people from DataCamp and is very well presented. The content is well structured and not difficult to follow.
- Data Science Essentials: In this part you get a few introductions to general statistics and managing data. There is also a first introductory series on machine learning as a stepping stone to the next module. This is an easy-to-follow module, not too technical and with a lot of samples.
- Principles of Machine Learning: In this module you learn the typical techniques involved in ML: classification algorithms, regression, trees, clustering and recommender systems. If you have never heard of these things before, this module can be a bit more difficult to follow and take more time. It is an important part of the course and you should try to understand all these concepts.
Unit 3: Applied Data Science
- Programming with R or Python for Data Science: This is basically the next step in learning R or Python. Depending on your “programming” background you can quickly run through this.
- Applied Machine Learning or Developing Intelligent Apps or Implementing Predictive Solutions with Spark in Azure HDInsight: You can choose one of these tracks. I took the Developing Intelligent Apps track. Although the idea and the material are OK, I found this one rather disappointing. I spent a lot of time in Visual Studio, and I can imagine that for people who have never used it before (or never done any programming) this might be a bridge too far. There were also a few issues with bugs in certain labs.
Unit 4: Capstone project
The capstone project consists of entering a Cortana Intelligence Competition. I won’t go into too much detail, but you receive a bunch of data and you need to predict a certain value. You have to use most of the techniques you learned during the course. In order to pass you need to obtain a certain score; in my case it was 70%.
If you use the basic techniques and stick to “keep-it-simple”, it is not difficult to achieve this score, but I spent quite some time optimizing the model to obtain higher scores.
How does it compare to other MOOCs?
Over the last two years I took three MOOCs, all of them in the “paid” version:
- Johns Hopkins University Data Science (+ the executive one)
- Machine Learning from Stanford
- This one, the Microsoft Professional Program for DS
As with a lot of things in life, there is no clear winner…
The JHU course was much tougher than this one and I really needed the 12 to 16 hours per week to complete it. As you can read in my review, some of the modules were really hard (and boring), but achieving the certification made it all worth it.
I really liked the ML course from Prof. Andrew Ng; it is still my favorite ML course.
If we could enhance the pure statistical and theoretical parts in this course with the parts from JHU and replace the ML part with the Stanford one (in R or Python this time and not in Octave ;-)) we would have a winner!
The strong points
The general look and feel is good, although I prefer the Coursera way of navigating and presenting results, but that is a personal matter of course. The quality of the videos is very good: clear images, clear sound. You can debate the presentation skills of some of the presenters, but again this is a personal preference. I found them all OK and easy to understand.
The weak points
- The statistics course was not my favorite; it didn’t live up to expectations and is the odd one out in the track.
- There were some errors in the exercises and some issues with the final project, but I am sure these will be resolved in later sessions.
So, why follow this track? Well, first of all it gives you a decent overview of data science in the Microsoft world. Most of the material is of good quality, and if you take the verified (paid) version you get a certificate. The track also forces you to look into all Microsoft products related to data science, some of which you might never have heard of or used before, which is another good reason to take it.