The thing with ML Studio…

I have been using ML Studio from Microsoft for a while now, but not long enough to get the product “in my fingers”: I can find my way around it without thinking too much, but I still spend too much time finding the right building blocks.

And that is exactly the problem with the tool.

Let me explain…

First, for those who have no clue what ML Studio is all about: please go to the site here or read one of my earlier blog posts.

ML Studio is a wonderful tool for doing data science. It is fully browser-based and has a workflow-like drag-and-drop interface. You start by creating an experiment (or a project with several experiments). You can add simple steps, such as removing a column from a data set, or more complex steps that evaluate different algorithms to see which one performs best.

I finished the Microsoft Data Science Curriculum, where the purpose is to create an experiment that predicts when people default on their loan. The idea is, of course, to do all the work in ML Studio using the available building blocks. This all works very well, but I found myself going back too often to R-Studio to do some quick coding and testing, and then … pasting the code in an “Execute R code” block on the canvas.

So, I was wondering, why do I do this?

  • Faster data cleaning: I found myself doing the data retrieval and cleanup in R Studio; once I was ready, I pasted it into ML Studio. It is so much faster to interactively type in your code, look at the results, make modifications, run again and so on… Even the basic plotting to check for “weird” data was done using R Studio. My personal feeling is that speed was the most important reason for switching back and forth between R Studio and ML Studio. In my case, it is just that bit more comfortable to do this in R.
    On the other hand, the “Visualize” option at each step in the ML Studio is very neat: you can get a quick overview of the data using a “tiny” graph in the header of the resultset.
  • Too much information: Another inconvenience I had was finding the right “block”. For example, to remove a column from the data set I typed in “remove” and “delete”, but the search didn’t show me what I was looking for. The correct block is “Select columns from dataset”. Of course, this makes sense, but by that time I was manipulating columns in R Studio a lot faster.
  • Variations of your model: sometimes you just want to check a different approach without changing the whole model. What I usually do in R is create a function for each approach and then call this to see the different results. This is a bit more awkward in ML Studio because you soon get a clutter of boxes and arrows. It would be great if there was some IF…THEN in the building blocks. Ideally you could then work with global variables to change the condition. It would be great if you could collapse a certain group of blocks to reduce them to a single block.
  • Comparing Experiments: it would be great if you could compare (settings of) experiments, for example if there was a possibility to export the experiment to an XML file, you could then use this to compare the different settings for each of the steps. Now you have to switch between your saved models.
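
To make the R side of this comparison a bit more concrete, here is a minimal sketch of the kind of thing I mean: dropping a column and wrapping each approach in its own function so the results can be compared side by side. The data frame, the column names and the two rules are all made up for illustration; they are not taken from the curriculum data set or from any ML Studio module.

```r
# Hypothetical toy data; the values are invented for illustration only.
loans <- data.frame(
  customer_id = 1:6,
  income      = c(20, 35, 50, 65, 80, 95),
  debt        = c(15, 20, 10, 30, 5, 10),
  defaulted   = c(1, 1, 0, 1, 0, 0)
)

# Dropping a column is a one-liner in base R.
loans$customer_id <- NULL

# One function per approach; each returns a 0/1 prediction per row.
# Both rules are assumptions chosen only to show the pattern.
approach_ratio  <- function(d) as.integer(d$debt / d$income > 0.4)
approach_income <- function(d) as.integer(d$income < 40)

approaches <- list(ratio = approach_ratio, income = approach_income)

# Share of rows where each approach matches the known outcome.
accuracy <- sapply(approaches, function(f) mean(f(loans) == loans$defaulted))
print(round(accuracy, 2))
```

Adding a third approach is then just one more function and one more entry in the list, which is the part that tends to turn into extra boxes and arrows on the canvas.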

On the other hand…

Once you are fully into ML Studio it is an amazing tool. The “right-click” on each step is extremely useful and you get used to it very quickly: it lets you get a quick preview and export the current resultset to a dataset you can use in other experiments. It is also very easy to copy parts of the experiment using “copy & paste”, certainly in the situation where you want to compare different models.

In many of my tests I found that the execution speed of the experiment was usually very fast. For some of the algorithms it was faster than on my Core i7 with 16GB.

Working with R-Studio and family is not always easy and straightforward. IT people (developers) quickly get used to the IDE and are up and running in no time. But you can spend hours and days browsing through the many packages in CRAN to find the one that suits you.

Conclusion

As with a lot of these tools, you do need to put some effort into them before they become productive. I guess for people with an IT and programming background the R-Studio environment is more productive, but for others ML Studio is the way to go.

I haven’t decided yet, so I will continue to use both.