useR! 2016 conference and Microsoft


On June 28th 2016, Microsoft announced updates to Microsoft R Client, its free, community-supported data science tool for building high-performance analytics using the full set of ScaleR functions.

After the useR! 2016 conference (June 27 – 30), Microsoft’s Channel 9 posted a lot of interesting material on their site.


What is so special about this? Since the acquisition of Revolution Analytics, Microsoft has made a giant leap forward in supporting R in its products. During the conference there was an interesting session by David Smith, the R community lead. The recording (19 minutes) can be found here:

https://channel9.msdn.com/Events/useR-international-R-User-conference/useR2016/R-at-Microsoft/player?format=html5

So what is all this about R and Microsoft? A bit of background is needed here…

RRO and MRO

A while ago Microsoft bought Revolution Analytics, the company that made it possible to run distributed calculations in R. Its version (or enhancement) of R was called Revolution R Open (RRO) at the time.


Since then Microsoft has rebranded it as Microsoft R Open (MRO). The current version, 3.2.5, is based on the same version of R. MRO is open source; Microsoft added some enhancements, specifically around multi-threading.

Microsoft R Server and R Client

Microsoft R Open offers limited performance and scalability compared with Microsoft R Server and Microsoft R Client. R Server and R Client include proprietary ScaleR functions and packages that are not available in standalone Microsoft R Open.

Please note that the R Client GUI is integrated into Visual Studio (any version, at the time of writing this post). Microsoft has also published a short video explaining R Client.

And there is more “R”

Besides R Client and R Server, there is also support for R in SQL Server 2016.
Applications can call the R runtime and retrieve predictions and visuals using T-SQL.
With SQL Server Enterprise Edition, you also get the ScaleR libraries to overcome R’s inherent performance and scale limitations.

ScaleR

Revolution Analytics built an extension to R called ScaleR. ScaleR provides data scientists with a range of R algorithms that transparently parallelize computation and data analysis. The focus here is on data scientists. ScaleR can scale out (parallelize) certain parts of the data science workflow, such as:

  • Data Preparation: import, sort, merge, split, …
  • Descriptive Statistics: min, max, stddev, var, quantiles,…
  • Data Visualization: histogram, line, ROC, tree,…
  • Statistical Tests: chi-squared, Fisher, …
  • Parallelized Statistical Modeling Algorithms: linear & logistic regression, GLM, k-means, …
  • Machine Learning Capabilities: Decision Trees, Random Forests, …

For a full overview, see the following PDF.

The bottom line is that ScaleR is focused on pure data science work. Parts of ScaleR are also available in R Client; see the feature comparison table below.

MKL

Next to that there is MKL, the (Intel) Math Kernel Library, which, as its name already suggests, is not really tied to R or to Microsoft’s version of it. MKL is a set of threaded and vectorized math routines that accelerate various math functions and applications. There are packages available that allow R (including MRO) to exploit MKL. This article shows that installing the library can have a serious impact on performance, depending of course on the type of calculations.

So MKL is more of a general, very fast math library that can benefit the data scientist’s work, whereas ScaleR goes much further and offers scalability at a higher level.

A comparison of features

The following overview is a reformatted version of a list found on the Microsoft site:

Data sizing
  • Microsoft R Open: in-memory bound; can only process datasets that fit into the available memory.
  • Microsoft R Client: in-memory bound; can process datasets that fit in available memory, and operates on large volumes when connected to R Server.
  • Microsoft R Server: disk scalability; operates on bigger volumes.

Speed of analysis
  • Microsoft R Open: multi-threaded for non-ScaleR functions when MKL is installed.
  • Microsoft R Client: multi-threaded with MKL for non-ScaleR functions; up to 2 threads for ScaleR functions.
  • Microsoft R Server: full parallel threading & processing.

Enterprise readiness
  • Microsoft R Open: community support.
  • Microsoft R Client: community support.
  • Microsoft R Server: commercial support.

Commercial viability
  • Microsoft R Open: open source.
  • Microsoft R Client: free for everyone.
  • Microsoft R Server: commercial licenses.

It seems that there is plenty of choice when riding the R wave at Microsoft.