useR! 2016
On June 28th, 2016, Microsoft announced updates to Microsoft R Client, its free, community-supported data science tool for building high-performance analytics using the full set of ScaleR functions.
After the useR! 2016 conference (June 27 – 30), Microsoft’s Channel 9 posted a lot of interesting material on their site.
What is so special about this? Since the acquisition of Revolution Analytics, Microsoft has made a giant leap forward in the support of R in its products. During the conference there was an interesting session by David Smith, the R community lead. It can be found here (19 minutes).
https://channel9.msdn.com/Events/useR-international-R-User-conference/useR2016/R-at-Microsoft/player?format=html5
So what is all this about R and Microsoft? A bit of background is needed here…
RRO and MRO
A while ago Microsoft bought Revolution Analytics, the company that made it possible to do distributed calculations in R. Their version (or enhancement) of R was called Revolution R Open (RRO) at the time.
Since then, Microsoft has rebranded this as Microsoft R Open (MRO). The current version is 3.2.5, based on the same version of R. MRO is open source, and Microsoft added some enhancements, specifically around multi-threading.
Microsoft R Server and R Client
Microsoft R Open provides limited performance and scalability in comparison to Microsoft R Server and R Client Editions. R Server and Microsoft R Client have proprietary ScaleR functions and packages included that are not available in standalone Microsoft R Open.
Please note that the GUI part of R Client is integrated in Visual Studio (any version, at the time of writing this post).
And there is more “R”
Next to the Client and the Server there is also support for R in SQL Server (2016).
Applications can call the R run-time and retrieve predictions and visuals using T-SQL.
With SQL Server Enterprise Edition, you also get the ScaleR libraries to overcome R’s inherent performance and scale limitations.
ScaleR
Revolution Analytics had built an extension to R called ScaleR. ScaleR provides data scientists with a range of R algorithms that offer transparent parallelization of computations and data analysis. The focus here is on data scientists. ScaleR can scale out (parallelize) certain parts of the data science track, such as:
- Data Preparation: import, sort, merge, split, …
- Descriptive Statistics: min, max, stddev, var, quantiles, …
- Data Visualization: histogram, line, ROC, tree, …
- Statistical Tests: chi^2, fisher, …
- Parallelized Statistical Modeling Algorithms: linear & logistic regression, GLM, K-means, …
- Machine Learning Capabilities: Decision Trees, Random Forests, …
For a full overview, see the following PDF.
The bottom line is that ScaleR is focused on pure data science work. Parts of ScaleR are also available in R Client; see the feature comparison table below.
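To make the scale-out idea a bit more concrete, here is a minimal Python sketch of chunked, parallelizable descriptive statistics. It is not ScaleR code and makes no claim about how ScaleR is implemented internally. It only illustrates one common pattern: summarize each chunk of data independently, then merge the partial summaries. All names (`PartialStats`, `summarize`, `merge`, `describe`) are hypothetical, and an in-memory list stands in for blocks that would normally be read from disk.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass
class PartialStats:
    """Summary of one chunk: enough to rebuild min, max, mean and variance."""
    n: int
    mean: float
    m2: float   # sum of squared deviations from the chunk mean
    lo: float
    hi: float


def summarize(chunk):
    """Compute the partial summary for a single non-empty chunk."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return PartialStats(n, mean, m2, min(chunk), max(chunk))


def merge(a, b):
    """Combine two partial summaries (pairwise update of mean and M2)."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return PartialStats(n, mean, m2, min(a.lo, b.lo), max(a.hi, b.hi))


def chunked(values, size):
    """Yield successive slices; a stand-in for reading blocks from disk."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def describe(values, chunk_size=4, workers=2):
    """Summarize chunks in a worker pool, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, chunked(values, chunk_size)))
    total = reduce(merge, partials)
    variance = total.m2 / (total.n - 1)   # sample variance, needs n >= 2
    return {"n": total.n, "min": total.lo, "max": total.hi,
            "mean": total.mean, "var": variance, "stddev": variance ** 0.5}


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, 1.0, 3.0]
stats = describe(data)
print(stats)
```

Because each chunk is summarized on its own, only one chunk per worker has to be in memory at a time, and the result does not depend on the chunk size or the number of workers. The thread pool here only shows the structure; a real system would spread the chunks over cores or machines.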
MKL
Next to that there is MKL, which stands for (Intel) Math Kernel Library. As its name suggests, it is not specific to R or to Microsoft’s version of it. MKL is a set of threaded and vectorized math routines that accelerate various math functions and applications. There are packages available that allow R (including MRO) to exploit the possibilities of MKL. In this article you can see that installing this library can have a serious impact on performance, depending on the type of calculations of course.
So MKL is more of a general, super-fast math library that can benefit the work of the data scientist, whereas ScaleR goes much further and offers scalability at a higher level.
A comparison of features
The following overview is a reformatted version of a list found on the Microsoft site:
Features | Microsoft R Open | Microsoft R Client | Microsoft R Server |
---|---|---|---|
Data sizing | In-memory bound. Can only process datasets that fit into the available memory. | In-memory bound. Can process datasets that fit in available memory. Operates on large volumes when connected to R Server. | Disk scalability. Operates on bigger volumes. |
Speed of Analysis | Multi-threaded when MKL is installed for non-ScaleR functions. | Multi-threaded with MKL for non-ScaleR functions. Up to 2 threads for ScaleR functions. | Full parallel threading & processing. |
Enterprise Readiness | Community support | Community support | Commercial support |
Commercial Viability | Open source | Free for everyone | Commercial licenses |
It seems that there is plenty of choice when riding the R wave at Microsoft.