In an earlier post we mentioned the “Cortana Analytics” suite, as it was called at the time. It has since been renamed the Cortana Intelligence Suite.
There is an interesting link on the site: “What’s included”. Microsoft groups five categories of tools and solutions under the Cortana umbrella; we will dive a bit deeper into them in this and future posts. There is a nice graphical overview on the Microsoft site, where the different categories are the boxes in blue:
- Information Management
- Big data stores
- Machine learning and advanced analytics
- Dashboards and visualizations
In a series of posts we will go through all of them and, where possible, do the hands-on tutorials!
Note that the tutorials require an Azure subscription. All the tutorials we tried also worked on an Azure trial subscription. We provide links to the tutorials so you can test them yourself.
Under the information management category you can find three subdivisions:
1. Data Factory
This is basically the tooling for creating, scheduling and monitoring data pipelines. From the page:
- Create, schedule, orchestrate, and manage data pipelines
- Visualize data
- Connect to on-premises and cloud data sources
- Monitor data pipeline health
- Automate cloud resource management
So, what is this all about?
It is a collection of tools to automate, orchestrate and transform data: what is usually called ETL (extract, transform and load).
There is a dedicated service in Azure that provides pipelines in which you can transform your data. You create one or more datasets, which are consumed and produced by activities; a pipeline is a group of activities, and the activities run on so-called linked services.
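To make the relationships between these building blocks concrete, here is a small, purely illustrative Python sketch. The class and field names are our own shorthand, not the actual Data Factory API; it only models how linked services, datasets, activities and pipelines fit together:

```python
from dataclasses import dataclass, field
from typing import List

# A linked service points to a data store or a compute resource
# (e.g. blob storage or an HDInsight cluster). Illustrative names only.
@dataclass
class LinkedService:
    name: str

# A dataset represents data that lives on a linked service.
@dataclass
class Dataset:
    name: str
    linked_service: LinkedService

# An activity consumes input datasets, produces output datasets,
# and runs on a linked service (e.g. a Hive activity on HDInsight).
@dataclass
class Activity:
    name: str
    runs_on: LinkedService
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)

# A pipeline is simply a group of activities.
@dataclass
class Pipeline:
    name: str
    activities: List[Activity] = field(default_factory=list)

storage = LinkedService("blob-storage")
hdinsight = LinkedService("on-demand-hdinsight")
raw = Dataset("raw-logs", storage)
clean = Dataset("clean-logs", storage)
hive = Activity("run-hive-script", hdinsight, inputs=[raw], outputs=[clean])
pipeline = Pipeline("transform-logs", activities=[hive])
print(pipeline.activities[0].runs_on.name)  # on-demand-hdinsight
```

This mirrors the shape of the tutorial below: one transformation activity, running on an on-demand compute resource, wired into a pipeline.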
To build these different components you can use several methods: PowerShell, Visual Studio or the Data Factory Editor. We did the test with the Data Factory Editor tutorial.
The tutorial is clear and easy to follow; at the end of the exercise we had created a pipeline with a transformation activity (an HDInsight activity) that runs a Hive script on an on-demand HDInsight cluster. Nice!
2. Data Catalog
According to the site, the Data Catalog is a “place” that lets you find the data you need. From the page:
- Spend less time looking for data, and more time getting value from it.
- Register enterprise data assets.
- Discover data assets and unlock their potential.
- Capture tribal knowledge to make data more understandable.
- Bridge the gap between IT and the business, allowing everyone to contribute their insights.
- Let your data live where you want it; connect with the tools you choose
- Control who can discover registered data assets.
- Integrate into existing tools and processes with open REST APIs.
The Data Catalog is a place that lists all the data sources that users can access in your company. First, as an administrator, you register the different sources in the catalog. You can enrich the sources with extra information (metadata) that can be used for searching. Please note that the data itself is NOT copied into the catalog, although you can choose to upload “preview” data.
The purpose is two-fold:
1) It is a catalog you can use to find and browse data sources, check the data model and get a preview of the data (if applicable). Users can also annotate the content in their own words.
2) You can use the catalog to consume data sources; it then acts as a kind of “proxy” to the actual data source, so users do not need to know all the technical details of the source.
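To illustrate the two-fold purpose above, here is a toy, in-memory catalog in Python. It is purely conceptual, with made-up names (this is not the Azure Data Catalog API): it registers sources with metadata, searches on that metadata, and hands out connection details without ever copying the data itself:

```python
# Toy data catalog: stores metadata and (optional) previews, never the data.
# All names are illustrative; this is not the Azure Data Catalog API.
class Catalog:
    def __init__(self):
        self.assets = {}

    def register(self, name, connection, tags=None, preview=None):
        # Only metadata and an optional preview are stored.
        self.assets[name] = {
            "connection": connection,
            "tags": set(tags or []),
            "preview": preview,
        }

    def search(self, tag):
        # Purpose 1: find and browse data sources via their metadata.
        return [n for n, a in self.assets.items() if tag in a["tags"]]

    def connection_for(self, name):
        # Purpose 2: act as a "proxy", so users need only the asset name.
        return self.assets[name]["connection"]

catalog = Catalog()
catalog.register("sales-db", "sqlserver://corp/sales",
                 tags=["sales", "finance"], preview=[("2016-01", 1200)])
catalog.register("web-logs", "wasb://logs@store", tags=["web"])
print(catalog.search("sales"))             # ['sales-db']
print(catalog.connection_for("web-logs"))  # wasb://logs@store
```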
We did the test, based on the exercises provided here.
Once again, the tutorial is clear and easy to follow. We had some hiccups with certain parts, but those were beginner's errors.
⇒ Only one Data Catalog can be created per subscription (at the time of writing), but why would you need more?
⇒ You need Azure Active Directory set up; there is no access with other accounts.
3. Event Hubs
The purpose of Event Hubs is to log events and to connect devices. It is what is often called a “publish-subscribe” model, where you can log millions of events per second and stream them into different applications. There is a very good description of the concepts in this article.
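The publish-subscribe idea itself is simple; as a rough sketch (not the Event Hubs API, just the pattern), a hub fans each published event out to every subscriber:

```python
# Minimal publish-subscribe sketch; illustrative only, not Azure Event Hubs.
class Hub:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        # Every subscriber receives every published event ("fan-out").
        for callback in self.subscribers:
            callback(event)

hub = Hub()
received = []
hub.subscribe(received.append)                         # e.g. a stream processor
hub.subscribe(lambda e: received.append(("copy", e)))  # e.g. an archiver
hub.publish({"device": "sensor-1", "temp": 21.5})
print(len(received))  # 2
```

The real service adds the parts that matter at scale (partitioning, checkpointing, retention), but the producer/consumer decoupling shown here is the core idea.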
And there is even a tutorial; it can be found here. We haven't tested it yet but will come back to it later in another IoT context.
To see how much incoming (ingress) and outgoing (egress) traffic the infrastructure can take, and how much data is stored and for how long, you can refer to the following FAQ.
We will do some practical testing in the future, where we will feed the infrastructure with actual data coming from IoT devices. Note that this infrastructure relies completely on Azure. There is also the possibility of a hybrid scenario, where the service layer runs on premises; this might be a good fit for high-volume, high-velocity data. You can find more info here.
The content on the site is evolving fast (see the article dates on some of the samples and tutorials). It is worth mentioning that in the few trials we did, we got very good support from the Azure team.
It seems that Microsoft is working very hard to get their Information Management concepts and tools in place. Something to follow in the coming months…
In our next posts we will take a closer look at what “Big data stores” and “Machine learning and advanced analytics” are all about.