Azure and R for Data Transformation: How to?

Category: azure data factory


nilsole on Wed, 28 Dec 2016 18:42:16

I am looking for advice and recommendations. A customer of mine wants an Azure-based architecture in which they can continuously update and transform data, preferably using R.

The requirements are quite simple:

  • Tiny pieces of data (about 10 numeric rows per minute) are sent into the cloud (via DB connection or FTP, ...).
  • In the cloud, the following happens: the data is fetched, transformed by an automatic R process, then updated in a DB.
  • The cloud task will be executed once per minute.

How should I implement this on MS Azure? Which stack and which tasks? I have considered the following two options:

  • Use Azure Data Factory as the scheduled task runner, an Azure SQL DB for storage, and a custom .NET activity for the R transformation; this activity requires an HDInsight cluster with R installed and is somewhat complicated to set up [1].
  • OR: Use Azure ML as the task runner and execute the R job from within ML.

If I use Azure ML, can I schedule jobs once per minute? I am also not sure whether Azure ML is meant for data transformations at all. Thanks for your advice; I look forward to your opinions.




Alexandre Gattiker on Thu, 05 Jan 2017 08:51:29