nilsole on Wed, 28 Dec 2016 18:42:16
Looking for advice and your recommendations: A customer of mine wants an Azure-based architecture where they can continuously update and transform data, preferably using R.
The requirements are quite simple:
- Tiny pieces of data (about 10 numeric rows per minute) are sent into the cloud (via DB connection or FTP, ...).
- In the cloud, the following happens: the data is fetched, transformed by an automatic R process, then updated in a DB.
- The cloud task will be executed once per minute.
How should I implement this using MS Azure? Stack/tasks? I thought about the following two options:
- Use Azure Data Factory as the scheduled task runner, an Azure SQL DB for storage, and a custom .NET activity for the R transformation. This activity requires an HDInsight cluster with R included and is somewhat complicated to set up.
- OR: Use Azure ML as the task runner and execute the R job from within ML.
But if I use Azure ML, can I schedule jobs once per minute? I am also not sure whether Azure ML is meant for data transformations at all. Thanks for your advice, looking forward to your opinion.
Alexandre Gattiker on Thu, 05 Jan 2017 08:51:29
- If the database is SQL Server 2016, you can use SQL Server R Services to do the processing directly in the database: https://msdn.microsoft.com/en-us/library/mt604845.aspx
- If using Azure Data Factory, you don't need HDInsight unless you need to scale out large computations. You can use Azure Batch, which provisions a VM on demand for your job. See a template for using R at https://github.com/Azure/Azure-DataFactory/tree/master/Samples/RunRScriptUsingADFSample. However, ADF jobs cannot be scheduled more often than every 15 minutes, so this would not meet your once-per-minute requirement.
- You could use R Server Operationalization to expose your R script as a web service (https://msdn.microsoft.com/en-us/microsoft-r/operationalize/about). Then set up a Logic App that runs on a schedule or upon an external trigger, calls your web service, and then calls a SQL stored procedure.
- Azure ML does not allow you to schedule jobs.
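
Whichever scheduler ends up triggering it, the per-minute step from the question (fetch, transform, update) can be sketched roughly as below. This is only an illustration in Python, not the R implementation: `sqlite3` stands in for the Azure SQL DB, the `readings` and `results` tables and the `processed` flag are hypothetical, and `transform` is a placeholder for the real R logic.

```python
import sqlite3


def transform(values):
    """Placeholder for the R transformation (assumption): center on the batch mean."""
    if not values:
        return []
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def run_once(conn):
    """Fetch unprocessed rows, transform them, and store the results.

    Returns the number of rows handled in this run.
    """
    rows = conn.execute(
        "SELECT id, value FROM readings WHERE processed = 0 ORDER BY id"
    ).fetchall()
    if not rows:
        return 0
    ids = [row[0] for row in rows]
    results = transform([row[1] for row in rows])
    with conn:  # commit both writes together, or roll back on error
        conn.executemany(
            "INSERT INTO results (reading_id, value) VALUES (?, ?)",
            zip(ids, results),
        )
        conn.executemany(
            "UPDATE readings SET processed = 1 WHERE id = ?",
            [(i,) for i in ids],
        )
    return len(rows)


def make_demo_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        "id INTEGER PRIMARY KEY, value REAL, processed INTEGER DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE results (reading_id INTEGER PRIMARY KEY, value REAL)"
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    conn.executemany(
        "INSERT INTO readings (value) VALUES (?)", [(1.0,), (2.0,), (3.0,)]
    )
    print(run_once(conn))  # rows handled in the first run
    print(run_once(conn))  # nothing left to process in the second run
```

Marking rows as processed makes a run safe to repeat, which matters if the scheduler occasionally fires twice or a run overlaps with incoming data.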