HESSAMZAK on Thu, 08 Oct 2015 21:07:56

I'm new to both Azure Data Factory and Application Insights and curious to know whether I can get insight into my HDInsight workflows (defined in Data Factory) using Azure Application Insights.



Gaurav Malhotra-MSFT on Tue, 05 Jan 2016 19:57:58

Thanks for the query. Currently, Data Factory workflows are not hooked up with Azure Application Insights.

Santosh Veluguri on Tue, 22 Aug 2017 13:37:04

I know it's an old thread. Any new updates on this? It would be nice to have direct integration between Azure Application Insights and Data Factory.

JV_SubtleTech on Fri, 25 Aug 2017 00:37:05


I embarked on the same journey as you have. If you want to get Application Insights data out so that you can process and analyze it (given the complete lack of good overall system documentation), the only two approaches I have found that help both start the same way: turn on continuous export for the data and send it to Blob storage. I set both up with an Azure SQL DB so I could connect Power BI etc. to it afterwards. Then do one of the following two:

1. Use a Data Factory pipeline job (lower cost, typically only meant for applications with lighter loads): Use a copy activity that sources from your blob, using partitionedBy within the dataset to properly pull (recursively) all data from the folders for the current date. This works very well; however, I believe that if you have lots of data being written, or the continuous export delays its final file drop for the day, you may lose data. Any data loss would be limited to the last hour of the day (usually not an issue unless the site is busy all the time). Make sure the pipeline activity copies into a database and the output dataset is properly mapped to your SQL tables (or DW).

2. Use a Stream Analytics job (much higher cost, provides continuous, rapidly inserted data from the continuous blob exports): This is EXPENSIVE by comparison, as it costs nearly $60 a month, but it does give you constantly updated Application Insights data. It also ensures you aren't getting duplicate data. If you write option 1 without partitionedBy and/or accidentally select the whole requests folder, for instance, you will get tons of duplication.
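
To illustrate why scoping option 1 to the current date's folder matters, here is a minimal Python sketch. It is not Data Factory configuration. It assumes the continuous export lays blobs out as <root>/<type>/<yyyy-MM-dd>/<HH>/<file>, and the blob names, the "myapp_key" root, and both helper functions are hypothetical, made up for the example.

```python
from datetime import date


def daily_prefix(root: str, telemetry_type: str, day: date) -> str:
    """Build the folder prefix for one telemetry type on one day.

    Assumes a <root>/<type>/<yyyy-MM-dd>/ layout (an assumption, see above).
    """
    return f"{root}/{telemetry_type}/{day:%Y-%m-%d}/"


def select_blobs(blob_names, prefix):
    """Keep only the blobs whose names start with the given prefix."""
    return [name for name in blob_names if name.startswith(prefix)]


# Hypothetical listing of exported blobs spanning two days and two types.
blobs = [
    "myapp_key/Requests/2017-08-24/23/a.blob",
    "myapp_key/Requests/2017-08-25/00/b.blob",
    "myapp_key/Requests/2017-08-25/01/c.blob",
    "myapp_key/PageViews/2017-08-25/00/d.blob",
]

# Date-scoped selection: only the Requests blobs for 2017-08-25.
prefix = daily_prefix("myapp_key", "Requests", date(2017, 8, 25))
todays = select_blobs(blobs, prefix)

# Selecting the whole Requests folder also picks up the previous day,
# which a daily run would already have copied (hence the duplicates).
whole_folder = select_blobs(blobs, "myapp_key/Requests/")
```

In this sketch the date-scoped selection returns two blobs, while selecting the whole folder returns three, including one from the previous day that a daily run would load a second time.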

I am sure there is something better out there, and option 1 would work really well if MS actually matured their products at a pace that doesn't seem like it's from the 60s in terms of tech. But sadly ADF is poorly supported, and they are more worried about adding connections and different types of data than about making sure its core activities (along with some badly needed new ones) are actually useful in real scenarios and easy to use.

Good luck with getting things to work within your budget.