Queries relating to StreamInsight 2.1

Category: sql server streaminsight

Question

Srivattsan S on Thu, 23 May 2013 05:34:48


Hi,

I was working with StreamInsight 2.0 and, after a brief break, I am going to work with StreamInsight 2.1. I am not an expert in StreamInsight. I have a few queries about what I read in the documentation.

The first query: what does the documentation mean by historical data? StreamInsight is designed mainly for low latency and high throughput, but historical data has to be stored in some database, which adds the overhead of storing and retrieving it. How is low latency maintained in that case?

The second query: when I compared the architecture of StreamInsight 2.0 and 2.1, I found that SQL Server has been removed. What was the purpose of SQL Server in 2.0? If it was there to store static reference data and historical data, was it optimised for low-latency storage and retrieval? Since 2.1 doesn't have SQL Server in its architecture, how are static reference data and historical data stored now?

The third query: instead of standing queries, I see that it is now standing processes that contain the queries. Do these processes live inside StreamInsight, or do they have a process ID and show up in Task Manager? Why was this change made?

Thanks in advance for the help. I hope these questions are appropriate to post here.

Regards,

Srivattsan S

Replies

TXPower125 on Fri, 24 May 2013 01:59:53


When the documentation talks about historical data, they are either talking about processing streams of data that have occurred in the past OR they are talking about historical data that is being used as reference data. Data is enqueued into StreamInsight via an input adapter or IEnumerable/IObservable source. It is dequeued out of StreamInsight via an output adapter or IObserver sink. Low latency is achieved by processing the event streams in memory.
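To make the source, query, sink flow described above concrete, here is a minimal conceptual sketch in Python. It is not StreamInsight code and does not use any StreamInsight API; the names `Event`, `source`, `standing_query`, and `run` are hypothetical and exist only for illustration. The point it tries to show is that events pass from the source through the query to the sink in memory, with no database round trip in between.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: float  # application time carried by the event itself
    value: float


def source(readings: Iterable[Tuple[float, float]]) -> Iterator[Event]:
    """Hypothetical source: turns raw (timestamp, value) pairs into events."""
    for ts, value in readings:
        yield Event(ts, value)


def standing_query(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Runs in memory: each event is examined once as it streams past."""
    for event in events:
        if event.value > threshold:
            yield event


def run(readings, threshold: float, sink: Callable[[Event], None]) -> None:
    """Wire source -> query -> sink; nothing is persisted along the way."""
    for event in standing_query(source(readings), threshold):
        sink(event)


# Hypothetical readings; the sink here is simply a list's append method.
alerts = []
run([(1.0, 10.0), (2.0, 55.0), (3.0, 7.0), (4.0, 80.0)],
    threshold=50.0, sink=alerts.append)
```

If the readings came from a file or a database instead of a literal list, only `source` would change; the query and sink would stay the same.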

StreamInsight is/was bundled with SQL Server. You do not need SQL Server to run StreamInsight. However, StreamInsight does require a valid SQL Server license. To use SQL Server to store reference/historical data, the developer will have to write adapters/sources/sinks to meet their needs.

Queries are part of the pre-2.1 legacy adapter model. StreamInsight 2.1 introduced an approach based on Reactive Extensions (Rx), built around sources and sinks. The processes are StreamInsight 2.1 processes; they are not the same thing as a process in Task Manager. The move from the adapter model to the source/sink model was meant to make StreamInsight development easier, since the adapter model required some tricky multi-threaded development.

Srivattsan S on Fri, 24 May 2013 04:04:10


Thanks for the quick reply. But I still don't understand where this historical data is stored. If it's in a regular database, then the point of StreamInsight is lost. As for the adapters, when I developed some applications using 2.0 I didn't use threads to develop input/output adapters. StreamInsight took care of that. Are there any other reasons for the move to processes instead of queries in 2.1, other than making development easier?

I really appreciate your patience in writing the reply. I am just trying to understand the changes from 2.0 to 2.1 in depth, so that I can decide what to use for my future applications.

Thanks and Regards,

Srivattsan S

DevBiker on Fri, 24 May 2013 13:54:40


The historical data is stored in a database or file of some sort.

Now ... you say that you don't get the point. There are three key points for utilizing historical data. First, that's your reference data. Do you want to compare to the average over the past 6 months? Do you have a list of parameters for your events that you want to compare against? That's all historical data, and it is better served in a standard datastore and then brought into StreamInsight as reference data that enriches your real-time streams. Second, how are you going to test your queries and algorithms? By running historical datasets through your queries. The trick is to have a system that decouples the input/source from the actual analytics query. Our accelerators framework does that, so it's absolutely possible. Finally, you can leverage StreamInsight's temporal analytics on historical data/readings at very high speed. An example ... on a project, a customer gave us about 4 months' worth of historic data for testing. BUT ... what they also found was that the temporal analytics allowed them to "see" patterns and events that were hard to see with traditional table-based analytics. And we were able to run all 4 months of data through all of the analytics, using the original timestamps, in about 45 minutes. It's like watching a movie in super-high-speed fast forward.
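As a rough illustration of two of the points above (replaying stored readings by their original timestamps, and comparing a stream against reference data), here is a small Python sketch. It is an assumption-laden toy, not StreamInsight code: the function names, the 60-second tumbling window, and the sample readings are all hypothetical. Because windows are keyed on each event's own timestamp rather than the wall clock, the replay runs as fast as the data can be read.

```python
from collections import OrderedDict


def tumbling_window_averages(events, window_seconds):
    """Group (timestamp, value) events into fixed-size windows keyed on the
    event's own timestamp, not the wall clock, and average each window."""
    sums = OrderedDict()
    for ts, value in events:
        window_start = int(ts // window_seconds) * window_seconds
        total, count = sums.get(window_start, (0.0, 0))
        sums[window_start] = (total + value, count + 1)
    return [(start, total / count) for start, (total, count) in sums.items()]


def flag_against_reference(window_averages, reference_average, tolerance):
    """Compare each window with reference data: keep the windows whose
    average deviates from the reference by more than the tolerance."""
    return [(start, avg) for start, avg in window_averages
            if abs(avg - reference_average) > tolerance]


# Hypothetical historical readings: (seconds since start, value).
historical = [(0, 10.0), (30, 12.0), (60, 30.0), (90, 34.0), (120, 11.0)]

averages = tumbling_window_averages(historical, window_seconds=60)
outliers = flag_against_reference(averages, reference_average=11.0, tolerance=5.0)
```

The same two functions would work unchanged on a live feed, which is the decoupling of input from analytics mentioned above.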

Srivattsan S on Tue, 04 Jun 2013 14:09:42


I am sorry I had to unmark it as the answer, because my question is not completely answered yet. TXPower125 mentioned that adapter development required some tricky multi-threaded development, but when I worked with SI 2.0 I didn't create any threaded adapters. The question about the move to a process model is also yet to be clarified. TXPower125 mentioned that the process model eases the development of SI applications; can someone clarify how that is so? As far as I can infer, the process model helped the StreamInsight team develop and maintain StreamInsight 2.1 more efficiently. Yes, I agree that now that I am able to name the sources and sinks, I have the opportunity to link multiple queries to a single stream, but I need more clarity on the process.

Thanks !!

Srivattsan

TXPower125 on Tue, 04 Jun 2013 19:19:54


The adapter lifecycle itself is multi-threaded. The complexity in your adapters will depend on what data source you are coding against. What types of adapters were you writing?

As for how you can tell that the StreamInsight 2.1 APIs are easier to code against, just look at the code you have to write. It's much simpler to use Reactive Extensions (Rx) with LINQ. Take a look at the documentation and examples on MSDN.

Probably the best reason to move to 2.1 is as follows:
"Input and output adapters were introduced in an earlier version of StreamInsight. Though they have been superseded by the current development model, they are still available for developers who are maintaining legacy code. This section provides information on this legacy model."
Source: http://msdn.microsoft.com/en-us/library/hh995354(v=sql.111).aspx

Srivattsan S on Wed, 05 Jun 2013 09:02:47


Thanks TXPower125 for the reply. I went through the documentation and examples on MSDN, and that's how these queries popped into my head. I was working with simple adapters, and I was of the opinion that adapters needn't be explicitly threaded by the developer.

Thanks for the replies. 

DevBiker on Wed, 05 Jun 2013 14:23:58


Adapters don't need to be explicitly threaded by the developer ... but StreamInsight is inherently multi-threaded. You have to keep in mind that you may be enqueuing your items when your Stop request comes in ... on a completely different thread. That's where it gets really tricky.
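The race described above can be sketched in a few lines of Python. This is only a conceptual illustration, assuming a lock-guarded "stopped" flag; `SketchAdapter` and its methods are hypothetical and are not the StreamInsight adapter API. One thread keeps enqueuing while a stop request arrives from another thread, and the lock ensures that no item is accepted once the stop has taken effect.

```python
import threading


class SketchAdapter:
    """Hypothetical adapter-like class; not the StreamInsight adapter API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stopped = False
        self.enqueued = []

    def enqueue(self, item):
        """Called from the producer thread. Returns False once stopped."""
        with self._lock:
            if self._stopped:
                return False  # the stop request won the race; drop the item
            self.enqueued.append(item)
            return True

    def stop(self):
        """May be called from any other thread at any time."""
        with self._lock:
            self._stopped = True


adapter = SketchAdapter()
rejected = []


def produce():
    for i in range(1000):
        if not adapter.enqueue(i):
            rejected.append(i)


producer = threading.Thread(target=produce)
producer.start()
adapter.stop()  # arrives on the main thread while produce() may be mid-loop
producer.join()
```

How many items get through before the stop varies from run to run, which is exactly why this kind of code is hard to test by hand.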

Take a look at http://www.devbiker.com/post/StreamInsight-Output-Adapter-Lifetime.aspx.