Resis1 on Fri, 29 Nov 2013 18:31:37
I have a use case below which I 'think' StreamInsight is suitable for; I'd just like a bit of advice on whether you agree that it is a good fit.
We have a process used to price our products. When a price is required, the process collects numerous pieces of information. Some of that information is 'real-time' in nature, which is why I'm looking at StreamInsight.
The real-time information required is:
- The number of events in the last 30 seconds, 30 minutes, 1 hour and 24 hours for each customer (by customer ID) that has used the system.
- For the last x events, a given set of information per customer (e.g. EventA: attributes 1, 2, 3, 4; EventB: attributes 1, 2, 3, 4; ...). This requirement is time-independent: the time between events could be 30 seconds or 24 hours, we just want the last x of them.
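As a rough illustration of the first requirement, the sketch below keeps each customer's event timestamps in memory and counts them over the four windows whenever asked. It is plain Python, not StreamInsight; `SlidingCounts` and its methods are hypothetical names, and it assumes timestamps are in seconds and arrive in order for each customer.

```python
from bisect import bisect_right
from collections import defaultdict, deque

# Window lengths from the requirements, in seconds.
WINDOWS = {"30s": 30, "30m": 30 * 60, "1h": 60 * 60, "24h": 24 * 60 * 60}
LONGEST = max(WINDOWS.values())


class SlidingCounts:
    """Hypothetical in-memory store of per-customer event timestamps (seconds)."""

    def __init__(self):
        self._times = defaultdict(deque)  # customer_id -> timestamps, oldest first

    def add(self, customer_id, ts):
        # Assumes events for a given customer arrive in timestamp order.
        q = self._times[customer_id]
        q.append(ts)
        self._evict(q, ts)

    @staticmethod
    def _evict(q, now):
        # Drop timestamps outside even the longest window (now - LONGEST, now].
        while q and q[0] <= now - LONGEST:
            q.popleft()

    def counts(self, customer_id, now):
        """Count events in each window (now - length, now]; assumes now >= every stored timestamp."""
        q = self._times.get(customer_id)
        if q is None:
            return {name: 0 for name in WINDOWS}
        self._evict(q, now)
        ts = list(q)
        # bisect_right gives the index of the first timestamp strictly greater than now - length.
        return {name: len(ts) - bisect_right(ts, now - length)
                for name, length in WINDOWS.items()}


store = SlidingCounts()
for t in (0, 100, 3590, 3600):
    store.add("customer1", t)
print(store.counts("customer1", now=3600))
# {'30s': 2, '30m': 2, '1h': 3, '24h': 4}
```

Memory per customer is bounded by the number of events in the longest (24-hour) window.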
The tricky bit is that the pricing process can ask for this data at any time, and it needs the most recent values.
Can I go back to StreamInsight on an ad hoc basis and ask the questions above for a given customer, or do I need to keep the queries running, send their results to a sink (say a NoSQL/SQL database) and query that instead when I want the results?
From what I've seen of StreamInsight, and from playing with the examples, it seems a really good way to handle the temporal counting, though I'm not so sure about the last-x-events side. I am also struggling with how I would get the information to the pricing application.
Thank you in advance for any insight/replies etc.
Allen Li - MSFT on Mon, 02 Dec 2013 05:45:06
Thank you for your question.
I am trying to involve someone more familiar with this topic to take a further look at this issue. There may be some delay while the question is transferred. Your patience is greatly appreciated.
Thank you for your understanding and support.
TXPower125 on Tue, 03 Dec 2013 05:30:59
Do you need the last x number of events in a given time period y or do you need the number of events x that occurred in the time period y?
If it's the former, your sink(s) should probably resemble a queue. You could use a SQL database or a NoSQL store, depending on your throughput requirements. Depending on your memory situation, you might be able to do it all in RAM and avoid the overhead of disk I/O.
Remember, StreamInsight does not work like a relational database where you make a query and then get a response. Rather, your StreamInsight queries are constantly yielding results that you consume through an output adapter/sink.
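To make the push model concrete, here is a minimal Python sketch of an output sink that stores the latest result per customer and window, so that another application (e.g. pricing) can read it on demand. `WindowResult`, `LatestResultSink` and the `on_next` method name are hypothetical; `on_next` only echoes the observer style and is not the StreamInsight adapter API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class WindowResult:
    customer_id: str
    window: str      # hypothetical label for the query that produced it, e.g. "10m"
    window_end: int  # end of the window, in seconds
    count: int


class LatestResultSink:
    """Hypothetical sink: the streaming query pushes results in, the pricing app reads them."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], WindowResult] = {}

    def on_next(self, result: WindowResult) -> None:
        key = (result.customer_id, result.window)
        current = self._latest.get(key)
        # Keep only the newest result per key; ignore results for older windows.
        if current is None or result.window_end >= current.window_end:
            self._latest[key] = result

    def lookup(self, customer_id: str, window: str) -> Optional[int]:
        result = self._latest.get((customer_id, window))
        return None if result is None else result.count


sink = LatestResultSink()
sink.on_next(WindowResult("customer1", "10m", 600, 3))
sink.on_next(WindowResult("customer1", "10m", 1200, 2))
print(sink.lookup("customer1", "10m"))  # 2
```

The pricing application never queries the stream itself; it only reads whatever the running query last pushed into the sink.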
SandeepChalke on Tue, 03 Dec 2013 11:03:41
In addition, this might help:
Resis1 on Tue, 03 Dec 2013 11:45:21
"Do you need the last x number of events in a given time period y or do you need the number of events x that occurred in the time period y?"
If I've understood your distinction correctly, both :-)
It may be worth breaking this down a little more:
So we have Customer 1 acting on a process which produces some events:
- Event 1 @ 10:00
- Event 2 @ 10:05
- Event 3 @ 10:10
- Event 4 @ 10:11
- Event 5 @ 10:16
- Event 6 @ 10:22
One of my queries would be: how many events did customer 1 produce in the last 10 minutes? This would give me the following output events:
@10:10 = 3
@10:20 = 2
@10:30 = 1
I'm happy with this scenario: I built a small POC over the weekend with MongoDB as my sink, and it all works!
This answers the second part of your question (I think) "do you need the number of events x that occurred in the time period y"
My other query would be:
Give me the last 3 events for customer 1 (over any timeframe).
Here I would want events 4, 5 and 6.
Is this something StreamInsight is suitable for? I'm thinking it isn't, and that I should really just pass the events to a 'store' and let it worry about maintaining a list of the last x events.
Resis1 on Tue, 03 Dec 2013 11:54:20
Thanks Sandeep, reading up on checkpoints is on my to-do list!
TXPower125 on Tue, 03 Dec 2013 21:24:46
Yeah, the first part of your scenario is just a hopping window, group by, and count to get your results.
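As a rough illustration of "hopping window, group by, and count", the Python sketch below counts events per customer in windows of length `size` that start every `hop` units. It is not StreamInsight's API, and `hopping_counts` is a hypothetical helper. It treats windows as half-open [start, start + size), so on Resis1's example with 10-minute windows the 10:10 event lands in the second window and the counts come out as 2, 3 and 1 for the windows ending at 10:10, 10:20 and 10:30; how events exactly on a boundary are assigned depends on the window definition used.

```python
from collections import Counter, defaultdict


def hopping_counts(events, size, hop):
    """Group events by customer and count them per half-open window [start, start + size).

    Window starts are multiples of `hop`; with hop == size this is a tumbling window.
    """
    counts = defaultdict(Counter)
    for customer_id, ts in events:
        # [s, s + size) contains ts iff ts - size < s <= ts, so begin at the
        # first multiple of hop that is strictly greater than ts - size.
        start = ((ts - size) // hop + 1) * hop
        while start <= ts:
            counts[customer_id][start] += 1
            start += hop
    return {cust: dict(sorted(c.items())) for cust, c in counts.items()}


# Resis1's example, in minutes after 10:00; keys below are window start times.
events = [("customer1", m) for m in (0, 5, 10, 11, 16, 22)]
print(hopping_counts(events, size=10, hop=10))
# {'customer1': {0: 2, 10: 3, 20: 1}}
```

With `hop` smaller than `size` (e.g. a 10-minute window every 5 minutes) each event is counted in several overlapping windows.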
The second part of your query will be constantly changing depending on the frequency of events. If you don't need to store all the events, I would probably create a sink that maintains the last 3 events for each customer in memory that you could query via WCF or something along those lines. How many customers are we talking about?
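A minimal sketch of the kind of in-memory sink described above, written in Python rather than .NET for brevity. `LastNSink` and its methods are hypothetical names, and exposing `last()` to the pricing application (via WCF or otherwise) is not shown.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Hashable, List


class LastNSink:
    """Hypothetical in-memory sink that keeps only the most recent n events per customer."""

    def __init__(self, n: int = 3) -> None:
        # deque(maxlen=n) drops its oldest item automatically when a new one is appended.
        self._events: Dict[Hashable, Deque[Any]] = defaultdict(lambda: deque(maxlen=n))

    def on_event(self, customer_id: Hashable, event: Any) -> None:
        self._events[customer_id].append(event)

    def last(self, customer_id: Hashable) -> List[Any]:
        # Oldest first; .get() avoids creating an empty entry for unknown customers.
        return list(self._events.get(customer_id, ()))


sink = LastNSink(n=3)
for i, t in enumerate(["10:00", "10:05", "10:10", "10:11", "10:16", "10:22"], start=1):
    sink.on_event("customer1", (i, t))
print(sink.last("customer1"))
# [(4, '10:11'), (5, '10:16'), (6, '10:22')]
```

Fed with Resis1's six events for customer 1, it returns events 4, 5 and 6.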
Resis1 on Wed, 04 Dec 2013 09:12:21
First of all thank you for your responses - much appreciated!
Overall we're talking about a maximum of 17 million customers (it would take some time to get to that point), and for each event we'll be holding fewer than 10 attributes.
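To judge whether keeping everything in RAM is plausible at this scale, a back-of-envelope estimate helps. Apart from the 17 million customers and the 10-attribute bound, the inputs are assumptions: 3 events retained per customer and 8 bytes per attribute. It counts raw attribute data only; per-object and dictionary overhead in .NET or Python will push the real figure considerably higher.

```python
def raw_payload_bytes(customers, events_per_customer, attributes_per_event, bytes_per_attribute):
    """Back-of-envelope size of the attribute data alone, ignoring all object/container overhead."""
    return customers * events_per_customer * attributes_per_event * bytes_per_attribute


# Assumptions: last 3 events kept per customer, 10 attributes (the stated upper bound), 8 bytes each.
estimate = raw_payload_bytes(17_000_000, 3, 10, 8)
print(estimate, f"{estimate / 1e9:.2f} GB")
# 4080000000 4.08 GB
```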
We may be going off-topic slightly, but what would you use to implement a sink that maintains only the last x events?
(To put it in context: I'm researching the general area in order to make a recommendation on whether to proceed with StreamInsight to solve our current problems. However, my background isn't .NET development, but various other obscure languages.)
** I've just realised my question in this post was a little dumb; there are lots of ways to implement something like that (lists, arrays, etc.).
I'd still be interested in the approach you would take, though.
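One language-neutral approach is a fixed-size circular (ring) buffer per customer: a preallocated array of x slots plus a write index, where each new event overwrites the oldest once the buffer is full. The Python sketch below is illustrative only; `RingBuffer` is a hypothetical class, and the same structure maps directly onto a plain array in C# or another language.

```python
class RingBuffer:
    """Fixed-capacity circular buffer; once full, each append overwrites the oldest item."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity  # preallocated storage
        self._next = 0                   # slot the next append writes to
        self._size = 0                   # number of slots filled so far

    def append(self, item):
        self._slots[self._next] = item
        self._next = (self._next + 1) % len(self._slots)
        self._size = min(self._size + 1, len(self._slots))

    def items(self):
        """Return the stored items, oldest first."""
        cap = len(self._slots)
        start = (self._next - self._size) % cap
        return [self._slots[(start + i) % cap] for i in range(self._size)]


buf = RingBuffer(3)
for event in ["e1", "e2", "e3", "e4", "e5", "e6"]:
    buf.append(event)
print(buf.items())  # ['e4', 'e5', 'e6']
```

Appends are O(1), reads are O(x), and memory per customer stays fixed at x slots regardless of how often events arrive.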