jackhanbond on Sun, 16 Jul 2017 20:02:23

A recent issue made me aware of a VERY SUBTLE Service Fabric actor behavior that other developers should know about. The following is stated on this page:

"The Reliable Actors runtime provides a simple turn-based access model for accessing actor methods. This means that no more than one thread can be active inside an actor object's code at any time."

There is one important caveat: that guarantee does not cover the case where the same actor is running on two different nodes. This can happen when an actor method has started on a node, that node is then demoted to secondary, and another request is made against the same actor, which this time runs on the newly promoted primary.

My company discovered this when we were doing some low-memory testing and an actor method got stuck. The sequence looked something like this...

Actor method Foo is called while Node 0 is primary

Node 0 is demoted to secondary, Node 1 becomes primary

Foo for same actor is now called while Node 1 is primary

Foo is now running on both Node 0 and Node 1

Foo completed on Node 1

Foo completed on Node 0
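
To make that timeline concrete, here is a tiny simulation in plain Python. It is not Service Fabric code: `Replica`, `begin_foo`, `end_foo` and the shared `store` are all names I made up for illustration, and the `store` stands in for whatever outside resource Foo happens to touch. Each replica only guards its own turns, so nothing stops the two from overlapping.

```python
class Replica:
    """One node's copy of the actor. It enforces turns only for itself."""

    def __init__(self, name):
        self.name = name
        self.busy = False  # the per-replica "one call at a time" guard

    def begin_foo(self, store):
        # Turn-based access: a second call on the SAME replica would trip this.
        assert not self.busy, "turn-based access violated on this replica"
        self.busy = True
        return store["counter"]  # Foo reads shared state when it starts

    def end_foo(self, store, seen):
        store["counter"] = seen + 1  # Foo writes back based on what it read
        self.busy = False


def simulate_overlap():
    store = {"counter": 0}  # hypothetical resource both nodes can reach
    node0, node1 = Replica("Node 0"), Replica("Node 1")

    seen0 = node0.begin_foo(store)  # Foo is called while Node 0 is primary
    # ... Node 0 is demoted to secondary, Node 1 becomes primary ...
    seen1 = node1.begin_foo(store)  # Foo is called again, now on Node 1
    both_running = node0.busy and node1.busy
    node1.end_foo(store, seen1)     # Foo completes on Node 1
    node0.end_foo(store, seen0)     # Foo completes on Node 0 with a stale read
    return both_running, store["counter"]


if __name__ == "__main__":
    both_running, counter = simulate_overlap()
    print(both_running, counter)
```

Neither per-replica guard ever fires, yet both calls are in flight at once and the counter ends at 1 after two calls to Foo. That is the kind of "impossible" result you get when the method assumes it is the only one running.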

Due to the work which our method "Foo" was doing, we were seeing some seemingly impossible results. Just so people know, the SF team was VERY HELPFUL, and did provide code to prevent this from happening, but I thought this might be helpful to others.
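
I can't reproduce the SF team's code here, so the following is NOT their fix. It is just a generic sketch of one common way to guard against a stale writer, assuming every primary gets a strictly increasing epoch number and the shared resource refuses writes from an older epoch. `FencedStore`, `StaleWriterError` and the epoch values are hypothetical names and numbers for illustration only.

```python
class StaleWriterError(Exception):
    """Raised when a writer from an older epoch tries to write."""


class FencedStore:
    """Hypothetical shared resource that remembers the newest epoch it has seen."""

    def __init__(self):
        self.value = 0
        self.highest_epoch = 0

    def read(self, epoch):
        # Any access from a newer primary raises the fence.
        self.highest_epoch = max(self.highest_epoch, epoch)
        return self.value

    def write(self, epoch, value):
        if epoch < self.highest_epoch:
            raise StaleWriterError(f"epoch {epoch} < {self.highest_epoch}")
        self.highest_epoch = epoch
        self.value = value


def run_fenced_timeline():
    store = FencedStore()
    seen0 = store.read(epoch=1)      # Foo starts on Node 0 (assumed epoch 1)
    seen1 = store.read(epoch=2)      # Foo starts on Node 1 (assumed epoch 2)
    store.write(epoch=2, value=seen1 + 1)  # Node 1 completes normally
    rejected = False
    try:
        store.write(epoch=1, value=seen0 + 1)  # Node 0's late write
    except StaleWriterError:
        rejected = True              # the demoted node is fenced off
    return store.value, rejected


if __name__ == "__main__":
    print(run_fenced_timeline())
```

With this guard the late write from the demoted node is rejected instead of silently clobbering the new primary's result. Whether something like this fits depends on what your actor method actually touches.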