The JoinerDelay is a ClusterSyncService used in the sync chain; it was
introduced as part of SLING-10489.
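For context, a ClusterSyncService is a hook in discovery's sync chain that is
asked to sync towards a new view and invokes a callback once done. Paraphrased
from memory (see org.apache.sling.discovery.commons.providers.spi for the
authoritative definition), the contract looks roughly like this:

    // Paraphrased sketch of the ClusterSyncService contract; consult the
    // actual interface in discovery.commons for the authoritative version.
    public interface ClusterSyncService {
        // Starts syncing towards the given view; invokes callback when done.
        void sync(BaseTopologyView view, Runnable callback);
        // Aborts any ongoing sync (e.g. because the view changed again).
        void cancelSync();
    }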
With SLING-10489, new-joining instances are ignored/suppressed by existing
instances in the cluster for as long as they are potentially only partially
started up. The distinction between partial and full is: an instance is fully
started once everything is written and the consistencyService (sync) would
succeed. In other words, that includes: lease update / idMap /
leaderElectionId / syncToken.
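As an illustration (a hypothetical helper, not actual Sling code), the
"fully started" criterion boils down to a conjunction of those four pieces
of discovery data:

    // Hypothetical, illustrative only - not actual Sling code.
    // An instance counts as fully started once all the data that the
    // consistencyService (sync) depends on has been written.
    final class StartupState {
        boolean leaseUpdated;        // heartbeat/lease is current
        boolean idMapWritten;        // idMap entry exists
        boolean leaderElectionIdSet; // leaderElectionId has been written
        boolean syncTokenWritten;    // syncToken has been written

        boolean isFullyStarted() {
            return leaseUpdated && idMapWritten
                    && leaderElectionIdSet && syncTokenWritten;
        }
    }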
How long a startup lasts is undefined, and to avoid blocking other instances
from operating under a well-defined topology, the notion of ignoring/suppressing
partially started instances has been introduced.
Generally speaking, there are the following different cases with regard to
changes in the local cluster:
- only properties change: that is already handled separately and not
interesting here
- only leaving instances: not interesting for this problem
- only joining instances: in this case, the joining instances get
ignored/suppressed until they are fully started and have written all the
needed discovery data. Until that has happened, the existing instances don't
do anything discovery-related yet, i.e. they don't store a new syncToken
either. Thus, once the newly joined instances have finished their startup,
they have to wait for the existing ones to take note of that full startup
and write their own sync tokens, so that the new-joiners can see those sync
tokens and finish. This case is therefore perfectly fine with
ignoring/suppressing alone.
- some leaving, some joining instances: it is this case which is a bit more
tricky. With SLING-10489 the joining instances are now ignored/suppressed
until they are fully started, so upon a cluster change they don't trigger
any discovery activity on the existing instances.
However, because some instances are also leaving, the existing instances
will take note of a cluster change and *therefore* update the syncToken etc.
That results in a new situation: the cluster change has been announced on
the existing instances and they have written their new sync token, while the
new-joiners are still partially starting up.
Let's say the new-joiners now finish their startup and write down their
sync token. In that very moment the following happens concurrently (a toy
simulation below makes the interleaving concrete):
(a) the new-joiners check the topology, notice that everybody else has
already written the new sync token, and can therefore immediately go ahead
and send a TOPOLOGY_INIT.
(b) the existing instances only now stop ignoring/suppressing the
new-joiners and then go through the consistencyService/syncing - but before
they can do that, they have to inform existing listeners with a
TOPOLOGY_CHANGING. Since that might take a while, it is realistic that a
new-joiner already considers itself part of the new topology *while* the
existing ones haven't received a TOPOLOGY_CHANGING event yet. Et voila, a
sort of short-lived split-brain.
Usually this should really only be very short-lived, as all that is holding
things back is TopologyEventListeners reacting to TOPOLOGY_CHANGING, plus
some repository writes. None of that should take too long, but it could be a
few seconds. And the aim of discovery is to guarantee that there are never
different topologies in the, aehm, topology.
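To make the race tangible, here is a toy, self-contained simulation (plain
Java, nothing Sling-specific, all names made up) in which the joiner acts on
the new topology while the existing instance is still busy getting its
TOPOLOGY_CHANGING out:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    public class SplitBrainRaceSketch {
        public static void main(String[] args) throws InterruptedException {
            CountDownLatch joinerSyncTokenWritten = new CountDownLatch(1);

            Thread joiner = new Thread(() -> {
                // Startup finishes: the joiner writes its sync token ...
                joinerSyncTokenWritten.countDown();
                // ... sees that the existing instances already wrote their
                // new sync token earlier (triggered by the leavers), and
                // thus immediately announces the new topology:
                System.out.println("joiner:   TOPOLOGY_INIT (in new view)");
            });

            Thread existing = new Thread(() -> {
                try {
                    joinerSyncTokenWritten.await();
                    // Only now does this instance stop suppressing the
                    // joiner; before syncing it must inform its listeners,
                    // which takes a while (listener work + repo writes):
                    TimeUnit.SECONDS.sleep(2);
                    System.out.println("existing: TOPOLOGY_CHANGING (only now)");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            existing.start();
            joiner.start();
            joiner.join();
            existing.join();
            // The output order shows the window: the joiner considers the
            // new topology established ~2s before the existing instance
            // has even announced TOPOLOGY_CHANGING.
        }
    }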
To properly fix this, we would probably have to do yet another round of
syncing, which would be infeasibly complicated.
But there is a rather simple way out: we can artificially delay the
new-joiners from sending their very first TOPOLOGY_INIT. If that delay is
bigger than the race window described above, things are fine. And that is
what this JoinerDelay is about: delaying a new-joiner's TOPOLOGY_INIT.
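A minimal sketch of that idea, assuming the callback-style sync contract
shown at the top (the real JoinerDelay may differ in details such as how the
delay is configured and measured):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch only - not the actual JoinerDelay code. The
    // callback passed to sync() is what ultimately leads to the topology
    // event (TOPOLOGY_INIT on the very first invocation) being sent.
    public class JoinerDelaySketch {
        private final long delayMillis;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private volatile ScheduledFuture<?> pending;
        private volatile boolean firstEventSent;

        public JoinerDelaySketch(long delayMillis) {
            this.delayMillis = delayMillis;
        }

        public void sync(Runnable callback) {
            if (firstEventSent) {
                // Only the very first event is delayed; later topology
                // changes pass through without any artificial delay.
                callback.run();
                return;
            }
            // Hold back the first callback long enough to outlast the race
            // window (listeners handling TOPOLOGY_CHANGING + repo writes).
            pending = scheduler.schedule(() -> {
                firstEventSent = true;
                callback.run();
            }, delayMillis, TimeUnit.MILLISECONDS);
        }

        public void cancelSync() {
            // Abort a pending delayed callback, e.g. when the view changes
            // again before the delay has elapsed.
            ScheduledFuture<?> p = pending;
            if (p != null) {
                p.cancel(false);
            }
        }
    }

The key design point: the delay only needs to be larger than the race window
(a few seconds) and only affects the very first, local TOPOLOGY_INIT; all
later topology changes are unaffected.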