Distributed Systems: C6 Termination Detection (TD)
Distributed Systems
- Master in CS -
C6
Termination Detection
(TD)
Fall 2020
Introduction
Context
– In DS, a problem is solved by the
cooperation of a set of processes
– In some applications, the problem to be
solved is divided into many subproblems,
and the execution of a subproblem cannot
begin until the execution of the previous
subproblem is complete
Determining computing termination - a
fundamental problem in DS
Introduction
In a DS, inferring whether a distributed computation (DC) has ended is
essential
– once it has ended, the results produced by the computation can
be used (possibly to solve other problems)
DC termination detection - a difficult
problem
– Problem complexity due to
» No process has complete knowledge of the global
state, and
» Global time does not exist
Introduction (2)
A DC is considered to be globally terminated if
– every process is “locally terminated” and
– there is no message in transit between any processes
“locally terminated” state
– a state in which a process has finished its
computation and
– will not restart any action unless it receives a
message
In the TD problem
– a particular process (or all of the processes) must
infer when the corresponding computation has
terminated
TD algorithms
– infer if a certain DC has terminated
Introduction (3)
Two distributed computations taking place in the
distributed system
– The business logic computation and
– The TD algorithm
Basic messages
– Messages used in the business logic computation
Control messages
– Messages used by TD algorithms
A TD algorithm must ensure
– Execution of a TD algorithm cannot indefinitely delay
the underlying computation (i.e. the execution of the TD
algorithm must not freeze the business logic computation)
– The TD algorithm must not require new communication
channels between processes
TD algorithms based on:
– Distributed snapshot collection,
– Weight throwing,
– Spanning-tree
System Model (2)
At any given time during execution
of the DC, a process can be in only
one of the two states:
– Active (or busy)
» it is doing local computation
– Idle (or passive)
» the process has (temporarily) finished the
execution of its local computation and
will be reactivated only on the receipt of a
message from another process
System Model (4)
Since we are not concerned with the
initialization problem:
– we assume that all processes are initially idle
and
– a message arrives from outside the system to
start the computation
A message can be received by a process
when the process is in either of the two
states, i.e., active or idle.
– On the receipt of a message, an idle process
becomes active
The sending of a message and the receipt
of a message occur as atomic actions
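The two-state process model above can be sketched in a few lines. This is only an illustration: the class name `Process` and its method names are hypothetical, not part of the course material.

```python
class Process:
    """A process that is either 'active' or 'idle' (hypothetical helper class)."""

    def __init__(self, pid):
        self.pid = pid
        self.state = "idle"  # all processes are assumed to be initially idle

    def receive(self, message):
        # A message may arrive in either state; an idle process becomes active,
        # an active process simply stays active.
        self.state = "active"

    def finish_local_computation(self):
        # The process has (temporarily) finished its local computation.
        self.state = "idle"


p = Process(1)
p.receive("start")            # message from outside the system starts the computation
print(p.state)                # active
p.finish_local_computation()
print(p.state)                # idle
```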
System Model (6)
Definition of termination detection
Notations
– $p_i(t)$ - the state (active or idle) of
process $p_i$ at instant $t$ and
– $c_{i,j}(t)$ - the number of messages in
transit in the channel at instant $t$ from
process $p_i$ to process $p_j$
A distributed computation is said to
be terminated at time instant $t_1$ if:
$(\forall i,\ p_i(t_1) = \text{idle}) \land (\forall i, j,\ c_{i,j}(t_1) = 0)$
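The termination condition can be checked directly if one imagines an observer with a global view (which no real process has; that is precisely why TD is hard). A minimal sketch, where the function name `is_terminated` and the dictionary layout are assumptions made for illustration:

```python
from typing import Dict, Tuple


def is_terminated(states: Dict[int, str],
                  in_transit: Dict[Tuple[int, int], int]) -> bool:
    """True if every process is idle and every channel (i, j) holds 0 messages."""
    all_idle = all(state == "idle" for state in states.values())
    channels_empty = all(count == 0 for count in in_transit.values())
    return all_idle and channels_empty


states = {0: "idle", 1: "idle"}
print(is_terminated(states, {(0, 1): 0, (1, 0): 0}))  # True
print(is_terminated(states, {(0, 1): 1, (1, 0): 0}))  # False: a message is in transit
```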
Termination detection using
distributed snapshots
Informal description
The main idea behind the algorithm
– when a computation terminates, there must exist a
unique process which became idle last
The process that requests the termination detection test,
or any external agent, may collect all the local
snapshots of a request
When a process goes from active to idle
– it issues a request to all other processes to take a local
snapshot, and
– also requests itself to take a local snapshot
When an idle process receives the request
– if it agrees that the requester became idle after itself, it grants
the request by taking a local snapshot for the request
A request is said to be successful if
– all processes have taken a local snapshot for it
If a request is successful:
– A global snapshot of the request can thus be obtained and the
recorded state will indicate termination of the computation
Termination detection in the recorded snapshot
– In the recorded snapshot, all the processes are idle and there is
no message in transit to any of the processes
Termination detection using
distributed snapshots
Formal description (2)
Rule R1 states:
– When a process sends a basic message
to any other process, it sends its
logical clock value in the message
Termination detection using
distributed snapshots
Formal description (4)
Rule R2 states:
– when a process receives a basic
message, it updates its logical clock
based on the clock value contained in
the message
Rule R3 states:
– when a process becomes idle:
» updates its local clock
» sends a request for snapshot R(x, k) to every
other process, and
» takes a local snapshot for this request
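Rules R1 to R3 can be sketched as follows. Several details are assumptions, since the slides do not spell them out: in R(x, k), x is taken to be the requester's logical clock value and k its process identifier; the clock update in R2 is taken to be a Lamport-style `max(...) + 1`; and the class and method names are hypothetical. The handling of a received request is not covered here.

```python
class SnapshotProcess:
    """Sketch of rules R1-R3 (names and clock-update details are assumptions)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.n = n              # total number of processes
        self.x = 0              # logical clock
        self.active = False
        self.snapshots = []     # requests (x, k) for which a local snapshot was taken

    def send_basic(self):
        # R1: a basic message carries the sender's logical clock value.
        return ("B", self.x)

    def on_basic(self, msg):
        # R2: update the logical clock from the received value (assumed rule),
        # and become active.
        _, x_recv = msg
        self.x = max(self.x, x_recv) + 1
        self.active = True

    def become_idle(self):
        # R3: update the clock, send R(x, k) to every other process,
        # and take a local snapshot for this request.
        self.active = False
        self.x += 1
        request = ("R", self.x, self.pid)
        self.snapshots.append((self.x, self.pid))
        return [(dest, request) for dest in range(self.n) if dest != self.pid]


p0, p1 = SnapshotProcess(0, 3), SnapshotProcess(1, 3)
p0.active = True
p1.on_basic(p0.send_basic())
print(p1.become_idle())  # [(0, ('R', 2, 1)), (2, ('R', 2, 1))]
```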
Termination detection using
distributed snapshots
Formal description (6)
Termination detection
by weight throwing
In this technique, a process called the
controlling agent monitors the
computation
– The controlling agent can be one of
the processes involved in computation
A communication channel exists
– between each of the processes and the
controlling agent and also
– between every pair of processes
Termination detection by
weight throwing
Basic idea
Initially, all processes are in the idle state
The weight at each process is zero and the weight at the
controlling agent is 1
The computation starts when the controlling agent sends a
basic message to one of the processes
– The process becomes active and the computation starts
A non-zero weight W (0 < W ≤ 1) is assigned to each
process in the active state and to each message in transit in
the following way:
– When a process sends a message, it sends a part of its weight
in the message
– When a process receives a message, it adds the weight
received in the message to its own weight
=> The sum of weights on all the processes and on all the
messages in transit is always 1
When a process becomes passive
– it sends its weight to the controlling agent in a control
message, which the controlling agent adds to its weight
The controlling agent concludes termination if its weight
becomes 1
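The basic idea can be tried out in a small simulation. This is a sketch under stated assumptions: the sender always ships exactly half of its weight (the slides only say "a part"), exact `Fraction` arithmetic is used so that weights never round to zero, and the class `WeightThrowing` and the constant `AGENT` are hypothetical names.

```python
from fractions import Fraction

AGENT = "agent"  # hypothetical identifier for the controlling agent


class WeightThrowing:
    def __init__(self, process_ids):
        self.weight = {pid: Fraction(0) for pid in process_ids}
        self.weight[AGENT] = Fraction(1)
        self.basic = []     # basic messages in transit: (dest, weight)
        self.control = []   # control messages in transit: weight

    def send_basic(self, src, dest):
        # The sender keeps half of its weight and sends the other half (assumed split).
        half = self.weight[src] / 2
        self.weight[src] -= half
        self.basic.append((dest, half))

    def deliver_basic(self):
        dest, w = self.basic.pop(0)
        self.weight[dest] += w

    def become_idle(self, pid):
        # The whole weight goes back to the controlling agent in a control message.
        self.control.append(self.weight[pid])
        self.weight[pid] = Fraction(0)

    def deliver_control(self):
        self.weight[AGENT] += self.control.pop(0)

    def total(self):
        # Weight on all processes, the agent, and all messages in transit.
        return (sum(self.weight.values())
                + sum(w for _, w in self.basic)
                + sum(self.control))

    def terminated(self):
        return self.weight[AGENT] == 1


sim = WeightThrowing([1, 2])
sim.send_basic(AGENT, 1); sim.deliver_basic()   # agent starts the computation
sim.send_basic(1, 2); sim.deliver_basic()       # process 1 activates process 2
sim.become_idle(1); sim.become_idle(2)
print(sim.terminated())                         # False: control messages still in transit
sim.deliver_control(); sim.deliver_control()
print(sim.total(), sim.terminated())            # 1 True
```

Calling `total()` after any step returns 1, which is the conservation property stated above.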
Termination detection by
weight throwing
Notations
Termination detection by
weight throwing
Formal Description
Termination detection by
weight throwing
Formal Description (2)
Rule 3:
– A process switches from the active state to
the idle state at any time by sending a control
message C(DW = W) to the controlling agent
and making its weight W := 0
Rule 4:
– On the receipt of a message C(DW), the
controlling agent adds DW to its weight
(W := W + DW)
– If W = 1, then it concludes that the
computation has terminated
Termination detection by
weight throwing
Formal Description (3)
Algorithm correctness
To prove the correctness of the
algorithm, the following sets are defined:
– A: set of weights on all active processes;
– B: set of weights on all basic messages in
transit;
– C: set of weights on all control messages in
transit;
– Wc: weight on the controlling agent
Termination detection by
weight throwing
Formal Description (4)
Invariant I1 states:
The sum of weights at the controlling
agent, at all active processes, on all basic
messages in transit, and on all control
messages in transit is always equal to 1.
Invariant I2 states:
The weight at each active process, on each
basic message in transit, and on each control
message in transit is non-zero.
Termination detection by
weight throwing
Formal Description (5)
As a result:
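Written out with the sets A, B, C and the weight Wc defined earlier, the two invariants and the usual argument that follows from them can be sketched as below. This is a reconstruction from I1 and I2 as stated, treating A, B, and C as multisets so that equal weights are counted separately.

```latex
\begin{align*}
\text{I1:}\quad & W_c + \sum_{W \in A \cup B \cup C} W = 1 \\
\text{I2:}\quad & \forall W \in A \cup B \cup C,\; W > 0 \\
W_c = 1 \;&\Rightarrow\; \sum_{W \in A \cup B \cup C} W = 0 && \text{(by I1)} \\
&\Rightarrow\; A \cup B \cup C = \emptyset && \text{(by I2)} \\
&\Rightarrow\; A \cup B = \emptyset
\end{align*}
```

A ∪ B = ∅ means that no process is active and no basic message is in transit, which is the termination condition.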
Termination detection
based on spanning trees
Assumptions
– N processes Pi located in the nodes i, 0 ≤ i ≤ N − 1, of a
fixed connected undirected graph
– Graph edges represent the communication channels
The algorithm uses a fixed spanning tree of the
graph with process P0 at the root
Root node (process P0)
– Responsible for termination detection
– Communicates with other processes to determine their
states
– The messages used for this purpose are called signals
– Concludes that termination has occurred, if it has
terminated and all of its immediate children have also
terminated
Leaf nodes
– Each leaf node reports to its parent when it has terminated
Interior node
– Reports to its parent when
» it has completed processing and
» all of its immediate children have terminated
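The reporting rules for leaf, interior, and root nodes reduce to one recursive condition. The sketch below is only the static check and deliberately ignores the possibility that a node is reactivated by a message after it has reported. The 7-node tree and the function names are assumptions chosen for illustration.

```python
# Hypothetical spanning tree: node -> list of immediate children (root is 0).
CHILDREN = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}


def has_reported(node, terminated, children=CHILDREN):
    """A node reports once it has terminated and all its children have reported.

    For a leaf the list of children is empty, so it reports as soon as it terminates.
    """
    return node in terminated and all(
        has_reported(child, terminated, children) for child in children[node]
    )


def root_detects_termination(terminated, root=0, children=CHILDREN):
    return has_reported(root, terminated, children)


print(root_detects_termination({0, 1, 2, 3, 4, 5, 6}))  # True
print(root_detects_termination({0, 1, 2, 3, 4, 5}))     # False: leaf 6 is still active
```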
Termination detection
based on spanning trees
The TD algorithm generates two waves of signals
moving inward and outward through the spanning
tree
Initially (as a request from the root), a contracting
wave of token signals moves inward from the leaves
to the root
If this token wave reaches the root without
discovering that termination has occurred, the
root initiates a second outward wave of repeat
signals
As this repeat wave reaches leaves, the token
wave gradually forms and starts moving inward
again
This sequence of events is repeated until the
termination is detected
Token wave
– A contracting wave of signals that move
inward from the leaves to the root
Repeat signal
– If a token wave fails to detect termination,
node P0 initiates another round of termination
detection by sending a signal called Repeat, to
the leaves.
Set S
– The set of graph nodes having one or more
tokens at any instant
Termination detection based
on spanning trees
A simple (but incorrect) algorithm
Figure 1 - Node 5 sends a message to node 1 (source [3])
Termination detection based on
spanning trees
A simple algorithm (incorrect)
Algorithm problem (2)
Termination detection based on
spanning trees
The correct algorithm (2)
Basic description
Use a coloring scheme
– Enables the root node to know that a node in its
children’s subtree, that was assumed to be terminated,
has become active due to a message
– The root determines that an idle process has been
activated by a message, based on the color of the
token it receives from its children
All tokens are initialized to white
If a process has sent a message to some other
process, it sends a black token to its parent on
termination; otherwise, it sends a white token on
termination
The parent process, on receiving a black token,
knows that its child has sent a message to some
other process
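The coloring rule can be sketched with a tiny class. Two points are assumptions rather than statements from the slides: a parent that receives a black token is itself turned black, so that the information can travel on toward the root, and the class and method names are hypothetical.

```python
class Node:
    """Tracks the color a node will give to its token on termination."""

    def __init__(self):
        self.color = "white"  # all processes start white

    def send_basic_message(self):
        # Sending a message to another process blackens the sender.
        self.color = "black"

    def receive_token(self, token_color):
        # Assumption: a black token from a child blackens the parent too.
        if token_color == "black":
            self.color = "black"

    def token_on_termination(self):
        # Black if this node (or, under the assumption above, a descendant)
        # sent a message; white otherwise.
        return self.color


leaf, parent = Node(), Node()
leaf.send_basic_message()
parent.receive_token(leaf.token_on_termination())
print(parent.token_on_termination())  # black
```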
Termination detection based on
spanning trees
The correct algorithm (4)
Algorithm description
1. Initially
– Each leaf process is provided with a token
– All processes and tokens are white
The set S is used for book-keeping to know which
processes have the token
=> S will be the set of all leaves in the tree
2. When a leaf node terminates, it sends the
token it holds to its parent process
3. A parent process will collect the token sent
by each of its children
– After it has received a token from all of its
children and after it has terminated, the parent
process sends a token to its parent
Termination detection based on
spanning trees
The correct algorithm (6)
An example
1. Initially, all nodes 0 to 6 are white
(Figure 2). Leaf nodes 3, 4, 5, and 6 are
each given a token. Node 3 has token
T3, node 4 has token T4, node 5 has
token T5, and node 6 has token T6.
Hence, S is { 3, 4, 5, 6 }.
2. When node 3 terminates, it transmits T3
to node 1. Now S = {1, 4, 5, 6}. When
node 4 terminates, it transmits T4 to
node 1 (Figure 3). S = { 1,5,6 }
3. Node 1 has received a token from each
of its children and, when it terminates, it
transmits a token T1 to its parent (Figure
4). S = { 0,5,6 }
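The evolution of S in these three steps can be replayed mechanically. In the sketch below, the parent links for nodes 3, 4, and 1 come from the example; the links for nodes 2, 5, and 6 are an assumption about the rest of the tree, and `pass_token` is a hypothetical helper.

```python
# child -> parent; entries for 2, 5 and 6 are assumed, the others follow the example.
PARENT = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}


def pass_token(S, node):
    """Return the new set S after `node` sends its token to its parent."""
    return (S - {node}) | {PARENT[node]}


S = {3, 4, 5, 6}              # initially, every leaf holds a token
history = [sorted(S)]
for node in (3, 4, 1):        # node 3 terminates, then node 4, then node 1
    S = pass_token(S, node)
    history.append(sorted(S))

print(history)
# [[3, 4, 5, 6], [1, 4, 5, 6], [1, 5, 6], [0, 5, 6]]
```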
Termination detection based on
spanning trees
The correct algorithm (8)
Termination detection based on
spanning trees
The correct algorithm (10)
Step 3: Nodes 3 and 4 become idle; S = {1, 5, 6}
Step 5: Node 5 sends a message to node 1
Termination detection based on
spanning trees
The correct algorithm (12)
Step 6: Nodes 5 and 6 become idle; S = {0, 2}
Step 7: Node 2 becomes idle; S = {0}. Node 0 initiates a Repeat signal
Performance
Best case message complexity of the
algorithm is O(N), where N is the number of
processes in the computation
– The best case occurs when all nodes send all
computation messages in the first round
In this case, the algorithm executes only twice
and the message complexity depends only on
the number of nodes
Worst case complexity of the algorithm is
O(N * M), where M is the number of
computation messages exchanged
– The worst case occurs when only one computation
message is exchanged every time the
algorithm is executed
Termination detection in
a general DC model
Until now, we have assumed that the reception of
a single message is enough to activate a passive
process
In the general model of distributed computing, a
passive process does not necessarily become
active on the receipt of a message
– Instead, the activation condition of a passive process is
more general and a passive process requires a set of
messages to become active
– This requirement is expressed by an activation
condition defined over the set DSi of processes from
which a passive process Pi is expecting messages
– The set DSi associated with a passive process Pi is
called the dependent set of Pi
A passive process becomes active only when its
activation condition is fulfilled
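A passive process with a dependent set and an activation condition might be modelled as below. The slides do not fix the form of the condition, so the two predicates (`needs_all`, `needs_any`) are illustrative assumptions, as are the class and method names.

```python
class PassiveProcess:
    def __init__(self, pid, dependent_set, condition):
        self.pid = pid
        self.dependent_set = set(dependent_set)  # DS_i: processes it expects messages from
        self.condition = condition               # predicate over (received_from, DS_i)
        self.received_from = set()
        self.active = False

    def on_message(self, sender):
        """Record the message; activate only when the activation condition holds."""
        if sender in self.dependent_set:
            self.received_from.add(sender)
        if self.condition(self.received_from, self.dependent_set):
            self.active = True
            self.received_from.clear()
        return self.active


def needs_all(received, ds):
    # Assumed example condition: a message from every member of DS_i is required.
    return received == ds


def needs_any(received, ds):
    # Assumed example condition: a message from any one member of DS_i suffices.
    return len(received) > 0


p = PassiveProcess(1, {2, 3}, needs_all)
print(p.on_message(2))  # False: still waiting for process 3
print(p.on_message(3))  # True
```

With `needs_any`, a single message from a member of the dependent set activates the process, which matches the simpler model assumed in the earlier sections.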