SKR 5302: Advanced Distributed Computing
6. Chapter 6: Coordination and Agreement
6.4. Elections Algorithm
Elections Algorithm
- Need to find one process that is the coordinator
- Assume
- Each process has a unique identifier (for example, network address)
- One process per machine
- Every process knows the process number of every other process
- Processes don’t know which processes are down and which ones are still running
- End result of the algorithm: all processes agree on who is the new coordinator/leader
- Bully algorithm & Ring Algorithm

Ring-based
Elections
Does NOT use a token
Assume
- processes are ordered
- each process knows its successor and the successor’s successor, and so on (needed in case of failures)
Process P detects that the coordinator is dead
- sends an ELECTION message to its successor
- includes its process number in the message
- each process that receives it, adds its own process number and then forwards it to its successor
- eventually it gets back that message, now what does it do?




Figure
6.7: A ring-based election in progress

A process notices that coordinator is not responding
- it starts an election (any process can start one)
Election algorithm
- P sends an ELECTION message to processes with higher numbers
- If no one responds, P wins the election
- If some process with higher process number responds
- P’s job is done, that process takes over
- the receiver sends an OK message to P
- receiver starts an election process
Eventually all processes give up, except one
This process sends out a message saying that it is the new “COORDINATOR”
A process that was down, when it comes back up starts a new election of its own
The bully algorithm
- Allows processes to crash during an election
- Assumes that message delivery between processes is reliable
- Assumes that the system is synchronous
- Uses timeouts to detect a process failure
- Assumes that each process knows which processes have higher identifiers, and that it can communicate with all such processes.
- Three types of message
- Election message – to announce an election
- Answer message – response to an election message
- Coordinator message – to announce the identity of the elected process – the new ‘coordinator’.
- A process begins an election when it notices, through timeouts, that the coordinator has failed.
- Upper bound on the time between sending a message to another process and receiving a response:
- T = 2Ttrans + Tprocess
Ttrans: maximum message transmission delay
Tprocess: maximum delay for processing a message
- T = 2Ttrans + Tprocess
- The process that knows it has the highest identifier can elect itself as the coordinator simply by sending a coordinator message to all processes with lower identifiers.
- A process with a lower identifier can begin an election by sending an election message to those processes that have a higher identifier and awaiting answer messages in response.
- If none arrives within time T, the process considers itself as the coordinator and sends a coordinator message to all processes with lower identifiers.
- Otherwise, the process waits a further period T’ for a coordinator message to arrive from the new coordinator.
- If none arrives, it begins another election.
- If a process pi receives a coordinator message, it sets its variable electedi to the identifier of the coordinator contained within it.
- If a process receives an election message, it sends back an answer message and begins another election – unless it has begun one already.
- When a process is started to replace a crashed process, it begins an election.
- If it has the higher process identifier, then it will decide that it is the coordinator and announce this to the other processes.
- Thus, it will become the coordinator, even though the current coordinator is functioning.
Figure 6.8: The bully algorithm
