SKR 5302: Advanced Distributed Computing

6. Chapter 6: Coordination and Agreement

6.3. Distributed Mutual Exclusion II

Figure 6.6:Maekawa’s algorithm

Fault tolerance
- Main points in evaluating the above algorithms with respect to fault tolerance are:
  - What happens when message are lost?
  - What happens when a process crashes?
- None of the algorithms above would tolerate the loss of messages, if the channels were unreliable.
- The ring-based cannot tolerate a crash failure of any single process.
- Maekawa’s algorithm can tolerate some process crash failures
  - If a crashed process is not in a voting set that is required, then its failures will not affect the other processes.
- The central server algorithm can tolerate the crash failure of a client process that neither holds nor has requested the token.
- The Ricart and Agrawala algorithm can be adapted to tolerate the crash failure of such a process, by taking it to grant all requests implicitly.
- Even with a reliable failure detector, care is required to allow for failures at any point (including during a recovery procedure), and to reconstruct the state of the processes after a failure has been detected.