8. Chapter 8: Replication
8.4. Transactions with replicated data
Introduction
- From a client’s view point, a transaction on replicated objects should appear the same as one with non-replicated objects.
- In a non-replicated system, transaction appear to be performed one at a time in some order.
- This is achieved by ensuring a serially equivalent interleaving of clients’ transactions.
- The effect of transactions performed by clients on replicated objects should be the same as if they had been performed one at a time on a single set of objects.
- This property is called one-copy serialization.
Architectures for replicated transactions
- Client request to replica managers issue
- A front end may either multicast client requests to groups of replica managers or send each request to a single replica manager, which is then responsible for processing the request and responding to the client.
- Primary copy approach – all front ends communicate with a distinguished ‘primary’ replica manager to perform an operation, and that replica manager keeps the backups up-to-date.
- Front ends may communicate with any replica manager to perform an operation – but coordination between replica manager is consequently more complex.
- Number of replica managers required for the successful completion of an operation issue
- Different replication schemes have different rules as to how many of the replica managers in a group are required for the successful completion of an operation.
- Example: read-one/write-all scheme – a read request can be performed by a single replica manager, whereas a write request must be performed by all the replica managers in the group.
- Quorum consensus schemes – to reduce the number of replica managers that must perform update operations, but at the expense of increasing number of replica managers required to perform read-only operations
- Another issue is whether the replica manager contacted by a front end should defer the forwarding of update requests to other replica managers in the group until after a transaction commits - Lazy approach to update propagation
Quorum consensus methods
- One way of preventing transactions in different partitions from producing inconsistent results is to make a rule that operations can be carried out within only one of the partitions.
- A quorum is a subgroup of replica managers whose size gives it the right to carry out operations.
- Gifford [1979] developed a file replication scheme in which a number of ‘votes’ are assigned to each physical copy at a replica manager of a single logical file.
- A vote can be regarded as a weighting related to the desirability of using a particular copy.
- Each read operation must first obtain a read quorum of R votes before it can proceed to read from any up-to-date copy, and each write operation must obtain a write quorum of W votes before it can proceed with an update operation.
- R and W are set for a group of replica managers such that
- W > half the total votes
- R + W > total number of votes for the group
- This ensures that any pair consisting of a read quorum and a write quorum or two write quorum must contain common copies.
- To perform read operation
- A read quorum is collected by making sufficient version number enquiries to find a set of copies, the sum of whose votes is not less than R.
- Not all of these copies need be up-to-date.
- Since each read quorum overlaps with every write quorum, every read quorum is certain to include at least one current copy.
- The read operation may be applied to any up-to-date copy.
- To perform write operation
- A write quorum is collected by making sufficient version number enquiries to find a set of replica managers with up-to-date copies, the sum of whose votes is not less than W.
- If there are insufficient up-to-date copies, then a non-current file is replaced with a copy of the current file, to enable the quorum to be established.
- The updates specified in the write operation are then applied by each replica manager in the write quorum, the version number is incremented and completion of the write is reported to the client.
- The files at the remaining available replica managers are then updated by performing the write operation as a background task.
- Any replica manager whose copy of the file has an older version number than the one used by the write quorum updates it by replacing the entire file with a copy obtained from a replica manager that is up-to-date.
- Configurability of groups of replica managers
- An important property of the weighted voting algorithm is that groups of replica managers can be configured to provide different performance or reliability characteristics.
- Once the general reliability and performance of a group of replica managers is established by its voting configuration, the reliability and performance of write operations maybe increased by decreasing W and similarly for reads by decreasing R.
- An example from Gifford
- Gifford gives 3 examples showing the range of properties that can be achieved by allocating weights to the various replica managers in a group and assigning R and W appropriately.
- The blocking probabilities give an indication of the probability that a quorum cannot be obtained when a read or write request is made.
- They are calculated assuming that there is a 0.01 probability that any single replica manager will be unavailable at the time of a request.
- Example 1:
- Configured for a file with a high read-to-write ratio in an application with several weak representatives and a single replica manager.
- Replication is used to enhance the performance of the system, not the reliability.
- There is one replica manager on the local network that can be accessed in 75 milliseconds.
- 2 clients have chosen to make weak representatives on their local disks, which they can access in 65 milliseconds, resulting in lower latency and less network traffic.

- Example 2:
- Configured for a file with a moderate read-to-write ratio, which is accessed primarily from one local network.
- The replica manager on the local network is assigned two votes and the replica managers on the remote networks are assigned one vote each.
- Reads can be satisfied from the local replica manager, but writes must access the local replica manager and one remote replica manager.
- The file will remain available in read-only mode if the local replica manager fails.
- Clients should create local weak representatives for lower read latency.
- Example 3:
- Configured for a file with a very high read-to-write ratio.
- Clients can read from any replica manager, and the probability that the file will be unavailable is small.
- Updates must be applied to all copies.
- Once again, clients could create weak representatives on their local machines for lower read latency.
- The main disadvantage of quorum consensus is that the performance of read operations is degraded by the need to collect a read quorum from R replica managers.