Recovery in Distributed Databases

What is recovery in distributed databases?

Recovery is one of the most complicated processes in a distributed database. Recovering a failed system that is part of a communication network is especially difficult, because the cause of the failure is often hard to identify.

For example:
Suppose location A sends a message to location B and expects a response, but no response arrives. There are several possible causes:
  • The message was lost because of a failure in the network.
  • Location B sent a response, but it was not delivered to location A.
  • Location B crashed.

From A's point of view these cases look the same, so it is very difficult to find the cause of a failure in a large communication network. Distributed commit is another serious problem that can affect recovery in a distributed database.
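The ambiguity in the example above can be illustrated with a small simulation. This is only a sketch: the `Failure` enum and the `send_and_wait` function are hypothetical names invented for illustration, and a real system would use actual network calls with timeouts. It shows that location A observes the same outcome (no reply) for every failure cause.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    """Hypothetical failure modes for the A -> B exchange."""
    NONE = "no failure"
    REQUEST_LOST = "request lost in the network"
    REPLY_LOST = "reply from B not delivered to A"
    B_CRASHED = "location B crashed"


def send_and_wait(failure: Failure) -> Optional[str]:
    """Return B's reply as seen by A, or None if A's wait times out."""
    if failure is Failure.REQUEST_LOST:
        return None  # B never received the request
    if failure is Failure.B_CRASHED:
        return None  # B is down and cannot answer
    reply = "ack"    # B received and processed the request
    if failure is Failure.REPLY_LOST:
        return None  # B answered, but the reply never reached A
    return reply


# What A observes under each failure mode.
observations = {f: send_and_wait(f) for f in Failure}
for failure, seen in observations.items():
    print(f"{failure.value}: A observes {seen!r}")
```

All three failure cases produce `None` at A, so A cannot decide from the missing reply alone whether to resend, wait, or treat B as failed.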

Two-phase commit protocol in distributed databases

  • The two-phase commit protocol is a type of atomic commitment protocol. It is a distributed algorithm that coordinates all the processes participating in a distributed transaction and decides whether to commit or abort (terminate) the transaction.
  • The protocol ensures that all participants carry out the same action (commit or abort), even when a local network failure occurs.
  • The two-phase commit protocol provides an automatic recovery mechanism in case of a system failure.
  • The location where the transaction originates is called the coordinator, and each location where a sub-process of the transaction runs is called a cohort.

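The protocol runs in two phases:

  • Commit-request phase:
    The coordinator asks all cohorts to prepare to commit the transaction. Each cohort takes the steps needed to either commit or abort, and votes yes (ready to commit) or no (must abort).

  • Commit phase:
    Based on the cohorts' votes, the coordinator decides whether to commit or abort the transaction. It commits only if every cohort voted yes, and then notifies all cohorts of the decision.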
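The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production implementation: the `Cohort` class and `two_phase_commit` function are hypothetical names, and the sketch leaves out logging, timeouts, and crash recovery, which a real protocol needs.

```python
from typing import List


class Cohort:
    """Hypothetical participant; can_commit simulates its local outcome."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self) -> bool:
        """Commit-request phase: vote yes (True) or no (False)."""
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(cohorts: List[Cohort]) -> str:
    """Run both phases as the coordinator and return the decision."""
    # Phase 1 (commit request): collect one vote from every cohort.
    votes = [cohort.prepare() for cohort in cohorts]

    # Phase 2 (commit): commit only if the vote is unanimous.
    if all(votes):
        for cohort in cohorts:
            cohort.commit()
        return "COMMIT"
    for cohort in cohorts:
        cohort.abort()
    return "ABORT"


all_yes = [Cohort("B"), Cohort("C")]
one_no = [Cohort("B"), Cohort("C", can_commit=False)]
print(two_phase_commit(all_yes))  # every cohort voted yes
print(two_phase_commit(one_no))   # one cohort voted no
```

A single no vote aborts (terminates) the transaction at every cohort, which is what keeps all locations in the same final state.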

Concurrency problems in distributed databases

Some problems which occur while accessing the database are as follows:

1. Failure at local locations
When a system recovers from a failure, its database is out of date compared to the other locations, so it must be brought up to date.

2. Failure of communication links
The system should be able to handle temporary failures of the communication network in a distributed database. Such a failure can cause a partition of the network, which limits communication between locations.

3. Dealing with multiple copies of data
When multiple copies of distributed data are kept at different locations, it is very important to keep those copies consistent with one another.

4. Distributed commit
When a transaction that accesses databases stored at multiple locations is being committed, a failure at some location during the commit process can leave the transaction committed at some locations but not at others. This is called the distributed commit problem.

5. Distributed deadlock
A deadlock can involve transactions at several locations. It arises from the recovery and concurrency problems above, for example when multiple locations access the same data over the communication network and each waits for the other.
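One common way to detect a deadlock that spans locations, shown here as an assumption because the text above does not name a detection method, is to merge each location's wait-for edges into one global wait-for graph and check it for a cycle. The function names `merge_graphs` and `has_deadlock` and the transaction names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # transaction -> transactions it waits for


def merge_graphs(*local_graphs: Graph) -> Graph:
    """Combine the wait-for edges reported by each location."""
    merged: Graph = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            merged.setdefault(txn, []).extend(waits_for)
    return merged


def has_deadlock(wait_for: Graph) -> bool:
    """Return True if the wait-for graph contains a cycle."""
    visiting: Set[str] = set()  # transactions on the current DFS path
    done: Set[str] = set()      # transactions fully explored

    def visit(txn: str) -> bool:
        if txn in done:
            return False
        if txn in visiting:
            return True  # reached a transaction already on the path
        visiting.add(txn)
        for other in wait_for.get(txn, []):
            if visit(other):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(wait_for))


# At location A, T1 waits for T2; at location B, T2 waits for T1.
site_a: Graph = {"T1": ["T2"]}
site_b: Graph = {"T2": ["T1"]}
print(has_deadlock(site_a), has_deadlock(site_b))
print(has_deadlock(merge_graphs(site_a, site_b)))
```

Neither location sees a cycle on its own; the deadlock only becomes visible once the edges from both locations are combined.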

Concurrency control in distributed databases

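Concurrency control in distributed databases commonly relies on locking, often applied to a designated (distinguished) copy of each data item. The following three concepts are involved: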

1) Lock based protocol
A lock is applied to a data item to prevent conflicting access by concurrent transactions: while one transaction holds the lock, another transaction can access the item only after the lock is released. Locks are applied for read or write operations. Locking is an important method for avoiding concurrency problems, although transactions that wait for each other's locks can themselves deadlock.

2) Shared lock system (Read lock)
A transaction can acquire a shared lock on a data item to read its contents. The lock is shared in the sense that any other transaction can also acquire a shared lock on the same item for reading.

3) Exclusive lock
A transaction can acquire an exclusive lock on a data item to both read and write it. While the exclusive lock is held, no other transaction can acquire any kind of lock on that item.
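The compatibility rules for shared and exclusive locks can be sketched as a tiny lock table. This is a simplified illustration under stated assumptions: the `LockManager` class is hypothetical, requests that conflict are simply refused instead of queued, and lock upgrades (shared to exclusive) are not handled.

```python
from typing import Dict, Set, Tuple


class LockManager:
    """Hypothetical single-site lock table: "S" = shared, "X" = exclusive."""

    def __init__(self) -> None:
        # data item -> (lock mode, set of transactions holding it)
        self._locks: Dict[str, Tuple[str, Set[str]]] = {}

    def acquire(self, txn: str, item: str, mode: str) -> bool:
        """Try to lock item; return True if granted, False on conflict."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # readers can share the lock
            return True
        return False  # any combination involving "X" conflicts

    def release(self, txn: str, item: str) -> None:
        """Release txn's lock on item, if it holds one."""
        held = self._locks.get(item)
        if held is None:
            return
        _, holders = held
        holders.discard(txn)
        if not holders:
            del self._locks[item]


lm = LockManager()
print(lm.acquire("T1", "x", "S"))  # first reader
print(lm.acquire("T2", "x", "S"))  # second reader shares the lock
print(lm.acquire("T3", "x", "X"))  # writer conflicts with the readers
```

Once both readers release the item, the exclusive request can be granted, and from then on every other request on that item is refused until the writer releases it.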