Architecture of Cassandra

  • Cassandra is designed to handle big data with no single point of failure.
  • Cassandra uses a peer-to-peer distributed architecture across the nodes in a cluster. Each node is independent yet interconnected with the other nodes, and all nodes play the same role in the system.
  • Every node can accept read and write requests, regardless of where the data is actually located in the cluster. If a node goes down, read and write requests can be served by the other nodes in the network.

Components of Cassandra

The important components of Cassandra are as follows:

1. Node
A node is a single machine where data is stored.

2. Data Center
A data center is a collection of related nodes connected in a network.

3. Cluster
A cluster consists of one or more data centers.

4. Commit log
The commit log is Cassandra's crash-recovery mechanism. Every write operation is appended to the commit log, so that writes can be replayed after a failure.

5. Mem-table
A mem-table is a memory-resident data structure in Cassandra. After a write is recorded in the commit log, the data is written to the mem-table.

6. SSTable
An SSTable is a disk file to which data is flushed from the mem-table when its contents reach a threshold value.

7. Bloom filter
A Bloom filter is a fast, probabilistic data structure for checking whether a particular element is a member of a set. It can report false positives but never false negatives.
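
How the commit log, mem-table, SSTable and Bloom filter fit together can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's actual implementation: the names ToyNode, BloomFilter and memtable_limit are hypothetical, SSTables are modelled as in-memory dictionaries rather than disk files, and the flush threshold is a simple entry count.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))


class ToyNode:
    """Illustrative write/read path of one node (hypothetical, simplified)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory copy of recent writes
        self.sstables = []     # (bloom_filter, immutable dict), oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for recovery
        self.memtable[key] = value            # 2. apply it to the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush at the threshold

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(self.memtable)))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest SSTable first; skip any table its Bloom filter rules out.
        for bloom, table in reversed(self.sstables):
            if bloom.might_contain(key) and key in table:
                return table[key]
        return None


node = ToyNode(memtable_limit=2)
node.write("user:1", "alice")
node.write("user:2", "bob")       # reaches the threshold and triggers a flush
node.write("user:1", "alice-v2")  # the newer value stays in the mem-table
print(node.read("user:1"))        # alice-v2
print(node.read("user:2"))        # bob
```

The read for "user:1" is answered from the mem-table, while "user:2" is found in the flushed table after its Bloom filter reports a possible match.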

Data caching and tuning mechanism

  • Cassandra provides an integrated cache and distributes cached data around the cluster.
  • The integrated cache addresses the cold start problem (after a restart, an empty cache forces every read to go to disk) by saving the cache to disk at regular intervals and reading its contents back in when the system restarts.
  • When a read request arrives, Cassandra first checks the row cache and, on a hit, returns the requested rows from it.
  • A subsequent read may request a row that is not in the row cache but whose key is in the partition key cache. In that case, the system reads the row from the SSTable, returns the data, and fills the row cache with the row from the current read.
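
The two-level lookup described above can be sketched as follows. This is a minimal illustration, assuming a single SSTable modelled as a Python list whose index positions stand in for disk offsets. ToyReadPath, _find_offset and the scans counter are hypothetical names introduced for the example, not part of Cassandra.

```python
class ToyReadPath:
    """Illustrative row cache + partition key cache in front of one SSTable."""

    def __init__(self, sstable):
        # sstable: list of (partition_key, row); list index = "disk offset"
        self.sstable = sstable
        self.row_cache = {}  # partition key -> full row
        self.key_cache = {}  # partition key -> offset of the row in the SSTable
        self.scans = 0       # counts slow index lookups, for illustration only

    def _find_offset(self, key):
        # Stand-in for the expensive on-disk index lookup.
        self.scans += 1
        for offset, (candidate, _) in enumerate(self.sstable):
            if candidate == key:
                return offset
        return None

    def read(self, key):
        # 1. Row cache hit: return the row without touching the SSTable.
        if key in self.row_cache:
            return self.row_cache[key]
        # 2. Key cache hit: the offset is known, so the index lookup is skipped.
        offset = self.key_cache.get(key)
        if offset is None:
            offset = self._find_offset(key)
            if offset is None:
                return None
            self.key_cache[key] = offset
        # 3. Read the row from the SSTable and fill the row cache.
        row = self.sstable[offset][1]
        self.row_cache[key] = row
        return row
```

In this sketch, a read that misses both caches costs one index lookup. If the row is later evicted from the row cache, the next read still finds its offset in the key cache, so scans does not increase.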

Data replication

In Cassandra, multiple nodes in a cluster act as replicas for a given piece of data. If a read finds that some replicas hold an out-of-date value, the most recent value is returned to the client. Cassandra then performs a read repair in the background to update the stale values.
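
The idea behind read repair can be illustrated with a short sketch. It assumes each replica is a plain dictionary mapping a key to a (timestamp, value) pair and that the newest timestamp wins. The function read_with_repair is a hypothetical name, and the repair here happens inline rather than in the background as Cassandra does it.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and update stale replicas.

    replicas: list of dicts mapping key -> (timestamp, value).
    """
    responses = [(replica.get(key), replica) for replica in replicas]
    versions = [version for version, _ in responses if version is not None]
    if not versions:
        return None  # no replica holds the key
    newest = max(versions, key=lambda version: version[0])
    for version, replica in responses:
        if version != newest:
            replica[key] = newest  # repair a stale or missing copy
    return newest[1]


replicas = [
    {"user:1": (1, "old")},
    {"user:1": (2, "new")},
    {},  # this replica missed the write entirely
]
print(read_with_repair(replicas, "user:1"))  # new
```

After the call, all three replicas in the example hold the pair (2, "new").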
