Apache Flume

Apache Flume is a reliable, distributed tool designed to collect, aggregate, and move large amounts of streaming data, such as log files and events, from many sources into a centralized store such as HDFS.

Advantages of Apache Flume

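  • Apache Flume makes it straightforward to move data from many sources into a centralized store.
  • Apache Flume is reliable: events move from sources to channels and from channels to sinks inside transactions, so events are not lost between one stage and the next.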

Features of Apache Flume

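  • Apache Flume collects data from multiple web servers and other sources and stores it in a centralized store, for example HDFS.
  • Apache Flume supports many types of sources and destinations (sinks), for example Avro, Kafka, and spooling-directory sources, and HDFS, HBase, and Kafka sinks.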

Architecture of Apache Flume

Flume's architecture is based on streaming data flows. It uses a simple, extensible data model that allows for online analytic applications.
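In Flume's data model, the unit of data is an event: a byte payload plus an optional set of string headers. The sketch below models that idea in Python only for illustration; Flume itself is written in Java, and the `Event` class and `event_from_line` helper here are hypothetical, not part of Flume's API. Headers are what make the model extensible: new attributes can be attached without touching the payload.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Toy model of a Flume event: a byte payload plus optional string headers."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def event_from_line(line: str, host: str) -> Event:
    # Hypothetical helper: wrap one log line as an event and tag it with the
    # host it came from, so later stages could route or partition on it.
    return Event(body=line.encode("utf-8"), headers={"host": host})


e = event_from_line("GET /index.html 200", host="web01")
print(e.headers["host"], e.body)  # web01 b'GET /index.html 200'
```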

Components of Apache Flume
A Flume agent is a JVM process that receives data (events) from clients or other agents and forwards it to its next destination (a sink or another agent). An agent consists of three main components: source, channel, and sink.
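The following minimal sketch wires the three components together inside one process to show the order of the flow: the source puts events on the channel, and the sink drains the channel to a destination. It is a toy model under simplifying assumptions (no transactions, no threads, one channel); the `Agent` class is hypothetical, and real agents are configured through a properties file rather than written like this.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class Agent:
    """Toy single-process model of a Flume agent: source -> channel -> sink."""

    def __init__(self, deliver: Callable[[bytes], None]) -> None:
        self.channel: Deque[bytes] = deque()  # buffer between source and sink
        self.deliver = deliver                # destination the sink writes to

    def source(self, records: Iterable[str]) -> None:
        # Source: turn incoming records into event bodies and put them on the channel.
        for record in records:
            self.channel.append(record.encode("utf-8"))

    def sink(self) -> None:
        # Sink: drain the channel in FIFO order and hand each event to the destination.
        while self.channel:
            self.deliver(self.channel.popleft())


store: List[bytes] = []  # stands in for HDFS or another agent
agent = Agent(deliver=store.append)
agent.source(["line 1", "line 2"])
agent.sink()
print(store)  # [b'line 1', b'line 2']
```

Keeping the channel between source and sink is the key design point: the source can accept data at its own pace while the sink writes at the destination's pace.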

Source
A source receives data from data generators (for example, web servers) and transfers it, as Flume events, to one or more channels.
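When a source is connected to several channels, Flume's default "replicating" channel selector copies each event to every channel. The sketch below imitates that behaviour with plain Python deques; the `replicating_source` function is a hypothetical illustration, not Flume code.

```python
from collections import deque
from typing import Deque, Iterable, List


def replicating_source(records: Iterable[str], channels: List[Deque[bytes]]) -> int:
    """Toy source: convert each record to an event body and copy it to every channel.

    Returns the number of records read.
    """
    count = 0
    for record in records:
        body = record.encode("utf-8")
        for channel in channels:
            channel.append(body)  # every channel receives its own copy
        count += 1
    return count


to_hdfs: Deque[bytes] = deque()   # hypothetical channel feeding an HDFS sink
to_hbase: Deque[bytes] = deque()  # hypothetical channel feeding an HBase sink
n = replicating_source(["a", "b"], [to_hdfs, to_hbase])
print(n, list(to_hdfs), list(to_hbase))  # 2 [b'a', b'b'] [b'a', b'b']
```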

Channel
A channel (for example, the memory channel or the file channel) receives events from the source and buffers them until a sink consumes them.
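A channel only removes an event for good once the sink confirms delivery. The sketch below models that with a bounded buffer and take/commit/rollback operations. It is an assumption-laden toy: the `ToyChannel` class and its method names are hypothetical and are not Flume's channel or transaction API, though the idea (events stay in the channel until a take is committed) is the same.

```python
from collections import deque
from typing import Deque, List


class ToyChannel:
    """Toy bounded channel whose takes must be committed or rolled back."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue: Deque[bytes] = deque()
        self.in_flight = 0  # events taken by a sink but not yet committed

    def put(self, event: bytes) -> bool:
        # Refuse new events when full (queued + in-flight), so the source must retry.
        if len(self.queue) + self.in_flight >= self.capacity:
            return False
        self.queue.append(event)
        return True

    def take(self, n: int) -> List[bytes]:
        # Hand up to n events to a sink; they still count against capacity.
        batch: List[bytes] = []
        while self.queue and len(batch) < n:
            batch.append(self.queue.popleft())
        self.in_flight += len(batch)
        return batch

    def commit(self, batch: List[bytes]) -> None:
        # Delivery succeeded: the events are now gone from the channel.
        self.in_flight -= len(batch)

    def rollback(self, batch: List[bytes]) -> None:
        # Delivery failed: return the events to the head of the queue, in order.
        self.in_flight -= len(batch)
        self.queue.extendleft(reversed(batch))


ch = ToyChannel(capacity=2)
print(ch.put(b"e1"), ch.put(b"e2"), ch.put(b"e3"))  # True True False
batch = ch.take(2)     # sink takes both events
ch.rollback(batch)     # delivery failed: events go back, nothing is lost
print(list(ch.queue))  # [b'e1', b'e2']
ch.commit(ch.take(2))  # delivery succeeded: events are removed
print(len(ch.queue))   # 0
```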

Sink
  • A sink stores the data in a centralized store, for example HDFS or HBase.
  • A sink consumes data from a channel and delivers it to the destination, which may be another agent or the central store. Each sink reads from exactly one channel, although one channel can feed several sinks.
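A sink's job can be sketched as a loop that drains its channel in batches and hands each batch to the destination, which may be the next agent in a multi-hop flow or the central store. In this toy version, assumed for illustration only, the channel is a plain deque and a failed delivery pushes the batch back so the events can be retried; `run_sink` and `failing_store` are hypothetical names, not Flume APIs.

```python
from collections import deque
from typing import Callable, Deque, List


def run_sink(channel: Deque[bytes],
             deliver: Callable[[List[bytes]], None],
             batch_size: int = 100) -> int:
    """Toy sink: drain one channel in batches; return the number of events delivered."""
    delivered = 0
    while channel:
        batch = [channel.popleft() for _ in range(min(batch_size, len(channel)))]
        try:
            deliver(batch)
        except Exception:
            # Delivery failed: put the batch back at the head so nothing is lost,
            # and stop so a later run can retry.
            channel.extendleft(reversed(batch))
            break
        delivered += len(batch)
    return delivered


hop = deque([b"e1", b"e2", b"e3"])
next_agent: Deque[bytes] = deque()  # stands in for the next agent in a multi-hop flow
delivered_ok = run_sink(hop, next_agent.extend, batch_size=2)
print(delivered_ok, list(next_agent))  # 3 [b'e1', b'e2', b'e3']


def failing_store(batch: List[bytes]) -> None:
    raise OSError("central store unavailable")  # simulated outage


pending = deque([b"e4"])
delivered_fail = run_sink(pending, failing_store)
print(delivered_fail, list(pending))  # 0 [b'e4']
```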

[Figure: Components of Apache Flume]