What is Apache Kafka?
Apache Kafka is an open-source publish-subscribe messaging system that enables fast, scalable, and fault-tolerant handling of real-time data feeds. Unlike traditional enterprise messaging software, Kafka is designed to process all of the data flowing through a company in near real time.
Kafka is written in Scala and Java and was originally developed at LinkedIn. Since then, a number of companies have used it to build real-time platforms.
Kafka has much in common with a transaction log, and it organizes message feeds into topics. Producers write data to topics and consumers read from those topics. Topics are partitioned and replicated across multiple nodes in a distributed cluster. A distinguishing feature of Kafka is that it treats each topic partition as a log and assigns a unique offset to each message in a partition.
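To make the idea of a partitioned log concrete, here is a minimal in-memory sketch. It is not Kafka's client API: the `Partition` and `Topic` classes, the `publish` method, and the key-hashing rule used to pick a partition are all assumptions made for illustration, and replication is left out entirely.

```python
from zlib import crc32


class Partition:
    """An append-only log; a message's offset is its index in the list."""

    def __init__(self):
        self.messages = []

    def append(self, value):
        self.messages.append(value)
        return len(self.messages) - 1  # offset of the message just written


class Topic:
    """A named feed split into a fixed number of partitions (hypothetical model)."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def publish(self, key, value):
        # Assumed routing rule: hash the key to choose a partition, so that
        # messages with the same key land in the same log, in order.
        index = crc32(key.encode("utf-8")) % len(self.partitions)
        offset = self.partitions[index].append(value)
        return index, offset


topic = Topic("page-views")
first = topic.publish("user-1", "viewed /home")
second = topic.publish("user-1", "viewed /pricing")
```

Both messages share a key, so they go to the same partition and receive consecutive offsets within it. Offsets are only unique inside a single partition, not across the whole topic.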
Kafka retains all messages for a configurable period of time, and consumers are responsible for keeping track of their own position (offset) in each log. This differs from earlier systems, in which the broker was responsible for this tracking, which severely limited scalability as the number of consumers increased. This design enables Kafka to support many consumers and store large amounts of data with very little overhead.
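The consumer-side bookkeeping described above can be sketched as follows. This is a simplified model rather than Kafka's real consumer API: the `Consumer` class and its `poll` and `seek` methods are hypothetical names, the log is a plain Python list, and time-based retention is not modeled.

```python
class Consumer:
    """Reads from a shared log and remembers its own position (offset)."""

    def __init__(self, log):
        self.log = log      # shared, append-only list of messages
        self.position = 0   # next offset to read; kept by the consumer

    def poll(self, max_messages=10):
        # Reading does not remove anything from the log; it only
        # advances this consumer's private position.
        batch = self.log[self.position:self.position + max_messages]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        # Rewind or skip ahead, for example to replay older messages.
        self.position = offset


log = ["m0", "m1", "m2"]
fast = Consumer(log)
slow = Consumer(log)

fast_batch = fast.poll()
slow_batch = slow.poll(max_messages=1)

fast.seek(0)
replayed = fast.poll(max_messages=2)
```

Because each consumer stores only a single integer per log, the log itself needs no per-consumer state. Adding another consumer costs the log nothing, and any consumer can re-read messages that are still retained.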
Kafka can be used:
- As a traditional message broker
- For tracking website activity
- For log aggregation
- For processing large data streams