AECCloud


Understanding Apache Kafka and How it Can Make Your Business More Efficient: Part 2

Kafka 102: Specific Use Case Details

Welcome to the second part of our series on Apache Kafka.

In the last article, we discussed the origins of Kafka, its basic structure and functionality, and some terminology used when discussing the platform.

In this article, we will go over more specific use cases and a few real-world cases where Kafka is used in applications that we all use on a regular basis.

Without further ado, let us get started.

Apache Kafka Use Cases

Apache Kafka is used in a variety of applications in different capacities. Its ability to handle huge amounts of streaming data quickly and efficiently makes it ideal for uses such as the ones below.

1. Kafka Metrics

Often used in operation monitoring, Apache Kafka is ideally suited for aggregating statistics from various applications to produce centralized feeds of operational data. That data can then be used for alerts and reports on operational metrics.
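As a rough sketch of that aggregation step, the snippet below merges per-service counters into one centralized feed. The service names and metric names are hypothetical, and plain dictionaries stand in for the Kafka topics a real deployment would use.

```python
from collections import Counter

def aggregate_metrics(service_feeds):
    """Merge per-service counters into one centralized operational feed."""
    total = Counter()
    for service, counters in service_feeds.items():
        for metric, value in counters.items():
            # Namespace each metric by its originating service,
            # and roll it up into a global total as well.
            total[f"{service}.{metric}"] += value
            total[f"all.{metric}"] += value
    return dict(total)

# Two hypothetical services reporting request counts and error counts.
feeds = {
    "checkout": {"requests": 120, "errors": 3},
    "search":   {"requests": 450, "errors": 1},
}
centralized = aggregate_metrics(feeds)
```

The centralized feed can then drive dashboards and alerting without each consumer knowing about individual services.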

2. Stream Processing

Apache Kafka coupled with a framework such as Spark or Storm creates a reliable stream-processing pipeline: data is read from a topic, processed, and written to a new topic, where it becomes available for other subscribers (i.e. users and applications) to use.
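The read-process-write pattern can be sketched as follows. In-memory lists stand in for the source and destination topics; a real pipeline would use a Kafka client together with Spark or Storm, and the log format here is invented for illustration.

```python
# In-memory stand-ins for Kafka topics.
raw_topic = ["ERROR db timeout", "INFO user login", "ERROR disk full"]
processed_topic = []

def process_stream(source, sink):
    """Read each record from the source topic, transform it, and
    write the result to a new topic for downstream subscribers."""
    for record in source:
        level, _, message = record.partition(" ")
        sink.append({"level": level, "message": message})

process_stream(raw_topic, processed_topic)
```

Downstream subscribers consume the processed topic without ever touching the raw one, which is what decouples producers from consumers.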

3. Centralizing Raw Log Data

Kafka has exceptionally low latency, making it perfect for processing and supporting multiple data sources and data consumers. Kafka can collect logs from multiple services and make them available to multiple consumers, essentially becoming a transport layer for raw log data.

Consumers can aggregate that data in real time and automate alerts, or distribute the log data to many platforms at the same time.
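A minimal sketch of that aggregate-and-alert step, assuming a stream of (service, log line) pairs already collected from a log topic; the services and threshold are hypothetical.

```python
def collect_alerts(log_records, threshold=2):
    """Aggregate raw log lines from many services and flag any
    service whose error count crosses the alert threshold."""
    error_counts = {}
    for service, line in log_records:
        if line.startswith("ERROR"):
            error_counts[service] = error_counts.get(service, 0) + 1
    return [s for s, n in error_counts.items() if n >= threshold]

logs = [
    ("payments", "ERROR card declined"),
    ("payments", "ERROR gateway timeout"),
    ("search", "INFO query served"),
]
alerts = collect_alerts(logs)
```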



4. Website Activity Tracking

Tracking user activity on websites in real-time is critical in today’s highly competitive online world. Pageviews, searches, what products were viewed, what was added to the cart, and other user actions can be published to topics and become available for real-time processing in visual user interfaces such as dashboards. The data can also be used in offline analytics tools such as Google BigQuery.

Kafka can be used to track and fine-tune ads based on performance. Ads are monitored based on their position on the page, what search terms were used to reach them, and how many times they were viewed. That data is sent to Kafka, where it can be fed into a Hadoop cluster for further analysis or consumed in real time to adjust ads based on performance.
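The real-time side of that loop might look like the sketch below: tally impressions and clicks per ad from a stream of tracking events, then compute a click-through rate to drive adjustments. The event schema and ad IDs are assumptions, not a real ad platform's format.

```python
def ad_performance(events):
    """Tally impressions and clicks per ad from a stream of tracking
    events, then compute a click-through rate for tuning."""
    stats = {}
    for event in events:
        ad = stats.setdefault(event["ad_id"], {"impressions": 0, "clicks": 0})
        if event["type"] == "impression":
            ad["impressions"] += 1
        elif event["type"] == "click":
            ad["clicks"] += 1
    for ad in stats.values():
        ad["ctr"] = ad["clicks"] / ad["impressions"] if ad["impressions"] else 0.0
    return stats

events = [
    {"ad_id": "ad-1", "type": "impression"},
    {"ad_id": "ad-1", "type": "impression"},
    {"ad_id": "ad-1", "type": "click"},
    {"ad_id": "ad-2", "type": "impression"},
]
stats = ad_performance(events)
```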

5. Processing Data in Real-Time

Another use case for Kafka is to work with events in real-time. Banks use Kafka when monitoring for fraud. For example, when a credit card is used to make a purchase, every transaction is sent to Kafka. Based on the locations and frequency of those transactions, a decision can be made to suspend the card and send an alert to the user and fraud department for further analysis.

Those alerts would be generated by an application that consumes the data in a Kafka topic in a real-time data pipeline.
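One simple version of that decision logic is sketched below: flag a card when two transactions occur in different cities within a short window. The window, cities, and record shape are invented for illustration; a production fraud model would be far richer.

```python
from datetime import datetime, timedelta

def flag_fraud(transactions, window=timedelta(minutes=10)):
    """Flag a card if two transactions occur in different cities
    within a short time window, a common fraud signal."""
    txns = sorted(transactions, key=lambda t: t["time"])
    for earlier, later in zip(txns, txns[1:]):
        if (later["time"] - earlier["time"] <= window
                and earlier["city"] != later["city"]):
            return True  # suspend the card, alert user and fraud team
    return False

txns = [
    {"time": datetime(2023, 1, 1, 12, 0), "city": "London"},
    {"time": datetime(2023, 1, 1, 12, 5), "city": "Paris"},
]
suspicious = flag_fraud(txns)
```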

6. Kafka Messaging

Kafka works well as a replacement for traditional message brokers. It was built as a distributed publish-subscribe messaging system with extremely high throughput, and offers built-in partitioning, replication, and fault tolerance that many other messaging systems lack.

Kafka can be used to decouple processing from data producers or to buffer unprocessed messages, making it a good solution for large-scale message-processing applications.

7. Kafka Commit Log

Kafka is built as a distributed system making it useful as an external commit log. Data is replicated between nodes and can be used to restore data in the case of a failed node thus reducing downtime.
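The restore step amounts to replaying the log. The sketch below rebuilds a node's key-value state from a replicated commit log after a failure; the operations and keys are hypothetical, and a list stands in for the log topic.

```python
def rebuild_state(commit_log):
    """Replay a replicated commit log to restore a failed node's
    key-value state, the core idea behind using Kafka as an
    external commit log."""
    state = {}
    for entry in commit_log:
        if entry["op"] == "set":
            state[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            state.pop(entry["key"], None)
    return state

# Log entries survive on replica nodes even if the original node fails.
log = [
    {"op": "set", "key": "balance", "value": 100},
    {"op": "set", "key": "balance", "value": 80},
    {"op": "delete", "key": "session"},
]
restored = rebuild_state(log)
```

Because the log is the source of truth, any replica that replays it from the beginning converges on the same state.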

8. Distributed Streaming Platform

As we learned in Part I of our series, Kafka is highly scalable, has extremely low latency, and provides strong data durability. Kafka can be used as a messaging system, a storage system, or a stream-processing platform.

Data can be transformed as it arrives and be queried in real-time. Logs can be treated as events for further analysis. This can be especially useful when debugging production systems.

9. Communication Between Services

Kafka is well suited to work with a microservice architecture, thus improving overall application performance.

One example might be in an eCommerce situation.

A user places an order on the website, and the order event is sent to Kafka. Another service processes the payment and, if it succeeds, publishes a payment event to another Kafka topic.

A third microservice consumes that event, sends an email confirmation, and starts shipping the product.

Kafka works as a message queue in the same way as RabbitMQ or AWS Kinesis. Each microservice uses Kafka’s publish-subscribe mechanism to interact with the others.
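The order flow above can be simulated with a tiny in-memory broker. This is a sketch of the publish-subscribe pattern only; the topic names, services, and broker class are invented, and real services would use an actual Kafka client.

```python
class MiniBroker:
    """A tiny in-memory stand-in for Kafka's publish-subscribe model."""
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers.get(topic, []):
            handler(event)

broker = MiniBroker()
emails_sent = []

# Payment service: consumes orders, publishes successful payments.
broker.subscribe("orders", lambda order: broker.publish(
    "payments", {"order_id": order["order_id"], "status": "paid"}))
# Notification service: consumes payments, records a confirmation email.
broker.subscribe("payments", lambda p: emails_sent.append(p["order_id"]))

broker.publish("orders", {"order_id": 42, "item": "book"})
```

Note that the order service never calls the payment or notification services directly; each one only knows about topics, which is exactly the decoupling Kafka provides.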

With Kafka, you can decouple the architecture so that a failure in one service goes unnoticed by the end user.

10. IoT and Data Analysis

Another scenario is to use Kafka as the central location to send and read data from IoT devices.

Kafka’s scalability and real-time capability make it invaluable when lives are on the line.

Take for example monitoring equipment in a public transportation system carrying tens of thousands of riders daily.

A subway car uses IoT devices to monitor information about the train; for example, brakes, hydraulic line pressure, or motors could be monitored for faults. Each device sends data to a Kafka topic, where consumers read the data and send out alerts when a preset safety threshold is crossed.

The subway car could then be pulled from service for maintenance before there is an accident.
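The consumer side of that pipeline might check each reading against preset thresholds, as in the sketch below. The sensor names, threshold values, and record format are all hypothetical.

```python
THRESHOLDS = {"brake_temp_c": 250, "hydraulic_psi_min": 1500}

def check_readings(readings):
    """Consume sensor readings for subway cars and return alerts
    for any value outside its preset safety threshold."""
    alerts = []
    for r in readings:
        if r["brake_temp_c"] > THRESHOLDS["brake_temp_c"]:
            alerts.append((r["car_id"], "brake overheating"))
        if r["hydraulic_psi"] < THRESHOLDS["hydraulic_psi_min"]:
            alerts.append((r["car_id"], "hydraulic pressure low"))
    return alerts

readings = [
    {"car_id": "car-7", "brake_temp_c": 310, "hydraulic_psi": 1800},
    {"car_id": "car-9", "brake_temp_c": 120, "hydraulic_psi": 1700},
]
alerts = check_readings(readings)
```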


Fortune 500 Companies Use Kafka

Many of the services and companies we interact with on a daily basis use Kafka somewhere in their IT stack.

Let us take a look at some of those companies and how Apache Kafka helps them do business more efficiently.

Entertainment: Netflix

We all love watching our favorite series or movies on Netflix. But did you know that Kafka plays a critical role in streaming all that data to your screen?

Netflix has been an innovator in the field and uses Apache Kafka in its Keystone pipeline project to push and receive notifications.

Netflix uses two types of Kafka clusters: Fronting Kafka clusters collect and buffer data from producers, while Consumer Kafka clusters route content to the end consumers.

Netflix handles a huge amount of data daily. At last count, they were using 36 Kafka clusters (24 Fronting Kafka and 12 Consumer Kafka) to process about 700 billion messages per day!

The data loss rate is about 0.01%, and Apache Kafka is a key factor in keeping that number so low.

Travel: Uber

Uber has revolutionized an industry with its innovations, and Apache Kafka plays a critical role in that infrastructure.

One example of how Uber uses Kafka is in its driver injury protection program, in which drivers pay a premium on every ride. Uber engineers use Kafka with an unblocked batch-processing method to maintain a steady throughput of data, which has allowed their team to deliver real-time updates with flexibility.

Social Media: LinkedIn

Apache Kafka was originally created by engineers at LinkedIn, and the B2B social network was an early pioneer in implementing Kafka in its stack.

Their platforms handle over a trillion messages per day! Kafka’s inherent scalability has made this possible; message volume has grown more than 1200x over the last few years.

Dedicated Kafka broker clusters at LinkedIn let them differentiate and whitelist certain users for higher bandwidth, ensuring a seamless user experience.

Financial Services: Goldman Sachs

Goldman Sachs, a giant in the financial services sector, developed a Core Platform that uses Apache Kafka to handle huge amounts of data, almost 1.5 TB per week.

The system offers improved data-loss prevention, low outage time, and vastly improved disaster recovery. Transparency, essential in any financial services firm, was also an objective of the project.

Media: New York Times

The New York Times has been a leader in technology implementation as it has moved into the era of digital information.

Apache Kafka has been used to transform the way the NYT processes data.

For example, when an article is published, it needs to be available on various platforms and delivered to subscribers in real time. To handle this vast distribution, they developed a project called the Publishing Pipeline, in which Kafka’s log-based architecture removes API-based issues.

The Kafka-based system was set up in 2015 and has been billed as a success by the firm. It has reportedly simplified backend and frontend deployments and decreased the workload of their dev teams.


Key Takeaways

Apache Kafka is a hugely powerful part of any IT stack and helps corporations around the world manage data in ways that were previously slow and riddled with downtime and failures.

If you have questions about how Apache Kafka can help your business, do not hesitate to drop us a line. We are here to help.