Optimizing Apache Kafka® for High Throughput

This guide explores the most important consumer and producer properties of Apache Kafka for achieving high throughput.

By Stefan Sprenger

Apache Kafka is a powerful technology for storing data in a scalable, fault-tolerant, and robust manner and serves a plethora of data streaming use cases, such as streaming data pipelines. It persists data in a distributed log and allows applications to either produce data to the log or consume data from it. Kafka calls events records and organizes them in topics. It does not make any assumptions about their format: users are free to bring their own data format and must provide a serializer or deserializer when producing to or consuming from Kafka topics.

Throughput is an important performance indicator of an application working with Apache Kafka. It measures the amount of data that a consumer or producer can process in a specified time interval. Typically, throughput is defined in terms of records per second or megabytes (MB) per second.

Latency is another important performance metric; it measures the time needed to process a single event. Latency reflects how close an application gets to real-time processing of records and is typically expressed in milliseconds.

When tuning the performance of an application, you typically want to maximize throughput and minimize latency. In practice, there is often a trade-off between the two: you sacrifice a bit of latency to gain some throughput, and vice versa.

In this guide, we describe the most important configuration properties of the consumer and producer APIs of Apache Kafka that impact the throughput when reading from or writing to Kafka. If you are building streaming data pipelines with DataCater, you can adjust the consumer and producer properties in the configuration of Streams.

Optimizing Kafka consumers for throughput

In general, if you want to maximize the amount of data consumed per second, it helps to consume records in batches instead of processing them one by one. This comes at the cost of increased latency, because you need to wait until a batch fills up before you start processing it.

In the following, we describe the properties of the Apache Kafka consumer API that are essential to optimizing consumers for high throughput.


The consumer property max.poll.records defines the maximum number of records returned by a single call to poll() (default: 500).

Depending on how you process the records retrieved from Kafka, increasing this property can lead to an increase in the overall throughput. For instance, Kafka Streams, which is typically used for rather complex use cases that require the management of state, sets this property to 1000 by default.
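As a minimal sketch, a consumer configuration that raises max.poll.records could look as follows. The property names follow the Java consumer API; the broker address and group id are placeholders, not values from this guide:

```python
# Sketch of a consumer configuration tuned for larger poll batches.
# Broker address and group id are placeholders.
consumer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "throughput-demo",          # placeholder consumer group
    "max.poll.records": 1000,               # default: 500
}

print(consumer_config["max.poll.records"])  # → 1000
```

Such a dictionary would typically be passed to the constructor of your Kafka client of choice.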


The consumer property fetch.min.bytes defines the minimum amount of data, in bytes, that the broker should return for a fetch request (default: 1).

If the broker has less than fetch.min.bytes of data available, it waits until enough data accumulates before responding to the fetch request. The maximum wait time can be bounded with the consumer property fetch.max.wait.ms.

If you want to reduce the number of network requests and increase the throughput of your consumer, try increasing this property.


The consumer property fetch.max.wait.ms defines the number of milliseconds that an Apache Kafka broker waits before responding to a fetch request if less than fetch.min.bytes of data is available (default: 500). A higher value improves throughput at the cost of latency, and vice versa.
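The interplay of fetch.min.bytes and fetch.max.wait.ms can be illustrated with a back-of-the-envelope calculation: the broker responds as soon as either the minimum amount of data has accumulated or the maximum wait time has elapsed. A small sketch, where the arrival rate is an assumed example value:

```python
def fetch_response_delay_ms(fetch_min_bytes, fetch_max_wait_ms, arrival_bytes_per_sec):
    """Approximate worst-case delay before the broker answers a fetch request,
    assuming data arrives at a constant rate and no data is buffered yet.
    The broker replies when fetch.min.bytes is reached or fetch.max.wait.ms
    expires, whichever comes first."""
    time_to_fill_ms = fetch_min_bytes / arrival_bytes_per_sec * 1000
    return min(time_to_fill_ms, fetch_max_wait_ms)

# With the defaults (fetch.min.bytes=1, fetch.max.wait.ms=500), the broker
# answers almost immediately at an assumed arrival rate of 100 KB/s:
print(fetch_response_delay_ms(1, 500, 100_000))        # ≈ 0.01 ms

# Raising fetch.min.bytes to 100 KB trades latency for fewer requests;
# here the response is capped by fetch.max.wait.ms:
print(fetch_response_delay_ms(100_000, 500, 100_000))  # 500 ms
```

The second call shows why both properties belong together: without the fetch.max.wait.ms cap, a slow topic could stall the consumer indefinitely.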

💡 What is the difference between polling and fetching?
Applications retrieve records from subscribed Apache Kafka topics by repeatedly calling the poll() function provided by the Apache Kafka client. While the application is processing records and before it calls poll() again, the Kafka client can already fetch data from the broker internally, which minimizes future waiting time. As opposed to poll(), which is part of the public interface, the fetching of data from the broker happens internally and cannot be invoked directly by your application.

Optimizing Kafka producers for throughput

Similar to consumers, optimizing Apache Kafka producers for a high throughput often involves transferring more records with fewer network requests.


The producer property batch.size defines the maximum size, in bytes, of a record batch sent by the Apache Kafka client to the broker in one request (default: 16384).

Depending on the size of your records, you might consider setting this property to a few hundred kilobytes, typically somewhere between 100,000 and 200,000 bytes. Please note that increasing batch.size might have a negative impact on the memory consumption of your application, since the producer buffers batches per partition.


The producer property linger.ms defines the number of milliseconds that the Kafka client waits for additional records before sending a batch to the broker; a batch is sent earlier if it reaches batch.size (default: 0).

By increasing this property to, for instance, 100 milliseconds, you can reduce the number of network requests and increase throughput, at the cost of higher producer latency.
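A similar calculation shows how batch.size and linger.ms together determine when a batch is sent: the producer ships a batch once it is full or once linger.ms has elapsed, whichever happens first. A sketch, assuming an example production rate:

```python
def batch_send_delay_ms(batch_size_bytes, linger_ms, produce_bytes_per_sec):
    """Approximate time a record may sit in the producer's buffer before its
    batch is sent: the batch goes out when it reaches batch.size or when
    linger.ms expires, whichever comes first."""
    time_to_fill_ms = batch_size_bytes / produce_bytes_per_sec * 1000
    return min(time_to_fill_ms, linger_ms)

# Defaults (batch.size=16384, linger.ms=0): batches are sent immediately,
# often before they fill up, at an assumed rate of 1 MB/s:
print(batch_send_delay_ms(16_384, 0, 1_000_000))     # 0 ms

# With linger.ms=100 and a larger batch, the producer waits up to 100 ms,
# amortizing each network request over many more records:
print(batch_send_delay_ms(131_072, 100, 1_000_000))  # 100 ms
```

In the second configuration, each request carries roughly 100 KB instead of whatever trickled in instantly, which is exactly the "more records with fewer network requests" effect described above.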


The producer property compression.type allows you to define a compression algorithm that is applied to the records before sending them to the Kafka broker (default: none). Applying compression reduces the amount of data that needs to be transferred from your application to the broker and can increase the throughput of your producer, especially when producing records in batches.

Apache Kafka clients support various compression algorithms, e.g., gzip, snappy, lz4, and zstd.
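The effect of compression is easy to demonstrate with Python's standard library: batches of structurally similar records, such as JSON events sharing the same keys, compress very well. This sketch uses the stdlib gzip module; the record shape is an invented example:

```python
import gzip
import json

# Build a batch of 1,000 structurally similar records (invented example data).
records = [
    json.dumps({"id": i, "event": "page_view", "path": "/home"}).encode("utf-8")
    for i in range(1000)
]
batch = b"\n".join(records)

compressed = gzip.compress(batch)
ratio = len(compressed) / len(batch)

print(f"uncompressed: {len(batch)} bytes")
print(f"compressed:   {len(compressed)} bytes")
print(f"ratio:        {ratio:.2f}")
```

Kafka producers compress on a per-batch basis, which is one more reason why larger batches (see batch.size and linger.ms) tend to pay off: the compressor sees more redundancy per batch.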


The producer property acks defines how many in-sync replicas must acknowledge a write request before the producer considers it successful (default: 1 for clients before Apache Kafka 3.0, all since 3.0).

By requiring all in-sync replicas to acknowledge a request, i.e., acks=all, you get very strong durability guarantees for your records but spend more time waiting for the replicas to catch up, which negatively impacts throughput.

By requiring no in-sync replica to acknowledge a request, i.e., acks=0, you can increase the performance of your producer but risk losing records.

For most throughput-oriented use cases, acks=1 strikes a reasonable balance between performance and durability.
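As a sketch, the three acks settings side by side, expressed as plain producer configuration dictionaries. The property names follow the producer API; the broker address is a placeholder:

```python
# Sketch: producer configurations for the three acknowledgment modes.
# The broker address is a placeholder.
base = {"bootstrap.servers": "localhost:9092"}

# No acknowledgment: highest throughput, but records may be lost silently.
fastest = {**base, "acks": "0"}

# Leader acknowledges: good throughput, small risk of loss on leader failure.
balanced = {**base, "acks": "1"}

# All in-sync replicas acknowledge: strongest durability, lowest throughput.
most_durable = {**base, "acks": "all"}

print(balanced["acks"])  # → 1
```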


This guide discussed various consumer and producer configuration options that, when adjusted, can have a positive impact on the throughput of your application.

In general, we observe a trade-off between high throughput and low latency: configurations that improve throughput typically hurt latency, and vice versa.

The default values of the Apache Kafka configuration often favor latency over throughput; e.g., by default, producers do not wait until batches fill up but immediately start sending data to the broker (linger.ms=0). As a consequence, you need to experiment with different settings for the configuration options discussed above if you have high-throughput requirements.