
How to install Apache Kafka on a local machine

Mar 28, 2024
Catherine Edelveis

Our previous article explained how Apache Kafka provides an efficient and resilient platform for event streaming. 

It’s time for some coding! In this article, we will set up Kafka on our local machine, which is useful when you want to explore the technology without disrupting the production environment.

By the way, if you deploy Spring Boot services to the cloud, check out Alpaquita Containers tailor-made for Spring Boot: they can reduce the memory footprint of your containers by up to 30%! And recently they became even more powerful thanks to the added CRaC support, which enables you to cut the startup and warmup time of your services from minutes to milliseconds. Follow the guide on using CRaC with Java in a container and start experimenting with the feature.

Install Java

To work with Kafka, you need to install Java. Kafka requires JDK 8 or later. I will use JDK 21, the latest LTS release, and recommend you try it out, as it is packed with new and enhanced features that facilitate development and increase JVM performance.

Note that the commands below are provided for Linux and macOS, so if you use Windows, you can follow this guide by installing WSL to create a Linux environment for your Kafka cluster.

Download Java 21 for your platform. You can also pull the bundle from Linux repositories or via package managers. Make sure that the JAVA_HOME variable is set to the Liberica JDK installation directory:

$ export JAVA_HOME=<Liberica installation dir>
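
For example, on Linux the result might look like this (the path below is just an illustration; use your actual installation directory), and it also helps to add the JDK’s bin directory to the PATH:

$ export JAVA_HOME=/usr/lib/jvm/jdk-21-bellsoft # example path, adjust to your setup
$ export PATH=$JAVA_HOME/bin:$PATH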

Verify the installation by running

$ java -version

The output will be similar to

openjdk version "21.0.2" 2024-01-16 LTS
OpenJDK Runtime Environment (build 21.0.2+14-LTS)
OpenJDK 64-Bit Server VM (build 21.0.2+14-LTS, mixed mode, sharing)

Download and set up Kafka with KRaft

Download the latest stable release of Apache Kafka (I got 3.6.1; your version may be different). Note that you need to download the binary distribution, not the source! Open the Terminal, extract the archive, and go to the Kafka root directory:

$ tar -xzf kafka_2.13-3.6.1.tgz 
$ cd kafka_2.13-3.6.1

Alternatively, you can install Kafka via a package manager, for instance, Homebrew:

$ brew install kafka
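
If you go the Homebrew route, the Kafka scripts are typically available on your PATH without the bin/ prefix and the .sh extension, so, for instance, the topic creation command used later in this article would become:

$ kafka-topics --create --topic user-requests --bootstrap-server localhost:9092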

To set up a Kafka cluster, you need either ZooKeeper or KRaft. I suggest we use KRaft, as ZooKeeper mode is deprecated and will be removed in a future Kafka release.

First of all, we need to generate the cluster ID. Run

$ bin/kafka-storage.sh random-uuid

This will yield a cluster UUID, which is then used to format the storage directories defined in the KRaft server.properties file. Run the format command (replace <your_UUID> with the UUID generated at the previous step):

$ bin/kafka-storage.sh format -t <your_UUID> -c config/kraft/server.properties
Formatting /tmp/kraft-combined-logs with metadata.version 3.6-IV2.
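
The cluster ID is recorded in a meta.properties file inside the log directory configured in server.properties (by default, /tmp/kraft-combined-logs), so you can take a peek at it if you’re curious:

$ cat /tmp/kraft-combined-logs/meta.properties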

You can now launch the Kafka Server with KRaft with the following command:

$ bin/kafka-server-start.sh config/kraft/server.properties
…
[2024-02-16 12:50:49,056] INFO [BrokerServer id=1] Transition from STARTING to STARTED (kafka.server.BrokerServer)
[2024-02-16 12:50:49,057] INFO Kafka version: 3.6.1 (org.apache.kafka.common.utils.AppInfoParser)
[2024-02-16 12:50:49,057] INFO Kafka commitId: 5e3c2b738d253ff5 (org.apache.kafka.common.utils.AppInfoParser)
[2024-02-16 12:50:49,057] INFO Kafka startTimeMs: 1708077049056 (org.apache.kafka.common.utils.AppInfoParser)
[2024-02-16 12:50:49,061] INFO [KafkaRaftServer nodeId=1] Kafka Server started (kafka.server.KafkaRaftServer)

Closing this window will stop the cluster, so open a new Terminal window, and let’s proceed with creating our first topic.
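
Alternatively, if you don’t want to dedicate a Terminal window to the broker, you can start it in the background and redirect its output to a file (the log file path below is arbitrary); in that case, stop the broker later with bin/kafka-server-stop.sh instead of Ctrl-C:

$ bin/kafka-server-start.sh config/kraft/server.properties > /tmp/kafka-server.log 2>&1 & # log path is an example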

Create a Kafka topic

To read, write, or process events with Kafka, we first need to create a topic where these events will be stored. For that purpose, we will use the kafka-topics.sh script in the bin directory and pass several options to it:

  • --create to create a new topic,
  • --topic with the topic name, 
  • --bootstrap-server to connect the script to the server. Right now, we have only one broker, so we will specify localhost and the default port Kafka listens on, 9092.

The full command looks like this:

$ bin/kafka-topics.sh --create --topic user-requests --bootstrap-server localhost:9092
Created topic user-requests.

You can see detailed information about your topic by running

$ bin/kafka-topics.sh --describe --topic user-requests --bootstrap-server localhost:9092
Topic: user-requests	TopicId: pMx2tM65SzOYUWMD02IjoQ	PartitionCount: 1	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: user-requests	Partition: 0	Leader: 1	Replicas: 1	Isr: 1

As we didn’t specify the number of partitions, the resulting topic has one partition with ID=0 and a leader with ID=1.
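
By the way, if you’d like to experiment with several partitions later, you can set their number explicitly when creating a topic, for example (the topic name here is just for illustration):

$ bin/kafka-topics.sh --create --topic user-requests-multi --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092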

Write and read events to and from the topic

To write events to our topic, we need to start a producer. We can do that by running the kafka-console-producer.sh script in the bin directory and specifying the name of the topic and the bootstrap-server:

$ bin/kafka-console-producer.sh --topic user-requests --bootstrap-server localhost:9092
>

Each line you write will be written to the topic as a separate event. For instance,

>User A left a request
>User B left a request
>User C left a request

After having written several messages, you can stop the producer with Ctrl-C.
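
By default, the console producer sends events without keys. If you want to experiment with keyed events (the key determines which partition an event goes to), the producer script accepts additional --property options, for instance:

$ bin/kafka-console-producer.sh --topic user-requests --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
>userA:User A left a request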

Okay, we have some events, now we want to read them. For that purpose, we need to start a consumer with the kafka-console-consumer.sh script:

$ bin/kafka-console-consumer.sh --topic user-requests --from-beginning --bootstrap-server localhost:9092
User A left a request
User B left a request
User C left a request

Note that if you run the command without the --from-beginning flag, there will be no output to the console: by default, console consumers read only the events written after they start, that is, they begin at the end of the log. To change this behavior, you can adjust the offset with --consumer-property instead of using the --from-beginning flag:

$ bin/kafka-console-consumer.sh --topic user-requests --bootstrap-server localhost:9092 --consumer-property auto.offset.reset=earliest
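
Another option handy for experiments is to read from an explicit position in a specific partition (remember that our topic has a single partition with ID 0), for example, starting from offset 1:

$ bin/kafka-console-consumer.sh --topic user-requests --bootstrap-server localhost:9092 --partition 0 --offset 1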

You can verify that the consumer reads the data in real time. Open a new Terminal window, start the consumer, and in the previous window, start the producer. You will see that the events you send via the producer are immediately read by the consumer.

Adjust the event retention policies

You can store the events for as long as you need: a minute, a day, a month, etc. We didn’t change the default retention policy, so our events will be kept for a week (168 hours), as we’ll see in a moment.

If you run

$ ls -la /tmp/kraft-combined-logs/user-requests-0

you will notice that there’s one .log file. Open it, and you’ll see that our events are saved there.
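
Reading the raw .log file is not very convenient, though, so Kafka also ships the kafka-dump-log.sh tool that prints segment contents in a readable form (the file name below is the default name of the first segment; adjust it if your segment has already rolled):

$ bin/kafka-dump-log.sh --files /tmp/kraft-combined-logs/user-requests-0/00000000000000000000.log --print-data-log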

The fastest way to delete old events would be to set the retention.ms configuration for our topic to, say, one minute. But that works only if you stop the producer and wait for a minute. What if the producer writes events continuously?
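
For reference, such a topic-level override can be set with the kafka-configs.sh script without touching the broker configuration (here, one minute of retention for our user-requests topic):

$ bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name user-requests --add-config retention.ms=60000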

The thing is, the events in a partition are stored in segments. Once a segment fills up, it is closed and a new one is opened. Only closed segments can be deleted. If you open the config/kraft/server.properties file we used to start the broker, you will see the following lines:

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

This means that by default, a segment (a log file) is closed after reaching 1 GB in size (log.segment.bytes), and closed segments older than one week (log.retention.hours) become eligible for deletion. The log.retention.check.interval.ms property means that the segments are checked against the retention policy every 5 minutes.

You can change the settings at the broker level and at the topic level so that you have global settings for all topics and custom settings for specific ones.

Let’s change the settings at the broker level. Stop the broker and, in config/kraft/server.properties, replace log.retention.hours=168 with log.retention.ms=1000 so that segments become eligible for deletion a second after they are closed, and set log.retention.check.interval.ms=10000 so that the log segments are checked every 10 seconds.
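
The relevant lines in config/kraft/server.properties will then look like this (log.segment.bytes stays at its default):

log.retention.ms=1000
log.retention.check.interval.ms=10000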

Restart the broker. Start the consumer with the --from-beginning flag, and the output will be empty (and if you check the .log file in /tmp/kraft-combined-logs/user-requests-0, you’ll see it is empty, too).

Let’s play around. Start the producer in a separate window and write several events, one by one. If you run ls -la /tmp/kraft-combined-logs/user-requests-0 within 10 seconds, you will see additional .log.deleted files: these are our closed segments. But if you run this command more than 10 seconds after you sent your last message, these files will be gone.

You can rerun the consumer and see that the output is indeed empty.

Stop the Kafka cluster

After you’re done with experiments, terminate the consumer and producer with Ctrl-C, and then stop the broker with Ctrl-C.
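
If you also want to wipe the local cluster data (topics, events, and metadata), you can delete the log directory; double-check the log.dirs path in your server.properties before running this:

$ rm -rf /tmp/kraft-combined-logs # removes all local Kafka data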

Conclusion

Congratulations, you’ve set up your first Kafka cluster! But this is just the tip of the iceberg. In the next article, we will learn how to use Apache Kafka with Spring Boot microservices. Subscribe to our newsletter so you don’t miss new guides and tutorials!

 
