JRush Episode 4: Event Streaming with Apache Pulsar

Transcript:

Let me introduce myself. I’m a Senior Developer Advocate at DataStax. Alex has already given an overview, so I won’t repeat that. Previously, I was a Developer Advocate at IBM, focusing on Java and open-source technologies, including WebSphere. I’m based in Chicago and am an active community builder, serving as a Java Champion and President/Executive Board Member of the Chicago Java Users Group. Before this, I worked as a development engineer and technical architect at various companies in the Chicago area.

Let’s move on quickly, as time is limited. This session aims to excite you about event streaming, Apache Pulsar, and how DataStax supports it. Here’s a quick agenda: First, we’ll align on what event streaming is. Even for seasoned developers, it can be confusing at first. I’ll introduce the basic terms, discuss the value of event streaming, and explain how it improves systems.

Next, I’ll introduce Apache Pulsar—what makes it unique, why it’s worth considering, and some of its developer-friendly features. I’ll also cover the DataStax Managed Cloud Platform, which simplifies working with Pulsar. If time permits, I’ll do a quick demo. If not, I’ll share resources where you can watch live demonstrations.

Let’s start with the basics: what is an event? According to Merriam-Webster, an event is “something that happens—an occurrence.” In technical terms, it’s a point in space and time, with three spatial coordinates (X, Y, Z) and a time dimension. Events are immutable—they cannot be erased once they occur. For instance, the birth of a baby or a musical performance happens at a specific time.

Event-driven is one way of describing messaging. There's also message-driven, in which the sender and receiver have to know each other; in other words, the receiver's address has to be known. That's the difference between the two. Either way, these are highly decoupled ways of passing data from one point to another.

Now, let's take a look. An event is not a new thing. With the event approach, an event happens and you process it immediately. The more traditional way is batch processing, which has its place in the computing world: you gather all the data that's there and, when you're ready, send it all in one batch. With events, though, you're able to process each one right away; the advantage is the real-time aspect.

Now let's look at the patterns again. The event-driven way, which streaming also follows, is publish-subscribe, or "pub-sub." It's very much an architectural pattern: the publishing client sends data to a topic, the broker owns the topic, and as messages arrive, the broker delivers them to whoever subscribes to that topic.
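As a toy illustration of the pub-sub fan-out just described (plain Java collections, not the Pulsar API; `MiniTopic` and the inboxes are made up for this sketch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// A toy "broker" illustrating pub-sub fan-out: every subscriber of a topic
// receives every message published to it.
class MiniTopic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    public void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // The broker owns the topic and delivers each arriving message
    // to all current subscribers.
    public void publish(String message) {
        for (Consumer<String> s : subscribers) {
            s.accept(message);
        }
    }

    public static void main(String[] args) {
        MiniTopic topic = new MiniTopic();
        List<String> inbox1 = new ArrayList<>();
        List<String> inbox2 = new ArrayList<>();
        topic.subscribe(inbox1::add);
        topic.subscribe(inbox2::add);
        topic.publish("order-created");
        // Both subscribers received the same event.
        System.out.println(inbox1 + " " + inbox2);
    }
}
```

The point is the delivery semantics: every subscriber of the topic gets every message, unlike a queue, where each message goes to exactly one receiver.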

There's another way: message queuing, which is a form of asynchronous service-to-service communication. The sender sends data to a queue, and the receiver picks up messages from that queue. Once a message is picked up, it's gone from the queue.
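Those queue semantics can be sketched with a plain Java `BlockingQueue` (standard library only, not a real message broker):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A toy illustration of message-queue semantics: once a receiver takes a
// message off the queue, it is gone from the queue.
class MiniQueue {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

        // Sender puts messages onto the queue.
        queue.put("payment-1");
        queue.put("payment-2");

        // Receiver picks up a message; it is removed from the queue.
        String msg = queue.take();
        System.out.println("received " + msg + ", remaining " + queue.size());
    }
}
```

In Pulsar, a shared subscription gives you this queue-style behavior: messages are spread across the consumers on the subscription, and acknowledged messages are not redelivered.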

Event streaming is a step beyond event messaging. What’s driving the change is the need for real-time data to enhance customer experiences and create a competitive advantage. Delivering results to users faster is crucial. Today, with AI and machine learning applications, building data pipelines is essential. Event streaming supports scalability—it enables systems to ingest high volumes of data during peak times and scale back dynamically. Built-in mechanisms handle backpressure to prevent system overload.

Event streaming allows systems to watch for events and act immediately without waiting. Subscribing to specific topics ensures efficiency and low latency in processing high-frequency messages. Comparing traditional batch processing (e.g., ETL processes) to modern event streaming highlights the shift. Batch processing deals with huge data volumes but takes time as it involves disk I/O operations. In contrast, event streaming processes data in memory in real time, transforming it before outputting to a sink.

Now, let’s talk about Apache Pulsar. Pulsar is an open-source software created by Yahoo and later donated to Apache in 2016. It’s designed to be cloud-native, supporting multi-tenancy and separating compute from storage. Pulsar brokers handle message delivery, and Apache BookKeeper manages message storage efficiently. This separation simplifies scalability and ensures message reliability with features like deduplication and retry mechanisms.
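Multi-tenancy is visible right in Pulsar's topic names, which always carry a tenant and a namespace; the concrete tenant and namespace below are just examples:

```
persistent://<tenant>/<namespace>/<topic>
persistent://acme/payments/transactions
```

This lets one cluster isolate teams or customers by tenant, with policies (quotas, retention, permissions) applied per namespace.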

Pulsar also includes serverless "Pulsar Functions" for lightweight, real-time data transformation and processing. It supports offloading less active messages to cost-effective storage options like S3 buckets or Google Cloud Storage, improving sustainability and cost efficiency. Pulsar is compatible with multiple programming languages, including Java, Python, Go, and more.
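To give a feel for the kind of per-message transform a Pulsar Function performs, here is a minimal sketch. A real Pulsar Function would implement `org.apache.pulsar.functions.api.Function` and be deployed to the cluster; this standalone class (`EnrichFunction` is a made-up name) keeps just the message logic so it runs anywhere:

```java
// Sketch of the per-message logic a Pulsar Function would run. In Pulsar,
// this would implement org.apache.pulsar.functions.api.Function<String, String>,
// and the framework would invoke it for each message arriving on the input topic.
class EnrichFunction {
    // Transform one incoming message and return the outgoing one.
    public static String process(String input) {
        return input.trim().toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(process("  order-created "));  // ORDER-CREATED
    }
}
```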

Pulsar’s architecture separates compute and storage, unlike Apache Kafka. This segmentation allows efficient log management, improving scalability. Pulsar also integrates with existing systems like Kafka, JMS, and RabbitMQ through Starlight APIs, ensuring compatibility during migration. It offers features like geo-replication for disaster recovery and Pulsar IO for building efficient data pipelines.

At DataStax, we offer Astra Streaming, a managed Pulsar platform for quick deployment, proof-of-concept projects, and scalability. Open-source Pulsar and enterprise support options are also available. Pulsar has a rich ecosystem of connectors and clients, making it a versatile unifying platform for messaging, streaming, and queuing.

Summary

Find out what event streaming is and how it will influence future software development. Meet Apache Pulsar, an open-source distributed messaging and streaming platform for cloud-native environments. Browse Pulsar's key characteristics and the features that make it stand out from similar solutions.

About Mary

Mary Grygleski, Streaming Developer Advocate at DataStax

Social Media

Videos
Jan 30, 2025
Dockerize Spring Boot Wisely: 6 tips to improve the container images of your Spring Boot apps

Your Spring Boot applications deserve a top-notch package!

Videos
Jan 22, 2025
JEP 483: Ahead-of-Time Class Loading & Linking. Project Leyden in JDK 24

JEP 483 introduces Ahead-of-Time (AOT) Class Loading and Linking in JDK 24, which enhances Java application startup times by loading and linking classes ahead of time and storing them in a reusable AOT cache. This feature, part of Project Leyden, reduces the JVM's workload during startup without requiring changes to application code, though a training run mimicking production is needed to create an efficient cache. Early tests with a Spring Boot app showed significant improvements, cutting startup time from two seconds to just one second.

Further watching

Videos
Feb 25, 2025
PF4J: Plugin Framework for Java. Plugin Systems for Backend

PF4J (Plugin Framework for Java) is a lightweight framework that allows developers to create modular applications using plugins. It enables the integration of custom code into applications through extension points, with support for lifecycle management and Spring integration for dependency injection. PF4J is useful for both desktop and web applications, offering flexibility in scaling and extending functionality without altering the core system.

Videos
Feb 13, 2025
How to Profile Java Applications in Docker Containers with JFR

Learn how to profile Java applications in Docker containers using Java Flight Recorder (JFR), a built-in OpenJDK tool. The video covers three profiling methods: enabling JFR at application startup, attaching to a running container using an ephemeral container with jcmd, and monitoring real-time performance with JDK Mission Control via remote JVM connections.

Videos
Feb 7, 2025
How to Create Dynamic SQL Queries with Spring Boot

Build SQL queries dynamically based on user input with the Specification interface. Use the JPA Static Metamodel Generator to create type-safe queries. Run the tests to check your application.