JRush episode 4: Event streaming with Apache Pulsar

Transcript:

Let me introduce myself. I’m a Senior Developer Advocate at DataStax. Alex has already given an overview, so I won’t repeat that. Previously, I was a Developer Advocate at IBM, focusing on Java and open-source technologies, including WebSphere. I’m based in Chicago and am an active community builder, serving as a Java Champion and President/Executive Board Member of the Chicago Java Users Group. Before this, I worked as a development engineer and technical architect at various companies in the Chicago area.

Let’s move on quickly, as time is limited. This session aims to excite you about event streaming, Apache Pulsar, and how DataStax supports it. Here’s a quick agenda: First, we’ll align on what event streaming is. Even for seasoned developers, it can be confusing at first. I’ll introduce the basic terms, discuss the value of event streaming, and explain how it improves systems.

Next, I’ll introduce Apache Pulsar—what makes it unique, why it’s worth considering, and some of its developer-friendly features. I’ll also cover the DataStax Managed Cloud Platform, which simplifies working with Pulsar. If time permits, I’ll do a quick demo. If not, I’ll share resources where you can watch live demonstrations.

Let’s start with the basics: what is an event? According to Merriam-Webster, an event is “something that happens—an occurrence.” In technical terms, it’s a point in space and time, with three spatial coordinates (X, Y, Z) and a time dimension. Events are immutable—they cannot be erased once they occur. For instance, the birth of a baby or a musical performance happens at a specific time.

Event-driven is one way of describing messaging. There is also message-driven, in which the sender and receiver have to know each other; in other words, the recipient's address has to be known. That is the key difference between the two. Both, though, are highly decoupled ways of passing data from one point to another.

Now, let's take a look. Events are not a new thing. With the event approach, an event happens and you process it immediately. The more traditional way is batch processing, which has its place in the computing world: you gather all the data and, when you're ready, send it off to be processed in one batch. That's why it's called batch processing. With events, as you can see, you're able to process the data right away. The advantage is the real-time aspect.
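To make the contrast concrete, here is a minimal Python sketch. It is not from the talk: the function names and the sample readings are invented for illustration. It shows only the difference in timing, handling each item as it arrives versus collecting everything before processing.

```python
from typing import Callable, Iterable, List


def process_as_events(readings: Iterable[int], handle: Callable[[int], None]) -> None:
    """Event style: handle each reading the moment it arrives."""
    for reading in readings:
        handle(reading)


def process_as_batch(readings: Iterable[int], handle_batch: Callable[[List[int]], None]) -> None:
    """Batch style: gather everything first, then process it in one go."""
    batch = list(readings)  # nothing is processed until the whole input is collected
    handle_batch(batch)


seen_one_by_one: List[int] = []
seen_in_batches: List[List[int]] = []

process_as_events([3, 5, 8], seen_one_by_one.append)   # three separate handler calls
process_as_batch([3, 5, 8], seen_in_batches.append)    # one handler call with all data
```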

Now let's look at the patterns again. The event-driven approach, which streaming also follows, is essentially publish-subscribe, or "pub-sub," which is very much an architectural pattern. Your publishing client sends data to a topic. The broker owns the topic, and when messages arrive there, it delivers them to whoever has subscribed to that topic.
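The pub-sub flow described here can be sketched in a few lines of Python. This is a toy in-memory model, not a real broker: the `Broker` class, the topic names, and the inboxes are all hypothetical. It shows that the publisher addresses only the topic and that every subscriber of that topic gets its own copy of the message.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy in-memory broker: owns topics and fans messages out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # The publisher only names the topic; it never sees who is subscribed.
        for callback in self._subscribers[topic]:
            callback(message)


broker = Broker()
billing_inbox: List[str] = []
audit_inbox: List[str] = []
broker.subscribe("orders", billing_inbox.append)
broker.subscribe("orders", audit_inbox.append)

broker.publish("orders", "order-1 created")        # delivered to both subscribers
broker.publish("payments", "payment-1 received")   # no subscribers, so nothing is delivered
```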

There’s another way: message queuing, which is a form of asynchronous service-to-service communication. Here the sender puts data on a queue, and the receiver picks the messages up from that queue. Once a message has been picked up, it’s gone from the queue.
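For comparison, a queue can be sketched the same way. Again this is an illustrative toy, with a hypothetical `MessageQueue` class, and it ignores acknowledgements and persistence. The point it shows is that a message is consumed exactly once: after a receiver takes it, it is no longer in the queue.

```python
from collections import deque
from typing import Deque, Optional


class MessageQueue:
    """Toy in-memory queue: each message is handed to exactly one receiver."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Taking a message removes it, so no other receiver can see it.
        return self._messages.popleft() if self._messages else None

    def __len__(self) -> int:
        return len(self._messages)


jobs = MessageQueue()
jobs.send("resize image 1")
jobs.send("resize image 2")

first = jobs.receive()       # oldest message comes out first
remaining = len(jobs)        # only the second message is still queued
second = jobs.receive()
nothing_left = jobs.receive()
```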

Event streaming is a step beyond event messaging. What’s driving the change is the need for real-time data to enhance customer experiences and create a competitive advantage. Delivering results to users faster is crucial. Today, with AI and machine learning applications, building data pipelines is essential. Event streaming supports scalability—it enables systems to ingest high volumes of data during peak times and scale back dynamically. Built-in mechanisms handle backpressure to prevent system overload.
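One simple form of backpressure is a bounded buffer that refuses new events when it is full, so a fast producer has to slow down or retry. The sketch below uses Python's standard `queue.Queue` to show the idea. The buffer size and the `try_ingest` helper are made up for illustration, and this is not a description of how Pulsar implements backpressure internally.

```python
import queue

# A deliberately tiny buffer so the effect is easy to see.
buffer: "queue.Queue[str]" = queue.Queue(maxsize=2)


def try_ingest(event: str) -> bool:
    """Accept the event if there is room; otherwise signal the producer to back off."""
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        return False


accepted = [try_ingest(e) for e in ["e1", "e2", "e3"]]  # the third event does not fit

buffer.get_nowait()                        # a consumer drains one event
accepted_after_drain = try_ingest("e3")    # now there is room again
```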

Event streaming allows systems to watch for events and act immediately without waiting. Subscribing to specific topics ensures efficiency and low latency in processing high-frequency messages. Comparing traditional batch processing (e.g., ETL processes) to modern event streaming highlights the shift. Batch processing deals with huge data volumes but takes time as it involves disk I/O operations. In contrast, event streaming processes data in memory in real time, transforming it before outputting to a sink.
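The source, transform, sink shape of a streaming pipeline can be sketched with Python generators, which pass one item at a time through memory instead of materializing the whole data set. The temperature readings and function names below are invented for the example.

```python
from typing import Iterable, Iterator, List


def source() -> Iterator[float]:
    """Stand-in for an event source: yields Celsius readings one at a time."""
    for reading in [0.0, 100.0, 25.0]:
        yield reading


def to_fahrenheit(celsius_readings: Iterable[float]) -> Iterator[float]:
    """Transform step: converts each reading as it flows through."""
    for celsius in celsius_readings:
        yield celsius * 9 / 5 + 32


sink: List[float] = []  # stand-in for a database, topic, or file

# Each reading is pulled from the source, transformed, and written before the next one is read.
for fahrenheit in to_fahrenheit(source()):
    sink.append(fahrenheit)
```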

Now, let’s talk about Apache Pulsar. Pulsar is open-source software that was created at Yahoo, open-sourced in 2016, and later donated to the Apache Software Foundation. It’s designed to be cloud-native, supporting multi-tenancy and separating compute from storage. Pulsar brokers handle message delivery, and Apache BookKeeper manages message storage efficiently. This separation simplifies scalability and ensures message reliability with features like deduplication and retry mechanisms.
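The interplay of retries and deduplication can be pictured as follows. This is a toy sketch, not Pulsar's code: the `DedupLog` class and its methods are hypothetical, and it assumes each producer stamps its messages with an increasing sequence ID so that a retried send can be recognized and dropped.

```python
from typing import Dict, List


class DedupLog:
    """Toy message log that drops retried sends it has already stored."""

    def __init__(self) -> None:
        self._last_sequence_id: Dict[str, int] = {}  # highest ID stored per producer
        self.stored: List[str] = []

    def append(self, producer: str, sequence_id: int, payload: str) -> bool:
        # A sequence ID at or below the last stored one means this is a duplicate.
        if sequence_id <= self._last_sequence_id.get(producer, -1):
            return False
        self._last_sequence_id[producer] = sequence_id
        self.stored.append(payload)
        return True


log = DedupLog()
first_send = log.append("producer-a", 0, "order created")
second_send = log.append("producer-a", 1, "order paid")
retry_send = log.append("producer-a", 1, "order paid")   # producer retried after a timeout
```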

Pulsar also includes serverless "Pulsar Functions" for lightweight, real-time data transformation and processing. It supports offloading less active messages to cost-effective storage options like S3 buckets or Google Cloud Storage, improving sustainability and cost efficiency. Pulsar is compatible with multiple programming languages, including Java, Python, Go, and more.
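As a minimal sketch of what such a function can look like: Pulsar's documentation describes a language-native Python interface in which a function is simply a module-level `process` function that receives one message and returns the result. The transformation below (trimming and upper-casing text) is an invented example, and the topic wiring and deployment steps are left out.

```python
def process(input):
    """Called once per incoming message; the return value is the outgoing message.

    The parameter name `input` follows the example in the Pulsar Functions docs,
    even though it shadows Python's built-in of the same name.
    """
    return input.strip().upper()


# Local check without a Pulsar cluster: call the function directly.
result = process("  sensor online ")
```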

Pulsar’s architecture separates compute and storage, unlike Apache Kafka. This segmentation allows efficient log management, improving scalability. Pulsar also integrates with existing systems like Kafka, JMS, and RabbitMQ through Starlight APIs, ensuring compatibility during migration. It offers features like geo-replication for disaster recovery and Pulsar IO for building efficient data pipelines.

At DataStax, we offer Astra Streaming, a managed Pulsar platform for quick deployment, proof-of-concept projects, and scalability. Open-source Pulsar and enterprise support options are also available. Pulsar has a rich ecosystem of connectors and clients, making it a versatile unifying platform for messaging, streaming, and queuing.

Summary

Find out what event streaming is and how it will influence future software development. Meet Apache Pulsar, an open-source distributed messaging and streaming platform for cloud-native environments. Browse Pulsar’s key characteristics and the features that make it stand out from similar solutions.

About Mary

Mary Grygleski, Streaming Developer Advocate at DataStax
