Build Typed AI Agents in Java with Embabel

 

Transcript:

Embabel is an agent framework on the JVM that mixes LLMs and domain models. It enables you to integrate sophisticated agentic flows into your application, guarded by strong typing and your own code. Let’s introduce it into an existing application and see it in action.

Embabel lets you build AI workflows on the JVM using the concepts of goals, actions, and conditions. The highest level is the goal: the result the agent is trying to achieve on behalf of the user. To achieve this goal, the agent uses actions, defined in methods annotated with the Action annotation. Some actions use an LLM, while others are deterministic steps like database queries, validation, or calculations, written in plain Java or Kotlin.

An agent plans the workflow to achieve the goal, and it does so independently of the developer: a plan is formulated dynamically and reassessed after each action. There are also conditions that the agent observes during planning. This architecture is what makes Embabel a great choice for enterprise JVM applications that want to benefit from agentic AI.

So what do you get, specifically? Sophisticated planning: Embabel adds a real planning step using a non-LLM algorithm. Extensibility: you can add new domain models or actions without rewriting existing code. Strong typing and object orientation: prompts and code interact through domain objects. LLM mixing: you can combine various models for cost, privacy, or performance trade-offs. In addition, Embabel is designed for Spring and the JVM, which means it benefits from existing enterprise features and is testable end-to-end.

But what’s the difference between Embabel and Spring AI? With plain Spring AI, you orchestrate the loop yourself: you write a prompt, call a tool, then write another prompt, call another tool, and so on. With Embabel, you declare capabilities (actions), and Embabel infers the plan based on input and output types, replanning as it goes. You don’t have to babysit it as it plans actions toward the goal, and the guardrails in your own code help prevent LLM hallucinations.

Embabel integrates easily into existing applications, and that’s what we are going to do in our demo. We will take an existing app: it is cyberpunk-themed, it stores civilians and their implants, and it collects monitoring logs from the implants and provides stats on them. What we want to do is introduce an incident triage workflow: for instance, somebody reports unusual telemetry at some location and time window, and the system turns that into a full incident case with a risk assessment, affected implants, blast radius, and a containment plan.

The code for the demo is available on GitHub; the link is in the description. You can pull the project right now to follow along, or come back later for copy-and-paste code examples.

Okay, let’s first add the dependencies. You can use Embabel with different LLMs or even mix them. We will use it with Ollama, so we need the Embabel agent starter for Ollama. We are also going to run this application in a shell, so we need the Embabel shell starter along with the Spring Shell starter. (Alternatively, you can use the MCP starter or the basic starter.) Since we run the app via the shell, we need to disable Spring’s web application mode and enable Spring Shell’s interactive mode. You can also configure Embabel in the properties file, for instance, to set the default model it will use.
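Concretely, the properties changes described above might look something like this. The two spring.* keys are standard Spring Boot and Spring Shell properties; the embabel.* key and the model name are illustrative assumptions, so check the Embabel documentation for the exact key in your version:

```properties
# application.properties

# Run as an interactive shell application, not a web server
spring.main.web-application-type=none
spring.shell.interactive.enabled=true

# Default model for Embabel (property key and model name are illustrative;
# verify against the Embabel docs for your version)
embabel.models.default-llm=llama3.1
```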

Let’s start with something basic. We won’t showcase the framework’s full power right away; we’re just getting to know it, and we will enhance this logic later on. For simplicity’s sake, we will add a single action at first, and this action will also achieve the end goal. The agent receives the user input as a string and outputs an IncidentAssessment object.

For that, it will parse the user input into an IncidentSignal object, find all affected implants in the database, and calculate the risk level. The only LLM-related task here is parsing the input; the database lookup and risk assessment are written deterministically in code. The agent’s output is also a Java object, which makes its response more reliable.

Let’s create a class called IncidentTriageAgent. The Agent annotation at the class level marks this class as an Embabel agent component; in practice, it is also a Spring bean. The description is metadata used for discovery, documentation, and potentially for planning or agent selection when multiple agents exist. So Embabel knows that this agent can investigate telemetry anomalies.

Then we inject ImplantMonitoringLogService. It’s a tool here, but not an LLM tool: it’s a domain service that queries MongoDB. Embabel is perfectly happy with your action calling a normal service. As I said, we’ll keep it simple at first, so the action is a single method that achieves the goal.

Two annotations are important here: the Action annotation marks the method as an executable step in Embabel’s world, and the AchievesGoal annotation marks it as a goal-completing action. Once the agent produces the IncidentAssessment, the described goal is done.

The input parameters are the user input (the raw user message from the command line, a chat, or anywhere else) and the OperationContext, Embabel’s runtime context through which you access AI features. The method returns an IncidentAssessment object, the structured output of the action.

In the method, we first use an LLM to parse the input into an IncidentSignal object. We send the prompt and user message to the default LLM configured for the application and ask the model to output something that can be deserialized into the IncidentSignal class. This is important: instead of an unstructured blob of text, you get a typed object with fields like longitude, latitude, the LocalDateTime from and to, and so on. We are basically telling the LLM to behave like a parser.
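A sketch of this single-action agent, assuming Embabel’s annotation API (imports are omitted, and package, class, and method names may differ across versions; findLogs and classifyRisk are hypothetical helpers standing in for the demo’s domain code). This is not runnable without the Embabel and Spring dependencies on the classpath:

```java
// Sketch only: assumes Embabel's @Agent/@Action/@AchievesGoal annotations,
// UserInput, and OperationContext; verify names against your Embabel version.
@Agent(description = "Investigates telemetry anomalies reported by users")
public class IncidentTriageAgent {

    private final ImplantMonitoringLogService logService;

    public IncidentTriageAgent(ImplantMonitoringLogService logService) {
        this.logService = logService;
    }

    @Action
    @AchievesGoal(description = "Produce an incident assessment for a reported anomaly")
    public IncidentAssessment triageIncident(UserInput userInput, OperationContext context) {
        // 1. LLM step: behave like a parser and return a typed IncidentSignal
        IncidentSignal signal = context.ai()
                .withDefaultLlm()
                .createObject("Parse this report into an incident signal: "
                        + userInput.getContent(), IncidentSignal.class);

        // 2. Deterministic step: query MongoDB through the domain service
        var logs = logService.findLogs(signal);     // hypothetical helper

        // 3. Deterministic step: classify risk in plain Java
        RiskLevel risk = classifyRisk(logs);        // hypothetical helper

        return new IncidentAssessment(signal, logs.size(), risk);
    }
}
```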

Then we query the MongoDB logs through our domain service and classify the risk, both deterministically. After that, we build the IncidentAssessment object from the retrieved data and return it to the user. This is the output of our command.

But this is too easy. Right now we are using Embabel as a fancy JSON parser, which is total overkill. To truly see its power, we need goal-driven planning across multiple types, tool boundaries, and adaptive replanning. For that, we will turn our current one-step triage into a multi-action incident workflow, where the agent chooses what to do next based on the data it received and what it learns after each action.

First, let’s enrich our incident domain model with some new classes. We have three new enums: HypothesisType, RiskLevel, and StepType. Then we have the IncidentSignal record, which holds longitude, latitude, radius, a time window as LocalDateTime values, the metric we are interested in (where we saw anomalies), and the threshold for that metric.

Then we have the IncidentAssessment class, which consists of the IncidentSignal, the number of logs, and the risk level. Then we have AffectedImplant with serial number, lot number, model, civilian national ID, and anomaly score. We also have RootCauseHypothesis with hypothesis type, confidence, and a short list of evidence. We have EstimatedBlastRadius with number of affected implants, up to five lots, up to five models, a geo summary string, and a time summary string.

Then there is the ContainmentPlan with a list of containment steps, a boolean for whether the plan requires approval, and the estimated blast radius. The containment step is just a string for simplicity and for the demo. And then we have the final class, IncidentCase. It has the ID and the creation date. It also includes the IncidentSignal, IncidentAssessment, the list of affected implants, RootCauseHypothesis, and ContainmentPlan.
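As a sketch, the domain model described above might look like this in Java records. The field names follow the transcript, but the enum constants and exact field types are assumptions; StepType is omitted because, in the demo, a containment step is just a string. See the demo repo for the real definitions:

```java
import java.time.LocalDateTime;
import java.util.List;

class IncidentDomain {
    enum RiskLevel { LOW, MEDIUM, HIGH, CRITICAL }
    // Constants are illustrative assumptions
    enum HypothesisType { SENSOR_DRIFT, FIRMWARE_DEFECT, MANUFACTURING_LOT, ATTACK_PATTERN }

    /** Typed output of the LLM parsing step: location, time window, metric, threshold. */
    record IncidentSignal(double latitude, double longitude, double radiusKm,
                          LocalDateTime from, LocalDateTime to,
                          String metric, double threshold) {}

    /** Deterministic triage result: the signal plus log count and computed risk. */
    record IncidentAssessment(IncidentSignal signal, long logCount, RiskLevel risk) {}

    record AffectedImplant(String serialNumber, String lotNumber, String model,
                           String civilianNationalId, double anomalyScore) {}

    /** Bounded LLM output: an enum type, a confidence, and a short evidence list. */
    record RootCauseHypothesis(HypothesisType type, double confidence, List<String> evidence) {}

    record EstimatedBlastRadius(int affectedImplants, List<String> lots, List<String> models,
                                String geoSummary, String timeSummary) {}

    /** Steps are plain strings for demo simplicity. */
    record ContainmentPlan(List<String> steps, boolean requiresApproval,
                           EstimatedBlastRadius blastRadius) {}
}
```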

Now let’s turn our agent into a pipeline that takes the user report and turns it into a full incident case. First we need to split our initial method into several actions, and then add a few more.

The first method is parseIncidentSignal. This is an action and the entry point. It takes the raw text from the user and calls the default LLM to parse the input into an IncidentSignal. In the prompt we specify several rules, and they’re very important because they constrain the output: latitude and longitude must be within valid ranges, times use the LocalDateTime format, the metric must be one of the allowed values, and the threshold must be a real number. As a result, we go from unstructured text to a typed IncidentSignal object.

The second step is to pull the logs and compute the risk level. The triageIncident method takes the IncidentSignal we received and calls extractLogs, which queries the MongoDB logs by location, radius, and timestamps; the result comes back as a map whose key is the implant serial number and whose value is the list of logs for that implant. Then we classify risk with classifyRisk. This part is pure Java, no LLM: we count how many implants are involved and how many log entries exceeded the threshold for the chosen metric, then apply deterministic thresholds for critical, high, medium, and low. Finally, we return an IncidentAssessment object.
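The deterministic classification step might be sketched like this. The exact cut-off values are not given in the transcript, so the numbers below are illustrative assumptions:

```java
class RiskClassifier {
    enum RiskLevel { LOW, MEDIUM, HIGH, CRITICAL }

    /**
     * Pure Java, no LLM: classify risk from the number of implants seen in the
     * window and the number of log entries that exceeded the metric threshold.
     * Cut-off values are illustrative, not the demo's real numbers.
     */
    static RiskLevel classify(int implantCount, long entriesOverThreshold) {
        if (implantCount >= 50 || entriesOverThreshold >= 500) return RiskLevel.CRITICAL;
        if (implantCount >= 20 || entriesOverThreshold >= 100) return RiskLevel.HIGH;
        if (implantCount >= 5  || entriesOverThreshold >= 10)  return RiskLevel.MEDIUM;
        return RiskLevel.LOW;
    }
}
```

Because this rule is deterministic code rather than an LLM call, the same inputs always yield the same risk level, which is exactly the guardrail the transcript emphasizes.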

The third action finds which implants were affected and ranks them. findAffectedImplants also calls extractLogs, then converts each map entry into an AffectedImplant. For each implant we compute an anomaly score and enrich the record with domain data: lot number, model, and civilian ID. Then we sort the implants by anomaly score in descending order. Now we know who is affected and where the situation is worst.
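The scoring-and-ranking step can be sketched as follows. The scoring rule here (share of readings above the threshold) is an illustrative assumption, and the domain enrichment is left out for brevity:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AffectedImplantRanker {
    record AffectedImplant(String serial, double anomalyScore) {}

    /**
     * Score each implant by the fraction of its readings above the threshold
     * (illustrative rule; the demo's exact formula may differ), then sort
     * worst-first so the most anomalous implants come at the top.
     */
    static List<AffectedImplant> rank(Map<String, List<Double>> logsBySerial, double threshold) {
        return logsBySerial.entrySet().stream()
                .map(e -> new AffectedImplant(e.getKey(),
                        e.getValue().stream().filter(v -> v > threshold).count()
                                / (double) e.getValue().size()))
                .sorted(Comparator.comparingDouble(AffectedImplant::anomalyScore).reversed())
                .collect(Collectors.toList());
    }
}
```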

The next step is to generate a root cause hypothesis. Here we use an LLM again, this time for reasoning. The inputs are the IncidentSignal, the risk assessment, and the top 10 affected implants. The LLM must return a typed RootCauseHypothesis with an enum type, a confidence value, and a short, intentionally bounded evidence list. We don’t want it to write an essay; it must pick a structured hypothesis.

Then, in the next action, we generate a containment plan. Here we also use an LLM, but with constraints. First we decide whether the plan requires approval: it does if the risk is high or critical, or if the hypothesis looks like an attack pattern. Then we estimate the blast radius in the estimateRadius method, which produces the number of affected implants, up to five lots, up to five models, a geo summary string, and a time summary string. Finally, we call the LLM again and force it into a structured ContainmentPlan: it must provide four to eight short, imperative steps. As a result, we get a usable plan.
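The approval decision described above is another deterministic guardrail, and small enough to show whole. The enum constants are illustrative assumptions matching the earlier sketches:

```java
class ApprovalPolicy {
    enum RiskLevel { LOW, MEDIUM, HIGH, CRITICAL }
    enum HypothesisType { SENSOR_DRIFT, FIRMWARE_DEFECT, MANUFACTURING_LOT, ATTACK_PATTERN }

    /**
     * Escalate to a human when the risk is HIGH or CRITICAL, or when the
     * root cause hypothesis looks like an attack. The LLM never decides this;
     * it only receives the result as a constraint on the plan it generates.
     */
    static boolean requiresApproval(RiskLevel risk, HypothesisType hypothesis) {
        return risk == RiskLevel.HIGH
                || risk == RiskLevel.CRITICAL
                || hypothesis == HypothesisType.ATTACK_PATTERN;
    }
}
```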

Finally, the last action assembles the incident case, and that is also our goal. The buildIncidentCase method is marked with AchievesGoal and produces the final output we are looking for: the IncidentCase object. It bundles everything we learned: ID, timestamp, the original signal, the assessment, affected implants, hypothesis, and the containment plan. As a result, one user message becomes a complete incident ticket.

The LLM is used for parsing, the hypothesis, and planning the containment; Java handles everything else: querying, scoring, and rules. And that’s the whole point: the agent is smart but not uncontrolled.

We can see here that the program has generated an incident case for us. First comes the parsed incident signal, then the incident assessment with the vital information. Next is the list of affected implants with serial number, lot number, model, civilian national ID, and anomaly score. Then comes the hypothesis with its evidence, generated by the LLM from the data it received and the values we calculated, followed by the containment plan with steps, also generated by the LLM from all available information. And finally, we have the estimated blast radius with the number of affected implants, the geo summary, and the time summary.

In the end, we also get information for the developer: the LLM that was used, the number of tokens, and the total cost. I encourage you to try out Embabel in your own project and see how it can add some AI spice to your application without the hype. Don’t forget to like this video and subscribe to our channel, and until next time!

Summary

In this video, we introduce Embabel, a JVM-based agent framework that combines LLMs with strongly typed domain models and deterministic business logic. We integrate it into a cyberpunk-themed application to implement a multi-step incident triage workflow driven by goals and actions. The agent uses LLMs for structured parsing, hypothesis generation, and containment planning, while Java handles querying, scoring, and rule-based decisions. As a result, one user message is transformed into a fully structured incident case with assessment, affected implants, root cause hypothesis, and containment plan.

About Catherine

Java developer passionate about Spring Boot. Writer. Developer Advocate at BellSoft
