Build RAG System with Spring AI: No More AI Lies

Transcript:

Ever asked ChatGPT something and it confidently made stuff up? Yeah, it happened a lot to me recently. I bet it happened to you too. The thing is, it's not the only problem. I have to know a lot about our products. But sometimes it's almost impossible to find information because we have so much documentation, so many blog posts, and so many YouTube videos. It's hardly possible to process them all.

Hi, my name is Pasha and I'm a developer advocate for BellSoft, and today we will fight LLM hallucinations in your product. But first, let's talk about several fundamentals.

First, vectors. This is the base for all the LLM magic. A vector is just a set of numbers. You can think of it as an array of doubles, for example. Or take a simple metaphor: imagine you need to distinguish New York from Tokyo. They have coordinates. In a way, these coordinates are vectors. For New York and Tokyo, there are only two dimensions: longitude and latitude. LLMs operate on multi-dimensional vectors that are much longer, often more than a thousand dimensions. OpenAI's text-embedding-3-small model, for example, produces 1,536.

The second thing you should know about is a vector database. What's a vector database? Essentially, it's specialized storage suited specifically to handling vectors. It can be a regular database with vector support, for example Postgres with the PGVector extension. There are also dedicated vector databases, for example Milvus or Chroma. And of course, Oracle has support for vectors too.

What does it mean to handle vectors? It means that this database can perform operations on vectors very quickly. One of the most important operations on vectors is called finding distance. One of the most popular ways to find distance between vectors is cosine distance.

Now, here is a small demonstration of what cosine distance is. Cosine similarity is the dot product of two vectors divided by the product of their magnitudes, which is the cosine of the angle between them. Cosine distance is one minus that value. Now imagine we have three objects: rose, sunflower, and sun. Sunflower is visually closer to rose than to sun. But how does it work for vectors? As you can see, there are three vectors and they have angles between them. The angle between rose and sunflower is smaller than the angle between sunflower and sun, so the cosine distance is also smaller. A smaller cosine distance means more similar meaning.
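To make the rose, sunflower, and sun picture concrete, here is a minimal sketch in plain Java. Cosine similarity is the dot product divided by the product of the magnitudes, and cosine distance is one minus that. The two-dimensional vectors are made up for illustration; real embeddings come from a model and are much longer.

```java
final class CosineDemo {

    // Cosine similarity: dot product divided by the product of the magnitudes.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cosine distance: 0 for the same direction, 1 for perpendicular vectors.
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    public static void main(String[] args) {
        // Hypothetical 2D "embeddings", chosen by hand for this demo.
        double[] rose = {1.0, 0.2};
        double[] sunflower = {0.8, 0.6};
        double[] sun = {0.1, 1.0};

        System.out.println("rose vs sunflower: " + cosineDistance(rose, sunflower));
        System.out.println("sunflower vs sun:  " + cosineDistance(sunflower, sun));
    }
}
```

With these hand-picked numbers, the distance between rose and sunflower comes out smaller than the distance between sunflower and sun, which matches the picture with the angles.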

Now you are probably asking me, Pasha, why are you telling us all of this? Why do we care about vectors? Why do we care about cosine distance? Why do we care about databases? The answer is that this is part of the solution to our problem. This is part of the solution to LLM hallucinations.

If you think about it, browsing through our documents is almost impossible for us, but it's very simple for machines. There is a very simple approach to solve this problem. It's called RAG, or retrieval-augmented generation.

The workflow looks like this. We take a huge document, or small documents if we have many of them. Documents are usually text, but they might also be images and so on. Then we split every document into chunks. We take every chunk and convert it into a vector. How do we do it? There are special neural networks that can do it for us. For example, OpenAI has one, and there are many others.
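The splitting step could be sketched like this in plain Java. This is only an illustration: the character-based window and the overlap between neighbouring chunks are my assumptions, and the names `Chunker`, `maxChars`, and `overlap` are hypothetical. Real splitters usually work on tokens, sentences, or paragraphs.

```java
import java.util.ArrayList;
import java.util.List;

final class Chunker {

    // Splits text into chunks of at most maxChars characters.
    // Neighbouring chunks share `overlap` characters so that a sentence
    // cut at a boundary still appears whole in one of them.
    static List<String> split(String text, int maxChars, int overlap) {
        if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
            throw new IllegalArgumentException("Need maxChars > 0 and 0 <= overlap < maxChars");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChars - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4, 1));
    }
}
```

Each chunk returned here is what would be sent to the embedding model and turned into one vector.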

When we have a vector, we save this vector to our vector database along with the source text, because a vector is just a set of numbers. It's impossible to decode a vector back into text without knowing what was encoded. We have to preserve the source.

Then when I want to find something in my documents, I don't have to read through them anymore. What I need is to ask a question. The embedding model gives me a vector for my question. Then I find the closest documents to my question in my vector database. Then I put these documents into the context, and I ask the LLM: given all the information I just provided, please answer my question.
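To show what "save the vector along with the source text" and "find the closest documents" mean, here is a toy in-memory store in plain Java. It is a sketch, not how a real vector database works: it scans every entry, while real databases use indexes. The class name `InMemoryVectorStore` and the hand-written vectors are hypothetical; in a real system the vectors come from an embedding model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class InMemoryVectorStore {

    // The vector is stored together with the source text,
    // because the text cannot be decoded back from the numbers.
    record Entry(String text, double[] vector) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String text, double[] vector) {
        entries.add(new Entry(text, vector.clone()));
    }

    // Returns the texts of the topK entries closest to the query vector.
    List<String> search(double[] query, int topK) {
        return entries.stream()
                .sorted(Comparator.comparingDouble((Entry e) -> cosineDistance(query, e.vector())))
                .limit(topK)
                .map(Entry::text)
                .toList();
    }

    // One minus the cosine of the angle between the vectors.
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        InMemoryVectorStore store = new InMemoryVectorStore();
        store.add("rose", new double[]{1.0, 0.2});
        store.add("sunflower", new double[]{0.8, 0.6});
        store.add("sun", new double[]{0.1, 1.0});

        // Pretend this is the vector of the user's question.
        System.out.println(store.search(new double[]{0.0, 1.0}, 2));
    }
}
```

The texts returned by `search` are what goes into the LLM context together with the question.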

Now the LLM has much less reason to hallucinate. It can give the exact answer if it is in the documents. If no documents match, we can say so. If it is not sure, it can answer: “I'm not sure. I don't have enough information.”

Now let's talk about how to implement this whole system in code. Obviously, we'll use Spring.

The first part is uploading our documents to a vector database. As a reminder, we split documents into chunks, convert every chunk into a vector, and save it to the database. Here's how we do it in Spring.

The simplest way is to have one endpoint. In my case, it's a `@PostMapping` with a path, and we send a multipart file. Then I use a document reader, which is a part of the Spring AI suite, to convert the file into a list of documents. In this case, every document is a chunk that will be converted into a vector. Then I call `vectorStore.add(docs)`.

All the magic is hidden. Spring sends every document to the neural network to convert it into a vector, and then behind the scenes it stores the vector together with the source text in the vector store. Then we respond that everything is done.

To make our bot answer user questions, we add another endpoint, for example a `@GetMapping`, and pass the question text. Then we build a search request with `topK` set to three. This part means: please find the top three documents that are closest to our query. This uses cosine distance.

We execute the query on the vector store, and everything is abstracted. It doesn't matter whether it's PGVector, Milvus, or Chroma. We get the documents and convert each document into a specific format. In my case, it's the document title, the document text, and the file name from metadata. I split the documents with a delimiter to make sure the LLM does not mix them up.
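The formatting step might look roughly like this in plain Java. The `RetrievedDoc` record, the field labels, and the exact delimiter string are my assumptions for illustration; the point is only that each document gets its title, file name, and text, and that a delimiter keeps documents apart.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ContextFormatter {

    // Hypothetical shape of one search result: title, chunk text, source file name.
    record RetrievedDoc(String title, String text, String fileName) {}

    // A delimiter the LLM is told about, so it does not mix documents up.
    static final String DELIMITER = "\n-----\n";

    static String format(List<RetrievedDoc> docs) {
        return docs.stream()
                .map(d -> "Title: " + d.title()
                        + "\nFile: " + d.fileName()
                        + "\nText: " + d.text())
                .collect(Collectors.joining(DELIMITER));
    }

    public static void main(String[] args) {
        List<RetrievedDoc> docs = List.of(
                new RetrievedDoc("A", "alpha", "a.pdf"),
                new RetrievedDoc("B", "beta", "b.pdf"));
        System.out.println(format(docs));
    }
}
```

Keeping the file name in the formatted text is what later allows the answer to reference the document by name.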

Now I provide system instructions. It does not have to be this complex, but this is what I use: “You are a helpful assistant. You find information in the data provided below. The format of the data is the title and the text of the document or its part. Documents are separated by a delimiter. If you reference a document, reference it by name.”

Referencing by name lets us check that the LLM is not hallucinating: we can find the document by name in storage and read it fully if needed. The instructions continue: “Always reference the document where you found the information. If there is no information, answer: ‘Sorry, I don't have such information.’”

Then I insert the documents into the prompt. I add the user's question, create a prompt, and call `chatClient.prompt(...).call()`. Its content is a string that I send back to the user.
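Putting the prompt together can be sketched in plain Java as simple string assembly. The instruction text is the one read out above; the `DOCUMENTS:` and `QUESTION:` labels and the `PromptBuilder` class are my assumptions, not the layout used in the video. The resulting text is what would be handed to the chat client.

```java
final class PromptBuilder {

    // System instructions as described in the talk.
    static final String SYSTEM_INSTRUCTIONS = """
            You are a helpful assistant. You find information in the data provided below.
            The format of the data is the title and the text of the document or its part.
            Documents are separated by a delimiter.
            If you reference a document, reference it by name.
            Always reference the document where you found the information.
            If there is no information, answer: "Sorry, I don't have such information."
            """;

    // Combines instructions, retrieved documents, and the user's question.
    // The section labels are an assumed layout for this sketch.
    static String build(String context, String question) {
        return SYSTEM_INSTRUCTIONS
                + "\nDOCUMENTS:\n" + context
                + "\n\nQUESTION:\n" + question;
    }

    public static void main(String[] args) {
        System.out.println(build("Title: A\nFile: a.pdf\nText: alpha", "What is alpha?"));
    }
}
```

Keeping the instructions, the documents, and the question in clearly separated sections makes it easier to see later what the model was actually given when an answer looks wrong.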

This is the easiest and simplest possible RAG workflow. It does not support chatting or advanced features. It is intentionally simple. But now you have all the basic information to know where to dig.

This is Spring AI. It will not magically make your application smarter than you, but it will make your data useful and help save you from hallucinations.

If you like the video, like it, subscribe to the channel, leave a comment, and who knows, maybe I'll create the next one. Thank you so much. Pash out.
