Spring Data MongoDB: From Repositories to Aggregations
Transcript:
Let’s see how we can set up MongoDB with Spring Boot and use its various features, from indexing to aggregations. The demo I will use is available on GitHub; the link is in the description. An article with copy-and-paste code snippets is also linked in the description.
MongoDB is a document database. Instead of tables, you store data in JSON-like documents. It’s great for scalability and flexible, data-heavy apps like operational intelligence, real-time analytics, IoT, and e-commerce. But we don’t want to leave the Spring programming model, right? So we use Spring Data MongoDB. It combines MongoDB features with Spring-style repositories and auto-configuration.
In this guide, we build NeuroWatch. It’s a cyberpunk-themed application (because yes, I’m that old) that tracks civilians implanted with cyberware and stores implant monitoring logs. It also aggregates health indicators over time.
For this tutorial, you will need Spring Boot 4, or at least Spring Boot 3, and JDK 25 (or at least 17, which is the baseline for Spring Boot 3). I’m using Liberica JDK, the distribution recommended by Spring. You will also need an IDE (I’m using IntelliJ IDEA), plus Docker and Docker Compose.
Let’s create a new Spring Boot project with Spring Initializr. We will need dependencies on Docker Compose, Spring Data MongoDB, and Testcontainers. Add the EnableMongoRepositories annotation to your main application class.
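A minimal main class might look like this. The package and class names are my own, and note that with Spring Boot’s auto-configuration the EnableMongoRepositories annotation is usually optional; adding it just makes the setup explicit.

```java
package com.example.neurowatch;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.mongodb.repository.config.EnableMongoRepositories;

@SpringBootApplication
@EnableMongoRepositories // explicit here; Boot's auto-configuration typically enables repositories anyway
public class NeuroWatchApplication {

    public static void main(String[] args) {
        SpringApplication.run(NeuroWatchApplication.class, args);
    }
}
```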
Now we need MongoDB running. We are going to spin up a container via Docker Compose. You can see the Docker Compose file on the screen. I’m not going to describe it in detail, but you can clone the repo and it’s there. In the application properties, we configure the Spring Data MongoDB connection: host, port, and database. Also, we enable automatic index creation. Then you simply run docker compose up -d, and our database is up, running, and ready to receive our data.
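The relevant properties might look roughly like this. The values are illustrative (the real file is in the repo), and note that spring.data.mongodb.uri is a single-string alternative to setting host and port separately.

```properties
spring.data.mongodb.host=localhost
spring.data.mongodb.port=27017
spring.data.mongodb.database=neurowatch
# without this, index annotations like @Indexed are not applied on startup
spring.data.mongodb.auto-index-creation=true
```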
In SQL, you design a schema first, then map relationships with foreign keys and joins. MongoDB stores entities in JSON-like documents. You can nest arrays and subdocuments in the documents without first creating a rigid schema. Accompanying information is stored together in one document and can be indexed for rapid access. There’s no need to predetermine the database schema; MongoDB derives it from the documents you store. And if you want versioned database migrations in the spirit of Flyway or Liquibase, you can get them for MongoDB with Mongock.
In NeuroWatch, we use two collections: one for civilians and another for implant monitoring logs. Implants themselves are stored within the civilian document, so we embed the implant class, but it’s not a separate collection.
To define a class as a MongoDB document, you annotate it with the Document annotation. It’s also essential to add the Id annotation for the primary key; in MongoDB, the ID should be of type String, ObjectId, or UUID. If you want a custom field name, you can use the Field annotation.
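A sketch of the civilian document, with field names that are my own guesses rather than necessarily those from the repo:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.core.mapping.Field;

import java.util.List;

@Document("civilians") // collection name; defaults to the decapitalized class name if omitted
public class Civilian {

    @Id
    private String id;

    @Field("legal_name") // custom field name in the stored document
    private String legalName;

    private String nationalId;

    // Implant is a plain class embedded in this document, not a separate collection
    private List<Implant> implants;

    // getters and setters omitted
}
```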
Now, as I said, the implant class is not a MongoDB document on its own; it’s embedded into the civilian document. But of course, you can embed documents in other documents. For instance, if you wanted to make the implant a separate collection for some reason, you would annotate it with Document and keep a list of implants in the civilian document. As usual, there’s another approach: instead of embedding the document, you can reference it with the DBRef annotation. Such references are eagerly resolved by default. Alternatively, you can use the DocumentReference annotation instead of DBRef; it lets you define more flexibly how the referenced documents are looked up and loaded.
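Both referencing styles can be sketched like this; the class and field names are hypothetical, since the demo itself embeds implants rather than referencing them:

```java
import org.springframework.data.mongodb.core.mapping.DBRef;
import org.springframework.data.mongodb.core.mapping.DocumentReference;

import java.util.List;

// hypothetical variant where Implant is its own @Document collection
public class CivilianWithReferences {

    // eagerly resolved by default; @DBRef(lazy = true) defers loading until first access
    @DBRef
    private List<Implant> implants;

    // alternative: stores only the referenced ids and gives finer control
    // over how the referenced documents are looked up
    @DocumentReference(lazy = true)
    private List<Implant> backupImplants;
}
```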
But be careful with embedding documents or referencing them. Updating deeply nested documents requires rewriting the whole document. Plus, eagerly fetched documents with DBRef can affect the application performance. On the other hand, lazy loading with DBRef can complicate debugging.
Okay, now let’s look at some more advanced annotations, starting with indexing. MongoDB indexes are data structures that speed up querying; they work like a book’s index. In NeuroWatch, indexes are created for frequently queried fields, such as the civilian’s national ID. MongoDB supports compound, hashed, geospatial, and text indexes. Compound indexes cover multiple fields and are defined at the class level. For example, on the implant monitoring log we build a compound index over the implant serial number (ascending) and the timestamp (descending).
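A sketch of the indexed log document, with field and index names of my own choosing:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.index.CompoundIndex;
import org.springframework.data.mongodb.core.index.Indexed;
import org.springframework.data.mongodb.core.mapping.Document;

import java.time.Instant;

@Document("implant_logs")
// class-level compound index: serial number ascending (1), timestamp descending (-1)
@CompoundIndex(name = "serial_ts_idx", def = "{'implantSerialNumber': 1, 'timestamp': -1}")
public class ImplantMonitoringLog {

    @Id
    private String id;

    @Indexed // single-field index for frequent lookups
    private String implantSerialNumber;

    private Instant timestamp;

    // metric fields and accessors omitted
}
```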
Geospatial indexing is where it gets spicy. These indexes help you find documents within a given distance. For example, we can add a Point field to the implant monitoring log and annotate it with GeoSpatialIndexed, specifying the 2dsphere type. Point is a class from Spring Data’s geo package; it represents longitude and latitude. The annotation creates an index of type 2dsphere, which makes queries use the nearSphere operator when fetching the data. This operator takes the Earth’s curvature into account when calculating distances.
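A minimal sketch of such a field; in the demo it would live in the implant monitoring log class, the standalone class here is just for illustration:

```java
import org.springframework.data.geo.Point;
import org.springframework.data.mongodb.core.index.GeoSpatialIndexType;
import org.springframework.data.mongodb.core.index.GeoSpatialIndexed;

public class LogLocation {

    // Point stores (x, y) = (longitude, latitude)
    @GeoSpatialIndexed(type = GeoSpatialIndexType.GEO_2DSPHERE)
    private Point location;
}
```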
Spring Data MongoDB also provides auditing support with the CreatedDate and LastModifiedDate annotations; you switch it on by putting EnableMongoAuditing on a configuration class.
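A sketch of audited fields, assuming Instant timestamps and that auditing has been enabled:

```java
import org.springframework.data.annotation.CreatedDate;
import org.springframework.data.annotation.LastModifiedDate;

import java.time.Instant;

public class AuditedDocument {

    @CreatedDate // set once, when the document is first persisted
    private Instant createdAt;

    @LastModifiedDate // refreshed on every save
    private Instant updatedAt;
}
```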
Okay, let’s move on to repositories. Spring Data MongoDB gives you the MongoRepository interface, which supports CRUD, paging and sorting, and query by example. When you create repository interfaces for civilians and implant monitoring logs, Spring generates the basic CRUD methods automatically.
In addition to the default CRUD methods, we can query by convention. It means that you name the methods in a pattern like findByNationalId, and Spring turns that into MongoDB queries. Same idea for logs: we can find logs by serial number, by serial number and a timestamp after a date, or between two dates. Let’s look at the geospatial query method example. The Near keyword in the method name enables you to fetch all implant monitoring logs within the specified distance of a point. Of course, this method is for demonstration purposes only. There could be thousands of logs, and fetching all of them could seriously affect performance. So you should limit the number of fetched logs, for example by adding a time window or some other filter.
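The repository interfaces might be sketched like this; the method and field names are my guesses based on the description:

```java
import org.springframework.data.geo.Distance;
import org.springframework.data.geo.Point;
import org.springframework.data.mongodb.repository.MongoRepository;

import java.time.Instant;
import java.util.List;
import java.util.Optional;

public interface CivilianRepository extends MongoRepository<Civilian, String> {
    Optional<Civilian> findByNationalId(String nationalId);
}

interface ImplantMonitoringLogRepository extends MongoRepository<ImplantMonitoringLog, String> {

    List<ImplantMonitoringLog> findByImplantSerialNumber(String serialNumber);

    List<ImplantMonitoringLog> findByImplantSerialNumberAndTimestampAfter(String serialNumber, Instant after);

    List<ImplantMonitoringLog> findByImplantSerialNumberAndTimestampBetween(String serialNumber, Instant from, Instant to);

    // demo only: an unbounded geospatial query; add a time window or limit in real code
    List<ImplantMonitoringLog> findByLocationNear(Point point, Distance distance);
}
```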
CRUD is pretty straightforward and works the way we’re used to. There’s only one note on saving entities: the repository offers both save and insert. Save inserts or updates depending on whether an ID already exists; insert only creates a new document and fails if one already exists, so it protects you from accidental overwrites.
Okay. Sometimes the method naming approach is not enough. You may want to query array elements, create complex filters, or avoid unreadable method names. Option one is to use the Query annotation. For instance, we want to find all civilians with implants whose lot number is greater than or equal to n, and here’s how to do that with a Query annotation: we use the elemMatch and gte operators. It roughly translates to “find an element that matches” and “greater than or equal to”.
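A sketch of such an annotated query method; the repository and field names are assumptions:

```java
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.data.mongodb.repository.Query;

import java.util.List;

public interface CivilianAnnotatedRepository extends MongoRepository<Civilian, String> {

    // "find civilians having at least one embedded implant whose lotNumber >= the given value";
    // ?0 is a placeholder for the first method parameter
    @Query("{ 'implants': { $elemMatch: { 'lotNumber': { $gte: ?0 } } } }")
    List<Civilian> findByImplantLotNumberGte(int lotNumber);
}
```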
Option two is to use MongoTemplate. It is an abstraction layer that helps you build queries programmatically using Query and Criteria objects. For that, you need a custom repository interface and an implementation class with the Impl suffix. Then you build a criteria like “implant lot number greater than or equal” and execute it with MongoTemplate. You can also write type-safe queries, for example with a fluent wrapper.
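A sketch of the custom fragment; the interface and class names are my own, following the Impl-suffix convention Spring Data uses to locate the implementation:

```java
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

import java.util.List;

interface CivilianRepositoryCustom {
    List<Civilian> findWithImplantLotNumberGte(int lotNumber);
}

// the "Impl" suffix is how Spring Data finds this class
class CivilianRepositoryCustomImpl implements CivilianRepositoryCustom {

    private final MongoTemplate mongoTemplate;

    CivilianRepositoryCustomImpl(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    @Override
    public List<Civilian> findWithImplantLotNumberGte(int lotNumber) {
        // dot notation reaches into the embedded implants array
        Query query = new Query(Criteria.where("implants.lotNumber").gte(lotNumber));
        return mongoTemplate.find(query, Civilian.class);
    }
}
```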
MongoDB projections mean retrieving only the specific fields. It reduces network traffic and helps to prevent accidental data exposure. For example, we want to query all civilians but return only legal name and national ID, and the rest of the fields become null in the mapped objects. You can do that with Query. A more sophisticated approach is to use interface-based projections. In this case, you specify the interface with all the fields that you need, and Spring will automatically map the query results to this interface. Then you simply define a method in the civilian repository and specify CivilianSummary, which is our projection, as the return type.
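The interface-based variant might be sketched like this; the names are assumptions:

```java
import org.springframework.data.mongodb.repository.MongoRepository;

import java.util.List;

// interface-based projection: Spring maps only these fields from the query result
interface CivilianSummary {
    String getLegalName();
    String getNationalId();
}

interface CivilianSummaryRepository extends MongoRepository<Civilian, String> {
    // the return type drives the projection; "findAllBy" is a derived query with no criteria
    List<CivilianSummary> findAllBy();
}
```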
Note that interface-based projections are not the same as DTOs. DTOs are classes that you write manually with the fields, constructors, and maybe even custom logic. Interface-based projections are interfaces with getter methods. But we can combine projections and DTOs.
Suppose we want to gather statistics on implant performance: we must calculate the average power usage, CPU usage, and neural latency over a given period of time. For that, we can create a DTO record, MonitoringStats, for the aggregated averages. It will hold the data we fetch and calculate.
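Such a record might look like this; the component names are my own guesses:

```java
// hypothetical DTO for the aggregated averages; field names are assumptions
record MonitoringStats(
        String implantSerialNumber,
        double avgPowerUsage,
        double avgCpuUsage,
        double avgNeuralLatency) {
}
```

Records work well here because the aggregation result is immutable data with no behavior.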
But to perform such a task, we need to master MongoDB aggregations. Aggregations are MongoDB’s way to run calculations on the database side to filter, group, or transform data. This reduces data transfer and client-side computations. To use aggregations, we need to create the aggregation pipeline. An aggregation pipeline is a series of operations performed on the data. You can create aggregation pipelines declaratively in the repository using Aggregation or Query, or programmatically using template and criteria.
So once again, what is our use case? We need to filter implant monitoring logs by serial number and time window. Then we need to calculate the average of certain fields. Then we need to create a projection that contains only four fields: implant serial number and calculated metrics. And then we must return the MonitoringStats object from the database.
First, let’s lay the groundwork. We need to create the implant monitoring log repository custom interface with a custom method aggregateStats. The implant monitoring log repository will extend the new interface, and then we follow the familiar logic: we create the implant monitoring log repository custom Impl class, and there we inject the template bean.
Now it’s time to actually write the aggregateStats method. For filtering the logs, we need the MatchOperation class that will hold the given criteria. To calculate the average, we need the GroupOperation class. Here we group logs by serial number, calculate the average for each group, and give a new name to each resulting metric. Next, we need the ProjectionOperation class. It renames the id field to implant serial number and rounds each average metric to two decimals. Finally, we combine all three stages into a single aggregation pipeline and feed it to the template. The template executes this pipeline against the implant logs collection and returns the MonitoringStats object.
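Putting the three stages together, the method might be sketched like this. The field and collection names are my own guesses, and the two-decimal rounding is expressed with the $round operator via ArithmeticOperators:

```java
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.ArithmeticOperators;
import org.springframework.data.mongodb.core.aggregation.GroupOperation;
import org.springframework.data.mongodb.core.aggregation.MatchOperation;
import org.springframework.data.mongodb.core.aggregation.ProjectionOperation;
import org.springframework.data.mongodb.core.query.Criteria;

import java.time.Instant;

class ImplantMonitoringLogRepositoryCustomImpl {

    private final MongoTemplate mongoTemplate;

    ImplantMonitoringLogRepositoryCustomImpl(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    MonitoringStats aggregateStats(String serialNumber, Instant from, Instant to) {
        // stage 1: keep only logs for this implant inside the time window
        MatchOperation match = Aggregation.match(
                Criteria.where("implantSerialNumber").is(serialNumber)
                        .and("timestamp").gte(from).lte(to));

        // stage 2: group by serial number and average each metric
        GroupOperation group = Aggregation.group("implantSerialNumber")
                .avg("powerUsage").as("avgPowerUsage")
                .avg("cpuUsage").as("avgCpuUsage")
                .avg("neuralLatency").as("avgNeuralLatency");

        // stage 3: rename _id back to implantSerialNumber and round to two decimals
        ProjectionOperation project = Aggregation.project()
                .and("_id").as("implantSerialNumber")
                .and(ArithmeticOperators.Round.roundValueOf("avgPowerUsage").place(2)).as("avgPowerUsage")
                .and(ArithmeticOperators.Round.roundValueOf("avgCpuUsage").place(2)).as("avgCpuUsage")
                .and(ArithmeticOperators.Round.roundValueOf("avgNeuralLatency").place(2)).as("avgNeuralLatency");

        // run the pipeline against the logs collection and map the single result
        return mongoTemplate.aggregate(
                        Aggregation.newAggregation(match, group, project),
                        "implant_logs", MonitoringStats.class)
                .getUniqueMappedResult();
    }
}
```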
It’s testing time. We will use the DataMongoTest annotation for data-layer testing. It loads specific beans such as repositories and documents, but not the whole application context. By default, tests annotated with DataMongoTest try to use an embedded database, but we will override this behavior and use a Dockerized MongoDB database. We will spin it up with the help of Testcontainers. We have already added the Testcontainers support to the dependencies, and now we also need to add the Testcontainers JUnit Jupiter dependency.
In the test class, we use the Testcontainers annotation. We create a MongoDB container and use the ServiceConnection annotation so Spring can wire beans to the container. For isolation, we’ll use a method annotated with BeforeEach to create the test data and a method annotated with AfterEach to drop the collection. In this case, we added the data manually. Another approach is to put all required data in a JSON file and register a Jackson repository populator bean (Jackson2RepositoryPopulatorFactoryBean) in the config. This bean will populate the database with the provided data when the container starts. And finally, we can use the familiar flow to write the tests.
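A sketch of such a test class; the image tag, names, and test data are my own, and the setter on the entity is assumed:

```java
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.data.mongo.DataMongoTest;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.testcontainers.containers.MongoDBContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

import static org.junit.jupiter.api.Assertions.assertTrue;

@DataMongoTest
@Testcontainers
class CivilianRepositoryTest {

    @Container
    @ServiceConnection // Spring Boot derives the connection settings from the container
    static MongoDBContainer mongo = new MongoDBContainer(DockerImageName.parse("mongo:7"));

    @Autowired
    CivilianRepository repository;

    @BeforeEach
    void setUp() {
        Civilian civilian = new Civilian();
        civilian.setNationalId("ID-001"); // assumed setter; remaining fields omitted
        repository.save(civilian);
    }

    @AfterEach
    void tearDown() {
        repository.deleteAll(); // keep tests isolated from each other
    }

    @Test
    void findsByNationalId() {
        assertTrue(repository.findByNationalId("ID-001").isPresent());
    }
}
```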
That was a quick introduction to using Spring Data MongoDB. If you found this video useful, don’t forget to like it. Subscribe to our channel for more videos like this. And until next time.





